As long as the data fits within the upper limit of a single insertion (256 MB), inserting it as one batch is much more efficient than inserting entities one at a time.
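As an illustration of batching under the 256 MB limit, the sketch below splits a list of float vectors into batches whose raw payload stays within that limit. The helper name `batch_vectors` is hypothetical, and the size estimate assumes 4 bytes per vector component and ignores IDs and serialization overhead, so a real client may need a smaller limit.

```python
from typing import Iterator, List, Sequence

MAX_INSERT_BYTES = 256 * 1024 * 1024  # single-insertion limit stated above
BYTES_PER_FLOAT = 4                   # assumption: float32 components


def batch_vectors(
    vectors: Sequence[Sequence[float]],
    dim: int,
    max_bytes: int = MAX_INSERT_BYTES,
) -> Iterator[List[Sequence[float]]]:
    """Yield consecutive batches whose raw vector payload fits in max_bytes."""
    bytes_per_vector = dim * BYTES_PER_FLOAT
    if bytes_per_vector > max_bytes:
        raise ValueError("a single vector exceeds the batch size limit")
    per_batch = max_bytes // bytes_per_vector
    for start in range(0, len(vectors), per_batch):
        yield list(vectors[start:start + per_batch])
```

Each yielded batch can then be passed to a single insert call instead of inserting the vectors one by one.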
The following parameters in the system configuration file have an impact on the insertion performance:
This parameter is used to enable or disable the Write Ahead Log (WAL) function (enabled by default). The processes of inserting data when write ahead log is enabled or disabled are as follows:
- When write ahead log is enabled, the write ahead log module writes data to the disk, and then turns to the insert operation.
- When write ahead log is disabled, the data insertion speed is faster. The system directly writes the data to the mutable buffer in the memory and immediately turns to the insert operation.
Delete operations are faster when write ahead log is enabled. We recommend that you enable write ahead log to ensure reliability of your data.
This parameter (1 second by default) refers to the interval time of the data flushing task in the background. Increasing this value can reduce the number of segment merges, reduce disk I/O, and increase the throughput rate of insert operations.
The value of segment_row_limit determines the maximum number of entities a segment can hold. You can set segment_row_limit when creating a collection; its default value is 512 × 1024 rows and its maximum value is 4 × 1024 × 1024 rows. The value of segment_row_limit has an impact on insertion performance: the greater the value, the more time it takes to merge data to the size set by this parameter, which affects the throughput rate of insert operations. The smaller the value, the more data segments are generated, which may worsen query performance.
Besides software-level elements, network bandwidth and storage media also play a role in the insertion performance.
Factors that affect query performance include hardware environment, system parameters, indexes, and query scale.
- When CPU is used for calculations, query performance depends on the CPU's frequency, number of cores, and supported instruction set.
- When GPU is used for calculations, query performance depends on the GPU's parallel computing capabilities and transmission bandwidth.
This parameter (4 GB by default) refers to the size of the cache space used for resident query data. If the cache space is insufficient to hold the required data, the data will be temporarily loaded from the disk during the query, which seriously affects query performance. Therefore, cache_size should be greater than the amount of data required by the query.
- The data size of the floating-point original vector can be estimated by "total number of vectors × dimension × 4".
- The data size of the binary type original vector can be estimated by "total number of vectors × dimension ÷ 8".
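The two estimates above can be written as a small calculator. This is a minimal sketch: the function names are hypothetical, the 4 GB default is taken from the parameter description above, and binary dimensions are rounded up to whole bytes.

```python
def float_vectors_bytes(num_vectors: int, dim: int) -> int:
    """Raw size of float vectors: total number of vectors x dimension x 4."""
    return num_vectors * dim * 4


def binary_vectors_bytes(num_vectors: int, dim: int) -> int:
    """Raw size of binary vectors: total number of vectors x dimension / 8."""
    return num_vectors * ((dim + 7) // 8)


def fits_in_cache(required_bytes: int, cache_size_gb: int = 4) -> bool:
    """True if the cache is strictly larger than the data the query needs."""
    return cache_size_gb * 1024 ** 3 > required_bytes
```

For example, one million 128-dimensional float vectors need about 512 MB, which fits in the default cache, whereas ten million of them (about 5.12 GB) do not.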
After the indexes are created (FLAT is not included), the index files require additional disk space and the query only needs to load the index files.
- The data volume of the IVF_FLAT index is basically equal to the total data volume of its original vectors.
- The data volume of the IVF_SQ8 / IVF_SQ8H index is equivalent to 25% to 30% of the total data volume of the original vectors.
- The data volume of the IVF_PQ index changes according to its parameters, which is generally lower than 10% of the total data volume of the original vectors.
- The data volume of the HNSW / RNSG / ANNOY index is greater than the total data volume of the original vectors.
By calling get_collection_stats, you can get the total amount of data required to query a collection.
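For a rough estimate before any data is loaded, the ratios listed above can be turned into a lookup table. This is only a sketch: the table and function names are hypothetical, the IVF_PQ upper bound is the "generally lower than 10%" figure rather than a hard limit, and for HNSW / RNSG / ANNOY only a lower bound is known, so the upper bound is returned as `None`.

```python
from typing import Optional, Tuple

# (low, high) index size as a percentage of the raw vector size.
# None means the text above gives no upper bound.
INDEX_SIZE_PERCENT = {
    "IVF_FLAT": (100, 100),
    "IVF_SQ8": (25, 30),
    "IVF_SQ8H": (25, 30),
    "IVF_PQ": (0, 10),
    "HNSW": (100, None),
    "RNSG": (100, None),
    "ANNOY": (100, None),
}


def estimate_index_bytes(raw_bytes: int, index_type: str) -> Tuple[int, Optional[int]]:
    """Return a (low, high) estimate of the index size in bytes."""
    low, high = INDEX_SIZE_PERCENT[index_type]
    low_bytes = raw_bytes * low // 100
    high_bytes = None if high is None else raw_bytes * high // 100
    return low_bytes, high_bytes
```

The upper end of the range is the safer figure to compare against cache_size.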
In the GPU version, GPU is enabled for query when the number of target vectors is greater than or equal to gpu_search_threshold (1000 by default).
The performance of GPU queries depends on the GPU and on the speed at which the CPU loads data into graphics memory. The advantages of parallel computing on GPUs cannot be fully utilized when processing a small number of target vectors. Queries run faster on GPUs than on CPUs only once the number of target vectors reaches a certain threshold. In practice, the ideal value of this parameter can be determined by experimental comparison.
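The threshold rule and the experimental comparison can be sketched as follows. Both function names are hypothetical, `nq` stands for the number of target vectors in a query, and the timings are assumed to come from your own benchmark runs of the same query on CPU and on GPU.

```python
from typing import Dict, Optional, Tuple


def uses_gpu(nq: int, gpu_search_threshold: int = 1000) -> bool:
    """GPU is used when nq is greater than or equal to the threshold."""
    return nq >= gpu_search_threshold


def pick_threshold(timings: Dict[int, Tuple[float, float]]) -> Optional[int]:
    """Pick the smallest measured nq from which GPU beats CPU at every larger nq.

    timings maps nq -> (cpu_seconds, gpu_seconds).
    Returns None if GPU is not faster at the largest measured nq.
    """
    candidate = None
    for nq in sorted(timings, reverse=True):
        cpu_s, gpu_s = timings[nq]
        if gpu_s < cpu_s:
            candidate = nq
        else:
            break
    return candidate
```

The returned value is only as good as the measured points; measuring more values of `nq` around the crossover gives a tighter threshold.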
Specifies the GPU devices used for querying. For scenarios with a large number of query target vectors, using multiple GPUs can significantly improve query efficiency.
Specifies the GPU devices used for indexing. For scenarios where data insertion and querying are concurrent, you can use GPUs to build indexes to avoid the index building task competing for CPU resources with the query task.
To choose the right index, you need to trade off between multiple indicators such as storage space, query performance, and query recall rate.
- FLAT index
FLAT is a brute-force search for vectors. It has the slowest search speed, but has the highest recall rate (100%) and takes up the smallest amount of disk space.
As the number of target vectors increases, the time spent on using CPUs to perform FLAT queries increases linearly. On the other hand, GPUs handle batched FLAT queries efficiently, so the query time grows only slightly as the number of target vectors increases.
- IVF Indexes
IVF indexes include IVF_FLAT, IVF_SQ8 / IVF_SQ8H, and IVF_PQ. The IVF_SQ8 / IVF_SQ8H and IVF_PQ indexes perform lossy compression on vector data to reduce the disk space occupied by index files.
All types of IVF index have two parameters: nlist and nprobe. See Milvus Indexes for more information about these parameters.
You can use the following methods to estimate the amount of calculation when using IVF indexes for queries.
- The amount of calculation of a single segment = the number of target vectors × (nlist + (the number of vectors in a segment ÷ nlist) × nprobe)
- The number of segments = the total amount of aggregate data ÷ segment_row_limit
- The total amount of calculation of a collection = the amount of calculation of a single segment × the number of segments
The larger the estimated total amount of calculation, the longer the query takes. In practice, you can use the above formulas to find reasonable parameters that provide high query performance at an acceptable recall rate.
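The estimate can be sketched in a few lines for comparing parameter choices. This sketch assumes the per-segment cost is the number of target vectors × (nlist + (vectors in a segment ÷ nlist) × nprobe), that rows are spread evenly across segments, and that the segment count is rounded up. The function names are hypothetical, and the result is a relative number of distance computations, not a time.

```python
import math


def ivf_segment_cost(nq: int, nlist: int, nprobe: int, rows_in_segment: float) -> float:
    """Cost of one segment: compare with nlist centroids, then scan nprobe buckets."""
    return nq * (nlist + rows_in_segment / nlist * nprobe)


def ivf_collection_cost(
    nq: int,
    nlist: int,
    nprobe: int,
    total_rows: int,
    segment_row_limit: int = 512 * 1024,  # default stated for this parameter
) -> float:
    """Total cost: per-segment cost multiplied by the number of segments."""
    if total_rows <= 0:
        return 0.0
    num_segments = math.ceil(total_rows / segment_row_limit)
    rows_per_segment = total_rows / num_segments
    return num_segments * ivf_segment_cost(nq, nlist, nprobe, rows_per_segment)
```

For example, with 10 target vectors, nlist = 1024, nprobe = 16, and four full default-size segments, the estimate is 4 × 10 × (1024 + 512 × 16) = 368,640. Evaluating a few (nlist, nprobe) pairs this way shows which candidates are worth checking for recall.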
If a segment holds fewer vectors than segment_row_limit, Milvus does not build an index for it and uses brute-force search as the query method. The amount of calculation can be estimated by multiplying the number of target vectors by the total number of vectors in the segment.
- HNSW / RNSG / ANNOY index
The index parameters of HNSW, RNSG, and ANNOY have a more complex impact on query performance. For more information, see Index Introduction.
- Result set
The size of the result set depends on the number of target vectors and topk. The size of topk has little effect on the calculation. However, when the number of target vectors and topk are both large, the time spent on serializing the result set and transmitting it over the network increases accordingly.
Milvus uses MySQL as a metadata backend service. When querying data, Milvus accesses MySQL multiple times to obtain metadata. Therefore, the response speed of the MySQL service greatly influences the query performance of Milvus.
When querying data for the first time, the system needs to read the data from the disk and write it to the cache, which is time-consuming. To avoid loading data during the first query, you can call the load_collection API in advance, or use the system parameter preload_collection to specify the collections to preload when starting Milvus.
- Compact segments
To filter deleted entities, Milvus reads delete_docs into memory when querying data. You can call compact to clean up deleted entities and reduce filtering operations, thereby improving query performance.
- Compact segments
Deleted entities do not participate in the calculation but still take up disk space. If a large number of entities have been deleted, you can call compact to free up disk space.
Why is my GPU always idle?
It is very likely that Milvus is using CPU for query. If you want to use GPU for query, you need to set the value of gpu_search_threshold in milvus.yaml to be no greater than nq (number of vectors per query).
You can use gpu_search_threshold to set the threshold: when nq is less than this value, Milvus uses CPU for queries; otherwise, Milvus uses GPU instead.
We do not recommend enabling GPU when the number of vectors per query is small.
Why is the search very slow?
Check whether the value of cache.cache_size in milvus.yaml is greater than the size of the collection.
Why is a GPU-enabled query sometimes slower than a CPU-only query?
Generally speaking, a CPU-only query works for situations where nq (number of vectors per query) is small, whilst a GPU-enabled query works best with a large nq, say 500.
Milvus needs to load data from the memory to the graphics memory for a GPU-enabled query. A GPU-enabled query is faster only when this load time is negligible compared to the query time.
Why is the query time for a small dataset sometimes longer?
If the size of the dataset is smaller than the value of segment_row_limit that you set when creating a collection, Milvus does not create an index for this dataset. Therefore, querying a small dataset may take longer. You can call create_index to build the index.