This is because, after a restart, Milvus must load data from disk into memory for the first vector search. You can set `preload_collection` in milvus.yaml to load as many collections as memory permits; Milvus then loads these collections into memory each time it restarts. Alternatively, you can call `load_collection()` to load collections into memory.
Check whether the value of `cache.cache_size` in milvus.yaml is greater than the size of the collection.
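To judge whether a collection fits in the cache, you can estimate its raw size from the vector count and dimension. The sketch below gives only a rough lower bound. It assumes 4-byte float32 components and ignores IDs, index files, and other overhead. `raw_vector_bytes` and `fits_in_cache` are hypothetical helpers and are not part of Milvus.

```python
# Hypothetical helpers: rough check of raw vector size against cache.cache_size.
# Assumption: float32 components (4 bytes each); IDs and index overhead ignored,
# so the real memory footprint will be larger than this estimate.

FLOAT32_BYTES = 4


def raw_vector_bytes(num_vectors: int, dim: int) -> int:
    """Raw size in bytes of num_vectors float32 vectors of dimension dim."""
    return num_vectors * dim * FLOAT32_BYTES


def fits_in_cache(num_vectors: int, dim: int, cache_size_bytes: int) -> bool:
    """True if the raw vector data alone is smaller than the cache (in bytes)."""
    return raw_vector_bytes(num_vectors, dim) < cache_size_bytes


if __name__ == "__main__":
    gib = 1024 ** 3
    # 10 million 128-dimensional vectors checked against a 4 GiB cache.
    size = raw_vector_bytes(10_000_000, 128)
    print(f"{size / gib:.2f} GiB, fits: {fits_in_cache(10_000_000, 128, 4 * gib)}")
```

For example, 10 million 128-dimensional vectors need about 4.77 GiB of raw storage, which already exceeds a 4 GiB cache.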
- Ensure that the value of `cache.cache_size` in milvus.yaml is greater than the size of the collection.
- Ensure that all segments are indexed.
- Check if there are other processes on the server consuming CPU resources.
- Adjust the values of
- If search performance is unstable, you can add `-e OMP_NUM_THREADS=NUM` when starting up Milvus, where `NUM` is 2/3 of the number of CPU cores.
See Performance tuning for more information.
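The `OMP_NUM_THREADS` rule of thumb above is simple arithmetic. The sketch below computes `NUM` and formats the `-e` flag, presumably for a `docker run` command line. Two assumptions apply. First, it rounds down and never goes below one thread; the FAQ only says "2/3". Second, `os.cpu_count()` reports logical CPUs, which can differ from physical cores. `omp_num_threads` and `docker_env_flag` are hypothetical helpers.

```python
import os


def omp_num_threads(cpu_cores: int) -> int:
    """Two thirds of the CPU cores, rounded down, but at least one thread."""
    return max(1, (2 * cpu_cores) // 3)


def docker_env_flag(cpu_cores: int) -> str:
    """Format the -e environment flag for the Milvus start-up command."""
    return f"-e OMP_NUM_THREADS={omp_num_threads(cpu_cores)}"


if __name__ == "__main__":
    # os.cpu_count() counts logical CPUs and may return None.
    cores = os.cpu_count() or 1
    print(docker_env_flag(cores))
```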
It depends on your scenario. See Performance tuning > Index.
If the size of the dataset is smaller than the value of `segment_row_limit` that you set when creating the collection, Milvus does not create an index for this dataset. As a result, queries on a small dataset may take longer. In this case, you can call `create_index` to build the index manually.
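The decision above can be sketched as a small guard that builds the index only when Milvus would not do so on its own. It models only the rule stated here: datasets below `segment_row_limit` are not indexed automatically. `ensure_index` is hypothetical. Its `create_index` argument is a stand-in for whatever index-building call your client exposes, not a real SDK signature.

```python
from typing import Callable


def ensure_index(num_rows: int, segment_row_limit: int,
                 create_index: Callable[[], None]) -> bool:
    """Call create_index when the dataset is too small for automatic indexing.

    create_index is a hypothetical stand-in for your client's index call.
    Returns True if it was called.
    """
    if num_rows < segment_row_limit:
        # Below segment_row_limit: Milvus does not index this data by itself.
        create_index()
        return True
    # At or above the limit, the FAQ implies no manual call is needed.
    return False


if __name__ == "__main__":
    # Illustrative numbers only; not Milvus defaults.
    built = ensure_index(2_000, 4_096, lambda: print("building index"))
    print("manual index built:", built)
```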
It is very likely that Milvus is using CPU for the query. To use GPU for queries, set `gpu_search_threshold` in milvus.yaml to a value less than `nq` (the number of vectors per query).
You can use `gpu_search_threshold` to set the threshold: when `nq` is less than this value, Milvus uses CPU for queries; otherwise, it uses GPU.
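The routing rule just described fits in one comparison. The sketch below mirrors it so you can check which device a given `nq` would use. It models only the threshold rule as stated, and the real scheduler may weigh other factors. `search_device` is a hypothetical helper, and the threshold value in the example is arbitrary.

```python
def search_device(nq: int, gpu_search_threshold: int) -> str:
    """Return the device the threshold rule selects for a query of size nq."""
    # nq below the threshold goes to CPU; otherwise the query goes to GPU.
    return "cpu" if nq < gpu_search_threshold else "gpu"


if __name__ == "__main__":
    threshold = 1000  # hypothetical gpu_search_threshold value
    for nq in (10, 999, 1000, 5000):
        print(nq, search_device(nq, threshold))
```

Under this rule, a query is routed to GPU only when the threshold does not exceed its `nq`. This is why a low `nq` combined with the default threshold keeps queries on the CPU.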
We do not recommend enabling GPU when `nq` is small.
This is because the data has not yet been flushed from memory to disk. To make data searchable immediately after insertion, you can call `flush`. However, calling this method too often creates many small files and slows down search.
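One way to balance freshness against small files is to insert as usual but flush only after a minimum number of rows has accumulated. The class below sketches that pattern. `BatchedFlusher` is hypothetical, and the `insert` and `flush` callables it receives are stand-ins for your client's operations, not real SDK signatures. The right `flush_every_rows` depends on how fresh your searches need to be.

```python
from typing import Callable, List, Sequence


class BatchedFlusher:
    """Insert every batch, but call flush only after enough rows accumulate.

    insert and flush are hypothetical stand-ins for client operations.
    """

    def __init__(self, insert: Callable[[Sequence[List[float]]], None],
                 flush: Callable[[], None], flush_every_rows: int) -> None:
        self._insert = insert
        self._flush = flush
        self._flush_every_rows = flush_every_rows
        self._rows_since_flush = 0

    def add(self, vectors: Sequence[List[float]]) -> None:
        """Insert a batch and flush once the row budget is reached."""
        self._insert(vectors)
        self._rows_since_flush += len(vectors)
        if self._rows_since_flush >= self._flush_every_rows:
            self.flush_now()

    def flush_now(self) -> None:
        """Flush pending rows, e.g. before a search that must see them."""
        if self._rows_since_flush:
            self._flush()
            self._rows_since_flush = 0


if __name__ == "__main__":
    flushes = []
    f = BatchedFlusher(lambda v: None, lambda: flushes.append(1), 1000)
    for _ in range(10):
        f.add([[0.0] * 4] * 250)  # ten batches of 250 rows
    f.flush_now()
    print("flush calls:", len(flushes))
```

With ten 250-row batches and a 1,000-row budget, this makes three flush calls instead of ten.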
Milvus processes queries in parallel. A query with an `nq` below 100 on a small dataset does not need a high degree of parallelism, so CPU usage stays low.
See Performance Tuning > Index for more information.
When the client and the server run on the same physical machine, importing 100,000 128-dimensional vectors (to an SSD) takes about 0.8 seconds. In practice, performance depends on the I/O speed of your disk.
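To relate this figure to disk speed, you can convert it into a data rate. The sketch below assumes float32 vectors (4 bytes per component) and counts only raw vector data, ignoring IDs and protocol overhead. `import_throughput_mb_per_s` is a hypothetical helper.

```python
FLOAT32_BYTES = 4  # assumption: float32 vector components


def import_throughput_mb_per_s(num_vectors: int, dim: int, seconds: float) -> float:
    """Raw vector data written per second, in megabytes (10**6 bytes)."""
    total_bytes = num_vectors * dim * FLOAT32_BYTES
    return total_bytes / seconds / 1_000_000


if __name__ == "__main__":
    rate = import_throughput_mb_per_s(100_000, 128, 0.8)
    print(f"{rate:.1f} MB/s")
```

100,000 × 128 × 4 bytes is 51.2 MB, so importing it in 0.8 seconds corresponds to roughly 64 MB/s of raw vector data. This rate is well within what an SSD can sustain.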
- If the newly inserted vectors have not yet reached the volume that triggers index creation, Milvus must load this data directly from disk into memory for a vector search.
- As of v0.9.0, if Milvus has started building indexes for the newly inserted vectors, an incoming vector search interrupts the index-building process, causing a delay of about one second.
Generally speaking, CPU-only query suits situations where `nq` (the number of vectors per query) is small, whereas GPU-enabled query works best with a large `nq`, say 500.
For a GPU-enabled query, Milvus must load data from memory into GPU memory. GPU-enabled query is faster only when this load time is negligible compared with the query time.
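The trade-off can be made concrete with a deliberately simple linear cost model. In this model, CPU time grows as `cpu_s_per_query * nq`, and GPU time is a fixed load cost plus `gpu_s_per_query * nq`. The GPU wins once `nq` exceeds `load_s / (cpu_s_per_query - gpu_s_per_query)`. This is an illustrative assumption, not how Milvus actually schedules queries. The timings in the example are made up rather than measured, and both functions are hypothetical.

```python
import math


def break_even_nq(load_s: float, cpu_s_per_query: float,
                  gpu_s_per_query: float) -> float:
    """nq above which GPU search beats CPU search in a linear cost model.

    CPU time = cpu_s_per_query * nq
    GPU time = load_s + gpu_s_per_query * nq
    """
    if gpu_s_per_query >= cpu_s_per_query:
        return math.inf  # the GPU never amortizes its load cost
    return load_s / (cpu_s_per_query - gpu_s_per_query)


def gpu_is_faster(nq: int, load_s: float, cpu_s_per_query: float,
                  gpu_s_per_query: float) -> bool:
    """Compare total GPU time (load + search) with CPU time for nq queries."""
    return load_s + gpu_s_per_query * nq < cpu_s_per_query * nq


if __name__ == "__main__":
    # Made-up timings: 0.5 s load, 2 ms per vector on CPU, 0.5 ms on GPU.
    print(break_even_nq(0.5, 0.002, 0.0005))
    for nq in (100, 1000):
        print(nq, gpu_is_faster(nq, 0.5, 0.002, 0.0005))
```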