
Create and Drop an Index

This article provides Python sample code for creating and dropping indexes.

Create an index

Currently, a collection supports only one index type at a time. When you change the index type of a collection, Milvus automatically deletes the old index file. Until you create another index, a collection uses FLAT as its default index type.

create_index() sets the index type of a collection and synchronously builds indexes for the data that has already been inserted. When the size of subsequently inserted data reaches index_file_size, Milvus automatically builds indexes for it in the background. For streaming data, create the index before inserting vectors so that Milvus can automatically index the incoming data. For static data, import all the data first and then create the index. See the index sample program for details about using indexes.
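
The ordering advice above can be sketched as follows. This is a minimal illustration, not part of the SDK: RecordingClient, load_static, and load_streaming are hypothetical names, the stand-in client only logs calls so the sketch runs without a server, the insert(collection_name=..., records=...) signature is assumed to mirror the Python SDK, and the string "IVF_FLAT" stands in for IndexType.IVF_FLAT.

```python
class RecordingClient:
    """Hypothetical stand-in for a connected Milvus client; it only logs calls."""

    def __init__(self):
        self.calls = []

    def insert(self, collection_name, records):
        # Assumed to mirror the SDK's insert call; here we only record it.
        self.calls.append(("insert", len(records)))

    def create_index(self, collection_name, index_type, params):
        self.calls.append(("create_index", index_type))


def load_static(client, collection, batches, index_type, params):
    """Static data: import every batch first, then build the index once."""
    for batch in batches:
        client.insert(collection_name=collection, records=batch)
    client.create_index(collection, index_type, params)


def load_streaming(client, collection, batches, index_type, params):
    """Streaming data: set the index type first, then insert as data arrives."""
    client.create_index(collection, index_type, params)
    for batch in batches:
        client.insert(collection_name=collection, records=batch)


# Two small batches of 2-dimensional vectors, for illustration only.
batches = [[[0.1, 0.2], [0.3, 0.4]], [[0.5, 0.6]]]

static_client = RecordingClient()
load_static(static_client, "test01", batches, "IVF_FLAT", {"nlist": 16384})
print(static_client.calls)
```

With a real client, the static path pays for one synchronous index build at the end, while the streaming path relies on Milvus building indexes in the background as inserted data reaches index_file_size.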
  1. Prepare the parameters needed to create indexes (using IVF_FLAT as an example). The index parameters are stored in a JSON string, which is represented by a dictionary in the Python SDK.

    # Prepare index param
    >>> ivf_param = {'nlist': 16384}
    
    Different index types require different parameters to create indexes. You must assign values to all index parameters.

    Index type: IVF_FLAT / SQ8 / SQ8H
      Index parameters:
        nlist: The number of clusters created when clustering the vector data files during index building. To facilitate later searches, the index file records the results of the clustering operation, including the index type, the center vector of each cluster, and the vectors in each cluster.
      Example parameters: {'nlist': 16384}
      Parameter range:
        nlist: [1, 999999]

    Index type: IVFPQ
      Index parameters:
        nlist: Same meaning as nlist for IVF_FLAT / SQ8 / SQ8H.
        m: The compression rate during index building. The smaller the m, the higher the compression rate.
      Example parameters: {'nlist': 16384, 'm': 12}
      Parameter range:
        nlist: [1, 999999]
        m: a value in {96, 64, 56, 48, 40, 32, 28, 24, 20, 16, 12, 8, 4, 3, 2, 1}

    Index type: NSG
      Index parameters:
        search_length: The larger the value, the more nodes are searched in the graph and the higher the recall rate, but the slower the search. search_length should be less than candidate_pool_size and within [40, 80].
        out_degree: The larger the value, the greater the memory usage and the better the search performance.
        candidate_pool_size: The value affects the index quality and should be within [200, 500].
        knng: The value affects the index quality and should equal out_degree + 20.
      Example parameters: {'search_length': 45, 'out_degree': 50, 'candidate_pool_size': 300, 'knng': 100}
      Parameter range:
        search_length: [10, 300]
        out_degree: [5, 300]
        candidate_pool_size: [50, 1000]
        knng: [5, 300]

    Index type: HNSW
      Index parameters:
        M: The value affects the build time and index quality. The larger the M, the longer it takes to build indexes, the higher the index quality, and the greater the memory footprint.
        efConstruction: The value affects the build time and index quality. The larger the efConstruction, the longer it takes to build indexes, the higher the index quality, and the greater the memory footprint.
      Example parameters: {'M': 16, 'efConstruction': 500}
      Parameter range:
        M: [5, 48]
        efConstruction: [100, 500]

    Index type: ANNOY
      Index parameters:
        n_trees: The value affects the index building time and index size. The larger the value, the more accurate the search results, but the larger the index file.
      Example parameters: {'n_trees': 8}
      Parameter range:
        n_trees: [1, 1024]

    See Milvus Index Type for details.

  2. Create an index for the collection:

    # Create index (assumes `milvus` is an already connected client)
    >>> from milvus import IndexType
    >>> milvus.create_index('test01', IndexType.IVF_FLAT, ivf_param)
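
Because every index parameter must be assigned a value within its range, it can help to check a parameter dictionary before calling create_index(). The following is a minimal sketch, assuming the parameter ranges listed in step 1. PARAM_RANGES and check_index_params are hypothetical helpers, not part of the Python SDK, and the string keys are illustrative labels rather than IndexType members.

```python
# Allowed values copied from the parameter ranges listed in step 1.
# A tuple is an inclusive (low, high) range; a frozenset lists the allowed values.
IVF_RANGES = {"nlist": (1, 999999)}
PARAM_RANGES = {
    "IVF_FLAT": IVF_RANGES,
    "SQ8": IVF_RANGES,
    "SQ8H": IVF_RANGES,
    "IVFPQ": {
        "nlist": (1, 999999),
        "m": frozenset({96, 64, 56, 48, 40, 32, 28, 24, 20, 16, 12, 8, 4, 3, 2, 1}),
    },
    "NSG": {
        "search_length": (10, 300),
        "out_degree": (5, 300),
        "candidate_pool_size": (50, 1000),
        "knng": (5, 300),
    },
    "HNSW": {"M": (5, 48), "efConstruction": (100, 500)},
    "ANNOY": {"n_trees": (1, 1024)},
}


def check_index_params(index_name, params):
    """Return a list of problems; an empty list means the params look valid."""
    spec = PARAM_RANGES[index_name]
    problems = []
    # Every parameter of the index type must be given a value.
    for name in spec:
        if name not in params:
            problems.append(f"missing parameter: {name}")
    for name, value in params.items():
        if name not in spec:
            problems.append(f"unknown parameter: {name}")
            continue
        allowed = spec[name]
        if isinstance(allowed, tuple):
            low, high = allowed
            if not low <= value <= high:
                problems.append(f"{name}={value} is outside [{low}, {high}]")
        elif value not in allowed:
            problems.append(f"{name}={value} is not an allowed value")
    return problems


print(check_index_params("IVF_FLAT", {"nlist": 16384}))
print(check_index_params("HNSW", {"M": 16}))
```

The helper checks only the permitted ranges. The narrower recommendations in step 1, such as keeping search_length below candidate_pool_size, are not enforced here.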
    

Drop an index

After you drop the index, the collection reverts to the default index type, FLAT.

>>> milvus.drop_index('test01')