Insert and Delete Vectors

You can perform vector operations on a collection or on a partition of a collection. This page covers the following topics:

Insert vectors into a collection

  1. Randomly generate 10,000 entities:
>>> import random
# Generate 10000 entities.
>>> list_of_int = [random.randint(0, 255) for _ in range(10000)]
>>> vectors = [[random.random() for _ in range(128)] for _ in range(10000)]
  private static List<List<Float>> randomFloatVectors() {
    SplittableRandom splitCollectionRandom = new SplittableRandom();
    List<List<Float>> vectors = new ArrayList<>(10000);
    for (int i = 0; i < 10000; ++i) {
      splitCollectionRandom = splitCollectionRandom.split();
      DoubleStream doubleStream = splitCollectionRandom.doubles(128);
      List<Float> vector =
          doubleStream.boxed().map(Double::floatValue).collect(Collectors.toList());
      vectors.add(vector);
    }
    return vectors;
  }
  2. Insert the list of vectors:
# Insert embeddings.
>>> hybrid_entities = [
        {"name": "duration", "values": list_of_int, "type": DataType.INT32},
        {"name": "release_year", "values": list_of_int, "type": DataType.INT64},
        {"name": "embedding", "values": vectors, "type":DataType.FLOAT_VECTOR}
    ]
>>> client.insert('demo_films', hybrid_entities)
    // Insert 10,000 films with their IDs, duration, release year, and fake embeddings into the collection "demo_films".
    List<Long> ids = LongStream.range(0, 10000).boxed().collect(Collectors.toList());
    List<Integer> durations = /* A list of 10,000 Integers. */
    List<Long> releaseYears =  LongStream.range(0, 10000).boxed().collect(Collectors.toList());
    List<List<Float>> embeddings = randomFloatVectors();

    InsertParam insertParam = InsertParam
        .create(collectionName)
        .addField("duration", DataType.INT32, durations)
        .addField("release_year", DataType.INT64, releaseYears)
        .addVectorField("embedding", DataType.VECTOR_FLOAT, embeddings);
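
Every field passed to an insert has to describe the same number of entities, so it can help to check the row counts before sending the request. The following is a minimal sketch of such a check, not part of the Milvus SDK: `check_field_lengths` is a hypothetical helper, and the `"type"` values are plain strings standing in for the `DataType` members so that the snippet runs without a Milvus installation.

```python
import random


def check_field_lengths(entities):
    """Return the row count shared by all fields, or raise ValueError.

    Hypothetical helper: `entities` uses the same list-of-dicts layout as
    `hybrid_entities` in the example above.
    """
    lengths = {field["name"]: len(field["values"]) for field in entities}
    if len(set(lengths.values())) != 1:
        raise ValueError(f"fields have different row counts: {lengths}")
    return next(iter(lengths.values()))


# Small stand-in data set (100 rows instead of 10,000).
list_of_int = [random.randint(0, 255) for _ in range(100)]
vectors = [[random.random() for _ in range(128)] for _ in range(100)]

# The "type" strings are placeholders for DataType.INT32 and so on.
hybrid_entities = [
    {"name": "duration", "values": list_of_int, "type": "INT32"},
    {"name": "release_year", "values": list_of_int, "type": "INT64"},
    {"name": "embedding", "values": vectors, "type": "FLOAT_VECTOR"},
]

row_count = check_field_lengths(hybrid_entities)
```

If the check passes, `hybrid_entities` can be handed to `client.insert` as shown above.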

If you set auto_id to False when creating the collection, you can also define the entity IDs yourself:

>>> entity_ids = list(range(10000))
>>> client.insert('demo_films', hybrid_entities, ids=entity_ids)
    // Insert 10,000 films with their IDs, duration, release year, and fake embeddings into the collection "demo_films".
    List<Long> ids = LongStream.range(0, 10000).boxed().collect(Collectors.toList());
    List<Integer> durations = /* A list of 10,000 Integers. */
    List<Long> releaseYears =  LongStream.range(0, 10000).boxed().collect(Collectors.toList());
    List<List<Float>> embeddings = randomFloatVectors();

    InsertParam insertParam = InsertParam
        .create(collectionName)
        .addField("duration", DataType.INT32, durations)
        .addField("release_year", DataType.INT64, releaseYears)
        .addVectorField("embedding", DataType.VECTOR_FLOAT, embeddings)
        .setEntityIds(ids);
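
When you supply your own IDs, a quick client-side check can catch mistakes before the insert request is sent. The sketch below is an illustration only: `validate_entity_ids` is a hypothetical helper, and the upper bound of 2^63 - 1 is an assumption based on the FAQ statement that IDs are non-negative 64-bit integers. It does not reject duplicate IDs, because the FAQ states that Milvus accepts them.

```python
INT64_MAX = 2**63 - 1  # assumed upper bound for a signed 64-bit ID


def validate_entity_ids(ids, row_count):
    """Check that `ids` matches the row count and holds valid integer IDs.

    Hypothetical helper, not part of the Milvus SDK.
    """
    if len(ids) != row_count:
        raise ValueError(f"expected {row_count} IDs, got {len(ids)}")
    for entity_id in ids:
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(entity_id, bool) or not isinstance(entity_id, int):
            raise TypeError(f"ID {entity_id!r} is not an integer")
        if not 0 <= entity_id <= INT64_MAX:
            raise ValueError(f"ID {entity_id} is outside the range 0..2^63-1")
    return list(ids)


entity_ids = validate_entity_ids(range(10000), 10000)
```

The returned list can then be passed as `ids=entity_ids` in the Python call above.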

Insert vectors into a partition

>>> client.insert('demo_films', hybrid_entities, partition_tag="American")
    // Insert 10,000 films with their IDs, duration, release year, and fake embeddings into the partition "American".
    List<Long> ids = LongStream.range(0, 10000).boxed().collect(Collectors.toList());
    List<Integer> durations = /* A list of 10,000 Integers. */
    List<Long> releaseYears = LongStream.range(0, 10000).boxed().collect(Collectors.toList());
    List<List<Float>> embeddings = randomFloatVectors();

    InsertParam insertParam = InsertParam
        .create(collectionName)
        .addField("duration", DataType.INT32, durations)
        .addField("release_year", DataType.INT64, releaseYears)
        .addVectorField("embedding", DataType.VECTOR_FLOAT, embeddings)
        .setEntityIds(ids)
        .setPartitionTag(partitionTag);

Delete vectors by ID

Suppose your collection contains the following vector IDs:

>>> ids = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

You can delete the vectors with the following command:

>>> client.delete_entity_by_id('demo_films', ids)
client.deleteEntityByID(collectionName, ids.subList(0, 10));
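After calling the delete API, you can optionally call flush to make sure that newly inserted data is visible and that deleted data no longer appears in search results.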
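
The delete-then-flush sequence can be wrapped in a small helper. This is a sketch under stated assumptions: it assumes the client object exposes `flush` and that `flush` accepts a list of collection names, which you should confirm against the SDK version you use. `delete_and_flush` and `RecordingClient` are hypothetical names; the latter is a stand-in that only records calls so the snippet runs without a Milvus server.

```python
def delete_and_flush(client, collection_name, ids):
    """Delete entities by ID, then flush the collection.

    Assumes `client.flush` takes a list of collection names.
    """
    client.delete_entity_by_id(collection_name, ids)
    client.flush([collection_name])


class RecordingClient:
    """Hypothetical stand-in for a Milvus client; it only records calls."""

    def __init__(self):
        self.calls = []

    def delete_entity_by_id(self, collection_name, ids):
        self.calls.append(("delete_entity_by_id", collection_name, list(ids)))

    def flush(self, collection_names):
        self.calls.append(("flush", list(collection_names)))


stub = RecordingClient()
delete_and_flush(stub, "demo_films", [0, 1, 2])
```

With a real client, pass it in place of `stub`; the helper issues the same two calls in the same order.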

FAQ

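Is there a length limit on custom IDs in Milvus? IDs must be non-negative 64-bit integers.
Can Milvus insert vectors with duplicate IDs? Yes. In that case, Milvus stores multiple vectors that share the same ID.
Does Milvus support searching while data is being inserted? Yes.
Is there an upper limit on the amount of data in a single insert? A single insert cannot exceed 256 MB.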
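
Because a single insert is capped at 256 MB, a large data set has to be split into several insert calls. The sketch below estimates a batch size from the raw field widths and yields row ranges to insert one at a time. It is a rough illustration: `rows_per_batch` and `iter_batches` are hypothetical helpers, the 4 bytes per float and the per-row scalar widths are assumptions about the payload, the limit is treated as 256 MiB, and the `safety` factor is an arbitrary margin because the size of the request as actually sent may differ from this estimate.

```python
MAX_INSERT_BYTES = 256 * 1024 * 1024  # the 256 MB limit, treated here as MiB


def rows_per_batch(dim, scalar_bytes_per_row, limit=MAX_INSERT_BYTES, safety=0.5):
    """Estimate how many rows fit in one insert.

    Assumes 4 bytes per vector component plus `scalar_bytes_per_row` for the
    other fields; `safety` leaves headroom for request overhead.
    """
    bytes_per_row = dim * 4 + scalar_bytes_per_row
    return max(1, int(limit * safety) // bytes_per_row)


def iter_batches(total_rows, batch_rows):
    """Yield (start, end) index pairs that cover range(total_rows)."""
    for start in range(0, total_rows, batch_rows):
        yield start, min(start + batch_rows, total_rows)


# 128-dimensional vectors plus an INT32, an INT64, and a 64-bit ID per row.
batch_rows = rows_per_batch(dim=128, scalar_bytes_per_row=4 + 8 + 8)
ranges = list(iter_batches(10000, batch_rows))
```

Each `(start, end)` pair can be used to slice every field's `values` list, and the ID list if you supply one, before calling `client.insert` on that slice.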
© 2019 - 2020 Milvus. All rights reserved.