Distance Metrics

In Milvus, distance metrics measure the similarity between vectors. Choosing a suitable distance metric can significantly improve classification and clustering performance.

The following table shows how widely used distance metrics fit with various input data forms and Milvus indexes.

Distance Metrics
  • Euclidean distance (L2)
  • Inner product (IP)

Index Types
  • FLAT
  • IVF_FLAT
  • IVF_SQ8
  • IVF_SQ8H
  • IVF_PQ
  • RNSG
  • HNSW

Euclidean distance (L2)

Essentially, Euclidean distance measures the length of the line segment that connects two points.

The formula for Euclidean distance is as follows:

$$d(a, b) = d(b, a) = \sqrt{\sum_{i=1}^{n} (b_i - a_i)^2}$$

where a = (a1, a2, ..., an) and b = (b1, b2, ..., bn) are two points in n-dimensional Euclidean space.

It's the most commonly used distance metric, and is very useful when the data is continuous.
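As an illustration of the formula, here is a minimal sketch in plain Python. It does not use the Milvus API; the function name `euclidean_distance` and the sample points are assumptions made for this example.

```python
import math


def euclidean_distance(a, b):
    """Return the L2 distance between two equal-length sequences of numbers."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    # Sum the squared coordinate differences, then take the square root.
    return math.sqrt(sum((bi - ai) ** 2 for ai, bi in zip(a, b)))


# Sample points chosen to form a 3-4-5 right triangle.
d = euclidean_distance([0.0, 0.0], [3.0, 4.0])
```

The distance is symmetric, so swapping the two arguments gives the same value.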

Inner product (IP)

The IP distance between two embeddings is defined as follows:

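$$p(A, B) = A \cdot B = \sum_{i=1}^{n} a_i b_i = \|A\| \, \|B\| \cos\theta$$

where A = (a1, a2, ..., an) and B = (b1, b2, ..., bn) are embeddings, ||A|| and ||B|| are the norms of A and B, and θ is the angle between A and B.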
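The definition can be sketched in plain Python as follows. This is an illustration only, not Milvus code; the helper names `inner_product` and `norm` and the sample vectors are assumptions. It computes the inner product as a sum of element-wise products, then divides by the two norms to recover the cosine of the angle between the vectors.

```python
import math


def inner_product(a, b):
    """Return the sum of element-wise products of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(ai * bi for ai, bi in zip(a, b))


def norm(a):
    """Return the Euclidean norm ||a||."""
    return math.sqrt(inner_product(a, a))


a = [3.0, 4.0]
b = [4.0, 3.0]

ip = inner_product(a, b)              # 3*4 + 4*3
cos_angle = ip / (norm(a) * norm(b))  # inner product divided by both norms
```

Scaling either vector changes `ip` but leaves `cos_angle` unchanged, which is why the raw inner product depends on magnitude as well as orientation.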

IP is more useful when you are interested in the orientation of the vectors rather than their magnitude.

If you use IP to calculate embedding similarities, you must normalize your embeddings. After normalization, inner product equals cosine similarity.

Suppose X' is normalized from embedding X:

$$X' = (x'_1, x'_2, \ldots, x'_n), \quad X' \in \mathbb{R}^n$$

The correlation between the two embeddings is as follows:

$$x'_i = \frac{x_i}{\|X\|} = \frac{x_i}{\sqrt{\sum_{j=1}^{n} x_j^2}}$$
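A minimal sketch of normalization in plain Python is shown below. It is independent of Milvus; the function names and sample vectors are assumptions for illustration. It divides each component by the vector's norm and then checks that the inner product of the normalized vectors matches the cosine similarity of the originals.

```python
import math


def inner_product(a, b):
    """Return the sum of element-wise products of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def normalize(x):
    """Return x scaled to norm 1; a zero vector cannot be normalized."""
    length = math.sqrt(inner_product(x, x))
    if length == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [xi / length for xi in x]


a = [3.0, 4.0]
b = [4.0, 3.0]

# Cosine similarity of the original (unnormalized) vectors.
cosine = inner_product(a, b) / (
    math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
)

# Inner product after normalizing both vectors.
a_unit = normalize(a)
b_unit = normalize(b)
ip_after = inner_product(a_unit, b_unit)
```

Up to floating-point rounding, `ip_after` and `cosine` agree, and each normalized vector has norm 1.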

FAQ

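Why is the top-1 result of a vector search not the search vector itself, if the metric type is inner product?

This occurs if you have not normalized the vectors when using inner product as the distance metric. A longer stored vector that points in a similar direction can then have a larger inner product with the search vector than the search vector has with itself.
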
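The effect can be reproduced with a small sketch in plain Python. It uses a brute-force search over a dictionary instead of Milvus; the vector names, values, and the `top1` helper are hypothetical and chosen only for illustration.

```python
import math


def inner_product(a, b):
    """Return the sum of element-wise products of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def normalize(x):
    """Return x scaled to norm 1 (assumes x is not the zero vector)."""
    length = math.sqrt(inner_product(x, x))
    return [xi / length for xi in x]


def top1(query, vectors):
    """Return the name of the stored vector with the largest inner product."""
    return max(vectors, key=lambda name: inner_product(query, vectors[name]))


query = [1.0, 0.0]
stored = {"same_as_query": [1.0, 0.0], "longer": [3.0, 1.0]}

# Without normalization the longer vector scores higher than the exact copy.
raw_top1 = top1(query, stored)

# After normalizing every vector, the exact copy ranks first.
unit_stored = {name: normalize(v) for name, v in stored.items()}
unit_top1 = top1(normalize(query), unit_stored)
```

Here `raw_top1` is `"longer"` and `unit_top1` is `"same_as_query"`.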
What is normalization? Why is normalization needed?

Normalization refers to the process of converting an embedding (vector) so that its norm equals 1. If you use inner product to calculate embedding similarities, you must normalize your embeddings. After normalization, inner product equals cosine similarity.

See Wikipedia for more information.

Why do I get different results using Euclidean distance (L2) and inner product (IP) as the distance metric?

Check whether the vectors are normalized. If they are not, normalize them first. In theory, the similarities computed with L2 differ from those computed with IP when the vectors are not normalized.
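The role of normalization here can be seen from a short derivation, given as a sketch and assuming both vectors a and b have norm 1:

$$\|a - b\|^2 = \|a\|^2 + \|b\|^2 - 2\,(a \cdot b) = 2 - 2\,(a \cdot b)$$

For normalized vectors, a smaller L2 distance therefore always corresponds to a larger inner product, so both metrics rank the results in the same order. Without normalization, the terms ||a||^2 and ||b||^2 vary from vector to vector, and the two rankings can differ.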
© 2019 - 2020 Milvus. All rights reserved.