FAISS: A quick tutorial to efficient similarity search

Efficient Similarity Search for Large-scale Sets of Vectors

Shayan Fazeli
7 min read · Mar 9, 2023

Introduction

If you are a machine learning researcher or a data scientist, more often than not, you have encountered a situation in which you need to cluster and/or search through a large corpus of high-dimensional numeric vectors. For example, an element of such a set could be the numeric representation of an image, a latent representation generated by a neural network, or any other embedding. FAISS is a great open-source library perfectly suited for this.

(Depending on your needs, other options, such as scikit-learn, or packages such as PyTorch Metric Learning, could be viable alternatives.)

Without further ado, let’s start learning FAISS.

FAISS: Facebook AI Similarity Search

What is it?

In short, FAISS is a software library produced by Facebook AI to perform high-performance similarity search and clustering. Here are some important points about it:

  • It has a nice Python interface, but it is high-speed regardless, given that it's written in C++.
  • It supports several similarity measures: lowest L2 distance, highest inner (dot) product, cosine similarity (via inner product on normalized vectors), and more.
  • It supports GPU as well (faiss-gpu and faiss-cpu).
  • Here is its research paper: Link
  • It is MIT-Licensed.

Please see the README.md in their original repo, as it is very informative. More detailed documentation, including the C++ API, is available at this link.

How does it work?

FAISS is a similarity search tool. Thus, it works on a set of vectors. For example, you could have the set of latent representations of images built by neural networks, token representations in NLP, observed data itself, or any other numerical encoding that your project needs, and use FAISS on them.

FAISS takes these and indexes them, allowing you to do the search you need (for example, finding the closest points). There are different indexing options, each with its own trade-offs in terms of accuracy vs speed.

How do I install it?

If you are using conda, installation should be pretty straightforward:

conda install -c pytorch faiss-gpu

Replace faiss-gpu with faiss-cpu if the CPU version is what you desire.

How do I use it?

Well, the first step is to import it:

import faiss

Let’s say we want to work with BERT-base representations as a simple example. I am not going to discuss how to generate such representations, as that is not the focus of this article; let’s assume we have already generated such 768-dimensional representations. To this end, let’s generate some mock data using numpy, adhering to the known dimensionality of such embeddings:

import numpy as np
data = np.random.rand(100000, 768).astype("float32")  # FAISS expects float32

We are working with 768-dimensional vectors, and let’s say we are interested in the Euclidean (L2) distance between vectors. Therefore, using FAISS’ indexing schemes, we will create an L2 index:

INDEX: IndexFlatL2

index = faiss.IndexFlatL2(768)
# CPU times: user 53 µs, sys: 119 µs, total: 172 µs
# Wall time: 90.6 µs

INDEX: IndexFlatIP

If the inner product is desired, faiss.IndexFlatIP can be used instead.

Before we go on, note that these index objects have a property called is_trained ; this flag exists because some index types require a training step. For IndexFlatL2 , no training is needed, so you can check it:

assert index.is_trained  # True for IndexFlatL2

Now we index our own data (the variable data ), which is done as follows:

index.add(data)
# CPU times: user 68.2 ms, sys: 52.6 ms, total: 121 ms
# Wall time: 119 ms

The total number of indexed vectors now matches your input. Namely:

index.ntotal
# prints 100000, as it is = data.shape[0]

Given a set of vectors as a query, let's say you want to find the 10 closest points to each in the dataset.

First, let’s generate some mock data for the query:

query = np.random.rand(128, 768).astype("float32")

Remark: There are many cases in which you could have these types of queries. For instance, let’s say you have projected your observations (e.g., images) to a semantic space (e.g., the latent representations generated by ResNet50), and you want to find examples that are semantically close to your query images. Therefore, you intend to compare them in terms of L2 distance with the rest of the image embeddings.

D, I = index.search(query, 10)
# CPU times: user 15.5 s, sys: 0 ns, total: 15.5 s
# Wall time: 246 ms

Let’s see what the output is:

  • I is a 128×10 array of type int64 containing the indices of the closest points to each query vector. In other words, I[i,j] contains the index of the jth closest point to the ith query vector.
  • D contains the corresponding distance values and is of type float32 .

Note that this is an exhaustive search. Therefore, it is accurate, but computationally expensive and does not scale well.

INDEX: IndexIVFFlat

For larger-scale sets of vectors, FAISS provides options such as hierarchical search (versus the flat methods shown before). One example is partitioning the space into Voronoi cells (shown in Figure 1).

Fig 1. Voronoi cells: The black dots are seeds and the area around each seed includes points that are closer to it than to any other seed. (Picture: courtesy of Wikipedia)
n_seeds = 50
quantizer = faiss.IndexFlatL2(768)
index = faiss.IndexIVFFlat(quantizer, 768, n_seeds)
# CPU times: user 52 µs, sys: 12.1 ms, total: 12.1 ms
# Wall time: 12 ms

The code segment above will create this index for us. Such a composite index can also be made via index factory:

index_f = faiss.index_factory(768, "IVF50,Flat")

For this, we need to train our index on the data first:

assert not index.is_trained  # it is False first
index.train(data) # finding the Voronoi cells
assert index.is_trained # it is trained now
index.add(data) # adding the data
# CPU times: user 21.3 s, sys: 990 ms, total: 22.3 s
# Wall time: 564 ms

You can search again using this hierarchical index, which returns approximate results, and you can see a considerable speedup:

D, I = index.search(query, 10)
# CPU times: user 751 ms, sys: 0 ns, total: 751 ms
# Wall time: 14.4 ms (compare to 246 ms with IndexFlatL2)

To make its result more accurate, you can increase the number of probe points at query time (number of nearby Voronoi cells to include in the search):

index.nprobe = 10 # default is 1
D, I = index.search(query, 10)
# CPU times: user 8.24 s, sys: 0 ns, total: 8.24 s
# Wall time: 135 ms

Let’s pick an index, say I[0,0] ; its value is 18786. Let’s try to see what that vector actually was:

index.reconstruct(18786)

# RuntimeError: Error in faiss::DirectMap::idx_t faiss::DirectMap::get(faiss::DirectMap::idx_t)
# const at /project/faiss/faiss/invlists/DirectMap.cpp:82: direct map not initialized

The reason you encounter the error above is that the direct mapping between vector IDs and their storage positions has not been established yet. To do so, you first have to call the index.make_direct_map() method:

index.make_direct_map()
index.reconstruct(18786).shape
# (768,)

INDEX: IndexIVFPQ

Imagine you have an even larger number of vectors; the method above might still be inefficient. Another possible improvement (in speed, and at the cost of accuracy, of course) is quantization. The index discussed here is called IndexIVFPQ , and it works like this:

  • Product Quantization is used to compress the vectors, it allows you to approximate the similarity calculation.
Fig 2. Steps to compress a vector using Product Quantization (image: Pinecone)
  • As shown in Figure 2, a vector is broken down into several subvectors, each of which is clustered separately, and then each subvector is replaced with the ID of its nearest centroid.

The index can be created as below:

d = 768
num_voronoi_cells = 10
num_segments = 8  # d must be divisible by this
num_bits_per_centroid_idx = 8
quantizer = faiss.IndexFlatL2(d)
index = faiss.IndexIVFPQ(
    quantizer,
    d,
    num_voronoi_cells,
    num_segments,
    num_bits_per_centroid_idx)
# CPU times: user 1.37 ms, sys: 77 µs, total: 1.44 ms
# Wall time: 908 µs

Locality Sensitive Hashing

Consider Locality Sensitive Hashing (LSH) to be a hashing function that attempts to maximize the collisions for similar items. The idea is to use them on data to find the bucket to which it belongs quickly.

LSH provides an interesting way to reduce dimensionality and make the search more efficient in terms of computational needs.

This index can be created as below:

d = 768
nbits = d * 4
index = faiss.IndexLSH(d, nbits)

Hierarchical Navigable Small World Graphs (HNSW)

A very interesting way to perform a highly efficient nearest neighbor search is the use of HNSW graphs (refer to this link for what they are). In a nutshell, they provide a way to zoom in gradually and find the nearest neighbor.

Fig 3. HNSW graph (Image from nnext.ai)

First, the HNSW graph is built as a stack of layers, with the upper layers pruning connections more aggressively. The search starts at the top layer with a greedy walk toward the query and then descends to the next layer (gradually increasing the resolution of the search); the nearest neighbor is the local minimum found at the bottom layer.

In FAISS, to do so, all you need is:

d = 768
num_connections_per_vertex = 64
layer_depth_construction = 64
layer_depth_in_search = 32

index = faiss.IndexHNSWFlat(d, num_connections_per_vertex)
index.hnsw.efConstruction = layer_depth_construction
index.hnsw.efSearch = layer_depth_in_search
# index is ready!

I hope you have found this post helpful.
