How Should We Interpret Vector Database Retrieval at Scale: KNN or ANN by Default?

Raymond_Lau · July 22, 2025, 1:35pm

A data scientist is implementing a vector database for a recommendation system that needs to handle millions of product embeddings. What should they understand about the results returned by the vector database?

The question doesn’t specify whether the retrieval system uses brute-force (exact) KNN or an Approximate Nearest Neighbor (ANN) algorithm.

If the system uses brute-force KNN, every query is compared against all stored vectors, so “query time will increase linearly as the number of vectors increases” is a correct answer.
If the system uses ANN, only a subset of the vectors is examined, so it’s more accurate to say “the results might not always be the exact nearest neighbors, but searches can be completed substantially faster.”
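
To make the two cases concrete, here is a minimal sketch in pure Python. It is only an illustration under stated assumptions: `brute_force_knn` and `HyperplaneLSH` are hypothetical names, the data is random, and the toy random-hyperplane hashing index stands in for whatever ANN structure a real vector database uses (commonly something like HNSW or IVF, which work differently). The point is only that the exact search touches every vector, while the approximate one touches a small bucket and may miss some true neighbors.

```python
import heapq
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def brute_force_knn(query, vectors, k):
    """Exact k-NN: one distance computation per stored vector (linear in len(vectors))."""
    return heapq.nsmallest(k, range(len(vectors)), key=lambda i: sq_dist(query, vectors[i]))


class HyperplaneLSH:
    """Toy ANN index (illustrative assumption): random-hyperplane hashing into buckets."""

    def __init__(self, dim, n_planes, seed=0):
        rng = random.Random(seed)
        self.planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_planes)]
        self.vectors = []
        self.buckets = {}

    def _key(self, v):
        # One bit per hyperplane: which side of the plane the vector falls on.
        return tuple(sum(p * x for p, x in zip(plane, v)) >= 0 for plane in self.planes)

    def add(self, v):
        self.buckets.setdefault(self._key(v), []).append(len(self.vectors))
        self.vectors.append(v)

    def candidates(self, query):
        # Only vectors that share the query's bucket are ever compared.
        return self.buckets.get(self._key(query), [])

    def search(self, query, k):
        return heapq.nsmallest(
            k, self.candidates(query), key=lambda i: sq_dist(query, self.vectors[i])
        )


rng = random.Random(42)
dim, n, k = 16, 2000, 5
vectors = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
query = [rng.gauss(0, 1) for _ in range(dim)]

index = HyperplaneLSH(dim, n_planes=6)
for v in vectors:
    index.add(v)

exact = brute_force_knn(query, vectors, k)
approx = index.search(query, k)
recall = len(set(exact) & set(approx)) / k  # fraction of true neighbors the ANN found

print(f"brute force compared the query with {n} vectors")
print(f"LSH compared the query with {len(index.candidates(query))} vectors, recall@{k} = {recall:.2f}")
```

Increasing `n_planes` makes buckets smaller (faster, lower recall); decreasing it does the opposite. Real ANN indexes expose similar speed/recall knobs, though the parameters differ by library.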

Given that the question only mentions “millions of embeddings” and not the retrieval method, should we assume that modern vector databases default to using ANN for scalability?
Or is it fair to answer based on the information given, considering both KNN and ANN possibilities?

How should we approach such questions where the retrieval algorithm is not explicitly stated?

balaji.ambresh · July 23, 2025, 9:11am

Given that the scale of products is in the millions, yes.
