Question Answering: Stuff Documents

In the question answering lecture, we are told that the embedding of the query is compared with the embeddings in the vector database, and the top n most similar documents are returned and passed to the LLM as context. Then Harrison talked about the stuff method, in which all the documents are stuffed together and passed as context in the prompt. In this case, isn't the similarity still calculated against the vector database? If so, why stuff everything, given the limitation of the context window? How are the above two scenarios different?
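For concreteness, here is roughly the retrieval flow I have in mind (a minimal sketch, assuming LangChain's Chroma vector store and OpenAI embeddings; the directory path and k value are just illustrative):

```python
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma

# Open an existing vector store (illustrative path; requires OPENAI_API_KEY).
db = Chroma(
    persist_directory="docs/chroma/",
    embedding_function=OpenAIEmbeddings(),
)

# Embeds the query and returns the k most similar documents.
docs = db.similarity_search("What is the stuff method?", k=3)
```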

Yes, first a query is performed on the vector database for the topK documents with the greatest similarity. When using the stuff method, all of these topK most similar documents are inserted into the context to compose your prompt. Other methods, such as MapReduce, Refine, and Re-Rank, use different strategies to manipulate and work with these documents before inserting them into the context.
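For example, with LangChain's RetrievalQA chain the stuff method looks roughly like the sketch below (the vector store path and K value are illustrative assumptions):

```python
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma

db = Chroma(
    persist_directory="docs/chroma/",  # illustrative path
    embedding_function=OpenAIEmbeddings(),
)
llm = ChatOpenAI(temperature=0)

# "stuff": the topK retrieved documents are concatenated into one prompt.
qa_stuff = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=db.as_retriever(search_kwargs={"k": 4}),  # topK = 4
)
answer = qa_stuff.run("What does the stuff method do?")
```

Changing `chain_type` to `"map_reduce"`, `"refine"`, or `"map_rerank"` swaps in those other strategies while keeping the retriever the same.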

Hi, I have a follow-up question. Can't we just set K to a small value so that we never exceed the context window? In that case, what added advantage do stuff and MapReduce give?
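For context, here is the comparison I am asking about, as I understand it (a sketch assuming the same LangChain setup as above; the K values are illustrative):

```python
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma

db = Chroma(
    persist_directory="docs/chroma/",  # illustrative path
    embedding_function=OpenAIEmbeddings(),
)
llm = ChatOpenAI(temperature=0)

# "map_reduce": each retrieved document is sent to the LLM in its own call
# ("map"), and the per-document answers are combined in a final call
# ("reduce"). No single prompt has to hold all K documents at once, so a
# larger K is feasible than "stuff" would allow.
qa_mr = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="map_reduce",
    retriever=db.as_retriever(search_kwargs={"k": 8}),
)
answer = qa_mr.run("Summarize what the lectures say about retrieval.")
```

With a small K, stuff keeps everything in a single prompt and costs one LLM call, while MapReduce trades extra LLM calls for the ability to cover more documents than one context window can hold.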