Embeddings, Vector DB, FAQ search and ranking

Hi, I’m pretty new to this. After trying RAG with an LLM on a site where users post tech questions (and having the LLM give some fancy answers…), I stepped back to use only embeddings and a vector database (I’m using Chroma right now). Every question and its answer is a document in Chroma, and the embedding has been computed with OpenAI ada-003. When I query the DB with the embedding of a user’s question, I usually get the right FAQ as the first result, but when no FAQ actually answers the question I have no way to tell that the top result is wrong.
Could analyzing the distances (I’m using cosine distance) of the first X results be a way? My thinking is that if the first few results have very close distances, they are probably all wrong.
Put differently: before using content extracted from the database as RAG context, I need a way to decide whether it is good enough to build a context from.
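
A minimal sketch of that distance check, assuming the cosine distances of the top results are already available as a sorted list. The function name and both thresholds (`max_distance`, `min_gap`) are hypothetical and would have to be tuned on real queries with known right and wrong matches; the "close distances means ambiguous" rule is only the hypothesis from the post, not an established fact.

```python
from typing import Sequence


def classify_hits(distances: Sequence[float],
                  max_distance: float = 0.25,
                  min_gap: float = 0.03) -> str:
    """Label a result list as 'accept', 'ambiguous' or 'reject'.

    distances    -- cosine distances of the top results, sorted ascending
    max_distance -- hypothetical cutoff: a best hit farther than this is rejected
    min_gap      -- hypothetical margin the best hit must have over the runner-up
    """
    if not distances:
        return "reject"          # nothing came back at all
    best = distances[0]
    if best > max_distance:
        return "reject"          # even the closest FAQ is too far away
    if len(distances) > 1 and distances[1] - best < min_gap:
        return "ambiguous"       # top results are nearly tied
    return "accept"


print(classify_hits([0.10, 0.30, 0.35]))
print(classify_hits([0.20, 0.21, 0.22]))
print(classify_hits([0.40, 0.41]))
```

An "ambiguous" label does not have to mean "wrong": near-duplicate FAQs would also produce nearly tied distances, so that case may be better handled by showing several candidates than by refusing to answer.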

A second question: should the embeddings stored in Chroma be computed over the whole question+answer, or just over the question of each FAQ entry?
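
Both options can be compared side by side. The sketch below only prepares the text to embed for each strategy and keeps the answer as metadata, so the answer can be returned either way. The helper name `build_records` and the FAQ field names are hypothetical; the dict keys are chosen to mirror the `ids`, `documents` and `metadatas` arguments of Chroma's `collection.add`, which is an assumption about how the data would be loaded.

```python
from typing import Dict, List


def build_records(faqs: List[Dict[str, str]], embed_answer: bool = False) -> Dict[str, list]:
    """Prepare FAQ entries for indexing.

    embed_answer=False -> embed the question only
    embed_answer=True  -> embed question and answer together
    The answer is always stored as metadata so it can be shown to the user.
    """
    ids, documents, metadatas = [], [], []
    for faq in faqs:
        text = faq["question"]
        if embed_answer:
            text = faq["question"] + "\n" + faq["answer"]
        ids.append(faq["id"])
        documents.append(text)                       # this is the text that gets embedded
        metadatas.append({"answer": faq["answer"]})  # returned with the hit, never embedded alone
    return {"ids": ids, "documents": documents, "metadatas": metadatas}


faqs = [{"id": "faq-1",
         "question": "Where can I set the maximum delivery speed?",
         "answer": "(answer text)"}]
question_only = build_records(faqs)
question_and_answer = build_records(faqs, embed_answer=True)
```

Indexing the same FAQs into two separate collections and running a set of real user queries against each would show which strategy ranks the right FAQ first more often.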

Probably I missed it, but I have not found (comprehensible) best practices on RAG content identification… any reference would be really appreciated!


I’m trying to understand your use case. Do users post both questions and answers in a forum? Do you want a chatbot to provide an answer to a user’s question using existing answers to similar questions?

Hi, thank you for getting back!

Users post only a question, and we already have a chatbot that should answer them. We have used RAG + ChatGPT 3, an OpenAI assistant, and a third-party chat system (their own RAG, built on ChatGPT 3 and 4).

What I’m trying to experiment with is a system that finds the answer using only semantic search :slight_smile: and just presents the exact answer from our database, with no chat or text generation by an LLM. I would like to compare the quality we can get from a chatbot with the quality of plain answer extraction.

Why? In a few cases the LLM “answer” was so convincing but so wrong that users went crazy trying to follow its instructions to get behavior that the product simply cannot provide :slight_smile:.

Where does the RAG come in? Do you have a database of answers, or are you having the LLM make up its best guess at an answer?

Hi, I have a database where each entry is a question and its answer. Of course, the stored question is just one possible phrasing of all the questions that specific FAQ answers.

For example, the question can be: “Where can I set the maximum delivery speed?”

In our experience, most users are pretty terse even when chatting and shorten the question to “max delivery speed”, like a typical search-engine query (but of course without naming the product, since they are already in the support section for that product).
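
Since the stored question is only one phrasing and users often type the terse form, one option (a suggestion, not something the current setup does) is to index several phrasings per FAQ, all pointing at the same FAQ id, and then collapse the hits so each FAQ appears once with its best distance. A minimal sketch, assuming hits arrive as hypothetical `(faq_id, distance)` pairs:

```python
from typing import Iterable, List, Tuple


def best_per_faq(hits: Iterable[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Collapse variant-level hits to one entry per FAQ.

    hits -- (faq_id, cosine_distance) pairs, one per matched question variant
    Returns (faq_id, best_distance) pairs sorted by ascending distance.
    """
    best = {}
    for faq_id, distance in hits:
        # keep the smallest distance seen for each FAQ
        if faq_id not in best or distance < best[faq_id]:
            best[faq_id] = distance
    return sorted(best.items(), key=lambda item: item[1])


# Made-up example: two variants of faq-7 matched, one variant of faq-2.
hits = [("faq-7", 0.31), ("faq-2", 0.18), ("faq-7", 0.12)]
print(best_per_faq(hits))
```

Collapsing first also keeps any later distance-gap check honest: two variants of the same FAQ with nearly equal distances should not count as a tie between different answers.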