Embeddings, Vector DB, FAQ search and ranking

satollo · June 6, 2024, 8:03pm

Hi, I’m pretty new. After trying RAG with an LLM on a site where users post tech questions (and having the LLM give some fancy answers…), I decided to step back and use only embeddings and a vector database (I’m using Chroma right now). Every question and its answer is a document in Chroma, and the embedding has been computed with OpenAI ada-003. When I query the DB with the embedding of a user's question, the first result is usually the right FAQ, but I have no way to tell when that result is actually wrong because the database has no answer for that question.
Could analyzing the distances (I’m using cosine distance) of the first X results be a way to do it? My thinking is that if the first results all have very similar distances, they are probably all wrong.
Put another way: when I retrieve content from the database, I need a way to decide whether it is good enough to use as context before passing it to RAG.
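
To make the distance idea concrete, here is a minimal sketch of such a check. It assumes the cosine distances of the top results are already available as a list sorted in ascending order (as a vector DB query would return them). The function name `accept_top_hit` and the two thresholds are hypothetical; the values would have to be tuned on real queries, and the margin rule would wrongly reject near-duplicate FAQs that are both correct.

```python
from typing import Sequence


def accept_top_hit(distances: Sequence[float],
                   max_distance: float = 0.25,
                   min_margin: float = 0.05) -> bool:
    """Decide whether the best match looks trustworthy.

    distances: cosine distances of the top results, sorted ascending.
    max_distance, min_margin: hypothetical thresholds, to be tuned on
    a set of real queries with known right/wrong answers.
    """
    if not distances:
        return False  # nothing was retrieved at all
    best = distances[0]
    if best > max_distance:
        return False  # even the closest FAQ is far from the query
    if len(distances) > 1 and distances[1] - best < min_margin:
        return False  # runner-up is almost as close: ambiguous match
    return True


# Example: one clear winner vs. a cluster of equally distant results
print(accept_top_hit([0.10, 0.30, 0.35]))
print(accept_top_hit([0.40, 0.41, 0.42]))
```

The first rule covers the "there is no answer" case, the second the "all first results are equally close" case. When the check fails, the system can say it found nothing instead of showing (or feeding to RAG) a doubtful FAQ.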

A second question: should the embeddings stored in Chroma be computed over the whole question + answer, or over just the question of each FAQ?
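
One way to compare the two options is to build both variants from the same FAQ list and index them in separate collections. The sketch below only prepares the data; the helper name `build_records` and the `question` / `answer` dictionary keys are assumptions made for illustration. In both variants the answer is kept in the metadata, so it can be shown to the user regardless of which text was embedded.

```python
from typing import Dict, List, Tuple


def build_records(faqs: List[Dict[str, str]],
                  embed_answer: bool = False
                  ) -> Tuple[List[str], List[str], List[Dict[str, str]]]:
    """Prepare ids, texts to embed, and metadata for a list of FAQs.

    embed_answer=False: only the question is embedded.
    embed_answer=True:  question and answer are embedded together.
    """
    ids, documents, metadatas = [], [], []
    for i, faq in enumerate(faqs):
        text = faq["question"]
        if embed_answer:
            text = f'{faq["question"]}\n{faq["answer"]}'
        ids.append(f"faq-{i}")
        documents.append(text)
        # The answer always travels as metadata, never lost to the caller
        metadatas.append({"question": faq["question"],
                          "answer": faq["answer"]})
    return ids, documents, metadatas
```

The three returned lists line up with the `ids`, `documents` and `metadatas` arguments of Chroma's `collection.add`, so the same set of test queries can be run against both collections to see which one ranks the right FAQ first more often.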

Probably I missed it, but I have not found (comprehensible) best practices on RAG content identification… any reference would be really appreciated!

Stefano.

bwhiting2356 · June 6, 2024, 8:24pm

I’m trying to understand your use case. Do users post both questions and answers in a forum? Do you want a chatbot to provide an answer to a user’s question using existing answers to similar questions?

satollo · June 7, 2024, 5:29am

Hi, thank you for getting back!

Users post only a question, and we already have a chatbot that is supposed to answer them. We have tried RAG + ChatGPT 3, the OpenAI Assistant, and a third-party chat system (their own RAG built on ChatGPT 3 and 4).

What I’m experimenting with now is a system that finds the answer using semantic search alone and presents the exact answer from our database, without a chat or any text generated by an LLM. I would like to compare the quality we get from a chatbot with what we get from plain answer extraction.

Why? In a few cases the LLM “answer” was so convincing but so wrong that users went crazy trying to follow its instructions to get a behavior that the product simply cannot deliver.

bwhiting2356 · June 7, 2024, 5:38pm

Where does the RAG come in? Do you have a database of answers, or are you having the LLM make up its best guess at an answer?

satollo · June 7, 2024, 6:15pm

Hi, I have a database where each entry is a question and its answer. Of course, the stored question is just one possible wording of all the questions that specific FAQ answers.

For example, the question can be: “Where can I set the maximum delivery speed?”

In our experience, most users are pretty terse even when chatting and shorten the question to “max delivery speed”, like a typical search-engine query (but of course without mentioning the product, since they are already in the support section for that product).
