I’m just getting started with large language models (LLMs) and have been learning how to use them to create various products. Recently, I was introduced to the concept of Retrieval-Augmented Generation (RAG).
My main question is when to use RAG and whether it’s even necessary. If I understand correctly, one of the reasons for using RAG is because the context window of an LLM is limited. However, if I’m creating a knowledge base chatbot with a limited amount of text that doesn’t exceed the context window, can’t I simply send the entire text to the LLM directly instead of using RAG?
You could consider using RAG if you think there’s a possibility of expanding the knowledge base.
@lukmanaj correct me if I’m wrong, but does expanding the knowledge base mean providing the LLM with knowledge it wasn’t trained on? For example, giving it access to my private documents.
Let’s say I’m using an LLM with a 1M-token context window. Do I really need RAG then?
It depends. For example, RAG makes sense if you have an archive of structured and unstructured data that you would use only for your own use case (a business, a project, etc.).
Your understanding is correct regarding why Retrieval-Augmented Generation (RAG) is used. RAG combines LLMs with retrieval systems to handle large or dynamic knowledge bases, especially when content exceeds the context window or requires continuous updates.
In your case, if your knowledge base is small and fits within the LLM’s context window (even with a 1M token limit), directly passing the full text might be more straightforward and sufficient. However, if you anticipate growth or need to maintain up-to-date knowledge (e.g., adding new documents), RAG can be advantageous for scalable and dynamic retrieval.
Even if the complete document fits within the context window, RAG still has an advantage: your system is more efficient because you don’t need to pass all the information with every query. Ultimately, it depends on your requirements.
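To make the efficiency point concrete, here is a toy sketch comparing the per-query token cost of stuffing the whole knowledge base into the prompt versus retrieving only the most relevant chunk. The documents, the whitespace token count, and the word-overlap "retriever" are all illustrative stand-ins, not a real tokenizer or embedding model.

```python
# Toy comparison: full-context stuffing vs. retrieving top-k relevant chunks.
# Documents and the word-overlap scorer are made up for illustration.

def count_tokens(text: str) -> int:
    # Crude proxy: whitespace-separated words. Real tokenizers count differently.
    return len(text.split())

docs = [
    "Product A supports exports to CSV and PDF formats.",
    "Product B requires a license key for enterprise use.",
    "Product C integrates with most common CRM systems.",
]

def top_k(query: str, docs: list[str], k: int = 1) -> list[str]:
    # Score each document by how many query words it shares.
    q_words = set(query.lower().split())
    ranked = sorted(
        docs,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return ranked[:k]

query = "Does Product B need a license?"
full_context_cost = sum(count_tokens(d) for d in docs)
rag_cost = sum(count_tokens(d) for d in top_k(query, docs))
print(full_context_cost, rag_cost)  # RAG sends far fewer tokens per query
```

With three documents the savings are trivial, but the gap grows linearly with the knowledge base while the retrieved top-k stays roughly constant.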
@lukmanaj @NightWing correct me if I’m wrong, but I think another big use of RAG has been to try and tamp down on hallucinations?
I.e., rather than having to answer a specific query from the totality of the LLM’s ‘memory’ (i.e., the model weights), it can pull the information from a few, much narrower source documents.
I could… be wrong though.
You’re absolutely correct! One of the significant advantages of using RAG is to reduce hallucinations in LLMs. By relying on a retrieval system to fetch relevant, narrower documents, the model can base its generation on factual, up-to-date information rather than pulling from potentially inaccurate or outdated general knowledge stored in the model. Thanks for bringing this up.
@lukmanaj @Nevermnd thank you.
Can you tell me more about how I can build a robust knowledge-base chatbot?
What are the standard practices?
Large Language Models (LLMs) are trained on vast amounts of data, which influences their ability to generate responses. When you ask general queries, the LLM provides answers based on its training data, which is generally accurate but limited to the knowledge available up to the model’s training cutoff. However, for specific queries related to proprietary information or recent developments, the LLM may not provide accurate or relevant responses due to the lack of up-to-date or specialized data.
In such cases, Retrieval-Augmented Generation (RAG) can be particularly useful. RAG enhances the LLM’s performance by retrieving relevant information from a specific dataset or external sources and integrating this information into the response. This approach allows the LLM to offer more accurate and contextually relevant answers for specialized or current topics.
Therefore, while RAG is not always necessary, it is highly recommended when dealing with queries that require specific, recent, or niche information that the LLM alone might not be able to address effectively.
We are using RAG to build an enterprise chatbot for our sales executives. If they need high-level information on any specific product, instead of reaching out to product experts they can simply ask the chatbot.
We cannot do the same with the context window alone:
- It doesn’t scale
- We cannot afford any false information (hallucinations)
- We cannot keep adding products to the system
Can I use RAG only for text data, or can it be used for tabular data as well?
You can use it for tabular data as well.
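One common approach (not the only one) is to serialize each table row into a short natural-language chunk and then index those chunks like any other document. The table contents below are made up for illustration.

```python
# Serialize table rows into text chunks so a retriever can index them.
# Row data here is hypothetical.

rows = [
    {"product": "Widget A", "price": 19.99, "stock": 120},
    {"product": "Widget B", "price": 34.5, "stock": 0},
]

def row_to_chunk(row: dict) -> str:
    # Keep column names in the text so the retriever can match on them.
    return "; ".join(f"{col}: {val}" for col, val in row.items())

chunks = [row_to_chunk(r) for r in rows]
print(chunks[0])  # "product: Widget A; price: 19.99; stock: 120"
```

For heavily numeric or relational queries (aggregations, joins), many systems instead translate the question into SQL and run it against the table directly, using the LLM only to generate the query and summarize the result.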