I am using the GPT 4 model as my LLM for an application. There is information about my project that is stored in my database and not known to the LLM. I want the LLM to train over that data as well and answer the user’s question with that additional knowledge.
So, far I have tried embedding search. I stored all the information from in a vector database and perform an embedding search when the user asks the question. The result set of my vector search becomes the context for the LLM in the prompt and then, with temperature “0”, the LLM responds to the question. This is giving me 80% accuracy as there are times when my dataset has unique IDs where the embeddings have no meaning and sometimes, the context is generic so even with the context the LLMs hallucinate.
Another approach that I have tried is Fine tuning. The problem here is the extended LLM version always hallucinate and it doesn’t extend it’s knowledge base. I am only able to modify the behavior of the LLM.
What are some other approaches which have worked for you?