Hello, I’m trying to build a chatbot that answers questions about an institution. The data comes from the institution’s FAQ and private documents. I generated a synthetic dataset from the private documents, combined it with the FAQ data, and fine-tuned a Gemma model with LoRA. After some testing and inference, it turns out the model hallucinates a lot. I then tried providing context in the prompt and instructing the LLM to answer only based on that context, but the result is either the model looping on a single token forever until it hits the max-token limit, or generating nothing at all.
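For reference, this is roughly how I’m injecting the context into the prompt (a simplified sketch — model loading and generation are omitted, and the helper name `build_prompt` is just for illustration; the turn markers are the standard Gemma chat format):

```python
def build_prompt(context: str, question: str) -> str:
    """Build a context-restricted prompt using Gemma-style chat turns."""
    instruction = (
        "Answer ONLY using the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
    # Gemma chat template: user turn, then open the model turn for generation
    return (
        "<start_of_turn>user\n"
        + instruction
        + "<end_of_turn>\n<start_of_turn>model\n"
    )

print(build_prompt("The library opens at 9 AM.", "When does the library open?"))
```

It’s with prompts shaped like this that I get the single-token looping or empty output.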
Do you have any suggestions for solving this problem?