Hello fellows,
I am working on a Digital School System that uses the pre-trained Llama-3.1-8B-Instruct model, but I am experiencing response cut-offs: when the teacher or student interface sends a prompt to generate tasks, the model returns incomplete responses. I would appreciate it if someone could help or recommend something.
Hi @mkt11
If you are using the LangChain framework, make sure to increase the value of `max_new_tokens` passed in `model_kwargs`, so you can get complete answers from the LLM.
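To make this concrete, here is a minimal sketch of the request body a Llama-3.1 SageMaker endpoint typically expects. The `"inputs"`/`"parameters"` schema assumes the Hugging Face TGI/LMI serving container; the key names may differ for other containers, so treat this as an illustration rather than the exact API.

```python
import json

def build_payload(prompt: str, max_new_tokens: int = 1024) -> str:
    """Build the JSON request body for a Llama-style SageMaker endpoint.

    Assumes the TGI/LMI payload convention ({"inputs": ..., "parameters": ...});
    check your container's docs if responses still come back truncated.
    """
    body = {
        "inputs": prompt,
        "parameters": {
            # Raise this value if answers are being cut off mid-sentence.
            "max_new_tokens": max_new_tokens,
            "temperature": 0.6,
            "top_p": 0.9,
        },
    }
    return json.dumps(body)

payload = build_payload("Generate five algebra practice tasks.", max_new_tokens=2048)
```

The same `parameters` dict is what you would pass as `model_kwargs` when wrapping the endpoint with LangChain.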
You can also take a look at this link.
Hope it helps! Feel free to ask if you need further assistance.
Thank you for the feedback.
I am using S3 to store the user chat history and a Lambda function to retrieve the chats and send them, along with the user's new prompt (via API Gateway), to the SageMaker endpoint (I don't think it is a good solution, but please advise accordingly).
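For reference, here is a rough sketch of what that Lambda side might look like. Everything here is hypothetical: `fetch_history` stands in for the S3 read (normally a `boto3` `get_object` call), and it is injected as a parameter only to keep the sketch self-contained; the real handler would also invoke the SageMaker endpoint instead of returning the assembled request.

```python
import json

def assemble_request(history: list[dict], new_prompt: str, max_turns: int = 6) -> dict:
    """Combine stored chat history with the user's new prompt.

    Only the most recent `max_turns` messages are kept so the context
    stays within the model's window. The role/content message format
    mirrors the usual chat schema; adjust to your endpoint's contract.
    """
    recent = history[-max_turns:]
    messages = recent + [{"role": "user", "content": new_prompt}]
    return {"inputs": messages, "parameters": {"max_new_tokens": 1024}}

def lambda_handler(event, context, fetch_history=lambda user_id: []):
    """Hypothetical API Gateway -> Lambda entry point.

    `fetch_history` would normally read the user's chats from S3;
    it is a plain callable here so the sketch runs without AWS access.
    """
    body = json.loads(event["body"])
    history = fetch_history(body["user_id"])
    request = assemble_request(history, body["prompt"])
    # A real handler would now call sagemaker-runtime invoke_endpoint
    # with this request and return the model's answer to the client.
    return {"statusCode": 200, "body": json.dumps(request)}
```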
Thanks for the reminder!
You can use other methods to remember user chats (e.g. send only the last k turns of the conversation, or save a running summary of the conversation instead of the whole history).
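The "last k turns" idea can be sketched in a few lines; the summarization alternative would replace the trimmed-off prefix with a short summary message instead of dropping it. The helper name and sample history below are made up for illustration.

```python
def window_history(history: list[dict], k: int) -> list[dict]:
    """Keep only the last k (user, assistant) exchanges, i.e. 2*k messages."""
    return history[-2 * k:]

history = [
    {"role": "user", "content": "What is 2+2?"},
    {"role": "assistant", "content": "4"},
    {"role": "user", "content": "And 3+3?"},
    {"role": "assistant", "content": "6"},
    {"role": "user", "content": "Now make a quiz."},
    {"role": "assistant", "content": "Sure, here is a quiz..."},
]

# Keep only the last 2 exchanges to bound the prompt size.
trimmed = window_history(history, k=2)
```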
By the way, you can watch the LangChain short courses on the DLAI platform to gain an overall understanding of how to solve problems with LLMs.
Thank you
I will check out the course.
You’re welcome! Feel free to ask if you need help.