Stop generation midway (LLM response)

Team, while using RAG, I would like to stop processing while a response is still being generated and free up any memory the LLM is using. Basically a “Stop Generation” feature. Is it possible to actually stop the generation, and not just disconnect the call or kill the thread? If so, could you advise how?

Hi @vsharma19. I have not encountered such a case before. Which LLM are you using? Are you running notebooks or scripts?

Hi,
It does not really matter. Simply put: while a RAG pipeline is running there is a STOP parameter that stops the process, but I want to stop it based on the user's choice. You can assume gpt-4-32k or gpt-4o.
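One way to stop on the user's choice: stream the response, have the UI set a `threading.Event` when the user clicks Stop, and close the stream as soon as the flag is seen. This is a minimal Python sketch under assumptions. `fake_token_stream`, `generate_until_stopped` and the `on_release` hook are hypothetical names, and the fake stream only stands in for a real LLM streaming call. With a hosted model such as gpt-4o, closing the stream only ends the client's side. For example, the object the OpenAI Python SDK returns for `stream=True` has a `close()` method. Whether the provider then stops computing is up to the provider. With a locally hosted model, the same flag can be checked inside the generation loop itself, so its memory is released when the loop exits.

```python
import threading
import time
from typing import Callable, Iterator, List, Optional


def fake_token_stream(n_tokens: int, delay: float = 0.01,
                      on_release: Optional[Callable[[], None]] = None) -> Iterator[str]:
    """Hypothetical stand-in for an LLM streaming response: yields one token at a time."""
    try:
        for i in range(n_tokens):
            time.sleep(delay)  # simulate per-token generation time
            yield f"tok{i} "
    finally:
        # Hypothetical hook: where a real backend would free buffers or its connection.
        if on_release is not None:
            on_release()


def generate_until_stopped(stream_factory: Callable[[], Iterator[str]],
                           stop_event: threading.Event) -> List[str]:
    """Collect tokens until the stream ends or the user sets stop_event."""
    received: List[str] = []
    stream = stream_factory()
    try:
        for token in stream:
            if stop_event.is_set():
                break  # user pressed Stop: discard this token and pull no more
            received.append(token)
    finally:
        # Closing a generator raises GeneratorExit inside it, so the producer's
        # own cleanup (its finally block) runs immediately instead of later.
        close = getattr(stream, "close", None)
        if close is not None:
            close()
    return received


if __name__ == "__main__":
    stop = threading.Event()
    # Simulate the user clicking "Stop" about 50 ms after generation starts.
    timer = threading.Timer(0.05, stop.set)
    timer.start()
    tokens = generate_until_stopped(lambda: fake_token_stream(1000), stop)
    timer.join()
    print(f"received {len(tokens)} of 1000 tokens before stopping")
```

The check runs between tokens, so stopping takes effect at the next token boundary rather than instantly.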
