Handling a large number of tokens

Generating a running summary, as pointed out by @Eric_Townes, is a good place to start. Here are two more approaches that don't require summarization:

  1. If response quality matters, switch to a larger model once the token limit is reached. Once you've escalated to the largest model your company can pay for, hand the conversation off to a human.
  2. If the earlier parts of the conversation are unimportant, drop them from the chat history to free up room for the most recent messages.
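
The first approach can be sketched as a simple tiered lookup. The model names and context-window sizes below are hypothetical placeholders, not real API identifiers:

```python
# Sketch of approach 1: escalate to a larger model as the conversation
# grows, and fall back to a human once no tier can fit the history.
# Model names and context sizes are made up for illustration.
MODEL_TIERS = [
    ("small-model", 8_000),
    ("medium-model", 32_000),
    ("large-model", 128_000),
]

def pick_model(token_count: int) -> str:
    """Return the cheapest model whose context window fits the
    conversation, or "human" when even the largest tier is too small."""
    for name, context_window in MODEL_TIERS:
        if token_count <= context_window:
            return name
    return "human"
```

In practice you would also leave headroom in each tier for the model's response tokens, not just the prompt.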
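
For the second approach, a minimal sketch of truncation might look like this. It assumes a chat-style message list whose first entry is the system prompt, and it approximates token counts by word count — swap in your model's real tokenizer in practice:

```python
# Sketch of approach 2: keep only the most recent messages that fit a
# token budget, always preserving the system prompt.

def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer; word count only.
    return len(text.split())

def trim_history(messages: list[dict], budget: int) -> list[dict]:
    """Drop the oldest non-system messages until the remainder fits
    within `budget` tokens. messages[0] is assumed to be the system
    prompt and is always kept."""
    system, rest = messages[0], messages[1:]
    kept = []
    used = count_tokens(system["content"])
    for msg in reversed(rest):  # walk from newest to oldest
        cost = count_tokens(msg["content"])
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return [system] + list(reversed(kept))
```

Truncating at message boundaries like this keeps the remaining history coherent, which tends to work better than cutting mid-message.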