Generating a running summary, as pointed out by @Eric_Townes, is a good place to start. Here are two more approaches that may not require summarization:
- If response quality must stay high, switch to a bigger model with a larger context window once the token limit is reached. Once even the largest model your company can pay for runs out of room, get a human involved.
- If earlier parts of the conversation are unimportant, remove them from the chat history to free up space for more recent messages.
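
Both ideas can be sketched roughly like this. This is only an illustration, not a drop-in implementation: `count_tokens` is a crude whitespace word count standing in for your model's real tokenizer, the model names and limits in `MODEL_LADDER` are made up, and messages are assumed to be dicts with `role` and `content` keys, with any system messages at the start of the history.

```python
from typing import Dict, List, Optional, Tuple

Message = Dict[str, str]

# Hypothetical model names and token limits, ordered from smallest to largest.
MODEL_LADDER: List[Tuple[str, int]] = [("small-model", 8), ("large-model", 16)]


def count_tokens(text: str) -> int:
    # Assumption: a whitespace word count as a rough proxy.
    # Replace this with the tokenizer that matches your model.
    return len(text.split())


def total_tokens(messages: List[Message]) -> int:
    return sum(count_tokens(m["content"]) for m in messages)


def pick_model(messages: List[Message],
               ladder: List[Tuple[str, int]]) -> Optional[str]:
    """Return the smallest model whose limit fits the conversation.

    Returns None when even the largest model is too small,
    which is the signal to hand the conversation to a human.
    """
    size = total_tokens(messages)
    for name, limit in ladder:
        if size <= limit:
            return name
    return None


def trim_history(messages: List[Message], max_tokens: int) -> List[Message]:
    """Keep system messages plus the most recent messages that fit the budget."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    budget = max_tokens - total_tokens(system)

    kept: List[Message] = []
    # Walk backwards from the newest message and stop at the first
    # one that no longer fits, so the kept history stays contiguous.
    for message in reversed(rest):
        cost = count_tokens(message["content"])
        if cost > budget:
            break
        kept.append(message)
        budget -= cost

    kept.reverse()
    return system + kept
```

You would call `pick_model(history, MODEL_LADDER)` before each request and fall back to `trim_history(history, limit)` (or a human) when it returns `None`. Stopping at the first message that does not fit, rather than skipping it and keeping older ones, avoids leaving holes in the middle of the conversation.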