Hi,
I wanted to ask a fundamental question. Humans, except savants, cannot reproduce a large body of text (an essay, book, article, etc.) entirely by heart. Instead, we tend to reframe it and shorten it, more like a summary, and reproduce that.
AI models, on the other hand, can reproduce it as a whole. What if, instead, we created an LLM that stores only the summary or gist of the data, without losing its meaning, so the corpus doesn't take up as many tokens? That way we could have a human-like memory: only the important information gets stored.
How would you do that? Maybe every time an answer is produced, an agent could summarize the answer together with the original data and store only that condensed version (summary of answer + original data).
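Here's a minimal sketch of that loop, assuming a placeholder `call_llm` function standing in for whatever completion API you use (it's not a real library call):

```python
def call_llm(prompt: str) -> str:
    """Placeholder: send `prompt` to your LLM of choice and return its text."""
    raise NotImplementedError

class SummaryMemory:
    """Stores only a rolling summary, never the full transcript."""

    def __init__(self):
        self.summary = ""  # the only thing we ever keep

    def answer(self, question: str) -> str:
        # Answer from the compressed memory instead of the full history.
        answer = call_llm(
            f"Memory so far:\n{self.summary}\n\nQuestion: {question}\nAnswer:"
        )
        # Fold the new exchange back into the summary, then discard the raw text.
        self.summary = call_llm(
            "Condense the following into a short summary, keeping every "
            "important fact:\n"
            f"Previous summary: {self.summary}\n"
            f"Question: {question}\nAnswer: {answer}"
        )
        return answer
```

The trade-off is lossiness: each re-summarization can drop details, so facts fade over many turns, which is arguably human-like but may not be what you want.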
What are your thoughts on this? Can we make it better?
Edit: In LangChain's thread-like architecture, we keep the history of the conversation. The history is kept until the token budget runs out, at which point it is truncated. Even though we don't know how the model works internally or how the history is translated into its inputs, we can make use of tricks like these. But I don't know how well this would translate in practice.
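For what it's worth, older LangChain versions (the legacy `langchain.memory` module, deprecated in newer releases) ship something close to this idea: `ConversationSummaryBufferMemory` keeps recent turns verbatim and folds older turns into an LLM-written summary once a token limit is hit, instead of truncating them. A rough sketch, assuming an OpenAI-compatible model:

```python
# Legacy LangChain memory API; newer releases deprecate these classes.
from langchain.chains import ConversationChain
from langchain.memory import ConversationSummaryBufferMemory
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(temperature=0)

# Keep roughly the last 500 tokens verbatim; everything older gets
# summarized by the LLM rather than dropped.
memory = ConversationSummaryBufferMemory(llm=llm, max_token_limit=500)

chain = ConversationChain(llm=llm, memory=memory)
chain.predict(input="Let's talk about human-like memory for LLMs.")
```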