So it goes straight from the Compression LLM down to the final call, but what if, after compressing, you notice there is extra context window left (you can fit more tokens) and the original query used a low k? Raising k would return more potentially relevant passages.
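To make what I mean concrete, here's a rough sketch of the loop I'm imagining (the `retrieve`, `compress`, and `count_tokens` helpers and the token budget are all my own placeholders, not from the paper):

```python
MAX_CONTEXT_TOKENS = 4096  # assumed token budget for the final LLM call

def build_context(query, retrieve, compress, count_tokens,
                  k=5, k_step=5, k_max=50):
    """Grow k while the compressed context still fits the budget.

    retrieve(query, k) -> list of passages   (placeholder)
    compress(query, passages) -> str         (placeholder)
    count_tokens(text) -> int                (placeholder)
    """
    context = compress(query, retrieve(query, k))
    while k < k_max:
        candidate_k = k + k_step
        candidate = compress(query, retrieve(query, candidate_k))
        if count_tokens(candidate) > MAX_CONTEXT_TOKENS:
            break  # the larger k no longer fits; keep the last context
        k, context = candidate_k, candidate
    return context
```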
Also, are there ways to combine a summary of each segment with the compression? By "compression" they seem to mean just extracting the most relevant passages using the Compression LLM, so could you also generate a summary here and pass it to the final LLM alongside the most relevant passages from each segment?
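Again just a sketch of the combination I have in mind (the prompts and the generic `llm` callable are my own assumptions, not anything from the paper):

```python
def build_segment_context(query, segment, llm):
    """llm(prompt) -> str is a placeholder for any completion call."""
    summary = llm(
        f"Summarize this passage in 2-3 sentences:\n\n{segment}"
    )
    extracted = llm(
        f"Copy only the sentences from this passage that are relevant "
        f"to the question '{query}':\n\n{segment}"
    )
    # Give the final LLM both views of the segment: the gist plus
    # the verbatim evidence.
    return f"Summary: {summary}\nRelevant passages: {extracted}"
```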
I am trying to build a reading-comprehension question-answer grader for elementary through high school, so my PDFs would usually be fictional works like "To Kill a Mockingbird" or "Animal Farm", where it can be difficult to squeeze all the relevant information into the token limit.
Without fine-tuning a model, how would I best solve this problem for my task?