How do you maintain the count of a prompt? Is it by reframing the user's query into a refined prompt and then saving it in the cache with a count, so that when we get a similar refined prompt (based on a similarity score), we return the response stored in the cache?
Hello Rohan_Devaki.
For cache management, it’s not necessary to keep a count of prompts. What gets cached is up to us: we store whatever we consider worth keeping.
In this case, the idea of using the cache is to improve latency.
The suggestion is to maintain a cache of frequently sent prompts along with their responses. When a new prompt is received, its similarity to the cached prompts is calculated; if a sufficiently similar prompt is found in the cache, the stored response is returned, which avoids the slower generation process.
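As a rough illustration of that lookup, here is a minimal sketch in Python. Everything in it is an assumption made for the example: the names `SemanticCache`, `embed`, `cosine`, and `answer` are hypothetical, the 0.8 threshold is arbitrary, and `embed` is a toy bag-of-words vector standing in for whatever real embedding model you would use.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, threshold=0.8):
        self.threshold = threshold  # assumed cut-off; tune for your data
        self.entries = []           # list of (prompt vector, response)

    def get(self, prompt):
        """Return the response of the most similar cached prompt, or None."""
        query = embed(prompt)
        best_score, best_response = 0.0, None
        for vector, response in self.entries:
            score = cosine(query, vector)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None

    def put(self, prompt, response):
        self.entries.append((embed(prompt), response))


def answer(prompt, cache, generate):
    """Serve from the cache when possible; otherwise generate and store."""
    cached = cache.get(prompt)
    if cached is not None:
        return cached
    response = generate(prompt)  # the slow path (e.g. a model call)
    cache.put(prompt, response)
    return response
```

Here `generate` is whatever function produces a fresh response. The linear scan over `entries` is fine for a small cache; a larger one would normally use a vector index instead.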
As for your suggestion of “restructuring the user’s query into a refined request”: the refined request can be saved in the cache as a new entry, so that when a similar request arrives later, the response is served directly from the cache.
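A minimal sketch of that idea, again with hypothetical names (`refine`, `RefinedPromptCache`). The refinement step here is only a simple normalization (lowercasing, stripping punctuation, collapsing whitespace); in practice it might be a rewriting step of your own. For brevity the lookup is an exact match on the refined text, and a similarity comparison could be used in its place. The hit counter is optional and only shows where a count could live if you wanted one, for example for statistics or eviction.

```python
import re


def refine(query):
    """Hypothetical refinement step: lowercase, drop punctuation, collapse spaces."""
    return " ".join(re.findall(r"[a-z0-9]+", query.lower()))


class RefinedPromptCache:
    def __init__(self):
        self.responses = {}  # refined prompt -> cached response
        self.hits = {}       # refined prompt -> number of cache hits (optional)

    def lookup(self, query):
        """Return the cached response for the refined query, or None on a miss."""
        key = refine(query)
        if key in self.responses:
            self.hits[key] += 1
            return self.responses[key]
        return None

    def store(self, query, response):
        """Save the response under the refined form of the query."""
        key = refine(query)
        self.responses[key] = response
        self.hits.setdefault(key, 0)
```

With this, `store("  What is RAG? ", ...)` and a later `lookup("what is rag")` resolve to the same key, because both queries refine to the same text.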
Combining a cache with similarity matching in this way reduces the system’s response time.
Regards
Ronweld B.