Meta-Learning

There’s a lot of research showing that letting an LLM think for longer, review its own response, critique it, and refine it substantially improves its output.

A lot of people recommend using a second agent (a reviewer) that checks the work of the generator LLM.
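For concreteness, here is a minimal sketch of that generator-reviewer loop. The `call_llm` helper is hypothetical (stubbed here so the control flow runs standalone); in practice it would wrap whichever model API you use, with the generator and reviewer possibly being different models.

```python
def call_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real model call; stubbed for illustration.
    if prompt.startswith("Review"):
        return "APPROVED"
    return "draft answer"

def answer_with_review(query: str, max_rounds: int = 2) -> str:
    """Generator LLM A drafts; reviewer LLM B critiques; A revises."""
    draft = call_llm(f"Answer the user query:\n{query}")
    for _ in range(max_rounds):
        critique = call_llm(f"Review this answer for errors:\n{draft}")
        if critique.strip() == "APPROVED":
            break  # reviewer is satisfied; stop early to limit extra cost
        draft = call_llm(
            f"Revise the answer using this critique:\n{critique}\n\nAnswer:\n{draft}"
        )
    return draft

print(answer_with_review("What is the capital of France?"))
```

Note that each round adds up to two extra model calls on top of the original one, which is exactly where the cost concern below comes from.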

How often do you see this architecture in practice?
Isn’t it expensive to run every customer query twice (one call to generator LLM A, and another to reviewer LLM B)?