Hi everyone,
I’m hoping to get your thoughts on a problem I’ve encountered with a project.
A small part of our project involves checking one or more documents for inconsistencies. For example, if a document states that “milk tea costs $10 a cup” in one place, and the same or another document says, “spend $20 to buy a cup of milk tea,” that’s a contradiction. Because these conflicting statements aren’t always phrased identically, we need an LLM that understands the semantics to identify them.
We’ve been passing the documents directly to a model (like ChatGPT or Gemini) via their API for it to read and assess.
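To make the current setup concrete, here is a minimal sketch of how a multi-document prompt might be assembled before sending it as the user message of a chat-completions call. The instructions, document IDs, and helper name are all illustrative, not our exact prompt:

```python
def build_contradiction_prompt(documents: dict[str, str]) -> str:
    """Tag each document with an ID so the model can cite where each claim lives."""
    parts = [
        "Find factual contradictions within or across the documents below.",
        "For each contradiction, quote both conflicting statements and name",
        "the document IDs they come from.",
    ]
    for doc_id, text in documents.items():
        parts.append(f"\n--- Document {doc_id} ---\n{text}")
    return "\n".join(parts)


prompt = build_contradiction_prompt({
    "A": "Milk tea costs $10 a cup.",
    "B": "You spend $20 to buy a cup of milk tea.",
})
# `prompt` is then sent to the API (e.g. ChatGPT or Gemini) as a single message.
```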
We’ve noticed that if there’s only one document to check, the results are pretty good. The model is generally able to find the contradictory points with reasonable accuracy.
However, when we increase the number of documents to two or three, the model’s performance drops significantly. The inconsistencies could be within the same document or across different ones. In these multi-document scenarios, the model seems to get confused and can’t reliably identify the contradictions.
Based on my description, does anyone have an idea what might be causing this drop in performance? What steps could we take to enable the model to accurately find contradictions across several documents?

I’m wondering whether, in this situation, instead of feeding the entire documents to the model, we should use a RAG-style approach: chunk and vectorize the documents first. However, this differs from a typical RAG use case, since we’re not retrieving information based on a user query.
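One possible shape for that chunk-based idea (just a sketch, not something we've built): split each document into chunks, then enumerate every pair of chunks, within and across documents, and have the LLM judge each pair for contradictions instead of reading everything at once. The chunking here is naively fixed-size; real code would split on sentence or paragraph boundaries:

```python
from itertools import combinations


def chunk(text: str, size: int = 500) -> list[str]:
    # Naive fixed-size chunking; a real version would respect sentence boundaries.
    return [text[i:i + size] for i in range(0, len(text), size)]


def candidate_pairs(docs: dict[str, str], size: int = 500):
    """Enumerate every pair of chunks, labeled with (doc_id, chunk_index)."""
    labeled = [
        (doc_id, idx, c)
        for doc_id, text in docs.items()
        for idx, c in enumerate(chunk(text, size))
    ]
    # Each pair would then be passed to the LLM for a focused contradiction check.
    return list(combinations(labeled, 2))


pairs = candidate_pairs({"A": "x" * 600, "B": "y" * 400})
# 2 chunks from A + 1 chunk from B -> 3 candidate pairs to check
```

An embedding step could prune this quadratic pair list first (only judge pairs whose chunks are semantically similar), which is where the vectorization would come in, though unlike normal RAG there is no user query driving the retrieval.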
Thanks in advance for your help!