Document Comparison

Hi Everyone,

An interesting use case that I’d like to see explored is comparing multiple documents. In the context of this course, for example: what are the differences between lectures 1, 2, 3 and 4?

Currently I am able to store multiple docs in a vectordb, but the LLM then seems to struggle to differentiate between the docs. I’m trying to get it to read two documents and highlight any differences.

What I have also tried is storing them in two separate vectordbs. In theory we could then just compare the embeddings in the two vectordbs and highlight the differences.
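To make the "compare the embeddings" idea concrete, here is a minimal sketch of what that comparison could look like. It assumes each document has already been split into chunks, and it uses a toy bag-of-words vector as a stand-in for a real embedding model. The function names and the 0.6 threshold are made up for illustration; they are not from any vectordb or LangChain API.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def unmatched_chunks(chunks_a, chunks_b, threshold=0.6):
    """Return chunks of A whose best match in B falls below the threshold."""
    vectors_b = [embed(c) for c in chunks_b]
    result = []
    for chunk in chunks_a:
        vec = embed(chunk)
        best = max((cosine(vec, vb) for vb in vectors_b), default=0.0)
        if best < threshold:  # assumed cutoff; tune for a real embedding model
            result.append((chunk, round(best, 2)))
    return result


# Hypothetical chunks standing in for two lectures.
lecture_1 = ["Gradient descent updates weights iteratively.",
             "Learning rate controls the step size."]
lecture_2 = ["Gradient descent updates weights iteratively.",
             "Regularization penalizes large weights."]

only_in_1 = unmatched_chunks(lecture_1, lecture_2)
for chunk, score in only_in_1:
    print(f"No close match in lecture 2 (best similarity {score}): {chunk}")
```

Running it in both directions (A against B, then B against A) would give the content unique to each document. With real embeddings the threshold would need tuning, and this only flags chunks that have no counterpart; it does not explain how two similar chunks differ.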

Has anyone here worked on a use case like this?

I was able to do this using LangChain agents.

Hi Mark,
I am curious how you made it work. I am exploring a similar use case where I want to compare the privacy policies of different companies, for example different airlines, and then be able to ask questions like "What are the differences between these airlines' cancellation policies?" Do you think LangChain agents will work for this use case?

Thank you.

Hi,
Yes, LangChain agents should work here. It’s been a while since I worked on it, but you would essentially create “custom tools” that your agent can use. So, for example, tool 1 could be a retriever for your first doc, tool 2 a retriever for your second doc, and so on. Then it’s a matter of refining your prompt to ensure the agent uses all of the tools, extracts the information and then compares the results.
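A rough, library-free sketch of the "one tool per document" pattern described above. This is not LangChain's actual API; `make_retriever_tool`, `compare_across_tools` and the keyword-overlap search are hypothetical stand-ins for a real retriever and agent, and the airline snippets are invented sample data.

```python
import re


def make_retriever_tool(name, chunks):
    """Build a hypothetical 'tool': a named function that searches one document."""
    def search(query, k=1):
        terms = set(re.findall(r"[a-z]+", query.lower()))
        # Rank chunks by how many query words they share (stand-in for vector search).
        scored = sorted(
            chunks,
            key=lambda c: len(terms & set(re.findall(r"[a-z]+", c.lower()))),
            reverse=True,
        )
        return scored[:k]
    return {"name": name, "run": search}


def compare_across_tools(tools, question):
    """Call every tool with the same question so no document is skipped."""
    return {tool["name"]: tool["run"](question) for tool in tools}


# Invented sample data: one tool per document.
tools = [
    make_retriever_tool("airline_a", [
        "Cancellation is free up to 24 hours before departure.",
        "Checked baggage costs extra.",
    ]),
    make_retriever_tool("airline_b", [
        "Cancellation incurs a 50 dollar fee at any time.",
        "Seat selection costs extra.",
    ]),
]

evidence = compare_across_tools(tools, "What is the cancellation policy?")
for source, excerpts in evidence.items():
    print(source, "->", excerpts)
```

In a real agent, the LLM would decide when to call each tool, and the prompt would instruct it to query every tool before writing the comparison. Here the loop over all tools is hard-coded to show the shape of the evidence the model would receive, keyed by document.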

Thank you Mark.

Was anyone able to do a PDF comparison by following the https://python.langchain.com/docs/integrations/toolkits/document_comparison_toolkit/#openai-multi-functions tutorial?