Inspired by the Advanced Retrieval course, I built RAGxplorer.
It is an open-source tool to visualise RAG documents in the reduced embedding space.
It comes with a very simple and familiar API.
from ragxplorer import RAGxplorer

client = RAGxplorer(embedding_model="thenlper/gte-large")
client.load_pdf("presentation.pdf")  # load a document before querying
client.visualize_query("What are the top revenue drivers for Microsoft?")
I extended the code from the lab to include an interactive chart and to support any OpenAI or HuggingFace Inference Endpoint embedding model.
Any feedback would be most appreciated.
You posted in the “AI Questions” area. Is there a question associated with your message?
If not, it might be more at home in the “AI Projects” area.
You can move it there using the “pencil” icon in the thread title.
This is super cool; congratulations on your great work. Why does it appear that the retrieved chunks are not close to the original query or to the sub-questions? Shouldn’t they be? Please help me understand.
Maybe it is because the projection we are viewing is not the ideal one (i.e. the two dimensions plotted here)? Maybe there is another projection that shows they are in fact closer?
Hello! Great question.
To clarify, the actual retrieval (i.e. calculating the cosine similarity) is done in the full-dimensional embedding space.
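To illustrate what retrieval in the full space means, here is a minimal NumPy sketch (toy random vectors standing in for real embeddings; the sizes and data are made up for illustration):

```python
import numpy as np

def cosine_sim(a, b):
    # cosine similarity between two vectors
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
query = rng.normal(size=1024)        # e.g. a gte-large-sized query embedding
chunks = rng.normal(size=(5, 1024))  # toy "document chunk" embeddings

# retrieval ranks the chunks by similarity to the query in the full space,
# before any dimensionality reduction happens for plotting
scores = [cosine_sim(query, c) for c in chunks]
best = int(np.argmax(scores))
```

The 2D chart is only a visualisation layer on top of this ranking.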
Dimensionality reduction algorithms like UMAP may lose some information, so a point that is nearest in the full-dimensional space may not be the nearest point in the 2D projection.
There are also hyperparameters of the dimensionality reduction algorithm that can be explored.
Lastly, there was a related bug in the code, but it was spotted in a PR and fixed.
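To see the information loss concretely, here is a small sketch using PCA (a linear stand-in for UMAP, just to keep it NumPy-only) on toy random data: the nearest neighbour in the full space can differ from the nearest neighbour in the 2D projection.

```python
import numpy as np

rng = np.random.default_rng(42)
points = rng.normal(size=(200, 64))  # toy high-dimensional embeddings
query = rng.normal(size=64)

# nearest neighbour in the full 64-dim space (Euclidean for simplicity)
full_nn = int(np.argmin(np.linalg.norm(points - query, axis=1)))

# project everything to 2D via PCA (top two principal components)
data = np.vstack([points, query])
centered = data - data.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
proj = centered @ vt[:2].T
p2d, q2d = proj[:-1], proj[-1]

# nearest neighbour in the reduced 2D space; may disagree with full_nn
flat_nn = int(np.argmin(np.linalg.norm(p2d - q2d, axis=1)))
```

When `full_nn` and `flat_nn` disagree, that is exactly the effect described above: the chart can place a retrieved chunk far from the query even though it was the closest match in the full space.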
Gotcha, thanks for the explanation. By the way, what are your thoughts on LangChain? Are there similar libraries you would recommend for developing on top of LLMs? Curious for your perspective on this.