I’m building a RAG-based document QA system using Python (no LangChain), LLaMA (50K context), PostgreSQL with pgvector, and Docling for parsing. Users can upload up to 10 large documents (300+ pages each), often containing numerous tables and charts.
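For context, here’s roughly what my retrieval path looks like. This is a simplified sketch, not my exact code: the `chunks` table, column names, and the `embed()` helper are illustrative placeholders.

```python
import psycopg2


def embed(text: str) -> list[float]:
    # Placeholder for the embedding model call (same model used at ingest).
    raise NotImplementedError


def knn_search(conn, query: str, k: int = 20):
    # Flat cosine KNN over every chunk in the corpus. At 30K+ rows the
    # top-k starts pulling in loosely related chunks from other docs.
    vec = "[" + ",".join(str(x) for x in embed(query)) + "]"
    with conn.cursor() as cur:
        cur.execute(
            """
            SELECT id, doc_id, content, embedding <=> %s::vector AS dist
            FROM chunks
            ORDER BY dist
            LIMIT %s
            """,
            (vec, k),
        )
        return cur.fetchall()


# Example usage:
# conn = psycopg2.connect("dbname=ragdb")
# hits = knn_search(conn, "What was FY23 revenue in the EMEA segment?")
```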
I’m facing a few specific challenges:
- 30K+ total chunks across all docs → KNN retrieval gets noisy.
- Tried LLM-based reranking, but it’s too slow and expensive to run over all 30K chunks (see the sketch after this list).
- Tried summarizing each chunk to improve retrieval, but it’s too expensive to generate LLM summaries for all 30K sections.
- Table chunks are especially difficult (second sketch below):
  - Embeddings perform poorly on structured/numeric data.
  - Summary-style embeddings (e.g. the first 300 tokens, or just the heading/caption) aren’t sufficient for value-level lookups.
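To make the cost problem concrete, the reranking I tried was shaped roughly like this (the `llm_score` helper stands in for my actual LLaMA call). The per-chunk summarization attempt had the same one-LLM-call-per-chunk structure:

```python
def llm_score(query: str, chunk: str) -> float:
    # Placeholder: one LLaMA generation that returns a relevance score.
    raise NotImplementedError


def rerank(query: str, chunks: list[str]) -> list[tuple[float, str]]:
    # One generation per chunk. Across 30K chunks that's 30K LLM calls
    # per query, which is where the latency and cost blow up.
    scored = [(llm_score(query, c), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored
```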
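And this is what I mean by “summary-style embeddings” for tables: I embed a truncated text rendering rather than the cell values, so a query about one specific number has almost nothing to match against. Simplified sketch (whitespace tokenization is just for illustration; in practice I use the model’s tokenizer):

```python
def table_embedding_text(heading: str, caption: str, table_md: str,
                         max_tokens: int = 300) -> str:
    # Truncate the rendered table to ~300 tokens before embedding.
    # On a wide or long table, most numeric cells never make it into
    # the embedded text, so value-level queries can't retrieve the chunk.
    tokens = f"{heading}\n{caption}\n{table_md}".split()
    return " ".join(tokens[:max_tokens])
```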
Looking for ideas or proven strategies to:
- Improve precision in initial retrieval at scale
- Handle table-heavy content more effectively
- Reduce cost while preserving accuracy
Any ideas, techniques, or tooling (besides LangChain) that worked for you?