This is something I’ve been thinking about recently while working on a RAG-based project.
I’ve found that knowledge graph setups can significantly improve multi-step reasoning and provide much better traceability than traditional RAG. However, the moment you try to scale this, especially with large sources like books or multiple PDFs, the graph construction itself quickly becomes the bottleneck.
The token cost of extracting entities and relationships, merging them, and then embedding everything adds up fast. The benefits seem clear, but the pipeline to get there is still quite expensive and complex.
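To make that concrete, here is a rough back-of-envelope sketch of how the tokens pile up across the three stages (extract, merge, embed). The function name and every default value (chunk size, prompt overhead, output per chunk, merge fraction) are hypothetical placeholders, not measurements from a real pipeline, so plug in your own numbers.

```python
import math


def estimate_kg_tokens(source_tokens, chunk_size=1000, prompt_overhead=300,
                       output_per_chunk=400, merge_fraction=0.5):
    """Rough token estimate for extract -> merge -> embed.

    All defaults are made-up placeholders for illustration only.
    """
    chunks = math.ceil(source_tokens / chunk_size)

    # Extraction: the LLM reads the whole source once, plus per-chunk
    # instructions, and writes entities/relationships for each chunk.
    extract = source_tokens + chunks * (prompt_overhead + output_per_chunk)

    # Merging: assume some fraction of the extracted output is sent back
    # through the LLM to deduplicate and reconcile entities.
    extracted = chunks * output_per_chunk
    merge = int(extracted * merge_fraction)

    # Embedding: assume both the source chunks and the extracted
    # entities/relationships get embedded.
    embed = source_tokens + extracted

    return {
        "chunks": chunks,
        "extract": extract,
        "merge": merge,
        "embed": embed,
        "total": extract + merge + embed,
    }


if __name__ == "__main__":
    # Hypothetical 100k-token source, e.g. a short book.
    for stage, tokens in estimate_kg_tokens(100_000).items():
        print(f"{stage:>8}: {tokens:,}")
```

Even with these invented numbers, the total comes out at several times the size of the source, which matches what I am seeing in practice.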
Curious to hear from others working on similar problems: how are you handling KG construction at scale?
Are there specific techniques, tools, or workflows that have worked well for you?