How to perform Question Answering on Multiple JSON Files: Beyond SQL Query Generation with LLMs

Hello OpenAI Community,

I am currently exploring various approaches to implement a question-answering system on multiple JSON files. Here’s a quick summary of the methods I have already implemented:

  1. SQL Query Generation with LLMs:
  • I built a system where the user’s natural language query is converted into an SQL query by an LLM.
  • The generated SQL query is then executed against a relational database to fetch the relevant records (see the first sketch after this list).
  2. MongoDB Query Generation with LLMs:
  • Similarly, I developed a system where the user’s natural language query is translated into a MongoDB query by an LLM.
  • The generated MongoDB query is then executed on a MongoDB database to retrieve the relevant documents (second sketch below).
  3. LangChain JSON Toolkit:
  • Using LangChain’s JSON Toolkit agent, I implemented a solution that answers questions over JSON files by letting the agent explore the loaded JSON via an LLM (third sketch below).
  4. Combined JSON Data with Python Code Generation:
  • I combined the data from all JSON files and used an LLM to generate Python code on the fly that queries the combined data programmatically (fourth sketch below).
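
For concreteness, approach 1 looks roughly like the following. This is a minimal sketch rather than my exact implementation, assuming the openai Python SDK and SQLite; the model name, the hypothetical orders schema, and the data.db path are all placeholders:

```python
import sqlite3

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical schema; replace with your own tables.
SCHEMA = "CREATE TABLE orders (id INTEGER, customer TEXT, total REAL, created_at TEXT);"

def question_to_sql(question: str) -> str:
    # Ask the model for exactly one SQLite SELECT statement, nothing else.
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[
            {"role": "system",
             "content": "Translate the user question into a single SQLite SELECT "
                        f"statement for this schema and return only the SQL:\n{SCHEMA}"},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content.strip().strip("`")

def answer(question: str, db_path: str = "data.db") -> list:
    sql = question_to_sql(question)
    # Validate or allow-list the generated SQL before running it in production.
    with sqlite3.connect(db_path) as conn:
        return conn.execute(sql).fetchall()
```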
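
Approach 2 follows the same pattern with a Mongo filter document instead of SQL. A minimal sketch, assuming pymongo, a local MongoDB instance, and a hypothetical shop.orders collection; JSON mode keeps the model's reply parseable:

```python
import json

from openai import OpenAI
from pymongo import MongoClient

llm = OpenAI()  # assumes OPENAI_API_KEY is set
collection = MongoClient("mongodb://localhost:27017")["shop"]["orders"]  # hypothetical

def question_to_filter(question: str) -> dict:
    # JSON mode guarantees the reply parses as a Mongo filter document.
    resp = llm.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        response_format={"type": "json_object"},
        messages=[
            {"role": "system",
             "content": "Return only a MongoDB find() filter as a JSON object for a "
                        "collection with fields: customer, total, created_at."},
            {"role": "user", "content": question},
        ],
    )
    return json.loads(resp.choices[0].message.content)

def answer(question: str) -> list:
    # Run the generated filter; restrict to read-only operations in production.
    return list(collection.find(question_to_filter(question)))
```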
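
For approach 3, LangChain's JSON toolkit wraps the loaded document in a JsonSpec and hands it to a JSON agent. A minimal sketch, assuming the langchain-community and langchain-openai packages and a placeholder data.json file (the toolkit expects a dict at the top level):

```python
import json

from langchain_community.agent_toolkits import JsonToolkit, create_json_agent
from langchain_community.tools.json.tool import JsonSpec
from langchain_openai import ChatOpenAI

with open("data.json") as f:  # placeholder file; must contain a JSON object
    data = json.load(f)

spec = JsonSpec(dict_=data, max_value_length=4000)
agent = create_json_agent(
    llm=ChatOpenAI(model="gpt-4o-mini", temperature=0),  # placeholder model
    toolkit=JsonToolkit(spec=spec),
    verbose=True,
)

# The agent inspects keys and values step by step to answer the question.
print(agent.invoke({"input": "How many records mention customer Alice?"}))
```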
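
And approach 4 can be sketched as below, assuming the combined data fits in memory; the data/*.json glob, the preview length, and the result-variable convention are illustrative, and model-generated code should only ever run in a sandbox:

```python
import glob
import json

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

def load_all(pattern: str = "data/*.json") -> dict:
    # Merge every JSON file into one dict keyed by file path.
    return {path: json.load(open(path)) for path in glob.glob(pattern)}

def answer(question: str) -> object:
    data = load_all()
    preview = json.dumps(data, default=str)[:2000]  # structure hint for the model
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[
            {"role": "system",
             "content": "Write only Python code that reads a dict named `data` "
                        "(structure previewed below) and stores the answer in a "
                        f"variable named `result`:\n{preview}"},
            {"role": "user", "content": question},
        ],
    )
    code = resp.choices[0].message.content.strip()
    code = code.removeprefix("```python").removesuffix("```").strip()
    scope = {"data": data}
    exec(code, scope)  # never exec untrusted output outside a sandbox
    return scope["result"]
```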

While these methods have been effective, I am now seeking alternative approaches to enhance the robustness and versatility of my system. Specifically, I would like to explore methods that:

  • Do not rely solely on query generation.
  • Are suitable for handling unstructured or semi-structured JSON data.
  • Optimize for scalability and efficiency when dealing with large JSON datasets.

I would love to hear your suggestions, insights, and any other possible approaches or tools that could help me improve my system!

Looking forward to your ideas and recommendations.

I am struggling with processing large JSON files with an LLM too, for diverse kinds of operations such as making bulk changes and creating objectives, features, strategic themes, etc.
I have applied some of the techniques you have already implemented. Now I am using chunking strategies, where I face the challenge of losing context; in some of my cases, data loss is not acceptable.
In your case, I would suggest meaningful chunking of your data followed by embedding-based retrieval (vector search); that might help you build a more versatile solution. A sketch of this idea follows.
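
To make that concrete, here is a minimal sketch of chunking plus embedding-based retrieval, assuming the openai SDK and a small in-memory numpy index; one chunk per top-level record is just one possible chunking choice, and text-embedding-3-small and the records.json file are placeholders:

```python
import json

import numpy as np
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

def chunk_json(path: str) -> list[str]:
    # One chunk per top-level record keeps each object's fields together,
    # which limits the context loss that fixed-size splitting can cause.
    with open(path) as f:
        records = json.load(f)
    return [json.dumps(r) for r in records]

def retrieve(question: str, chunks: list[str], vectors: np.ndarray, k: int = 5) -> list[str]:
    # Rank chunks by cosine similarity to the question embedding.
    q = embed([question])[0]
    scores = vectors @ q / (np.linalg.norm(vectors, axis=1) * np.linalg.norm(q))
    return [chunks[i] for i in np.argsort(scores)[::-1][:k]]

# Usage: build the index once, then retrieve per question and pass the
# top chunks to the chat model as context for the final answer.
chunks = chunk_json("records.json")  # placeholder file with a list of objects
index = embed(chunks)
print(retrieve("Which orders exceeded 1000?", chunks, index))
```
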
I would like to see your current implementations if possible. Maybe I could suggest improvements and learn something from you.