Hello OpenAI Community,
I am exploring different approaches to building a question-answering system over multiple JSON files. Here’s a quick summary of the methods I have already implemented:
- SQL Query Generation with LLMs:
  - I built a system where the user’s natural language query is converted into an SQL query using an LLM.
  - The generated SQL query is then executed on a relational database to fetch the relevant records.
- MongoDB Query Generation with LLMs:
  - Similarly, I developed a system where the user’s natural language query is translated into a MongoDB query using an LLM.
  - The generated MongoDB query is then executed on a MongoDB database to retrieve relevant data.
- LangChain JSON Toolkit:
  - Using LangChain’s JSON Toolkit agent, I implemented a solution to handle question answering on JSON files by generating relevant MongoDB queries via an LLM and executing them on the database.
- Combined JSON Data with Python Code Generation:
  - I combined all JSON file data and used an LLM to generate Python code dynamically to query the combined data programmatically.
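For context, the SQL approach has roughly the shape below. This is a minimal sketch and not my actual code: `generate_sql` is a hypothetical stand-in for the LLM call (it returns a canned query here), and the `items` table, its columns, and the sample records are made up for illustration. The `is_read_only` check is a deliberately simple guard that I am assuming for the sketch, not a complete safeguard.

```python
import json
import sqlite3


def load_records(conn, records):
    """Load a list of flat JSON objects into an illustrative `items` table."""
    conn.execute("CREATE TABLE items (name TEXT, category TEXT, price REAL)")
    conn.executemany(
        "INSERT INTO items VALUES (:name, :category, :price)", records
    )


def generate_sql(question, schema):
    """Hypothetical stand-in for the LLM call.

    A real system would send the question and schema to a model;
    here a canned query is returned so the sketch runs offline.
    """
    return "SELECT name FROM items WHERE price < 10 ORDER BY name"


def is_read_only(sql):
    """Very rough guard: accept a single SELECT statement only."""
    return sql.lstrip().lower().startswith("select") and ";" not in sql


def answer(conn, question):
    """Translate the question to SQL, validate it, and run it."""
    schema = "items(name TEXT, category TEXT, price REAL)"
    sql = generate_sql(question, schema)
    if not is_read_only(sql):
        raise ValueError("refusing to run non-SELECT SQL: " + sql)
    return conn.execute(sql).fetchall()


# Made-up sample data standing in for the contents of the JSON files.
records = json.loads(
    '[{"name": "pen", "category": "office", "price": 1.5},'
    ' {"name": "desk", "category": "furniture", "price": 120.0},'
    ' {"name": "notebook", "category": "office", "price": 4.0}]'
)

conn = sqlite3.connect(":memory:")
load_records(conn, records)
rows = answer(conn, "Which items cost less than 10?")
print(rows)
```

The MongoDB variant follows the same pattern, with the generated query executed against a collection instead of a table.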
These methods have worked, but I am now looking for alternative approaches that would make my system more robust and versatile. Specifically, I would like to explore methods that:
- Do not rely solely on query generation.
- Are suitable for handling unstructured or semi-structured JSON data.
- Optimize for scalability and efficiency when dealing with large JSON datasets.
I would love to hear your suggestions, insights, and any other possible approaches or tools that could help me improve my system!
Looking forward to your ideas and recommendations.