Passing JSON as input to LLM

Vignesh_Nainar_A · February 10, 2024, 12:14pm

Hi all!

I’m trying to build a RAG pipeline using LlamaIndex. I created two indices (one for the JSON files, another for the HTMLs and PDFs), then created two query engines and passed them to an agent as tools. The model is not able to query the JSON and provide a response. Most of the time it relies on implicit information to answer, and the answer is always wrong; sometimes it says the shared documents don’t contain the required data.

I am planning to pass the JSONs to the LLM without creating a vector index. Will it work?
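For reference, passing JSON straight to the model could look roughly like the sketch below. This is only an illustration, not a tested recipe: `build_prompt` and the `max_chars` limit are hypothetical names, the sample records are made up, and the character limit is a crude stand-in for the model's real token limit.

```python
import json

def build_prompt(records, question, max_chars=6000):
    """Embed JSON records directly in a prompt, stopping before max_chars."""
    header = (
        "Answer using ONLY the JSON records below. "
        "If the answer is not in them, reply 'not found'.\n\n"
    )
    footer = f"\n\nQuestion: {question}\nAnswer:"
    budget = max_chars - len(header) - len(footer)

    lines = []
    used = 0
    for rec in records:
        # One compact JSON object per line keeps each record easy to locate.
        line = json.dumps(rec, ensure_ascii=False, sort_keys=True)
        if used + len(line) + 1 > budget:
            break  # remaining records do not fit; they are silently dropped
        lines.append(line)
        used += len(line) + 1
    return header + "\n".join(lines) + footer

# Made-up sample data, only for illustration.
records = [{"name": "Widget A", "price": 12.5}, {"name": "Widget B", "price": 30}]
prompt = build_prompt(records, "What does Widget B cost?")
```

The resulting `prompt` string would then be sent to whatever LLM is in use. This only works while the records fit in the context window; anything past the budget is dropped, which is the main reason an index is usually needed for larger collections.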

Any suggestions on how the model can understand the query and retrieve the data from the JSON will be helpful.

Thanks!

haroldc · February 11, 2024, 3:28pm

Hi.
I have worked with vectorized JSON files and LLMs, and it works fine. I don’t know how you are vectorizing the JSONs, but what I do is vectorize the content of the JSON, not the raw JSON itself: I extract the info I want from the JSON, then I vectorize that info.

Let me know if that is where your problem lies.
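A minimal sketch of that "extract first, then vectorize" idea, assuming each JSON file is a flat object and you already know which fields matter. The `extract_text` helper, the field names, and the sample record are all hypothetical; the embedding step itself is left out because it depends on the library you use.

```python
import json

def extract_text(raw_json, fields):
    """Return only the chosen fields as plain 'name: value' text."""
    data = json.loads(raw_json)
    parts = [f"{name}: {data[name]}" for name in fields if name in data]
    return "\n".join(parts)

# Hypothetical record: only "title" and "body" are worth embedding.
raw = (
    '{"id": 17, "title": "Refund policy", '
    '"body": "Refunds within 30 days.", "internal_flag": true}'
)
text = extract_text(raw, ["title", "body"])
```

Here `text` is what would be handed to the embedding model, so braces, quotes, and irrelevant keys never end up in the vector.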

Vignesh_Nainar_A · February 12, 2024, 5:15pm

Hi @haroldc, I passed the JSON files to SimpleDirectoryReader and converted the documents to a vector index. I am using an open-source LLM, zephyr-7b-beta.

Instead of vectorizing, I tried the JSONalyze query engine, and it works better. But the LLM still falls back on its prior knowledge, even after I prompt it NOT to.

haroldc · February 14, 2024, 1:23am

What do your JSON files look like?

Mine are like this:
JSON = {
"page_content": "The text or data I want to embed",
"other_things": "Here is the data I use for metadata"
}
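With records shaped like that, splitting each one into the text to embed and the metadata to store beside it could be sketched as follows. `split_record` is a hypothetical helper, and the only assumption about the data is the `page_content` key shown above.

```python
import json

def split_record(record, text_key="page_content"):
    """Separate the text to embed from everything else (kept as metadata)."""
    text = record[text_key]
    metadata = {k: v for k, v in record.items() if k != text_key}
    return text, metadata

record = json.loads(
    '{"page_content": "The text or data I want to embed", '
    '"other_things": "Here is the data I use for metadata"}'
)
text, metadata = split_record(record)
```

Only `text` goes to the embedding model; `metadata` travels with the vector so it can be used for filtering or shown alongside the answer.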

ksvr444masters · July 2, 2024, 5:26pm

I am trying to implement a chatbot in GCP using Vertex AI Agent Builder. I also have 1000 JSON files, each with a different structure. I converted the JSON files to PDF and passed them to the text-bison@002 model, but the results are inaccurate. Could you please guide me on how to make the model understand the JSON data effectively when there are 1000 JSON files?
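One possible approach when every file has a different structure, sketched here as an assumption rather than a verified fix for Agent Builder: flatten each JSON document into uniform "path: value" lines, so the nesting survives as readable text instead of being lost in a PDF conversion. The `flatten` function and the sample document are hypothetical.

```python
def flatten(value, prefix=""):
    """Yield 'path: value' lines for every leaf in nested JSON data."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from flatten(child, f"{prefix}.{key}" if prefix else str(key))
    elif isinstance(value, list):
        for i, child in enumerate(value):
            yield from flatten(child, f"{prefix}[{i}]")
    else:
        # Leaf: a string, number, boolean, or None.
        yield f"{prefix}: {value}"

# Made-up document standing in for one of the differently shaped files.
doc = {"order": {"id": 7, "items": [{"sku": "A1", "qty": 2}]}, "paid": True}
lines = list(flatten(doc))
# lines == ["order.id: 7", "order.items[0].sku: A1",
#           "order.items[0].qty: 2", "paid: True"]
```

The flattened lines can then be chunked and indexed as ordinary text, with the file name kept as metadata so answers can be traced back to their source.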
