LLMs don't seem great with tabular data when it comes to RAG. How can I fix this, folks?

How do I work with tabular data when it comes to RAG? The traditional method has failed to give me correct answers from a very basic table.

Courtesy of the CrewAI Assistant (GPT-4):

To address your query about working with tabular data in the context of Retrieval-Augmented Generation (RAG), it’s important to acknowledge that RAG, primarily used for text data, might struggle with tabular data due to its format and structure. Here’s a plan to tackle this challenge effectively:

  1. Agent Setup: We’ll define agents specialized in handling and interpreting tabular data. For example, a ‘Data Analysis Expert’ agent can be tasked with understanding and processing table structures.
  2. Task Definition: We’ll create tasks specifically for handling tabular data. These tasks will instruct the agent to interpret the table, extract relevant information, and reformat it if necessary for better understanding.
  3. Tool Integration: If necessary, we can integrate tools that are specialized in tabular data handling and analysis. These tools can assist the agents in interpreting and extracting meaningful information from the tables.
  4. Custom Code or Tool Development: In case existing tools are insufficient, we can consider developing custom code or a tool specifically for handling your type of tabular data. This could involve creating scripts that reformat the table or extract key data points for the agent to process.
  5. Testing and Iteration: Once the agents and tasks are set up, we’ll need to test them with your specific data and iteratively refine their capabilities to ensure they can handle the data correctly and provide accurate answers.
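
As a rough illustration of step 4, here is a minimal sketch of a script that reformats a table before it goes into a RAG pipeline. It turns each row into one self-contained text chunk that repeats the column names, so a retrieved chunk still says what each value means. The sample data and the `rows_to_chunks` helper are made up for illustration and are not part of CrewAI or any other library.

```python
import csv
import io

# Hypothetical sample table; the real data from the thread is not shown.
SAMPLE_CSV = """product,region,units_sold
Widget,North,120
Widget,South,95
Gadget,North,40
"""

def rows_to_chunks(csv_text):
    """Turn each table row into one self-contained text chunk.

    Every chunk repeats the column names, so a retriever that returns
    a single chunk still carries the full context for that row.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    chunks = []
    for row in reader:
        parts = [f"{column}: {value}" for column, value in row.items()]
        chunks.append("; ".join(parts))
    return chunks

chunks = rows_to_chunks(SAMPLE_CSV)
for chunk in chunks:
    print(chunk)
```

Each chunk can then be embedded and indexed like ordinary text. Whether this actually improves answers depends on the table and the questions, so it would need testing on the real data.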

Hope this helps.

Hey @marconi, thanks for your answer. In my tests it seems that LLMs really do struggle with tabular data, as you said. Even with the Agent Setup technique you suggested, it is difficult to make an agent understand or interpret tabular data.

By the way, was your message generated by CrewAI?

Any time you see a message that has five numbered paragraphs with bold titles, it’s an indicator it came from a chat tool.

I haven’t worked with RAG, but I wonder if one might get better results if the data were translated to an explicitly structured/typed format (e.g. JSON) before being fed in.
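
To make that idea concrete, here is a minimal sketch of converting a small CSV table into typed JSON records using only the Python standard library. The sample table and the `coerce` and `table_to_json` helpers are hypothetical, and the type guessing is deliberately simple. I have not verified that this improves RAG answers.

```python
import csv
import io
import json

# Hypothetical sample table, invented for illustration.
TABLE_CSV = """item,quantity,in_stock
bolt,250,true
nut,0,false
"""

def coerce(value):
    """Best-effort conversion of a CSV string to bool, int, float, or str."""
    lowered = value.strip().lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            continue
    return value

def table_to_json(csv_text):
    """Return the table as a JSON array with one typed object per row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    records = [{key: coerce(val) for key, val in row.items()} for row in reader]
    return json.dumps(records, indent=2)

json_text = table_to_json(TABLE_CSV)
print(json_text)
```

The resulting JSON keeps each value attached to its column name and type, and that association is what tends to get lost when a table is flattened into plain text.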