Creating a domain-specific LLM for a virtual data scientist (takes natural-language queries, runs them over structured data, and returns insights as answers)

I have a system that currently processes natural language queries and generates insights in chart and narrative formats using rule-based Python code. However, as the number of question types grows, the system and its rules are becoming more complex, and changes in question structure often lead to failures.

I also attempted a natural-language-to-SQL approach by fine-tuning the T5 model on domain-specific NL questions and their corresponding SQL queries. However, this approach did not perform well, especially for complicated questions such as diagnostics and forecasts.
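For context, T5 fine-tuning for NL-to-SQL is usually framed as sequence-to-sequence text generation, and a common failure mode is giving the model the question alone with no schema context. A minimal sketch of the data-preparation step, inlining a schema hint into each input (the table, column names, and SQL below are hypothetical examples, not from the actual system):

```python
# Sketch: format NL-to-SQL training pairs for a seq2seq model like T5.
# Prefixing a task name and inlining the table schema helps the model
# ground column references instead of hallucinating names.

def format_pair(question: str, schema: str, sql: str) -> dict:
    """Build one (input, target) training example."""
    source = f"translate to SQL | schema: {schema} | question: {question}"
    return {"input": source, "target": sql}

# Hypothetical example pair:
example = format_pair(
    question="What were total sales by region last quarter?",
    schema="sales(order_date DATE, region TEXT, amount NUMERIC)",
    sql="SELECT region, SUM(amount) FROM sales GROUP BY region",
)
```

Pairs formatted this way can then be tokenized and fed to a standard seq2seq trainer; the prompt layout itself is a design choice worth experimenting with.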

After completing the first week of the course, I started wondering whether it would be feasible to create a domain-specific large language model (LLM) to address these challenges. There are two potential approaches:

  1. Develop a domain-specific LLM trained on NL-to-SQL pairs or on question types, which could improve the system’s performance and adaptability to different queries.
  2. Alternatively, train an LLM to directly generate the JSON format currently produced by complex rules, thereby simplifying the system and making it more efficient.
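Whichever approach is chosen, option 2 in particular benefits from a validation layer: the model's output should be parsed and checked before it reaches the chart renderer, so malformed generations fail loudly instead of breaking downstream code. A minimal sketch, assuming hypothetical field names (`chart_type`, `x`, `y`, `narrative`) in place of the system's actual JSON schema:

```python
import json

# Hypothetical set of fields the rendering layer expects; substitute
# the real insight schema.
REQUIRED_FIELDS = {"chart_type", "x", "y", "narrative"}

def parse_insight(model_output: str) -> dict:
    """Parse and validate LLM output before rendering.

    Raises json.JSONDecodeError on malformed JSON and ValueError on
    missing fields, so bad generations never reach the renderer.
    """
    insight = json.loads(model_output)
    missing = REQUIRED_FIELDS - insight.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return insight

# Hypothetical well-formed generation:
raw = ('{"chart_type": "bar", "x": "region", "y": "sales", '
       '"narrative": "Sales rose 12% quarter over quarter."}')
insight = parse_insight(raw)
```

On a validation failure the query can be retried or routed back to the existing rule-based path as a fallback.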

Given the complexity of the problem, I’m uncertain about the feasibility of these approaches. Before diving in, I wanted to seek suggestions and guidance on how to proceed. Any insights or advice would be greatly appreciated. Thank you.

In the coming weeks of the course, a few methods for tuning and adapting an LLM to a particular task are explained, and with a little modification you could perhaps use them for your task.

I am really interested in any follow-up on this topic - if you can keep us in the loop, @Gayathri_E_S, that would be great! Maybe come back with any other questions!

Sure, will do. I’ll complete the remaining weeks and see if I can make any improvements on this.