I’m looking into fine-tuning an LLM for a domain-specific instruction-tuning project. However, there are a lot of related concepts that aren’t well suited to a QA format, which I believe the model would need in order to get a good general understanding of the topic.
My question is: does anyone know whether mixing the prompt/answer training data with non-prompt/answer data, i.e. also feeding the LLM reams of relevant background text/documents, would produce undesired results?
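To make it concrete, here’s a rough sketch of what I mean by "mixing" (just an illustration using the Hugging Face `datasets` library; the example texts and field names are placeholders, not my actual data):

```python
from datasets import Dataset, concatenate_datasets

# Hypothetical instruction-style examples (prompt/answer pairs)
qa_examples = [
    {"text": "### Instruction:\nExplain concept X.\n\n### Response:\nConcept X is ..."},
]

# Hypothetical raw background documents, pre-chunked to fit the context window
doc_examples = [
    {"text": "Background document excerpt about the domain ..."},
]

qa_ds = Dataset.from_list(qa_examples)
doc_ds = Dataset.from_list(doc_examples)

# Concatenate both sources and shuffle so the two formats are
# interleaved throughout training rather than seen in separate phases
mixed = concatenate_datasets([qa_ds, doc_ds]).shuffle(seed=42)
```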
I’d like to know before I invest a fair bit of time/money into creating fine-tuning data. Any input would be greatly appreciated!