In Week 2 of Generative AI with Large Language Models there are discussions of single-task fine-tuning and multi-task fine-tuning. For single-task you need 500-1,000 examples; for multi-task you need 50,000 to 100,000 examples.
I’m thinking that it is hard to synthesize 50,000 to 100,000 examples to fine-tune a model on. I did a full fine-tune of an OpenAI model a while back where I had the model generate video scripts from learning objectives. The scripts would:
- Lay out the learning objectives (what was going to be covered)
- Have a narrative about the topic at hand (5th grade social studies)
- Summarize what was just covered
I had 750 synthesized scripts (I generated 1,000 but manually threw out ~250), and the fine-tune sort of worked on the third try. There is no way I could manually evaluate 50,000-100,000 scripts.
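For scale, here is a minimal sketch of the kind of automated screening that 50,000+ examples would require in place of the manual review I did on 1,000. The JSONL layout, field names, and structural checks are assumptions for illustration, not what I actually used:

```python
import json

# Cheap structural checks standing in for a real evaluation pass.
REQUIRED_SECTIONS = ("learning objectives", "summary")  # assumed section markers
MIN_WORDS = 300  # assumed threshold for a complete script

def looks_complete(script: str) -> bool:
    """Keep a script only if every required section appears and it isn't too short."""
    text = script.lower()
    return all(section in text for section in REQUIRED_SECTIONS) and len(script.split()) >= MIN_WORDS

# Hypothetical file of synthesized examples, one JSON object per line.
with open("synthesized_scripts.jsonl") as src, open("filtered_scripts.jsonl", "w") as dst:
    for line in src:
        example = json.loads(line)
        if looks_complete(example["completion"]):  # assumed field name
            dst.write(json.dumps(example) + "\n")
```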
For people who have done this before, would it make sense to train a single-task PEFT adapter for each of the tasks (learning objectives, narrative, summary) and daisy-chain them, versus doing one giant multi-task fine-tune?
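To make the question concrete, here is a rough sketch of what I mean by daisy-chaining, assuming three LoRA adapters trained separately with Hugging Face PEFT; the base model, adapter paths, and prompt templates are all hypothetical:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "meta-llama/Llama-2-7b-hf"  # hypothetical base model
tokenizer = AutoTokenizer.from_pretrained(BASE)
base_model = AutoModelForCausalLM.from_pretrained(BASE)

# Attach one LoRA adapter per task; only the active adapter affects generation.
model = PeftModel.from_pretrained(base_model, "adapters/objectives", adapter_name="objectives")
model.load_adapter("adapters/narrative", adapter_name="narrative")
model.load_adapter("adapters/summary", adapter_name="summary")

def run_stage(adapter: str, prompt: str) -> str:
    """Generate with a single task-specific adapter active."""
    model.set_adapter(adapter)
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=512)
    # Strip the prompt tokens and decode only the new text.
    return tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

# Daisy chain: each stage's output becomes part of the next stage's prompt.
objectives = run_stage("objectives", "Topic: 5th grade social studies, colonial America")
narrative = run_stage("narrative", f"Objectives:\n{objectives}\n\nWrite the script body:")
summary = run_stage("summary", f"Script:\n{narrative}\n\nSummarize what was just covered:")
```

My thinking is that each adapter would only need the 500-1,000 single-task examples, and a weak stage could be retrained without touching the others.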
I also did a fine-tune where I had the API generate multiple-choice assessment questions based on the scripts. I did this as a separate fine-tune because writing a single prompt that combined the two tasks seemed to yield lousy results.
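The result was essentially two fine-tuned models chained together. A minimal sketch, assuming the current OpenAI Python client; the ft: model IDs and prompts are placeholders:

```python
from openai import OpenAI

client = OpenAI()
SCRIPT_MODEL = "ft:gpt-3.5-turbo:my-org:script-gen:abc123"  # placeholder fine-tune ID
QUESTION_MODEL = "ft:gpt-3.5-turbo:my-org:mcq-gen:def456"   # placeholder fine-tune ID

def complete(model: str, prompt: str) -> str:
    """One chat completion against a fine-tuned model."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0.7,
    )
    return response.choices[0].message.content

# Stage 1: learning objectives -> script. Stage 2: script -> assessment questions.
script = complete(SCRIPT_MODEL, "Learning objectives:\n- Explain the causes of the American Revolution")
questions = complete(QUESTION_MODEL, f"Write five multiple-choice questions for this script:\n\n{script}")
```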
Thank you for your help. Any information on creating generative pipelines with consistent results is appreciated.