Generative AI with Large Language Models, Week 2: Multitask vs. single-task fine-tuning

In Week 2 of Generative AI with Large Language Models there are discussions of single-task and multitask fine-tuning. For single-task fine-tuning you need 500-1,000 examples; for multitask fine-tuning you need 50,000 to 100,000 examples.

I'm thinking it is hard to synthesize 50,000 to 100,000 examples to fine-tune a model on. I did a full fine-tune of an OpenAI model a while back where I had the model generate video scripts from learning objectives. The scripts would:

  1. Lay out the learning objectives (what was going to be covered)
  2. Have a narrative about the topic at hand (5th grade social studies)
  3. Summarize what was just covered

I had 750 synthesized scripts (I generated 1,000 but manually threw out ~250), and the fine-tune sort of worked on the third try. There is no way I could manually evaluate 50,000-100,000 scripts.
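One way I could imagine scaling past manual review is a cheap structural filter that rejects obviously malformed scripts before training. This is just a sketch under my own assumptions: the section markers and length bounds below are hypothetical, and you would swap in whatever headings your generation prompt actually produces.

```python
def passes_quality_filter(script: str,
                          min_words: int = 200,
                          max_words: int = 1500) -> bool:
    """Cheap structural checks for a synthesized script: it should
    mention all required sections and fall within a sane length range.
    The section names and bounds are assumptions, not course material."""
    required_sections = ["learning objectives", "summary"]
    word_count = len(script.split())
    if not (min_words <= word_count <= max_words):
        return False
    lowered = script.lower()
    return all(section in lowered for section in required_sections)

# Usage: keep only scripts that pass, instead of eyeballing each one.
scripts = [
    "Learning Objectives: A, B, C. A narrative follows. Summary: we covered A, B, C.",
    "too short",
]
kept = [s for s in scripts if passes_quality_filter(s, min_words=5, max_words=50)]
```

A filter like this obviously won't catch subtle quality problems, but it cuts the pile a human (or a second-pass model grader) has to look at.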

For people who have done this before: would it make sense to train a single-task PEFT adapter for each of the tasks (learning objectives, script, summary) and daisy-chain them, versus doing one giant multitask fine-tune?
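To sketch what I mean by daisy-chaining: each stage would be the same base model with a different single-task adapter active (with Hugging Face `peft` that would be something like `model.set_adapter(name)` between calls). The stage functions below are stubs of my own, just to show the pipeline shape, not real model calls.

```python
# Hypothetical sketch of the daisy-chained pipeline. Each stage stands in
# for the same base model with a different single-task PEFT/LoRA adapter
# active; the bodies are stubs so the data flow is clear.

def generate_objectives(topic: str) -> str:
    # Stage 1: base model + "objectives" adapter (stubbed).
    return f"Objectives for {topic}: A, B, C"

def generate_script(objectives: str) -> str:
    # Stage 2: base model + "script" adapter (stubbed).
    return f"Narrative covering: {objectives}"

def generate_summary(script: str) -> str:
    # Stage 3: base model + "summary" adapter (stubbed).
    return f"Summary of: {script}"

def daisy_chain(topic: str) -> str:
    # Each stage's output feeds the next stage's input.
    objectives = generate_objectives(topic)
    script = generate_script(objectives)
    summary = generate_summary(script)
    return "\n\n".join([objectives, script, summary])

print(daisy_chain("5th grade social studies"))
```

The appeal for me is that each adapter only needs the 500-1,000 examples of the single-task regime, and each stage can be evaluated on its own.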

I also did a fine-tune where I had the API generate multiple-choice assessment questions based on the scripts. I did this as a separate fine-tune because writing one prompt that combined the two tasks seemed to yield lousy results.

Thank you for your help. Any information on creating generative pipelines with consistent results is appreciated.

The Week 3 LoRA video seems to advocate training multiple PEFT adapters, one per task.