Module 3: question about the scale of error analysis experiments

Thank you for an excellent course! Very much appreciated.

Regarding the scale of error analysis experiments (screenshot attached):

What does ‘small scale’ mean exactly?

I understand that we create a smaller training dataset with samples that address the proposed fixes.

But how is the fine-tuning experiment itself run?
(a) Do a full fine-tuning run, starting from the same base LLM as before, but this time with a dataset that includes the original fine-tuning dataset PLUS the new smaller dataset with the fixes?
(b) Or do an iterative fine-tuning run, starting from the previously fine-tuned model and using only the new smaller dataset with the fixes as the training set?

With (b), some catastrophic forgetting can probably happen,
but (a) would take longer and be more expensive.

Thank you in advance!
Zen L

It is recommended to include some of the original instruction/alignment data along with the new fixes to avoid catastrophic forgetting. Even better is to use a curated subset—look up “coreset selection” if you are interested. You will also want a regression benchmark for your application to quantify forgetting. Other helpful tactics include using a low learning rate and PEFT.
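
To make the data-mixing idea concrete, here is a minimal sketch of building a training set from all of the fix samples plus a small replay subset of the original data. The function name `build_mixed_dataset`, the `replay_fraction` parameter, and the toy records are all hypothetical, and the uniform random sample is only a stand-in for a real coreset selection method.

```python
import random


def build_mixed_dataset(original, fixes, replay_fraction=0.1, seed=0):
    """Combine every fix sample with a random subset of the original data.

    NOTE: uniform random sampling is an assumption made for illustration;
    a coreset selection method would choose the replay subset more carefully.
    """
    rng = random.Random(seed)
    # Keep at least one original sample when any exist, never more than available.
    n_replay = min(len(original), max(1, round(len(original) * replay_fraction)))
    replay = rng.sample(original, n_replay)
    mixed = list(fixes) + replay
    rng.shuffle(mixed)  # interleave fixes and replay samples
    return mixed


# Toy placeholder records, not real training data.
original = [{"prompt": f"orig-{i}", "response": "placeholder"} for i in range(1000)]
fixes = [{"prompt": f"fix-{i}", "response": "placeholder"} for i in range(50)]

mixed = build_mixed_dataset(original, fixes, replay_fraction=0.1)
print(len(mixed))  # 150: 50 fixes + 100 replayed originals
```

The replay fraction is a knob to tune against your regression benchmark: a larger fraction should reduce forgetting but moves the cost closer to option (a).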

Thank you! I looked at ‘coreset selection’ and it looks helpful; I will look into it further.
I do have evals that will let me know if scores degrade.
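
For anyone following along, a regression check over eval scores could look roughly like the sketch below. The function `find_regressions`, the eval names, and the scores are all made up for illustration; the tolerance is an assumed threshold that would need tuning to the noise level of the actual evals.

```python
def find_regressions(baseline, candidate, tolerance=0.01):
    """Return {eval_name: (old, new)} for evals whose score dropped by more than tolerance."""
    regressions = {}
    for name, old in baseline.items():
        new = candidate.get(name)
        if new is None:
            continue  # eval was not rerun for the candidate model; skip it
        if old - new > tolerance:
            regressions[name] = (old, new)
    return regressions


# Illustrative, invented scores: before and after fine-tuning on the fixes.
baseline = {"general_qa": 0.82, "summarization": 0.75, "target_errors": 0.40}
candidate = {"general_qa": 0.81, "summarization": 0.69, "target_errors": 0.63}

print(find_regressions(baseline, candidate, tolerance=0.02))
# {'summarization': (0.75, 0.69)}
```

In this toy example the targeted errors improve while summarization drops beyond the tolerance, which is the kind of forgetting signal that would suggest adding more original data back into the mix.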