Generative AI with Large Language Models: fine-tuning checkpoint

In Week 2, part 2.2 ("Fine-tune the model with the preprocessed dataset"), we can read: "Training a fully fine-tuned version of the model would take a few hours on a GPU. To save time, download a checkpoint of the fully fine-tuned model to use in the rest of this notebook. This fully fine-tuned model will also be referred to as the instruct model in this lab."

But in the cell above we run trainer.train() on a small dataset, applied to the whole model (the so-called "original model"). So do I understand correctly that we are updating all the weights? Why would that not be a fully fine-tuned model? Why doesn't trainer.train() take hours? (Further down I can see that we download a model from a checkpoint, so I guess it was saved after some number of iterations but not trained to the end.) Does that mean the training job keeps running in the background as we move forward in the notebook?

The training you are performing there is on a very small dataset to give you an understanding of the training process.
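For reference, here is a minimal sketch of what that quick demo training looks like. The model name, dataset, prompt format, and subsampling factor are assumptions based on the lab, not its verbatim code. The point is that gradients flow to every parameter (full fine-tuning), but only a handful of examples and a single optimizer step are used, which is why the cell finishes in seconds rather than hours:

```python
# Minimal sketch, NOT the lab's exact code: full fine-tuning on a tiny
# subsample. Model name, dataset, prompt format, and subsampling factor
# are assumptions.
from datasets import load_dataset
from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "google/flan-t5-base"                 # assumed base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

dataset = load_dataset("knkarthick/dialogsum")     # assumed dataset

def tokenize(batch):
    # Wrap each dialogue in a summarization prompt, tokenize, and use
    # the tokenized summary as the label.
    prompts = ["Summarize the following conversation.\n\n" + d +
               "\n\nSummary: " for d in batch["dialogue"]]
    inputs = tokenizer(prompts, truncation=True, padding="max_length")
    inputs["labels"] = tokenizer(batch["summary"], truncation=True,
                                 padding="max_length")["input_ids"]
    return inputs

# Keep only every 100th training example so the demo finishes quickly.
small_train = (dataset["train"]
               .filter(lambda ex, i: i % 100 == 0, with_indices=True)
               .map(tokenize, batched=True,
                    remove_columns=["id", "topic", "dialogue", "summary"]))

# max_steps=1 means a single optimizer step: every weight in the model
# receives a gradient update (full fine-tuning), but training stops
# almost immediately.
training_args = TrainingArguments(output_dir="./demo-checkpoint",
                                  learning_rate=1e-5,
                                  num_train_epochs=1,
                                  max_steps=1)
Trainer(model=model, args=training_args, train_dataset=small_train).train()
```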

The checkpoint comes from training on a large dataset over many iterations, so it's better! You want to see the difference between training on a small dataset and a proper training run with a big dataset and many iterations!

No, it's done when the cell finishes running!
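In other words, nothing keeps running in the background: the long training run was done ahead of time, and you simply load its result. A sketch of loading the downloaded checkpoint (treat the local path as an assumption based on the lab):

```python
# Sketch: load the fully fine-tuned "instruct model" used for the rest
# of the notebook. The local path is an assumption, not verbatim lab code.
import torch
from transformers import AutoModelForSeq2SeqLM

instruct_model = AutoModelForSeq2SeqLM.from_pretrained(
    "./flan-dialogue-summary-checkpoint",  # assumed download location
    torch_dtype=torch.bfloat16)            # reduced precision to save memory
```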

Thanks. Then I understand that trainer.train() runs quickly because the dataset is very small, even though all the weights are updated (and there is a huge number of weights)? Right?

That's correct!
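One way to convince yourself that every weight participates is to count the parameters that receive gradients. This is a generic PyTorch sketch, not necessarily the lab's helper:

```python
# Generic PyTorch sketch: count how many parameters receive gradients.
# For full fine-tuning, trainable == total (all weights are updated).
def print_trainable_parameters(model):
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    print(f"trainable params: {trainable:,} / {total:,} "
          f"({100 * trainable / total:.0f}%)")

print_trainable_parameters(model)  # model from the training sketch above
```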
