Code to generate the S3 checkpoints

In the lab for week 2, one of the cells says: “Training a fully fine-tuned version of the model would take a few hours on a GPU. To save time, download a checkpoint…” How exactly was this checkpoint generated? What were the training_args? What was the command used to save the checkpoint? And how long did it take, on what kind of AWS instance?
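
For concreteness, here is roughly what I would expect the training code to look like. This is a minimal sketch using the Hugging Face Trainer; the model and dataset names match the lab, but every hyperparameter below is a placeholder I made up, which is exactly what I'm hoping you can fill in:

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model_name = "google/flan-t5-base"  # the model used in the lab
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# The lab fine-tunes on dialogue summarization: tokenize prompt -> summary pairs.
dataset = load_dataset("knkarthick/dialogsum")

def tokenize_fn(example):
    prompt = f"Summarize the following conversation.\n\n{example['dialogue']}\n\nSummary: "
    inputs = tokenizer(prompt, max_length=512, truncation=True, padding="max_length")
    labels = tokenizer(example["summary"], max_length=128, truncation=True, padding="max_length")
    inputs["labels"] = labels["input_ids"]
    return inputs

tokenized = dataset.map(tokenize_fn, remove_columns=dataset["train"].column_names)

# All hyperparameters below are hypothetical placeholders, not the actual ones.
training_args = TrainingArguments(
    output_dir="./full-fine-tune-checkpoint",
    learning_rate=1e-5,              # placeholder
    num_train_epochs=3,              # placeholder
    per_device_train_batch_size=8,   # placeholder
    weight_decay=0.01,               # placeholder
    logging_steps=100,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
)
trainer.train()

# Save the checkpoint locally, then presumably upload the directory to S3,
# e.g. `aws s3 cp --recursive ./full-fine-tune-checkpoint s3://<bucket>/<prefix>/`
trainer.save_model("./full-fine-tune-checkpoint")
tokenizer.save_pretrained("./full-fine-tune-checkpoint")
```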

Similarly, in the PEFT section of the lab, one of the cells says “To load a fully trained PEFT model, read a checkpoint of a PEFT model from S3.” Same questions here: How was this checkpoint generated exactly? What were the training_args? What was the command to save the checkpoint? How long did it take on what kind of AWS instance?
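
For the PEFT part, I'd guess something along these lines. Again a sketch, not the lab's actual code: the LoRA rank, alpha, dropout, and target modules here are assumptions:

```python
from peft import LoraConfig, PeftModel, TaskType, get_peft_model
from transformers import AutoModelForSeq2SeqLM

base_model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")

# LoRA settings are assumptions; the actual values are what I'm asking about.
lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=32,                        # placeholder rank
    lora_alpha=32,               # placeholder
    lora_dropout=0.05,           # placeholder
    target_modules=["q", "v"],   # T5 attention projections; an assumption
)
peft_model = get_peft_model(base_model, lora_config)

# ...train peft_model with a Trainer exactly as in the sketch above...

# Saving writes only the small adapter weights; this directory would be
# what gets uploaded to S3.
peft_model.save_pretrained("./peft-checkpoint")

# Loading a trained PEFT checkpoint (what the lab cell does after the
# S3 download): attach the adapter to a freshly loaded base model.
base_model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")
loaded_model = PeftModel.from_pretrained(
    base_model, "./peft-checkpoint", is_trainable=False
)
```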

Thanks!

Hi Alon, and welcome to the community! A learner asked a similar question earlier. Please keep an eye on this topic; I’ll post an update here if I can get more information. Thanks!

It has been a long time since this question was asked. Could you please supply this code? Without it, the course is far less valuable than it could be.

Hi Andrew. Sorry for the late reply. I asked about this a while back, but unfortunately the training info was lost. Our partner does remember that it ran for 5-6 hours on an ml.g5.4xlarge instance to reach that performance. I’ll let them know that we still get questions about it; perhaps they can find the time to replicate the results and share the settings.