Code to generate the S3 checkpoints

In the lab for week 2, one of the cells says: “Training a fully fine-tuned version of the model would take a few hours on a GPU. To save time, download a checkpoint…” How exactly was this checkpoint generated? What were the training_args? What was the command used to save the checkpoint? And how long did it take, on what kind of AWS instance?
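
For concreteness, here is roughly what I would expect the training code to look like. This is a minimal sketch using the Hugging Face Trainer; the model and dataset names match the lab, but every hyperparameter below is a placeholder I made up, which is exactly what I'm hoping you can fill in:

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model_name = "google/flan-t5-base"  # the model used in the lab
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# The lab fine-tunes on dialogue summarization: tokenize prompt -> summary pairs.
dataset = load_dataset("knkarthick/dialogsum")

def tokenize_fn(example):
    prompt = f"Summarize the following conversation.\n\n{example['dialogue']}\n\nSummary: "
    inputs = tokenizer(prompt, max_length=512, truncation=True, padding="max_length")
    labels = tokenizer(example["summary"], max_length=128, truncation=True, padding="max_length")
    inputs["labels"] = labels["input_ids"]
    return inputs

tokenized = dataset.map(tokenize_fn, remove_columns=dataset["train"].column_names)

# All hyperparameters below are hypothetical placeholders, not the actual ones.
training_args = TrainingArguments(
    output_dir="./full-fine-tune-checkpoint",
    learning_rate=1e-5,              # placeholder
    num_train_epochs=3,              # placeholder
    per_device_train_batch_size=8,   # placeholder
    weight_decay=0.01,               # placeholder
    logging_steps=100,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
)
trainer.train()

# Save the checkpoint locally, then presumably upload the directory to S3,
# e.g. `aws s3 cp --recursive ./full-fine-tune-checkpoint s3://<bucket>/<prefix>/`
trainer.save_model("./full-fine-tune-checkpoint")
tokenizer.save_pretrained("./full-fine-tune-checkpoint")
```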

Similarly, in the PEFT section of the lab, one of the cells says “To load a fully trained PEFT model, read a checkpoint of a PEFT model from S3.” Same questions here: How was this checkpoint generated exactly? What were the training_args? What was the command to save the checkpoint? How long did it take on what kind of AWS instance?
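
For the PEFT part, I'd guess something along these lines. Again a sketch, not the lab's actual code: the LoRA rank, alpha, dropout, and target modules here are assumptions:

```python
from peft import LoraConfig, PeftModel, TaskType, get_peft_model
from transformers import AutoModelForSeq2SeqLM

base_model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")

# LoRA settings are assumptions; the actual values are what I'm asking about.
lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=32,                        # placeholder rank
    lora_alpha=32,               # placeholder
    lora_dropout=0.05,           # placeholder
    target_modules=["q", "v"],   # T5 attention projections; an assumption
)
peft_model = get_peft_model(base_model, lora_config)

# ...train peft_model with a Trainer exactly as in the sketch above...

# Saving writes only the small adapter weights; this directory would be
# what gets uploaded to S3.
peft_model.save_pretrained("./peft-checkpoint")

# Loading a trained PEFT checkpoint (what the lab cell does after the
# S3 download): attach the adapter to a freshly loaded base model.
base_model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")
loaded_model = PeftModel.from_pretrained(
    base_model, "./peft-checkpoint", is_trainable=False
)
```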

Thanks!

Hi Alon, and welcome to the community! A learner asked a similar question earlier. Please keep an eye on this topic; I’ll post an update here if I can get more information. Thanks!

It has been a long time since this question was asked. Could you please supply this code? Without it, the course is far less valuable than it could be.

Hi Andrew. Sorry for the late reply. I asked about this a while back, but unfortunately the training info was lost. Our partner does remember that it ran for 5-6 hours on an ml.g5.4xlarge instance to reach that performance. I’ll let them know that we still get questions about it; perhaps they can find the time to replicate the results and share the settings.