Epochs and steps for Lab 2 (Fine-tune a generative AI model)

I’m trying to understand the settings used to train the fully trained model (2.2 - Fine-Tune the Model with the Preprocessed Dataset) that is provided later in the lab (the one we use for inference after the short training demonstration). I have a few specific questions:

num_train_epochs: How many epochs were used for the fully trained model?
max_steps: What was the max_steps value used?
per_device_train_batch_size: What batch size was used during training?
learning_rate: Was the same learning rate used as in the short demonstration version?
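
For reference, these are the arguments passed to `transformers.TrainingArguments` in the demonstration cell. A rough sketch of the kind of call I mean (the values here are placeholders, not the lab’s actual settings):

```python
from transformers import TrainingArguments

# Placeholder values -- not the lab's actual settings, just the shape of the call.
training_args = TrainingArguments(
    output_dir="./demo-checkpoint",
    num_train_epochs=1,             # the short demo runs at most one epoch
    max_steps=1,                    # a positive max_steps overrides num_train_epochs
    per_device_train_batch_size=8,
    learning_rate=1e-5,
)
```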

The fully trained model provided later in the lab (used for inference) was trained with the following settings:

num_train_epochs: The fully trained model was likely trained for more epochs than the short demonstration, but the exact number may vary. If it is not explicitly stated in the lab, it is typically determined based on convergence.
max_steps: The training was likely run until convergence or for a fixed number of steps. In many cases, max_steps is set to -1 so that training runs for the specified number of epochs (see the sketch after this list).
per_device_train_batch_size: The batch size depends on the hardware used, but it is generally larger than in the short demonstration for efficiency.
learning_rate: It may or may not be the same as in the short demo. A lower learning rate is often used for fine-tuning, but if nothing is specified, it is safest to assume the same default value.
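
To make the interaction between epochs and steps concrete, here is a minimal sketch contrasting the two setups. The values are illustrative assumptions, not the actual settings of the provided model:

```python
from transformers import TrainingArguments

# Illustrative values only -- the lab does not publish the real settings.

# Short demonstration: a positive max_steps caps training almost
# immediately, taking precedence over num_train_epochs.
demo_args = TrainingArguments(
    output_dir="./demo",
    num_train_epochs=1,
    max_steps=1,
    per_device_train_batch_size=8,
    learning_rate=1e-5,
)

# Fully trained variant: max_steps=-1 (the default) disables the step cap,
# so training runs for the full num_train_epochs.
full_args = TrainingArguments(
    output_dir="./full",
    num_train_epochs=20,              # hypothetical; chosen based on convergence
    max_steps=-1,
    per_device_train_batch_size=16,   # often larger when the hardware allows it
    learning_rate=5e-6,               # a lower rate is common for longer runs
)
```

This is why the demo finishes in seconds: in transformers, a positive max_steps always wins over num_train_epochs.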

Try running the model with different parameters, review the results, and analyze how accurate they are. After that, you can experiment further with the settings.
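
For example, a small learning-rate sweep could look like the sketch below. Here `tokenized_train` and `tokenized_eval` are placeholders for the preprocessed dataset splits from earlier in the lab, and the checkpoint name is an assumption, not necessarily the lab’s model:

```python
from transformers import AutoModelForSeq2SeqLM, Trainer, TrainingArguments

model_name = "google/flan-t5-small"  # stand-in checkpoint, not necessarily the lab's

for lr in (1e-5, 5e-5, 1e-4):
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)  # fresh weights per run
    args = TrainingArguments(
        output_dir=f"./sweep-lr-{lr}",
        num_train_epochs=1,
        per_device_train_batch_size=8,
        learning_rate=lr,
    )
    trainer = Trainer(
        model=model,
        args=args,
        train_dataset=tokenized_train,  # placeholder: preprocessed train split
        eval_dataset=tokenized_eval,    # placeholder: held-out eval split
    )
    trainer.train()
    print(f"lr={lr}:", trainer.evaluate())  # compare eval_loss across runs
```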

Thanks for the answer, Igor! I understand the training likely went to convergence. However, I’m still very interested in knowing the approximate number of epochs used for the fully trained model provided in the lab. Even a rough range (e.g., “50-60 epochs”) would be very helpful for students to understand the scale difference compared to the single-epoch demonstration. It would be a valuable addition to the lab’s educational content. Don’t you think?

Yes, I think it might make sense to give students a full understanding of how the number of epochs and other hyperparameters affect the model and how it was trained. The question is what we want to achieve with this: understanding how to work with models and various parameters is still a practical skill, and it takes a lot of time to develop.
