Epochs and steps for Lab 2 (Fine-tune a generative AI model)

I’m trying to understand the settings used to train the fully trained model (2.2 - Fine-Tune the Model with the Preprocessed Dataset) that is provided later in the lab (the one we use for inference after the short training demonstration). I have a few specific questions:

num_train_epochs: How many epochs were used for the fully trained model?
max_steps: What was the max_steps value used?
per_device_train_batch_size: What batch size was used during training?
learning_rate: Was the same learning rate used as in the short demonstration version?
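
For reference, these are the arguments passed to `transformers.TrainingArguments` in the demonstration cell. A rough sketch of the kind of call I mean (the values here are placeholders, not the lab’s actual settings):

```python
from transformers import TrainingArguments

# Placeholder values -- not the lab's actual settings, just the shape of the call.
training_args = TrainingArguments(
    output_dir="./demo-checkpoint",
    num_train_epochs=1,             # the short demo runs at most one epoch
    max_steps=1,                    # a positive max_steps overrides num_train_epochs
    per_device_train_batch_size=8,
    learning_rate=1e-5,
)
```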

The fully trained model provided later in the lab (used for inference) was trained with the following settings:

num_train_epochs: The fully trained model was likely trained for more epochs than the short demonstration, but the exact number may vary. If it is not explicitly stated in the lab, it is typically determined based on convergence.
max_steps: The training was likely run until convergence or for a fixed number of steps. In many cases, max_steps is set to -1 so that training runs for the specified number of epochs (see the sketch after this list).
per_device_train_batch_size: The batch size depends on the hardware used, but it is generally larger than in the short demonstration for efficiency.
learning_rate: It may or may not be the same as in the short demo. A lower learning rate is often used for fine-tuning, but if nothing is specified, it is safest to assume the same default value.
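
To make the interaction between epochs and steps concrete, here is a minimal sketch contrasting the two setups. The values are illustrative assumptions, not the actual settings of the provided model:

```python
from transformers import TrainingArguments

# Illustrative values only -- the lab does not publish the real settings.

# Short demonstration: a positive max_steps caps training almost
# immediately, taking precedence over num_train_epochs.
demo_args = TrainingArguments(
    output_dir="./demo",
    num_train_epochs=1,
    max_steps=1,
    per_device_train_batch_size=8,
    learning_rate=1e-5,
)

# Fully trained variant: max_steps=-1 (the default) disables the step cap,
# so training runs for the full num_train_epochs.
full_args = TrainingArguments(
    output_dir="./full",
    num_train_epochs=20,              # hypothetical; chosen based on convergence
    max_steps=-1,
    per_device_train_batch_size=16,   # often larger when the hardware allows it
    learning_rate=5e-6,               # a lower rate is common for longer runs
)
```

This is why the demo finishes in seconds: in transformers, a positive max_steps always wins over num_train_epochs.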

Try running the model with different parameters, review the results, and analyze how accurate they are. After that, you can experiment further with the settings.
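
For example, a small learning-rate sweep could look like the sketch below. Here `tokenized_train` and `tokenized_eval` are placeholders for the preprocessed dataset splits from earlier in the lab, and the checkpoint name is an assumption, not necessarily the lab’s model:

```python
from transformers import AutoModelForSeq2SeqLM, Trainer, TrainingArguments

model_name = "google/flan-t5-small"  # stand-in checkpoint, not necessarily the lab's

for lr in (1e-5, 5e-5, 1e-4):
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)  # fresh weights per run
    args = TrainingArguments(
        output_dir=f"./sweep-lr-{lr}",
        num_train_epochs=1,
        per_device_train_batch_size=8,
        learning_rate=lr,
    )
    trainer = Trainer(
        model=model,
        args=args,
        train_dataset=tokenized_train,  # placeholder: preprocessed train split
        eval_dataset=tokenized_eval,    # placeholder: held-out eval split
    )
    trainer.train()
    print(f"lr={lr}:", trainer.evaluate())  # compare eval_loss across runs
```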

Thanks for the answer, Igor! I understand the training likely went to convergence. However, I’m still very interested in knowing the approximate number of epochs used for the fully trained model provided in the lab. Even a rough range (e.g., “50-60 epochs”) would be very helpful for students to understand the scale difference compared to the single-epoch demonstration. It would be a valuable addition to the lab’s educational content. Don’t you think?

Yes, I think it might make sense to give students a full understanding of how the number of epochs and other hyperparameters affect the model and how it was trained. The question is what we want to achieve with this: understanding how to work with models and various parameters is still a practical skill, and it takes a lot of time to develop.
