Hi, I tried to train a fully fine-tuned version of the model (the one named “instruct_model” in the lab). I set num_train_epochs=5 and max_steps=7787, but my model performs much worse than the one downloaded from AWS and is only slightly better than the original model. Does anyone know what hyperparameters the course instructor used to train their model? Do I need to change or add other hyperparameters in training_args? I can't figure out why the model I fine-tuned with the same code performs so poorly.
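For context, this is roughly what my training_args looks like (a minimal sketch; everything besides num_train_epochs and max_steps is a placeholder, not necessarily the notebook's values). One thing I noticed in the Hugging Face Trainer docs: when max_steps is set to a positive number, it overrides num_train_epochs, so I'm not even sure both of my settings are taking effect:

```python
from transformers import TrainingArguments

# Minimal sketch of my setup. Everything except num_train_epochs and
# max_steps is a placeholder, not necessarily the lab notebook's value.
training_args = TrainingArguments(
    output_dir="./full-fine-tune",  # placeholder path
    learning_rate=1e-5,             # placeholder; the instructor's value may differ
    num_train_epochs=5,
    max_steps=7787,  # NOTE: when max_steps > 0, it overrides num_train_epochs,
                     # so training stops at 7787 steps regardless of epochs
    per_device_train_batch_size=8,  # placeholder
    weight_decay=0.01,              # placeholder
    logging_steps=100,              # placeholder
)
```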
In the lab (as far as I remember) they don't fine-tune with the entire dataset, just a small part of it. Also, for the model to be fine-tuned properly (all of its weights), it needs many more epochs than just 5.
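If I remember right, the notebook keeps only a small slice of the data with a filter along these lines (the dataset id and the modulus are from memory, so treat them as illustrative, not the notebook's exact code):

```python
from datasets import load_dataset

# Assuming the dialogue-summarization dataset from the lab; the dataset id
# below is from memory and may not match the notebook exactly.
dataset = load_dataset("knkarthick/dialogsum")

# Keep roughly 1% of the examples (every 100th row) to speed up the lab run.
small_dataset = dataset.filter(
    lambda example, index: index % 100 == 0,
    with_indices=True,
)

print(dataset["train"].num_rows, "->", small_dataset["train"].num_rows)
```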
The model they let us download with the “!aws …” command was fine-tuned on the entire dataset. They said it took several hours to train. Do you know what a reasonable number of epochs would be? I want to reproduce their model's performance.
I don't know, and their several hours might be days for you, because they might have a lot more computing power!
Can we have the parameters for the full PEFT run? I think that would be an important learning experience…
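Seconding this. In the meantime, for anyone who wants to experiment, a typical LoRA setup with the peft library looks something like the sketch below. The r/alpha/dropout values are common defaults I've seen, not the instructor's actual run, and the base model is my assumption of what the lab uses:

```python
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForSeq2SeqLM

# Assuming the FLAN-T5 base model from the lab.
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")

# Illustrative LoRA settings only -- not the instructor's actual values.
lora_config = LoraConfig(
    r=32,                       # rank of the low-rank update matrices
    lora_alpha=32,              # scaling factor applied to the LoRA update
    target_modules=["q", "v"],  # query/value projections in the T5 blocks
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.SEQ_2_SEQ_LM,
)

peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()  # shows how few weights LoRA trains
```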