Week 2 | Lab 2 | Using the fine-tuned model

In Lab 2, in the section "Perform full fine-tuning", the lab instructions say: "Training a fully fine-tuned version of the model would take a few hours on a GPU. To save time, download a checkpoint of the fully fine-tuned model to use in the rest of this notebook. This fully fine-tuned model will also be referred to as the instruct model in this lab."

But what if I want to use the model we fine-tuned in the lab? The trainer.train() call somehow executed within minutes, and I then tried the following code to use the trained model:
```python
original_model.save_pretrained(output_dir)
tokenizer.save_pretrained(output_dir)
instruct_model = AutoModelForSeq2SeqLM.from_pretrained(output_dir, torch_dtype=torch.bfloat16)
```

But the resulting instruct_model doesn’t work the same way as the downloaded instruct_model.

I'm confused. Why did trainer.train() take only a few minutes when the instructions say it can take hours? And is the above code not the right way to use the resulting fine-tuned model? If it is the right way, why is it not working as well as the downloaded instruct_model?

You are training on a smaller dataset (huggingface_dataset_name = "knkarthick/dialogsum") and only for 1 epoch. The full fine-tuning behind the downloaded checkpoint presumably used a bigger dataset and many more epochs, I would guess, which is why the instructions mention hours on a GPU.
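For a concrete sense of why your run finishes so quickly, here is a rough sketch of the kind of setup the lab uses. The model name, subsampling factor, paths, and training arguments below are from memory and may not match your notebook exactly, so treat them as placeholders:

```python
from datasets import load_dataset
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, TrainingArguments, Trainer

# Placeholder values -- check your own notebook for the exact model name and arguments.
model_name = "google/flan-t5-base"
original_model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

dataset = load_dataset("knkarthick/dialogsum")

def tokenize_function(example):
    # Wrap each dialogue in the summarization prompt and tokenize inputs and labels.
    prompt = ["Summarize the following conversation.\n\n" + d + "\n\nSummary: " for d in example["dialogue"]]
    example["input_ids"] = tokenizer(prompt, padding="max_length", truncation=True, return_tensors="pt").input_ids
    example["labels"] = tokenizer(example["summary"], padding="max_length", truncation=True, return_tensors="pt").input_ids
    return example

tokenized_datasets = dataset.map(tokenize_function, batched=True)
tokenized_datasets = tokenized_datasets.remove_columns(["id", "topic", "dialogue", "summary"])

# Keeping only a small fraction of the examples makes the training set tiny,
# so one epoch over it runs in minutes rather than hours.
small_train = tokenized_datasets["train"].filter(lambda example, index: index % 100 == 0, with_indices=True)

training_args = TrainingArguments(
    output_dir="./dialogue-summary-training",  # placeholder path
    learning_rate=1e-5,
    num_train_epochs=1,  # a single pass over the subsampled data
    logging_steps=1,
)

trainer = Trainer(model=original_model, args=training_args, train_dataset=small_train)
trainer.train()
```

With only a tiny subset of the data and a single epoch, trainer.train() completes quickly, but the resulting weights end up barely changed from the original model's.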

Here I think you are using the original model that I mentioned above; the instruct_model is a fully fine-tuned model given to you later on in the lab. The original model, after such a short training run, cannot perform as well as the instruct_model, for the reasons above.
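If you still want to save and reload whatever your own training run produced, one pattern is to save through the Trainer, which writes out the weights it actually updated. This is only a sketch: it assumes the trainer, tokenizer, and output_dir from your snippet and from earlier in the notebook.

```python
import torch
from transformers import AutoModelForSeq2SeqLM

# `trainer`, `tokenizer`, and `output_dir` are assumed to exist from earlier in the notebook.
trainer.save_model(output_dir)         # saves trainer.model, i.e. the trained weights
tokenizer.save_pretrained(output_dir)  # save the tokenizer alongside the model

# Reload it the same way the lab loads the downloaded instruct checkpoint.
my_model = AutoModelForSeq2SeqLM.from_pretrained(output_dir, torch_dtype=torch.bfloat16)
```

Even then, don't expect it to match the downloaded instruct_model: after one short pass over a small dataset the weights are still very close to the original model's, which is exactly the gap the provided checkpoint is meant to cover.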