In Lab 2, in the section "Perform full fine-tuning", the lab instructions say: "Training a fully fine-tuned version of the model would take a few hours on a GPU. To save time, download a checkpoint of the fully fine-tuned model to use in the rest of this notebook. This fully fine-tuned model will also be referred to as the instruct model in this lab."
But what if I want to use the model we fine-tuned in the lab ourselves? The trainer.train() call somehow finished within minutes, and I then tried the following code to use the trained model:
import torch
from transformers import AutoModelForSeq2SeqLM

original_model.save_pretrained(output_dir)
tokenizer.save_pretrained(output_dir)

instruct_model = AutoModelForSeq2SeqLM.from_pretrained(output_dir, torch_dtype=torch.bfloat16)
But the resulting instruct_model doesn’t work the same way as the downloaded instruct_model.
I'm confused. Why did trainer.train() take only a few minutes when the instructions say it can take hours? And is the code above not the right way to use the resulting fine-tuned model? If it is the right way, why does it not work as well as the downloaded instruct_model?