Clarification about Course 4 Week 3 HW

Hello,

I’m confused about the homework even though I passed it. In the first part, C4_W3_Assignment, we built an Encoder from scratch, which is supposedly the BERT model. In the second part, C4_W3_Assignment_Ungraded_BERT_loss (done in Colab), we took a pretrained model and used it for prediction. I assume the pretrained model was trained on the C4 dataset. This part is fine.

In the third part, C4_W3_Assignment_Ungraded_T5, the model is now an Encoder-Decoder, correct? I suppose the model is the Encoder we built in the first part and the Decoder from the week 2 homework put together. Is this correct? However, this third part of the homework says the model was fine-tuned on the SQuAD dataset. So was the T5 model originally trained on the C4 dataset and then fine-tuned on SQuAD? What were the fine-tuning steps? Or does fine-tuning mean T5 was originally trained on C4 and later trained on SQuAD for a different task?

Thanks,
John

Hi John,

As I read it, the T5 model used was trained on the C4 dataset. Then, in section 3.1, code is presented to show how the SQuAD dataset can be processed so that fine-tuning can be performed. The actual fine-tuning is not discussed. Then the fine-tuned model is loaded in (section 3.2) and used on an example input.
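To make the preprocessing idea concrete, here is a minimal sketch (not the lab's actual code; the record format and field names are my assumption of a typical SQuAD-style example) of how a question/context/answer record is cast into T5's text-to-text format, where every task becomes an input string mapped to a target string:

```python
def squad_to_text_pair(example):
    """Turn one SQuAD-style record into an (input, target) string pair.

    `example` is assumed to look like:
    {"question": ..., "context": ..., "answers": {"text": [...]}}
    """
    # T5 prefixes the fields so the model can tell them apart.
    inp = f"question: {example['question']} context: {example['context']}"
    # The target is simply the answer text the model should generate.
    target = example["answers"]["text"][0]
    return inp, target

# Hypothetical toy record, just to show the resulting strings:
record = {
    "question": "What dataset was T5 pretrained on?",
    "context": "T5 was pretrained on the C4 dataset.",
    "answers": {"text": ["the C4 dataset"]},
}

inp, target = squad_to_text_pair(record)
print(inp)     # question: ... context: ...
print(target)  # the C4 dataset
```

Once every record is a string pair like this, fine-tuning is just ordinary sequence-to-sequence training on those pairs.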

If you want to see how fine-tuning of T5 can be done, you can have a look here and here. It seems to lie a bit outside the scope of the course, which may be why it is not presented in the lab.
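Since the actual fine-tuning is out of scope, here is a toy numerical sketch of the general idea only (a one-parameter linear model I made up, nothing to do with T5's real code): fine-tuning continues gradient descent from already-trained weights on a new dataset, rather than training from scratch.

```python
def train(w, data, lr=0.1, steps=200):
    """Fit y = w*x by plain gradient descent on squared error."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

# "Pretraining": learn y = 2x on one dataset, starting from w = 0.
pretrain_data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w_pretrained = train(0.0, pretrain_data)

# "Fine-tuning": continue from w_pretrained on a related task, y = 2.5x.
finetune_data = [(1.0, 2.5), (2.0, 5.0)]
w_finetuned = train(w_pretrained, finetune_data)

print(round(w_pretrained, 2))  # close to 2.0
print(round(w_finetuned, 2))   # close to 2.5
```

In the same spirit, T5 is first trained on C4 with a self-supervised objective, and then the same weights are updated further on the SQuAD string pairs, so your second reading in the question is the right one.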

Thanks for your reply, reinoudosch, and for the links. The week 3 videos were really confusing to me: a lot of concepts were packed together and not clearly explained. However, I found the references very useful and helpful in clarifying things.