Hi, I have a few questions regarding the pretraining and workings of the T5 transformer.
- In the masked language modeling approach to pre-training, what are the inputs to the encoder and the decoder of the model, and how is the loss calculated?
Younes said that the input to the model is the original text with a few spans masked out by special sentinel tokens, and that the target is the masked-out spans delimited by those same sentinel tokens, but I did not understand the overall flow of T5's pre-training. Can someone elaborate a bit more so I can build a better intuition for it? I've sketched my current understanding below.
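From what I can tell from the Hugging Face docs, the encoder receives the corrupted text and the decoder is trained to produce the dropped spans, with the loss being standard cross-entropy over the target tokens. Here is a minimal sketch of what I mean (the `<extra_id_*>` sentinels are the ones the T5 tokenizer provides; please correct me if this is not how pre-training actually works):

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Encoder input: the original text with two spans replaced by sentinels.
# Original: "Thank you for inviting me to your party last week."
input_ids = tokenizer(
    "Thank you <extra_id_0> me to your party <extra_id_1> week.",
    return_tensors="pt",
).input_ids

# Target: the dropped-out spans, each prefixed by its sentinel and
# closed by a final sentinel.
labels = tokenizer(
    "<extra_id_0> for inviting <extra_id_1> last <extra_id_2>",
    return_tensors="pt",
).input_ids

# The decoder input is the labels shifted right (handled internally);
# the returned loss is token-level cross-entropy over the target sequence.
loss = model(input_ids=input_ids, labels=labels).loss
print(loss.item())
```

Is this the correct picture of the flow, or am I missing something about how the spans are sampled?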
- How does the T5 model handle different types of tasks? In particular, how was pre-training on supervised tasks like regression done in the text-to-text format? (My guess is sketched below.)
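For the regression part, my guess from the T5 paper is that a score (e.g., an STS-B similarity in [1, 5]) is rounded to the nearest increment of 0.2 and rendered as a literal string, so the model just predicts that string as text. The helper below is hypothetical, only to illustrate my guess:

```python
def stsb_target_text(score: float) -> str:
    """Render an STS-B similarity score as a text target,
    rounded to the nearest 0.2 (as I understand the T5 paper)."""
    rounded = round(score * 5) / 5
    return f"{rounded:.1f}"

# The input would carry a task prefix, e.g.
#   "stsb sentence1: ... sentence2: ..."  ->  target "2.6"
print(stsb_target_text(2.57))  # -> "2.6"
```

Is this really all there is to it, i.e. regression reduced to predicting a small vocabulary of number strings?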