Just finished the assignment. I was stuck on Exercise 8 (Transformer) for a while. From one of the forum topics I found out that, in the call to the decoder, we need to pass in the output sentence, not the input sentence.
This is very unintuitive to me - shouldn't the decoder be the one generating the output sentence? It's like giving the student the answer at the start of the exam. What am I missing? Is this code only used for the training phase and not for the eval phase?
Also, it would be good if the course spent some time on how to choose an NN architecture based on the problem.
I suggest you watch this video up to around 6:00. It shows that, when translating, one word is predicted at a time, and all the words predicted so far are fed back into the decoder as input.
The key point is that, at prediction time, we need not only the French sentence as input but also the English words translated so far. So it is not surprising that, at training time, we need both the French sentence and the correct English translation as inputs.
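To make the prediction-time loop concrete, here is a minimal sketch of greedy decoding. It is not the assignment's code: `next_token` is a toy stand-in for a trained Transformer's decoder forward pass, and the `START`/`END` token ids are made up for illustration. The thing to notice is that each prediction is appended to the decoder input for the next step.

```python
START, END = 0, 99  # hypothetical special token ids, just for this sketch

def next_token(source_tokens, decoded_so_far):
    """Stand-in for one decoder forward pass. A real Transformer decoder
    attends over the encoded source sentence AND over all tokens decoded
    so far, then outputs the most likely next token."""
    i = len(decoded_so_far) - 1  # number of target words already produced
    # Toy rule: echo the source one token at a time, then stop.
    return source_tokens[i] if i < len(source_tokens) else END

def greedy_translate(source_tokens, max_len=20):
    decoded = [START]            # decoder input always begins with a start token
    for _ in range(max_len):
        tok = next_token(source_tokens, decoded)
        if tok == END:
            break
        decoded.append(tok)      # feed the prediction back in on the next step
    return decoded[1:]           # drop the start token

print(greedy_translate([5, 7, 7, 3]))  # -> [5, 7, 7, 3]
```

So at eval time the "output sentence" passed to the decoder is just the partial translation produced so far, not the answer.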
If you are wondering whether, at training time, we provide the whole correct English sentence as input at once (a technique called teacher forcing), the answer is in the same video, starting from around 12:00. I am not going to repeat the lecture here.
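One detail that often resolves the "giving the student the answer" worry: even though the whole correct sentence is fed in at training time, a look-ahead (causal) mask stops position i from attending to positions after i, so each word is predicted only from the words before it. A rough sketch (mask conventions vary between implementations; here 1 means "blocked"):

```python
import numpy as np

def look_ahead_mask(size):
    """Upper triangle above the diagonal is masked out, so position i can
    only attend to positions 0..i (the words already 'seen')."""
    return np.triu(np.ones((size, size)), k=1)

print(look_ahead_mask(4))
# Row i has zeros up to column i and ones after it:
# position 0 sees only itself, position 3 sees everything before it.
```

Combined with shifting the decoder input right (input starts with a start token, labels are the same sentence shifted left), the model never sees the word it is being asked to predict.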
I recommend watching the whole video again; I think the lecture answers your question quite well.