C5_W4 Transformer - Flummoxed. Why do we pass the output sentence to the decoder?

Just finished the assignment. Was stuck on Exercise 8 (Transformer) for a while. On one of the forum topics I found out that, in the call to the decoder, we need to pass in the output sentence and not the input sentence.

dec_output, attention_weights = self.decoder(output_sentence, enc_output …

This is very unintuitive to me - shouldn’t the decoder be generating the output sentence? It’s like giving the student the answer at the start of the exam. What am I missing? Is this code only used for the training phase and not for the eval phase?

Also, it would be good if the course spent some time on how to set up NN architectures based on the problem at hand.

If you look at the diagram of the Transformer, the output of the encoder goes to the input of the decoder.

So “output_sentence” really means the encoder’s output.

Don’t quite understand - this is the provided code:

So if this were a French/English transformer, would output_sentence not be in English? It goes directly into the decoder later on in the code:

[screenshot of the provided code, showing output_sentence passed into the decoder]

Also, as shown in diagram 3b:

[diagram 3b from the assignment, showing the decoder inputs]

Hi @Looja_Tuladhar,

I suggest you watch this video until around 6:00, which shows that when translating, one word is predicted at a time, and all words predicted so far are used as input to the decoder.

I need you to see that, at prediction time, we need not only the French sentence as input but also the English words translated so far. It is therefore not surprising that, at training time, we need both the French sentence and the correct English translation as inputs.

If you are wondering whether, at training time, we provide the whole correct English sentence as input, the answer is in the same video starting from around 12:00. I am not going to repeat the lecture here.
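
To make the training-time picture concrete, here is a minimal sketch of teacher forcing. The token ids and the stand-in transformer function are made up for illustration; they are not the assignment’s exact API.

import numpy as np

# Toy vocabulary ids: <start>=1, <end>=2 (made-up ids)
french_ids  = np.array([[1, 37, 52, 18, 2]])       # "je suis étudiant"
english_ids = np.array([[1, 11, 43, 79, 88, 2]])   # "i am a student"

decoder_input  = english_ids[:, :-1]   # <start> i am a student   (what the decoder reads)
decoder_target = english_ids[:, 1:]    # i am a student <end>     (what it must predict)

def transformer(inp, tar):             # stand-in for the real model
    vocab_size = 100
    return np.random.rand(tar.shape[0], tar.shape[1], vocab_size)  # logits per position

logits = transformer(french_ids, decoder_input)
# The loss compares logits against decoder_target. The look-ahead mask stops
# position i from attending to later positions, so the word a position has to
# predict is never visible to it, even though the whole sentence is passed in.
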

I recommend you watch the whole video again, because I think the lecture explains your question quite well.

Cheers,
Raymond

Thanks - rewatched. So it is for the training phase only.

So post training, what would we pass in as the output_sentence - or do we create another function?

Hi @Looja_Tuladhar,

At the prediction phase, we don’t give it the true output sentence, but whatever output has been predicted so far. Roughly, the loop looks like the sketch below (hypothetical names and a greedy stand-in model, not the assignment’s exact code).
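
import numpy as np

START, END, MAX_LEN, VOCAB = 1, 2, 10, 100

def transformer(inp, tar):                       # stand-in for the trained model
    return np.random.rand(tar.shape[0], tar.shape[1], VOCAB)

french_ids = np.array([[1, 37, 52, 18, 2]])      # the French sentence (made-up ids)
output_so_far = [START]                          # no English words known yet

for _ in range(MAX_LEN):
    tar = np.array([output_so_far])              # decoder input = words predicted so far
    logits = transformer(french_ids, tar)
    next_id = int(np.argmax(logits[0, -1]))      # greedy pick for the last position
    output_so_far.append(next_id)
    if next_id == END:
        break

print(output_so_far)                             # the translation, built one token at a time
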

I assume by “post training” you mean the “prediction phase”. I am not sure what you mean by “create another function”, but the loop sketched above is essentially what a separate prediction function would do.

Raymond