Just finished the assignment. I was stuck on Exercise 8 (Transformer) for a while. From one of the forum topics I found out that, in the call to the decoder, we need to pass in the output sentence, not the input sentence.
This is very unintuitive to me - shouldn't the decoder be the one generating the output sentence? It's like giving the student the answer at the start of the exam. What am I missing? Is this code only used for the training phase and not for the eval phase?
Also, it would be good if the course spent some time on how to choose an NN architecture based on the problem.
I suggest you watch this video up to around 6:00. It shows that, when translating, one word is predicted at a time, and all the words predicted so far are fed back into the decoder as input.
The key point is that, at prediction time, we need not only the French sentence as input but also the English words translated so far. So it is not surprising that, at training time, we need both the French sentence and the correct English translation as inputs.
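To make the prediction-time loop concrete, here is a minimal sketch of greedy decoding. It is not the assignment's code: `next_token` is a toy stand-in for a trained Transformer's decoder forward pass, and the `START`/`END` token ids are made up for illustration. The thing to notice is that each prediction is appended to the decoder input for the next step.

```python
START, END = 0, 99  # hypothetical special token ids, just for this sketch

def next_token(source_tokens, decoded_so_far):
    """Stand-in for one decoder forward pass. A real Transformer decoder
    attends over the encoded source sentence AND over all tokens decoded
    so far, then outputs the most likely next token."""
    i = len(decoded_so_far) - 1  # number of target words already produced
    # Toy rule: echo the source one token at a time, then stop.
    return source_tokens[i] if i < len(source_tokens) else END

def greedy_translate(source_tokens, max_len=20):
    decoded = [START]            # decoder input always begins with a start token
    for _ in range(max_len):
        tok = next_token(source_tokens, decoded)
        if tok == END:
            break
        decoded.append(tok)      # feed the prediction back in on the next step
    return decoded[1:]           # drop the start token

print(greedy_translate([5, 7, 7, 3]))  # -> [5, 7, 7, 3]
```

So at eval time the "output sentence" passed to the decoder is just the partial translation produced so far, not the answer.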
If you are wondering whether, at training time, we provide the whole correct English sentence as input at once (a technique called teacher forcing), the answer is in the same video, starting from around 12:00. I am not going to repeat the lecture here.
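One detail that often resolves the "giving the student the answer" worry: even though the whole correct sentence is fed in at training time, a look-ahead (causal) mask stops position i from attending to positions after i, so each word is predicted only from the words before it. A rough sketch (mask conventions vary between implementations; here 1 means "blocked"):

```python
import numpy as np

def look_ahead_mask(size):
    """Upper triangle above the diagonal is masked out, so position i can
    only attend to positions 0..i (the words already 'seen')."""
    return np.triu(np.ones((size, size)), k=1)

print(look_ahead_mask(4))
# Row i has zeros up to column i and ones after it:
# position 0 sees only itself, position 3 sees everything before it.
```

Combined with shifting the decoder input right (input starts with a start token, labels are the same sentence shifted left), the model never sees the word it is being asked to predict.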
I recommend watching the whole video again; I think the lecture answers your question quite well.