I am kind of confused about the arguments here. For the transformer exercise, there is an argument called output_sentence, but I don’t know where to put it. Is it part of the decoder? We already have an input_sentence, which is supposed to be the same representation as the output_sentence, or did I miss something?
“input_sentence” is the source to be translated, and it is fed into the Encoder. In this picture, it is the French sentence.
“output_sentence” is the ground-truth sentence used for training in the Decoder. In this picture, it is the English sentence.
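To make the two roles concrete, here is a minimal sketch of a training step with teacher forcing. It uses PyTorch’s nn.Transformer; the shapes, vocabulary size, and the shift-by-one are illustrative assumptions, not the exercise’s exact API:

```python
import torch
import torch.nn as nn

vocab = 100
embed = nn.Embedding(vocab, 32)
proj = nn.Linear(32, vocab)
model = nn.Transformer(d_model=32, nhead=4, batch_first=True)

# Hypothetical token ids: a French source and an English target.
input_sentence = torch.randint(0, vocab, (1, 7))   # fed to the Encoder
output_sentence = torch.randint(0, vocab, (1, 6))  # ground truth for the Decoder

# Teacher forcing: the Decoder sees the ground truth shifted right,
# and the loss compares its predictions with the ground truth shifted
# left, so the token at position t is predicted from positions < t.
decoder_input = output_sentence[:, :-1]
labels = output_sentence[:, 1:]

# Causal mask so position t cannot attend to future positions.
mask = nn.Transformer.generate_square_subsequent_mask(decoder_input.size(1))

out = model(embed(input_sentence), embed(decoder_input), tgt_mask=mask)
loss = nn.functional.cross_entropy(proj(out).transpose(1, 2), labels)
```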
In the Decoder, an overview of the process flow is (see the sketch after this list):
1. Create self-attention over “output_sentence”, followed by normalization.
2. Calculate cross-attention: the query (Q) comes from step 1, while the key (K) and value (V) are “enc_output”, the output of the Encoder; the attention weights measure the similarity between Q and K.
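Here is a minimal sketch of those two steps inside one Decoder layer, assuming PyTorch’s nn.MultiheadAttention; the shapes and the causal mask are illustrative assumptions:

```python
import torch
import torch.nn as nn

d_model, nhead = 32, 4
self_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
cross_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
norm1, norm2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)

dec_in = torch.randn(1, 6, d_model)      # embedded output_sentence
enc_output = torch.randn(1, 7, d_model)  # output of the Encoder

# Step 1: masked self-attention over output_sentence, then normalization.
causal = nn.Transformer.generate_square_subsequent_mask(dec_in.size(1))
x, _ = self_attn(dec_in, dec_in, dec_in, attn_mask=causal)
x = norm1(dec_in + x)

# Step 2: cross-attention -- Q comes from step 1, K and V from enc_output.
y, _ = cross_attn(x, enc_output, enc_output)
y = norm2(x + y)
```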
One addition about the behavior at prediction time. At training time, we feed the ground truth to the Decoder. At prediction (inference) time, output_sentence is actually the output from the Decoder itself. When the Decoder predicts the translated word at time t, it refers to the sequence of words it has already translated up to time t-1. In other words, at time t, output_sentence is the Decoder’s own output sequence from time t-1.
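As an illustration, a greedy decoding loop could look like the sketch below; `model`, `embed`, and `proj` are the hypothetical objects from the training sketch above, and `start_id` / `end_id` are assumed special-token ids:

```python
import torch
import torch.nn as nn

def greedy_decode(model, embed, proj, input_sentence, start_id, end_id, max_len=20):
    # At inference time the Decoder is fed its own previous outputs:
    # the sequence generated up to time t-1 becomes its input at time t.
    generated = torch.tensor([[start_id]])
    src_emb = embed(input_sentence)
    for _ in range(max_len):
        mask = nn.Transformer.generate_square_subsequent_mask(generated.size(1))
        out = model(src_emb, embed(generated), tgt_mask=mask)
        next_id = proj(out[:, -1]).argmax(-1, keepdim=True)  # most likely next word
        generated = torch.cat([generated, next_id], dim=1)
        if next_id.item() == end_id:
            break
    return generated
```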
Again, this is for prediction time. Hope this clarifies the behavior at training time and at prediction time.