In this week’s assignment, the Transformer.call function calls the encoder once on the input sentence and the decoder once on the target output.
This makes sense to me for training, since the attention matrix can be computed all at once given the target output and the look-ahead mask.
My question is: when performing translation (say, translating an English sentence into German) with a trained Transformer, do we also call Transformer.call only once? My understanding is that we generate word by word. To begin with, we pass an output containing only the start token into the decoder and generate the first word; then we pass the updated output into the decoder again to generate the second word, and so on until EOS is generated. Is this correct?
If so, don’t we need to call the encoder once to encode the original sentence, and the decoder multiple times to generate a sentence with multiple words? This logic doesn’t seem to be implemented by any method of the Transformer model.
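To make the question concrete, the word-by-word inference loop I have in mind would look roughly like the sketch below. Everything here is a hypothetical placeholder, not the assignment’s API: `encode` stands in for one encoder pass, `decode_step` stands in for one decoder pass that returns the most probable next token id, and `EOS = 0` is an assumed end-of-sequence id.

```python
EOS = 0  # assumed end-of-sequence token id (placeholder)

def encode(input_ids):
    # Stub encoder: a real model would return contextual
    # representations of the source sentence here.
    return list(input_ids)

def decode_step(enc_output, output_ids):
    # Stub decoder: a real model would run the decoder over the
    # tokens generated so far (with a look-ahead mask) and return
    # the id of the most probable next token. This stub just
    # echoes the source tokens, then emits EOS.
    pos = len(output_ids) - 1  # tokens generated after the start token
    return enc_output[pos] if pos < len(enc_output) else EOS

def translate(input_ids, start_id, max_len=20):
    enc_output = encode(input_ids)   # encoder runs ONCE
    output_ids = [start_id]
    for _ in range(max_len):         # decoder runs once PER token
        next_id = decode_step(enc_output, output_ids)
        output_ids.append(next_id)
        if next_id == EOS:
            break
    return output_ids

print(translate([5, 7, 9], start_id=1))  # → [1, 5, 7, 9, 0]
```

The point of the sketch is that the encoder output is computed once and reused on every decoder step, so a loop like this typically lives in a separate inference function rather than inside Transformer.call itself.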