UNQ_C6 I don't understand the overall concept

Hi @Nadle

They are passed to a function as a parameter. For example, at the very first step in # UNQ_C7 they are an empty list; at the second step the list has been appended with the one token that the next_symbol function produced, and so on.
So, in other words, at the start of the loop in # UNQ_C7, cur_output_tokens is an empty list; after one step the list contains one element, and so on.
But, leaving UNQ_C7 aside, you can pass any sequence of tokens as cur_output_tokens and get a prediction for the next one.
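To make the loop concrete, here is a minimal sketch of how cur_output_tokens grows step by step. The next_symbol here is a hypothetical stand-in (the real one runs the model on input_tokens and cur_output_tokens); the token values and the EOS id are made up for illustration:

```python
EOS = 1  # hypothetical end-of-sentence token id

def next_symbol(input_tokens, cur_output_tokens):
    # Stand-in for the real model call: emit a few dummy tokens, then EOS.
    if len(cur_output_tokens) < 3:
        return cur_output_tokens[-1] + 1 if cur_output_tokens else 10
    return EOS

input_tokens = [4, 7, 2]   # hypothetical tokenized English sentence
cur_output_tokens = []     # starts empty, exactly as in UNQ_C7

while True:
    token = next_symbol(input_tokens, cur_output_tokens)
    cur_output_tokens.append(token)   # the list grows by one token per step
    if token == EOS:
        break

print(cur_output_tokens)  # [10, 11, 12, 1]
```

Each iteration feeds the whole sequence produced so far back into the model, which is why you can also call next_symbol on any partial sequence you like.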

Padding is needed because the model expects arrays of a certain length. In theory, with a batch_size of 1 (as in the next_symbol function), this could work without padding.
The formula is presented in “Hints” because a lot of learners find it hard to calculate this quantity without guidance.
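As a sketch of that calculation, assuming the "pad to the next power of two" convention the Hints describe (the exact pad token and helper names here are illustrative, not from the assignment):

```python
import math

def padded_length(n):
    # Smallest power of two strictly greater than n,
    # i.e. 2**(int(log2(n)) + 1) as given in the Hints.
    return 2 ** (int(math.log2(n)) + 1)

def pad(tokens, pad_token=0):
    # Right-pad the token list up to the computed length.
    target = padded_length(len(tokens))
    return tokens + [pad_token] * (target - len(tokens))

print(padded_length(3))  # 4
print(padded_length(8))  # 16  (strictly greater, so 8 pads up to 16)
print(pad([5, 6, 7]))    # [5, 6, 7, 0]
```

Note that an exact power of two still pads up to the next one, which is exactly the subtlety the formula in the Hints resolves.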

The inputs to this week's model in UNQ_C6 are input_tokens, which are the tokens from the English sentence, and the targets are padded_with_batch, which during training were the tokens from the German sentence but at inference time are our own (padded) predictions.
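The "with batch" part just means adding a leading batch dimension of size 1 before feeding the padded tokens to the model. A small sketch (the token values are hypothetical):

```python
import numpy as np

padded = [10, 11, 12, 0]  # hypothetical padded prediction tokens

# Add a batch dimension of 1: shape (4,) becomes (1, 4).
padded_with_batch = np.array(padded)[None, :]

print(padded_with_batch.shape)  # (1, 4)
```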
See my longer explanation posted previously.