W4A1 target_sequence_length?

In the decoder part of the transformer model, I see that target_sequence_length is a fixed value. Why is it fixed, given that we usually don't know in advance how many tokens the target will end up being, especially in machine translation? Thanks!

This was discussed in the earlier "Masking" section of the notebook. Please have a look at the instructions there. E.g., here's one key sentence from that section:

When passing sequences into a transformer model, it is important that they are of uniform length. You can achieve this by padding the sequence with zeros, and truncating sentences that exceed the maximum length of your model:
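For reference, here is a minimal sketch of that padding/truncation step (not the notebook's exact code), assuming the Keras `pad_sequences` utility and a hypothetical fixed length `MAX_LENGTH`:

```python
import tensorflow as tf

MAX_LENGTH = 10  # hypothetical fixed target_sequence_length

# Token-ID sequences of different lengths (e.g., tokenized target sentences)
sequences = [
    [12, 7, 345, 2],
    [12, 7, 345, 99, 4, 18, 2],
    list(range(1, 15)),  # longer than MAX_LENGTH, so it gets truncated
]

padded = tf.keras.preprocessing.sequence.pad_sequences(
    sequences,
    maxlen=MAX_LENGTH,
    padding="post",     # append zeros after the real tokens
    truncating="post",  # drop tokens beyond MAX_LENGTH
    value=0,            # 0 serves as the padding token ID
)

print(padded.shape)  # (3, 10) -- every sequence now has the same length
```

After this step every target sequence in a batch has the same length, which is what lets the decoder work with fixed-size tensors; the padding mask (also covered in the "Masking" section) then tells the model to ignore the zero positions.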