Hi,
I found the following typos in the “Programming Assignment: Transformers Architecture with TensorFlow”:
In the section describing sequence lengths, the final example suddenly has 4 truncated sequences instead of 3:
which might get vectorized as:
[[ 71, 121, 4, 56, 99, 2344, 345, 1284, 15],
[ 56, 1285, 15, 181, 545],
[ 87, 600]
]
When passing sequences into a transformer model, it is important that they are of uniform length. You can achieve this by padding the sequence with zeros, and truncating sentences that exceed the maximum length of your model:
[[ 71, 121, 4, 56, 99],
[ 2344, 345, 1284, 15, 0],
[ 56, 1285, 15, 181, 545],
[ 87, 600, 0, 0, 0],
]
Sequences longer than the maximum length of five will be truncated
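For what it's worth, padding and truncating the three vectorized sequences above to a maximum length of five should produce three rows, not four. A minimal sketch with tf.keras.preprocessing.sequence.pad_sequences (my own illustration, not code from the assignment) gives the expected output:

import tensorflow as tf

# The three vectorized sequences from the example above
sequences = [
    [71, 121, 4, 56, 99, 2344, 345, 1284, 15],
    [56, 1285, 15, 181, 545],
    [87, 600],
]

# Pad with zeros and truncate to the maximum length of five;
# 'post' keeps the start of long sequences and pads at the end
padded = tf.keras.preprocessing.sequence.pad_sequences(
    sequences, maxlen=5, padding="post", truncating="post"
)
print(padded)
# [[  71  121    4   56   99]
#  [  56 1285   15  181  545]
#  [  87  600    0    0    0]]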
Extra 'in' highlighted below:
5 - Decoder
The Decoder layer takes the K and V matrices generated by the Encoder and *in* computes the second multi-head attention layer with the Q matrix from the output (Figure 3a).
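(Aside: the cross-attention step that sentence describes, with Q from the decoder and K/V from the encoder output, could be sketched in TensorFlow roughly as below; the layer sizes and tensor shapes are placeholders of mine, not the assignment's:)

import tensorflow as tf

# Toy shapes: batch of 2, target length 4, source length 6, model dim 128
decoder_hidden = tf.random.uniform((2, 4, 128))  # Q comes from the decoder
enc_output = tf.random.uniform((2, 6, 128))      # K and V come from the encoder

# Second multi-head attention block of the Decoder layer
mha2 = tf.keras.layers.MultiHeadAttention(num_heads=8, key_dim=16)
attn = mha2(query=decoder_hidden, value=enc_output, key=enc_output)
print(attn.shape)  # (2, 4, 128)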
Finally, in the code comment, there is an extra 'is':
class Decoder(tf.keras.layers.Layer):
"""
The entire Encoder is starts by passing the target input to an embedding layer
and using positional encoding to then pass the output through a stack of