So, I have just finished the course and have a couple of questions.
- In the programming exercise A1, I am confused why the input_sentence and output_sentence in the Transformer class should have shapes like this:
Arguments:
input_sentence -- Tensor of shape (batch_size, input_seq_len, embedding_dim)
An array of the indexes of the words in the input sentence
output_sentence -- Tensor of shape (batch_size, target_seq_len, embedding_dim)
An array of the indexes of the words in the output sentence
Shouldn't they be 2D tensors instead, according to the Encoder class's call method?
"""
Forward pass for the Encoder
Arguments:
x -- Tensor of shape (batch_size, input_seq_len)
Because I think the batch of inputs should only be “encoded” (embedded) once it gets inside the encoder.
- In the DecoderLayer class, why should there be a padding_mask for self.mha2?
# BLOCK 2
# calculate self-attention using the Q from the first block and K and V from the encoder output.
# Dropout will be applied during training
# Return attention scores as attn_weights_block2 (~1 line)
mult_attn_out2, attn_weights_block2 = self.mha2(query=####,
                                                value=####,
                                                key=####,
                                                attention_mask=####,
                                                return_attention_scores=####)
# (batch_size, target_seq_len, embedding_dim)
The shapes of Q, K, and V are not the same, are they?
This is really a good course.
Posting solution code in a public topic is discouraged and can get your account suspended. It's okay to share a stack trace in a public post and to send code to a mentor via direct message. Please clean up the post.
Here’s the community user guide to get started.
- The 0th dimension is always the batch size, to make better use of the hardware. Please go back through the rest of the labs if you have missed this detail. The entire batch of data is already encoded (as token indexes) before encoding / decoding.
- The attention mask tells the model which positions to attend to. We don't want to attend to padding tokens, hence the need for the padding mask (see the sketch after this list).
- See this in the DecoderLayer#call docstring:
x -- Tensor of shape (batch_size, target_seq_len, embedding_dim)
enc_output -- Tensor of shape (batch_size, input_seq_len, embedding_dim)
Q comes from x and K and V come from enc_output, so the query and key/value sequence lengths can differ; only the embedding dimension has to match. The sketch below illustrates this.
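Here is a minimal sketch of block 2 (not the assignment's solution; the names and sizes are made up for illustration). It shows a padding mask over the encoder positions feeding into tf.keras.layers.MultiHeadAttention, with query and key/value sequences of different lengths. It assumes the Keras convention where a mask value of 1 means “attend” and 0 means “ignore”.

import tensorflow as tf

batch_size, target_seq_len, input_seq_len, embedding_dim = 2, 5, 7, 16

# Decoder-side query and encoder output (used as K and V) with different sequence lengths
q = tf.random.normal((batch_size, target_seq_len, embedding_dim))
enc_output = tf.random.normal((batch_size, input_seq_len, embedding_dim))

# Padding mask over encoder positions: True = real token, False = padding
padding_mask = tf.constant([[1, 1, 1, 1, 1, 0, 0],
                            [1, 1, 1, 0, 0, 0, 0]], dtype=tf.bool)
# Broadcast to (batch_size, 1, input_seq_len) so every query position uses the same mask
attention_mask = padding_mask[:, tf.newaxis, :]

mha2 = tf.keras.layers.MultiHeadAttention(num_heads=4, key_dim=embedding_dim)
out, attn_weights = mha2(query=q, value=enc_output, key=enc_output,
                         attention_mask=attention_mask,
                         return_attention_scores=True)

print(out.shape)           # (2, 5, 16)  -> (batch_size, target_seq_len, embedding_dim)
print(attn_weights.shape)  # (2, 4, 5, 7) -> (batch_size, num_heads, target_seq_len, input_seq_len)

Note that the attention output keeps the query's sequence length, while the attention weights span both lengths, matching the shapes in the docstring above.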
I have corrected the post, sorry.
Back to your answer no. 3: sorry, I was talking about the block-level class, not the layer-level class. Let's focus on the Encoder class first. You can see the docstring of its call method is like this:
"""
Forward pass for the Encoder
Arguments:
x -- Tensor of shape (batch_size, input_seq_len)
training -- Boolean, set to true to activate
the training mode for dropout layers
mask -- Boolean mask to ensure that the padding is not
treated as part of the input
Returns:
out2 -- Tensor of shape (batch_size, input_seq_len, embedding_dim)
"""
The x is 2D, though. That's why I am confused.
Thanks for clarifying. The Encoder class performs embedding inside the call method, so the input to the call method should be 2D (i.e. batch size, sequence length). See this as well.
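A quick illustrative sketch of that point (not the graded Encoder; the class and variable names here are made up): the Embedding layer lives inside call, so call receives 2D word indexes and produces the 3D embedded tensor.

import tensorflow as tf

class TinyEncoder(tf.keras.layers.Layer):
    def __init__(self, vocab_size, embedding_dim):
        super().__init__()
        self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)

    def call(self, x):
        # x: (batch_size, input_seq_len) of word indexes
        return self.embedding(x)  # (batch_size, input_seq_len, embedding_dim)

x = tf.constant([[5, 3, 8, 0], [7, 2, 9, 0]])                  # (2, 4)
print(TinyEncoder(vocab_size=100, embedding_dim=8)(x).shape)   # (2, 4, 8)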
Yes, sir. That is why I am asking why the docstring of the call method in the Transformer class states that input_sentence and output_sentence should have a 3D shape. I don't think it makes sense to embed an already embedded tensor:
"""
Forward pass for the entire Transformer
Arguments:
input_sentence -- Tensor of shape (batch_size, input_seq_len, embedding_dim)
An array of the indexes of the words in the input sentence
output_sentence -- Tensor of shape (batch_size, target_seq_len, embedding_dim)
An array of the indexes of the words in the output sentence
training -- Boolean, set to true to activate
the training mode for dropout layers
enc_padding_mask -- Boolean mask to ensure that the padding is not
treated as part of the input
look_ahead_mask -- Boolean mask for the target_input
dec_padding_mask -- Boolean mask for the second multihead attention layer
Returns:
final_output -- Describe me
attention_weights - Dictionary of tensors containing all the attention weights for the decoder
each of shape Tensor of shape (batch_size, num_heads, target_seq_len, input_seq_len)
"""
Are these errors, or have I missed something?
You are correct. These lines in the docstring are incorrect.
The staff have been notified to fix the mistake.
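For reference, and going by the Encoder docstring quoted above, those argument descriptions would presumably read:

input_sentence -- Tensor of shape (batch_size, input_seq_len)
An array of the indexes of the words in the input sentence
output_sentence -- Tensor of shape (batch_size, target_seq_len)
An array of the indexes of the words in the output sentence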
Thank you