C5W4 Questions after finishing the course

So, I have just finished the course and have a couple of questions.

  1. In programming exercise A1, I am confused about why the input_sentence and output_sentence arguments of the Transformer class should have shapes like this:
Arguments:
            input_sentence -- Tensor of shape (batch_size, input_seq_len, embedding_dim)
                              An array of the indexes of the words in the input sentence
            output_sentence -- Tensor of shape (batch_size, target_seq_len, embedding_dim)
                              An array of the indexes of the words in the output sentence

Shouldn’t they be 2D tensors instead, according to the Encoder class’s call method?

        """
        Forward pass for the Encoder
        
        Arguments:
            x -- Tensor of shape (batch_size, input_seq_len)

Because I think the batch of inputs is only “encoded” (embedded) once it gets inside the encoder.
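For example, this is the kind of input I would expect the Transformer to receive (a quick sketch with made-up sizes, not the assignment code):

import tensorflow as tf

# Made-up sizes for illustration only
batch_size, input_seq_len, target_seq_len, vocab_size = 64, 10, 7, 8000

# Integer word indexes, shape (batch_size, seq_len) -- no embedding_dim yet
input_sentence = tf.random.uniform((batch_size, input_seq_len), maxval=vocab_size, dtype=tf.int32)
output_sentence = tf.random.uniform((batch_size, target_seq_len), maxval=vocab_size, dtype=tf.int32)
print(input_sentence.shape, output_sentence.shape)  # (64, 10) (64, 7)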

  2. In the DecoderLayer class, why should there be a padding_mask for self.mha2?
# BLOCK 2
        # calculate self-attention using the Q from the first block and K and V from the encoder output. 
        # Dropout will be applied during training
        # Return attention scores as attn_weights_block2 (~1 line) 
        mult_attn_out2, attn_weights_block2 = self.mha2(query=####, 
                                                        value=####, 
                                                        key=####,
                                                        attention_mask=####, 
                                                        return_attention_scores=####)  
                                                        # (batch_size, target_seq_len, embedding_dim)

The shapes of Q, K, and V are not the same, are they?

This is a really good course.

Posting solution code in a public topic is discouraged and can get your account suspended. It’s okay to share a stack trace in a public post and to send code to a mentor via direct message. Please clean up the post.
Here’s the community user guide to get started.

  1. The 0th dimension is always the batch size, to make better use of the hardware. Please go back through the rest of the labs if you have missed this detail. The entire batch of data is encoded ahead of encoding / decoding.
  2. The attention mask tells the model which positions to pay attention to. We don’t want to pay attention to padding tokens, hence the need for the padding mask (see the toy example after this list).
  3. See this in the DecoderLayer#call docstring:
        x -- Tensor of shape (batch_size, target_seq_len, embedding_dim)
       enc_output --  Tensor of shape(batch_size, input_seq_len, embedding_dim)
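Here is a toy example with tf.keras.layers.MultiHeadAttention (made-up sizes, not the assignment code) showing why the padding mask in block 2 follows the encoder’s sequence length even though Q and K/V have different shapes:

import tensorflow as tf

# Made-up sizes for illustration only
batch_size, target_seq_len, input_seq_len, embedding_dim, num_heads = 64, 7, 10, 128, 8

x = tf.random.normal((batch_size, target_seq_len, embedding_dim))          # decoder side (queries)
enc_output = tf.random.normal((batch_size, input_seq_len, embedding_dim))  # encoder side (keys/values)

# 1 = attend, 0 = padding position in the *source* sentence;
# shape (batch_size, 1, input_seq_len) broadcasts over heads and query positions
dec_padding_mask = tf.ones((batch_size, 1, input_seq_len))

mha2 = tf.keras.layers.MultiHeadAttention(num_heads=num_heads, key_dim=embedding_dim)
out, attn_weights = mha2(query=x, value=enc_output, key=enc_output,
                         attention_mask=dec_padding_mask,
                         return_attention_scores=True)
print(out.shape)           # (64, 7, 128)  -> (batch_size, target_seq_len, embedding_dim)
print(attn_weights.shape)  # (64, 8, 7, 10) -> (batch_size, num_heads, target_seq_len, input_seq_len)

The last axis of both the mask and the attention weights is input_seq_len, so the padding being masked out is the padding of the source sentence fed to the encoder.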

I have corrected the post, sorry.

Back to your answer No. 3. Sorry, I was talking about the block-level class, not the layer-level class. Let’s focus on the Encoder class first. You can see that the docstring of its call method looks like this:

"""
        Forward pass for the Encoder
        
        Arguments:
            x -- Tensor of shape (batch_size, input_seq_len)
            training -- Boolean, set to true to activate
                        the training mode for dropout layers
            mask -- Boolean mask to ensure that the padding is not 
                    treated as part of the input
        Returns:
            out2 -- Tensor of shape (batch_size, input_seq_len, embedding_dim)
        """

The x there is 2D, though. That’s why I am confused.

Thanks for clarifying. The Encoder class performs the embedding inside its call method, so the input to the call method should be 2D, i.e. (batch_size, seq_len). See this as well.
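A stripped-down sketch of that pattern (my own simplification with made-up sizes, not the assignment code):

import tensorflow as tf

class TinyEncoder(tf.keras.layers.Layer):
    """Minimal stand-in to show where the embedding happens."""
    def __init__(self, vocab_size, embedding_dim):
        super().__init__()
        self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)

    def call(self, x):
        # x: (batch_size, input_seq_len) integer word indexes
        return self.embedding(x)  # (batch_size, input_seq_len, embedding_dim)

enc = TinyEncoder(vocab_size=8000, embedding_dim=128)
out = enc(tf.constant([[5, 42, 7, 0, 0]]))  # one padded sentence of length 5
print(out.shape)  # (1, 5, 128)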

Yes, sir. That is exactly why I am asking why the docstring of the Transformer class’s call method states that input_sentence and output_sentence should have a 3D shape. I don’t think it makes sense to embed an already embedded tensor.

"""
        Forward pass for the entire Transformer
        Arguments:
            input_sentence -- Tensor of shape (batch_size, input_seq_len, embedding_dim)
                              An array of the indexes of the words in the input sentence
            output_sentence -- Tensor of shape (batch_size, target_seq_len, embedding_dim)
                              An array of the indexes of the words in the output sentence
            training -- Boolean, set to true to activate
                        the training mode for dropout layers
            enc_padding_mask -- Boolean mask to ensure that the padding is not 
                    treated as part of the input
            look_ahead_mask -- Boolean mask for the target_input
            dec_padding_mask -- Boolean mask for the second multihead attention layer
        Returns:
            final_output -- Describe me
            attention_weights - Dictionary of tensors containing all the attention weights for the decoder
                                each of shape Tensor of shape (batch_size, num_heads, target_seq_len, input_seq_len)
        
        """

Are these errors, or did I miss something?

You are correct. These lines in the docstring are incorrect.
The staff have been notified to fix the mistake.
Thank you