C5-W4-A1 Revision Suggestion

I just finished the final assignment for course 5. It is more difficult than the other assignments, since fewer hints are provided to the students, but I think the difficulty is still reasonable.

Here are a few things I found confusing, along with suggestions for possible revisions. Please let me know if I have misunderstood something.

2.1 - Padding Mask

There is a block of code that tries to demonstrate the effect of the masking:

print(tf.keras.activations.softmax(x))
print(tf.keras.activations.softmax(x + (1 - create_padding_mask(x)) * -1.0e9))

The code produces the following output. Note that the shapes of the two outputs differ.

tf.Tensor(
[[7.2876644e-01 2.6809821e-01 6.6454901e-04 6.6454901e-04 1.8064314e-03]
 [8.4437378e-02 2.2952460e-01 6.2391251e-01 3.1062774e-02 3.1062774e-02]
 [4.8541026e-03 4.8541026e-03 4.8541026e-03 2.6502505e-01 7.2041273e-01]], shape=(3, 5), dtype=float32)
tf.Tensor(
[[[7.2973627e-01 2.6845497e-01 0.0000000e+00 0.0000000e+00 1.8088354e-03]
  [2.4472848e-01 6.6524094e-01 0.0000000e+00 0.0000000e+00 9.0030573e-02]
  [6.6483547e-03 6.6483547e-03 0.0000000e+00 0.0000000e+00 9.8670328e-01]]

 [[7.3057163e-01 2.6876229e-01 6.6619506e-04 0.0000000e+00 0.0000000e+00]
  [9.0030573e-02 2.4472848e-01 6.6524094e-01 0.0000000e+00 0.0000000e+00]
  [3.3333334e-01 3.3333334e-01 3.3333334e-01 0.0000000e+00 0.0000000e+00]]

 [[0.0000000e+00 0.0000000e+00 0.0000000e+00 2.6894143e-01 7.3105860e-01]
  [0.0000000e+00 0.0000000e+00 0.0000000e+00 5.0000000e-01 5.0000000e-01]
  [0.0000000e+00 0.0000000e+00 0.0000000e+00 2.6894143e-01 7.3105860e-01]]], shape=(3, 3, 5), dtype=float32)

It was quite confusing to see the data shape change from (3, 5) to (3, 3, 5) after the masking. The culprit is broadcasting: the mask returned by create_padding_mask evidently has an extra axis (shape (3, 1, 5)), so adding it to x of shape (3, 5) broadcasts the sum to (3, 3, 5). It may be better to rewrite the code as

print(tf.keras.activations.softmax(x))

mask = tf.reshape((1 - create_padding_mask(x)) * -1.0e9, x.shape)
print(tf.keras.activations.softmax(x + mask))

This produces the output

tf.Tensor(
[[7.2876644e-01 2.6809821e-01 6.6454901e-04 6.6454901e-04 1.8064314e-03]
 [8.4437378e-02 2.2952460e-01 6.2391251e-01 3.1062774e-02 3.1062774e-02]
 [4.8541026e-03 4.8541026e-03 4.8541026e-03 2.6502505e-01 7.2041273e-01]], shape=(3, 5), dtype=float32)
tf.Tensor(
[[7.2973627e-01 2.6845497e-01 0.0000000e+00 0.0000000e+00 1.8088354e-03]
 [9.0030573e-02 2.4472848e-01 6.6524094e-01 0.0000000e+00 0.0000000e+00]
 [0.0000000e+00 0.0000000e+00 0.0000000e+00 2.6894143e-01 7.3105860e-01]], shape=(3, 5), dtype=float32)
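To make the broadcasting explicit, here is a minimal sketch of create_padding_mask, assuming it returns 1 for real tokens and 0 for padding with an inserted axis (this is consistent with the printed output above, though the assignment's exact code may differ), with x reconstructed from the printed softmax values:

import tensorflow as tf

def create_padding_mask(seq):
    # 1.0 where the token id is non-zero (a real token), 0.0 at padding;
    # the inserted axis gives the mask shape (batch, 1, seq_len)
    seq = tf.cast(tf.math.not_equal(seq, 0), tf.float32)
    return seq[:, tf.newaxis, :]

x = tf.constant([[7., 6., 0., 0., 1.],
                 [1., 2., 3., 0., 0.],
                 [0., 0., 0., 4., 5.]])

print(create_padding_mask(x).shape)  # (3, 1, 5)
# (3, 5) + (3, 1, 5) broadcasts to (3, 3, 5), hence the extra dimension
print((x + (1 - create_padding_mask(x)) * -1.0e9).shape)  # (3, 3, 5)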

Exercise 3 - scaled_dot_product_attention

The shapes of the function arguments are labeled with a leading "...", e.g. (..., seq_len_q, depth):

def scaled_dot_product_attention(q, k, v, mask):
    """
    Arguments:
        q -- query shape == (..., seq_len_q, depth)
        k -- key shape == (..., seq_len_k, depth)
        v -- value shape == (..., seq_len_v, depth_v)
        mask: Float tensor with shape broadcastable 
              to (..., seq_len_q, seq_len_k). Defaults to None.
    """

The use of "…" is a bit confusing and also inconsistent with the rest of the assignment. Maybe use (batch_size, seq_len_q, depth) instead?
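For context, the "…" likely comes from the TensorFlow Transformer tutorial, where the same function is reused inside multi-head attention with q of shape (batch_size, num_heads, seq_len_q, depth). Here is a minimal sketch of the standard scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V, not the assignment's reference solution; since tf.matmul and the softmax only touch the last two axes, any leading dimensions pass through unchanged. The 1-means-keep mask convention is an assumption carried over from the padding-mask demo above:

import tensorflow as tf

def scaled_dot_product_attention(q, k, v, mask=None):
    # q: (..., seq_len_q, depth), k: (..., seq_len_k, depth),
    # v: (..., seq_len_v, depth_v) with seq_len_k == seq_len_v
    matmul_qk = tf.matmul(q, k, transpose_b=True)     # (..., seq_len_q, seq_len_k)
    dk = tf.cast(tf.shape(k)[-1], tf.float32)
    scaled_logits = matmul_qk / tf.math.sqrt(dk)      # scale by sqrt(depth)
    if mask is not None:
        scaled_logits += (1. - mask) * -1.0e9         # mask: 1 = attend, 0 = ignore
    weights = tf.nn.softmax(scaled_logits, axis=-1)   # rows sum to 1 over the keys
    return tf.matmul(weights, v)                      # (..., seq_len_q, depth_v)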

Exercise 8 - Transformer

The function docstring is confusing:

class Transformer(tf.keras.Model):
    def call(self, input_sentence, output_sentence, training, enc_padding_mask, look_ahead_mask, dec_padding_mask):
        """
        Forward pass for the entire Transformer
        Arguments:
            input_sentence -- Tensor of shape (batch_size, input_seq_len, fully_connected_dim)
                              An array of the indexes of the words in the input sentence
            output_sentence -- Tensor of shape (batch_size, target_seq_len, fully_connected_dim)
                              An array of the indexes of the words in the output sentence

Shouldn’t input_sentence have the shape (batch_size, input_seq_len), since it is a batch of word-index sequences rather than embeddings? The same issue applies to output_sentence.
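A quick way to check: the word indices only acquire a feature dimension after the embedding layer inside the encoder/decoder, so the tensors passed to call should be 2-D. A small sketch with made-up sizes (vocab_size, embedding_dim, batch_size, and input_seq_len here are hypothetical, not the assignment's values):

import tensorflow as tf

vocab_size, embedding_dim = 100, 16
batch_size, input_seq_len = 2, 5

# a batch of word indices, matching the docstring's own description
input_sentence = tf.random.uniform((batch_size, input_seq_len),
                                   maxval=vocab_size, dtype=tf.int32)
embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)

print(input_sentence.shape)             # (2, 5)     -> (batch_size, input_seq_len)
print(embedding(input_sentence).shape)  # (2, 5, 16) -> 3-D only after embedding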


Thanks for your suggestions. I’ll look into submitting some change requests.