In the notebook, the tests pass when I run them, but the grader says the results are not correct. This seems contradictory. Any answer for this?
Please hold on; I will reply in a while. Thank you for following the instructions for posting your query.
Regards
DP
Passing the tests in the notebook does not prove your code is perfect. The notebook tests only cover a few basic items.
The grader uses a different set of tests.
Start by focusing on your scaled_dot_product_attention() function.
Tips (common mistakes):
- Don’t use any global variables. Only use the local variables for k, q, v, and mask.
- Do use the tf.keras.activations.softmax() function.
- Do use functions from the tensorflow package, not from numpy.
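As a quick, self-contained illustration of the last two tips (this is a demonstration, not the graded solution): TensorFlow's softmax normalizes over the chosen axis so each row of scores sums to 1.

```python
import tensorflow as tf

# Minimal demo: tf.keras.activations.softmax with axis=-1 normalizes
# each row of the logits so the entries sum to 1. Values are illustrative.
logits = tf.constant([[2.0, 1.0, 0.0],
                      [0.0, 0.0, 0.0]])
weights = tf.keras.activations.softmax(logits, axis=-1)
print(tf.reduce_sum(weights, axis=-1))  # each row sums to 1.0
```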
Hello @alangrosso
Issues in your assignment notebook
- UNQ_C3 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
GRADED FUNCTION scaled_dot_product_attention
def scaled_dot_product_attention
(a) For matmul_qk: using tf.matmul is correct, but the way you transpose k is incorrect. Kindly refer to the link below for the correct usage (the documentation describes arguments such as transpose_b), and call it as tf.matmul, not tf.linalg.matmul:
tf.linalg.matmul | TensorFlow v2.16.1
(b) While scaling matmul_qk, the code below is incorrect:
dk = k.shape[0]
To correct it, apply tf.cast with dtype tf.float32 to the seq_len dimension of k, i.e. index tf.shape(k) at -1.
(c) Then, to calculate scaled_attention_logits = matmul_qk / np.sqrt(dk): you are not supposed to use np.sqrt, but tf.math.sqrt.
(d) Further, for the step "add the mask to the scaled tensor", you have applied an incorrect mask value. The boolean mask parameter can be passed in as None, or as either a padding or look-ahead mask. Multiply (1 - mask) by -1e9 before applying the softmax. (You applied mask * -1e9, whereas you needed ((1 - mask) * -1e9).)
(e) For the statement "softmax is normalized on the last axis (seq_len_k) so that the scores add up to 1": you have applied an incorrect softmax. You are supposed to use tf.nn.softmax with axis=-1.
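Putting points (a)–(e) together, here is a hedged sketch of how the corrected function could look. The variable names follow the notebook's docstring, but this is illustrative, not the graded solution:

```python
import tensorflow as tf

# Illustrative sketch of scaled dot-product attention reflecting
# fixes (a)-(e) above; not the graded notebook solution.
def scaled_dot_product_attention(q, k, v, mask=None):
    # (a) Q x K^T via tf.matmul, transposing k through transpose_b.
    matmul_qk = tf.matmul(q, k, transpose_b=True)

    # (b) dk is the last dimension of tf.shape(k), cast to tf.float32.
    dk = tf.cast(tf.shape(k)[-1], tf.float32)

    # (c) Scale with tf.math.sqrt, not np.sqrt.
    scaled_attention_logits = matmul_qk / tf.math.sqrt(dk)

    # (d) Add (1 - mask) * -1e9 so masked positions get a large
    # negative logit before the softmax.
    if mask is not None:
        scaled_attention_logits += (1.0 - mask) * -1e9

    # (e) Softmax on the last axis (seq_len_k) so scores sum to 1.
    attention_weights = tf.nn.softmax(scaled_attention_logits, axis=-1)
    return tf.matmul(attention_weights, v), attention_weights
```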
-
Your UNQ_C4 EncoderLayer code is completely incorrect. I am not sure whether you changed it or received it that way (I highly doubt the latter). In any case, I recommend getting a fresh copy of the assignment notebook and re-doing the assignment, referring to your previous copy to see where you went wrong.
Also remember, for this cell, that you are only supposed to pass the training argument where it is asked for; in my copy of the notebook, that is only for ffn_output when applying the dropout layer.
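Since the training flag comes up repeatedly in this assignment, here is a minimal, self-contained demonstration (not assignment code) of why it matters: a Keras Dropout layer is only active when training=True.

```python
import tensorflow as tf

# Standalone demo of the training flag on a dropout layer; the rate
# and shapes are arbitrary, chosen only for illustration.
dropout = tf.keras.layers.Dropout(0.5)
x = tf.ones((1, 4))

print(dropout(x, training=False))  # identity: all ones
# With training=True, each unit is either zeroed or scaled by
# 1 / (1 - 0.5) = 2, so forgetting training=training silently
# disables dropout during training.
print(dropout(x, training=True))
```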
Also remember that copying code from old GitHub repositories might get the notebook tests to pass but fail the auto-grader, so be careful about where you seek reference material or help. -
for
UNQ_C5 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
GRADED FUNCTION
class Encoder(tf.keras.layers.Layer):
For the step "Scale embedding by multiplying it by the square root of the embedding dimension": your np.sqrt(self.embedding_dim) code is incorrect.
HINT: Scale your embedding by multiplying it by the square root of your embedding dimension. Remember to cast the embedding dimension to data type tf.float32 before computing the square root. So apply tf.math.sqrt to tf.cast(self.embedding_dim, ...) with the correct data type.
Then pass the encoded embedding through a dropout layer, remembering to use the training parameter to set the model training mode; that is, use training=training when passing the encoded embedding through the dropout layer.
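As a hedged illustration of the two hints above, here is a tiny stand-in layer; the class name, vocabulary size, and dropout rate are my own placeholders, not the notebook's graded code:

```python
import tensorflow as tf

class TinyEncoderStub(tf.keras.layers.Layer):
    """Illustrative stub (NOT the graded Encoder): shows the
    scaling and dropout pattern described in the hints."""

    def __init__(self, vocab_size=10, embedding_dim=4):
        super().__init__()
        self.embedding_dim = embedding_dim
        self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
        self.dropout = tf.keras.layers.Dropout(0.1)

    def call(self, x, training=False):
        x = self.embedding(x)
        # Cast the embedding dimension to tf.float32 BEFORE the sqrt.
        x *= tf.math.sqrt(tf.cast(self.embedding_dim, tf.float32))
        # Forward the training flag so dropout only fires in training mode.
        return self.dropout(x, training=training)
```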
-
UNQ_C6 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
GRADED FUNCTION DecoderLayer
class DecoderLayer(tf.keras.layers.Layer)
Again there is a mix-up of code lines, which again requires a fresh copy and a re-do of the assignment. Only write code where you are asked to write it: there was no mention of adding a dropout layer to block 1, yet your code applies one. -
Similar mistakes for
UNQ_C7 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
GRADED FUNCTION Decoder
class Decoder(tf.keras.layers.Layer)
incorrect use of np.sqrt, and the need to apply training=training to the dropout layer. -
I am really not sure
UNQ_C8 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
GRADED FUNCTION Transformer
class Transformer(tf.keras.Model)
whether you are using an old copy, but whatever the cause, the arguments in your assignment differ from those in mine.
Get a fresh copy and re-do the assignment, and please make sure to write code only between the ### START CODE HERE and ### END CODE HERE markers; do not add or edit any other code lines in the assignment notebook.
Feel free to ask if you have any more doubts.
Regards
DP