My first guess is that there could be a problem in how you are using the training parameter in either the Encoder() class or the EncoderLayer() class - especially with regard to the dropout_ffn() function.
The unit test EncoderLayer_test(EncoderLayer) has passed. It seems to me that it has already verified the setting of the training parameter for dropout_ffn(). Thank you.
Can someone shed some light on this for me? All unit tests in the assignment have passed except this particular part of this exercise. It may just be a simple mistake, but I keep going around in circles. Thanks in advance.
There is an error in the encoder grader cell. An assertion is a programming construct that helps us test an expression. If the expression evaluates to true, control moves on to the next line; if it evaluates to false, Python raises an AssertionError.
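As a minimal, generic illustration (plain Python with made-up values, not the grader's actual code):

```python
# Minimal illustration of how an assert behaves (hypothetical values, not the grader's data).
expected = [[1.0, 2.0], [3.0, 4.0]]
actual = [[1.0, 2.0], [3.0, 4.1]]

assert expected[0] == actual[0]  # expression is True, so execution continues
assert expected[1] == actual[1]  # expression is False, so Python raises AssertionError here
```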
If you are unable to find the issue, send your encoder code via a personal DM. Click on my name and then Message.
Note that in your case the values do not match for the last row, based on the grader code.
I noticed that in the previous grader cell the angle code you used is different, even though you got the same result. Use np.power rather than writing out the mathematical expression directly.
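For reference, here is a hedged sketch of the standard angle computation using np.power; the get_angles signature and variable names are assumptions based on the notebook, not the official solution:

```python
import numpy as np

def get_angles(pos, k, d):
    # angle rates follow 1 / 10000^(2*(k//2)/d) from "Attention Is All You Need";
    # np.power keeps the computation vectorized and numerically consistent
    i = k // 2
    return pos / np.power(10000, (2 * i) / np.float32(d))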
There are errors in other grader cells too, and they are causing the current error you are getting. I will go through them one by one.
ERROR IN UNQ_C3 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
GRADED FUNCTION scaled_dot_product_attention
def scaled_dot_product_attention(q, k, v, mask):
a. matmul_qk (incorrect code: k is not transposed; hint: refer to the link given below to make the correction)
b. scale matmul_qk (incorrect code for both dk and scaled_attention_logits); refer to the link given below. A sketch of the standard formulation also follows this list.
c. "softmax is normalized on the last axis (seq_len_k) so that the scores add up to 1." (This code is written correctly but is missing the axis argument.)
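For orientation, here is a minimal sketch of the textbook scaled dot-product attention covering points a-c; it assumes the notebook's mask convention (1 = keep, 0 = mask) and is not the official solution:

```python
import tensorflow as tf

def scaled_dot_product_attention(q, k, v, mask):
    # (a) Q . K^T requires transposing k
    matmul_qk = tf.matmul(q, k, transpose_b=True)            # (..., seq_len_q, seq_len_k)

    # (b) scale by sqrt(dk), computing dk with TensorFlow ops
    dk = tf.cast(tf.shape(k)[-1], tf.float32)
    scaled_attention_logits = matmul_qk / tf.math.sqrt(dk)

    # push masked positions toward -infinity before the softmax
    if mask is not None:
        scaled_attention_logits += (1.0 - mask) * -1e9

    # (c) softmax normalized on the last axis (seq_len_k) so the scores add up to 1
    attention_weights = tf.nn.softmax(scaled_attention_logits, axis=-1)

    output = tf.matmul(attention_weights, v)                 # (..., seq_len_q, depth_v)
    return output, attention_weights
```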
ERROR IN UNQ_C4 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
GRADED FUNCTION EncoderLayer
class EncoderLayer(tf.keras.layers.Layer):
a. INCORRECT CODE (the instructions clearly say not to use training here, but you included training in your code):
calculate self-attention using mha(~1 line).
Dropout is added by Keras automatically if the dropout parameter is non-zero during training
b. Remove this part of the code, [0], training=training, from the skip_x_attention line.
c. Kindly use training=training only where it is mentioned. The first ffn_output does not require you to pass training=training, but the next step does (see the sketch after this list).
d. Again, training=training is not required for encoder_layer_output.
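To make points a-d concrete, here is a hedged sketch of just the call method of EncoderLayer, showing the one place training=training is needed; the layer attribute names (mha, layernorm1, ffn, dropout_ffn, layernorm2) are assumptions based on the notebook, not the official solution:

```python
def call(self, x, training, mask):
    # (a) self-attention via mha: no training argument and no [0] indexing
    self_mha_output = self.mha(x, x, x, mask)                # (batch, seq_len, embedding_dim)

    # (b) skip connection + layer norm, without training=training
    skip_x_attention = self.layernorm1(x + self_mha_output)

    # (c) the first ffn output takes no training flag...
    ffn_output = self.ffn(skip_x_attention)

    # ...but the dropout layer applied to it does
    ffn_output = self.dropout_ffn(ffn_output, training=training)

    # (d) final skip connection + layer norm, again without training=training
    encoder_layer_output = self.layernorm2(skip_x_attention + ffn_output)
    return encoder_layer_output
```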
ERROR IN UNQ_C5 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
GRADED FUNCTION
class Encoder(tf.keras.layers.Layer):
a. For the code below, use tf.cast rather than tf.constant (this is mentioned before the grader cell); see the snippet after this item.
Scale embedding by multiplying it by the square root of the embedding dimension
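A hedged snippet of how that scaling step typically looks inside Encoder.call, using tf.cast; self.embedding and self.embedding_dim are assumed attribute names from the notebook, not the official solution:

```python
x = self.embedding(x)                                        # (batch, seq_len, embedding_dim)
# scale the embedding by sqrt(d_model); tf.cast (not tf.constant) converts the int dimension to float
x *= tf.math.sqrt(tf.cast(self.embedding_dim, tf.float32))
```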
I have tried to correct the code as you suggested, but the problem still persists. Frankly, I don't quite get what was wrong with the code for dk and scaled_attention_logits, but I corrected it to use TensorFlow syntax. Again, all tests have passed except for the persistent wrong values in case 3. Please kindly advise further. I have DMed you my revised notebook. Thank you.