I got an AssertionError in Ex8 - Transformer Assignment

I had almost finished the Transformer assignment and passed all of the previous exercises, but I got an AssertionError in the last one:

AssertionError Traceback (most recent call last)
----> 2 Transformer_test(Transformer, create_look_ahead_mask, create_padding_mask)

~/work/W4A1/public_tests.py in Transformer_test(target, create_look_ahead_mask, create_padding_mask)
286 assert np.allclose(translation[0, 0, 0:8],
287 [0.017416516, 0.030932948, 0.024302809, 0.01997807,
→ 288 0.014861834, 0.034384135, 0.054789476, 0.032087505]), "Wrong values in translation"
290 keys = list(weights.keys())

AssertionError: Wrong values in translation

Assuming that you implemented all the previous functions correctly, the most likely cause is the choice of input sentences. See my annotated version of the Transformer overview.

Have you selected the appropriate inputs for the Encoder and the Decoder?

Solved, thank you. I called the decoder with the input_sentence instead of the output_sentence.
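To make that wiring concrete, here is a toy sketch of the call pattern. The names `input_sentence`/`output_sentence` follow the thread, but the encoder/decoder bodies below are trivial stand-ins, not the assignment's implementation:

```python
# Trivial stand-ins to illustrate which sequence goes where, NOT the real layers.
def encoder(input_sentence):
    # stand-in for enc_output: "encode" the SOURCE sequence
    return [w.upper() for w in input_sentence]

def decoder(output_sentence, enc_output):
    # the decoder consumes the TARGET sequence plus the encoder output;
    # passing input_sentence here is exactly the bug from this thread
    return list(zip(output_sentence, enc_output))

input_sentence = ["je", "suis", "ici"]    # source
output_sentence = ["i", "am", "here"]     # target

enc_output = encoder(input_sentence)
dec_output = decoder(output_sentence, enc_output)   # not decoder(input_sentence, ...)
```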

Just two more questions that got me puzzled:

  • Why don’t we use scaled_dot_product_attention anywhere?
  • I’m a little confused by how we used word embeddings in this assignment, since the video lectures didn’t go into it. So basically, did we reduce everything we did with word embeddings in Week 2 Assignment 2 down to an Embedding layer from TensorFlow? Sorry if this question sounds stupid, but I’m trying to understand every little detail of this exercise.

Here is an overview of the “encoder” portion of the Transformer.

‘Scaled dot product’ attention is one of the key functions in the Transformer, but it is part of the Keras MultiHeadAttention layer that we used for this assignment, so the function you implemented is not actually called. Still, given its importance, I suppose that is why it is included in this exercise.
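For reference, the function you implemented computes softmax(QKᵀ/√d_k)·V. Here is a minimal NumPy sketch of that formula (not the assignment's TensorFlow version, just the same math):

```python
import numpy as np

def scaled_dot_product_attention(q, k, v, mask=None):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = k.shape[-1]
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(d_k)   # (..., seq_q, seq_k)
    if mask is not None:
        scores += mask * -1e9                        # masked positions -> ~0 after softmax
    # softmax over the key axis
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

q = k = v = np.eye(3)                                # tiny toy inputs
out, w = scaled_dot_product_attention(q, k, v)
```

Inside Keras MultiHeadAttention, this same computation runs once per head on the projected Q/K/V.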

Word embeddings are what we learned in the first exercise of the 2nd week, not the 2nd exercise. The idea is to represent the meaning of a “word” by a “multi-dimensional vector”, so that if the meanings of two words are close, their vectors point in similar directions. This is useful for measuring the similarity of two words, and also for representing the characteristics of a word.
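“Similar direction” is usually measured with cosine similarity, as in the Word Vectors exercise. A small sketch with made-up 3-d vectors (real embeddings have hundreds of dimensions):

```python
import numpy as np

def cosine_similarity(u, v):
    # 1.0 = same direction, 0.0 = orthogonal (unrelated)
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# toy "embeddings" -- the numbers are invented purely for illustration
king  = np.array([0.90, 0.80, 0.10])
queen = np.array([0.85, 0.75, 0.20])
apple = np.array([0.10, 0.20, 0.95])

royal_sim = cosine_similarity(king, queen)   # close to 1: related meanings
fruit_sim = cosine_similarity(king, apple)   # much smaller: unrelated
```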

If you look at the figure above, you see another important input, the positional encoding, which is merged with the word embedding before being fed to the Encoder. Since a word embedding carries no position-related information, we need to add position information in order to create “attentions”; that is what the positional encoding does. The merged data is then fed to a multi-head attention layer, which includes the ‘scaled dot product’. (The figure also illustrates how the input Q/K/V is distributed to the multiple “heads” in the multi-head attention layer.)
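The standard sinusoidal positional encoding from the paper (and the assignment) alternates sine and cosine across the embedding dimensions. A NumPy sketch:

```python
import numpy as np

def positional_encoding(positions, d_model):
    pos = np.arange(positions)[:, None]            # (seq_len, 1)
    i = np.arange(d_model)[None, :]                # (1, d_model)
    # each dimension pair gets a different wavelength, from 2*pi up to 10000*2*pi
    angle_rates = 1.0 / np.power(10000, (2 * (i // 2)) / d_model)
    angles = pos * angle_rates                     # (seq_len, d_model)
    angles[:, 0::2] = np.sin(angles[:, 0::2])      # even indices: sine
    angles[:, 1::2] = np.cos(angles[:, 1::2])      # odd indices: cosine
    return angles

pe = positional_encoding(50, 16)
# in the model the encoding is simply added to the (scaled) embeddings:
# x = embedding(tokens) * sqrt(d_model) + pe[:seq_len]
```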

Hope this helps.

Sorry, I meant the second week. About word embeddings and positional encoding, I know what they are and what they do.

What I’m trying to figure out is whether the Embedding layer we imported from TensorFlow Keras does more or less what we accomplished in Week 2 Assignment 2.

Edit: BTW, I’d love to know how you got those amazing images.

I see your point, and it is actually a good one.

In our exercise for “Word Vectors”, we used GloVe vectors, which are among the most commonly used word vectors. (Another famous one is ‘word2vec’.)

On the other hand, in this Transformer exercise we used the Keras Embedding layer. Remember that we imported and set it up:

from tensorflow.keras.layers import Embedding, MultiHeadAttention, Dense, Input, Dropout, LayerNormalization

self.embedding = Embedding(input_vocab_size, self.embedding_dim)

So it is not a GloVe vector that we used. This Keras Embedding layer is “trainable”: if we were to use it in a real project, we would need to “train” it, or load its weights (the embedding matrix) from another technique such as GloVe or word2vec and make it non-trainable (trainable=False). Since this is not for commercial use but for learning, we simply used the Keras Embedding layer with its default settings.
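If you did want to plug pretrained vectors into the same Embedding layer, it would look roughly like this (the random matrix below is just a stand-in for a real GloVe matrix):

```python
import numpy as np
import tensorflow as tf

vocab_size, embedding_dim = 5, 4
# stand-in for a pretrained GloVe/word2vec matrix, one row per vocabulary word
pretrained = np.random.rand(vocab_size, embedding_dim).astype("float32")

frozen = tf.keras.layers.Embedding(
    vocab_size, embedding_dim,
    embeddings_initializer=tf.keras.initializers.Constant(pretrained),
    trainable=False)                         # weights stay fixed during training

out = frozen(tf.constant([1, 3]))            # lookup rows 1 and 3
```

With `trainable=False`, the layer is a pure table lookup into the pretrained matrix; with the assignment's default (trainable) setting, those rows are learned from scratch like any other weights.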

Hope this helps.

Thank you so much for the detailed explanations!