C4W2_Ex4-Transformer: wrong shape for transformer

For the Ex4 Transformer implementation, I am getting the correct attention weights, but the output of my Transformer has shape (1, 7, 13) instead of the expected (1, 7, 350).

For enc_output, I am passing input_sentence, training, and enc_padding_mask to the encoder.

For dec_output, I am passing output_sentence, enc_output, training, look_ahead_mask, and dec_padding_mask to the decoder.

Any suggestions on what I’m doing wrong?

Hello @Yuichi_Tamano

Can you share a screenshot of the error you are talking about? Do not post any code; only post a screenshot of the complete error. If the error log is too lengthy, take two screenshots.

Regards
DP

Hi @Yuichi_Tamano

You’re probably using the wrong variable (most likely when defining/initializing the decoder):

emb_d = 13
target_vocab_size = 350
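
For context, here is a minimal standalone sketch (not the assignment code; the variable names just mirror the hint above) of how the final Dense layer determines the last dimension of the Transformer output:

```python
import tensorflow as tf

embedding_dim = 13       # feature dimension coming out of the decoder
target_vocab_size = 350  # size of the target vocabulary

# The decoder returns (batch, target_seq_len, embedding_dim) ...
dec_output = tf.random.uniform((1, 7, embedding_dim))

# ... and the final Dense layer maps that last dimension to the vocabulary size.
final_layer = tf.keras.layers.Dense(target_vocab_size, activation='softmax')
print(final_layer(dec_output).shape)  # (1, 7, 350)

# If this layer is built with embedding_dim by mistake (or never applied),
# the last dimension stays at 13, matching the reported (1, 7, 13).
```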

Cheers

This is the output from the test case:


I highlighted where my transformer shape has the wrong dimensions.

When I run the unit tests, I get the following error:

Thanks!

Hi @Yuichi_Tamano

In def call (the forward pass for the entire Transformer), look at the code line below the comment

# pass decoder output through a linear layer and softmax (~1 line)

Make sure you have used the correct layer there; recall which layer turns the decoder output into the final output. See the hint below:

self.final_layer = tf.keras.layers.Dense(target_vocab_size, activation='softmax')

If you have used the correct layer for the code line mentioned above, then, as arvy suggested, please check the code lines below.

For the line commented

# scale embeddings by multiplying by the square root of their dimension

you are supposed to apply tf.math.sqrt to tf.cast of self.embedding_dim, using the correct datatype (tf.float32).

Then, in the next code line, you are supposed to add the positional encoding to the word embedding using self.pos_encoding[:, :seq_len, :]. A minimal sketch of these two steps follows.
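
For illustration, here is that pair of steps written as a standalone function (in the assignment, embedding_dim and pos_encoding are the self.embedding_dim and self.pos_encoding attributes of the Encoder/Decoder, so treat this only as a shape-level sketch):

```python
import tensorflow as tf

def scale_and_add_position(x, pos_encoding, embedding_dim):
    # x:            word embeddings, shape (batch, seq_len, embedding_dim)
    # pos_encoding: precomputed encodings, shape (1, max_len, embedding_dim)
    seq_len = tf.shape(x)[1]
    # scale embeddings by multiplying by the square root of their dimension
    x *= tf.math.sqrt(tf.cast(embedding_dim, tf.float32))
    # add the positional encoding, sliced to the actual sequence length
    x += pos_encoding[:, :seq_len, :]
    return x

# quick shape check with dummy data
out = scale_and_add_position(tf.random.uniform((1, 7, 13)),
                             tf.random.uniform((1, 50, 13)), 13)
print(out.shape)  # (1, 7, 13)
```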

If all of the above are already in place and the error persists, please share your code via DM for review.

Regards
DP


Ah yes, that was the issue! I was not calling self.final_layer – I didn’t notice this layer was defined and instead was trying to pass dec_output directly to the softmax function.

Once I fixed this, everything worked fine.

Thank you @Deepti_Prasad !
