C4W2_Ex4-Transformer: wrong shape for transformer

For the Ex4 Transformer implementation, I am getting the correct attention weights, but the output of my Transformer has shape (1, 7, 13) instead of the expected (1, 7, 350).

For enc_output, I am passing input_sentence, training, and enc_padding_mask to the encoder.

For dec_output, I am passing output_sentence, enc_output, training, look_ahead_mask, and dec_padding_mask to the decoder.

Any suggestions on what I’m doing wrong?

Hello @Yuichi_Tamano

Can you share a screenshot of the error you are talking about? Do not post any code; only post a screenshot of the complete error. If the error log is too lengthy, take two screenshots.

Regards
DP

Hi @Yuichi_Tamano

You’re probably using the wrong variable (most likely when defining/initializing the decoder):

emb_d = 13
target_vocab_size = 350
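
For context, here is a minimal standalone sketch (not the assignment code; the variable names just mirror the hint above) of how the final Dense layer determines the last dimension of the Transformer output:

```python
import tensorflow as tf

embedding_dim = 13       # feature dimension coming out of the decoder
target_vocab_size = 350  # size of the target vocabulary

# The decoder returns (batch, target_seq_len, embedding_dim) ...
dec_output = tf.random.uniform((1, 7, embedding_dim))

# ... and the final Dense layer maps that last dimension to the vocabulary size.
final_layer = tf.keras.layers.Dense(target_vocab_size, activation='softmax')
print(final_layer(dec_output).shape)  # (1, 7, 350)

# If this layer is built with embedding_dim by mistake (or never applied),
# the last dimension stays at 13, matching the reported (1, 7, 13).
```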

Cheers

This is the output from the test case:


I highlighted where my transformer shape has the wrong dimensions.

When I run the unit tests, I get the following error:

Thanks!

Hi @Yuichi_Tamano

In def call (the forward pass for the entire Transformer), look at the code line below the comment

# pass decoder output through a linear layer and softmax (~1 line)

Make sure you have used the correct layer there; recall which layer turns the decoder output into the final output. See the hint below:

self.final_layer = tf.keras.layers.Dense(target_vocab_size, activation='softmax')

If you have used the correct layer for the code line mentioned above, then, as arvy suggested, please check the code lines below.

For the line commented

# scale embeddings by multiplying by the square root of their dimension

you are supposed to apply tf.math.sqrt to tf.cast of self.embedding_dim, using the correct datatype (tf.float32).

Then, in the next code line, you are supposed to add the positional encoding to the word embedding using self.pos_encoding[:, :seq_len, :]. A minimal sketch of these two steps follows.
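
For illustration, here is that pair of steps written as a standalone function (in the assignment, embedding_dim and pos_encoding are the self.embedding_dim and self.pos_encoding attributes of the Encoder/Decoder, so treat this only as a shape-level sketch):

```python
import tensorflow as tf

def scale_and_add_position(x, pos_encoding, embedding_dim):
    # x:            word embeddings, shape (batch, seq_len, embedding_dim)
    # pos_encoding: precomputed encodings, shape (1, max_len, embedding_dim)
    seq_len = tf.shape(x)[1]
    # scale embeddings by multiplying by the square root of their dimension
    x *= tf.math.sqrt(tf.cast(embedding_dim, tf.float32))
    # add the positional encoding, sliced to the actual sequence length
    x += pos_encoding[:, :seq_len, :]
    return x

# quick shape check with dummy data
out = scale_and_add_position(tf.random.uniform((1, 7, 13)),
                             tf.random.uniform((1, 50, 13)), 13)
print(out.shape)  # (1, 7, 13)
```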

If all of the above are already in place and the error persists, please share your code via DM for review.

Regards
DP


Ah yes, that was the issue! I was not calling self.final_layer – I didn’t notice this layer was defined and instead was trying to pass dec_output directly to the softmax function.

Once I fixed this, everything worked fine.

Thank you @Deepti_Prasad !
