Dimensional size error for C4W1_Assignment Decoder test

For Exercise 3 Decoder Test, I got:


Tensor of contexts has shape: (64, 14, 256)
Tensor of right-shifted translations has shape: (64, 15)
Tensor of logits has shape: (64, 256)

Expected Output
Tensor of contexts has shape: (64, 14, 256)
Tensor of right-shifted translations has shape: (64, 15)
Tensor of logits has shape: (64, 15, 12000)

Can someone help me fix this dimension error?
Here is the snippet of my code for the Exercise 3 Decoder part:


The dense layer with log-softmax activation

    self.output_layer = tf.keras.layers.Dense(
        units=units,
        activation= tf.nn.log_softmax
    ) 

Hi, Zhiyi_Li2

If I remember correctly, I encountered a similar issue. The problem was related to the input of the embedding layer (edit: or output layer). Perhaps the issue is related to the parameters you are passing to that layer.

I hope it proves helpful. Regards

Thanks. Part of my code snippet is:


x = context
y = self.embedding(target)

x, hidden_state, cell_state = self.pre_attention_rnn(x, initial_state=None)

x = self.attention(x, y)

x = self.post_attention_rnn(x)


Not sure which step is wrong.

You are mixing up x and y. Moreover, y is not present in the assignment, and the line “x = context” is not part of the task either. Similarly, in the assignment the result of self.embedding is stored in x, not y.


Got it. I revised my code snippet, but the output dimensions are still not correct:
Tensor of contexts has shape: (64, 18, 256)
Tensor of right-shifted translations has shape: (64, 14)
Tensor of logits has shape: (64, 256)

Expected Output
Tensor of contexts has shape: (64, 14, 256)
Tensor of right-shifted translations has shape: (64, 15)
Tensor of logits has shape: (64, 15, 12000)

Here is my code snippet:


    x = self.embedding(target)

    # Pass the embedded input into the pre attention LSTM
    # Hints:
    # - The LSTM you defined earlier should return the output alongside the state (made up of two tensors)
    # - Pass in the state to the LSTM (needed for inference)
    x, hidden_state, cell_state = self.pre_attention_rnn(x, initial_state=None)

    # Perform cross attention between the context and the output of the LSTM (in that order)
    x = self.attention(context, x) # 

    # Do a pass through the post attention LSTM
    x = self.post_attention_rnn(x)

    # Compute the logits
    logits = self.output_layer(x)

I changed the output layer as:

self.output_layer = tf.keras.layers.Dense(
units =vocab_size,
activation= tf.nn.log_softmax
)
The output looks like:


Tensor of contexts has shape: (64, 17, 256)
Tensor of right-shifted translations has shape: (64, 21)
Tensor of logits has shape: (64, 12000)

Expected Output
Tensor of contexts has shape: (64, 14, 256)
Tensor of right-shifted translations has shape: (64, 15)
Tensor of logits has shape: (64, 15, 12000)
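For reference, `tf.keras.layers.Dense` acts only on the last axis of its input; every other axis passes through unchanged. So the logits can only have a time dimension if the tensor entering the Dense layer still has one. A shape-only sketch in plain Python (not the actual Keras layer), using the shapes quoted in this thread:

```python
# Dense(units) replaces the last axis of the input shape with `units`;
# all leading axes (batch, time, ...) are kept as-is.
def dense_output_shape(input_shape, units):
    return input_shape[:-1] + (units,)

# If the post-attention LSTM dropped the time axis, Dense cannot restore it:
print(dense_output_shape((64, 256), 12000))      # (64, 12000) - what the thread sees
# If the time axis survives, Dense produces per-timestep logits:
print(dense_output_shape((64, 15, 256), 12000))  # (64, 15, 12000) - expected
```

This is why changing `units` to `vocab_size` fixed the last dimension (12000) but not the missing time dimension (15).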

Still not correct.


I think the problem is caused by post_attention_rnn.


Here is my code snippet:

The RNN after attention

    self.post_attention_rnn = tf.keras.layers.LSTM(
        units=units,
        return_sequences=False
    )  

Output size is not correct.

Hi Zhiyi_Li2,

Your code appears to be in good shape. However, there is a policy regarding displaying code here.

The issue might be in another layer (maybe in the embedding with the parameters). Typically, @arvyzukai and gent.spah are the individuals who assist me with my questions.

If you’d like to get in touch with them, I’m sure they will help you with your issue.

Best regards.

Thanks, you may be right. I think it is just a simple dimension issue; I'll wait for an NLP master to jump in and help. I will follow the code policy and not show the code next time.

@arvyzukai Can you help to see what is wrong for this post_attention_rnn:


self.post_attention_rnn = tf.keras.layers.LSTM(
units=units,
return_sequences=False
)


The output size is (64, 256); the correct one should be (64, 15, 256).

Hi @Zhiyi_Li2

There is nothing wrong with your post_attention_rnn.

Looking at the dimensions, it seems that you lost the sequence dimension somewhere. In other words, the problem should lie in your call() implementation. Please pay close attention to the code hints and also the instructions.

Let me know if you find any of them confusing.
Cheers
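To see how the sequence dimension gets lost: with `return_sequences=False`, a Keras LSTM returns only the last timestep's output, collapsing `(batch, timesteps, units)` to `(batch, units)`. A shape-only sketch of that behavior (plain Python, shapes assumed from the thread):

```python
# Output shape of tf.keras.layers.LSTM for a (batch, timesteps, features) input:
# return_sequences=True  -> one output per timestep, (batch, timesteps, units)
# return_sequences=False -> only the last timestep,  (batch, units)
def lstm_output_shape(batch, timesteps, units, return_sequences):
    return (batch, timesteps, units) if return_sequences else (batch, units)

print(lstm_output_shape(64, 14, 256, True))   # (64, 14, 256)
print(lstm_output_shape(64, 14, 256, False))  # (64, 256) - the shape seen after post_attention_rnn
```

This matches the printouts below: `(64, 14, 256)` going in, `(64, 256)` coming out.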

I tested by printing out the shape before and after the post-attention function:


The RNN after attention

    self.post_attention_rnn = tf.keras.layers.LSTM(
        units=units,
        return_sequences=False
    )  

Output:
x.shape after attention: (64, 14, 256)
x.shape post attention: (64, 256)

Tensor of right-shifted translations has shape: (64, 14)
Tensor of logits has shape: (64, 12000)

Expected Output
Tensor of contexts has shape: (64, 14, 256)
Tensor of right-shifted translations has shape: (64, 15)
Tensor of logits has shape: (64, 15, 12000)

For the right-shifted translations I wrote a code snippet like:


target_emb = self.embedding(target)
# Pass a batch of sentences to translate from English to Portuguese
# encoder(to_translate)
x = context
# Pass the embedded input into the pre attention LSTM
# Hints:
# - The LSTM you defined earlier should return the output alongside the state (made up of two tensors)
# - Pass in the state to the LSTM (needed for inference)
target_x, hidden_state, cell_state = self.pre_attention_rnn(target_emb, initial_state=state)

# Perform cross attention between the context and the output of the LSTM (in that order)
x = self.attention(x, target_x)

Is something wrong with how I handle target? I noticed the dimension should be 15 instead of 14.

Something seems wrong in the target operations.

Moving one step further: after I changed the code to:


self.post_attention_rnn = tf.keras.layers.LSTM(
units=units,
return_sequences=True,
return_state=False
)
The output sequence is much better:
Tensor of contexts has shape: (64, 15, 256)
Tensor of right-shifted translations has shape: (64, 14)
Tensor of logits has shape: (64, 14, 12000)

Expected Output
Tensor of contexts has shape: (64, 14, 256)
Tensor of right-shifted translations has shape: (64, 15)
Tensor of logits has shape: (64, 15, 12000)

But I noticed some dimensions are still not aligned: they should be 15 instead of 14.
Any idea?

Never mind, all tests passed. I think the system has a problem with Exercise 4.

Correction: Exercise 3.

Hi @Zhiyi_Li2

Have you passed the Assignment?

The reason for the 14 and 15 mismatch should be the use of context (14) versus target (15). So make sure you embed the target in the Exercise 5 decoder (not the context).

Cheers
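That 14 vs. 15 mismatch also follows from how cross-attention shapes work: the attention output keeps the sequence length of the query (the embedded target), while the context only sets what is attended over. A shape-only sketch (plain Python, assuming the query-first convention the mentor describes):

```python
# Cross-attention output has the query's sequence length; the context
# (keys/values) contributes only the attended information, not the length.
def cross_attention_output_shape(query_shape, context_shape):
    batch, query_len, d_model = query_shape
    return (batch, query_len, d_model)

# Query built from the target (length 15), context from the source (length 14):
print(cross_attention_output_shape((64, 15, 256), (64, 14, 256)))  # (64, 15, 256)
```

If the context (length 14) is embedded instead of the target, every downstream shape inherits 14 where the tests expect 15.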

Similar problem here. Solved. The assignment doc states “Post-attention lstm. Another LSTM layer. For this one you don’t need it to return the state.” True in theory: the state information is not required. However, return_sequences needs to be True for post_attention_rnn in order to pass the checks and tests in the assignment (my model is training now…)
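Putting the whole thread together, the expected logits shape falls out of a simple shape trace through the decoder. A shape-only sketch (plain Python, not the actual Keras model), assuming batch=64, target length 15, units=256, and vocab_size=12000 as in the expected output:

```python
# Shape trace through the decoder's call(), step by step.
def decoder_logits_shape(batch=64, tgt_len=15, units=256, vocab_size=12000):
    x = (batch, tgt_len, units)   # embedding of the right-shifted target
    # pre-attention LSTM with return_sequences=True keeps (batch, tgt_len, units)
    # cross attention keeps the query's (target's) length: still (batch, tgt_len, units)
    # post-attention LSTM must ALSO use return_sequences=True to keep tgt_len
    return x[:-1] + (vocab_size,)  # Dense(vocab_size) maps only the last axis

print(decoder_logits_shape())  # (64, 15, 12000)
```

Dropping `return_sequences=True` at the post-attention LSTM, or using `units` instead of `vocab_size` in the Dense layer, reproduces the two wrong shapes seen earlier in the thread.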

It doesn’t matter; if you run all the code again, it could become 13, 14, or 15.