For Exercise 3 Decoder Test, I got:

```
Tensor of contexts has shape: (64, 14, 256)
Tensor of right-shifted translations has shape: (64, 15)
Tensor of logits has shape: (64, 256)
```

**Expected Output**

```
Tensor of contexts has shape: (64, 14, 256)
Tensor of right-shifted translations has shape: (64, 15)
Tensor of logits has shape: (64, 15, 12000)
```

Can someone help me fix this dimension error?

The Exercise 3 Decoder part snippet of code is:

# The dense layer with logsoftmax activation

```
self.output_layer = tf.keras.layers.Dense(
units=units,
activation= tf.nn.log_softmax
)
```
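As an aside, here is a quick numpy sketch (illustrative only, not assignment code) of what the log-softmax activation does: it turns raw scores into log-probabilities over the last axis, so exponentiating the result gives probabilities that sum to 1.

```python
import numpy as np

def log_softmax(x, axis=-1):
    # Subtract the max for numerical stability, then normalize in log space
    shifted = x - x.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

logits = np.random.randn(2, 3, 5)  # (batch, seq, vocab)
log_probs = log_softmax(logits)

# Exponentiating recovers probabilities that sum to 1 over the last axis
probs_sum = np.exp(log_probs).sum(axis=-1)
print(probs_sum)  # all entries are (numerically) 1.0
```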

Hi, Zhiyi_Li2

If I remember correctly, I encountered a similar issue. The problem was related to the input of the embedding layer (edit: or the output layer). Perhaps the issue lies in the argument you are passing to that layer.

I hope it proves helpful. Regards

Thanks, part of code snippet is:

```
x = context

y = self.embedding(target)

x, hidden_state, cell_state = self.pre_attention_rnn(x, initial_state=None)

x = self.attention(x, y)

x = self.post_attention_rnn(x)
```

Not sure which step is wrong.

You are mixing up x and y. Moreover, y is not present in the assignment. The line "x = context" is not part of the task either. Similarly, in the assignment the result of self.embedding is stored in x, not y.

1 Like

Got it. I revised my code snippet, but the output dimensions are still not correct:

```
Tensor of contexts has shape: (64, 18, 256)
Tensor of right-shifted translations has shape: (64, 14)
Tensor of logits has shape: (64, 256)
```

**Expected Output**

```
Tensor of contexts has shape: (64, 14, 256)
Tensor of right-shifted translations has shape: (64, 15)
Tensor of logits has shape: (64, 15, 12000)
```

Here is my code snippet:

```
x = self.embedding(target)
# Pass the embedded input into the pre attention LSTM
# Hints:
# - The LSTM you defined earlier should return the output alongside the state (made up of two tensors)
# - Pass in the state to the LSTM (needed for inference)
x, hidden_state, cell_state = self.pre_attention_rnn(x, initial_state=None)
# Perform cross attention between the context and the output of the LSTM (in that order)
x = self.attention(context, x)
# Do a pass through the post attention LSTM
x = self.post_attention_rnn(x)
# Compute the logits
logits = self.output_layer(x)
```

I changed the output layer as:

```
self.output_layer = tf.keras.layers.Dense(
    units=vocab_size,
    activation=tf.nn.log_softmax
)
```
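For what it's worth, a Dense layer only transforms the last axis of its input, so with `units=vocab_size` a `(batch, seq, units)` tensor maps to `(batch, seq, vocab_size)`. A minimal numpy sketch of that shape behavior (illustrative only, not assignment code):

```python
import numpy as np

batch, seq, units, vocab_size = 64, 15, 256, 12000

x = np.random.randn(batch, seq, units)         # post-attention LSTM output
W = np.random.randn(units, vocab_size) * 0.01  # Dense kernel
b = np.zeros(vocab_size)                       # Dense bias

# Matmul broadcasts over the leading (batch, seq) axes;
# only the last axis changes from units to vocab_size
logits = x @ W + b
print(logits.shape)  # (64, 15, 12000)
```

So if the logits come out as `(64, 12000)` instead, the sequence axis was already gone before the output layer was applied.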

The output looks like:

```
Tensor of contexts has shape: (64, 17, 256)
Tensor of right-shifted translations has shape: (64, 21)
Tensor of logits has shape: (64, 12000)
```

**Expected Output**

```
Tensor of contexts has shape: (64, 14, 256)
Tensor of right-shifted translations has shape: (64, 15)
Tensor of logits has shape: (64, 15, 12000)
```

Still not correct.

1 Like

I think the problem is caused by post_attention_rnn.

Here is my code snippet:

# The RNN after attention

```
self.post_attention_rnn = tf.keras.layers.LSTM(
units=units,
return_sequences=False
)
```

The output size is not correct.

Hi Zhiyi_Li2,

Your code appears to be in good shape. However, there is a policy regarding displaying code here.

The issue might be in another layer (maybe in the embedding with the parameters). Typically, @arvyzukai and gent.spah are the individuals who assist me with my questions.

If you’d like to get in touch with them, I’m sure they will help you with your issue.

Best regards.

Thanks, you may be right. I think it's just a simple dimension issue. I'll wait for an NLP mentor to jump in and help. I will obey the code policy and not show the code next time.

@arvyzukai Can you help me see what is wrong with this post_attention_rnn:

```
self.post_attention_rnn = tf.keras.layers.LSTM(
    units=units,
    return_sequences=False
)
```

The output size is (64, 256); the correct one should be (64, 15, 256).

Hi @Zhiyi_Li2

There is nothing wrong with your `post_attention_rnn`.

Looking at the dimensions it seems that you lost the sequence dimension somewhere. In other words, the problem should lie in your `call()` implementation. Please pay close attention to the code hints and also the instructions.

Let me know if you find any of them confusing.

Cheers

I tested by printing the shapes before and after the post-attention LSTM:

# The RNN after attention

```
self.post_attention_rnn = tf.keras.layers.LSTM(
units=units,
return_sequences=False
)
```

Output:

```
x.shape after attention: (64, 14, 256)
x.shape post attention: (64, 256)
Tensor of right-shifted translations has shape: (64, 14)
Tensor of logits has shape: (64, 12000)
```

**Expected Output**

```
Tensor of contexts has shape: (64, 14, 256)
Tensor of right-shifted translations has shape: (64, 15)
Tensor of logits has shape: (64, 15, 12000)
```

For the right-shifted translations I wrote a code snippet like:

```
target_emb = self.embedding(target)

# Pass a batch of sentences to translate from english to portugues
# encoder(to_translate)
x = context

# Pass the embedded input into the pre attention LSTM
# Hints:
# - The LSTM you defined earlier should return the output alongside the state (made up of two tensors)
# - Pass in the state to the LSTM (needed for inference)
target_x, hidden_state, cell_state = self.pre_attention_rnn(target_emb, initial_state=state)

# Perform cross attention between the context and the output of the LSTM (in that order)
x = self.attention(x, target_x)
```

Is something wrong with how I handle target? I noticed the dimension should be 15 instead of 14, so something seems wrong in the target operations.

Moving one step further: after I changed the code to:

```
self.post_attention_rnn = tf.keras.layers.LSTM(
    units=units,
    return_sequences=True,
    return_state=False
)
```
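For reference, a plain numpy sketch (illustrative only, not the real LSTM math) of what `return_sequences` changes: an LSTM produces a hidden state at every timestep; `return_sequences=True` keeps all of them, while `False` keeps only the last one, which drops the sequence axis.

```python
import numpy as np

batch, seq, units = 64, 15, 256
outputs = np.random.randn(batch, seq, units)  # one hidden state per timestep

# return_sequences=True: keep the full sequence of hidden states
seq_out = outputs                # shape (64, 15, 256)

# return_sequences=False: keep only the final timestep's hidden state
last_out = outputs[:, -1, :]     # shape (64, 256)

print(seq_out.shape, last_out.shape)
```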

The output sequence is much better:

```
Tensor of contexts has shape: (64, 15, 256)
Tensor of right-shifted translations has shape: (64, 14)
Tensor of logits has shape: (64, 14, 12000)
```

**Expected Output**

```
Tensor of contexts has shape: (64, 14, 256)
Tensor of right-shifted translations has shape: (64, 15)
Tensor of logits has shape: (64, 15, 12000)
```

But I noticed some dimensions are still not aligned: they should be 15 instead of 14.

Any idea ?

Never mind, all tests passed. I think the system has a problem in Exercise 4.

Hi @Zhiyi_Li2

Have you passed the Assignment?

The reason for the 14 and 15 mismatch should be the use of `context` (14) and `target` (15). So make sure you embed the `target` in the Exercise 5 decoder (not the `context`).

Cheers
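To add a quick illustration of that point: in cross-attention, the output sequence length comes from the *query* (the target side), not from the keys/values (the context side). A numpy sketch with the thread's shapes (illustrative only, not assignment code):

```python
import numpy as np

batch, d = 64, 256
Tq, Tk = 15, 14  # target (query) length vs context (key/value) length

q = np.random.randn(batch, Tq, d)  # queries: pre-attention LSTM output over target
k = np.random.randn(batch, Tk, d)  # keys: encoder context
v = np.random.randn(batch, Tk, d)  # values: encoder context

scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)      # (64, 15, 14)
# Softmax over the key axis (numerically stabilized)
w = np.exp(scores - scores.max(-1, keepdims=True))
weights = w / w.sum(-1, keepdims=True)
out = weights @ v                                    # (64, 15, 256)
print(out.shape)  # the query (target) length 15 survives, not the context's 14
```

So if the query and the context are swapped, the output inherits the context's length 14 instead, which matches the off-by-one symptom above.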

Similar problem here. Solved. The assignment doc states "Post-attention LSTM. Another LSTM layer. For this one you don't need it to return the state." True in theory, the state information is not required. However, return_sequences needs to be True for post_attention_rnn in order to pass the checks and tests in the assignment (my model is training now…).

It doesn't matter; if you run all the code again, it could become 13, 14, or 15.