In the course assignment for Neural Machine Translation:

In the decoder, we compute the logits by applying log_softmax as the final step.

But during training we compute softmax again, because we use SparseCategoricalCrossentropy as the loss function. Isn't it wrong to do so?

We already convert the scores into (log) probabilities with log_softmax, and then SparseCategoricalCrossentropy applies softmax on top of them again, since we pass these values in as logits. Aren't we computing a probability over a probability?
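What the question describes can be checked numerically. Since log_softmax(z) = z − logsumexp(z) only shifts each row by a constant, and softmax is invariant to per-row shifts, applying softmax on top of log_softmax outputs reproduces the same probabilities. A minimal NumPy sketch (illustrative only, not the assignment's code):

```python
import numpy as np

def log_softmax(z, axis=-1):
    # log_softmax(z) = z - logsumexp(z), computed stably
    z = z - z.max(axis=axis, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 10))        # batch of 4, vocabulary of 10

p_direct = softmax(logits)               # softmax on raw scores
p_double = softmax(log_softmax(logits))  # softmax on log_softmax output

# log_softmax only shifts each row by a constant (its logsumexp),
# and softmax is invariant to per-row shifts, so both agree:
print(np.allclose(p_direct, p_double))   # True
```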

Remember, the decoder's softmax layer is used to compute the scores (logits) for every possible word in the vocabulary,

whereas

this builds a translation model from the encoder and decoder classes recalled before, using both the context and the target, and trains it on the overall training data.

If you check the outputs of the two different Translator graded cells, you will notice the difference: one includes the context and one does not:

```
Tensor of contexts has shape: (64, 14, 256)
```

```
Tensor of sentences to translate has shape: (64, 14)
```
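Concretely, the (64, 14) tensor holds one token id per position for a batch of 64 sentences, while the encoder turns each position into a 256-dimensional context vector. A toy NumPy sketch of that shape change (the vocabulary size and the random embedding are placeholders, not the assignment's values, and the real encoder also runs an RNN over the sequence):

```python
import numpy as np

rng = np.random.default_rng(0)

batch, seq_len, units, vocab = 64, 14, 256, 1000  # vocab size is a placeholder

# (64, 14): one token id per position, for each of the 64 sentences
tokens = rng.integers(0, vocab, size=(batch, seq_len))

# Stand-in for the encoder: an embedding lookup into 256-d vectors,
# shown only to make the shape transformation visible.
embedding = rng.normal(size=(vocab, units))
context = embedding[tokens]

print(tokens.shape)   # (64, 14)
print(context.shape)  # (64, 14, 256)
```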

Now notice this:

Train the translator (this takes a few minutes, so feel free to take a break):

```
trained_translator, history = compile_and_train(translator)
```

So this step trains the Translator model on the full training data, whereas the cell you mention runs on the predefined batch of sentences used to translate English to Portuguese.

So this is more about the attention model than just the probabilities: the compile_and_train call uses masked_loss as the loss and masked_accuracy as the metric, so that padding tokens do not distort the results on the training data.
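To illustrate what the masking is for, here is a rough NumPy sketch of masked loss and accuracy, assuming padding id 0. The real assignment helpers are built on tf.keras's SparseCategoricalCrossentropy; this version only shows the idea of averaging over non-padding positions:

```python
import numpy as np

def log_softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))

def masked_loss(y_true, logits, pad_id=0):
    """Sparse cross-entropy averaged over non-padding positions only."""
    logp = log_softmax(logits)                   # (batch, seq, vocab)
    # log-probability of the true token at each position
    nll = -np.take_along_axis(logp, y_true[..., None], axis=-1).squeeze(-1)
    mask = (y_true != pad_id).astype(nll.dtype)  # 0 where padded
    return (nll * mask).sum() / mask.sum()

def masked_accuracy(y_true, logits, pad_id=0):
    """Token accuracy over non-padding positions only."""
    pred = logits.argmax(axis=-1)
    mask = y_true != pad_id
    return (pred[mask] == y_true[mask]).mean()

rng = np.random.default_rng(0)
y_true = np.array([[5, 3, 0, 0], [2, 7, 1, 0]])  # 0 = padding
logits = rng.normal(size=(2, 4, 10))

print(masked_loss(y_true, logits))
print(masked_accuracy(y_true, logits))
```

The point is that padded positions contribute nothing to either number, so short sentences in a batch do not artificially inflate the accuracy or dilute the loss.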

Regards

DP

This is not very clear to me. I still have some questions on this; can we connect on this sometime?

You can ask any further doubts here, and I will try my best to address them.

In any case, going through this read can also help you:

https://community.deeplearning.ai/uploads/short-url/nItXg2jZdMlR412J5Umphm4zFrF.pdf

Regards

DP