C4_W3, regarding use of "from_logits=True" during training

As we can see in the first image, we used “from_logits=True” in loss_object when specifying the loss function, and then passed that object to the transformer_utils.train_step() method, as shown in the second image.
But when I looked into the “masked_loss()” method in transformer_utils.py, I found that the mask is defined as “mask = tf.math.logical_not(tf.math.equal(real, 0))”, which I assumed would only be correct if we had specified “from_logits=False”.

But then the training works. Can you explain what is happening here?
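For reference, the code I am asking about looks roughly like this (a paraphrase from memory; the exact contents of transformer_utils.py may differ in details):

```python
import tensorflow as tf

# Loss object from the notebook: reduction='none' so each token's loss can
# be masked individually before averaging.
loss_object = tf.keras.losses.SparseCategoricalCrossentropy(
    from_logits=True, reduction='none')

# masked_loss from transformer_utils.py (paraphrased):
def masked_loss(real, pred):
    # The line in question: positions where the target id is 0 (padding)
    # are excluded from the loss.
    mask = tf.math.logical_not(tf.math.equal(real, 0))
    loss_ = loss_object(real, pred)
    mask = tf.cast(mask, dtype=loss_.dtype)
    loss_ *= mask
    return tf.reduce_sum(loss_) / tf.reduce_sum(mask)
```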

Hi @God_of_Calamity

When from_logits=True, the model’s output is treated as raw logits, and the loss function applies the softmax internally to obtain probabilities before computing the loss.
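Here is a quick check of that equivalence; the logits and labels below are made-up values:

```python
import tensorflow as tf

logits = tf.constant([[2.0, 1.0, 0.1, -1.0],
                      [0.5, 2.5, 0.3, 0.0]])  # made-up raw model outputs
labels = tf.constant([0, 1])                  # made-up target ids

# from_logits=True: the loss applies softmax to the logits internally.
loss_a = tf.keras.losses.sparse_categorical_crossentropy(
    labels, logits, from_logits=True)

# Equivalent: apply softmax ourselves, then use from_logits=False.
loss_b = tf.keras.losses.sparse_categorical_crossentropy(
    labels, tf.nn.softmax(logits), from_logits=False)

print(loss_a.numpy(), loss_b.numpy())  # identical (up to float precision)
```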
Now, the mask mask=tf.math.logical_not(tf.math.equal(real, 0)) is computed from real, the target sequence, and not from the model’s predictions. It simply identifies the padded positions (token id 0) in the targets, so it is valid regardless of whether pred holds logits or probabilities. The from_logits setting and the mask are independent of each other, which is why the training works correctly despite the apparent discrepancy.
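You can verify that the mask never touches the predictions; with a made-up target sequence:

```python
import tensorflow as tf

real = tf.constant([[7, 42, 3, 0, 0]])  # made-up target ids; 0 = padding
mask = tf.math.logical_not(tf.math.equal(real, 0))
print(mask.numpy())  # [[ True  True  True False False]]
# pred does not appear anywhere here, so the mask is the same whether the
# model outputs logits or probabilities.
```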


I think it is actually wrong, since the Transformer used here applies a softmax in its final layer, so the output is not raw logits but probabilities. I therefore think “from_logits” should be False.
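To illustrate with made-up numbers: passing probabilities to a loss configured with from_logits=True applies softmax a second time, which distorts the loss value but preserves the ranking of the classes, which may be why training still appears to work:

```python
import tensorflow as tf

probs = tf.nn.softmax(tf.constant([[2.0, 1.0, 0.1, -1.0]]))  # made-up example
labels = tf.constant([0])

# Correct pairing: probabilities with from_logits=False.
correct = tf.keras.losses.sparse_categorical_crossentropy(
    labels, probs, from_logits=False)

# Mismatched pairing: the loss softmaxes the probabilities again, flattening
# the distribution toward uniform; here the loss comes out larger, but the
# most likely class is unchanged.
mismatched = tf.keras.losses.sparse_categorical_crossentropy(
    labels, probs, from_logits=True)

print(correct.numpy(), mismatched.numpy())  # mismatched value is larger
```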