As we can see in the first image, we use "from_logits=True" in loss_object when specifying the loss function, and we pass that object to the transformer_utils.train_step() method, as shown in image 2.
But when I looked into the "masked_loss()" method in transformer_utils.py, I found that the mask is defined as "mask = tf.math.logical_not(tf.math.equal(real, 0))", which I assumed would only be the case if we had specified "from_logits=False".
And yet the training works. Can you explain what is happening here?
Hi @God_of_Calamity
When from_logits=True, it means that the model's output is raw logits, and the loss function internally applies the softmax function to compute the actual probabilities before calculating the loss.
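To make that concrete, here is a minimal sketch (illustrative only, not the assignment code) showing that from_logits=True simply moves the softmax inside the loss:

```python
import tensorflow as tf

logits = tf.constant([[2.0, 1.0, 0.1]])  # raw, unnormalized scores
labels = tf.constant([0])                # true class index

loss_from_logits = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
loss_from_probs = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False)

# Both lines print the same value:
print(loss_from_logits(labels, logits).numpy())                # softmax applied internally
print(loss_from_probs(labels, tf.nn.softmax(logits)).numpy())  # softmax applied by us
```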
Now, the mask mask = tf.math.logical_not(tf.math.equal(real, 0)) is still valid because it simply flags which target positions are real tokens (True) and which are padding (False, where real == 0); it does not depend on whether the loss receives logits or probabilities. The from_logits flag and the padding mask handle two separate concerns, which is why training works correctly despite the apparent discrepancy.
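For reference, a masked loss along the lines you quoted typically looks something like the sketch below; the names loss_object and masked_loss follow your post, but the body is an illustration rather than the actual transformer_utils.py:

```python
import tensorflow as tf

# Per-token losses, no reduction, so the mask can be applied afterwards
loss_object = tf.keras.losses.SparseCategoricalCrossentropy(
    from_logits=True, reduction='none')

def masked_loss(real, pred):
    # True wherever the target token is not padding (id 0)
    mask = tf.math.logical_not(tf.math.equal(real, 0))
    loss_ = loss_object(real, pred)      # per-token cross-entropy
    mask = tf.cast(mask, dtype=loss_.dtype)
    loss_ *= mask                        # zero out padded positions
    return tf.reduce_sum(loss_) / tf.reduce_sum(mask)
```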
I think this is actually wrong, since the Transformer used here applies softmax in its final layer,
so the output it produces is not raw logits but probabilities, and therefore I think from_logits should be False.
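As a quick sanity check (again a sketch, not the assignment code), the flag really does need to match what the model outputs: feeding softmax probabilities into a loss configured with from_logits=True makes it re-apply softmax and gives a different value than the correct cross-entropy.

```python
import tensorflow as tf

logits = tf.constant([[2.0, 1.0, 0.1]])
probs = tf.nn.softmax(logits)   # what a softmax output layer would produce
labels = tf.constant([0])

ce_true = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
ce_false = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False)

print(ce_false(labels, probs).numpy())  # correct: treats inputs as probabilities
print(ce_true(labels, probs).numpy())   # mismatched: re-applies softmax to probabilities
```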