As we can see in the first image, we use "from_logits=True" in loss_object when specifying the loss function, and we pass that object to the transformer_utils.train_step() method, as shown in image 2.
But when I looked into the "masked_loss()" method in transformer_utils.py, I found that the mask is defined as "mask = tf.math.logical_not(tf.math.equal(real, 0))", which I assumed would only be the case if we had specified "from_logits=False".
And yet the training works. Can you explain what is happening here?
Hi @God_of_Calamity
When from_logits=True, it means that the model's output is raw logits, and the loss function internally applies the softmax function to compute the actual probabilities before calculating the loss.
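To make that concrete, here is a minimal sketch (illustrative only, not the assignment code) showing that from_logits=True simply moves the softmax inside the loss:

```python
import tensorflow as tf

logits = tf.constant([[2.0, 1.0, 0.1]])  # raw, unnormalized scores
labels = tf.constant([0])                # true class index

loss_from_logits = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
loss_from_probs = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False)

# Both lines print the same value:
print(loss_from_logits(labels, logits).numpy())                # softmax applied internally
print(loss_from_probs(labels, tf.nn.softmax(logits)).numpy())  # softmax applied by us
```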
Now, the mask mask = tf.math.logical_not(tf.math.equal(real, 0)) is still valid because it simply flags which target positions are real tokens (True) and which are padding (False, where real == 0); it does not depend on whether the loss receives logits or probabilities. The from_logits flag and the padding mask handle two separate concerns, which is why training works correctly despite the apparent discrepancy.
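For reference, a masked loss along the lines you quoted typically looks something like the sketch below; the names loss_object and masked_loss follow your post, but the body is an illustration rather than the actual transformer_utils.py:

```python
import tensorflow as tf

# Per-token losses, no reduction, so the mask can be applied afterwards
loss_object = tf.keras.losses.SparseCategoricalCrossentropy(
    from_logits=True, reduction='none')

def masked_loss(real, pred):
    # True wherever the target token is not padding (id 0)
    mask = tf.math.logical_not(tf.math.equal(real, 0))
    loss_ = loss_object(real, pred)      # per-token cross-entropy
    mask = tf.cast(mask, dtype=loss_.dtype)
    loss_ *= mask                        # zero out padded positions
    return tf.reduce_sum(loss_) / tf.reduce_sum(mask)
```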
I think this is actually wrong, since the Transformer used here applies softmax in its final layer,
so the output it produces is not raw logits but probabilities, and therefore I think from_logits should be False.
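As a quick sanity check (again a sketch, not the assignment code), the flag really does need to match what the model outputs: feeding softmax probabilities into a loss configured with from_logits=True makes it re-apply softmax and gives a different value than the correct cross-entropy.

```python
import tensorflow as tf

logits = tf.constant([[2.0, 1.0, 0.1]])
probs = tf.nn.softmax(logits)   # what a softmax output layer would produce
labels = tf.constant([0])

ce_true = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
ce_false = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False)

print(ce_false(labels, probs).numpy())  # correct: treats inputs as probabilities
print(ce_true(labels, probs).numpy())   # mismatched: re-applies softmax to probabilities
```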