Hello @Maxim_Kupfer, thank you for the question!
The log loss value is calculated, but neither the sigmoid nor the softmax value ever is. I am going to show you how, and please feel free to check my calculation if you would like to. I will use the case of a binary outcome for simplicity, but the core idea is identical for softmax.
Given
p = \frac{1}{1+\exp{(-z)}}
Loss = -y\log{p} - (1-y)\log{(1-p)}
Here z is the logit. Rather than evaluating the sigmoid to obtain p, we are free to substitute the expression for p into Loss and simplify in a way that improves numerical stability:
Loss = -y\log{(\frac{1}{1+\exp{(-z)}})} - (1-y)\log{(1-\frac{1}{1+\exp{(-z)}})}
= -y\log{(\frac{1}{1+\exp{(-z)}})} - (1-y)\log{(\frac{\exp{(-z)}}{1+\exp{(-z)}})}
= y\log{(1+\exp{(-z)})} + (1-y)\left(z + \log{(1+\exp{(-z)})}\right)
= \log{(1+\exp{(-z)})} + z(1-y)
= \log(1+\exp{(-z)})- zy +z
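To make this concrete, here is a minimal NumPy sketch (the function name is just mine, for illustration) that computes the loss straight from the logit using the last line above. Notice that p never appears:

```python
import numpy as np

def log_loss_from_logit(z, y):
    """Binary log loss computed directly from the logit z.

    Implements Loss = log(1 + exp(-z)) - z*y + z.
    The sigmoid value p is never computed. Caveat: exp(-z) can
    still overflow for very negative z; the next trick fixes that.
    """
    return np.log(1.0 + np.exp(-z)) - z * y + z

# Sanity check against the textbook form for a safe value of z
z, y = 2.0, 1.0
p = 1.0 / (1.0 + np.exp(-z))
print(log_loss_from_logit(z, y))                  # 0.12692...
print(-y * np.log(p) - (1 - y) * np.log(1 - p))   # same value
```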
If you calculate Loss in this manner, you will never have computed p at all, agree? Now there is one more trick to improve stability, which is to handle the cases z < 0 and z \ge 0 separately, because in the former case the \exp{(-z)} term can grow large enough to overflow any floating-point data type. This is what we do when z < 0:
\log(1+\exp{(-z)})- zy +z = \log{\left(\exp{(z)}\,(1+\exp{(-z)})\right)} - zy = \log{(1+\exp{(z)})} -zy
which will never yield exponentially large numbers, because \exp{(z)} \approx 0 when z \ll 0.
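A quick numerical check (values picked by me for illustration) shows why this rewrite matters. For z = -1000 with y = 1 the true loss is about 1000, and:

```python
import numpy as np

z, y = -1000.0, 1.0

# Original form: exp(-z) = exp(1000) overflows, so the result is inf
with np.errstate(over="ignore"):
    naive = np.log(1.0 + np.exp(-z)) - z * y + z

# Rewritten form for z < 0: exp(z) = exp(-1000) underflows safely to 0
stable = np.log(1.0 + np.exp(z)) - z * y

print(naive, stable)  # inf 1000.0
```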
In summary, from our original Loss function we now have
Loss^{\ge 0} = \log(1+\exp{(-z)}) -zy + z
Loss^{< 0} =\log{(1+\exp{(z)})} -zy
neither of which will ever produce an exponentially large number.
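If it helps, here is a minimal vectorized NumPy sketch putting the two branches together (the function name is mine). Both branches can be folded into the single expression \log(1+\exp{(-|z|)}) + \max(z, 0) - zy, which is, for example, how the TensorFlow documentation describes the internal computation of tf.nn.sigmoid_cross_entropy_with_logits:

```python
import numpy as np

def stable_log_loss_from_logits(z, y):
    """Numerically stable binary log loss, straight from logits.

    Equivalent to the two-branch form above:
      z >= 0:  log(1 + exp(-z)) - z*y + z
      z <  0:  log(1 + exp(z))  - z*y
    Since exp(-|z|) <= 1, nothing can overflow. np.log1p adds a
    little extra precision when exp(-|z|) is tiny.
    """
    z = np.asarray(z, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.log1p(np.exp(-np.abs(z))) + np.maximum(z, 0.0) - z * y

print(stable_log_loss_from_logits([-1000.0, 0.0, 1000.0], [1.0, 1.0, 0.0]))
# [1000.           0.69314718 1000.        ]
```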
Raymond