Hello @Maxim_Kupfer, thank you for the question!
The log loss value is calculated, but neither the sigmoid nor the softmax value ever is. I am going to show you how, and please feel free to check my calculation if you would like to. I will use the case of a binary outcome for simplicity, but the core idea is identical for softmax.
Given
p = \frac{1}{1+\exp{(-z)}}
Loss = -y\log{p} - (1-y)\log{(1-p)}
Here z is the logit. Rather than evaluating the sigmoid to obtain p, we are free to substitute the expression for p into Loss and simplify in a way that improves numerical stability:
Loss = -y\log{(\frac{1}{1+\exp{(-z)}})} - (1-y)\log{(1-\frac{1}{1+\exp{(-z)}})}
= -y\log{(\frac{1}{1+\exp{(-z)}})} - (1-y)\log{(\frac{\exp{(-z)}}{1+\exp{(-z)}})}
= y\log{(1+\exp{(-z)})} + (1-y)\left(z + \log{(1+\exp{(-z)})}\right)
= \log{(1+\exp{(-z)})} + z(1-y)
= \log(1+\exp{(-z)})- zy +z
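To make this concrete, here is a minimal NumPy sketch (the function name is just mine, for illustration) that computes the loss straight from the logit using the last line above. Notice that p never appears:

```python
import numpy as np

def log_loss_from_logit(z, y):
    """Binary log loss computed directly from the logit z.

    Implements Loss = log(1 + exp(-z)) - z*y + z.
    The sigmoid value p is never computed. Caveat: exp(-z) can
    still overflow for very negative z; the next trick fixes that.
    """
    return np.log(1.0 + np.exp(-z)) - z * y + z

# Sanity check against the textbook form for a safe value of z
z, y = 2.0, 1.0
p = 1.0 / (1.0 + np.exp(-z))
print(log_loss_from_logit(z, y))                  # 0.12692...
print(-y * np.log(p) - (1 - y) * np.log(1 - p))   # same value
```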
If you calculate Loss in this manner, you will never have computed p at all, agree? Now there is one more trick to improve stability, which is to handle the cases z < 0 and z \ge 0 separately, because in the former case the \exp{(-z)} term can grow large enough to overflow any floating-point data type. This is what we do when z < 0:
\log(1+\exp{(-z)})- zy +z = \log{\left(\exp{(z)}\,(1+\exp{(-z)})\right)} - zy = \log{(1+\exp{(z)})} -zy
which will never yield exponentially large numbers, because \exp{(z)} \approx 0 when z \ll 0.
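A quick numerical check (values picked by me for illustration) shows why this rewrite matters. For z = -1000 with y = 1 the true loss is about 1000, and:

```python
import numpy as np

z, y = -1000.0, 1.0

# Original form: exp(-z) = exp(1000) overflows, so the result is inf
with np.errstate(over="ignore"):
    naive = np.log(1.0 + np.exp(-z)) - z * y + z

# Rewritten form for z < 0: exp(z) = exp(-1000) underflows safely to 0
stable = np.log(1.0 + np.exp(z)) - z * y

print(naive, stable)  # inf 1000.0
```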
In summary, from our original Loss function we now have
Loss^{\ge 0} = \log(1+\exp{(-z)}) -zy + z
Loss^{< 0} =\log{(1+\exp{(z)})} -zy
neither of which will ever produce an exponentially large number.
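If it helps, here is a minimal vectorized NumPy sketch putting the two branches together (the function name is mine). Both branches can be folded into the single expression \log(1+\exp{(-|z|)}) + \max(z, 0) - zy, which is, for example, how the TensorFlow documentation describes the internal computation of tf.nn.sigmoid_cross_entropy_with_logits:

```python
import numpy as np

def stable_log_loss_from_logits(z, y):
    """Numerically stable binary log loss, straight from logits.

    Equivalent to the two-branch form above:
      z >= 0:  log(1 + exp(-z)) - z*y + z
      z <  0:  log(1 + exp(z))  - z*y
    Since exp(-|z|) <= 1, nothing can overflow. np.log1p adds a
    little extra precision when exp(-|z|) is tiny.
    """
    z = np.asarray(z, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.log1p(np.exp(-np.abs(z))) + np.maximum(z, 0.0) - z * y

print(stable_log_loss_from_logits([-1000.0, 0.0, 1000.0], [1.0, 1.0, 0.0]))
# [1000.           0.69314718 1000.        ]
```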
Raymond