I'm curious, why in this lab do we not use the softmax activation and instead use the linear activation with some sort of TensorFlow softmax method afterwards? Wouldn't it be better to just use the softmax activation along with from_logits=True in the SparseCategoricalCrossentropy function? Thank you for the clarification in advance.
Hi @Ludeke
Welcome to the community!
As described in the lecture and the optional softmax lab, numerical stability is improved if the softmax is grouped with the loss function rather than the output layer during training. This has implications when building the model and using the model.
So, as shown in the image below, to improve numerical stability you want to change the activation of the output layer to linear.
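Something along these lines (a minimal sketch; the layer sizes here are placeholders, not necessarily the lab's exact architecture):

```python
import tensorflow as tf

# The final Dense layer uses a linear activation, so the model outputs raw logits z
# rather than softmax probabilities.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(25, activation='relu'),
    tf.keras.layers.Dense(15, activation='relu'),
    tf.keras.layers.Dense(10, activation='linear'),  # logits, not probabilities
])

# from_logits=True tells the loss to apply the softmax internally, together with
# the cross-entropy, in a numerically stable way.
model.compile(
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
)
```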
With that change, the model no longer computes the activations $a^{1}$ through $a^{n}$; its output is the logits $z^{1}$ through $z^{n}$. So, to get the probability of each class, you apply the softmax function to the model's output, as in the image below.
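In code, that step looks roughly like this (here `X_new` is just a placeholder name for whatever input batch you are predicting on):

```python
# The model outputs logits z, so apply softmax explicitly whenever you need
# per-class probabilities.
logits = model(X_new)
probabilities = tf.nn.softmax(logits)
predicted_class = tf.argmax(probabilities, axis=1)
```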
Regards,
Abdelrahman
Thank you! That clears things up. I think my mistake came from thinking that the stability just came from adding the “from_logits=True”. Got it now!
Hi @AbdElRhaman_Fakhry,
can you explain why numerical stability is improved when using the linear activation function instead of softmax?
Regards,
Rafal
Hi @sis6326
Welcome to the community!
It isn't that the linear activation is itself more stable; with a linear output layer the model simply passes the raw logits $z = w_1x_1 + w_2x_2 + \dots + b$ through unchanged. The softmax, on the other hand, involves exponentials $e^{z}$, which in floating point can overflow to huge values or underflow to zero, and those rounded intermediate results hurt both forward and backward propagation. When the model outputs logits and the loss is created with from_logits=True, TensorFlow can combine the softmax and the cross-entropy into one rearranged computation that avoids those extreme intermediate values, so the loss and its gradients are computed more accurately.
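To make that concrete, here is a small self-contained sketch (assuming NumPy and TensorFlow are available; the extreme logit values are chosen deliberately to force overflow):

```python
import numpy as np
import tensorflow as tf

z = np.array([[1000.0, 0.0, -1000.0]])  # extreme logits, picked to trigger overflow
y = np.array([0])                        # true class index

# Hand-rolled softmax: exp(1000) overflows to inf, so the probabilities become nan.
naive_softmax = np.exp(z) / np.sum(np.exp(z), axis=1, keepdims=True)
print(naive_softmax)                     # [[nan 0. 0.]] (with an overflow warning)

# Passing the raw logits to the loss lets TensorFlow fold the softmax into the
# cross-entropy, rearranging the computation so the result stays finite.
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
print(loss_fn(y, z).numpy())             # approximately 0.0, no overflow
```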
Best Regards,
Happy Ramadan
Abdelrahman