I'm curious, why in this lab do we not use the softmax activation and instead use the linear activation with some sort of TensorFlow softmax method afterwards? Wouldn't it be better to just use the softmax activation along with from_logits=True in the SparseCategoricalCrossentropy function? Thank you for the clarification in advance.
Hi @Ludeke
Welcome to the community!
As described in the lecture and the optional softmax lab, numerical stability is improved if the softmax is grouped with the loss function rather than the output layer during training. This has implications when building the model and using the model.
So, as shown in the image below, to improve numerical stability you want to change the activation of the output layer to linear.
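Something along these lines (a minimal sketch; the layer sizes here are placeholders, not necessarily the lab's exact architecture):

```python
import tensorflow as tf

# The final Dense layer uses a linear activation, so the model outputs raw logits z
# rather than softmax probabilities.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(25, activation='relu'),
    tf.keras.layers.Dense(15, activation='relu'),
    tf.keras.layers.Dense(10, activation='linear'),  # logits, not probabilities
])

# from_logits=True tells the loss to apply the softmax internally, together with
# the cross-entropy, in a numerically stable way.
model.compile(
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
)
```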
With that change, the model no longer computes the activations $a^{1}$ through $a^{n}$; its output is the logits $z^{1}$ through $z^{n}$. So, to get the probability of each class, you apply the softmax function to the model's output, as in the image below.
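In code, that step looks roughly like this (here `X_new` is just a placeholder name for whatever input batch you are predicting on):

```python
# The model outputs logits z, so apply softmax explicitly whenever you need
# per-class probabilities.
logits = model(X_new)
probabilities = tf.nn.softmax(logits)
predicted_class = tf.argmax(probabilities, axis=1)
```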
Regards,
Abdelrahman
Thank you! That clears things up. I think my mistake came from thinking that the stability just came from adding the “from_logits=True”. Got it now!
Hi @AbdElRhaman_Fakhry,
can you explain why numerical stability is improved when using the linear activation function instead of softmax?
Regards,
Rafal
Hi @sis6326
Welcome to the community!
It isn't that the linear activation is itself more stable; with a linear output layer the model simply passes the raw logits $z = w_1x_1 + w_2x_2 + \dots + b$ through unchanged. The softmax, on the other hand, involves exponentials $e^{z}$, which in floating point can overflow to huge values or underflow to zero, and those rounded intermediate results hurt both forward and backward propagation. When the model outputs logits and the loss is created with from_logits=True, TensorFlow can combine the softmax and the cross-entropy into one rearranged computation that avoids those extreme intermediate values, so the loss and its gradients are computed more accurately.
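To make that concrete, here is a small self-contained sketch (assuming NumPy and TensorFlow are available; the extreme logit values are chosen deliberately to force overflow):

```python
import numpy as np
import tensorflow as tf

z = np.array([[1000.0, 0.0, -1000.0]])  # extreme logits, picked to trigger overflow
y = np.array([0])                        # true class index

# Hand-rolled softmax: exp(1000) overflows to inf, so the probabilities become nan.
naive_softmax = np.exp(z) / np.sum(np.exp(z), axis=1, keepdims=True)
print(naive_softmax)                     # [[nan 0. 0.]] (with an overflow warning)

# Passing the raw logits to the loss lets TensorFlow fold the softmax into the
# cross-entropy, rearranging the computation so the result stays finite.
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
print(loss_fn(y, z).numpy())             # approximately 0.0, no overflow
```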
Best Regards,
Happy Ramadan
Abdelrahman