Activation function of SoftMax after optimization


In the lecture on the improved implementation of softmax, the output layer's activation function is changed from softmax to linear. How does this account for the non-linearity that softmax introduces to handle multiple classes?

All of that non-linearity is transferred to the loss function. You don’t just change the output layer’s activation to linear; you also change the configuration of the loss function, and that change compensates for the change in the output layer’s activation.
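To make that concrete, here is a minimal sketch in plain Python. It is not TensorFlow's actual code, and the logits and label are made-up values for illustration. It computes the cross-entropy loss two ways, once with softmax applied in the output layer and once with a linear output layer and the softmax folded into the loss, and the two results agree up to floating-point rounding.

```python
import math

def softmax(z):
    # Subtract the largest logit before exponentiating; the result is unchanged.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def loss_from_probs(probs, label):
    # Cross-entropy when the output layer has already applied softmax.
    return -math.log(probs[label])

def loss_from_logits(logits, label):
    # Cross-entropy when the output layer is linear: the softmax is folded
    # into the loss as log-sum-exp of the logits minus the true-class logit.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum_exp - logits[label]

logits = [2.0, -1.0, 0.5]  # hypothetical outputs of a linear output layer
label = 0                  # hypothetical true class index

a = loss_from_probs(softmax(logits), label)
b = loss_from_logits(logits, label)
print(a, b)  # the two values agree up to rounding
```

The second function never produces probabilities explicitly, yet it is still a non-linear function of the logits, which is the sense in which the non-linearity moves into the loss.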

I can only assure you that the non-linearity is always maintained. As for exactly how the loss function does that, you would need to examine the TensorFlow code yourself.


By changing the configuration of the loss function, do you mean the line of code below?

loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

Can you explain this configuration change? I did not understand it clearly from the lecture.

Yes, it is the from_logits=True argument.


So when from_logits is set to True, is this loss function a linear function, with a formula like the cost function used for linear regression (the mean squared error one), or is it the softmax loss function, the one with the log formula? If it is linear, how do we account for the softmax concept?

Also, please explain or share resources on this particular concept, where we change the activation to linear but softmax is still accounted for later, and on the mathematics behind it.

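Everything you would otherwise do with a softmax activation in the output layer is done automatically inside the loss function when you set from_logits=True. The loss applies softmax to the linear outputs (the logits) and then takes the log, so it is still the softmax cross-entropy loss, not a mean squared error.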
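As for the mathematics, here is a short sketch of the standard derivation. The symbols are my own notation rather than the lecture's: z_1, ..., z_N are the linear outputs (logits) for N classes, a_j is the softmax probability of class j, and y is the true class. TensorFlow's internal implementation may differ in detail, but the quantity it computes is the same.

```latex
% Softmax applied to the logits z_1, ..., z_N
a_j = \frac{e^{z_j}}{\sum_{k=1}^{N} e^{z_k}}

% Cross-entropy loss for true class y, written in terms of the probabilities
L = -\log a_y

% Substituting the softmax gives the same loss written directly in terms of the logits
L = -\log \frac{e^{z_y}}{\sum_{k=1}^{N} e^{z_k}}
  = -z_y + \log \sum_{k=1}^{N} e^{z_k}
```

The last line is what a loss with from_logits=True can evaluate from the linear outputs. The log-sum-exp term keeps the loss non-linear in the logits, and working with it directly avoids computing the intermediate probabilities a_j, which is where rounding error can creep in.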