Softmax implementation


Why did he scrap softmax and change it to linear?

It would be better if you gave us a link to this video (or at least the video name with the proper week number), so we can watch it and tell you why he changed it to linear.

TensorFlow works slightly better if you use a linear output and “from_logits = True”, rather than using softmax directly in the output layer.

Internally, the “from_logits = True” parameter tells the loss function configured at compile time to apply the softmax automatically.
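
For example, here is a minimal sketch of the two setups in Keras (the layer sizes and the 10-class output are made up for illustration; only the last layer's activation and the loss argument matter):

```python
import tensorflow as tf

# Option A: softmax inside the model, loss expects probabilities.
model_a = tf.keras.Sequential([
    tf.keras.layers.Dense(25, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax'),   # outputs probabilities a
])
model_a.compile(optimizer='adam',
                loss=tf.keras.losses.SparseCategoricalCrossentropy())

# Option B (recommended): linear output, softmax folded into the loss.
model_b = tf.keras.Sequential([
    tf.keras.layers.Dense(25, activation='relu'),
    tf.keras.layers.Dense(10, activation='linear'),     # outputs raw logits z
])
model_b.compile(optimizer='adam',
                loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
```

One consequence of option B: the model outputs logits rather than probabilities, so if you want probabilities at prediction time you apply tf.nn.softmax to the model's output yourself.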

I didn’t get you. I mean, if we change the activation from softmax to linear, then how will the softmax function be implemented? Can you please elaborate a little more?

Course: Advanced Algorithms, week 2.
Name of video: “Improved implementation of softmax”

The softmax activation will be applied by TensorFlow inside the loss function once you compile with from_logits=True.

Hi @gigaGPT,

There are two ways to compute the softmax activation, which are:

  1. When using softmax activation in the last layer:

    • The logits (z), which are the raw, unnormalized outputs, are computed in this layer.
    • Softmax activation is then applied to these logits to obtain the activations (a) for each class.
    • The loss can be computed directly using these activations (a) with an appropriate loss function, such as sparse categorical cross-entropy in this case.
  2. When using linear activation in the last layer:

    • Similarly, the logits (z), again the raw, unnormalized outputs, are computed in the last layer.
    • Instead of applying softmax activation to obtain activations (a), the logits (z) are used directly.
    • Setting from_logits=True in the compile step instructs TensorFlow to internally apply softmax to the logits (z), as Tom described (see the quick check right after this list).
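
Here is a quick numerical check that the two approaches compute the same loss, using made-up logits and labels (not from the course lab):

```python
import tensorflow as tf

# Hypothetical logits (z) for 2 examples over 4 classes, with their true labels.
z = tf.constant([[2.0, 1.0, 0.1, -1.0],
                 [0.5, 2.5, 0.3,  0.0]])
y = tf.constant([0, 1])

# Approach 1: compute the activations a = softmax(z) first, then the loss.
a = tf.nn.softmax(z)
loss_1 = tf.keras.losses.SparseCategoricalCrossentropy()(y, a)

# Approach 2: hand the raw logits to the loss and let it apply softmax internally.
loss_2 = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)(y, z)

print(loss_1.numpy(), loss_2.numpy())   # the two values agree (up to float rounding)
```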

So, the only difference between these two approaches is that in the first, the softmax activation is computed separately before the sparse categorical cross-entropy loss is calculated, while in the second, the softmax is computed inside the sparse categorical cross-entropy loss, as shown in the slide. The second approach is preferred because it avoids the numerical instability and rounding errors that can occur when the softmax activation is computed separately.
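
To see that numerical point concretely, here is a small made-up example with extreme logits, where the separately computed softmax rounds the true class's probability all the way to zero:

```python
import tensorflow as tf

z = tf.constant([[1000.0, 0.0, -1000.0]])   # extreme logits
y = tf.constant([1])                        # the true class has a tiny probability

# Softmax first: the true-class probability rounds to 0 in float32, so the loss
# computed from these probabilities can no longer recover the real value.
a = tf.nn.softmax(z)
loss_from_a = tf.keras.losses.SparseCategoricalCrossentropy()(y, a)

# Logits straight into the loss: softmax and log are combined internally,
# so the result is close to the exact cross-entropy (about 1000 here).
loss_from_z = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)(y, z)

print(loss_from_a.numpy(), loss_from_z.numpy())
```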