They don’t use higher precision; it turns out that different algorithms have different properties with respect to how rounding errors propagate. Of course we are dealing with exponentials for the softmax or sigmoid and logarithms for the cross-entropy loss. One concrete example: with either sigmoid or softmax, the values can “saturate” and round to exactly 0.0 or 1.0, and then the cost comes out as Inf or NaN because log(0) is -infinity. When the two computations are done together, the implementation can catch that case, e.g. by rearranging the algebra or using a value just shy of exactly 0.0 or 1.0, so that the cost is an actual finite number.
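Here’s a quick NumPy sketch of what I mean. The numbers and the max(x, 0) - x*z + log1p(exp(-|x|)) rearrangement are just my illustration of the idea, not a claim about what TF’s source actually does:

```python
import numpy as np

# Hypothetical setup: large-magnitude logits that saturate the sigmoid.
logits = np.array([40.0, -40.0])
labels = np.array([1.0, 1.0])

# Naive two-step version: sigmoid(40.0) rounds to exactly 1.0 in float64,
# so log(1 - p) is log(0) = -inf, and the 0 * -inf term turns the cost into NaN.
p = 1.0 / (1.0 + np.exp(-logits))
naive = -(labels * np.log(p) + (1.0 - labels) * np.log(1.0 - p))

# Fused "with logits" style version, using the standard stable rearrangement
#   max(x, 0) - x*z + log(1 + exp(-|x|))
# which never takes the log of 0, so it stays finite for any logit magnitude.
stable = np.maximum(logits, 0.0) - logits * labels + np.log1p(np.exp(-np.abs(logits)))

print(naive)   # first entry is nan (NumPy also emits divide/invalid warnings)
print(stable)  # roughly [0., 40.] -- finite for both examples
```

Same cost on paper, but only the fused form survives the saturated case.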
This is real math. Google “numerical analysis” and once you find a good site, read the section about “error propagation”. Or if you want a concrete example, do the experiment described earlier in this thread and you’ll see that the answers really do differ in the 7th decimal place.
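If you want to see that effect without the full experiment, here’s a rough float32 sketch of the same idea. This is my own toy setup, not the exact experiment from the thread: the identical cross entropy computed along two algebraically equivalent paths.

```python
import numpy as np

# Hypothetical data: a handful of random logits and one-hot labels in float32.
rng = np.random.default_rng(0)
logits = rng.normal(size=(5, 10)).astype(np.float32)
labels = np.eye(10, dtype=np.float32)[rng.integers(0, 10, size=5)]

# Path A: compute the softmax probabilities first, then take the log.
shifted = logits - logits.max(axis=1, keepdims=True)
probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
path_a = -(labels * np.log(probs)).sum(axis=1)

# Path B: fused form -- go straight from logits to log-probabilities
# via log-sum-exp, never materializing the probabilities.
log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
path_b = -(labels * log_probs).sum(axis=1)

# Same math on paper, but the rounding errors propagate differently, so the
# two answers typically disagree around the 7th significant digit in float32.
print(path_a - path_b)
```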
Or you could look at the actual TF code to see what they do. I’ve never actually had the guts to do that, but it is Open Source, right?