Hi,
What do they mean by the word “Saturation” in the AlexNet Paper? It has been used a few times. Does it mean unnormalized?
The term “saturation” in that context refers to the way the “tails” of the activation functions tanh and sigmoid flatten out and become asymptotic to horizontal lines as |z| \rightarrow \infty. Of course mathematically sigmoid(z) is never exactly equal to 0 or 1, but in floating point it can round to 0 or 1. In 64-bit floating point, it only takes z > 36 (or somewhere close to that) to “saturate”, meaning to round to 1; on the negative side you have to go quite a bit further before it rounds to 0. There are two problems with saturation. The simple one is that if \hat{y} rounds to exactly 0 or 1, the cross-entropy loss ends up taking \log(0), which gives you -\infty or NaN as the output. You can fix that by checking for the saturation case and clipping \hat{y} away from 0 and 1 by a very small \epsilon. The harder problem to fix is that the gradients are so close to zero in those regions that convergence takes forever.
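Here is a minimal NumPy sketch of both issues, just as an illustration (the threshold values and the choice of \epsilon are mine, not anything from the paper):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Positive side: in 64-bit floats, 1 - sigmoid(z) drops below machine epsilon
# once z is in the high 30s, so the result rounds to exactly 1.0.
print(sigmoid(30.0))          # 0.9999999999999064 (not yet saturated)
print(sigmoid(37.0) == 1.0)   # True -> "saturated"

# A saturated prediction breaks cross-entropy: log(1 - y_hat) = log(0) = -inf.
y_true, y_hat = 0.0, sigmoid(37.0)
# loss = -(y_true*np.log(y_hat) + (1 - y_true)*np.log(1 - y_hat))  # -> inf

# One common fix: clip y_hat away from the endpoints by a small epsilon.
eps = 1e-12
y_hat = np.clip(y_hat, eps, 1.0 - eps)
loss = -(y_true * np.log(y_hat) + (1 - y_true) * np.log(1 - y_hat))
print(loss)   # ~27.6, finite
```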
There is a connection with normalization in the sense that (in general) you’re more likely to have saturation problems with non-normalized inputs (e.g. images with uint8 pixel values instead of pixel values “standardized” to the range [0, 1] or [-1, 1]). Normalization can help you get away from those issues. The other important thing they discuss in the paper is using ReLU for the hidden-layer activations. Of course you’re still stuck at the output layer with softmax (which is just the higher-dimensional equivalent of sigmoid), but not taking the product of lots of small gradients in the hidden layers makes the problem much easier to cope with.
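For concreteness, here is a short sketch of the kind of input standardization I mean, on a made-up batch of uint8 images (the shapes and the per-channel standardization choice are just assumptions for illustration):

```python
import numpy as np

# Fake batch of uint8 images, values in [0, 255].
raw = np.random.randint(0, 256, size=(4, 32, 32, 3), dtype=np.uint8)

x01 = raw.astype(np.float32) / 255.0   # scale to [0, 1]
x11 = x01 * 2.0 - 1.0                  # or shift/scale to [-1, 1]

# Per-channel standardization (zero mean, unit variance) is another common choice.
mean = x01.mean(axis=(0, 1, 2), keepdims=True)
std = x01.std(axis=(0, 1, 2), keepdims=True) + 1e-8
x_std = (x01 - mean) / std

print(x01.min(), x01.max())   # within [0, 1]
print(x11.min(), x11.max())   # within [-1, 1]
```

With inputs in a small, centered range like this, the pre-activation values z are much less likely to land way out in the flat tails of tanh/sigmoid on the first forward passes.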