Week 1 ReLU vs mod(x)

Hi, @Rajat_Goyal. Take a look at the response by @paulinpaloalto in a similar discussion. But to your first assertion: we do not have that advantage for sigmoid, because its slope is everywhere positive. It only goes to zero in the limits z \rightarrow -\infty and z \rightarrow +\infty. That's exactly the problem: gradient descent runs the risk of becoming very slow, because it spends a lot of time at very negative and/or very positive values of z, where the gradient is tiny and the parameters change only slowly. With ReLU, whose gradient is pegged at exactly zero for negative z, there is no change there at all.
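
If it helps to see the numbers, here is a small sketch (my own toy code, not from the course) that prints the sigmoid and ReLU derivatives at a few values of z. The sigmoid gradient collapses toward zero as |z| grows, while ReLU's derivative is exactly 1 for positive z and exactly 0 for negative z:

```python
import math

def sigmoid_grad(z):
    # Derivative of sigmoid: s(z) * (1 - s(z)); peaks at 0.25 when z = 0,
    # and shrinks toward 0 as |z| grows.
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)

def relu_grad(z):
    # Derivative of ReLU: exactly 1 for z > 0, exactly 0 for z < 0.
    return 1.0 if z > 0 else 0.0

for z in [-10.0, -2.0, 0.0, 2.0, 10.0]:
    print(f"z = {z:6.1f}   sigmoid' = {sigmoid_grad(z):.6f}   relu' = {relu_grad(z):.1f}")
```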

As for the modulus (i.e. absolute value) function, I do not have a definitive answer. Just because I have never seen it applied is not a basis for rejecting the idea out of hand for all applications! :thinking: Intuitively, the negatively-sloped part, i.e. z < 0, "turns the volume down" on the neuron as z increases, rather than up. In gradient descent, the shape of the activation works in concert with the shape of the cost function, so it's hard for me to imagine the result for the general case. Worth experimenting with and pondering!
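
For anyone who wants to poke at this, here is a minimal one-neuron sketch (my own toy example, with a squared-error loss and made-up numbers, not anything from the course) contrasting the gradient you get from ReLU vs. |z| when z is negative. ReLU contributes exactly zero, while the |z| branch contributes a gradient through its slope of -1, which is the "volume down" behavior I mean:

```python
def neuron_grad_w(x, y, w, b, g, g_prime):
    """Gradient of L = (g(z) - y)^2 w.r.t. w for a single example, z = w*x + b."""
    z = w * x + b
    a = g(z)
    dL_da = 2.0 * (a - y)
    return dL_da * g_prime(z) * x

relu       = lambda z: max(z, 0.0)
relu_grad  = lambda z: 1.0 if z > 0 else 0.0
abs_act    = lambda z: abs(z)
abs_grad   = lambda z: 1.0 if z > 0 else -1.0   # slope -1 on the z < 0 branch

# A point where z = w*x + b is negative:
x, y, w, b = 1.0, 0.5, -2.0, 0.0
print("ReLU dL/dw:", neuron_grad_w(x, y, w, b, relu, relu_grad))      # 0.0 -> no update
print("|z|  dL/dw:", neuron_grad_w(x, y, w, b, abs_act, abs_grad))    # nonzero update
```

How those nonzero updates interact with a real cost surface over a whole network is exactly the part I can't predict in general.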
