Regarding derivative of ReLU activation function

Hi, Siddhesh.

It’s great that you are doing experiments like this! It’s always a good learning experience when you try to extend the ideas in the course. You’ve discovered something pretty interesting here. I do not have an explanation yet, but my results are the same as yours: I had already implemented this using Z > 0 and got good results, and when I tried >= 0, I also got much worse results. So I think your code is correct, but we’ve got an unexplained phenomenon on our hands.

Of course the underlying issue is that ReLU is not differentiable at Z = 0, so (as Prof Ng comments in the lecture) you can get around that by just using one of the limit values as the derivative at Z = 0. The really surprising thing is that the choice of > versus >= makes such a big difference. <Update: this analysis is wrong. See the later replies on this thread.>
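Just so we’re all talking about the same thing, here’s a minimal NumPy sketch of the two formulations. This is not the course’s actual backprop code; the Z and dA values are made up purely for illustration. The only place the two versions can differ is at entries where Z is exactly 0:

```python
import numpy as np

# Made-up cached pre-activation values from forward prop and an upstream gradient
Z = np.array([[-2.0, 0.0, 3.0]])
dA = np.ones_like(Z)  # pretend gradient flowing back from the next layer

# Formulation 1: derivative taken as 0 at Z == 0 (the "Z > 0" version)
dZ_strict = dA * (Z > 0)

# Formulation 2: derivative taken as 1 at Z == 0 (the "Z >= 0" version)
dZ_inclusive = dA * (Z >= 0)

print(dZ_strict)     # [[0. 0. 1.]]
print(dZ_inclusive)  # [[0. 1. 1.]]
```

In principle the two gradients differ only on the entries where Z lands exactly on 0.0, which is what makes it so surprising that the choice has such a large effect in practice.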

FWIW, here’s an earlier thread about using sigmoid and ReLU for the hidden layer in the Planar Data exercise, but apparently we all used the Z > 0 formulation for the ReLU derivative there.

So the bottom line is that you’ve found something pretty interesting: it appears to disagree with the formula Prof Ng shows in the lectures and needs more thought and investigation. My next step is to check how the ReLU derivative is computed in the Week 4 exercise.