C2_W2_Relu - How does ReLU work?

Hello @Zephyrus,

This post explained why neurons can act differently: they are initialized to different values.

Then gradient descent guides the neurons to change so that the cost is minimized.
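Here is a minimal sketch of that idea in plain NumPy (not the course code, and with a made-up toy target): gradient descent on a squared-error cost nudges the w and b of a single ReLU unit toward values that fit the data.

```python
import numpy as np

# Toy target: a ReLU that turns at x = 1, i.e. ReLU(x - 1)
x = np.linspace(-2, 2, 41)
y = np.maximum(0.0, x - 1.0)

w, b = 0.5, 0.5              # arbitrary starting values
lr = 0.1
for step in range(2000):
    z = w * x + b
    a = np.maximum(0.0, z)       # ReLU activation
    err = a - y
    grad_z = 2 * err * (z > 0)   # d(cost)/dz, using ReLU'(z) = 1 when z > 0
    w -= lr * np.mean(grad_z * x)
    b -= lr * np.mean(grad_z)

print(w, b)  # should drift toward w ≈ 1, b ≈ -1, matching the target's turn at x = 1
```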

ReLU itself is a piecewise linear function (it changes direction at x=0), and this property is “inherited” by any function that is a sum of ReLU functions. For example, take two ReLUs: ReLU(x) and ReLU(x-1).

ReLU(x) turns at x=0 and ReLU(x-1) turns at x=1. If you add the two up, the resulting ReLU(x) + ReLU(x-1) turns first at x=0 and then again at x=1. In general, where ReLU(wx+b) turns is decided by the parameters w and b (the turn is at x = -b/w), and those parameters are adjusted by gradient descent.
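A quick numerical check of this (again just a NumPy sketch, not course code): the sum ReLU(x) + ReLU(x-1) has slope 0 before x=0, slope 1 between 0 and 1, and slope 2 after x=1.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

f = lambda x: relu(x) + relu(x - 1.0)

# Slope on each linear segment, via finite differences taken inside the segment
for a, b in [(-2.0, -1.0), (0.25, 0.75), (2.0, 3.0)]:
    print(f"slope on [{a}, {b}] = {(f(b) - f(a)) / (b - a)}")
# Prints 0.0, 1.0, 2.0: flat, then slope 1 after the turn at x=0, then slope 2 after x=1.
# For ReLU(w*x + b) alone, the turn happens where w*x + b = 0, i.e. at x = -b/w.
```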

Raymond
