In this lab, the ReLU activation is described as follows:

The “off” or disable feature of the ReLU activation enables models to stitch together linear segments to model complex non-linear functions.

What I am confused about is this: since every neuron sees the entire dataset, how does the ReLU activation of each neuron in a layer produce these segments? How does it know where to cut the line and start a new one?


Hello @Zephyrus,

This post explained why neurons can act differently: they are initialized to different values.

Then gradient descent guides the neurons to change so that the cost is minimized.

ReLU itself is a piecewise linear function (it changes direction at x = 0), and this property is inherited by any function that is a sum of ReLU functions. For example, suppose you have two ReLUs: ReLU(x) and ReLU(x-1).

ReLU(x) turns at x = 0, and ReLU(x-1) turns at x = 1. If you add the two up, the resulting ReLU(x) + ReLU(x-1) turns first at x = 0 and then again at x = 1. So the point where the line turns is decided by the parameters w and b in ReLU(wx + b), and those parameters are adjusted by gradient descent.
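To make this concrete, here is a small NumPy sketch (the function names and sample points are my own, just for illustration) that evaluates ReLU(x) + ReLU(x-1) at a few points, so you can see the slope change at each kink:

```python
import numpy as np

def relu(z):
    # ReLU: max(0, z), piecewise linear with a kink at z = 0
    return np.maximum(0.0, z)

x = np.array([-1.0, 0.0, 0.5, 1.0, 2.0])

# Sum of two ReLUs: kinks at x = 0 (from ReLU(x)) and x = 1 (from ReLU(x - 1))
y = relu(x) + relu(x - 1.0)

# Slope is 0 for x < 0, 1 between 0 and 1, and 2 for x > 1
print(y)  # [0.  0.  0.5 1.  3. ]
```

Plotting y against x would show a flat segment, then a segment of slope 1, then a segment of slope 2, which is exactly the "stitching together of linear segments" the lab describes.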

Raymond


Thanks for this clear explanation! @rmwkwok

You are welcome @Zephyrus