Hello,
I don't understand why ReLU works well as an activation function for hidden layers.
Even though the function itself is non-linear, when I tried to carry out the computations using ReLU I ended up with a linear regression model, exactly like what Sir Andrew demonstrates when he explains why we should not use linear activations.
I'm very confused about this; if someone could clarify it for me I would be very grateful.
Hi @abdou_brk,
You can think of the ReLU function as a kind of "filter" that passes positive numbers through unchanged and blocks everything else to zero. The ability of the neural net to describe and learn non-linear characteristics and cause-and-effect relationships emerges from the combination of many neurons, where the non-linearity comes from the "transition to the negative part" of ReLU (the kink at zero). During training, the "best" parameters (or weights) are learned to minimize a cost function; see also this thread:
Source: Choice of activation function - #8 by Christian_Simonis
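To make this concrete, here is a minimal NumPy sketch (the weights are random and purely illustrative, not taken from the course): with a linear "activation" the two layers collapse into one linear map, which is why you end up with plain linear regression, whereas with ReLU in between the output becomes a piecewise-linear, i.e. non-linear, function of the input.

```python
import numpy as np

def relu(z):
    # Passes positive values through, blocks everything else to zero.
    return np.maximum(0, z)

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 1)), rng.normal(size=(4, 1))   # hidden layer (illustrative weights)
W2, b2 = rng.normal(size=(1, 4)), rng.normal(size=(1, 1))   # output layer

x = np.linspace(-3, 3, 7).reshape(1, -1)                    # a few sample inputs

# With a linear activation, the two layers collapse into one linear map:
# W2 @ (W1 @ x + b1) + b2 == (W2 @ W1) @ x + (W2 @ b1 + b2)
linear_out = W2 @ (W1 @ x + b1) + b2
collapsed  = (W2 @ W1) @ x + (W2 @ b1 + b2)
print(np.allclose(linear_out, collapsed))                   # True -> still just linear regression

# With ReLU in between, each hidden unit "switches off" for part of the input
# range, so the composed function bends where units cross zero:
relu_out = W2 @ relu(W1 @ x + b1) + b2
print(relu_out.round(2))
```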
Best regards
Christian