I have some questions regarding activation functions that came up while watching the Week 2 lecture 'Why do we need activation functions?'.
My main doubt may come from the fact that I don’t see the full breakdown of all the calculations underneath each unit of each layer.
The first thing I don't understand is what the difference is between a ReLU and a linear function in the hypothetical case where all z values are positive. Wouldn't we be in the same situation?
Would we then just be back in a linear regression case?
If you are in the linear (positive) part of the ReLU function, this just means that the output (axon) of this neuron is "firing" and the pre-activation value $z = b + \sum_i w_i x_i$ passes through unchanged as the output.
You can consider the ReLU as a kind of "filter" which passes positive numbers through but clips everything else to zero. The ability of the neural net to describe and learn non-linear characteristics and cause-and-effect relationships comes from combining many neurons, where the non-linearity emerges from the flat (negative) part of the ReLU function. During training, the "best" parameters (weights and biases) are learned so as to minimize a cost function.
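As a minimal sketch of that "filter" behaviour (my own NumPy example, not code from the course lab):

```python
import numpy as np

def relu(z):
    # Passes positive values through unchanged, clips everything else to zero
    return np.maximum(0.0, z)

z = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(relu(z))  # [0.  0.  0.  1.5 3. ]
```

For the inputs that happen to be positive, the output is identical to the input, which is exactly why ReLU looks "linear" in that region.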
Since many (really a lot of!) neurons, each with its own bias and weights followed by an activation function, are combined (which is what the neural net as a whole does), the network can learn highly non-linear behaviour, even though a single ReLU neuron only has a piecewise linear activation function. See also the small sketch after the link below.
(see also Choice of activation function - #8 by Christian_Simonis)
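To make this concrete, here is a tiny hand-wired toy example (my own illustration, the weights are arbitrary and not from the lecture or lab): two ReLU units, each individually piecewise linear, combine into a function with kinks that no single linear model $y = wx + b$ could produce. Dropping the activation collapses everything back to a straight line, which also answers the question about the all-positive / linear case.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def tiny_net(x, use_relu=True):
    # Toy 1-input network: two hidden units, then a linear output layer.
    # With use_relu=False the activation is the identity, so the whole
    # network reduces to a single linear function of x.
    act = relu if use_relu else (lambda z: z)
    h1 = act(1.0 * x + 0.0)   # hidden unit 1: w=1, b=0
    h2 = act(1.0 * x - 1.0)   # hidden unit 2: w=1, b=-1
    return 1.0 * h1 - 2.0 * h2

x = np.linspace(-1.0, 3.0, 5)          # [-1, 0, 1, 2, 3]
print(tiny_net(x, use_relu=True))      # [ 0.  0.  1.  0. -1.] -> bends twice, non-linear in x
print(tiny_net(x, use_relu=False))     # [ 3.  2.  1.  0. -1.] -> just the line y = -x + 2
```

The non-linearity only appears because different units are "switched off" (clipped to zero) on different parts of the input range; that is the emergent effect described above.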
Thank you all for your help. Now I better understand the purpose of the ReLU function. I have some new questions arising from the optional lab 'ReLU activation'.
In this case, is it better to create a new post or continue in this one?