Differences between ReLU and linear for positive values

Hi there,

If you are in the linear (positive) part of the ReLU function, this just means that the neuron is "firing": its pre-activation b + \sum_i w_i x_i is passed through unchanged as the output.

You can think of the ReLU as a kind of "filter" that passes positive numbers through unchanged and sets everything else to zero. A neural net's ability to describe and learn non-linear characteristics and cause-effect relationships comes from combining many neurons; the non-linearity itself arises because the negative part of the ReLU is clamped to zero. During training, the "best" parameters (weights and biases) are learned by minimizing a cost function.
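To make the "filter" picture concrete, here is a minimal sketch of a single ReLU neuron in plain Python. The weights, bias, and inputs are made-up values chosen only for illustration; they do not come from any real model.

```python
def relu(z):
    """Pass positive values through unchanged; map everything else to zero."""
    return max(0.0, z)


def neuron(x, w, b):
    """Single ReLU neuron: pre-activation z = b + sum_i w_i * x_i, then ReLU."""
    z = b + sum(w_i * x_i for w_i, x_i in zip(w, x))
    return relu(z)


# Hypothetical weights and bias, for illustration only.
w = [0.5, -1.0]
b = 0.25

# z = 0.25 + 1.0 - 0.5 = 0.75 > 0, so the neuron "fires" and z passes through.
active = neuron([2.0, 0.5], w, b)

# z = 0.25 + 0.0 - 1.0 = -0.75 < 0, so the output is blocked to zero.
inactive = neuron([0.0, 1.0], w, b)

print(active, inactive)
```

In the first call the neuron behaves exactly like a linear unit; in the second it is switched off. Which of the two regimes applies depends on the input, and that input dependence is what a purely linear activation lacks.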

Since many (really a lot!) neurons, each with its own bias and weights and followed by an activation function, are combined in the network, the net as a whole can learn highly non-linear behavior, even though the activation function of a single ReLU neuron is only piecewise linear.
(see also Choice of activation function - #8 by Christian_Simonis)
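
As a small illustration of how combining neurons produces non-linear behavior, the sketch below wires two ReLU neurons together by hand so that the network computes |x|, and compares it with the same wiring using a purely linear (identity) activation. The weights are picked manually for this toy example rather than learned.

```python
def relu(z):
    """Pass positive values through unchanged; map everything else to zero."""
    return max(0.0, z)


def two_neuron_relu(x):
    """Two hidden ReLU neurons (weights +1 and -1, bias 0), summed at the output."""
    h1 = relu(1.0 * x)
    h2 = relu(-1.0 * x)
    return h1 + h2  # equals |x|, which is not a linear function of x


def two_neuron_linear(x):
    """Same wiring with an identity activation: the layers collapse to a linear map."""
    h1 = 1.0 * x
    h2 = -1.0 * x
    return h1 + h2  # always 0 for these weights


xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
relu_out = [two_neuron_relu(x) for x in xs]
linear_out = [two_neuron_linear(x) for x in xs]

print(relu_out)
print(linear_out)
```

With the identity activation, no choice of weights can produce the V-shape of |x|, because a sum of linear functions is again linear. With ReLU, already two neurons are enough, and more neurons add more kinks, so the net can approximate much more complicated curves.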