Choice of activation function

In addition to the very good answers already given, here is an illustration (it may be obvious to many, but it helped several students when they dealt with ReLU for the first time):

Each neuron applies weights and a bias, followed by an activation function. Although a single neuron with ReLU has only a piecewise linear activation (as you correctly pointed out), combining many such neurons, which is exactly what the network as a whole does, allows the model to learn highly nonlinear behavior.
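A minimal sketch of this idea (the `hat` function and its coefficients are my own illustrative choice, not from the question): each ReLU unit contributes one "kink", and a weighted sum of a few shifted ReLU units already produces a non-monotonic, bump-shaped function that no single ReLU neuron could represent.

```python
import numpy as np

def relu(x):
    # A single ReLU unit: piecewise linear with one kink at 0.
    return np.maximum(x, 0.0)

def hat(x):
    # Weighted sum of three shifted ReLU units.
    # The result is a triangular "bump": zero outside [0, 2],
    # rising to 1 at x = 1 -- clearly nonlinear overall,
    # even though each summand is piecewise linear.
    return relu(x) - 2.0 * relu(x - 1.0) + relu(x - 2.0)

xs = np.linspace(-1.0, 3.0, 9)
print(np.round(hat(xs), 2))
```

Stacking or summing many such bumps with different shifts and weights lets the network approximate smooth curves to arbitrary precision, which is the intuition behind universal approximation with ReLU networks.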