Hi @1492r
You can think of the ReLU function as a kind of "filter" that passes positive numbers through unchanged and sets everything else to zero. The network's ability to describe and learn non-linear relationships emerges from combining many such neurons: each ReLU contributes a "kink" at the point where its input becomes negative, and stacking many of them lets the network build up complex non-linear functions. During training, the "best" parameters (weights) are learned by minimizing a cost function, see also this thread:
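To make this concrete, here is a minimal sketch in Python with NumPy (my own illustration, not taken from any particular toolbox) showing the "filter" behavior and how just two ReLU units already combine into a non-linear function:

```python
import numpy as np

def relu(x):
    # ReLU passes positive values through and clamps negatives to zero
    return np.maximum(0, x)

x = np.linspace(-2, 2, 9)
print(relu(x))  # negative inputs become 0, positive inputs pass through

# Combining two ReLU units gives a non-linear function:
# relu(x) + relu(-x) equals |x|, which no purely linear layer can produce.
print(relu(x) + relu(-x))
```

The second print shows the point: each ReLU alone is piecewise linear, but summing them produces the absolute-value function, a non-linearity no single linear mapping could express.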
Best regards
Christian