I understand that deep learning models are highly nonlinear because they apply nonlinear activation functions in every layer and neuron. However, the values fed into these activation functions (denoted as Z in the course) always seem to be linear in the inputs. So if the inputs are x1, x2, and x3,
Z = w1*x1 + w2*x2 + w3*x3 + b
My question is: is there any evidence that introducing non-linearity into Z itself would enhance a network's performance? For example, Z could be defined as:
Z = w1*x1 + w2*x2 + w3*x3 + w4*x1^2 + w5*x1*x3 + b
Or is this kind of non-linearity already accounted for implicitly by the NN?
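To make the comparison concrete, here is a minimal NumPy sketch of the two versions of Z above; the second just adds the hand-crafted polynomial terms to the first. All weights and inputs are made-up illustrative values, not from the course:

```python
import numpy as np

x = np.array([0.5, -1.2, 2.0])   # illustrative inputs x1, x2, x3
w = np.array([0.1, 0.3, -0.2])   # illustrative weights w1, w2, w3
b = 0.7

# Standard pre-activation: linear (affine) in the inputs.
z_linear = np.dot(w, x) + b

# Proposed variant: add hand-crafted polynomial terms w4*x1^2 and w5*x1*x3.
w4, w5 = 0.05, 0.4
z_poly = z_linear + w4 * x[0] ** 2 + w5 * x[0] * x[2]

print(z_linear, z_poly)
```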
Thank you,
Amir