Why we always calculate Z using a linear equation?

In all the examples seen in neural network training & forward propagation why do we always use a linear function to calculate Z

Z = W*X + B          # Linear function of X
Output = Activation_Function(Z)

Can’t we use a non-linear function to calculate Z, like below?

Z = W*(X^2) + B     # Quadratic function: Notice X^2 instead of X
Output = Activation_Function(Z)

Hi @Mukul_Tewathia!

This is an interesting question. I’ve never thought about this before. I just pasted your question in ChatGPT and this is what it reply:

While it is technically possible to use a non-linear function like a quadratic function to calculate Z in a neural network, using linear functions like Z = WX + B is a common and effective approach for several reasons:*

1) Linearity simplifies the learning process: Using linear functions makes the optimization problem of training a neural network much simpler. Linear functions have only one global optimum, which can be found using techniques like gradient descent. Non-linear functions, on the other hand, can have multiple local optima, which can make the optimization process more challenging.

2) Non-linear activation functions already provide non-linearity: In neural networks, the non-linear activation functions like sigmoid, ReLU, or tanh already provide non-linearity to the model. Using non-linear activation functions allows the model to capture complex patterns and relationships between the input and output without needing to use non-linear functions to calculate Z.

3) Efficiency: Linear functions are computationally efficient, and their derivatives can be easily calculated. Using non-linear functions to calculate Z can significantly increase the computational complexity of the model, making it slower and more computationally expensive.

Isn’t it make sense?

Best,
Saif.

Non-linear models are created by adding non-linear combinations of the original features.

This allows us to use simple and easily-computed linear models but still get complex relationships.