# Why do we always calculate Z using a linear equation?

In all the examples of neural network training and forward propagation, why do we always use a linear function to calculate Z?

Z = W*X + B          # Linear function of X
Output = Activation_Function(Z)

Can’t we use a non-linear function to calculate Z, like below?

Z = W*(X^2) + B     # Quadratic function: Notice X^2 instead of X
Output = Activation_Function(Z)

While it is technically possible to use a non-linear function, such as a quadratic, to calculate Z in a neural network, the linear form Z = W*X + B is the standard and effective choice for several reasons:

1) Linearity simplifies the learning process: keeping each layer's pre-activation linear keeps the gradients simple and well-behaved. For a single linear model with a convex loss (e.g., linear regression with squared error), there is exactly one global optimum, which gradient descent can find reliably. Building extra non-linearities into Z itself makes the loss surface harder to analyze and can introduce additional local optima, making optimization more challenging.

2) Non-linear activation functions already provide non-linearity: activation functions such as sigmoid, ReLU, or tanh supply the non-linearity in a neural network. Applying them to a linear Z lets the model capture complex patterns and relationships between the input and output without building the non-linearity into Z itself.
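As a minimal sketch of point 2 (the layer shapes and weights here are made up for illustration): a linear Z followed by ReLU is already a non-linear function of the input, which you can see because superposition fails, i.e. f(a + c) != f(a) + f(c).

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def layer(x, W, b):
    # Z is linear in x; the activation supplies the non-linearity
    return relu(W @ x + b)

W = np.array([[1.0, -1.0]])
b = np.array([0.0])
a = np.array([1.0, 0.0])
c = np.array([0.0, 1.0])

# layer(a) = 1 and layer(c) = 0, but layer(a + c) = 0, not 1:
# the composed function is non-linear even though Z itself is linear.
```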

3) Efficiency: linear functions are computationally cheap — Z = W*X + B is a single matrix multiply plus a bias — and their derivatives are trivial to compute (the gradient of Z with respect to W involves only X). Using a non-linear function to calculate Z adds computational cost in both the forward and backward passes, making the model slower and more expensive to train.
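To illustrate point 3 (random shapes chosen for the example): because Z = W @ x + b is linear in W, the weight gradient during backpropagation is just an outer product of the upstream gradient with the input — no extra function evaluations needed. A quick finite-difference check confirms it.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))
b = np.zeros(3)
x = rng.normal(size=4)

z = W @ x + b            # forward pass: one matrix multiply plus bias

# Backward pass: for linear Z, dZ_i/dW_ij = x_j, so given an
# upstream gradient g the weight gradient is a single outer product.
g = np.ones(3)
dW = np.outer(g, x)
```

Compare dW against a numerical derivative of g @ Z with respect to one weight entry and the two agree, which is the "easily calculated derivative" the text refers to.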

Doesn’t that make sense?

Best,
Saif.

Non-linear models are created by adding non-linear combinations of the original features.

This allows us to use simple and easily-computed linear models but still get complex relationships.
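A small sketch of this feature-engineering idea (the data here is synthetic, made up for the example): fit y = 3x² + 1 with a model that is linear in its parameters by adding x² as an extra input feature, then solve with ordinary least squares.

```python
import numpy as np

x = np.linspace(-2.0, 2.0, 50)
y = 3.0 * x**2 + 1.0                 # a non-linear relationship in x

# Add non-linear combinations of the original feature as new columns:
# [1, x, x^2]. The model stays linear in the weights w.
Phi = np.column_stack([np.ones_like(x), x, x**2])
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
# w recovers [1, 0, 3]: a simple linear solver captured a quadratic fit
```

This is exactly the trade described above: the heavy lifting (the x² term) is done once when building the features, while training remains a cheap, convex linear problem.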