Why do we need an activation function?

For positive values, the ReLU function is identical to a linear function.
So in cases where all our input features are positive, and all initial values of the weights W and bias b are positive, the deep neural network is still a linear model, isn't it?
Even if a small number of inputs are negative, ReLU is a semi-linear function, so the majority of the calculations will still be linear. Could you explain more on this topic?

The ReLU function is non-linear, so a linear combination of such functions is expected to be non-linear as well.
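A quick way to check the non-linearity is the additivity test: a linear function f must satisfy f(a + b) = f(a) + f(b). The snippet below is only a minimal sketch in plain Python; the values of `a` and `b` are arbitrary picks for illustration.

```python
def relu(x):
    # ReLU: pass positive values through, clamp negative values to 0.
    return max(0.0, x)

# Arbitrary example values, one positive and one negative.
a, b = 2.0, -3.0

lhs = relu(a + b)        # relu(-1.0)
rhs = relu(a) + relu(b)  # relu(2.0) + relu(-3.0)

# For a linear function these two would be equal; for ReLU they are not.
print(lhs, rhs)
```

As long as both values are positive the two sides agree, which is why ReLU looks linear on positive inputs alone.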

During training, if the learned function is still linear but the desired function is non-linear, gradient descent (or any other optimization algorithm) will help move it closer to the desired function.

Hi @Med-akraou

The ReLU activation function isn't a linear function; it's a simple non-linear function. Even if we assume that all the features and the initial values of the weights are positive:

  • The optimization algorithm will adjust and tune these values to bring the prediction closer to the target output.
  • We aren't building a linear regression; we are building a big, complex model with more than one hidden layer to fit complex data like images and sounds, so the combination of these layers will typically produce negative values, for which the output of the ReLU function is 0.
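The two points above can be sketched with a single hidden unit. The parameter values here are hypothetical and chosen by hand, not the result of an actual training run: the first set is all positive, the second imitates what an optimizer might produce after pushing the bias below zero.

```python
def relu(z):
    return max(0.0, z)

def hidden_unit(x, w, b):
    # One hidden unit: affine map followed by ReLU.
    return relu(w * x + b)

# All-positive inputs, as in the question.
xs = [0.0, 1.0, 2.0, 3.0]

# Hypothetical positive initial parameters: the unit is affine on these inputs.
init = [hidden_unit(x, w=1.0, b=1.0) for x in xs]

# Hypothetical parameters after some updates: the bias is now negative,
# so the pre-activation is negative for small x and ReLU outputs 0 there.
later = [hidden_unit(x, w=1.0, b=-2.0) for x in xs]

print(init)   # straight line: x + 1
print(later)  # flat at 0, then rising: a kink appears
```

Even with positive inputs, nothing forces the weights and biases to stay positive, so the flat part of ReLU comes into play once training moves them.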

Why we use a simple non-linear activation function was discussed in this thread by Mentor @paulinpaloalto.


Thank you @AbdElRhaman_Fakhry

There is a theorem called the universal approximation theorem, which states that by combining affine mappings (W*A_prev + b) with nonlinear activation functions such as sigmoid or ReLU, a feedforward network can approximate almost any function (within a suitable class of continuous functions). Without these nonlinear activation functions, the stacked affine mappings collapse into a single affine mapping, so the network can only represent linear functions.
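To see why the nonlinearity matters, here is a small sketch with two scalar "layers" of the form w*x + b. The parameter values are hypothetical and only for illustration. Without an activation, the two layers are exactly one affine map; with ReLU in between, they are not.

```python
def relu(z):
    return max(0.0, z)

# Hypothetical scalar parameters for two layers (illustration only).
w1, b1 = 2.0, 1.0
w2, b2 = 3.0, -4.0

def stacked_no_activation(x):
    # Two affine layers applied back to back, no activation in between.
    return w2 * (w1 * x + b1) + b2

def single_affine(x):
    # The same map written as one affine layer: (w2*w1)*x + (w2*b1 + b2).
    return (w2 * w1) * x + (w2 * b1 + b2)

def stacked_with_relu(x):
    # Same two layers, with ReLU applied after the first one.
    return w2 * relu(w1 * x + b1) + b2

xs = [-2.0, 0.0, 1.5]
no_act = [stacked_no_activation(x) for x in xs]
collapsed = [single_affine(x) for x in xs]
with_relu = [stacked_with_relu(x) for x in xs]

print(no_act == collapsed)  # the two-layer stack equals a single affine map
print(no_act, with_relu)    # ReLU changes the output where w1*x + b1 < 0
```

The same collapse happens with matrices: W2*(W1*A + b1) + b2 equals (W2*W1)*A + (W2*b1 + b2), no matter how many layers are stacked.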

A formal proof needs some maths, such as measure theory and functional analysis.