Activation functions in the hidden layers

My question is :

If we use ReLU in all the hidden layers and sigmoid for the output layer, wouldn’t that be almost like using a plain sigmoid model without an ANN at all?

I understood how a linear activation function in the hidden layers with sigmoid in the output layer would make using an ANN pointless, not to mention a waste of resources, as mentioned by Prof. Ng.

Wouldn’t replacing the linear activation function with ReLU have the same effect (at least in some cases)?

Thanks for the response in advance! :slightly_smiling_face:

@bharathithal, welcome to the community.

The key characteristic of a neural network hidden layer is that it must have a non-linear activation function.

An advantage of ReLU is that it is extremely easy to compute compared to the sigmoid.

A disadvantage is that the output is zero for all negative inputs - so nothing is learned from the magnitude of a negative value.
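
To make that concrete, here is a small NumPy sketch (the function names are mine, not from the course) showing both activations: ReLU is a single elementwise comparison, while sigmoid needs an exponential, and every negative input to ReLU maps to exactly zero:

```python
import numpy as np

def relu(z):
    # One elementwise comparison per value: very cheap to compute.
    return np.maximum(0.0, z)

def sigmoid(z):
    # Needs an exponential per value, which is more expensive.
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(z))     # [0.  0.  0.  0.5 2. ]  (all negatives collapse to 0)
print(sigmoid(z))  # smooth values strictly between 0 and 1
```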


Hello @bharathithal

I believe this confusion stems from the thought that ReLU is a linear function, but that is only half of the story.

ReLU is linear on $[0, \infty)$ and linear on $(-\infty, 0]$, but it is non-linear over the full range $(-\infty, \infty)$. While we generally focus only on the $[0, \infty)$ part, the “0” output on $(-\infty, 0]$ is just as important: it creates the kink at zero, and shifting that kink around (via the weights and biases) gives the network the breakpoints that are so crucial for it to model just about any output function.
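
To see both halves of that story numerically, here is a rough sketch (the weights and layer sizes are made up purely for illustration): with an identity activation in the hidden layer, the two layers collapse into a single affine map feeding the sigmoid, i.e. plain logistic regression; with ReLU, the kink at zero breaks the additivity that any linear map must satisfy:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)   # made-up hidden-layer parameters
W2, b2 = rng.normal(size=(1, 4)), rng.normal(size=1)   # made-up output-layer parameters

# 1) Identity (linear) hidden activation: the two layers collapse into one
#    affine map, so sigmoid(W2 (W1 x + b1) + b2) is just logistic regression.
x = rng.normal(size=3)
two_layers = W2 @ (W1 @ x + b1) + b2
one_layer  = (W2 @ W1) @ x + (W2 @ b1 + b2)
print(np.allclose(two_layers, one_layer))   # True for every x

# 2) ReLU is not linear over the whole real line: a linear function would
#    satisfy f(a + b) == f(a) + f(b), but the kink at 0 breaks that.
print(relu(-1.0) + relu(1.0))   # 1.0
print(relu(-1.0 + 1.0))         # 0.0
```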


That clears my doubt! In fact, I had that doubt because I assumed ReLU is a linear function and neglected to consider its full range. The lab session that week cleared it up as well; there’s a beautiful explanation with a graph that gives the intuition as to why ReLU is non-linear.

Thanks for clearing that up! :slight_smile:

You are most welcome @bharathithal :blush:
