ReLU activation function vs sigmoid function

Is it possible to use the ReLU activation function for the output layer rather than the sigmoid function?

The sigmoid function returns a non-zero value even for negative inputs, whereas ReLU maps them to zero. If you don’t expect the output of the layer before activation to be negative, ReLU should be preferred.
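
A minimal sketch of that difference, assuming NumPy (the values in the comments are rounded):

```python
import numpy as np

def sigmoid(z):
    # Sigmoid squashes any real input into (0, 1); it never reaches exactly 0.
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    # ReLU clips negative inputs to exactly 0 and passes positives through.
    return np.maximum(0.0, z)

z = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
print(np.round(sigmoid(z), 3))  # [0.047 0.378 0.5   0.622 0.953] -> always strictly positive
print(relu(z))                  # [0.  0.  0.  0.5 3. ]           -> exact zeros for z <= 0
```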


Since the two have different characteristics, we usually select the one that better fits the objective.
As you know, the output of the sigmoid curve is between 0 and 1. That is good for converting a broader range of data into this “easy-to-understand” range. But, of course, there are some drawbacks. If an input value is very large or very small, then the gradient, which is one of the most important quantities in deep learning, essentially disappears. The maximum value of the first-order derivative (the gradient) is only 0.25, so convergence may be slow.
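
A minimal sketch, assuming NumPy, of how small the sigmoid gradient gets (the printed values are approximate):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    # sigma'(z) = sigma(z) * (1 - sigma(z)); its maximum is 0.25 at z = 0.
    s = sigmoid(z)
    return s * (1.0 - s)

z = np.array([-10.0, -5.0, 0.0, 5.0, 10.0])
print(sigmoid_grad(z))
# ~[4.5e-05 6.6e-03 2.5e-01 6.6e-03 4.5e-05]
# The gradient peaks at 0.25 (z = 0) and shrinks toward zero for large |z|,
# which is the vanishing-gradient issue described above.
```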
ReLU has, as you know, quite different characteristics. It clips negative values to zero. If negative values are important in your problem (like temperature), it may not work. On the other hand, for positive inputs it works well, especially in hidden layers, since the gradient is constant, which helps reduce the computational effort. (You will see lots of derivatives and partial derivatives in back-propagation.) But, of course, this is also a drawback, since the gradient is always 0 for negative values.
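
And a corresponding sketch for the ReLU gradient, again assuming NumPy:

```python
import numpy as np

def relu_grad(z):
    # The ReLU gradient is 1 for z > 0 and 0 for z < 0
    # (it is undefined at z = 0; implementations typically pick 0 or 1 there).
    return (z > 0).astype(float)

z = np.array([-2.0, -0.1, 0.5, 4.0])
print(relu_grad(z))  # [0. 0. 1. 1.]
# A constant gradient of 1 for positive inputs keeps back-propagation cheap and
# avoids vanishing gradients, but a unit whose inputs stay negative receives
# gradient 0 and stops learning.
```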
So we need to choose the right one for the task; it is not a simple drop-in replacement.
