Can 'relu' activation be used in the last layer of a neural network?

Nasim_Deljouyi · January 19, 2024, 11:06pm

Hi learners,

I am working on a regression problem whose inputs and outputs are positive or zero. I am using 'linear' activation in the output layer of a neural network, and it predicts negative values for outputs that should actually be zero. This happens when I use the data on a larger scale; with a smaller scale of data, it doesn't happen.
Now I am wondering whether I could use 'relu' in the output layer instead of 'linear' to resolve this issue. Or should an appropriate model predict nonnegative values even when 'linear' activation is used?

TMosh · January 19, 2024, 11:39pm

Real-valued outputs can take any value; it depends on what the model learns about the data set. There is no way to force the output values to stay within the range of the input values, nor should you try.

No, do not use ReLU in the output layer.

If you wish, you can artificially clip the negative values to zero, but I don’t recommend it. That won’t add value. It would only artificially make the outputs look better.
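A minimal sketch of the post-hoc clipping described above, in plain Python so it runs anywhere. The predictions and labels are made-up illustration values, and `clip_nonnegative` and `mse` are hypothetical helper names. Because every label is nonnegative, replacing a negative prediction with zero can only move it closer to its label, so the reported error drops even though the model itself has not changed.

```python
def clip_nonnegative(preds):
    """Replace every negative prediction with 0.0 (like applying ReLU at inference only)."""
    return [max(0.0, p) for p in preds]


def mse(preds, labels):
    """Mean squared error between two equal-length sequences."""
    return sum((p - y) ** 2 for p, y in zip(preds, labels)) / len(labels)


# Hypothetical model outputs and nonnegative targets (illustration only).
labels = [0.0, 0.0, 1.5, 3.0]
raw_preds = [-0.4, 0.1, 1.2, 3.3]

clipped = clip_nonnegative(raw_preds)
print(clipped)  # [0.0, 0.1, 1.2, 3.3]
print("MSE before clipping:", mse(raw_preds, labels))
print("MSE after clipping: ", mse(clipped, labels))
```

The lower number reflects a change to the reported outputs only, not to anything the network learned, which is the sense in which clipping "only makes the outputs look better."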

paulinpaloalto · January 19, 2024, 11:52pm

Yes, that’s what you want/need if the output values on your training set are all positive. So this is driven by your loss function, right? In a regression problem, you are typically using some form of Euclidean distance as the loss. Usually the square of the distance is used, because it has nicer mathematical properties in terms of the gradients. So if the label output on a given sample is positive, but the model predicts a negative value for that sample, then the loss is even greater than it would be if you used ReLU to “clip” the negative answer to be zero. So that means the gradients should push even more strongly to correct the model outputs during training in cases that produce negative predictions.
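The one-sample sketch below makes the gradient argument above concrete. It assumes a squared-error loss L = (a - y)^2 on a single output unit with pre-activation z and activation a = g(z); the values y = 0.5 and z = -0.3 are arbitrary illustration numbers. With a linear output, the negative prediction is penalized more heavily and receives a nonzero gradient. With ReLU, the prediction is clipped to zero and, because ReLU's derivative is zero for z < 0, no gradient flows back through that unit for this sample.

```python
def linear(z):
    return z


def linear_grad(z):
    return 1.0


def relu(z):
    return max(0.0, z)


def relu_grad(z):
    # Derivative of ReLU; 0 is used at z == 0 as the subgradient convention.
    return 1.0 if z > 0 else 0.0


def loss_and_grad(z, y, act, act_grad):
    """Squared-error loss (a - y)^2 with a = act(z), and its derivative dL/dz."""
    a = act(z)
    loss = (a - y) ** 2
    dloss_dz = 2.0 * (a - y) * act_grad(z)  # chain rule: dL/da * da/dz
    return loss, dloss_dz


y = 0.5   # positive label
z = -0.3  # pre-activation that gives a negative prediction with a linear output

lin_loss, lin_grad = loss_and_grad(z, y, linear, linear_grad)
relu_loss, relu_grad_z = loss_and_grad(z, y, relu, relu_grad)

print(f"linear: loss={lin_loss:.2f}, dL/dz={lin_grad:.2f}")
print(f"relu:   loss={relu_loss:.2f}, dL/dz={abs(relu_grad_z):.2f}")
```

The zero gradient in the ReLU case is one concrete reason the two output choices can behave differently during training, so it is worth measuring rather than assuming.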

At least that’s what I would intuitively expect, but this is an experimental science: try it both ways (linear activation vs ReLU at the output layer) and see what you notice in terms of how long the training takes and the quality (accuracy) of the results you get with the trained model.
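One way to try the "both ways" experiment without any framework is the toy sketch below: a single unit a = g(w·x + b) trained by full-batch gradient descent on squared error, once with a linear g and once with ReLU. Everything here is an assumption for illustration: the synthetic data (targets that are zero for x ≤ 0.5 and rise linearly after), the starting values w = 0.5 and b = 0.1, the learning rate 0.5, and the 2000 steps. In a real Keras model the equivalent switch is the `activation` argument of the final `Dense` layer.

```python
def linear(z):
    return z


def linear_grad(z):
    return 1.0


def relu(z):
    return max(0.0, z)


def relu_grad(z):
    return 1.0 if z > 0 else 0.0


def mse(w, b, xs, ys, act):
    """Mean squared error of the one-unit model act(w * x + b)."""
    return sum((act(w * x + b) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train(xs, ys, act, act_grad, w=0.5, b=0.1, lr=0.5, steps=2000):
    """Full-batch gradient descent on MSE for a single unit a = act(w * x + b)."""
    n = len(xs)
    for _ in range(steps):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            z = w * x + b
            d = 2.0 * (act(z) - y) * act_grad(z) / n  # this sample's share of dL/dz
            gw += d * x
            gb += d
        w -= lr * gw
        b -= lr * gb
    return w, b


# Synthetic, nonnegative targets (illustration only): zero for x <= 0.5, then rising.
xs = [i / 10 for i in range(11)]
ys = [max(0.0, 2.0 * x - 1.0) for x in xs]

results = {}
for name, act, act_grad in [("linear", linear, linear_grad), ("relu", relu, relu_grad)]:
    before = mse(0.5, 0.1, xs, ys, act)
    w, b = train(xs, ys, act, act_grad)
    preds = [act(w * x + b) for x in xs]
    after = mse(w, b, xs, ys, act)
    results[name] = (before, after, preds)
    print(f"{name}: mse {before:.4f} -> {after:.4f}, min prediction {min(preds):.3f}")
```

On this toy data the linear unit converges toward the least-squares line, which dips below zero near x = 0 (the same symptom as in the original question), while the ReLU unit's outputs are nonnegative by construction. A single toy run says nothing about which choice trains faster or generalizes better on real data; that still has to be measured on the actual problem.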

Nasim_Deljouyi · January 20, 2024, 12:56am

Thank you so much for your feedback @paulinpaloalto and @TMosh.
