ReLu for output layer

kbabu · May 13, 2021, 3:55am

Hello,
If we want to build a regression model with non-negative outputs, can ReLU be used as the activation function in the output layer? How does this approach compare with using just a linear activation function?

Thanks

kenb · May 13, 2021, 12:58pm

Hi @kbabu, I see this is your first post, so welcome to the DLS! The short answer to your question is yes. The videos from week 3 on activation functions briefly allude to that solution. In the statistics literature, this is known as a “limited dependent variable” model, since negative values do not occur. That said, you will soon see that this is not an issue in most deep learning applications.

So, the following is (way) more than you need to know. These models fall into one of two buckets. The first is the simplest: certain target values (of the “dependent variable”) simply cannot occur, such as negative values for a person’s weight. Using ReLU as the output activation results in a “truncated” regression model. One could imagine that the errors in the MSE loss function are randomly distributed according to a Normal (or “Gaussian”) distribution, in which case the part of the bell-shaped distribution over values less than zero is simply lopped off (or “truncated”). Since the ReLU activation effectively introduces a nonlinear constraint on the model, it becomes a nonlinear regression model, which gradient descent optimization can handle directly. Cool, huh?
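To make the comparison with a linear output concrete, here is a minimal sketch in plain Python. The pre-activation values in `z` are made up for illustration, and the helper names (`linear_output`, `relu_output`, `relu_grad`) are hypothetical, not taken from the course notebooks.

```python
def linear_output(z):
    """Identity activation: predictions can be any real number."""
    return list(z)

def relu_output(z):
    """ReLU activation: negative pre-activations are clipped to zero."""
    return [max(0.0, v) for v in z]

def relu_grad(z):
    """Derivative of ReLU with respect to z (taken as 0 at z == 0)."""
    return [1.0 if v > 0 else 0.0 for v in z]

# Hypothetical last-layer pre-activations (assumed values for illustration).
z = [-1.5, -0.2, 0.0, 0.7, 3.0]

y_linear = linear_output(z)   # can contain negative predictions
y_relu = relu_output(z)       # never negative
g_relu = relu_grad(z)         # zero wherever the output was clipped

print(y_linear)
print(y_relu)
print(g_relu)
```

One practical difference this shows: wherever the ReLU output is clipped to zero, its gradient is also zero, so those examples do not push the weights during that update. A linear output always passes the gradient through, but it can predict negative values.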

Aside: In using linear regression to predict house prices (as in the videos), I suppose that a negative house price could occur in the unusual case that the homeowner pays the buyer to take the house off their hands! Not likely.

The second case, which is more difficult, is one in which the target values form a “censored” sample. For example, one might be interested in knowing the number of tickets demanded for a concert, but we can only measure the number of tickets sold. When the event sells out, we know only that the actual number demanded is at least the number sold. The number of tickets demanded is said to be censored. Here the excess capacity of the arena (empty seats) is bounded below by zero. This is not as easy to address.

If you are interested in the second case, you could start by Googling “Tobit regression.”
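For a rough idea of what that looks like as a loss function, here is a minimal sketch of a Tobit-style negative log-likelihood in plain Python. It assumes Normal errors with a single fixed `sigma` and targets censored from below at `lower` (zero by default). The function names and arguments are hypothetical, and this is an illustration rather than a reference implementation.

```python
import math

def normal_log_pdf(x, mu, sigma):
    """Log density of a Normal(mu, sigma^2) distribution at x."""
    return (-0.5 * math.log(2.0 * math.pi) - math.log(sigma)
            - 0.5 * ((x - mu) / sigma) ** 2)

def normal_cdf(x):
    """Standard Normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def tobit_nll(y, mu, sigma, lower=0.0):
    """Mean negative log-likelihood for targets censored from below at `lower`.

    y  : observed targets (values at `lower` are treated as censored)
    mu : model predictions for the uncensored (latent) target
    """
    total = 0.0
    for y_i, mu_i in zip(y, mu):
        if y_i > lower:
            # Uncensored: ordinary Gaussian log-likelihood.
            total -= normal_log_pdf(y_i, mu_i, sigma)
        else:
            # Censored: we only know the latent value is <= lower.
            # Note: math.log fails if the CDF underflows to 0 (very large mu_i).
            total -= math.log(normal_cdf((lower - mu_i) / sigma))
    return total / len(y)

# Made-up example: two observed values and one censored at zero.
print(tobit_nll([1.2, 0.0, 2.5], [1.0, -0.5, 2.0], sigma=1.0))
```

The design point is that a censored observation is not penalized for a prediction below the bound. It only contributes the probability that the latent value falls at or below it, which plain MSE with a ReLU output does not capture.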
