ReLU for output layer

If we want to build a regression model with non-negative outputs, can ReLU be used as the activation function in the output layer? How does this approach compare with using just a linear activation function?


Hi @kbabu, I see this is your first post, so welcome to the DLS! The short answer to your question is yes. The videos from week 3 on activation functions briefly allude to that solution. In the statistics literature, this is known as a “limited dependent variable” model, since negative values do not occur. That said, you will soon see that this is not an issue in most deep learning applications.

So, the following is (way) more than you need to know. These models fall into one of two buckets. The first is the simplest: certain values of the target (the “dependent variable”) simply cannot occur, such as negative values for a person’s weight. Using the ReLU as the output activation results in a “truncated” regression model. One could imagine that the errors in the MSE loss function are randomly distributed according to a Normal (or “Gaussian”) distribution, in which case the part of the bell-shaped distribution over values less than zero is simply lopped off (or “truncated”). Since the ReLU activation effectively introduces a nonlinear constraint on the model, it becomes a nonlinear regression model, which gradient descent optimization handles effectively. Cool, huh?
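To make the comparison with a plain linear output concrete, here is a minimal sketch in pure Python. It assumes a single output unit and an MSE loss; the names `predict` and `mse_grad` are my own, not anything from the course notebooks. The only difference between the two output choices is the clamp at zero, plus the fact that ReLU passes no gradient for examples whose pre-activation is negative.

```python
def relu(z):
    return max(z, 0.0)

def predict(x, w, b, output="linear"):
    """Single output unit: linear pre-activation, optionally followed by ReLU."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return relu(z) if output == "relu" else z

def mse_grad(X, y, w, b, output="linear"):
    """Gradient of the mean squared error with respect to w and b."""
    n = len(y)
    grad_w = [0.0] * len(w)
    grad_b = 0.0
    for x, target in zip(X, y):
        z = sum(wi * xi for wi, xi in zip(w, x)) + b
        y_hat = relu(z) if output == "relu" else z
        # ReLU has slope 1 where z > 0 and slope 0 elsewhere;
        # the linear output always has slope 1.
        slope = 1.0 if (output != "relu" or z > 0) else 0.0
        dz = 2.0 * (y_hat - target) * slope / n
        grad_w = [g + dz * xi for g, xi in zip(grad_w, x)]
        grad_b += dz
    return grad_w, grad_b

# Made-up inputs and weights, purely for illustration.
X = [[1.0, 2.0], [-1.0, 0.5], [0.0, -3.0]]
w = [0.5, 1.0]
b = 0.0

linear_out = [predict(x, w, b, "linear") for x in X]  # can go negative
relu_out = [predict(x, w, b, "relu") for x in X]      # never below zero
print(linear_out, relu_out)
```

One practical thing to watch for: if the weights start out so that every pre-activation is negative, the ReLU output layer predicts zero everywhere and the gradient is zero too, so the model cannot learn. The linear output does not have that problem, but it can return negative predictions.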

Aside: In using linear regression to predict house prices (as in the videos), I suppose that a negative house price could occur in the unusual case that the homeowner pays the buyer to take the house off their hands! Not likely.

The second case, which is more difficult, is one in which the target values form a “censored” sample. For example, one might be interested in knowing the number of tickets demanded for a concert, but we can only measure the number of tickets sold. When the event sells out, however, we do know that the actual number demanded is greater than the number sold. The number of tickets demanded is said to be censored. Here the excess capacity of the arena (empty seats) is bounded below by zero. This is not as easy to address.

If you are interested in the second case, you could start by Googling “Tobit regression.”
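For a rough idea of what a Tobit-style model optimizes, here is a sketch of the negative log-likelihood for the ticket example, written with the Python standard library only. The assumptions are mine: demand is Gaussian around the model's prediction `mu` with a fixed, known `sigma`, and an event counts as sold out when tickets sold reach capacity. The function name `tobit_nll` is hypothetical, not a library call.

```python
import math

def norm_logpdf(u):
    """Log density of the standard Normal distribution at u."""
    return -0.5 * u * u - 0.5 * math.log(2.0 * math.pi)

def norm_sf(u):
    """Upper tail probability P(Z > u) for a standard Normal Z."""
    return 0.5 * math.erfc(u / math.sqrt(2.0))

def tobit_nll(mu, sigma, sold, capacity):
    """Negative log-likelihood with right-censoring at capacity.

    mu: predicted demand per event (model output, assumed given)
    sigma: assumed known standard deviation of the Gaussian errors
    sold: observed tickets sold per event
    capacity: arena capacity per event
    """
    total = 0.0
    for m, y, c in zip(mu, sold, capacity):
        if y < c:
            # Not sold out: demand is observed exactly, use the density.
            total -= norm_logpdf((y - m) / sigma) - math.log(sigma)
        else:
            # Sold out: we only know demand >= capacity, use the tail probability.
            total -= math.log(norm_sf((c - m) / sigma))
    return total

# Made-up numbers: a sold-out event with capacity 100.
low_guess = tobit_nll([50.0], 10.0, [100.0], [100.0])
high_guess = tobit_nll([150.0], 10.0, [100.0], [100.0])
print(low_guess, high_guess)
```

The point of the censored branch is that, for a sold-out event, predicting demand well above capacity is not penalized the way it would be under plain MSE against tickets sold. In the example, `high_guess` comes out smaller than `low_guess`.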