For the sigmoid, we used the negative log-likelihood. Is there a similar approach for deriving good cost functions for tanh and ReLU? Are there convex cost functions for them as well?
Given ReLU’s popularity, I’m especially curious about the most appropriate cost function for ReLU. Would it be just the ordinary squared loss function?
Hi @gaussian. The optional video “Explanation of the logistic regression cost function” from Week 2 shows how the Principle of Maximum Likelihood can be used to derive that cost function from a Bernoulli probability model. The Bernoulli distribution is the natural model for a “Bernoulli trial”: an event with probability p of “success” and probability 1-p of “failure”. That fits the conditional probability model of an image being a cat versus something else very well.
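To make the connection concrete, here is a small sketch (names are my own, not from the course) showing that the negative log of the Bernoulli likelihood is exactly the logistic-regression cross-entropy cost:

```python
import numpy as np

def bernoulli_nll(y, p):
    # Negative log-likelihood of observing y in {0, 1} under Bernoulli(p).
    # Since P(y | p) = p^y * (1-p)^(1-y), taking -log gives
    # -(y*log(p) + (1-y)*log(1-p)), the familiar cross-entropy cost.
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

# A confident correct prediction is cheap; a confident wrong one is expensive.
print(bernoulli_nll(1, 0.99))  # small loss
print(bernoulli_nll(1, 0.01))  # large loss
```

This is why minimizing the cost is the same as maximizing the likelihood of the labels.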
The key here is that in classification tasks we are modeling probabilities, so one needs to start with a probability model. The tanh and ReLU functions are not candidates for a probability model. (Why?) In the context of an ordinary linear regression model, where the output is a continuous variable on the real line and the errors are Gaussian, the Maximum Likelihood principle leads to the mean squared error (MSE) cost function.
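You can check the Gaussian case numerically: up to an additive constant and a factor of 1/(2σ²), the mean Gaussian negative log-likelihood is the MSE. A minimal sketch (my own illustrative function names):

```python
import numpy as np

def gaussian_nll(y, y_hat, sigma=1.0):
    # Negative log-likelihood of y under N(y_hat, sigma^2):
    # 0.5*log(2*pi*sigma^2) + (y - y_hat)^2 / (2*sigma^2)
    return 0.5 * np.log(2 * np.pi * sigma**2) + (y - y_hat)**2 / (2 * sigma**2)

y = np.array([1.0, 2.0, 3.0])
y_hat = np.array([1.1, 1.9, 3.3])

nll = gaussian_nll(y, y_hat).mean()
mse = ((y - y_hat)**2).mean()

# The mean NLL equals a constant plus half the MSE (for sigma = 1),
# so minimizing one minimizes the other.
print(nll, 0.5 * np.log(2 * np.pi) + 0.5 * mse)
```

Since the constant and the scale do not depend on the parameters, minimizing the NLL over w and b is equivalent to minimizing the MSE.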
As for the second part of your question, the ReLU function could be used as the output activation in a simple linear regression model when negative outputs do not make sense (e.g., house prices). That is, the output is
y = \max\lbrace 0, wx+b\rbrace.
In that case, the errors are typically assumed to follow a truncated normal distribution, and Maximum Likelihood can again be applied to derive an appropriate cost function. Due to the nonlinearity of the model, the result is probably a bit ugly.
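For the curious, here is one illustrative way to write that cost down. This is a sketch under my own assumptions (a normal truncated below at 0, centered on the ReLU output), not the only possible model:

```python
import numpy as np
from math import erf, sqrt

def norm_cdf(z):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def trunc_normal_nll(y, w, b, x, sigma=1.0):
    # Hypothetical model: y >= 0 is drawn from a normal with mean
    # mu = max(0, w*x + b), truncated below at 0. Its log-density is
    #   log phi((y - mu)/sigma) - log(sigma) - log(1 - Phi(-mu/sigma)),
    # and the cost is the summed negative log-likelihood over the data.
    mu = np.maximum(0.0, w * x + b)
    z = (y - mu) / sigma
    log_phi = -0.5 * np.log(2 * np.pi) - 0.5 * z**2
    log_normalizer = np.log([1.0 - norm_cdf(-m / sigma) for m in mu])
    return float(-(log_phi - np.log(sigma) - log_normalizer).sum())

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.4, 2.6, 3.4])
print(trunc_normal_nll(y, w=1.0, b=0.0, x=x))
```

The log-normalizer term, which depends on w and b through mu, is what makes this cost messier (and no longer convex in general), unlike the plain MSE case.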