The formula you show is the cross entropy loss function for binary (yes/no) classification, so of course it has only two terms: the first is the loss for the y = 1 case and the second is the loss for the y = 0 case.
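To make that concrete, here is a minimal sketch of binary cross entropy in NumPy (the function name and the example values are just illustrations, not from the course code). Notice how the label y zeroes out one of the two terms for every sample:

```python
import numpy as np

def binary_cross_entropy(y, y_hat):
    # For each sample, exactly one of the two terms is nonzero:
    # y = 1 keeps the first term, y = 0 keeps the second.
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

y = np.array([1.0, 0.0, 1.0])       # true labels
y_hat = np.array([0.9, 0.2, 0.8])   # sigmoid outputs (predicted P(y = 1))
loss = binary_cross_entropy(y, y_hat)
```

For the sample with y = 1 and y_hat = 0.9, the contribution is just -log(0.9); the (1 - y) term vanishes.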

When you generalize to a multiclass problem with more than two possible output classes, the output activation switches from sigmoid to softmax, which gives you a probability distribution across the output classes for any given sample. The loss function is then just the generalization of the binary case: it is still the same cross entropy calculation, but only one term (the one selected by the y label value) is nonzero for each sample. Viewed that way, it is exactly the same formula as in the binary case.
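A small sketch of that generalization (again, illustrative names, not the course's actual code). With a one-hot label, the sum in the cross entropy collapses to a single term, just as described above:

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability before exponentiating
    e = np.exp(z - np.max(z))
    return e / e.sum()

def cross_entropy(y_onehot, probs):
    # Only the term for the true class survives,
    # because the one-hot label zeroes out all the others.
    return -np.sum(y_onehot * np.log(probs))

z = np.array([2.0, 1.0, 0.1])    # raw logits for 3 classes
probs = softmax(z)               # probability distribution over classes
y = np.array([1.0, 0.0, 0.0])    # one-hot label: true class is class 0
loss = cross_entropy(y, probs)   # equals -np.log(probs[0])
```

So the multiclass loss for each sample is just -log of the probability the network assigned to the correct class, which is the binary formula again once you notice that only one term is selected.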

I don't remember how much detail Prof Ng goes into on this in the lectures, so it's worth scanning through them again. There is also a nice lecture on YouTube by Prof Geoff Hinton covering softmax and the cross entropy loss function.

Or, if your question is the more basic one of why the logarithm is used there: it comes from "maximum likelihood estimation" in statistics. Here's a thread from mentor Raymond that gives a nice intuitive explanation with examples. And here's a thread that shows the graph of log between 0 and 1.
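You can also just print a few values to see the behavior that graph shows. The key intuition: -log(p) is 0 when the model assigns probability 1 to the correct answer, and grows without bound as that probability approaches 0, so confident wrong answers are punished very heavily:

```python
import numpy as np

# -log(p) on (0, 1]: small loss for confident correct predictions,
# exploding loss as the assigned probability approaches 0.
for p in [0.99, 0.9, 0.5, 0.1, 0.01]:
    print(f"p = {p:>4}: -log(p) = {-np.log(p):.3f}")
```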