Loss for Multiclass Classification

Hello! I hope you are doing well.

I am wondering how we generalize the loss (loss = -\log{a_N} for y = N).

From logistic regression:
loss = -\log{a_1} for y = 1
loss = -\log{a_2} for y = 0
We got this form by substituting the values of y into the complete logistic loss equation, loss = -y\log{a_1} - (1-y)\log{a_2} (left side of the attached figure).
Using this, if we put y = 3, the loss will be
loss = -3\log{a_1} - (1-3)\log{a_2}
loss = -3\log{a_1} + 2\log{a_2}
I put a_2 = 1 - a_1 and tried solving it using log properties, but did not get the generalizable form (loss = -\log{a_3} for y = 3). Could someone kindly explain this?

Furthermore, as we know, loss = -\log{a_2} for y = 0, but this is the same for y = 2. Kindly clarify this too. I will be thankful to you.

Regards,
Saif Ur Rehman.

For multiple classes with an NN, there will be multiple output units (one per class). The total cost is the sum of the cost for each output unit. The ‘y’ values are converted into a one-hot representation, so we can use a true/false prediction for each output unit.
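Here is a minimal numpy sketch of that idea (the labels, probabilities, and class count below are made up for illustration): with one-hot targets, the cost is a sum over the output units, and only the true class's unit contributes, leaving -\log{a_c}.

```python
import numpy as np

# Made-up example: 4 classes, integer labels for 3 samples.
y = np.array([3, 0, 2])
num_classes = 4

# One-hot representation: one true/false target per output unit.
y_one_hot = np.eye(num_classes)[y]           # shape (3, 4)

# Made-up softmax outputs from the network (each row sums to 1).
a = np.array([[0.1, 0.2, 0.2, 0.5],
              [0.7, 0.1, 0.1, 0.1],
              [0.2, 0.1, 0.6, 0.1]])

# Sum the cost over output units; only the unit whose one-hot target
# is 1 contributes, so each sample's loss is -log(a_c) for true class c.
loss_per_sample = -np.sum(y_one_hot * np.log(a), axis=1)
print(loss_per_sample)   # [-log 0.5, -log 0.7, -log 0.6]
```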

This is discussed further in the 5th video in the Multiclass Classification section.

Hello Saif @saifkhanengr,

We do not derive the log loss from any equation on this slide. There are many ways to motivate the log loss l = -\log{a_n} for y = n, and one of them is maximizing the likelihood of the model parameters given the observed data.

Suppose y^{(i)} = c_i for sample i, and the corresponding model prediction for class c_i is a^{(i)}_{c_i}. Then, for a model trained on the observed data, the likelihood of the model predicting all of the observations (the training data) is

a^{(1)}_{c_1} \times a^{(2)}_{c_2} \times ... \times a^{(m)}_{c_m}

which is basically the joint probability, assuming the samples are independent of each other.

A good model will maximize this likelihood, or equivalently the log of this likelihood, which converts the multiplications into additions:

\log{a^{(1)}_{c_1}} + \log{a^{(2)}_{c_2}} + ... + \log{a^{(m)}_{c_m}}

Maximizing this sum is the same as minimizing its negative, and the negative splits into one term per sample, which is how we get the “general form” of the log loss for each sample: l = -\log{a^{(i)}_{c_i}}.
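To make the equivalence concrete, here is a small numpy sketch (reusing the made-up predictions from the snippet above) that computes the likelihood, the log-likelihood, and the resulting total loss:

```python
import numpy as np

# a[i, c] is the model's predicted probability that sample i is class c.
a = np.array([[0.1, 0.2, 0.2, 0.5],
              [0.7, 0.1, 0.1, 0.1],
              [0.2, 0.1, 0.6, 0.1]])
c = np.array([3, 0, 2])   # true class c_i of each sample

probs = a[np.arange(len(c)), c]        # a^{(i)}_{c_i} for each sample

likelihood = np.prod(probs)            # joint probability of all samples
log_likelihood = np.sum(np.log(probs)) # product of probs -> sum of logs

# Maximizing the log-likelihood is minimizing its negative, i.e. the sum
# of the per-sample losses -log a^{(i)}_{c_i}.
total_loss = -log_likelihood
print(likelihood, log_likelihood, total_loss)
```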

From this form, we can derive the log loss formula for the binary case, which is on the L.H.S. of the slide screenshot.
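As a quick check (my own working, not taken from the slide): with two classes, a_2 = 1 - a_1 and y \in \{0, 1\}, the general form reduces to the familiar binary formula:

```latex
% Per-sample multiclass loss: l = -\log{a_c} for true class c.
% Binary case: y = 1 has probability a_1, y = 0 has probability a_2 = 1 - a_1.
l =
\begin{cases}
  -\log{a_1}                    & \text{if } y = 1 \\
  -\log{a_2} = -\log{(1 - a_1)} & \text{if } y = 0
\end{cases}
% Both branches fit one equation, valid only for y \in \{0, 1\}:
l = -y\log{a_1} - (1 - y)\log{(1 - a_1)}
% Substituting y = 3 steps outside this equation's domain, which is why
% it does not simplify to -\log{a_3}.
```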

Cheers,
Raymond


Thanks, Raymond, for correcting me.
