Why do linear regression and classification have identical gradient functions?

For linear regression we identified the cost function to be J(w,b) = \frac{1}{2m}\sum_{i=1}^{m}\big(f_{w,b}(x^{(i)}) - y^{(i)}\big)^2, i.e. \frac{1}{2m} times the sum of squares of the differences between the actual and predicted values.
The gradient was then calculated by taking the partial derivatives of this cost function with respect to w and b, which gave us the gradient equations for w and b.

However, for classification we identified the cost function to be something along the lines of: J(w,b) = \frac{1}{m}\sum_{i=1}^{m}\Big[-y^{(i)}\log f_{w,b}(x^{(i)}) - \big(1-y^{(i)}\big)\log\big(1-f_{w,b}(x^{(i)})\big)\Big]
Here f_{w,b}(x) is the sigmoid function of wx + b.
So why didn't we compute the gradient by taking the partial derivatives of the loss function above with respect to w and b?

Instead, both the linear regression and classification problems are reported to have the identical gradient function:

\frac{\partial J}{\partial w_j} = \frac{1}{m}\sum_{i=1}^{m}\big(f_{w,b}(x^{(i)}) - y^{(i)}\big)\,x_j^{(i)}

I wish I had the ability to cut-and-paste the screenshots or to capture the equations; that would have made my life easier.
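Since I cannot paste the equations, here is a small NumPy sketch of what I mean (the variable names and random data are my own, not from the course): both analytic gradients reduce to the same (f - y) times x form, and a finite-difference check on each cost function agrees with that form.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 100, 3
X = rng.normal(size=(m, n))
y_lin = rng.normal(size=m)           # continuous targets (linear regression)
y_log = rng.integers(0, 2, size=m)   # 0/1 labels (logistic regression)
w, b = rng.normal(size=n), 0.5

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Costs as I understand them from the lectures
def J_lin(w, b):   # (1/2m) * sum of squared errors
    return np.mean((X @ w + b - y_lin) ** 2) / 2

def J_log(w, b):   # (1/m) * sum of -y*log(f) - (1-y)*log(1-f), with f = sigmoid(wx+b)
    f = sigmoid(X @ w + b)
    return np.mean(-y_log * np.log(f) - (1 - y_log) * np.log(1 - f))

# The "identical" analytic gradient form: dJ/dw_j = (1/m) * sum_i (f_i - y_i) * x_ij
grad_lin = X.T @ ((X @ w + b) - y_lin) / m
grad_log = X.T @ (sigmoid(X @ w + b) - y_log) / m

# Finite-difference check of both gradients
eps = 1e-6
for j in range(n):
    e = np.zeros(n)
    e[j] = eps
    num_lin = (J_lin(w + e, b) - J_lin(w - e, b)) / (2 * eps)
    num_log = (J_log(w + e, b) - J_log(w - e, b)) / (2 * eps)
    print(j, np.isclose(num_lin, grad_lin[j]), np.isclose(num_log, grad_log[j]))
    # prints True, True for every j
```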


Hello @rkranjan,

For both linear regression and logistic regression, their derivatives of cost with respect to w and b happen to have the same form. Did you try to carry out the derivatives yourself? Here is a very similar discussion.

Raymond

Hello Raymond-

Thank you for your response.

Indeed I tried to do the detailed derivative myself. I must have made some mistake somewhere.

It may have been mentioned by Andrew in the course that the detailed mathematical calculation results in the same form. But I missed it.

Should we infer something more general from this? How did two loss functions that look so different end up giving the identical gradient form? What other loss functions might result in the same outcome?

Thank you once again.

Hello @rkranjan,

I will let you decide whether you want to do the research :wink: You might look for some loss functions, take their derivatives, and see what they end up looking like. Making a table to summarize them would be wonderful. Your call.

However, we can rewrite the cost gradients for linear regression and logistic regression into this

\frac{\partial J}{\partial w_j} = \frac{1}{m}\sum_{i=1}^{m}\underbrace{\big(f^{(i)} - y^{(i)}\big)}_{\text{error}}\;x_j^{(i)}

which clearly shows us that the gradients are proportional to the error. This makes a lot of sense: if the error is zero, the gradients are zero. This amazing property aligns with our intuition, doesn't it?

Certainly it is an interesting fact that they share the same look! However, from their respective loss functions, we can also get a glimpse of why:

Linear regression, where z is the model prediction: L = \frac{1}{2}(z - y)^2, so \frac{\partial{L}}{\partial{z}} = z - y.

Logistic regression, where p is the model prediction: L = -y\log{p} - (1-y)\log(1-p), so \frac{\partial{L}}{\partial{p}} = -\frac{y}{p} + \frac{1-y}{1-p} = \frac{p - y}{p(1-p)}.
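If anyone would like to double-check these two derivatives without redoing the algebra by hand, here is a quick SymPy sketch (the symbols are mine, not from the course):

```python
import sympy as sp

y, z, p = sp.symbols('y z p')

# Per-example squared-error loss of linear regression
L_lin = (z - y) ** 2 / 2
print(sp.simplify(sp.diff(L_lin, z)))                            # z - y

# Per-example cross-entropy loss of logistic regression
L_log = -y * sp.log(p) - (1 - y) * sp.log(1 - p)
print(sp.simplify(sp.diff(L_log, p) - (p - y) / (p * (1 - p))))  # 0, i.e. the forms match
```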

Even though both of them contain the error term, they don't actually look similar, do they? Unless we engineer a function g such that p = g(z), because in that case:

Logistic regression, where p = g(z): \frac{\partial{L}}{\partial{z}} = \frac{\partial{L}}{\partial{p}}\,\frac{\partial{p}}{\partial{z}} = (p - y)\left[\frac{1}{p(1-p)}\,\frac{\partial{p}}{\partial{z}}\right].

While we have the freedom to engineer any g, what is better than a g that ends up making the bracketed term equal to 1? Because:

  1. we get rid of the denominator
  2. this implies \frac{\partial{p}}{\partial{z}} = p(1-p), which again has the nice property that as p approaches 1 or 0, this gradient approaches 0
  3. it gives logistic regression's loss gradient the same look as linear regression's

It turns out that if we solve the equation \frac{\partial{p}}{\partial{z}} = p(1-p) by integration, we find that g is just our very familiar sigmoid function. It is only because we choose the sigmoid as our g that the loss gradient of logistic regression ends up looking very similar to that of linear regression. There is no law of nature that prohibits anyone from choosing another g, but if they do, their loss gradient will no longer look like linear regression's.
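For completeness, the integration step looks like this (separating variables, then choosing the integration constant C to be zero):

\frac{dp}{dz} = p(1-p) \;\Rightarrow\; \int\frac{dp}{p(1-p)} = \int dz \;\Rightarrow\; \ln\frac{p}{1-p} = z + C \;\Rightarrow\; p = \frac{1}{1 + e^{-(z+C)}}

and with C = 0 this is exactly the sigmoid p = \frac{1}{1+e^{-z}}.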

Lastly, I don't claim this was how the sigmoid historically came into logistic regression; I have never read that history. These are just some logical statements. :slight_smile:

If you choose to share it with us here, we can take a look!

Cheers,
Raymond

Wow Raymond. Thank you very much for this very insightful response.

You are welcome, @rkranjan!