Calculation of partial derivative of the cost function for logistic regression

Has anyone actually done the maths to calculate the partial derivative of the cost function J(\vec w, b) with respect to w_j?

I have gone through my calculations twice, and I get the same result as Andrew in his video “Gradient Descent Implementation”, but with a minus sign in front of the result.

I have taken screenshots of my calculations and pasted them here in case you need to check them.

It’s been many years since I last calculated partial derivatives, so I may have made an error somewhere.

Hopefully someone with more recent experience can spot the error.

I see my mistake now. I had written down f(\vec w, b) incorrectly before differentiating it. I had…

f(\vec w, b) = \frac{1}{1 - e^{-z}}

instead of…

f(\vec w, b) =\frac{1}{1 + e^{-z}}

I’ll keep the post here for anyone who is interested in how Andrew arrives at the expression for the partial derivative.
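For anyone who doesn’t want to work through the screenshots, here is a sketch of how the derivative falls out, assuming the cross-entropy loss from the lectures and natural logarithms, with z = \vec w \cdot \vec x^{(i)} + b and f = f_{\vec w, b}(\vec x^{(i)}) = \frac{1}{1 + e^{-z}}:

\frac{\partial f}{\partial z} = f(1 - f)

\frac{\partial L}{\partial f} = -\frac{y^{(i)}}{f} + \frac{1 - y^{(i)}}{1 - f}

\frac{\partial L}{\partial z} = \frac{\partial L}{\partial f}\cdot\frac{\partial f}{\partial z} = -y^{(i)}(1 - f) + (1 - y^{(i)})f = f - y^{(i)}

\frac{\partial z}{\partial w_j} = x_j^{(i)}

so that

\frac{\partial J}{\partial w_j} = \frac{1}{m}\sum_{i=1}^{m}\left(f_{\vec w, b}(\vec x^{(i)}) - y^{(i)}\right)x_j^{(i)}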


That’s nice homework to do to understand the algorithm properly. Thank you for sharing it.


Isn’t it amazing how the gradient descent algorithm is identical to that for linear regression except for f(\vec w, b)?
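To make the point concrete, here is a minimal NumPy sketch (my own illustration, not code from the course labs; the names gradients, linear_f and logistic_f are made up): the gradient computation is exactly the same, and only the prediction function passed in changes.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradients(X, y, w, b, predict):
    """Gradient of the cost w.r.t. w and b.

    Identical for linear and logistic regression; only the
    prediction function `predict` differs.
    """
    m = X.shape[0]
    err = predict(X, w, b) - y      # the "error" term, f(x) - y
    dj_dw = X.T @ err / m           # (1/m) * sum_i (f - y) * x_j
    dj_db = err.mean()              # (1/m) * sum_i (f - y)
    return dj_dw, dj_db

# Linear regression prediction vs. logistic regression prediction.
linear_f   = lambda X, w, b: X @ w + b
logistic_f = lambda X, w, b: sigmoid(X @ w + b)

# Toy data, purely illustrative.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
y = np.array([0.0, 1.0, 1.0, 0.0, 1.0])
w = np.zeros(3)
b = 0.0

print(gradients(X, y, w, b, linear_f))
print(gradients(X, y, w, b, logistic_f))
```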


Yes, it is a remarkable coincidence.


It seems like too much of a coincidence, given that the two forms of f(\vec w, b) are so different: one is linear and the other non-linear.


Apparently the non-linear log function in the cost equation is counteracted by the non-linear exponential in the sigmoid function that is part of f(\vec w, b).
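A sketch of the cancellation, per training example and assuming natural logarithms. For linear regression with squared error and f = \vec w \cdot \vec x^{(i)} + b:

\frac{\partial}{\partial w_j}\frac{1}{2}\left(f - y^{(i)}\right)^2 = \left(f - y^{(i)}\right)x_j^{(i)}

For logistic regression with the log loss and f = \frac{1}{1 + e^{-z}}:

\frac{\partial L}{\partial w_j} = \left(-\frac{y^{(i)}}{f} + \frac{1 - y^{(i)}}{1 - f}\right)f(1 - f)\,x_j^{(i)} = \left(f - y^{(i)}\right)x_j^{(i)}

The \frac{1}{f} and \frac{1}{1 - f} coming from the log exactly cancel the f(1 - f) coming from the sigmoid, which is why the two gradients end up with the same form.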


And it is also reasonable for the loss function’s first derivative to be proportional to the error, so that it tends to zero when the error tends to zero.

Can you present this result mathematically, say using…

L(f(\vec w, b), y^{(i)}) = -log(f(\vec w, b))

for y^{(i)} = 1

I have also noticed that Andrew is missing a constant factor of…

\frac{1}{ln(10)}

in his final result for gradient descent of logistic regression.
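To spell out where that factor would come from: if the logs in the loss really were base 10, then differentiating would introduce a constant, since

\frac{d}{dx}\log_{10}(x) = \frac{1}{x\,\ln(10)}

and the per-example gradient would become \frac{1}{\ln(10)}\left(f - y^{(i)}\right)x_j^{(i)} rather than \left(f - y^{(i)}\right)x_j^{(i)}.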

That was just an intuition, not the result of a mathematical derivation.

If the first derivative of the loss did not tend to zero when the error tends to zero, then gradient descent would keep pushing the weights away, which is not a favourable consequence.

By error, I meant the \left(f(\vec w, b) - y^{(i)}\right) part of your work. It was called the error because it’s the difference between the truth and the prediction.

Doesn’t this argument sound reasonable to you? I am a Physics graduate, and we always like to discuss intuitive understandings of maths formulas, though it’s not always easy to do.

Cheers,
Raymond

P.S. You have presented mathematically the result of the derivative of the loss being proportional to the error :wink:
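For reference, for y^{(i)} = 1 that sketch looks like this (natural logarithms assumed):

L = -\log(f), \qquad \frac{\partial L}{\partial f} = -\frac{1}{f}, \qquad \frac{\partial f}{\partial w_j} = f(1 - f)\,x_j^{(i)}

\frac{\partial L}{\partial w_j} = -\frac{1}{f}\cdot f(1 - f)\,x_j^{(i)} = (f - 1)\,x_j^{(i)} = \left(f - y^{(i)}\right)x_j^{(i)}

which is proportional to the error and tends to zero as the error does.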

The slides might not have specified the base of the log (I haven’t checked all of them), but if we work backward from the final result of gradient descent for logistic regression, the base was e.

The first derivative of…

L(f(\vec w, b), y^{(i)}) = -log(\frac{1}{1 + e^{-z}})

is not zero when the error…

f(\vec w, b)

equals zero if you think about how…

-log(x)

approaches \infty as x \to 0, passes through zero at x = 1, and continues to take negative values as x \to \infty.

The first derivative never becomes zero.

I was talking about the first derivative of the loss with respect to weight.
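In symbols (a sketch for y^{(i)} = 1, natural logarithms), the two derivatives being discussed are different quantities:

\frac{\partial L}{\partial f} = -\frac{1}{f}

is never zero for finite f, but

\frac{\partial L}{\partial w_j} = \frac{\partial L}{\partial f}\cdot\frac{\partial f}{\partial z}\cdot\frac{\partial z}{\partial w_j} = -\frac{1}{f}\cdot f(1 - f)\cdot x_j^{(i)} = (f - 1)\,x_j^{(i)}

which does tend to zero as f \to 1, i.e. as the error tends to zero.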

Mathematics nomenclature uses log(x), which is what Andrew writes, to mean base 10, and ln(x) to mean base e.

But this is a Machine Learning class. I don’t know who can say what the Machine Learning nomenclature is regarding the use of log, but this Machine Learning Specialization uses base e.

The first derivative never reaches zero with respect to the error or w_j if you think about the “shape” of -log(x) against x.

Agreed! I should change it from “equal zero” to “tends to zero”.


I have changed it to “tends to zero” in my previous post.

Not “Machine Learning nomenclature”, mathematical nomenclature.

Shouldn’t Andrew be using ln(x) instead of log(x) if base-e logarithms are being used?

He could, but I am not sure that he should.

Btw, what about the intuition? Does it make sense to you that as the error tends to zero, the derivative should too?