Week 3: Gradient Descent Implementation

For logistic regression, the cost function is different from the squared-error cost function used for linear regression. So why, when computing gradient descent for the logistic regression cost function, does the update look identical to the one derived for the linear regression cost function? I understand that f(x) is different for linear regression (a linear function) and for logistic regression (the sigmoid function).

Gradient descent for logistic regression:

Please clarify.

The reason the update rule looks identical is due to how the Chain Rule interacts with the Sigmoid function.

When we calculate the derivative of the Logistic Cost function d/dwj J(w, b), we break it down into three parts:

  1. How the Cost changes with respect to the Prediction (f).

  2. How the Prediction (f) changes with respect to the Linear sum (z = w.x + b).

  3. How the Linear sum (z) changes with respect to the Weight (wj).

    The Step-by-Step Derivation:

    If we look at a single training example, the derivatives are:

    1. Cost w.r.t Prediction: dJ / df = (f - y) / ( f (1 - f) )
    2. Prediction w.r.t z (Sigmoid Derivative): df / dz = f (1 - f)
    3. z w.r.t Weight: dz / dwj = xj

    When you multiply these together using the Chain Rule:

    dJ / dwj = dJ / df . df / dz . dz / dwj

    dJ / dwj = ( ( f - y ) / ( f ( 1 - f) ) ) . f ( 1 - f ) . xj

    The f (1 - f) terms cancel out perfectly, leaving you with the familiar term:

    dJ / dwj = (f - y) xj
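The cancellation above can be checked numerically: a minimal sketch (with made-up example values for x, y, w, and b) comparing the simplified analytic gradient (f - y) xj against a finite-difference derivative of the log loss.

```python
import math

# Hypothetical single training example; values chosen purely for illustration.
x, y = [1.5, -0.7], 1.0
w, b = [0.3, -0.2], 0.1

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    # f = sigmoid(w . x + b)
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def log_loss(w, b, x, y):
    f = predict(w, b, x)
    return -y * math.log(f) - (1 - y) * math.log(1 - f)

# Analytic gradient from the chain-rule result: dJ/dwj = (f - y) * xj
f = predict(w, b, x)
analytic = [(f - y) * xj for xj in x]

# Numerical gradient via central finite differences, one weight at a time.
eps = 1e-6
numeric = []
for j in range(len(w)):
    w_plus, w_minus = w.copy(), w.copy()
    w_plus[j] += eps
    w_minus[j] -= eps
    numeric.append((log_loss(w_plus, b, x, y) - log_loss(w_minus, b, x, y)) / (2 * eps))

# The two gradients agree to numerical precision.
for a, n in zip(analytic, numeric):
    assert abs(a - n) < 1e-6
```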

    Linear Regression Prediction formula - w . x + b (Any real number)
    Logistic Regression Prediction formula - 1 / ( 1 + e ^ -( w . x + b ))
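The two prediction formulas above differ only in the final sigmoid squashing, which a short sketch makes explicit (function names are my own, not from the course):

```python
import math

def linear_predict(w, b, x):
    # w . x + b : can be any real number
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def logistic_predict(w, b, x):
    # 1 / (1 + e^-(w . x + b)) : always strictly between 0 and 1
    return 1.0 / (1.0 + math.exp(-linear_predict(w, b, x)))
```

Note that when z = w . x + b is 0, the logistic prediction is exactly 0.5, which is why 0.5 is the natural decision threshold.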

    If you tried to use the Linear Regression “Squared Error” cost function with the sigmoid prediction, the resulting cost surface would be non-convex. This would create a “bumpy” surface with many local minima, making it nearly impossible for Gradient Descent to find the global best solution.
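The non-convexity claim can be checked numerically along a single weight. A sketch, assuming a single example with x = 1, y = 0: second finite differences of squared error composed with the sigmoid change sign (curvature flips, so non-convex), while those of the log loss never go negative.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Cost as a function of a single weight w, for one example with x = 1, y = 0.
def squared_error(w):
    return (sigmoid(w) - 0.0) ** 2

def log_loss(w):
    return -math.log(1.0 - sigmoid(w))

def second_differences(cost, lo=-6.0, hi=6.0, n=100):
    """Discrete curvature cost(w-h) - 2*cost(w) + cost(w+h) on a grid."""
    h = (hi - lo) / n
    ws = [lo + i * h for i in range(n + 1)]
    return [cost(ws[i - 1]) - 2 * cost(ws[i]) + cost(ws[i + 1])
            for i in range(1, n)]

sq = second_differences(squared_error)
ll = second_differences(log_loss)

# Squared error's curvature changes sign along w -> non-convex.
assert any(d > 0 for d in sq) and any(d < 0 for d in sq)
# Log loss curvature is never negative along w -> convex.
assert all(d >= -1e-12 for d in ll)
```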

    By using the Log Loss function shown in your image, the gradient simplifies beautifully and the cost surface remains a convex “bowl”, so Gradient Descent can find the optimal weights.
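Putting the simplified gradient into a full update loop, here is a minimal batch gradient descent sketch for logistic regression (plain Python, my own illustrative implementation rather than the course's code):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gradient_descent(X, y, alpha=0.1, iters=1000):
    """Batch gradient descent for logistic regression (illustrative sketch)."""
    m, n = len(X), len(X[0])
    w, b = [0.0] * n, 0.0
    for _ in range(iters):
        # Predictions f = sigmoid(w . x + b) for every example.
        f = [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]
        # The simplified gradient: (f - y) * x, averaged over the examples.
        err = [fi - yi for fi, yi in zip(f, y)]
        grad_w = [sum(err[i] * X[i][j] for i in range(m)) / m for j in range(n)]
        grad_b = sum(err) / m
        # Simultaneous update of all parameters.
        w = [wj - alpha * gj for wj, gj in zip(w, grad_w)]
        b -= alpha * grad_b
    return w, b

# Tiny separable toy data: points below 1.5 labeled 0, above labeled 1.
w, b = gradient_descent([[0.0], [1.0], [2.0], [3.0]], [0.0, 0.0, 1.0, 1.0])
```

Notice the loop never mentions the log loss explicitly: thanks to the cancellation derived above, only the residual (f - y) and the inputs are needed.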


Here’s another historical thread that shows the calculation of the derivatives in the Logistic Regression/sigmoid/cross entropy loss case.


Thank you @sanjaypsachdev for the clarification and for reminding me of the chain rule for derivatives.

Thank you @paulinpaloalto for further reference.