Why is the derivative of the cost function used in gradient descent the same for linear and logistic regression, when the original cost functions are different?

In the first course of the ML Specialization (week #3), the derivative of the cost function used for gradient descent in logistic regression is exactly the same as the one used for linear regression. How did the derivatives of two totally different cost functions end up as exactly the same \(\partial J / \partial \theta_j\) used in gradient descent?

The similarity in the form of the derivatives arises because both linear and logistic regression involve a linear combination of the input features, \(X\theta\), in their hypothesis functions. The key difference is in the transformation applied to this linear combination:

  1. Linear Regression: Uses the identity function (i.e., no transformation).
  2. Logistic Regression: Uses the logistic (sigmoid) function.

Despite the different transformations, when taking the derivative with respect to \(\theta\), the chain rule results in expressions that involve the difference between the predicted values and the actual values, scaled by the input features \(X\). This leads to similar-looking gradient expressions, even though the cost functions themselves are different.
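To make this concrete, here is a sketch of the two cost functions and the gradient each one produces, writing \(h_\theta\) for the hypothesis and \(m\) for the number of training examples (using the \(\theta\) notation from above):

```latex
% Linear regression: h_\theta(x) = \theta^T x (identity transformation)
J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)^2
\quad\Longrightarrow\quad
\frac{\partial J}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)\,x_j^{(i)}

% Logistic regression: h_\theta(x) = \sigma(\theta^T x) (sigmoid transformation)
J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\Bigl[y^{(i)}\log h_\theta(x^{(i)}) + \bigl(1 - y^{(i)}\bigr)\log\bigl(1 - h_\theta(x^{(i)})\bigr)\Bigr]
\quad\Longrightarrow\quad
\frac{\partial J}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)\,x_j^{(i)}
```

The only difference is what \(h_\theta\) means in each case: the raw linear combination for linear regression, and the sigmoid of that combination for logistic regression.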

In summary, the derivatives appear similar because they fundamentally represent the gradient of the error with respect to the model parameters, which in both cases involves the difference between predicted and actual values, weighted by the input features.
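To see the same thing in code, here is a minimal NumPy sketch (not the course's notebook code; `predict` and `gradient` are made-up names for illustration). The gradient line is identical for both models; only the prediction step changes.

```python
import numpy as np

def sigmoid(z):
    # Logistic (sigmoid) transformation.
    return 1.0 / (1.0 + np.exp(-z))

def predict(X, theta, model="linear"):
    # Linear regression applies no transformation (identity);
    # logistic regression passes the linear combination through the sigmoid.
    z = X @ theta
    return z if model == "linear" else sigmoid(z)

def gradient(X, y, theta, model="linear"):
    # For BOTH models the gradient is (1/m) * X^T (predictions - y);
    # only how 'predictions' is produced differs.
    m = X.shape[0]
    predictions = predict(X, theta, model)
    return (X.T @ (predictions - y)) / m

# Toy usage: same gradient routine, two different hypotheses.
X = np.array([[1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])  # first column is the intercept
y = np.array([0.0, 1.0, 1.0])
theta = np.zeros(2)
print(gradient(X, y, theta, model="linear"))
print(gradient(X, y, theta, model="logistic"))
```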

It’s a happy coincidence of the partial derivatives for the two cost functions. Keep in mind that one uses the sum of the squares of a linear function, and the other uses a logarithmic function that includes the sigmoid() function.

The math just works out so that the partial derivatives look rather similar.
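For anyone curious where exactly this "coincidence" happens: it is the derivative of the sigmoid, \(\sigma'(z) = \sigma(z)\bigl(1 - \sigma(z)\bigr)\), cancelling the denominators produced by differentiating the logarithms. A sketch for a single training example, with \(h = \sigma(\theta^T x)\):

```latex
\frac{\partial}{\partial \theta_j}\Bigl[-y\log h - (1 - y)\log(1 - h)\Bigr]
= \Bigl(-\frac{y}{h} + \frac{1 - y}{1 - h}\Bigr)\,h(1 - h)\,x_j
= \bigl((1 - y)\,h - y\,(1 - h)\bigr)\,x_j
= (h - y)\,x_j
```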