Summary and derivations of the gradients for linear regression and logistic regression

Hello!

This post first tabulates the gradients for a linear regression problem and a logistic regression problem.

For those who are interested and are familiar with differentiation and the chain rule, the derivation steps for the gradients are shown after the table. To make the derivation steps easier to compare, I have also put them side by side in a second table.

One important take-away is that the gradients for linear regression and logistic regression end up with exactly the same form, which the derivation steps below prove.

Cheers,
Raymond

Table 1: model and gradient formulas

| | Linear regression | Logistic regression |
|---|---|---|
| Model | \hat{y}^{(i)} = z^{(i)}, \quad z^{(i)} = w_1x^{(i)} + b | \hat{y}^{(i)} = \sigma{(z^{(i)})}, \quad z^{(i)} = w_1x^{(i)} + b |
| Loss L^{(i)} | \frac{1}{2}\left(\hat{y}^{(i)} - y^{(i)}\right)^2 | -y^{(i)}\log{\hat{y}^{(i)}} - (1-y^{(i)})\log{(1-\hat{y}^{(i)})} |
| Cost J | \frac{1}{m}\sum_{i=1}^{m}L^{(i)} | \frac{1}{m}\sum_{i=1}^{m}L^{(i)} |
| \frac{\partial{J}}{\partial{w_1}} | \frac{1}{m}\sum_{i=1}^{m}(\hat{y}^{(i)} - y^{(i)})x^{(i)} | \frac{1}{m}\sum_{i=1}^{m}(\hat{y}^{(i)} - y^{(i)})x^{(i)} |
| \frac{\partial{J}}{\partial{b}} | \frac{1}{m}\sum_{i=1}^{m}(\hat{y}^{(i)} - y^{(i)}) | \frac{1}{m}\sum_{i=1}^{m}(\hat{y}^{(i)} - y^{(i)}) |

where \sigma{(z)} = \frac{1}{1+\exp{(-z)}}.

Note

  1. the superscript (i) represents the i-th sample
  2. m is the number of training samples

Derivation steps 1: Linear regression

\frac{\partial{J}}{\partial{w_1}} = \frac{\partial}{\partial{w_1}}J = \frac{\partial}{\partial{w_1}}\frac{1}{m}\sum_{i=1}^{m}L^{(i)} = \frac{1}{m}\sum_{i=1}^{m}\frac{\partial}{\partial{w_1}}L^{(i)} = \frac{1}{m}\sum_{i=1}^{m}\frac{\partial{L^{(i)}}}{\partial{w_1}}

By the chain rule, we have
Equation [1]: \frac{\partial{J}}{\partial{w_1}} = \frac{1}{m} \sum_{i=1}^{m} \frac{\partial{L^{(i)}}}{\partial{\hat{y}^{(i)}}} \frac{\partial{\hat{y}^{(i)}}}{\partial{z^{(i)}}} \frac{\partial{z^{(i)}}}{\partial{w_1}}

Similarly, we have
Equation [2]: \frac{\partial{J}}{\partial{b}} = \frac{1}{m}\sum_{i=1}^{m} \frac{\partial{L^{(i)}}}{\partial{\hat{y}^{(i)}}} \frac{\partial{\hat{y}^{(i)}}}{\partial{z^{(i)}}} \frac{\partial{z^{(i)}}}{\partial{b}}

There are 4 distinct gradients we need to fill into equations [1] and [2]:

Equation [3]: \frac{\partial{L^{(i)}}}{\partial{\hat{y}^{(i)}}} = (\hat{y}^{(i)} - y^{(i)})
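Equation [3] follows from the squared-error loss in Table 1:

\frac{\partial{L^{(i)}}}{\partial{\hat{y}^{(i)}}} = \frac{\partial}{\partial{\hat{y}^{(i)}}}\frac{1}{2}\left(\hat{y}^{(i)} - y^{(i)}\right)^2 = (\hat{y}^{(i)} - y^{(i)})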

Equation [4]: \frac{\partial{\hat{y}^{(i)}}}{\partial{z^{(i)}}} = 1

Equation [5]: \frac{\partial{z^{(i)}}}{\partial{w_1}} = x^{(i)}

Equation [6]: \frac{\partial{z^{(i)}}}{\partial{b}} = 1

By substituting equations [3], [4], [5] into [1], and [3], [4], [6] into [2], we get \frac{\partial{J}}{\partial{w_1}} = \frac{1}{m}\sum_{i=1}^{m}(\hat{y}^{(i)} - y^{(i)})x^{(i)} and \frac{\partial{J}}{\partial{b}} = \frac{1}{m}\sum_{i=1}^{m}(\hat{y}^{(i)} - y^{(i)}), as listed in the table.
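If you would like to sanity-check these formulas numerically, here is a minimal NumPy sketch; the data x, y, the starting values of w_1 and b, and the step size eps are all made up for illustration. Each printed pair (analytic gradient vs. finite-difference estimate) should agree to several decimal places.

```python
import numpy as np

# Made-up data and starting parameters, for illustration only
rng = np.random.default_rng(0)
x = rng.normal(size=10)
y = 3.0 * x + 1.0 + 0.1 * rng.normal(size=10)
w1, b = 0.5, -0.2

def cost(w1, b):
    """J = (1/m) * sum of L, with squared-error loss L = 0.5 * (y_hat - y)^2."""
    y_hat = w1 * x + b              # for linear regression, y_hat = z
    return np.mean(0.5 * (y_hat - y) ** 2)

# Gradients from the derivation above
y_hat = w1 * x + b
dJ_dw1 = np.mean((y_hat - y) * x)   # (1/m) * sum of (y_hat - y) * x
dJ_db = np.mean(y_hat - y)          # (1/m) * sum of (y_hat - y)

# Central finite differences on J should match both gradients closely
eps = 1e-6
print(dJ_dw1, (cost(w1 + eps, b) - cost(w1 - eps, b)) / (2 * eps))
print(dJ_db, (cost(w1, b + eps) - cost(w1, b - eps)) / (2 * eps))
```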

Derivation steps 2: Logistic regression

\frac{\partial{J}}{\partial{w_1}} = \frac{\partial}{\partial{w_1}}J = \frac{\partial}{\partial{w_1}}\frac{1}{m}\sum_{i=1}^{m}L^{(i)} = \frac{1}{m}\sum_{i=1}^{m}\frac{\partial}{\partial{w_1}}L^{(i)} =\frac{1}{m} \sum_{i=1}^{m}\frac{\partial{L^{(i)}}}{\partial{w_1}}

By the chain rule, we have
Equation [1]: \frac{\partial{J}}{\partial{w_1}} = \frac{1}{m}\sum_{i=1}^{m} \frac{\partial{L^{(i)}}}{\partial{\hat{y}^{(i)}}} \frac{\partial{\hat{y}^{(i)}}}{\partial{\sigma{(z^{(i)})}}} \frac{\partial{\sigma{(z^{(i)})}}}{\partial{z^{(i)}}} \frac{\partial{z^{(i)}}}{\partial{w_1}}

Similarly, we have
Equation [2]: \frac{\partial{J}}{\partial{b}} = \frac{1}{m} \sum_{i=1}^{m} \frac{\partial{L^{(i)}}}{\partial{\hat{y}^{(i)}}} \frac{\partial{\hat{y}^{(i)}}}{\partial{\sigma{(z^{(i)})}}} \frac{\partial{\sigma{(z^{(i)})}}}{\partial{z^{(i)}}} \frac{\partial{z^{(i)}}}{\partial{b}}

There are 5 distinct gradients we need to fill into equations [1] and [2]:

Equation [3]: \frac{\partial{L^{(i)}}}{\partial{\hat{y}^{(i)}}} = -\frac{y^{(i)}}{\hat{y}^{(i)}} + \frac{1-y^{(i)}}{1-\hat{y}^{(i)}} = \frac{\hat{y}^{(i)} - y^{(i)}}{\hat{y}^{(i)}(1-\hat{y}^{(i)})}
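The first equality follows from the cross-entropy loss in Table 1; the second comes from putting the two terms over a common denominator:

\frac{-y^{(i)}(1-\hat{y}^{(i)}) + (1-y^{(i)})\hat{y}^{(i)}}{\hat{y}^{(i)}(1-\hat{y}^{(i)})} = \frac{\hat{y}^{(i)} - y^{(i)}}{\hat{y}^{(i)}(1-\hat{y}^{(i)})}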

Equation [4]: \frac{\partial{\hat{y}^{(i)}}}{\partial{\sigma{(z^{(i)})}}} = 1

Equation [5]: \frac{\partial{\sigma{(z^{(i)})}}}{\partial{z^{(i)}}} = \left(\frac{1}{1+\exp{(-z^{(i)})}}\right)^2\exp{(-z^{(i)})}
\text{ }=\left(\frac{1}{1+\exp{(-z^{(i)})}}\right)^2\left(1+\exp{(-z^{(i)})}-1\right)
\text{ }=\left(\frac{1}{1+\exp{(-z^{(i)})}}\right) - \left(\frac{1}{1+\exp{(-z^{(i)})}}\right)^2
\text{ }=\sigma{(z^{(i)})} - \sigma{(z^{(i)})}^2
\text{ }= \sigma{(z^{(i)})}(1-\sigma{(z^{(i)})})
\text{ }=\hat{y}^{(i)}(1-\hat{y}^{(i)})
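The closed form \sigma{(z)}(1-\sigma{(z)}) is also easy to verify numerically; here is a small NumPy sketch, with arbitrary test points z and step size eps chosen for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-2.0, -0.5, 0.0, 1.0, 3.0])   # arbitrary test points
eps = 1e-6

# Central finite-difference derivative vs. the closed form sigma(z) * (1 - sigma(z))
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
closed_form = sigmoid(z) * (1.0 - sigmoid(z))
print(np.max(np.abs(numeric - closed_form)))  # prints a tiny number: the two agree
```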

Equation [6]: \frac{\partial{z^{(i)}}}{\partial{w_1}} = x^{(i)}

Equation [7]: \frac{\partial{z^{(i)}}}{\partial{b}} = 1

By substituting equations [3], [4], [5], [6] into [1], and [3], [4], [5], [7] into [2], the factor \hat{y}^{(i)}(1-\hat{y}^{(i)}) from equation [5] cancels the denominator of equation [3], so we again get \frac{\partial{J}}{\partial{w_1}} = \frac{1}{m}\sum_{i=1}^{m}(\hat{y}^{(i)} - y^{(i)})x^{(i)} and \frac{\partial{J}}{\partial{b}} = \frac{1}{m}\sum_{i=1}^{m}(\hat{y}^{(i)} - y^{(i)}), which are the same forms as for linear regression, as listed in the table.
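As before, a minimal NumPy sketch can confirm this numerically; the labels, data, and parameter values below are made up for illustration:

```python
import numpy as np

# Made-up binary labels and starting parameters, for illustration only
rng = np.random.default_rng(1)
x = rng.normal(size=10)
y = (x > 0).astype(float)
w1, b = 0.3, 0.1

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(w1, b):
    """J = (1/m) * sum of the cross-entropy loss -y*log(y_hat) - (1-y)*log(1-y_hat)."""
    y_hat = sigmoid(w1 * x + b)
    return np.mean(-y * np.log(y_hat) - (1 - y) * np.log(1 - y_hat))

# Despite the different loss, the gradients take the same form as in linear regression
y_hat = sigmoid(w1 * x + b)
dJ_dw1 = np.mean((y_hat - y) * x)
dJ_db = np.mean(y_hat - y)

# Central finite differences on J should match both gradients closely
eps = 1e-6
print(dJ_dw1, (cost(w1 + eps, b) - cost(w1 - eps, b)) / (2 * eps))
print(dJ_db, (cost(w1, b + eps) - cost(w1, b - eps)) / (2 * eps))
```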

Table 2: comparing the derivation steps

| Gradient | Linear regression | Logistic regression |
|---|---|---|
| \frac{\partial{L^{(i)}}}{\partial{\hat{y}^{(i)}}} | \hat{y}^{(i)} - y^{(i)} | \frac{\hat{y}^{(i)} - y^{(i)}}{\hat{y}^{(i)}(1-\hat{y}^{(i)})} |
| \frac{\partial{\hat{y}^{(i)}}}{\partial{z^{(i)}}} | 1 | \hat{y}^{(i)}(1-\hat{y}^{(i)}) |
| \frac{\partial{z^{(i)}}}{\partial{w_1}} | x^{(i)} | x^{(i)} |
| \frac{\partial{z^{(i)}}}{\partial{b}} | 1 | 1 |
| \frac{\partial{L^{(i)}}}{\partial{w_1}} (product of the rows above) | (\hat{y}^{(i)} - y^{(i)})x^{(i)} | (\hat{y}^{(i)} - y^{(i)})x^{(i)} |
| \frac{\partial{L^{(i)}}}{\partial{b}} (product of the rows above) | \hat{y}^{(i)} - y^{(i)} | \hat{y}^{(i)} - y^{(i)} |

For logistic regression, the \frac{\partial{\hat{y}^{(i)}}}{\partial{z^{(i)}}} row combines equations [4] and [5].
