C2_W2_Assignment - section 3 / Exercise 5 - clarification

This is not an error, but something that confused me and may confuse others. In the week 2 assignment, section “3 - Linear Regression using Gradient Descent” presents equation 1, which makes sense, with the earlier caveat that “Division by 2 is taken just for scaling purposes.” So far so good…

At first read, the partial derivatives in equation 2 made no sense to me: where were they coming from? And why does the second one not have a second X? I eventually figured out the derivations, but this took time, probably because a) I’m new to partial derivatives and b) the summation distracted me considerably. Explicitly stating the following might help others:

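  1. For pdEwrtM (∂E/∂m), the derivative of the (mX + b - Y)^2 part, by the product rule, is (mX + b - Y)(X) + (X)(mX + b - Y) = 2(mX + b - Y)X
  2. For pdEwrtB (∂E/∂b), the derivative of the (mX + b - Y)^2 part is (mX + b - Y) * 1 + 1 * (mX + b - Y) = 2(mX + b - Y)

In both cases the division by 2 in equation 1 cancels the factor 2, which leaves (mX + b - Y)X for m and just (mX + b - Y) for b.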
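For a single data point i, the same two results can be written with the chain rule instead of the product rule. This sketch assumes equation 1 has the form E = ½ Σ (mX_i + b - Y_i)^2; since differentiation is linear, the sum carries through term by term.

```latex
\begin{aligned}
\frac{\partial}{\partial m}\,\frac{1}{2}(mX_i + b - Y_i)^2
  &= \frac{1}{2}\cdot 2\,(mX_i + b - Y_i)\cdot\frac{\partial}{\partial m}(mX_i + b - Y_i)
   = (mX_i + b - Y_i)\,X_i \\
\frac{\partial}{\partial b}\,\frac{1}{2}(mX_i + b - Y_i)^2
  &= \frac{1}{2}\cdot 2\,(mX_i + b - Y_i)\cdot\frac{\partial}{\partial b}(mX_i + b - Y_i)
   = (mX_i + b - Y_i)\cdot 1
\end{aligned}
```

The only difference between the two lines is the inner derivative: X_i when differentiating with respect to m, and 1 when differentiating with respect to b.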

hth,
Jeremy

Hi Jeremy,

Let’s review the formulas:

Cost function:
E = \frac{1}{2} \sum_{i=1}^{n} (mX_i + b - Y_i)^2

Partial derivative with respect to m:
\frac{\partial E}{\partial m} = \sum_{i=1}^{n} (mX_i + b - Y_i) \cdot X_i

Partial derivative with respect to b:
\frac{\partial E}{\partial b} = \sum_{i=1}^{n} (mX_i + b - Y_i)

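This uses the chain rule to differentiate each squared error term: the outer derivative 2(mX_i + b - Y_i) is multiplied by the inner derivative, which is X_i for m and 1 for b, and the factor 2 cancels the 1/2 in E.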
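A minimal Python sketch of how these two formulas drive gradient descent, with a central finite-difference check that the hand-derived gradients agree with the cost function. The function names (cost, gradients, numeric_gradients, gradient_descent), the toy data, the learning rate and the iteration count are all illustrative assumptions, not the assignment's code.

```python
# Illustrative sketch (hypothetical names, toy data): gradient descent for
# y ≈ m*x + b using E = 1/2 * sum((m*X_i + b - Y_i)^2) and its partials.


def cost(m, b, xs, ys):
    """E(m, b) = 1/2 * sum of squared residuals."""
    return 0.5 * sum((m * x + b - y) ** 2 for x, y in zip(xs, ys))


def gradients(m, b, xs, ys):
    """Return (dE/dm, dE/db) from the chain-rule formulas."""
    residuals = [m * x + b - y for x, y in zip(xs, ys)]
    dE_dm = sum(r * x for r, x in zip(residuals, xs))  # inner derivative is X_i
    dE_db = sum(residuals)                              # inner derivative is 1
    return dE_dm, dE_db


def numeric_gradients(m, b, xs, ys, h=1e-6):
    """Central finite differences, used only to sanity-check gradients()."""
    dm = (cost(m + h, b, xs, ys) - cost(m - h, b, xs, ys)) / (2 * h)
    db = (cost(m, b + h, xs, ys) - cost(m, b - h, xs, ys)) / (2 * h)
    return dm, db


def gradient_descent(xs, ys, learning_rate=0.01, num_iterations=5000):
    """Start at m = b = 0 and repeatedly step against the gradient."""
    m, b = 0.0, 0.0
    for _ in range(num_iterations):
        dE_dm, dE_db = gradients(m, b, xs, ys)
        m -= learning_rate * dE_dm
        b -= learning_rate * dE_db
    return m, b


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # toy data lying exactly on y = 2x + 1
    print("analytic:", gradients(0.5, 0.0, xs, ys))
    print("numeric: ", numeric_gradients(0.5, 0.0, xs, ys))
    m, b = gradient_descent(xs, ys)
    print("fitted m, b:", round(m, 3), round(b, 3))
```

Because the toy data lie exactly on a line, the fitted values should approach m = 2 and b = 1. If a version of the cost divides by n (a mean rather than a sum), both gradients scale by 1/n as well; that variant is not assumed here.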