Hey Folks,
I am currently going through the Week 2 Optional Lab: C1_W2_Lab02_Multiple_Variable_Soln
I am specifically looking at the implementation of the `compute_gradient()` routine.
I have pasted its code below:
```python
import numpy as np

def compute_gradient(X, y, w, b):
    """
    Computes the gradient for linear regression
    Args:
      X (ndarray (m,n)): Data, m examples with n features
      y (ndarray (m,)) : target values
      w (ndarray (n,)) : model parameters
      b (scalar)       : model parameter
    Returns:
      dj_dw (ndarray (n,)): The gradient of the cost w.r.t. the parameters w.
      dj_db (scalar):       The gradient of the cost w.r.t. the parameter b.
    """
    m, n = X.shape   # (number of examples, number of features)
    dj_dw = np.zeros((n,))
    dj_db = 0.
    for i in range(m):
        err = (np.dot(X[i], w) + b) - y[i]   # prediction error for example i
        for j in range(n):
            dj_dw[j] = dj_dw[j] + err * X[i, j]
        dj_db = dj_db + err
    dj_dw = dj_dw / m
    dj_db = dj_db / m
    return dj_db, dj_dw
```
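As I understand it, the nested loops are computing the partial derivatives of the cost from the lectures (with $\mathbf{x}^{(i)}$ denoting row $i$ of `X`):

$$\frac{\partial J}{\partial w_j} = \frac{1}{m}\sum_{i=0}^{m-1}\left(\left(\mathbf{w}\cdot\mathbf{x}^{(i)} + b\right) - y^{(i)}\right)x^{(i)}_j, \qquad \frac{\partial J}{\partial b} = \frac{1}{m}\sum_{i=0}^{m-1}\left(\left(\mathbf{w}\cdot\mathbf{x}^{(i)} + b\right) - y^{(i)}\right)$$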
In the above code, when computing the gradient for each weight parameter of the model, the inner for loop contains this line:

```python
dj_dw[j] = dj_dw[j] + err * X[i, j]
```

What I don't understand is why we are adding `err * X[i, j]` to `dj_dw[j]` (itself)?
Given that we initialize `dj_dw` to zeros above, wouldn't this line of code suffice?

```python
dj_dw[j] = err * X[i, j]
```

Or am I missing something?
I have not tried running the model with this change yet, but even if the results come out the same (which I expect), I am curious why it was implemented this way.
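For reference, this is the minimal standalone comparison I plan to try, with made-up numbers (the `compute_gradient_variant` name and the toy data are just mine for this test):

```python
import numpy as np

# Assumes compute_gradient from the lab code above is already defined.

def compute_gradient_variant(X, y, w, b):
    """Same as the lab's compute_gradient, but with my proposed assignment."""
    m, n = X.shape
    dj_dw = np.zeros((n,))
    dj_db = 0.
    for i in range(m):
        err = (np.dot(X[i], w) + b) - y[i]
        for j in range(n):
            dj_dw[j] = err * X[i, j]   # proposed: plain assignment, no accumulation
        dj_db = dj_db + err
    return dj_db / m, dj_dw / m

# Made-up toy data: 3 examples, 2 features
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
y = np.array([7.0, 8.0, 9.0])
w = np.array([0.1, 0.2])
b = 0.5

print(compute_gradient(X, y, w, b))          # lab version
print(compute_gradient_variant(X, y, w, b))  # my proposed version
```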
- Thank you!