C1_W2_Lab02_Multiple_Variable_Soln, problem with understanding compute_gradient(X, y, w, b)

In C1_W2_Lab02_Multiple_Variable_Soln, in the compute-gradient section, we have the following function:

import numpy as np

def compute_gradient(X, y, w, b):
    """
    Computes the gradient for linear regression
    Args:
      X (ndarray (m,n)): Data, m examples with n features
      y (ndarray (m,)) : target values
      w (ndarray (n,)) : model parameters
      b (scalar)       : model parameter
    Returns:
      dj_dw (ndarray (n,)): The gradient of the cost w.r.t. the parameters w.
      dj_db (scalar):       The gradient of the cost w.r.t. the parameter b.
    """
    m,n = X.shape           #(number of examples, number of features)
    dj_dw = np.zeros((n,))
    dj_db = 0.

    for i in range(m):
        err = (np.dot(X[i], w) + b) - y[i]
        for j in range(n):
            dj_dw[j] = dj_dw[j] + err * X[i, j]
        dj_db = dj_db + err
    dj_dw = dj_dw / m
    dj_db = dj_db / m
    return dj_db, dj_dw
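For context, here is a quick self-contained run of the function on a tiny made-up dataset (the numbers are only for illustration, not from the lab):

```python
import numpy as np

def compute_gradient(X, y, w, b):
    # same logic as the lab function above
    m, n = X.shape
    dj_dw = np.zeros((n,))
    dj_db = 0.
    for i in range(m):
        err = (np.dot(X[i], w) + b) - y[i]   # per-example prediction error
        for j in range(n):
            dj_dw[j] = dj_dw[j] + err * X[i, j]
        dj_db = dj_db + err
    return dj_db / m, dj_dw / m

# made-up data: 2 examples, 2 features
X = np.array([[1.0, 2.0],
              [3.0, 4.0]])
y = np.array([5.0, 6.0])
w = np.array([0.5, 0.5])
b = 1.0

dj_db, dj_dw = compute_gradient(X, y, w, b)
print(dj_db, dj_dw)  # -2.0 [-3.5 -5.5]
```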

Since the derivatives are a cumulative sum over all examples, shouldn't it be
dj_dw[j] += err * X[i, j] and then divide by m? I'm having trouble associating this code with the provided formulas. Can someone help me?
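For reference, the formulas I'm trying to match (where the model is f_{w,b}(x) = w · x + b) are:

```latex
\frac{\partial J(\mathbf{w},b)}{\partial w_j} = \frac{1}{m} \sum_{i=0}^{m-1} \left( f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)} \right) x_j^{(i)}
\qquad
\frac{\partial J(\mathbf{w},b)}{\partial b} = \frac{1}{m} \sum_{i=0}^{m-1} \left( f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)} \right)
```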

The form you propose, dj_dw[j] += err * X[i, j], is fine on its own: "q += x" is exactly equivalent to "q = q + x". What you must not do is combine both forms at once, e.g. dj_dw[j] += dj_dw[j] + err * X[i, j] — that would add dj_dw[j] to itself on every update, inflating the gradients.

You can either use "q += …" or "q = q + …", but not both at the same time.

I corrected the code; in my opinion it should be dj_dw[j] += err * X[i, j], and the same for b: dj_db += err. I made the change and got the same results.

I'm just having a hard time understanding the code as it was written. I can't associate it with the original formula.

OMG! I'm feeling dumb right now hahahah. I have just understood what you said: the cumulative sum is already happening, I just didn't pay attention to the code. The inner for-loop can also be changed to dj_dw += err * X[i, :] to update the entire vector all at once. It also yields the same result.
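A small sketch of that equivalence, with made-up numbers: the per-feature loop and the row-slice update produce the same gradient vector, because err * X[i, :] multiplies every feature of example i by the same scalar error at once.

```python
import numpy as np

X = np.array([[1.0, 2.0],
              [3.0, 4.0]])
y = np.array([5.0, 6.0])
w = np.array([0.5, 0.5])
b = 1.0
m, n = X.shape

# version 1: explicit loop over features j
dj_dw_loop = np.zeros((n,))
for i in range(m):
    err = (np.dot(X[i], w) + b) - y[i]
    for j in range(n):
        dj_dw_loop[j] += err * X[i, j]
dj_dw_loop /= m

# version 2: whole-row update, no inner loop
dj_dw_vec = np.zeros((n,))
for i in range(m):
    err = (np.dot(X[i], w) + b) - y[i]
    dj_dw_vec += err * X[i, :]   # scalar err broadcast over the row
dj_dw_vec /= m

print(np.allclose(dj_dw_loop, dj_dw_vec))  # True
```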

You can also omit the for-loop over the features, and compute that using a matrix product.
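For example, one way to drop both loops (a sketch with the same made-up data as above) is to compute all m errors at once and then use a matrix product to sum over the examples:

```python
import numpy as np

X = np.array([[1.0, 2.0],
              [3.0, 4.0]])
y = np.array([5.0, 6.0])
w = np.array([0.5, 0.5])
b = 1.0
m = X.shape[0]

err = X @ w + b - y      # (m,) vector of per-example errors
dj_dw = X.T @ err / m    # (n,) gradient w.r.t. w; X.T @ err sums err[i]*X[i,j] over i
dj_db = np.sum(err) / m  # scalar gradient w.r.t. b

print(dj_db, dj_dw)  # -2.0 [-3.5 -5.5]
```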