Week 2 Optional Lab: C1_W2_Lab02_Multiple_Variable_Soln - Understanding the code

Hey Folks,
I am currently going through the Week 2 Optional Lab: C1_W2_Lab02_Multiple_Variable_Soln

I am specifically looking at the implementation of the compute_gradient() routine.
I have pasted its code below:

import numpy as np

def compute_gradient(X, y, w, b): 
    """
    Computes the gradient for linear regression 
    Args:
      X (ndarray (m,n)): Data, m examples with n features
      y (ndarray (m,)) : target values
      w (ndarray (n,)) : model parameters  
      b (scalar)       : model parameter
      
    Returns:
      dj_dw (ndarray (n,)): The gradient of the cost w.r.t. the parameters w. 
      dj_db (scalar):       The gradient of the cost w.r.t. the parameter b. 
    """
    m,n = X.shape           #(number of examples, number of features)
    dj_dw = np.zeros((n,))
    dj_db = 0.

    for i in range(m):                             
        err = (np.dot(X[i], w) + b) - y[i]   
        for j in range(n):                         
            dj_dw[j] = dj_dw[j] + err * X[i, j]    
        dj_db = dj_db + err                        
    dj_dw = dj_dw / m                                
    dj_db = dj_db / m                                
        
    return dj_db, dj_dw

In the above code, when computing the gradient for each weight parameter of the model, the inner for loop has this line of code:

dj_dw[j] = dj_dw[j] + err * X[i, j]    

What I don’t understand is: why are we adding err * X[i, j] to dj_dw[j] (itself)?
Given that dj_dw is initialized to 0 above, wouldn’t this line of code suffice?

dj_dw[j] = err * X[i, j]    

Or am I missing something?

I have not tried running the model with this code change yet, but even if the results come out the same (which I expect they will), I am curious why it was implemented this way.

Thank you!

Since there are two nested for-loops, that code accumulates the gradients for each feature of each example.
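If it helps, here is a minimal trace of that accumulation on a tiny made-up dataset (the numbers are invented purely for illustration). It prints dj_dw after each example, so you can see that the second example's contribution is added on top of the first rather than overwriting it:

import numpy as np

# Made-up data: m = 2 examples, n = 2 features (for illustration only)
X = np.array([[1.0, 2.0],
              [3.0, 4.0]])
y = np.array([5.0, 6.0])
w = np.array([0.1, 0.2])
b = 0.0

m, n = X.shape
dj_dw = np.zeros((n,))

for i in range(m):
    err = (np.dot(X[i], w) + b) - y[i]
    for j in range(n):
        # dj_dw[j] already holds the contributions from examples 0..i-1,
        # so this example's contribution is added on top of it
        dj_dw[j] = dj_dw[j] + err * X[i, j]
    print(f"after example {i}: dj_dw = {dj_dw}")

dj_dw = dj_dw / m   # average over all m examples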

@TMosh thanks for the quick response.
While I get that it's accumulating the results in the vector dj_dw, that line of code is still assigning a value to one specific element of the vector. As I mentioned, the initial value of dj_dw[j] will be 0 (see the dj_dw initialization line above). So it still does not explain why the previous (presumably 0) value of dj_dw[j] is added to itself in that line.

Here is a practical example that uses initialization to 0.

Let’s say you have a list of values, and you want to add them all together.

The list is [1, 2, 3].

If you want to do this using a for-loop, here is the process (a short code version follows the list below):

  • Set the initial sum to 0. Call it ‘s’.
  • Then via the for loop, you have this sequence of intermediate results (written out in detail)
  • s = 0 (initial value)
  • s = s + 1, now s = 1
  • s = s + 2, now s = 3
  • s = s + 3, now s = 6
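
In code, that same running-sum pattern looks like this (a minimal sketch using a plain Python list):

values = [1, 2, 3]

s = 0                 # initial sum
for v in values:
    s = s + v         # add each value on top of the running total
print(s)              # prints 6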

Yes, I am aware of the for loop and the increments that are happening. My question is not about HOW it works but rather WHY we need it in this case. As I said before, dj_dw[j] is initialized to 0, and the only value ever added is err * X[i, j] (unless I am missing something here).

I think you are missing something. I can’t quite grasp what though.
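
For what it's worth, here is a sketch of the difference on a made-up two-example dataset (the numbers are invented purely for illustration): with the accumulating update, both examples contribute to the gradient, while with a plain assignment only the last example's contribution survives:

import numpy as np

# Made-up dataset: m = 2 examples, n = 1 feature (for illustration only)
X = np.array([[1.0],
              [2.0]])
y = np.array([3.0, 7.0])
w = np.array([0.0])
b = 0.0
m, n = X.shape

accumulated = np.zeros((n,))   # uses dj_dw[j] = dj_dw[j] + err * X[i, j]
overwritten = np.zeros((n,))   # uses dj_dw[j] = err * X[i, j]

for i in range(m):
    err = (np.dot(X[i], w) + b) - y[i]
    for j in range(n):
        accumulated[j] = accumulated[j] + err * X[i, j]
        overwritten[j] = err * X[i, j]

print(accumulated / m)   # [-8.5]  -> average of both examples' contributions
print(overwritten / m)   # [-7.0]  -> only the last example's contribution, divided by m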