Gradient Descent for Logistic Regression

Aviv_Simionovici · January 15, 2023, 7:26am

Hello and thank you for an excellent course!
My question is regarding the Gradient Descent Implementation function in the Optional lab: Gradient descent for logistic regression.
this is the algorithm:

in the function “def compute_gradient_logistic(X, y, w, b):”
the
dj_dw = np.zeros((n,)) #(n,)
is zeroed only before the outer loop.
question: shouldn’t it be zeroed before each inner loop, that is zeroed for each vector X in the training data?
I think that the correct implementation should include the remarked line:

{code removed by mentor}

Please explain why I am wrong.
Thanks,
Aviv

rmwkwok · January 15, 2023, 12:23pm

With dj_dw = dj_dw/m, you mean to accumulate dj_dw over m samples, how would you then reset it to zero before all samples are iterated through?

Raymond

PS: I removed your code because we can’t share assignment code here.

Kic · January 15, 2023, 12:25pm

Hi @Aviv_Simionovici ,

dj_dw = np.zeros((n,))
is to create the variable dj_dw as a vector of size n ( the number of features).
This variable is updated for each sample and each feature of the sample in the code, so there is no need to create the same variable again and again.

Aviv_Simionovici · January 15, 2023, 1:46pm

thanks @rmwkwok and @Kic - I realized now that the code needs both loops - the outer loop to sum / sigma over i - the training samples and the inner loop to address all j - features. At each training sample i run an inner loop over all features and accumulate the dj_dw[j]. Nested loops mechanism. It doesn’t make sense to zero before the inner loop (or to recreate the variable) - my mistake - I am rusty at coding.

rmwkwok · January 15, 2023, 1:53pm

You are very welcome, @Aviv_Simionovici!

Raymond

Svetlana_Verthein · January 17, 2023, 12:58am

I have a related question
We have:

In Lab 6 of Week 3 (C1_W3_Lab06_Gradient_Descent_Soln) the compute_gradient_logistic function that calculate this partial derivative has the outer loop of i (looping through each training example) with the inner loop of j (the sum of each j-th feature from 1 to n).

I don’t understand - in the formula the j is fixed and the sum is looping by i, but in the lab the i is fixed and the sum is looping by j?
Thank you!

rmwkwok · January 17, 2023, 2:11am

Hello @Svetlana_Verthein,

Let’s think about this:

If we follow exactly the formula you shared with us, what’s going to happen is that the compute_gradient_logistic will accept a parameter called j, and then instead of having an inner loop to iterate through different values of j, we should remove it and just fix the j value as provided, right?
In that case, the compute_gradient_logistic function will only be useful to calculate one j at a time.
But we don’t want to call compute_gradient_logistic as many times as the number of features.
So, instead of passing the value of j in, we use a loop to go over all possible j, so that we only need to call compute_gradient_logistic once, and all the weights’ gradient are found.

Is this clear to you?

Raymond

Svetlana_Verthein · January 17, 2023, 9:26pm

Hello, Raymond, and thank you so much for your answer.
But that’s not what I had in mind:

compute_gradient_logistic doesn’t need to accept j as a parameter
everything in this function stays the same, only the outer loop becomes a j-loop (finding w_j for each one of the feature columns, i.e. w_1 for column of all features x_1, etc for all n features) and the inner loop becomes the i-loop (than we’d exactly follow the formula which specifies sum over all training examples 1 to m for each j).

In other words, in my mind I’d just reverse the j-loop (make it the outer loop) and i-loop (make it the inner loop calculating the sum over all the training examples for that feature) - and it would conform to the formula.
Otherwise it seems we are finding a unique w for each training example, rather than for each feature.
Does it make sense?
Thank you!

rmwkwok · January 17, 2023, 10:29pm

Hello @Svetlana_Verthein,

Switching the order of looping samples and loop features is fine, and I think your approach also makes sense.

Cheers,
Raymond

Jem_Lane · March 26, 2023, 1:17pm

Hi, I have a related question to this thread:

in the def compute_gradient_logistic(X, y, w, b) function, in the For loop over ‘i’,

f_wb_i is equal to the sigmoid of a dot product of X[i],w + b.

Does the dot product not involve multiplication of all x (x0, x1…xn) and w (w0, w1…wn) values for each [i] (example), i.e. all n w’s? If so, why do we need an inner j loop to go through all the n features if the features have all been included with each run of the first (‘i’) loop?

Jem

rmwkwok · March 27, 2023, 1:25am

Hi Jem,

Your understanding that it involves is correct.

There are two ways to do it - (i) via an inner loop, or (ii) do another vector operation. The lab chose the first way, but we can actually implement it without the inner loop. As an exercise, can you figure it out how to do it without the inner loop?

Cheers,
Raymond

Jem_Lane · March 27, 2023, 7:13am

Hi Raymond,

thanks for your reply. Sorry, I was asking why given the dot product step involves multiplication of all x and w values for each [i] example, we need another step to go through all the n features. I’m obviously missing something, but it appears we are multiplying by each w feature twice with these two steps.

Jem

Topic		Replies	Views
C1_W3_Lab09_Regularization_Soln incorrect dj_db Supervised ML: Regression and Classification week-3	6	514	September 3, 2022
C1_W3_Logistic_Regression-Gradient Computation Supervised ML: Regression and Classification week-3	2	535	July 31, 2022
Gradient for Logistic Regression- Error term Supervised ML: Regression and Classification week-3	4	503	January 5, 2023
C1_W3_Logistic_Regression - compute gradient Supervised ML: Regression and Classification week-3	3	522	December 21, 2022
Week3 lab 6 Supervised ML: Regression and Classification week-3	23	79	April 9, 2025

Gradient Descent for Logistic Regression

Related topics