Help with code for Multivariable Gradient Descent

Flimdejong · January 4, 2024, 3:28pm

Hello

I have a question about the Week 2 lab on multiple linear regression. In the line dj_dw[j] = dj_dw[j] + err * X[i, j], the code takes the current value of dj_dw[j] (the gradient with respect to w[j]) and adds the new term to it. Why?

I understand adding to the existing value when updating the final w array, but why is it also done when calculating the gradient? Thanks.

The full code snippet is here below.

{moderator edit: code removed}

pastorsoto · January 4, 2024, 3:52pm

Hi @Flimdejong, great question!

In the compute_gradient function you provided, the line dj_dw[j] = dj_dw[j] + err * X[i, j] is an implementation of the gradient computation for multiple linear regression. Let me explain why this approach is used:

1. Accumulating Gradient Over All Examples: The goal of the compute_gradient function is to compute the gradient of the cost function with respect to each parameter w[j] and the bias b. In the context of multiple linear regression, the cost function is often a mean squared error (MSE) function. The gradient tells us how much the cost function changes with a small change in the parameters.
2. Batch Gradient Descent: The code implements a form of batch gradient descent. In this method, the gradient is computed over the entire dataset (m examples) before updating the parameters. This is why, for each feature j, the algorithm sums up err * X[i, j] over all examples i.
3. Understanding the err * X[i, j] Term: The term err * X[i, j] is the contribution of the i-th example to the partial derivative of the cost function with respect to the parameter w[j], before the division by m. Here, err = (np.dot(X[i], w) + b) - y[i] is the prediction error for the i-th example. Multiplying this error by X[i, j], the value of the j-th feature in that example, gives that example's share of the gradient for w[j].
4. Why Summation is Necessary: By accumulating dj_dw[j] across all examples, we compute the sum of these per-example contributions, which is needed to calculate the mean gradient (since the MSE involves an average over all examples). After completing the loop over all examples, the code averages the accumulated gradients by dividing dj_dw and dj_db by m. This averaging step is crucial because it ensures that the gradient is representative of the entire dataset, not just a single example.
5. The Final Update Step: After computing the gradient, the weights w and bias b are typically updated outside this function, in the direction opposite to the gradient (the gradient descent step). This is where the actual parameter update happens, using the gradients computed by this function.
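
For reference, the points above can be written as formulas. This assumes the squared-error cost carries a 1/(2m) factor, which is my assumption here; it is consistent with summing err * X[i, j] over the examples and then dividing by m, but the thread itself does not state the cost function. The symbols f, J, and the zero-based index i are notation I am introducing for illustration:

$$
J(\mathbf{w}, b) = \frac{1}{2m} \sum_{i=0}^{m-1} \left( f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)} \right)^2,
\qquad f_{\mathbf{w},b}(\mathbf{x}^{(i)}) = \mathbf{w} \cdot \mathbf{x}^{(i)} + b
$$

$$
\frac{\partial J}{\partial w_j} = \frac{1}{m} \sum_{i=0}^{m-1} \left( f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)} \right) x_j^{(i)},
\qquad
\frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=0}^{m-1} \left( f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)} \right)
$$

The sum over i is what the running total in dj_dw[j] builds up one example at a time; the factor 1/m is the final division.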

So, the reason for the accumulation (the dj_dw[j] = dj_dw[j] + err * X[i, j] step) is to sum up the gradients across all examples before averaging, which aligns with the mathematical formulation of the gradient for the MSE cost function in batch gradient descent.
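
To make the accumulation concrete, here is a minimal sketch written for this reply. It is not the lab's code: the function name gradient_by_accumulation and the toy data are made up for illustration, and only the two expressions already quoted in this thread are reused. It checks that the loop-and-accumulate result matches a vectorized computation of the same average.

```python
import numpy as np

def gradient_by_accumulation(X, y, w, b):
    """Illustrative sketch: sum each example's contribution, then average."""
    m, n = X.shape
    dj_dw = np.zeros(n)  # running sums start at zero
    dj_db = 0.0
    for i in range(m):
        err = (np.dot(X[i], w) + b) - y[i]       # prediction error for example i
        for j in range(n):
            dj_dw[j] = dj_dw[j] + err * X[i, j]  # add example i's share for feature j
        dj_db = dj_db + err
    # Divide the accumulated sums by m to get the mean gradient
    return dj_dw / m, dj_db / m

# Made-up toy data: 3 examples, 2 features
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y = np.array([1.0, 2.0, 3.0])
w = np.array([0.5, -0.5])
b = 0.1

dj_dw, dj_db = gradient_by_accumulation(X, y, w, b)

# The same quantities computed in one step, without an explicit running sum
err_all = X @ w + b - y
dj_dw_vec = X.T @ err_all / len(y)
dj_db_vec = err_all.mean()

print(np.allclose(dj_dw, dj_dw_vec), np.isclose(dj_db, dj_db_vec))
```

If the line inside the inner loop assigned err * X[i, j] instead of adding it, dj_dw[j] would only hold the last example's term, and the result would no longer match the vectorized version.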

I hope this helps!

Flimdejong · January 4, 2024, 4:00pm

Ah I see. It is clear to me now. Thanks for your quick and thorough explanation! @pastorsoto

TMosh · January 4, 2024, 4:01pm

@Flimdejong, in the future, please do not post your code on the forum.
That is not allowed by the Code of Conduct.
