Why are we adding `err` into `dj_dw` while computing gradients for multi-feature linear regression? We did not do that in single-variable linear regression, did we?


I’m pretty sure you did use the same method when w was a scalar.


@TMosh is right that the fundamental idea of calculating the gradient is the same whether w is a scalar (single-variable) or a vector (multivariable). The difference is in the number of features you’re dealing with.

  • In single-variable linear regression, there’s only one feature, so you multiply the error by that single feature value, `X[i]`.
  • In multivariable linear regression, you have multiple features, so you multiply the error by each feature `X[i, j]` (for each `j`) and accumulate the gradients separately for each weight.

Here’s how it would look. For each training example i, the error is

\text{err}^{(i)} = (w \cdot X[i] + b) - y[i],

so in the single-variable case,

\frac{\partial J}{\partial w} = \frac{1}{m} \sum_{i=1}^{m} \text{err}^{(i)} \cdot X[i],

and in the multivariable case,

\frac{\partial J}{\partial w_j} = \frac{1}{m} \sum_{i=1}^{m} \text{err}^{(i)} \cdot X[i,j].
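
If it helps to see the two cases side by side, here is a minimal NumPy sketch. The function names and array shapes are my own assumptions for illustration, not the exact code from the course labs:

```python
import numpy as np

def compute_gradient_single(X, y, w, b):
    # Hypothetical single-feature version: X has shape (m,), w and b are scalars.
    m = X.shape[0]
    dj_dw = 0.0
    dj_db = 0.0
    for i in range(m):
        err = (w * X[i] + b) - y[i]   # error for example i
        dj_dw += err * X[i]           # err times the single feature value
        dj_db += err
    return dj_dw / m, dj_db / m

def compute_gradient_multi(X, y, w, b):
    # Hypothetical multi-feature version: X has shape (m, n), w has shape (n,).
    m, n = X.shape
    dj_dw = np.zeros(n)
    dj_db = 0.0
    for i in range(m):
        err = (np.dot(X[i], w) + b) - y[i]  # same err as before, now via a dot product
        for j in range(n):
            dj_dw[j] += err * X[i, j]       # one partial derivative per feature j
        dj_db += err
    return dj_dw / m, dj_db / m

# Tiny example: three training examples, two features
X = np.array([[1.0, 2.0], [2.0, 0.5], [3.0, 1.5]])
y = np.array([3.0, 2.5, 5.0])
dj_dw, dj_db = compute_gradient_multi(X, y, np.zeros(2), 0.0)
```

Note that `err` itself is computed the same way in both functions; the only difference is that the multi-feature version loops over j to fill one gradient entry per weight. In practice you could also replace that inner loop with the vectorized `dj_dw += err * X[i]` and let NumPy broadcasting do the work.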