I don’t understand why x_1 (outside the parentheses) is used for updating w_1, while inside the parentheses all of the features x are used. Aren’t both x’s supposed to be the same?

@jas_yousef

Take a look at this math and you may understand it better. It only shows features one and two, but you can generalize it to any number of features once you understand it well.

To compute f_wb for example (i), we need to use all of the features and weights. This is because the error is based on the difference from the y(i) value.

To compute the gradients, each weight is independent: we only use that weight’s own feature value to compute its gradient.

This is because the gradients are the partial derivatives of the cost function, where we consider one weight at a time, and all other weights are considered as constants.
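The two points above can be sketched in NumPy. This is a minimal illustration (not the course’s official code): the prediction uses every feature, but each weight’s gradient multiplies the error by that weight’s feature column only.

```python
import numpy as np

def compute_gradients(X, y, w, b):
    """Gradients of the squared-error cost for multiple linear regression.
    X: (m, n) matrix of m examples with n features; w: (n,) weights."""
    m = X.shape[0]
    # f_wb uses ALL features and weights for each example (i):
    f_wb = X @ w + b                  # shape (m,)
    err = f_wb - y                    # shape (m,)
    # For dJ/dw_j, each example's error is multiplied only by its
    # feature j value -- that is the x_j outside the parentheses:
    dj_dw = (X.T @ err) / m           # shape (n,)
    dj_db = err.mean()
    return dj_dw, dj_db

# Tiny worked example: 2 examples, 2 features
X = np.array([[1.0, 2.0],
              [3.0, 4.0]])
y = np.array([5.0, 6.0])
w = np.array([0.5, 0.5])
b = 1.0
dj_dw, dj_db = compute_gradients(X, y, w, b)
# f_wb = [2.5, 4.5], err = [-2.5, -1.5]
# dj_dw = [-3.5, -5.5], dj_db = -2.0
```

Note that `X.T @ err` is just a compact way of writing, for each j, the sum over i of `err[i] * X[i, j]` — each weight sees only its own feature column.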

Oh, I was wrong. I thought he was asking about the math behind it and where the x_1 comes from, so sorry. If you are taking the MLS, your explanation is better. Thanks.

OK, this is good.

But why, for a single weight, do we use the single feature x_1 instead of the whole vector x?

(I mean the x_1 outside the parentheses, after the subtraction of f_w,b and y.)

That’s how the calculus works when you compute the partial derivative with respect to each weight.
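Concretely, a short chain-rule derivation (in the standard notation for the squared-error cost) shows where the lone x_1^(i) comes from:

```latex
J(\mathbf{w}, b) = \frac{1}{2m} \sum_{i=1}^{m} \left( f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)} \right)^2,
\qquad
f_{\mathbf{w},b}(\mathbf{x}^{(i)}) = \mathbf{w} \cdot \mathbf{x}^{(i)} + b

% Chain rule: the outer square differentiates to (f - y),
% and the inner derivative of f with respect to w_1 is just x_1^{(i)},
% because all the other terms w_2 x_2^{(i)} + \dots + b are constants w.r.t. w_1.
\frac{\partial J}{\partial w_1}
= \frac{1}{m} \sum_{i=1}^{m} \left( f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)} \right) x_1^{(i)}
```

So the full vector x^(i) appears inside the parentheses because f uses every feature, while only x_1^(i) appears outside because it is the derivative of f with respect to w_1 alone.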