Why is the summation removed here?

Because the derivative is taken with respect to just w_j, so what do all the other terms become?

Raymond

Why is the first summation not removed, then?

Because it sums over the samples, and w_j interacts with every sample.

You get a term in the derivative for every sample, because w_j appears in each sample's prediction, so you still have a summation.
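
To make the two cases concrete, here is a rough worked sketch. The thread doesn't show the cost function, so this assumes it is the usual regularized squared-error cost with a linear model f(x) = w·x + b. The symbols m (number of samples), n (number of features), λ (regularization strength) and the index k are assumptions for illustration.

```latex
% Assumed cost (not shown in the thread): squared error plus L2 regularization
J(\mathbf{w}, b) = \frac{1}{2m} \sum_{i=1}^{m} \left( f(\mathbf{x}^{(i)}) - y^{(i)} \right)^2
                 + \frac{\lambda}{2m} \sum_{k=1}^{n} w_k^2,
\qquad f(\mathbf{x}) = \mathbf{w} \cdot \mathbf{x} + b

% Regularization term: only the k = j term depends on w_j,
% so every other term differentiates to zero and the sum disappears.
\frac{\partial}{\partial w_j} \left[ \frac{\lambda}{2m} \sum_{k=1}^{n} w_k^2 \right]
  = \frac{\lambda}{2m} \cdot 2 w_j
  = \frac{\lambda}{m} w_j

% Error term: w_j sits inside f(x^{(i)}) for every sample i,
% so no term vanishes and the sum over samples stays.
\frac{\partial}{\partial w_j} \left[ \frac{1}{2m} \sum_{i=1}^{m} \left( f(\mathbf{x}^{(i)}) - y^{(i)} \right)^2 \right]
  = \frac{1}{m} \sum_{i=1}^{m} \left( f(\mathbf{x}^{(i)}) - y^{(i)} \right) x_j^{(i)}
```

So under that assumption, the sum over weights collapses to the single w_j term, while the sum over samples survives because each sample contributes its own x_j^(i) factor.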

