Why is the summation removed here?
Because the derivative is taken with respect to w_j alone. So what happens to all the other terms?
Why is the first summation not removed, then?
Because it runs over the samples, and w_j interacts with every sample.
You get a term in the derivative for every sample in which w_j appears, so you still have a summation.
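If it helps to see it numerically, here is a minimal sketch. The formula being discussed is not shown in this thread, so the sketch assumes a sum-of-squared-errors loss for a linear model, L(w) = sum_i (y_i - sum_k w_k x_ik)^2. The names `predict`, `loss`, `grad_wj` and the toy data are hypothetical, made up for illustration. Under that assumption the inner sum over k collapses to x_ij when you differentiate with respect to w_j, while the outer sum over samples i stays.

```python
def predict(w, x):
    # Inner sum over weights k: sum_k w_k * x_k (assumed linear model)
    return sum(w_k * x_k for w_k, x_k in zip(w, x))

def loss(w, X, y):
    # Outer sum over samples i: sum_i (y_i - prediction_i)^2 (assumed loss)
    return sum((y_i - predict(w, x_i)) ** 2 for x_i, y_i in zip(X, y))

def grad_wj(w, X, y, j):
    # d/dw_j of the inner sum is just x_i[j]: every other w_k term has zero
    # derivative, so the inner summation disappears.
    # The outer summation over samples remains, because w_j shows up in
    # every sample's squared error.
    return sum(-2.0 * (y_i - predict(w, x_i)) * x_i[j] for x_i, y_i in zip(X, y))

# Toy data, invented for this example: 4 samples, 3 weights
X = [[1.0, 2.0, 0.5], [0.0, -1.0, 3.0], [2.0, 1.0, 1.0], [-1.0, 0.5, 2.0]]
y = [1.0, 2.0, 0.0, -1.0]
w = [0.3, -0.2, 0.1]
j = 1

# Central finite difference on w_j only, to check the analytic derivative
eps = 1e-6
w_plus = list(w)
w_plus[j] += eps
w_minus = list(w)
w_minus[j] -= eps
numeric = (loss(w_plus, X, y) - loss(w_minus, X, y)) / (2 * eps)
analytic = grad_wj(w, X, y, j)

print(numeric, analytic)
```

The two printed values should agree closely, which is consistent with one summation (over weights) vanishing and the other (over samples) surviving.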
Thanks so much