Using per-item features


First, let me thank you for the course you made. It is clear, interesting from start to finish, and put in perspective with real problems, and I applaud your team for this.

I have a small question about the "Using per-item features" video, on this slide:

Andrew says that we can remove the m(j) term from the formula, and that this will not change the result when we minimize J, since it is a constant. I agree with that.

But why doesn't he remove the 1/2 as well? That I cannot understand. Could you enlighten me?

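Andrew kept the 1/2 in J so that, when we take the derivative of J, the factor of 2 that comes from differentiating the squared error term cancels with the 1/2. Like m(j), the 1/2 is a positive constant, so keeping or removing it does not change where the minimum is.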
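To make the cancellation concrete, here is a short derivation. It assumes the cost on the slide has the usual regularized squared error form once m(j) is dropped. The notation (w, b, x, y, r(i,j), lambda, n) is my reading of the lecture, not copied from the slide.

```latex
% Assumed cost for user j after dropping m^{(j)}:
J\left(w^{(j)}, b^{(j)}\right)
  = \frac{1}{2} \sum_{i:\, r(i,j)=1}
      \left( w^{(j)} \cdot x^{(i)} + b^{(j)} - y^{(i,j)} \right)^2
  + \frac{\lambda}{2} \sum_{k=1}^{n} \left( w_k^{(j)} \right)^2

% Differentiating each square brings down a factor of 2,
% which cancels the 1/2 in front:
\frac{\partial J}{\partial w_k^{(j)}}
  = \sum_{i:\, r(i,j)=1}
      \left( w^{(j)} \cdot x^{(i)} + b^{(j)} - y^{(i,j)} \right) x_k^{(i)}
  + \lambda \, w_k^{(j)}
```

Without the 1/2, the same gradient would simply carry an extra factor of 2, which a gradient descent learning rate could absorb. The minimum itself would be unchanged.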

Okay, I see. It is a matter of convenience for the optimisation algorithm.
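For anyone who wants to check this numerically, here is a minimal sketch. The data, the single-feature model without a bias term, and the value of lambda are all made up for illustration and do not come from the lecture. It evaluates the same cost with three different constant factors, 1/(2m), 1/2 and 1, and picks the best weight on a grid each time.

```python
import numpy as np

# Hypothetical toy data: one feature per movie and one user's ratings.
x = np.array([0.9, 0.1, 0.5, 0.7])
y = np.array([4.5, 0.5, 2.0, 3.5])
lam = 0.1   # regularization strength (assumed value)
m = len(y)  # number of rated movies, plays the role of m(j)

# Candidate values for the single weight w.
w_grid = np.linspace(0.0, 10.0, 1001)

def cost(scale):
    """Return scale * (squared error + regularization) for every w in w_grid."""
    residuals = np.outer(w_grid, x) - y  # one row of residuals per candidate w
    unscaled = (residuals ** 2).sum(axis=1) + lam * w_grid ** 2
    return scale * unscaled

# Same cost, three different positive constants in front.
w_with_m = w_grid[np.argmin(cost(1.0 / (2 * m)))]
w_half = w_grid[np.argmin(cost(0.5))]
w_plain = w_grid[np.argmin(cost(1.0))]

print(w_with_m, w_half, w_plain)
```

All three calls select the same weight, because multiplying a cost by a positive constant rescales its values without moving its minimum.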

