Why exactly $m_j$ is removed from the loss function

So Andrew sir first introduced $m_j$, and then he removed this term with the message: "Even if we remove it, the weights will converge to the same values, because $m_j$ is just a constant."

This doesn't make sense to me. Basically, we divide by the number of elements used in the average. The loss function is MSE, so if we remove the division by $m_j$, it is no longer the mean but simply the squared error. And the $\frac{1}{2}$ is there to cancel the 2 that appears on differentiation.
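
For clarity, here is how I understood the two versions of the cost on the slide (my own reconstruction, not an exact quote; the indexing is an assumption on my part):

$$
J = \frac{1}{2 m_j} \sum_{i=1}^{m_j} \left( \hat{y}^{(i)} - y^{(i)} \right)^2 + \frac{\lambda}{2 m_j} \sum_k w_k^2
\quad \longrightarrow \quad
J = \frac{1}{2} \sum_{i=1}^{m_j} \left( \hat{y}^{(i)} - y^{(i)} \right)^2 + \frac{\lambda}{2} \sum_k w_k^2
$$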

In the course assignment I see it is referred to as squared error. So I believe it was added by mistake in the lecture and later removed.

That term isn’t the average of the squares of the weights.
Division by m is correct.
The concept is that the larger the data set, the less regularization is needed.

That’s not an error. In this method the cost is based on the squared error.

The numerator there is m.

You mean the denominator? Or did I misunderstand this?

Can you indicate your question in red ink?

My question is: why was $m_j$ first used in the loss function and later removed, with the remark "adding it or removing it doesn't affect the weights learned"?

Sorry, I do not understand your question.

Can you give a specific link or video timestamp where this is stated?

Watch this video after 8:30

I have not checked the video, but in any problem involving the minimisation of a loss function, the point at which the minimum is attained stays the same if the loss gets multiplied by a positive constant (only the minimum value itself is scaled).

As a simple example, imagine the loss being L = A*(x - 2) ** 2, with A > 0. The minimum is at x = 2, regardless of the value of A.
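
To see this numerically, here is a quick sketch (my own, not from the course): gradient descent on L = A*(x - 2)**2 lands at the same minimiser x = 2 for several values of A.

```python
def minimize(A, x0=0.0, lr=0.01, steps=5000):
    """Plain gradient descent on L(x) = A * (x - 2)**2.

    The gradient is dL/dx = 2 * A * (x - 2).
    Note: lr must be small enough relative to A for the iteration
    to converge (here lr * 2 * A < 1 for all tested A).
    """
    x = x0
    for _ in range(steps):
        x -= lr * 2 * A * (x - 2)
    return x

# The minimiser is x = 2 for any positive A; only the loss *value* scales.
for A in (0.5, 1.0, 3.0):
    print(round(minimize(A), 4))  # each prints a value very close to 2.0
```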

In the case of the equation you are showing, as long as the constant multiplies both the MSE part and the L2 part, you can remove it. If you were to add a regularisation term that does not have the $m_j$ factor, then you could not remove the constant, because it would scale only one of the two terms and shift the minimiser. So I can see why it is important to show it first, and then cross it off for this particular case.
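
To make that concrete, here is a small sketch (my own construction, not course code) with a one-parameter ridge-style objective: multiplying the whole objective by a constant leaves the argmin unchanged, but applying the constant to only one of the two terms changes it.

```python
# One-dimensional ridge-style objective on toy data:
#   J(w) = c1 * sum((w*x - y)**2) + c2 * lam * w**2
# Setting dJ/dw = 0 gives the closed-form minimiser:
#   w* = c1 * sum(x*y) / (c1 * sum(x*x) + c2 * lam)

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
lam = 10.0

def argmin_w(c1, c2):
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return c1 * sxy / (c1 * sxx + c2 * lam)

baseline = argmin_w(1.0, 1.0)   # both terms share the same factor
scaled = argmin_w(5.0, 5.0)     # whole objective times 5: same argmin
lopsided = argmin_w(5.0, 1.0)   # factor on only one term: different argmin

print(abs(baseline - scaled) < 1e-9)    # True: scaling everything is harmless
print(abs(baseline - lopsided) < 1e-9)  # False: lopsided scaling moves w*
```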

Hopefully, that makes it a bit clearer
