In the mean normalization video, Andrew explains why normalization is needed. In the image taken from the video, he explains that since the user Eve has not reviewed any movies yet, w and b will be almost equal to zero. My doubt: since the cost only sums over pairs (i,j) with r(i,j) = 1, how do we end up with very small values of w and b for Eve ([0, 0] and 0)? Are they just the initial values assigned to the weights and bias, or is there a step I'm missing that updates them here? Please help me …
Hi @Shireesh_Kumar1, yes, you are correct. These are just default initial values, since the user hasn't rated any movie. Initial values are typically small; in this case, they are zero.
Let me elaborate on Andrew's explanation:
Because Eve hasn't rated any movies yet, the parameters w and b don't affect the first term in the cost function: none of Eve's ratings play a role in this squared error cost. So minimizing the cost means making the parameters w as small as possible.
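For reference, this is the collaborative filtering cost as I recall it being written in the course, with n_u users, n_m movies and n features per movie:

$$
J = \frac{1}{2}\sum_{(i,j):\,r(i,j)=1}\left(w^{(j)}\cdot x^{(i)} + b^{(j)} - y^{(i,j)}\right)^2 + \frac{\lambda}{2}\sum_{j=1}^{n_u}\sum_{k=1}^{n}\left(w_k^{(j)}\right)^2 + \frac{\lambda}{2}\sum_{i=1}^{n_m}\sum_{k=1}^{n}\left(x_k^{(i)}\right)^2
$$

No pair (i, j) with j = Eve's index satisfies r(i,j) = 1, so Eve's w and b never appear in the first sum, while her w still appears in the second sum. Her b appears in neither.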
The regularization term pushes all weights toward zero. The only countering force is the error term: shrinking the weights toward zero generally increases the error term, which in turn increases the cost. Since the algorithm's objective is to decrease the cost, it finds a balance between the two forces.
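The two forces show up directly in the gradient. Differentiating the collaborative filtering cost (written, as in the course, with a factor of 1/2 on both the squared error and the weight regularization) with respect to one weight of user j gives:

$$
\frac{\partial J}{\partial w_k^{(j)}} = \underbrace{\sum_{i:\,r(i,j)=1}\left(w^{(j)}\cdot x^{(i)} + b^{(j)} - y^{(i,j)}\right)x_k^{(i)}}_{\text{error term}} + \underbrace{\lambda\, w_k^{(j)}}_{\text{regularization}},
\qquad
w_k^{(j)} := w_k^{(j)} - \alpha\,\frac{\partial J}{\partial w_k^{(j)}}
$$

For a user with ratings, gradient descent stops near the point where these two terms cancel; if the user has no ratings, the sum is empty and only the regularization term remains.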
Since Eve never rated anything, her weight and bias parameters do not appear in the error term, but her weight parameters still appear in the regularization term. Therefore, with no countering force, regularization eventually pushes Eve's weights to zero.
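A minimal NumPy sketch of this effect. It makes several assumptions: plain batch gradient descent on w and b only, with the movie features X held fixed (in the course X is learned as well), made-up toy data, and hypothetical values α = 0.05 and λ = 1.0. The function name `grads_w_b` is hypothetical. Because Eve's column of the mask R is all zeros, her error contribution vanishes and each update reduces to w := w − αλw = (1 − αλ)w, a geometric decay toward zero.

```python
import numpy as np

def grads_w_b(X, W, b, Y, R, lam):
    """Gradients of the collaborative filtering cost with respect to W and b.

    X: (num_movies, n)          movie features (held fixed in this sketch)
    W: (num_users, n)           one weight vector per user
    b: (1, num_users)           one bias per user
    Y: (num_movies, num_users)  ratings (unrated entries are placeholders)
    R: (num_movies, num_users)  R[i, j] = 1 if user j rated movie i, else 0
    """
    # Prediction errors, masked so only pairs with r(i, j) = 1 contribute
    err = (X @ W.T + b - Y) * R
    grad_W = err.T @ X + lam * W             # error term + regularization term
    grad_b = err.sum(axis=0, keepdims=True)  # b is not regularized
    return grad_W, grad_b

# Hypothetical toy data: 3 movies, 2 features, 2 users.
# User 0 rated every movie; user 1 plays the role of Eve and rated nothing.
X = np.array([[1.0, 0.0], [0.9, 0.1], [0.1, 1.0]])
Y = np.array([[5.0, 0.0], [5.0, 0.0], [0.0, 0.0]])
R = np.array([[1, 0], [1, 0], [1, 0]])

rng = np.random.default_rng(0)
W = rng.normal(size=(2, 2))  # deliberately nonzero starting weights
b = np.zeros((1, 2))         # zero-initialized biases
W0 = W.copy()

alpha, lam = 0.05, 1.0       # assumed learning rate and regularization strength
for _ in range(500):
    grad_W, grad_b = grads_w_b(X, W, b, Y, R, lam)
    W -= alpha * grad_W
    b -= alpha * grad_b

print("user 0 w:", W[0], "b:", b[0, 0])  # approximately where the two forces balance
print("Eve    w:", W[1], "b:", b[0, 1])  # w is ~0, b is still exactly 0
```

User 0's weights end up at a nonzero point where the error term and the regularization term roughly cancel, while Eve's weights are multiplied by 0.95 on every step and become negligible.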
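As for the bias, as you said, its value comes from zero initialization: b is not regularized, so with no ratings nothing in the cost depends on Eve's bias, and it simply keeps its initial value.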
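Written out, the bias gradient of the collaborative filtering cost (assuming, as in the course, that b is left out of the regularization term) contains only the error term:

$$
\frac{\partial J}{\partial b^{(j)}} = \sum_{i:\,r(i,j)=1}\left(w^{(j)}\cdot x^{(i)} + b^{(j)} - y^{(i,j)}\right),
\qquad
b^{(j)} := b^{(j)} - \alpha\,\frac{\partial J}{\partial b^{(j)}}
$$

For Eve the sum is empty, so the gradient is exactly 0 and b stays at whatever value it was initialized with.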