L2 normalization in MLS-Course 3 (Unsupervised Learning) W2 (Content-based filtering)

Hi, I have a question about the use of L2 normalization for the two NNs in content-based filtering in MLS Course 3, Week 2:

In the TensorFlow implementation slides and the lab, both vu and vm are L2-normalized before they go to the dot layer for prediction:

```python
vu = tf.linalg.l2_normalize(vu, axis=1)
vm = tf.linalg.l2_normalize(vm, axis=1)
```

In the lab code, the user ratings of movies have been scaled with MinMaxScaler:

```python
scalerTarget = MinMaxScaler((-1, 1))
```

Why do we apply L2 normalization before the dot product, and why do we min-max scale the output Y? In earlier classes, Dr. Ng did not go through the need to scale output label values. What's the harm of not doing either?

It's good to normalize before the dot product so that the products do not become large numbers and the oscillations in gradient descent are not huge, and probably so that memory usage is lower too!

I think min-max scaling here converts the output to a categorical or discrete output; that is generally easier for computer systems to process, and for machine learning equations when regression is involved!

Thanks. So on the L2 norm of vu and vm: even though they are supposed to converge to relatively small numbers, before convergence, without the L2 norm, we could get large values during the iterations. So the L2 norm ensures that at each iteration we have consistently small values to deal with?

And on the output Y, I'm not sure I understand you. Here the values are not categorical; they are the user ratings of movies (0.5 to 5.0 in steps of 0.5), so the Y_hats are between 0.5 and 5.0 and are not supposed to be discrete. Do you mean that the MinMaxScaler confines the Y_hats so that they never go beyond the upper and lower bounds of 5.0 and 0.5?

I would have to see the lab to give specific answers; my answer above is the general case of why these are used!

Hello @wr200m,

We specifically need the L2 norm there because it makes sure that, after the dot product, the result is between -1 and 1, which is required by the min-max-scaled y.
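To make the range argument concrete, here is a minimal plain-Python sketch (not the lab code). The helper names `l2_normalize`, `dot`, `min_max_scale` and `inverse_min_max` are made up for illustration, the vector length 32 is arbitrary, and the rating bounds 0.5 and 5.0 are assumed from the description earlier in this thread. It shows that the dot product of two unit-length vectors is a cosine similarity, so it cannot leave [-1, 1], and that min-max scaling to (-1, 1) is just a linear, invertible rescale of the ratings onto that same interval.

```python
import math
import random

def l2_normalize(v, eps=1e-12):
    """Scale v to unit L2 length: v / sqrt(max(sum(v**2), eps))."""
    norm = math.sqrt(max(sum(x * x for x in v), eps))
    return [x / norm for x in v]

def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def min_max_scale(y, y_min, y_max, lo=-1.0, hi=1.0):
    """Linearly map y from [y_min, y_max] onto [lo, hi]."""
    return lo + (y - y_min) * (hi - lo) / (y_max - y_min)

def inverse_min_max(s, y_min, y_max, lo=-1.0, hi=1.0):
    """Undo min_max_scale: map s from [lo, hi] back to [y_min, y_max]."""
    return y_min + (s - lo) * (y_max - y_min) / (hi - lo)

random.seed(0)
# Stand-ins for the outputs of the user and movie networks (assumed size 32).
vu = [random.gauss(0, 10) for _ in range(32)]
vm = [random.gauss(0, 10) for _ in range(32)]

raw = dot(vu, vm)                                   # unbounded, can be large
bounded = dot(l2_normalize(vu), l2_normalize(vm))   # cosine similarity

# Ratings assumed to span 0.5 .. 5.0; the endpoints map to -1 and 1.
ratings = [0.5, 2.75, 5.0]
scaled_ratings = [min_max_scale(r, 0.5, 5.0) for r in ratings]
recovered = [inverse_min_max(s, 0.5, 5.0) for s in scaled_ratings]

print("raw dot product:       ", raw)
print("normalized dot product:", bounded)
print("scaled ratings:        ", scaled_ratings)
print("recovered ratings:     ", recovered)
```

Without the normalization, the raw dot product can be any real number, so the model would have to learn vector magnitudes that happen to land in the label range. With it, predictions and scaled labels share the same interval by construction, and the inverse transform turns a prediction back into a rating on the original scale.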

So, what do you think is the benefit of forcing the range to be consistent with the labels?

Cheers,
Raymond