Why do we do L2 normalization before the dot product, and why do we min-max scale the output Y? In earlier classes, Dr. Ng did not go through the need to scale output label values. What's the harm of not doing either?

It's good to normalize before the dot product so the multiplications do not produce large numbers and gradient descent does not oscillate wildly; it probably also reduces memory usage!
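To illustrate the point, here is a small NumPy sketch (not the lab's actual code; the vector names `vu` and `vm` just follow the thread's notation, and the scale factor is made up). After L2 normalization, the dot product is a cosine similarity, so it stays in [-1, 1] no matter how large the raw vectors were:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical user and movie embedding vectors, deliberately made large.
vu = rng.normal(size=(4, 32)) * 10.0
vm = rng.normal(size=(4, 32)) * 10.0

raw = np.sum(vu * vm, axis=1)          # dot products of raw vectors: can be huge

# L2-normalize each row vector to unit length first.
vu_n = vu / np.linalg.norm(vu, axis=1, keepdims=True)
vm_n = vm / np.linalg.norm(vm, axis=1, keepdims=True)
normed = np.sum(vu_n * vm_n, axis=1)   # cosine similarity: always in [-1, 1]

print(np.abs(raw).max())     # large, scales with the vectors' magnitudes
print(np.abs(normed).max())  # never exceeds 1
```

In the TensorFlow lab this same normalization step would be done with a normalization layer on each tower's output before the dot product.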

Min-max scaling here, I think, converts the output to a categorical or discrete form; it's better suited to processing by computer systems in general, as well as to machine learning equations when regression is involved!

Thanks. So on the L2 norm of vu and vm: even though they are supposed to converge to relatively small values, before convergence, without the L2 norm, it's possible that during the iterations we get large values. So the L2 norm ensures that at each iteration we have consistently small values to deal with?

And on the output Y, I'm not sure I understand you. Here the values are not categorical; they are user ratings of movies (0.5 to 5.0 in steps of 0.5), so the Y_hats are between 0.5 and 5.0, and the Y_hats are not supposed to be discrete. Do you mean that the MinMaxScaler confines the Y_hats so that they never go beyond the upper and lower bounds of 5.0 and 0.5?

We specifically need the L2 norm there because it makes sure that, after the dot product, the result is between -1 and 1, which matches the range of the min-max-scaled y. The two transforms go together: the targets are scaled into (-1, 1), and the L2 normalization guarantees the model's dot-product output lives on that same range.
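A quick sketch of that pairing, done by hand in NumPy (this is the same affine transform that sklearn's `MinMaxScaler(feature_range=(-1, 1))` applies after `fit(y)`; the prediction value 0.5 is just a made-up example):

```python
import numpy as np

# Ratings from 0.5 to 5.0 in steps of 0.5, as described in the thread.
y = np.arange(0.5, 5.01, 0.5).reshape(-1, 1)

# Min-max scale y to (-1, 1): first map to [0, 1], then stretch to [-1, 1].
y_min, y_max = y.min(), y.max()
y_scaled = (y - y_min) / (y_max - y_min) * 2.0 - 1.0   # now in [-1, 1]

# A model prediction y_hat lives on this scaled range; to report a rating
# we invert the transform (MinMaxScaler's inverse_transform does the same).
y_hat_scaled = 0.5                                      # hypothetical prediction
y_hat = (y_hat_scaled + 1.0) / 2.0 * (y_max - y_min) + y_min

print(y_scaled.min(), y_scaled.max())   # -1.0 1.0
print(y_hat)                            # 3.875
```

So a dot product bounded to [-1, 1] by the L2 norm can be compared directly against the scaled targets, and `inverse_transform` maps predictions back to the 0.5–5.0 rating scale.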