L1 Regularization

In the regularization lecture, Professor Andrew Ng introduces two different notations for L1 regularization:

\frac{\lambda}{2m} \sum_{j=1}^{n_x} |w_j| = \frac{\lambda}{2m} \parallel w \parallel_1

To be on the same page, notice that this formula applies to logistic regression, where w is a column vector of the weights between each input neuron and the output layer (which consists of a single neuron when we are dealing with binary classification).

Now, my confusion arises from the fact that the LHS of the equation is not equal to the RHS. On the LHS, we sum the absolute values of the weights and then multiply by \frac{\lambda}{2m}.
On the RHS, however, we calculate the magnitude of the vector w, which is equivalent to summing the squares of the weights and then taking the square root of the sum. Obviously the LHS and the RHS are not the same, and they don't necessarily give the same results.
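
For a concrete example (values I chose just for illustration, not from the lecture): if w = (3, 4)^T, then the sum of absolute values is |3| + |4| = 7, while the magnitude is \sqrt{3^2 + 4^2} = 5, so the two expressions clearly give different values.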

I strongly believe LHS is the real L1 regularization.
1- Is there really a mistake in the notation on the RHS, or am I missing something?
2- Can you confirm that the LHS is the one that is called L1 regularization?

Thanks a lot.

Welcome to the community.

Please pay attention to the subscript “1”.

L1 Norm = \parallel W \parallel_1 = |w_1| + |w_2| + |w_3| + \dots + |w_{n_x}|
L2 Norm = \parallel W \parallel_2 = \sqrt{w_1^2 + w_2^2 + w_3^2 + \dots + w_{n_x}^2}
LP Norm = \parallel W \parallel_p = \sqrt[p]{w_1^p + w_2^p + w_3^p + \dots + w_{n_x}^p}

So, the equation that Andrew wrote is correct.

(In the case of a matrix, we use the Frobenius norm.)
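
If it helps, here is a minimal NumPy sketch (my own illustration with made-up values for w, lambda, and m, not code from the course) that computes both norms and the L1 penalty term from the cost function:

```python
import numpy as np

# Made-up weight vector w of shape (n_x, 1) and hyperparameters, for illustration only.
w = np.array([[0.5], [-1.2], [3.0]])
lam, m = 0.7, 100  # regularization strength lambda and number of training examples

l1_norm = np.sum(np.abs(w))        # ||w||_1 = |w_1| + |w_2| + ... + |w_{n_x}|
l2_norm = np.sqrt(np.sum(w ** 2))  # ||w||_2 = sqrt(w_1^2 + w_2^2 + ... + w_{n_x}^2)

# Both agree with NumPy's built-in vector norms.
assert np.isclose(l1_norm, np.linalg.norm(w.ravel(), ord=1))
assert np.isclose(l2_norm, np.linalg.norm(w.ravel(), ord=2))

# L1 regularization term added to the logistic regression cost: (lambda / (2m)) * ||w||_1
l1_penalty = (lam / (2 * m)) * l1_norm
print(l1_norm, l2_norm, l1_penalty)  # 4.7  ~3.2696  ~0.016450

# For a weight matrix, the analogous quantity is the Frobenius norm.
W = np.array([[1.0, -2.0], [0.5, 3.0]])
assert np.isclose(np.sqrt(np.sum(W ** 2)), np.linalg.norm(W, ord='fro'))
```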


Thank you for the clarification. Now it makes sense. However, if I understand correctly, in the last equation you should have taken the absolute values of the weights, is that correct?

That’s right. (Even for L2, it must be the absolute value from the formula’s point of view.)
Thank you for pointing that out. Here is the update. (Only the LP norm has been updated.)

L1 Norm = \parallel W \parallel_1 = |w_1| + |w_2| + |w_3| + \dots + |w_{n_x}|
L2 Norm = \parallel W \parallel_2 = \sqrt{w_1^2 + w_2^2 + w_3^2 + \dots + w_{n_x}^2}
LP Norm = \parallel W \parallel_p = \sqrt[p]{|w_1|^p + |w_2|^p + |w_3|^p + \dots + |w_{n_x}|^p}
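
To see why the absolute values matter, here is a short sketch (with hypothetical weight values of my own choosing) of the corrected LP norm, checked against NumPy's implementation:

```python
import numpy as np

def lp_norm(w, p):
    """General Lp norm: (|w_1|^p + |w_2|^p + ... + |w_{n_x}|^p)^(1/p)."""
    return np.sum(np.abs(w) ** p) ** (1.0 / p)

w = np.array([0.5, -1.2, 3.0])  # made-up weights for illustration

# p = 1 and p = 2 recover the L1 and L2 norms; all agree with NumPy's own norm.
for p in (1, 2, 3):
    assert np.isclose(lp_norm(w, p), np.linalg.norm(w, ord=p))

# Without the absolute values, a negative weight raised to an odd power p would
# wrongly shrink the sum (and could even make it negative).
print(np.sum(w ** 3), np.sum(np.abs(w) ** 3))  # 25.397 vs 28.853
```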
