L1 Regularization

In the regularization lecture, Professor Andrew Ng introduces two different notations for L1 regularization:

\frac{\lambda}{2m} \sum_{j=1}^{n_x} |w_j| = \frac{\lambda}{2m} \parallel w \parallel_1

To be on the same page, notice that this formula applies to logistic regression, where w is a column vector of the weights between each input neuron and the output layer (which consists of a single neuron when we are dealing with binary classification).

Now, my confusion arises from the fact that the LHS of the equation is not equal to the RHS. On the LHS, we sum the absolute values of the weights and then multiply by \frac{\lambda}{2m}.
On the RHS, however, we calculate the magnitude of the vector w, which is equivalent to summing the squares of the weights and then taking the square root of the sum. Obviously the LHS and the RHS are not the same, and they don't necessarily give the same results.
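
For a concrete example (values I chose just for illustration, not from the lecture): if w = (3, 4)^T, then the sum of absolute values is |3| + |4| = 7, while the magnitude is \sqrt{3^2 + 4^2} = 5, so the two expressions clearly give different values.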

I strongly believe LHS is the real L1 regularization.
1- Is there really a mistake in the notation on the RHS, or am I missing something?
2- Can you confirm that the LHS is the one that is called L1 regularization?

Thanks a lot.

Welcome to the community.

Please pay attention to the subscript “1”.

L1 Norm = \parallel W \parallel_1 = |w_1| + |w_2| + |w_3| + \dots + |w_{n_x}|
L2 Norm = \parallel W \parallel_2 = \sqrt{w_1^2 + w_2^2 + w_3^2 + \dots + w_{n_x}^2}
LP Norm = \parallel W \parallel_p = \sqrt[p]{w_1^p + w_2^p + w_3^p + \dots + w_{n_x}^p}

So, the equation that Andrew wrote is correct.

(In the case of a matrix, we use the Frobenius norm.)
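
If it helps, here is a minimal NumPy sketch (my own illustration with made-up values for w, lambda, and m, not code from the course) that computes both norms and the L1 penalty term from the cost function:

```python
import numpy as np

# Made-up weight vector w of shape (n_x, 1) and hyperparameters, for illustration only.
w = np.array([[0.5], [-1.2], [3.0]])
lam, m = 0.7, 100  # regularization strength lambda and number of training examples

l1_norm = np.sum(np.abs(w))        # ||w||_1 = |w_1| + |w_2| + ... + |w_{n_x}|
l2_norm = np.sqrt(np.sum(w ** 2))  # ||w||_2 = sqrt(w_1^2 + w_2^2 + ... + w_{n_x}^2)

# Both agree with NumPy's built-in vector norms.
assert np.isclose(l1_norm, np.linalg.norm(w.ravel(), ord=1))
assert np.isclose(l2_norm, np.linalg.norm(w.ravel(), ord=2))

# L1 regularization term added to the logistic regression cost: (lambda / (2m)) * ||w||_1
l1_penalty = (lam / (2 * m)) * l1_norm
print(l1_norm, l2_norm, l1_penalty)  # 4.7  ~3.2696  ~0.016450

# For a weight matrix, the analogous quantity is the Frobenius norm.
W = np.array([[1.0, -2.0], [0.5, 3.0]])
assert np.isclose(np.sqrt(np.sum(W ** 2)), np.linalg.norm(W, ord='fro'))
```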


Thank you for the clarification. Now it makes sense. However, if I understand correctly, in the last equation you should have taken the absolute values of the weights, is that correct?

That’s right. (Even for L2, it must be the absolute value from the formula’s point of view.)
Thank you for pointing that out. Here is the update. (Only the LP norm has been updated.)

L1 Norm = \parallel W \parallel_1 = |w_1| + |w_2| + |w_3| + \dots + |w_{n_x}|
L2 Norm = \parallel W \parallel_2 = \sqrt{w_1^2 + w_2^2 + w_3^2 + \dots + w_{n_x}^2}
LP Norm = \parallel W \parallel_p = \sqrt[p]{|w_1|^p + |w_2|^p + |w_3|^p + \dots + |w_{n_x}|^p}
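
To see why the absolute values matter, here is a short sketch (with hypothetical weight values of my own choosing) of the corrected LP norm, checked against NumPy's implementation:

```python
import numpy as np

def lp_norm(w, p):
    """General Lp norm: (|w_1|^p + |w_2|^p + ... + |w_{n_x}|^p)^(1/p)."""
    return np.sum(np.abs(w) ** p) ** (1.0 / p)

w = np.array([0.5, -1.2, 3.0])  # made-up weights for illustration

# p = 1 and p = 2 recover the L1 and L2 norms; all agree with NumPy's own norm.
for p in (1, 2, 3):
    assert np.isclose(lp_norm(w, p), np.linalg.norm(w, ord=p))

# Without the absolute values, a negative weight raised to an odd power p would
# wrongly shrink the sum (and could even make it negative).
print(np.sum(w ** 3), np.sum(np.abs(w) ** 3))  # 25.397 vs 28.853
```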
