Why do we square the L2 Norm of the regularization term?

helmstetter91 · September 16, 2022, 10:19pm

I was originally going to ask why the square root of the L_{2} Norm was omitted in \lvert\lvert w \rvert\rvert_{2}^{2}=\sum_{j=1}^{n_x}w_j^2. Then it hit me that we’re squaring \lvert\lvert w \rvert\rvert_{2} in \frac{\lambda}{2m}\lvert\lvert w \rvert\rvert_{2}^{2} and therefore, the root is irrelevant. Why do we square the L_{2} norm?

balaji.ambresh · September 17, 2022, 6:54am

Please see this page to learn about p-norm.

helmstetter91 · September 18, 2022, 1:46am

I’ve read the page you linked on p-norm but I don’t believe I found any answer to my question.

I understand that \lvert\lvert w\rvert\rvert_{2}=(\sum_{j=1}^{n_{x}}w_{j}^{2})^\frac{1}{2}

I do not understand why we then take \lvert\lvert w\rvert\rvert_{2} and square it, \lvert\lvert w\rvert\rvert_{2}^{2}

balaji.ambresh · September 18, 2022, 7:07am

This is done to simplify calculations. In the backward propagation step, the derivative of \frac{\lambda}{2m}\lvert\lvert w \rvert\rvert_{2}^{2} with respect to w_j is \frac{\lambda}{m}w_j: the 2 that comes from differentiating w_j^2 cancels the 2 in the denominator, and no square roots are left.
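To make the cancellation concrete, here is a minimal sketch in plain Python. It compares the closed-form gradient of the squared penalty against a finite-difference estimate. The function names and the sample values for w, lam and m are illustrative assumptions, not taken from the course code. For contrast, it also includes the gradient you would get if the norm were left unsquared.

```python
import math

def l2_penalty(w, lam, m):
    # (lambda / 2m) * ||w||_2^2 -- the squared norm is just the sum of squares
    return lam / (2 * m) * sum(wj ** 2 for wj in w)

def l2_penalty_grad(w, lam, m):
    # d/dw_j of the penalty: the 2 from w_j^2 cancels the 2 in 2m
    return [lam / m * wj for wj in w]

def unsquared_penalty_grad(w, lam, m):
    # Gradient of (lambda / 2m) * ||w||_2 for comparison: the square root
    # stays in the denominator, and the expression is undefined at w = 0
    norm = math.sqrt(sum(wj ** 2 for wj in w))
    return [lam / (2 * m) * wj / norm for wj in w]

def numerical_grad(f, w, eps=1e-6):
    # Central finite differences, one coordinate at a time
    grad = []
    for j in range(len(w)):
        up = list(w)
        up[j] += eps
        down = list(w)
        down[j] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad

# Illustrative values (assumptions for this example only)
w = [0.5, -1.0, 2.0]
lam, m = 0.7, 10

analytic = l2_penalty_grad(w, lam, m)
numeric = numerical_grad(lambda v: l2_penalty(v, lam, m), w)
unsquared = unsquared_penalty_grad(w, lam, m)

print("analytic:", analytic)
print("numeric: ", numeric)
print("unsquared-norm gradient:", unsquared)
```

The analytic and numeric gradients should agree to within floating-point error, while the unsquared version carries the extra division by the norm.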
