I was originally going to ask why the square root of the $L_2$ norm was omitted in $\lVert w \rVert_2^2 = \sum_{j=1}^{n_x} w_j^2$. Then it hit me that we’re squaring $\lVert w \rVert_2$ in $\frac{\lambda}{2m}\lVert w \rVert_2^2$, so the square cancels the root. But why do we square the $L_2$ norm in the first place?

I’ve read the page you linked on the p-norm, but I don’t think it answers my question.

I understand that $\lVert w \rVert_2 = \left(\sum_{j=1}^{n_x} w_j^2\right)^{1/2}$.
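For concreteness, here is a small sketch (the weight vector is hypothetical, chosen only so the numbers come out round) showing that squaring the norm just recovers the plain sum of squares, so the square root never has to be computed:

```python
import math

# Hypothetical weight vector, for illustration only
w = [3.0, -4.0, 12.0]

sum_sq = sum(wj * wj for wj in w)   # sum_j w_j^2
l2_norm = math.sqrt(sum_sq)         # ||w||_2 = (sum_j w_j^2)^(1/2)
l2_norm_sq = sum_sq                 # ||w||_2^2 = sum_j w_j^2, no root involved

print(l2_norm)     # 13.0
print(l2_norm_sq)  # 169.0
```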

What I don’t understand is why we then take $\lVert w \rVert_2$ and square it to get $\lVert w \rVert_2^2$.

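This is done to simplify calculations. If you look at the backward propagation step, taking the derivative of the regularization term brings down the exponent `2`, which cancels the 2 in the denominator $2m$:

$$\frac{\partial}{\partial w_j}\left(\frac{\lambda}{2m}\sum_{k=1}^{n_x} w_k^2\right) = \frac{\lambda}{m}\, w_j$$

so we don’t have any square roots left.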
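To see what the squaring buys, the sketch below compares the gradient of the squared penalty $\frac{\lambda}{2m}\lVert w \rVert_2^2$ with that of an unsquared penalty $\frac{\lambda}{2m}\lVert w \rVert_2$, and checks both against central finite differences. The function names and the values of `w`, `lam` and `m` are hypothetical, picked only for illustration. The squared version gives the simple $\frac{\lambda}{m} w_j$; the unsquared version gives $\frac{\lambda}{2m}\frac{w_j}{\lVert w \rVert_2}$, which still needs the root and is undefined at $w = 0$.

```python
import math

def penalty_squared(w, lam, m):
    # lambda/(2m) * ||w||_2^2 = lambda/(2m) * sum_j w_j^2 (no square root)
    return lam / (2 * m) * sum(wj * wj for wj in w)

def grad_squared(w, lam, m):
    # d/dw_j: lambda/(2m) * 2 * w_j = (lambda/m) * w_j, the 2s cancel
    return [lam / m * wj for wj in w]

def penalty_unsquared(w, lam, m):
    # lambda/(2m) * ||w||_2, keeping the square root (for comparison only)
    return lam / (2 * m) * math.sqrt(sum(wj * wj for wj in w))

def grad_unsquared(w, lam, m):
    # d/dw_j: lambda/(2m) * w_j / ||w||_2, needs the root, undefined at w = 0
    norm = math.sqrt(sum(wj * wj for wj in w))
    return [lam / (2 * m) * wj / norm for wj in w]

def numerical_grad(f, w, eps=1e-6):
    # Central finite differences, used only to check the analytic gradients
    grads = []
    for j in range(len(w)):
        w_plus = list(w)
        w_minus = list(w)
        w_plus[j] += eps
        w_minus[j] -= eps
        grads.append((f(w_plus) - f(w_minus)) / (2 * eps))
    return grads

# Hypothetical values, for illustration only
w = [3.0, -4.0, 12.0]
lam, m = 1.0, 4

print(grad_squared(w, lam, m))  # [0.75, -1.0, 3.0]
```

In a gradient-descent update this term simply adds $\frac{\lambda}{m} w_j$ to the data-loss gradient, which is why L2 regularization is often described as weight decay.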