L2 regularization

I’m having trouble understanding the formula for L2 regularization.

From the lecture video, they described L2 regularization in a NN as:

$$\frac{\lambda}{2m} \sum_{l} \sum_{k} \sum_{j} \left( W^{[l]}_{k,j} \right)^2$$

I know it’s a basic question, but I don’t understand why there are three sums, over l, k, and j, and not just over j and k as in the video. I just need some direction. Thanks!

(Just had a thought: is it simply a notational difference, nothing else?)


No, the point is that you are summing over all the layers. At each layer you have a 2D matrix $W^{[l]}$, and you compute the square of its (Frobenius) norm, which requires the two inner sums over its rows and columns. Then you sum those squared norms over the $W^{[l]}$ of every layer in the network.
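Written out, that equivalence is (using $L$ for the number of layers and $n^{[l]}$ for the number of units in layer $l$):

$$\frac{\lambda}{2m} \sum_{l=1}^{L} \left\| W^{[l]} \right\|_F^2 = \frac{\lambda}{2m} \sum_{l=1}^{L} \sum_{k=1}^{n^{[l]}} \sum_{j=1}^{n^{[l-1]}} \left( W^{[l]}_{k,j} \right)^2$$

In code, the two inner sums collapse into one call per layer. Here is a minimal NumPy sketch; the function name and arguments are my own for illustration, not from the course:

```python
import numpy as np

def l2_cost(weights, lambd, m):
    """L2 regularization term: (lambda / (2m)) * sum_l ||W^[l]||_F^2.

    weights: list of the weight matrices W^[1], ..., W^[L], one per layer
    lambd:   regularization strength lambda
    m:       number of training examples
    """
    # np.sum(W ** 2) performs the two inner sums over the rows (k)
    # and columns (j) of one matrix; the Python sum() is the outer
    # sum over the layers (l).
    return (lambd / (2 * m)) * sum(np.sum(W ** 2) for W in weights)
```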