Frobenius Norms

For deep neural networks, we use Frobenius norms instead of L2 norms on Weights, is that right?

My understanding is that L2 is usually applied on vectors, while W[1] here is a matrix.

  1. l ↩︎

For the objects we are dealing with, they are really the same. It’s just the square root of the sum of the squares of the elements in both cases, right? Although we don’t bother with the square root for the same reason that we use Mean Squared Error as a regression cost function instead of Euclidean distance: the computational cost is less and the gradients are simpler.