Regularization derivative for L2 norm


When it gives the formula for dW as dW[L] = (from backprop) + lambda/m * W[L], I was wondering why it is not + lambda/2m instead. I ask because the lambda/2m factor appears in the cost formula but not in the derivative.

The derivative causes the '2' term to be canceled. It's an application of the power rule for taking derivatives.
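To spell out the cancellation: for a single weight w, the regularization term is (lambda/2m) * w^2, and the power rule gives d/dw of w^2 = 2w, so the derivative is (lambda/2m) * 2w = (lambda/m) * w. Here is a small sketch in plain Python that checks this numerically with a finite difference. The function names, the weight values, and the choices of lambda and m are all made up for illustration; they are not from the course code.

```python
def l2_penalty(weights, lam, m):
    """L2 regularization term: (lam / (2 * m)) * sum of squared weights."""
    return lam / (2 * m) * sum(w * w for w in weights)


def l2_penalty_grad(weights, lam, m):
    """Analytic gradient: the power rule turns w**2 into 2*w, cancelling the 1/2."""
    return [lam / m * w for w in weights]


def numeric_grad(weights, lam, m, eps=1e-6):
    """Central-difference estimate of the gradient, one weight at a time."""
    grads = []
    for i in range(len(weights)):
        plus = list(weights)
        minus = list(weights)
        plus[i] += eps
        minus[i] -= eps
        diff = l2_penalty(plus, lam, m) - l2_penalty(minus, lam, m)
        grads.append(diff / (2 * eps))
    return grads


weights = [0.5, -1.5, 2.0]   # hypothetical flattened entries of W[L]
lam, m = 0.7, 10             # hypothetical lambda and number of examples

analytic = l2_penalty_grad(weights, lam, m)
numeric = numeric_grad(weights, lam, m)

# Largest gap between the lambda/m * W formula and the numerical estimate.
max_diff = max(abs(a - n) for a, n in zip(analytic, numeric))
print(analytic)
print(numeric)
print(max_diff)
```

If the gradient were lambda/2m * W instead, the analytic values would come out at half the numerical ones, so a check like this makes the missing '2' easy to see.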
