when it talks about the formula for dW as dW[L] = (from backprop) + (lambda/m) * W[L]. I was wondering why it is not + lambda/(2m) instead. I ask because the 2 is shown in the cost formula but not in the derivative.
The derivative causes the "2" term to be canceled. It's an application of the power rule for taking derivatives.
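To spell that out, here is a short sketch of the derivative, assuming the regularization term in the cost has the usual L2 form with a lambda/(2m) factor. The symbols J_reg and the Frobenius norm are my notation for illustration, not quoted from the lecture:

```latex
% Assumed L2 regularization term for layer L (notation is illustrative)
J_{\text{reg}} = \frac{\lambda}{2m} \, \lVert W^{[L]} \rVert_F^2
               = \frac{\lambda}{2m} \sum_{i,j} \left( W^{[L]}_{ij} \right)^2

% Power rule: d/dw (w^2) = 2w, so the 2 cancels the 1/2
\frac{\partial J_{\text{reg}}}{\partial W^{[L]}}
  = \frac{\lambda}{2m} \cdot 2 \, W^{[L]}
  = \frac{\lambda}{m} \, W^{[L]}
```

So the 1/2 in the cost is there precisely so that the gradient comes out as (lambda/m) * W[L] with no stray factor of 2.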