Prof. Ng in this video (at 4:30) said: “A couple of things I would like to point out. By convention, instead of using lambda times the sum of w_j squared, we also divide lambda by 2m so that both the 1st and 2nd terms here are scaled by 1/(2m). It turns out that by scaling both terms the same way it becomes a little bit easier to choose a good value for lambda. And in particular, you find that even if your training set size grows, say you find more training examples, so m, the training set size, is now bigger, the same value of lambda that you’ve picked previously is now also more likely to continue to work if you have this extra scaling by 1/(2m).”
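To make the quote concrete, here is a minimal NumPy sketch of the cost he is describing (function and variable names are mine, assuming a squared-error linear-regression loss): both the data term and the L2 penalty carry the 1/(2m) factor, so the same lambda keeps roughly the same effect as m changes.

```python
import numpy as np

# Illustrative only: regularized linear-regression cost with both terms scaled by 1/(2m).
def regularized_cost(w, b, X, y, lam):
    m = X.shape[0]
    err = X @ w + b - y                          # residuals over the m training examples
    data_term = np.sum(err ** 2) / (2 * m)       # (1/2m) * sum of squared errors
    reg_term = (lam / (2 * m)) * np.sum(w ** 2)  # (lambda/2m) * sum of w_j squared
    return data_term + reg_term
```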
In modern frameworks such as TensorFlow and PyTorch, L2 regularization is implemented as an explicit penalty or as weight decay, and in both cases the penalty is not scaled by the batch size or the training set size m:
- TensorFlow: the L2 regularization penalty is computed as loss = l2 * reduce_sum(square(x)) and is attached to a layer, e.g.
  dense = Dense(3, kernel_regularizer=tf.keras.regularizers.L2(l2=0.01))
  (see the TensorFlow sketch after this list)
- PyTorch: it is built into the optimizer, e.g. torch.optim.SGD(params, lr=0.001, weight_decay=0.01), where the docs describe the parameter as “weight_decay (float, optional) – weight decay (L2 penalty) (default: 0)”
  (see the PyTorch sketch after this list)
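For the TensorFlow case, here is a minimal sketch (toy values, assuming TF 2.x) showing that the Keras L2 regularizer returns exactly l2 * sum(w**2), with no division by the batch size or by m:

```python
import tensorflow as tf

# The Keras L2 regularizer adds l2 * sum(w**2) to the loss; no 1/(2m) factor.
reg = tf.keras.regularizers.L2(l2=0.01)
w = tf.constant([[1.0, -2.0], [3.0, 0.5]])

penalty = reg(w)                             # value the regularizer contributes
manual = 0.01 * tf.reduce_sum(tf.square(w))  # the same formula written out by hand
print(float(penalty), float(manual))         # identical, independent of any m
```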
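And for PyTorch, a minimal sketch (toy values) showing that with plain SGD, weight_decay simply adds weight_decay * w to the gradient, which is equivalent to an L2 penalty of (weight_decay / 2) * ||w||^2, again with no division by the batch size or by m:

```python
import torch

# With a zero data loss, only weight decay moves the weights:
# SGD uses grad + weight_decay * w, so the update is w <- w - lr * weight_decay * w.
w = torch.nn.Parameter(torch.tensor([1.0, -2.0]))
opt = torch.optim.SGD([w], lr=0.1, weight_decay=0.01)

loss = (0.0 * w).sum()  # zero data loss, so the gradient w.r.t. w is zero
loss.backward()
opt.step()

print(w.data)           # tensor([ 0.9990, -1.9980]), independent of any m
```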