Derivative of regularized logistic cost function: does it need the dimension of the w vector?

Hi @s-dorsher,

I believe there is a mistake in your derivation. For the regularization term $\displaystyle J_{\rm reg} = {\lambda \over 2m} \sum_{j = 1}^n w_j^2$, we want to compute the derivative with respect to a specific weight $w_j$:

$$
\begin{aligned}
{\partial J_{\rm reg} \over \partial w_j}
&= {\lambda \over 2m} \left( {\partial\over \partial w_j} \sum_{j = 1}^n w_j^2 \right)
 = {\lambda \over 2m} \left( {\partial\over \partial w_j} \sum_{k = 1}^n w_k^2 \right)
 = {\lambda \over 2m} \sum_{k = 1}^n {\partial\over \partial w_j} w_k^2 \\
&= {\lambda \over 2m} \left( {\partial\over \partial w_j} w_1^2 + \dots + {\partial\over \partial w_j} w_j^2 + \dots + {\partial\over \partial w_j} w_n^2 \right) \\
&= {\lambda \over 2m} \left( 0 + \dots + 2 w_j + \dots + 0 \right) = {\lambda \over m} w_j.
\end{aligned}
$$

Please note that I changed the summation index from $j$ to $k$ inside the sum. The partial derivative is taken with respect to $w_j$, so using $j$ as the summation index as well would be ambiguous. With the indices separated, only the $k = j$ term survives the differentiation, which is why the number of weights $n$ does not appear in the result.
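If it helps, the result can be sanity-checked numerically. The snippet below is only a sketch: the function names (`reg_cost`, `reg_grad`, `numeric_grad`) and the sample values for the weights, lambda, and m are made up for illustration. It compares the analytic gradient from the derivation against a central finite-difference estimate of the regularization term.

```python
def reg_cost(w, lam, m):
    """Regularization term: (lambda / (2m)) * sum_k w_k^2."""
    return lam / (2 * m) * sum(wk * wk for wk in w)


def reg_grad(w, lam, m):
    """Analytic gradient from the derivation: component j is (lambda / m) * w_j."""
    return [lam / m * wj for wj in w]


def numeric_grad(f, w, eps=1e-6):
    """Central finite differences, perturbing one coordinate at a time."""
    grad = []
    for j in range(len(w)):
        w_plus = list(w)
        w_plus[j] += eps
        w_minus = list(w)
        w_minus[j] -= eps
        grad.append((f(w_plus) - f(w_minus)) / (2 * eps))
    return grad


# Hypothetical values, chosen only for the check (here n = 3).
w = [0.5, -1.2, 3.0]
lam, m = 0.7, 10

analytic = reg_grad(w, lam, m)
numeric = numeric_grad(lambda v: reg_cost(v, lam, m), w)

# Largest disagreement between the two gradients; should be tiny.
max_diff = max(abs(a - b) for a, b in zip(analytic, numeric))
print(max_diff)
```

Note that `reg_grad` never uses `len(w)`: each component depends only on its own weight, lambda, and m.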

Dividing by $m$ keeps the regularization gradient on a similar scale to the gradient of the loss term, which is itself an average over the $m$ training examples.
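To make the scaling concrete, here is the full gradient written out, assuming the usual averaged logistic loss. The symbols $f_{\vec{w},b}$, $x^{(i)}$, and $y^{(i)}$ for the model prediction, input, and label are my notation and do not appear in the derivation above:

$$
{\partial J \over \partial w_j} = {1 \over m} \sum_{i = 1}^m \left( f_{\vec{w},b}\big(x^{(i)}\big) - y^{(i)} \right) x_j^{(i)} + {\lambda \over m} w_j
$$

Both terms carry the same $1/m$ factor, and neither depends on $n$.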
