A doubt on derivative

Dear Mentor,

Could you please guide me on how to calculate this derivative?

From the lecture, the derivative of the regularization term is this:

From my understanding, the answer should be this:

Please correct me if there is any mistake.

Thank you

Hello,

Can you share the link to the video this doubt comes from, so we can look at it directly?

Regards
DP

Note that in the regularized cost definition, the norm of W is squared.
That is equivalent to the sum of the squares of the individual weight values.

Then you use the power rule (with exponent = 2) to compute the partial derivative with respect to each weight value.

Note that the cost is defined with a 2 in the denominator, so that it cancels the 2 in the numerator that comes from the power rule.
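The steps above can be written out for a single layer. This is only a sketch: the images from the thread are not available, so it assumes the regularization term has the form (λ/2m)·‖W[l]‖² with the squared Frobenius norm, where λ is the regularization strength and m is the number of examples. The indices i, j, k are introduced here purely for illustration.

```latex
% Assumed regularization term for layer l: squared norm = sum of squared entries
\frac{\lambda}{2m}\,\bigl\lVert W^{[l]} \bigr\rVert_F^2
  = \frac{\lambda}{2m} \sum_{i}\sum_{j} \bigl(w^{[l]}_{ij}\bigr)^2

% Power rule on one entry: only the term with that entry survives,
% and the 2 from the exponent cancels the 2 in the denominator
\frac{\partial}{\partial w^{[l]}_{kj}}
  \left[ \frac{\lambda}{2m} \sum_{i}\sum_{j} \bigl(w^{[l]}_{ij}\bigr)^2 \right]
  = \frac{\lambda}{2m} \cdot 2\, w^{[l]}_{kj}
  = \frac{\lambda}{m}\, w^{[l]}_{kj}

% Collecting all entries back into a matrix
\frac{\partial}{\partial W^{[l]}}
  \left[ \frac{\lambda}{2m}\,\bigl\lVert W^{[l]} \bigr\rVert_F^2 \right]
  = \frac{\lambda}{m}\, W^{[l]}
```

The summation disappears after differentiating because every other entry of the matrix is a constant with respect to the one entry being differentiated.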

dL/dW is the partial derivative of the loss function for each of the Xs. It is the rate of change of the loss function with respect to a change in the weights.

So when L2 regularisation is applied, the new definition of dW[l] is still a correct derivative of your cost function with respect to your parameters, even after the extra regularization term is added at the end. Because that extra term shrinks the weights slightly on every update, L2 regularisation is also called weight decay.

Here W(l) is not a summation, but the partial derivative of the cost function with respect to the weights.
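The weight decay remark above can be illustrated with a small NumPy sketch. It assumes the regularization gradient is (lam / m) * W and uses made-up values; the function names `l2_update` and `weight_decay_update`, and the parameters `alpha`, `lam`, `m`, are hypothetical and not taken from the course code.

```python
import numpy as np

def l2_update(W, dW_backprop, alpha, lam, m):
    """One gradient step with the assumed L2 term (lam / m) * W added to the gradient."""
    dW = dW_backprop + (lam / m) * W
    return W - alpha * dW

def weight_decay_update(W, dW_backprop, alpha, lam, m):
    """The same step, written as shrinking W first and then applying the plain gradient."""
    return (1 - alpha * lam / m) * W - alpha * dW_backprop

# Made-up weights and gradients, only to compare the two forms
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))
dW_backprop = rng.normal(size=(3, 4))

same = np.allclose(
    l2_update(W, dW_backprop, alpha=0.1, lam=0.7, m=50),
    weight_decay_update(W, dW_backprop, alpha=0.1, lam=0.7, m=50),
)
print(same)
```

Both functions compute the same update; the second form just makes the factor (1 - alpha * lam / m) that multiplies W on every step visible.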

Hi Ms Deepti_Prasad,

Time: 8:35 / 9:42

Hi Mr Tom Mosher,

I still can’t get the correct answer. Could you please guide me based on this attachment?

Hi Ms Deepti_Prasad,

May I know why W(l) here is not a summation? Could you please guide me based on this attachment?

Thank you

Jason

Your image only explains your doubt.

SORRY FOR THE BAD HANDWRITING :joy:

Update: I am not deleting this even though it is wrong, so that people who made the same assumption as me can see that it is incorrect :grinning:
Regards
DP

Cheers,
Raymond

OK Raymond, so I was wrong in my interpretation :frowning:

Unfortunately, this step is wrong. :wink:

image

The safest way is to consider a single element of the matrix, because that makes it a scalar-by-scalar differentiation. When it is scalar by scalar, everything we have learnt about differentiation will work.
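The element-by-element idea can also be checked numerically. The sketch below assumes the regularization term is (lam / (2 * m)) times the sum of squared entries of W, perturbs one entry at a time, and compares the result with the closed form (lam / m) * W. The names `reg_term` and `numeric_grad` and all the values are hypothetical, chosen only for illustration.

```python
import numpy as np

def reg_term(W, lam, m):
    """Assumed regularization term: (lam / (2 * m)) * sum of squared entries of W."""
    return lam / (2 * m) * np.sum(W ** 2)

def numeric_grad(W, lam, m, eps=1e-6):
    """Central-difference derivative of reg_term with respect to one entry at a time."""
    grad = np.zeros_like(W)
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            W_plus, W_minus = W.copy(), W.copy()
            W_plus[i, j] += eps   # nudge a single scalar entry up
            W_minus[i, j] -= eps  # and down, leaving all other entries fixed
            grad[i, j] = (reg_term(W_plus, lam, m) - reg_term(W_minus, lam, m)) / (2 * eps)
    return grad

# Made-up weights and hyperparameters
rng = np.random.default_rng(1)
W = rng.normal(size=(2, 3))
lam, m = 0.7, 50

analytic = (lam / m) * W            # closed form from the power rule
numeric = numeric_grad(W, lam, m)   # scalar-by-scalar numerical estimate
print(np.allclose(numeric, analytic, atol=1e-7))
```

Each pass through the inner loop is a scalar-by-scalar differentiation: only one entry changes, so the sum over all the other entries contributes nothing to that partial derivative.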

Thanks for correcting. Actually, in the regularisation video Professor Andrew only mentions that "it turns out" dJ/dW is (λ/m) W(l), and doesn’t explain how. So the question he raised is genuine.

Mr Raymond, Thank you so much for your guidance :grinning:

Ms Deepti_Prasad, Thanks for your reply and handwritten note.

Mr Tom Mosher, Thanks for your reply.