Dear Mentor,
Could you please guide me on how to calculate this derivative?
From the lecture, the derivative of the regularization term is this:
From my understanding, the answer should be this:
Please correct me if there is any mistake.
Thank you
Hello,
Can you share the link to the video where this doubt came up, so we can look at it directly?
Regards
DP
Note that in the regularized cost definition, the norm of W is squared.
That is equivalent to the sum of the squares of the individual weight values.
Then you use the power rule (with exponent = 2) to compute the partial derivative with respect to each weight value.
Note that the cost is defined with a 2 in the denominator, so that it cancels the 2 in the numerator that comes from the power rule.
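To see the power-rule argument numerically, here is a minimal NumPy sketch (my own illustration, not course code) that compares a finite-difference gradient of the regularization term (λ/2m)·Σ‖W[l]‖² against the closed form (λ/m)·W[l]. The function names `l2_cost`, `l2_grad` and `numerical_grad`, the variable `lambd`, and the random layer shapes are hypothetical choices made only for this check.

```python
import numpy as np

def l2_cost(weights, lambd, m):
    """Regularization term: (lambd / (2m)) * sum of squared Frobenius norms of all W[l]."""
    return (lambd / (2 * m)) * sum(np.sum(W ** 2) for W in weights)

def l2_grad(W, lambd, m):
    """Closed-form gradient of the regularization term with respect to one W[l]."""
    return (lambd / m) * W

def numerical_grad(weights, l, lambd, m, eps=1e-6):
    """Central-difference estimate of d(l2_cost)/dW[l], perturbing one element at a time."""
    W = weights[l]  # same array object as weights[l], so edits below change the cost
    grad = np.zeros_like(W)
    for idx in np.ndindex(W.shape):
        original = W[idx]
        W[idx] = original + eps
        plus = l2_cost(weights, lambd, m)
        W[idx] = original - eps
        minus = l2_cost(weights, lambd, m)
        W[idx] = original  # restore the element before moving on
        grad[idx] = (plus - minus) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
weights = [rng.standard_normal((4, 3)), rng.standard_normal((2, 4))]  # hypothetical layer shapes
lambd, m = 0.7, 10

for l, W in enumerate(weights):
    approx = numerical_grad(weights, l, lambd, m)
    exact = l2_grad(W, lambd, m)
    print(l, np.max(np.abs(approx - exact)))
```

If the power-rule derivation is right, the printed differences should be tiny, on the order of floating-point noise.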
dL/dW is the partial derivative of the loss function for each of the Xs. It is the rate of change of the loss function with respect to a change in the weight.
So when L2 regularisation is applied, this new definition of dW[l] is still a correct derivative of your cost function with respect to your parameters, even after adding the extra regularization term at the end. Because the gradient descent update W[l] := W[l] − α·dW[l] then multiplies W[l] by (1 − αλ/m) on every step, L2 regularisation is also called weight decay.
Here W(l) is not a summation but the partial derivative of the cost function with respect to the weight.
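A minimal sketch of why the "weight decay" name fits, assuming plain gradient descent with learning rate α (`alpha`): adding (λ/m)·W[l] to the backprop gradient gives the same update as first shrinking W[l] by the factor (1 − αλ/m) and then taking the usual step. `dW_backprop` is a random stand-in for the gradient of the unregularized cost, and both function names are hypothetical.

```python
import numpy as np

def update_with_l2(W, dW_backprop, alpha, lambd, m):
    """Gradient step using the regularized gradient dW = dW_backprop + (lambd / m) * W."""
    dW = dW_backprop + (lambd / m) * W
    return W - alpha * dW

def update_as_weight_decay(W, dW_backprop, alpha, lambd, m):
    """Same step rewritten: shrink W by (1 - alpha * lambd / m), then the usual step."""
    decay = 1 - alpha * lambd / m
    return decay * W - alpha * dW_backprop

rng = np.random.default_rng(1)
W = rng.standard_normal((3, 5))
dW_backprop = rng.standard_normal((3, 5))  # stand-in for the unregularized backprop gradient
alpha, lambd, m = 0.1, 0.7, 10

a = update_with_l2(W, dW_backprop, alpha, lambd, m)
b = update_as_weight_decay(W, dW_backprop, alpha, lambd, m)
print(np.allclose(a, b))  # prints True: the two forms agree up to floating-point rounding
```

With these example values the decay factor is 1 − 0.1·0.7/10 = 0.993, so each step shrinks the weights by 0.7% before the backprop update is applied.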
Hi Ms Deepti_Prasad,
The relevant part is at 8:35 (of 9:42) in the video.
Hi Mr Tom Mosher,
I still can’t get the correct answer. Could you please guide me based on this attachment?
Hi Ms Deepti_Prasad,
May I know why W(l) here is not a summation? Could you please guide me based on this attachment?
Thank you
Jason
Your image itself explains your doubt.
SORRY FOR THE BAD HANDWRITING
Update: I am not deleting this even though it is wrong, so that people who made the same assumption as me can see that it is incorrect.
Regards
DP
OK Raymond, so I was wrong in my interpretation.
Unfortunately, this step is wrong.
The safest way is to consider a single element of the matrix, because that turns the problem into scalar-by-scalar differentiation. When it is scalar by scalar, everything we have learnt about differentiation will work.
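Following that element-wise approach, a sketch of the derivation, writing w_{ij}^{[l]} for the entry in row i, column j of W^{[l]} and n^{[l]} for the number of units in layer l (my own notation choices):

```latex
J_{\text{reg}} = \frac{\lambda}{2m}\sum_{r=1}^{L}\bigl\lVert W^{[r]}\bigr\rVert_F^2
  = \frac{\lambda}{2m}\sum_{r=1}^{L}\sum_{p=1}^{n^{[r]}}\sum_{q=1}^{n^{[r-1]}}\bigl(w_{pq}^{[r]}\bigr)^2

\frac{\partial J_{\text{reg}}}{\partial w_{ij}^{[l]}}
  = \frac{\lambda}{2m}\cdot 2\,w_{ij}^{[l]}
  = \frac{\lambda}{m}\,w_{ij}^{[l]}

\frac{\partial J_{\text{reg}}}{\partial W^{[l]}} = \frac{\lambda}{m}\,W^{[l]}
```

Only the single term with r = l, p = i, q = j depends on w_{ij}^{[l]}; every other term in the triple sum differentiates to zero. Stacking the results for all i and j gives the matrix form in the last line, which is the extra term added to dW[l].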
Thanks for correcting me. Actually, in the regularisation video Professor Andrew only mentions that it turns out dJ/dW[l] gains the extra term (λ/m)·W[l], and doesn't explain how. So the question raised by him is genuine.
Mr Raymond, thank you so much for your guidance.
Ms Deepti_Prasad, Thanks for your reply and handwritten note.
Mr Tom Mosher, Thanks for your reply.