Having trouble understanding why we subtract dw1 from w1

From the videos, dw1 = X . dz.
As per the intuition explained in the videos, dw1 tells us how much the loss function will vary if we nudge w1 by a tiny amount, say 0.001.
For example, if w1 is 3 and the loss is 6, and nudging w1 to 3.001 makes the loss 6.002, then we say dw1 = 2. But I am not able to see why we subtract this value from w1. After all, dw1 is the slope of the loss function w.r.t. w1, i.e. how many times faster the loss changes than w1; it says nothing about how much w1 itself should change.
So why do we use this value to change w1?
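The "nudge" intuition above can be checked numerically with a finite difference. This is a minimal sketch using a made-up quadratic loss (not one from the course), chosen so that the numbers match the example: at w1 = 3 the loss is 6, and the slope there is about 2.

```python
# Numerical check of the "nudge" intuition: dw1 is the slope of the
# loss with respect to w1, estimated with a tiny finite difference.
# This quadratic loss is a hypothetical stand-in, not the course's loss.
def loss(w1):
    return (w1 - 2.0) ** 2 + 5.0  # loss(3) = 6, slope at 3 is ~2

w1 = 3.0
eps = 0.001
dw1 = (loss(w1 + eps) - loss(w1)) / eps  # nudge w1, see how loss moves
print(dw1)  # close to 2, matching the example above
```

Because the slope is positive, increasing w1 increases the loss, which is exactly why the update moves w1 in the opposite direction.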

Hi Sandeep,

Maybe slides 10 and 11 from week 2 will help you understand this.

w1 is the current weight, and the goal is to find the ideal set of weights for prediction based on the training samples. After one iteration of calculating y_hat, dz, and dw1, we know how we need to change w1. The key point is that dw1 is the slope of the cost: if dw1 is positive, increasing w1 increases the cost, so we should decrease w1; if dw1 is negative, we should increase w1. Subtracting alpha*dw1 does both in one rule, always stepping downhill. So we set the new value of w1 for the next iteration of gradient descent:
w1_new = w1 - alpha*dw1. Here alpha (the learning rate) is another parameter that controls the size of each gradient step. Once we have calculated the new w1, we start the next iteration of gradient descent to find the next w1, until the total cost function is minimised, i.e. reaches a global minimum, assuming it has one (the logistic regression cost is convex, so it does).
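Putting the pieces together, here is a minimal sketch of the whole loop for one-feature logistic regression, using the update rule w1_new = w1 - alpha*dw1. The data, alpha, and iteration count are made-up values for illustration, not from the course slides.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical tiny training set: one feature, binary labels.
x = np.array([0.5, 1.5, -1.0, 2.0])
y = np.array([0.0, 1.0, 0.0, 1.0])
m = x.shape[0]

w1, b = 0.0, 0.0
alpha = 0.1                      # learning rate, controls the step size

for _ in range(1000):            # each pass is one gradient descent step
    y_hat = sigmoid(w1 * x + b)  # forward pass: current predictions
    dz = y_hat - y               # derivative of the loss w.r.t. z
    dw1 = np.dot(x, dz) / m      # slope of the cost w.r.t. w1
    db = np.sum(dz) / m
    w1 = w1 - alpha * dw1        # step against the slope: downhill
    b = b - alpha * db
```

After enough iterations, w1 and b settle where the cost is (near) minimal; each subtraction of alpha*dw1 is one small step down the cost surface.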

I hope this, along with the slides, helps.

If I misunderstood the question or my explanation is not clear, let me know.