Hello, I was watching the lecture on applying gradient descent to multiple training examples, but I didn’t really understand the whole process.
The left part of this slide was understandable, but the right one is not clear enough. How do we update w1, w2 and b? Do we update the parameters in a separate loop?
Thanks in advance.
What is happening here is that after you go through the m examples (one epoch, i.e. one pass over your training set), you normalize (average) the accumulated partial derivatives for the weights (w) and the intercept (b). Then, in another loop going through all the weights from 1 to n (plus the intercept b), you subtract learning_rate * (the averaged partial derivative you computed in the loop on the left) from each current value. Then you can restart the whole process once again, going for another iteration!
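As a minimal sketch in Python of the "loop on the left" part for logistic regression with two features (as on the slide): the function name, the toy data, and the use of `math.exp` are my own choices, not from the course code.

```python
import math

def sigmoid(z):
    # Logistic activation: a = sigma(w1*x1 + w2*x2 + b)
    return 1.0 / (1.0 + math.exp(-z))

def one_epoch_gradients(X, y, w1, w2, b):
    # One pass over the m examples: accumulate the cost and the
    # partial derivatives, then normalize (average) them at the end.
    m = len(X)
    dw1 = dw2 = db = 0.0
    J = 0.0
    for i in range(m):
        x1, x2 = X[i]
        a = sigmoid(w1 * x1 + w2 * x2 + b)
        # Cross-entropy loss for this example
        J += -(y[i] * math.log(a) + (1 - y[i]) * math.log(1 - a))
        dz = a - y[i]        # dL/dz for the logistic loss
        dw1 += x1 * dz       # accumulate dL/dw1 over the examples
        dw2 += x2 * dz
        db += dz
    # Average over the m examples before any parameter update
    return J / m, dw1 / m, dw2 / m, db / m
```

Only after this function returns do you apply the updates w1 = w1 - alpha * dw1, etc., once per pass.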
Thanks for replying.
I understood everything you said except one thing: do you mean we update the parameters in the same 1-to-n loop?
In the picture, the updates for w1, w2 and b:
w1 = w1 - alpha * dw1
w2 = w2 - alpha * dw2
b = b - alpha * db
are outside both loops on the left, so I didn’t really know how and where to implement them.
Thank you.
The left side shows the loop that you go through to compute all the gradients as averages over the gradients on each sample.
Then once you complete that computation, you apply the gradients once, as shown on the right side of the slide.
Then, as Gent described, you repeat that whole two-step process for as many iterations as you need in order to get good convergence on the weight and bias values.
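Putting the two steps together, the outer training loop might be sketched like this in Python (names like `train` and `num_iterations` are my own; the slide uses two explicit weights w1, w2, which I keep here for clarity):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(X, y, alpha=0.1, num_iterations=1000):
    # X: list of (x1, x2) pairs, y: list of 0/1 labels
    m = len(X)
    w1 = w2 = b = 0.0
    for _ in range(num_iterations):      # each iteration = one pass over the set
        dw1 = dw2 = db = 0.0
        for i in range(m):               # step 1: accumulate gradients (left side)
            x1, x2 = X[i]
            a = sigmoid(w1 * x1 + w2 * x2 + b)
            dz = a - y[i]
            dw1 += x1 * dz
            dw2 += x2 * dz
            db += dz
        dw1, dw2, db = dw1 / m, dw2 / m, db / m   # average over m examples
        # step 2: apply the averaged gradients once per pass (right side)
        w1 -= alpha * dw1
        w2 -= alpha * dw2
        b  -= alpha * db
    return w1, w2, b
```

Note the three update lines sit inside the iteration loop but outside the loop over the m examples, which is exactly the placement the slide's right side describes.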