Gradient descent for logistic regression on m examples

In the lecture Gradient Descent on m Examples, we calculate the derivatives for each example and then average them, so I wanted to understand what role each individual example plays.
I tried to understand it in the following way: we are trying to optimise a function L = f(w1, w2, b), where each input x^(i) acts as a parameter that slightly modifies the surface of L in the (w, b) space. For each input we therefore get a slightly different set of derivatives of L with respect to w and b. Finally, we average all the derivatives so we can apply gradient descent and obtain a new operating point (w, b), at which we compute the error and the derivatives again, and we repeat this until we arrive at the global optimum.
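
Written out in the notation from the lecture (cost J as the average of the per-example losses L), the averaging I mean is:

```latex
J(w,b) = \frac{1}{m} \sum_{i=1}^{m} \mathcal{L}\big(a^{(i)}, y^{(i)}\big),
\qquad a^{(i)} = \sigma\big(w^{\top} x^{(i)} + b\big)

\frac{\partial J}{\partial w_j} = \frac{1}{m} \sum_{i=1}^{m} \big(a^{(i)} - y^{(i)}\big)\, x_j^{(i)},
\qquad
\frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^{m} \big(a^{(i)} - y^{(i)}\big)
```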

Would this be a correct interpretation of this process?


Yes, that's the general idea of the process!
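
If it helps, here is a minimal NumPy sketch of one such step; the function name gradient_descent_step and the variable shapes are my own choices for illustration, not code from the course:

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent_step(w, b, X, Y, alpha):
    """One batch gradient-descent step for logistic regression.

    X has shape (n_features, m): one column per example.
    Y has shape (1, m): labels in {0, 1}.
    Every example contributes its own derivative; dw and db are the
    averages of those per-example derivatives over all m examples.
    """
    m = X.shape[1]
    A = sigmoid(np.dot(w.T, X) + b)   # predictions for all m examples, shape (1, m)
    dZ = A - Y                        # per-example derivative dL/dz
    dw = np.dot(X, dZ.T) / m          # average of the per-example dL/dw, shape (n_features, 1)
    db = np.sum(dZ) / m               # average of the per-example dL/db
    w = w - alpha * dw                # move to the new operating point (w, b)
    b = b - alpha * db
    return w, b
```

Repeating this step recomputes the predictions, the per-example derivatives, and their averages at the new (w, b), which is exactly the loop you describe.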
