In the lecture *Gradient Descent on m Examples*, since we calculate the derivatives for each example and then average them, I wanted to understand: what role does each example play?

I tried to understand it in the following way: we are trying to optimise a function `L = f(w1, w2, b)`

where each input `x^(i)` acts as a parameter that slightly modifies the surface of L in the (w, b) space. Then, for each input, we get a slightly different set of derivatives of L with respect to w and b. Finally, we average all those derivatives so we can apply gradient descent and obtain a new operating point (w, b), at which we recompute the error and the derivatives, repeating until we arrive at the global optimum.
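In code, the way I picture this step would be something like the toy sketch below: a logistic-regression example with two weights, where the data `X`, labels `y`, and learning rate `alpha` are made up by me for illustration, not taken from the lecture.

```python
import math

# Made-up, linearly separable toy data (my own assumption, not the course's).
X = [(1.0, 0.5), (2.0, -0.5), (-1.0, 0.3), (-2.0, -0.2)]
y = [1, 1, 0, 0]
w1, w2, b = 0.0, 0.0, 0.0
alpha = 0.1       # learning rate (arbitrary choice)
m = len(X)

for step in range(1000):
    dw1 = dw2 = db = 0.0
    for (x1, x2), t in zip(X, y):
        # Forward pass for this single example x^(i).
        a = 1.0 / (1.0 + math.exp(-(w1 * x1 + w2 * x2 + b)))
        dz = a - t                    # this example's dL/dz
        dw1 += dz * x1                # accumulate each example's contribution
        dw2 += dz * x2
        db += dz
    # Average the per-example derivatives, then take one GD step.
    dw1, dw2, db = dw1 / m, dw2 / m, db / m
    w1 -= alpha * dw1
    w2 -= alpha * dw2
    b -= alpha * db
```

So each example contributes its own derivative, and the averaged derivative is what actually moves (w1, w2, b).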

Would this be a correct interpretation of this process?