Gradients using more than one point (row) of data

In video 3 of week 3 we get the formulas for the gradients of the loss function w.r.t. w1, w2, and b. For w1 and w2 we see that there is a multiplication with the inputs x1 and x2 respectively.
In real life we would calculate the cost function taking into account more than one row of data (a batch, or all of them).
How would the calculation of the gradients change then? Instead of x1, would we have the sum of x1 over all rows?

Hello @gkouro

I have yet to watch the video, but I also think we should use a batch of multiple rows of data rather than just a single row. This blog post gives a good overview: https://machinelearningmastery.com/gentle-introduction-mini-batch-gradient-descent-configure-batch-size/
Please have a look at it; it also links to papers with detailed mathematics on the gradient calculations.
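
To your specific question: with a batch, the gradient for w1 is not the sum of x1 alone, but the average of (per-row error × x1) over the rows. Here is a minimal NumPy sketch assuming a logistic-regression loss as in the course (the toy data, variable names, and sigmoid model are my own assumptions, not from the video):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical toy batch: m rows, 2 features (x1, x2)
X = np.array([[0.5, 1.5],
              [1.0, 1.0],
              [1.5, 0.5],
              [3.0, 0.5]])
y = np.array([0.0, 0.0, 1.0, 1.0])
w = np.array([1.0, 1.0])   # [w1, w2]
b = -3.0

m = X.shape[0]
err = sigmoid(X @ w + b) - y   # per-row error (f(x_i) - y_i)

# Each feature's gradient is the average of err_i * x_i over the batch,
# not the sum of x_i by itself.
dj_dw = (X.T @ err) / m
dj_db = err.mean()
```

So the single-row term x1 becomes (1/m) * sum over i of err_i * x1_i; the error of each row weights that row's input before averaging.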

Happy Learning
Isaak