[Week 1] How are the weights updated in backpropagation through time?

Thank you, Paul, for your answer covering many SGD aspects and drawing the big picture.

If you apply them one at a time (without repeatedly recomputing the loss for earlier time steps), you will not change the loss function itself, only the coordinate at which you evaluate the gradients, right? E.g., if you immediately update the weights based on the gradient of $\mathcal{L}^{T_y}$, won't you then take the gradient of $\mathcal{L}^{T_y - 1}$ at a slightly different coordinate?
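To make my question concrete, here is a minimal sketch (my own toy example, not from the course) with a single scalar weight and hypothetical quadratic per-time-step losses $\mathcal{L}^{t}(w) = \tfrac{1}{2}(w - t)^2$. Accumulating all per-time-step gradients at the same coordinate before one update gives a different result than updating immediately after each gradient, because the second gradient in the sequential variant is evaluated at a shifted coordinate:

```python
import numpy as np

# Toy per-time-step loss L_t(w) = 0.5 * (w - t)^2, so dL_t/dw = w - t.
def grad(w, t):
    return w - t

w0, lr = 0.0, 0.1
timesteps = [2, 1]  # gradients taken from t = T_y down to t = T_y - 1

# Variant A: accumulate all gradients at the same coordinate w0, then update once.
g_sum = sum(grad(w0, t) for t in timesteps)
w_batch = w0 - lr * g_sum

# Variant B: update the weight immediately after each per-time-step gradient.
# The gradient for the earlier time step is now taken at a shifted coordinate.
w_seq = w0
for t in timesteps:
    w_seq -= lr * grad(w_seq, t)

print(w_batch, w_seq)  # 0.3 vs. 0.28 -- same losses, different final weights
```

The two variants disagree (0.3 vs. 0.28 here), which is exactly the discrepancy I am asking about: immediate updates change the coordinate where the remaining gradients are taken.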