Hello,

I have a quick question about the difference between vectorized and non-vectorized implementations of forward prop/back prop and gradient descent.

Please correct me if I’m wrong:

- with a non-vectorized implementation, the parameters are initialised and then updated after each training example,
- whereas with a vectorized implementation, the parameters are still initialised but are updated only once per iteration, using the whole training set.

If that is correct, and if we suppose that both implementations take the same computation time (which is false, but I assume it just to make my point clear): could we say that the non-vectorized implementation will converge “faster” after, let’s say, 5 iterations, because the parameters are updated much more often?
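
To make the question concrete, here is a minimal sketch of the two update schedules as I understand them. It assumes a one-parameter linear regression with squared-error loss; the function names, the data, and the learning rate are all made up for illustration and are not taken from any course code.

```python
import numpy as np

def loss(w, X, y):
    # Halved mean squared error over the whole training set.
    return float(np.mean((X * w - y) ** 2) / 2)

def per_example_updates(X, y, lr=0.1, passes=5):
    # Loop version as I described it: w is updated after every single example.
    w = 0.0
    for _ in range(passes):
        for x_i, y_i in zip(X, y):
            grad = (w * x_i - y_i) * x_i  # gradient from one example only
            w -= lr * grad
    return w

def whole_set_updates(X, y, lr=0.1, passes=5):
    # Vectorized version: one gradient averaged over all examples,
    # hence a single update of w per pass over the training set.
    w = 0.0
    for _ in range(passes):
        grad = np.mean((X * w - y) * X)
        w -= lr * grad
    return w

X = np.linspace(0.0, 1.0, 100)  # made-up inputs
y = 2.0 * X                     # made-up targets, so the ideal w is 2

w_loop = per_example_updates(X, y)
w_vec = whole_set_updates(X, y)
print("loss with per-example updates:", loss(w_loop, X, y))
print("loss with whole-set updates:  ", loss(w_vec, X, y))
```

On this toy problem, the per-example version ends up with the lower loss after 5 passes, which is what prompts my question.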

Or am I completely missing something here?

Thanks for your help!