Updating parameters, vectorized VS non vectorized

I have a quick question about the differences between the vectorized and non-vectorized implementations of forward prop/back prop and gradient descent.
Please correct me if I'm wrong:

  • using the non-vectorized implementation, parameters are initialised and then updated for each training example,
  • whereas using the vectorized implementation, parameters are still initialised but get updated only once per pass over the whole training set.

If that is correct, and if we suppose that both implementations take the same computation time (which is false, but just to make things clear): could we say that the non-vectorized implementation will converge "faster" after, let's say, 5 iterations, because the parameters are updated much more often?

Am I completely missing something here?
Thanks for your help!

Hi, @Nosical. The term vectorization as used here refers to operating on all training examples simultaneously, rather than on each example individually. In the latter case, loops are required for many of the operations, and loops are computationally inefficient (i.e., slow).
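To make that concrete, here is a minimal NumPy sketch comparing the two. It uses logistic regression as an assumed example (the thread doesn't specify a model): both versions compute the gradient over all `m` examples before updating, so they produce the *same* update, and the only difference is that the loop version does it one example at a time.

```python
import numpy as np

np.random.seed(0)
m, n = 100, 3                       # m training examples, n features (assumed sizes)
X = np.random.randn(n, m)           # one example per column
y = (np.random.rand(1, m) > 0.5).astype(float)
w = np.zeros((n, 1))
b = 0.0

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Non-vectorized: loop over examples, accumulating the gradient
dw_loop = np.zeros((n, 1))
db_loop = 0.0
for i in range(m):
    x_i = X[:, i:i + 1]                       # i-th example, shape (n, 1)
    a_i = sigmoid(w.T @ x_i + b)              # forward prop for one example
    dz_i = float(a_i[0, 0] - y[0, i])         # backprop: error term
    dw_loop += x_i * dz_i
    db_loop += dz_i
dw_loop /= m
db_loop /= m

# Vectorized: the same computation as one pass of matrix operations
A = sigmoid(w.T @ X + b)                      # forward prop for all m examples
dZ = A - y                                    # backprop for all m examples
dw_vec = (X @ dZ.T) / m
db_vec = float(np.sum(dZ)) / m

# Both compute the identical batch gradient
assert np.allclose(dw_loop, dw_vec)
assert np.isclose(db_loop, db_vec)
```

So vectorization changes *how* the gradient is computed, not how often the parameters are updated; updating after every single example instead would be stochastic gradient descent, which is a separate design choice.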