Clarification about weights in mini-batch processing

I'd like to confirm I have this right with respect to how weights are updated in mini-batch processing:

  • prior to processing a mini-batch, all weights start at the same values (shared across all the samples in the batch)
  • after all samples in the mini-batch are processed and the loss is computed, the weights are adjusted
  • for the next mini-batch, the weights for all samples again start from the same point, now with the new values

The weights are independent of the samples: the dimensions of W^{[l]} and b^{[l]} are determined only by the number of input features and the number of neurons in each layer, right?
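
For example, this quick NumPy check (a minimal sketch; the layer sizes here are just made up) shows the parameter shapes coming only from the layer widths, with the same W and b applied to a batch of any size:

```python
import numpy as np

# Hypothetical layer sizes (not from the thread): 4 input features,
# a hidden layer of 3 units, and 1 output unit.
layer_sizes = [4, 3, 1]

# Parameter shapes depend only on consecutive layer widths, never on batch size.
params = {}
for l in range(1, len(layer_sizes)):
    params[f"W{l}"] = np.random.randn(layer_sizes[l], layer_sizes[l - 1]) * 0.01
    params[f"b{l}"] = np.zeros((layer_sizes[l], 1))

# The same W and b process a mini-batch of 8 samples or 256 samples alike;
# only the batch dimension of the activations changes.
for m in (8, 256):
    X = np.random.randn(layer_sizes[0], m)    # shape (n_features, batch_size)
    Z1 = params["W1"] @ X + params["b1"]      # (3, m) via broadcasting
    A1 = np.tanh(Z1)
    Z2 = params["W2"] @ A1 + params["b2"]     # (1, m)
    print(params["W1"].shape, params["b1"].shape, Z2.shape)
```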

The key point about mini-batch gradient descent is that all of the weights are updated after each mini-batch. That is as opposed to “full batch” gradient descent, in which the weights get updated only after processing the entire set of training samples. The terminology is that one “epoch” means one training pass through all the training samples, so in “full batch” the weights get updated once per epoch. In the mini-batch case, one epoch means iterating over all of the mini-batches that together form the full set of training samples, so the weights get updated multiple times per epoch.

That is the primary advantage of mini-batch: when it works well, you can reach the same level of convergence with fewer total epochs of training. Of course, as with everything here, there is no guarantee that always happens, since you can also have more statistical noise in the gradients in the mini-batch case, particularly if the mini-batch size is very small. So you may also need to apply momentum or other techniques to mitigate that.
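
To make the update schedule concrete, here is a minimal NumPy sketch (the linear model, loss, and hyperparameters are placeholders, not anything specific from the course). The weights are updated once per mini-batch, so multiple times per epoch; setting batch_size = m would recover the “full batch” behavior of one update per epoch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear-regression problem (placeholder data just to drive the loop).
m, n_features = 1024, 5
X = rng.normal(size=(m, n_features))
y = X @ rng.normal(size=n_features) + 0.1 * rng.normal(size=m)

W = np.zeros(n_features)                   # one shared weight vector for all samples
lr, batch_size, n_epochs = 0.05, 64, 5

for epoch in range(n_epochs):
    perm = rng.permutation(m)              # reshuffle the samples each epoch
    for start in range(0, m, batch_size):
        idx = perm[start:start + batch_size]
        Xb, yb = X[idx], y[idx]
        err = Xb @ W - yb                  # residuals on this mini-batch only
        grad = 2.0 / len(idx) * Xb.T @ err # MSE gradient from this mini-batch
        W -= lr * grad                     # weights updated after EVERY mini-batch
    print(f"epoch {epoch}: full-set MSE = {np.mean((X @ W - y) ** 2):.4f}")
```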

Thanks very much - helps a lot.