Different meaning of "mini-batch" used in the Training Large Models video

Training Large Models - The Rise of Giant Neural Nets and Parallelism:

Data parallelism means the batch is divided into mini-batches, and each worker is given an identical copy of the model.

After each worker completes its forward and backward pass, the weight update for the full batch is computed by synchronizing the per-worker results (e.g. the gradients, or the updated model weights) across workers.
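The scheme described above can be sketched in a few lines. This is a hypothetical single-process simulation (no real distributed communication): a batch is split into per-worker shards, each "worker" computes a gradient on its shard with the same model weights, and averaging the shard gradients recovers the full-batch gradient before the shared update.

```python
import numpy as np

# Hypothetical sketch of data parallelism, simulated in one process:
# the batch is split into equal shards, one per worker; every worker
# holds an identical copy of the weights and computes the gradient on
# its own shard; averaging the shard gradients (the synchronization
# step) reproduces the full-batch gradient.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))      # batch of 8 examples, 3 features
y = rng.normal(size=8)
w = np.zeros(3)                  # model weights, replicated on every worker

def grad(Xs, ys, w):
    """Mean-squared-error gradient on one shard."""
    return 2 * Xs.T @ (Xs @ w - ys) / len(ys)

# Split the batch into 4 equal shards, one per worker
shards = zip(np.array_split(X, 4), np.array_split(y, 4))
worker_grads = [grad(Xs, ys, w) for Xs, ys in shards]

# Synchronization: average the shard gradients, then apply the same
# update on every replica so all copies of the model stay identical
avg_grad = np.mean(worker_grads, axis=0)
full_grad = grad(X, y, w)        # reference: gradient on the whole batch
assert np.allclose(avg_grad, full_grad)
w -= 0.1 * avg_grad
```

Note that averaging shard gradients matches the full-batch gradient only when the shards are the same size, which is why frameworks pad or drop remainder examples.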

Here, the batch itself is referred to as a mini-batch, which is confusing. Since "mini-batch" is already used for the batch in the data-parallelism context, please use a different term for the chunks of the mini-batch that are distributed to the workers.


I agree that these terms can be confusing for people encountering them for the first time. At the same time, it is worth adopting exactly the terminology that experienced practitioners and researchers use; otherwise you will get confused later as you progress. The standard terms are batch, mini-batch, and so on, so honestly I don't think what was done in the video is bad. You will also come across many conversations of this kind: distributed training is, in some ways, an advanced area.