Week 2 final lab

Why is it always batch number 157 that gets executed, and not the other batches? Shouldn't 40 out of the 157 batches be randomly selected during the 40 runs?

Hi @flyunicorn,

Each epoch completes after all 157 batches have been processed. The number displayed corresponds to the final batch.
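If this is the usual Keras-style progress counter (an assumption on my part), it runs 1/157, 2/157, ..., 157/157 within each epoch, so the last number you see is simply the total batch count. That count is the training-set size divided by the batch size, rounded up. A minimal sketch of that arithmetic; the dataset size and batch size below are illustrative, not values taken from the lab:

```python
import math

num_examples = 5000   # assumed training-set size, not from the lab
batch_size = 32       # assumed batch size

# One epoch visits every batch once, so the progress counter
# always ends at the total number of batches per epoch.
batches_per_epoch = math.ceil(num_examples / batch_size)
print(batches_per_epoch)  # 157 with these assumed numbers
```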


Then what is the benefit of splitting the set into 157 parts if all examples will be used in one epoch anyway?

In full-batch gradient descent, the model processes the entire training set before updating the weights, resulting in accurate but slow updates and high memory usage. In contrast, mini-batch gradient descent splits the data into smaller batches (157 in this lab), allowing the model to update weights more frequently (once per batch) using less memory and making faster progress. Each mini-batch gives a slightly noisy estimate of the true gradient, which can act as a form of regularization.
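Here is a minimal NumPy sketch of the contrast for plain linear regression; every shape, learning rate, and variable name below is illustrative and not taken from the lab. With these assumed shapes the inner loop performs 157 weight updates per epoch, which is what the progress counter reflects:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 10))                 # toy data, not the lab's dataset
y = X @ rng.normal(size=10) + 0.1 * rng.normal(size=5000)

w = np.zeros(10)
lr, batch_size = 0.01, 32
n_batches = int(np.ceil(len(X) / batch_size))   # 157 with these assumed sizes

for epoch in range(3):
    idx = rng.permutation(len(X))               # reshuffle examples each epoch
    for b in range(n_batches):                  # one weight update per mini-batch
        batch = idx[b * batch_size:(b + 1) * batch_size]
        Xb, yb = X[batch], y[batch]
        grad = 2 * Xb.T @ (Xb @ w - yb) / len(Xb)   # noisy gradient estimate
        w -= lr * grad

# Full-batch gradient descent would instead compute one gradient over all
# 5000 rows and update w only once per epoch.
```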