In the example that Andrew showed in the course, a training dataset (m = 5,000,000) can be partitioned into 5,000 mini-batches, each containing 1,000 examples.
I’m curious whether we can allow sampling with replacement when constructing mini-batches. For example, could we randomly select 1,000 examples from the m training examples for each batch and repeat that process, say, 10,000 times to get 10,000 mini-batches?
Intuitively, it looks like bootstrapping.
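Just to make the idea concrete, here is a rough NumPy sketch of what I mean (not code from the assignment; the function name and the column-wise data layout are my own assumptions):

```python
import numpy as np

def minibatches_with_replacement(X, Y, batch_size=1000, num_batches=10000, seed=0):
    """Yield mini-batches drawn by sampling `batch_size` examples
    uniformly *with replacement* from the full training set."""
    rng = np.random.default_rng(seed)
    m = X.shape[1]  # examples stored as columns, following the course convention
    for _ in range(num_batches):
        idx = rng.integers(0, m, size=batch_size)  # indices drawn with replacement
        yield X[:, idx], Y[:, idx]
```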
The normal process (which we will see in the programming exercise in W2) is that you randomly shuffle the full dataset on each “epoch” before creating the minibatches. So in the example of m = 5,000,000 and batch size = 1000, you’d have 5,000 minibatches in each epoch, but the contents of each individual minibatch will be different in each epoch. The intent is to smooth out the statistical behavior.
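In code, the idea looks roughly like this (a simplified sketch in the spirit of the assignment’s shuffling helper, not its exact implementation):

```python
import numpy as np

def random_minibatches(X, Y, batch_size=1000, seed=0):
    """Shuffle the whole dataset once for this epoch, then partition it into
    consecutive mini-batches, so every example appears exactly once per epoch."""
    rng = np.random.default_rng(seed)
    m = X.shape[1]
    perm = rng.permutation(m)                 # fresh shuffle for this epoch
    X_shuf, Y_shuf = X[:, perm], Y[:, perm]
    return [
        (X_shuf[:, k:k + batch_size], Y_shuf[:, k:k + batch_size])
        for k in range(0, m, batch_size)      # the last batch may be smaller
    ]
```

Calling this with a different seed (or no seed) on each epoch is what makes the contents of each individual minibatch differ from epoch to epoch.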
Note that this is not exactly what I think you are proposing. In your scheme some of the data is duplicated within each epoch, and it’s not clear whether that is a good thing or not. If you have 10,000 minibatches in an “epoch”, that effectively changes the definition of an epoch: you are processing twice as many samples per pass. Perhaps it all comes out in the wash, and with your scheme you’d end up needing half as many of these “double epochs” to reach the same level of convergence you’d get with Prof Ng’s definition. But if I’m understanding your proposal correctly, I think the total cost in terms of wall clock time and CPU/GPU time would be essentially the same in both cases, or very close to it.
@paulinpaloalto made an important observation about the difference between the standard approach and the alternative scheme you proposed. While your approach is unconventional, experimenting with it could give some insight into how different mini-batch strategies affect training dynamics. Just keep an eye on the potential drawbacks, in particular the sampling bias within each “epoch” (some examples get repeated while others are skipped) and any effect that may have on overfitting.
@paulinpaloalto Thank you for your reply, I think you are right.
So I tried this mini-batch sampling scheme by revising the code in the W2 assignment.
The cost curve does seem to be more jagged, but there is basically no difference in accuracy or computation time.
But I’m also wondering: why are the result figures in sections 6.1, 6.2, and 6.3 of W2 so smooth? I didn’t expect that. Andrew said in the course that mini-batch gradient descent makes the cost curve oscillate more than batch gradient descent does, right?
The answer is to look closely at how those graphs are plotted in the assignment. Notice that they only show the cost at the end of every 100 full epochs, right? That smooths out all the statistical noise that you see on each individual minibatch.
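Here is a toy illustration of the effect (purely synthetic numbers, just to show how recording one averaged value every 100 epochs hides the per-minibatch noise):

```python
import numpy as np

rng = np.random.default_rng(0)
num_epochs, batches_per_epoch = 10000, 5

per_batch_costs = []   # one value per mini-batch update: very jagged
plotted_costs = []     # what an assignment-style plot actually shows
for epoch in range(num_epochs):
    trend = 1.0 / (1.0 + 0.001 * epoch)                  # pretend "true" cost decay
    epoch_costs = trend + 0.05 * rng.standard_normal(batches_per_epoch)
    per_batch_costs.extend(epoch_costs)
    if epoch % 100 == 0:
        plotted_costs.append(epoch_costs.mean())          # one point per 100 epochs
```

Plotting `per_batch_costs` gives the zigzag you saw in your experiment; plotting `plotted_costs` looks much smoother, which is basically what the notebook’s figures are doing.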
I see! Thank you so much!