I think I have a problem handling the end case (last mini-batch < mini_batch_size, i.e. fewer than 64 examples).
I used the same mini_batch_X as in the normal-size case and multiplied it by (m - mini_batch_size).

It’s not m minus the batch size, right? It’s m minus the batch size times the number of full minibatches. Or it’s the remainder when you divide m by the minibatch size.
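To make that arithmetic concrete, here is a minimal sketch. The helper name `last_minibatch_size` and the value m = 148 are made up for illustration and are not from the assignment.

```python
def last_minibatch_size(m, mini_batch_size=64):
    """Number of examples left over after all full mini-batches (hypothetical helper)."""
    num_complete = m // mini_batch_size          # how many full mini-batches fit in m
    return m - mini_batch_size * num_complete    # identical to m % mini_batch_size

m = 148  # hypothetical number of examples
print(last_minibatch_size(m))  # 148 - 64 * 2 = 20
print(m - 64)                  # 84: larger than a full mini-batch, so not the leftover size
```

With m = 148 there are two full minibatches of 64, so the last one holds 20 examples, whereas m minus the batch size gives 84.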

When you get a “wrong shape” error, the first question is “Ok, what shape is it?” If you print the shape you are getting, that should give a good clue about the nature of the problem.


Does shuffled_X have the same shape as mini_batch_X?

I’m not sure what you mean. Note that shuffled_X is the entire X dataset, randomly reordered so that you get different minibatches on each full epoch, whereas the minibatches are subsets of it. They all have the same number of rows (features), but not the same number of columns (samples).
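If it helps to see the shapes side by side, here is a small sketch with NumPy. The sizes (3 features, 148 samples) and the shuffling via a random permutation of the column indices are assumptions for illustration, not necessarily how the assignment code is written.

```python
import numpy as np

rng = np.random.default_rng(0)
n_x, m, mini_batch_size = 3, 148, 64      # hypothetical sizes
X = rng.standard_normal((n_x, m))         # features in rows, samples in columns

permutation = rng.permutation(m)          # assumed shuffling: reorder the column indices
shuffled_X = X[:, permutation]            # same shape as X, columns in random order

num_complete = m // mini_batch_size
mini_batches = [
    shuffled_X[:, k * mini_batch_size:(k + 1) * mini_batch_size]
    for k in range(num_complete)
]
if m % mini_batch_size != 0:              # leftover samples form the last, smaller minibatch
    mini_batches.append(shuffled_X[:, num_complete * mini_batch_size:])

print(shuffled_X.shape)                   # (3, 148)
print([mb.shape for mb in mini_batches])  # [(3, 64), (3, 64), (3, 20)]
```

Every minibatch keeps all 3 rows, and only the column count differs, with the last one holding the 20 leftover samples.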

Got it, Thanks a lot.