[Course 5 - Week 2 - Assignment 2] Why not "shuffle" the training data in SGD?

In Week 2's assignment 2, the model doesn't shuffle the data during training:

def model(X, Y, word_to_vec_map, learning_rate = 0.01, num_iterations = 400):
    """Model to train word vector representations in numpy."""

Is it good to have the training data shuffled for each iteration of SGD?

Also, in Week 1's assignment #2 (Dinosaur Island - Character-Level Language Modeling), the model does contain code for shuffling the dataset, but this is done once, before entering the optimization loop:

# Shuffle list of all dinosaur names # [NOTE]
  # Optimization loop
  for j in range(num_iterations):

Does it have any effect in this case?
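For concreteness, the two placements being compared can be sketched like this (a toy example with my own variable names, not code from the assignment):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.arange(10)  # toy "dataset" of 10 examples

# Option A (Dinosaur Island style): shuffle once, before the loop.
order_once = rng.permutation(len(X))
for epoch in range(3):
    epoch_order = order_once  # identical example order every epoch
    # ... take mini-batches from X[epoch_order] ...

# Option B: reshuffle at the start of every epoch.
for epoch in range(3):
    epoch_order = rng.permutation(len(X))  # a fresh order each epoch
    # ... mini-batches now differ from epoch to epoch ...
```

Option A still removes any ordering that happened to exist in the raw data file; Option B additionally changes which examples get grouped into the same mini-batch from epoch to epoch.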


In short, I think it is hard to say whether we have to keep shuffling. Let's think it through: if we reshuffle every epoch, the mini-batches differ from one epoch to the next; otherwise, every epoch sees the same set of mini-batches. One of those two sequences of mini-batches might lead us more quickly toward a better set of parameters, but we have no way of knowing in advance which one. That is why I say it is hard to tell.

Moreover, with or without shuffling, each epoch already contains many mini-batches, so their combined effect is even harder to predict.

Therefore, if we really want to know, we just have to try it out ourselves. One run might show a difference, and the next might not.
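If you do want to try it, a tiny self-contained experiment along these lines is enough (this is my own toy least-squares setup, not the assignment's model, so the numbers say nothing definitive):

```python
import numpy as np

def sgd_linear(reshuffle, seed=0, epochs=20, lr=0.1, batch=4):
    """Toy mini-batch SGD on a 1-D least-squares problem, used only to
    compare shuffle-once vs. reshuffle-every-epoch."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(32, 1))
    Y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=32)  # true slope = 3.0
    w = 0.0
    order = rng.permutation(len(X))  # shuffled once up front
    for _ in range(epochs):
        if reshuffle:
            order = rng.permutation(len(X))  # fresh mini-batches each epoch
        for start in range(0, len(X), batch):
            idx = order[start:start + batch]
            grad = np.mean((w * X[idx, 0] - Y[idx]) * X[idx, 0])
            w -= lr * grad
    return w

# Both settings should land near the true slope (3.0); which one is
# marginally closer on a given run says little in general.
print(sgd_linear(reshuffle=False), sgd_linear(reshuffle=True))
```

On a problem this simple both variants converge, which is exactly the point: you only see a meaningful difference (if any) by running the comparison on your actual model and data.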