Creating and randomizing training, dev, and test data sets

The Optimization assignment in Week 2 of Course 2 shows one way to do this. The goal there is to split the training set into mini-batches, but the shuffling technique it uses is completely generic.

You can use np.random.permutation to generate a permuted list of numbers and then use ranges of that list as indices into the “samples” dimension of your arrays.

Here’s a little experiment to show the idea:

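import numpy as np

np.random.seed(2)
# Random ordering of the 8 sample indices, converted to plain Python ints
perm = np.random.permutation(8).tolist()
print(f"perm = {perm}")
# Toy data: 2 features (rows) x 8 samples (columns)
A = np.random.randint(0,10,(2,8))
print(f"A = {A}")
# Select the first 4 shuffled samples (columns)
print(A[:,perm[0:4]])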

Running this prints:

perm = [4, 1, 6, 2, 3, 7, 5, 0]
A = [[2 1 5 4 4 5 7 3]
     [6 4 3 7 6 1 3 5]]
[[4 1 7 5]
 [6 4 3 3]]
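
The same idea extends to a full train/dev/test split. Here is a minimal sketch, assuming the course convention that samples are the columns of X and Y. The function name shuffle_and_split, the dev_frac and test_frac parameters, and the 80/10/10 proportions are made up for illustration and are not from the assignment. It also uses np.random.default_rng rather than np.random.seed, which is just a choice here.

```python
import numpy as np

def shuffle_and_split(X, Y, dev_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle the samples (columns) of X and Y together, then split them.

    Hypothetical helper for illustration; the fractions are assumptions.
    """
    m = X.shape[1]                        # number of samples
    rng = np.random.default_rng(seed)
    perm = rng.permutation(m)             # one permutation shared by X and Y
    n_dev = int(m * dev_frac)
    n_test = int(m * test_frac)
    n_train = m - n_dev - n_test
    # Consecutive ranges of the permutation become the three index sets
    train_idx = perm[:n_train]
    dev_idx = perm[n_train:n_train + n_dev]
    test_idx = perm[n_train + n_dev:]
    return ((X[:, train_idx], Y[:, train_idx]),
            (X[:, dev_idx], Y[:, dev_idx]),
            (X[:, test_idx], Y[:, test_idx]))

# Toy data: 20 samples with 2 features; the label of sample i is i
X = np.arange(40).reshape(2, 20)
Y = np.arange(20).reshape(1, 20)
(train_X, train_Y), (dev_X, dev_Y), (test_X, test_Y) = shuffle_and_split(X, Y)
print(train_X.shape, dev_X.shape, test_X.shape)
```

The key point is that the same permutation indexes both X and Y, so every sample stays paired with its label, and the three ranges do not overlap, so no sample lands in more than one set.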