Creating and randomizing training, dev, and test data sets

paulinpaloalto · July 13, 2021, 7:34pm

There is an assignment in Week 2 of Course 2 that shows one way to do this: see the Optimization Assignment. There the shuffling is used to split the training set into mini-batches, but the shuffling technique itself is completely generic.

You can use np.random.permutation to generate a permuted list of numbers and then use ranges of that list as indices into the “samples” dimension of your arrays.

Here’s a little experiment to show the idea:

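import numpy as np

np.random.seed(2)
# Random ordering of the 8 sample (column) indices, as plain Python ints
perm = np.random.permutation(8).tolist()
print(f"perm = {perm}")
# 2 rows x 8 columns, one sample per column
A = np.random.randint(0,10,(2,8))
print(f"A = {A}")
# Pick out the first 4 shuffled columns
print(A[:,perm[0:4]])

Output:

perm = [4, 1, 6, 2, 3, 7, 5, 0]
A = [[2 1 5 4 4 5 7 3]
     [6 4 3 7 6 1 3 5]]
[[4 1 7 5]
 [6 4 3 3]]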
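
To carry the same idea over to a train/dev/test split, here is a minimal sketch. It is not taken from the assignment: the function name shuffle_split, the 80/10/10 fractions, and the toy X and Y arrays are all made up for illustration, and it assumes the course convention that samples are the columns of X and Y. The key point is that one permutation is used to index both arrays, so each sample stays paired with its label.

```python
import numpy as np

def shuffle_split(X, Y, dev_frac=0.1, test_frac=0.1, seed=0):
    """Hypothetical helper: shuffle the columns of X and Y with the same
    permutation, then slice the shuffled indices into train/dev/test."""
    m = X.shape[1]                      # number of samples (columns)
    rng = np.random.default_rng(seed)   # seeded for reproducibility
    perm = rng.permutation(m)           # shuffled column indices

    n_dev = int(round(m * dev_frac))
    n_test = int(round(m * test_frac))
    n_train = m - n_dev - n_test

    # Consecutive, non-overlapping ranges of the permutation
    train_idx = perm[:n_train]
    dev_idx = perm[n_train:n_train + n_dev]
    test_idx = perm[n_train + n_dev:]

    return ((X[:, train_idx], Y[:, train_idx]),
            (X[:, dev_idx], Y[:, dev_idx]),
            (X[:, test_idx], Y[:, test_idx]))

# Toy data: 20 samples, where row 0 of X equals the label in Y,
# so it is easy to check that samples and labels stay aligned.
X = np.arange(40).reshape(2, 20)
Y = np.arange(20).reshape(1, 20)

train, dev, test = shuffle_split(X, Y)
print(train[0].shape, dev[0].shape, test[0].shape)
```

Because the three index ranges come from a single permutation, every sample lands in exactly one of the sets. If your data has samples along the first axis instead, you would index rows (X[train_idx]) rather than columns.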
