Can you clarify whether the assignment expects NUM_BATCHES batches out of each of the train and validation datasets, or train and validation datasets each with a batch size of NUM_BATCHES?
If it is the first option, how can len(train_dataset) and len(validation_dataset) come out to 1125 and 125? Shouldn’t they both be 128 batches?
From the markdown, it’s clear that the dataset for this assignment contains 160_000 data points. With a train/test split of 0.9, the train split contains 160_000 * 0.9 = 144_000 data points and the test split contains 160_000 * 0.1 = 16_000 data points.
With NUM_BATCHES = 128 used as the batch size, each batch holds 128 data points, which gives 144_000 / 128 = 1125 batches in the train split and 16_000 / 128 = 125 batches in the test split.
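For concreteness, the arithmetic behind the two readings can be sketched in plain Python (the constants are the values from the assignment; variable names here are illustrative, and the actual batching calls, e.g. a `.batch(n)` on a dataset of N elements yielding ceil(N / n) batches, are omitted):

```python
import math

DATASET_SIZE = 160_000
TRAIN_FRACTION = 0.9
NUM_BATCHES = 128

train_size = int(DATASET_SIZE * TRAIN_FRACTION)  # 144_000 data points
val_size = DATASET_SIZE - train_size             # 16_000 data points

# Reading A: NUM_BATCHES is actually the batch *size*
# (this matches the lengths the unit test expects).
batches_a_train = math.ceil(train_size / NUM_BATCHES)  # 1125 batches
batches_a_val = math.ceil(val_size / NUM_BATCHES)      # 125 batches

# Reading B: NUM_BATCHES is the *number of batches*, as the
# instruction says, so each split needs its own batch size.
batch_size_b_train = train_size // NUM_BATCHES                # 1125 per batch
batches_b_train = math.ceil(train_size / batch_size_b_train)  # 128 batches
batch_size_b_val = val_size // NUM_BATCHES                    # 125 per batch
batches_b_val = math.ceil(val_size / batch_size_b_val)        # 128 batches
```

Under reading B, both splits would report a length of 128, which is why the observed lengths of 1125 and 125 can only come from reading A.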
Throughout the lab exercises, the variable BATCH_SIZE has consistently been used to set the batch size. In the assignment, therefore, the variable name NUM_BATCHES together with the instruction “Turn the dataset into a batched dataset with num_batches batches” makes the intent clear: it is not the batch size; the dataset should be split into that number of batches.
It appears that the unit test implementation is inconsistent with this requirement, so the implementation is forced to follow the unit test’s validation throughout the assignment.
It’s up to the DLAI team to decide whether they want to update the assignment and clear up the confusion.