I’ve done quite a bit of Googling and I’m just flat stuck on how to split my data. My Python knowledge isn’t good enough. I understand the need to shuffle, and then to write a for loop that iterates through each directory and checks that a given file’s size is greater than zero, and if it is, to move it from the source directory to the training directory. Once 90% of the files have been moved, the rest go to validation (again, provided the file size is greater than zero). I just simply don’t know how to write that in Python!
Knowledge of Python is assumed for this specialization. Please become familiar with Python before moving forward.
Happy learning.
You could try creating a list of n integers randomly shuffled, then use just the first 10 percent of them as index values.
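For what it’s worth, here is a minimal sketch of that index idea. The function name, the `holdout_fraction` and `seed` parameters, and the use of a plain list are my own assumptions for illustration; none of it comes from the course notebook.

```python
import random


def split_by_shuffled_indices(items, holdout_fraction=0.1, seed=None):
    """Split `items` into (rest, holdout) using a shuffled list of indices.

    Hypothetical helper for illustration only, not the assignment's API.
    """
    rng = random.Random(seed)          # a seed makes the shuffle repeatable
    indices = list(range(len(items)))  # the n integers: 0, 1, ..., n-1
    rng.shuffle(indices)               # shuffle them in place

    # The first holdout_fraction of the shuffled indices picks the holdout set.
    n_holdout = int(len(items) * holdout_fraction)
    holdout = [items[i] for i in indices[:n_holdout]]
    rest = [items[i] for i in indices[n_holdout:]]
    return rest, holdout
```

Calling it on a list of file names would give two lists that together contain every name exactly once, with roughly 10 percent of them in the second list.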
Did you read the original post by @JamesRiley at the top of this topic?
No TensorFlow code is required to implement the functionality for splitting the dataset.
@JamesRiley I think this will help. The Real Python article “Split Your Dataset With scikit-learn's train_test_split()” gives instructions for splitting data into training and validation sets in Python for machine learning.
Firstly, you’ll have to filter the images that are invalid and then perform the split. Again, you don’t need train_test_split to get the results. Relying on the random module and list indexing is sufficient (do look at the imports at start of the notebook).
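To make the filter-then-split order concrete, here is a rough sketch that uses only the standard library. The function name, its parameters, and the choice to copy files rather than move them are assumptions on my part, so treat it as an outline of the approach and not as the notebook’s expected solution.

```python
import os
import random
import shutil


def split_data(source_dir, training_dir, validation_dir, split_size=0.9, seed=None):
    """Copy non-empty files from source_dir into training/validation dirs.

    Hypothetical signature for illustration; the target directories must
    already exist. Returns (number of training files, number of validation files).
    """
    # Step 1: filter. Keep only regular files whose size is greater than zero.
    valid_files = []
    for name in sorted(os.listdir(source_dir)):
        path = os.path.join(source_dir, name)
        if os.path.isfile(path) and os.path.getsize(path) > 0:
            valid_files.append(name)
        else:
            print(f"{name} is zero length or not a regular file, so ignoring.")

    # Step 2: shuffle. random.sample returns a new list in random order.
    rng = random.Random(seed)
    shuffled = rng.sample(valid_files, len(valid_files))

    # Step 3: split with list indexing. The first split_size share is training.
    n_train = int(len(shuffled) * split_size)
    for name in shuffled[:n_train]:
        shutil.copyfile(os.path.join(source_dir, name),
                        os.path.join(training_dir, name))
    for name in shuffled[n_train:]:
        shutil.copyfile(os.path.join(source_dir, name),
                        os.path.join(validation_dir, name))

    return n_train, len(shuffled) - n_train
```

If the files should be moved instead of copied, `shutil.move` could replace `shutil.copyfile` here. Filtering before shuffling means the 90% is taken from the valid files only, so empty files never count toward either set.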
If you need help with basic Python constructs, please confirm and I’d be happy to get the moderator involved. You may well be right to ask for help at that level (although the course assumes knowledge of Python), and if the moderator agrees, they can change the level of the course from intermediate and the instructor can add a Python tutorial as well.