My split_data function runs, but it produces a different number of training and test images every time I run the code. This causes errors later during training, because some of the zero-length files end up in the training set.
The output after I run the code is this:
666.jpg is zero length, so ignoring
11702.jpg is zero length, so ignoring
666.jpg is zero length, so ignoring
11702.jpg is zero length, so ignoring
Original cat’s directory has 12500 images
Original dog’s directory has 12500 images
There are 12365 images of cats for training
There are 12382 images of dogs for training
There are 2366 images of cats for validation
There are 2383 images of dogs for validation
Please look at the way the function is tested. The grader will explicitly invoke your code for cats and dogs separately. Please use the function parameters to copy valid images in the source directory into training and validation directories.
That is what I have done. I have created lists called train_dog, train_cat, val_dog and val_cat, filled with images. The lengths of these lists are what they should be. However, when I copy the images over to the directories, I get random numbers of cats and dogs in each directory.
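One possible explanation, offered as a guess since the function itself is not shown: the reported totals exceed the 12500 source images per class (12365 + 2366 = 14731 for cats). That is what you would see if files from earlier runs were still sitting in the target directories, or if the training and validation lists were sampled independently and overlap. A minimal sketch for checking both follows; empty_directory and lists_overlap are hypothetical helper names, not part of the assignment.

```python
import os
import tempfile

def empty_directory(path):
    """Delete every regular file directly inside `path`; return how many were removed."""
    removed = 0
    for name in os.listdir(path):
        full_path = os.path.join(path, name)
        if os.path.isfile(full_path):
            os.remove(full_path)
            removed += 1
    return removed

def lists_overlap(train_files, val_files):
    """Return the filenames present in both lists (ideally an empty list)."""
    return sorted(set(train_files) & set(val_files))

# Demonstration on a throwaway directory (hypothetical filenames)
demo_dir = tempfile.mkdtemp()
for name in ["1.jpg", "2.jpg", "3.jpg"]:
    with open(os.path.join(demo_dir, name), "w") as f:
        f.write("x")

removed_count = empty_directory(demo_dir)   # clear leftovers before re-running a split
remaining = os.listdir(demo_dir)            # what is still in the directory afterwards
shared = lists_overlap(["1.jpg", "2.jpg"], ["2.jpg", "3.jpg"])
print(removed_count, remaining, shared)
```

If the overlap list is not empty, the same image is being copied into both sets, which would also inflate the counts.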
Sure. Read the next cell before attempting the function that splits the data. You don’t have to handle cats and dogs separately inside the function. Look only at SOURCE_DIR.
I am really struggling with this function. I don’t understand how we are not required to do it separately, as the cat and dog images are in two separate folders. It is nearly the deadline and I have not managed to find a solution. What can I do now?
You don’t need to think about cat and dog images separately because, below the split_data function, there’s a cell titled “# Test your split_data function”.
This is a pre-coded cell. Learners don’t need to think about it. There, the paths for dogs and cats are already defined separately, and split_data is then run with the required parameters for each, which produces the required output directly.
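To illustrate the idea, here is a minimal sketch, not the official solution. Only split_data and SOURCE_DIR are named in this thread; the parameter names TRAINING_DIR, VALIDATION_DIR and SPLIT_SIZE, and all directory names in the demo, are assumptions made for the example. The function never mentions cats or dogs: the caller passes in whichever directories apply, so the same code serves both classes.

```python
import os
import random
import tempfile
from shutil import copyfile

def split_data(SOURCE_DIR, TRAINING_DIR, VALIDATION_DIR, SPLIT_SIZE):
    """Copy non-empty files from SOURCE_DIR into the two target directories."""
    files = []
    for name in os.listdir(SOURCE_DIR):
        if os.path.getsize(os.path.join(SOURCE_DIR, name)) > 0:
            files.append(name)
        else:
            print(f"{name} is zero length, so ignoring")

    random.shuffle(files)                 # shuffle once, then slice once
    cut = int(len(files) * SPLIT_SIZE)    # first `cut` files go to training

    for name in files[:cut]:
        copyfile(os.path.join(SOURCE_DIR, name), os.path.join(TRAINING_DIR, name))
    for name in files[cut:]:
        copyfile(os.path.join(SOURCE_DIR, name), os.path.join(VALIDATION_DIR, name))

# Demo with a throwaway directory tree (hypothetical layout)
root = tempfile.mkdtemp()
paths = {}
for animal in ["cats", "dogs"]:
    for kind in ["source", "training", "validation"]:
        paths[(animal, kind)] = os.path.join(root, kind, animal)
        os.makedirs(paths[(animal, kind)])
    for i in range(8):
        with open(os.path.join(paths[(animal, "source")], f"{i}.jpg"), "w") as f:
            f.write("x")

# The same function is invoked once per class, with different arguments
for animal in ["cats", "dogs"]:
    split_data(paths[(animal, "source")], paths[(animal, "training")],
               paths[(animal, "validation")], 0.75)

cat_train = len(os.listdir(paths[("cats", "training")]))
dog_val = len(os.listdir(paths[("dogs", "validation")]))
print(cat_train, dog_val)
```

Because the list is shuffled once and then cut at a single index, the training and validation sets cannot overlap.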
By the way, I have seen your notebook. You have hardcoded each case separately, including in the graded function create_train_val_dirs. Don’t do that.
Kindly go through the instructions of the notebook, which are given above the functions.
Hi @nilosreesengupta, I am trying to approach the split data function in a different way. However, I am not sure how to do it so that it is not done separately.
I am going to try and do the following steps:
Create a directory called ‘Copies’ with 2 sub-directories called ‘cats_copies’ and ‘dogs_copies’
Copy the images of cats and dogs into these subdirectories so as not to change the original files.
Remove zero-length images.
Randomise the order of the files in each directory
Split them into the training and validation sets
Can you tell me if this is the right approach and if not how do I go about it?
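For reference, the last three steps can be sketched on a list of filenames in memory. This is one possible shape under stated assumptions, not the assignment's required solution; filter_shuffle_split and its parameters are hypothetical names. Note that reading or copying from the source directory does not modify the original files, so the intermediate ‘Copies’ directory may not be needed.

```python
import os
import random
import tempfile

def filter_shuffle_split(source_dir, split_size):
    """Return (training_names, validation_names) for the non-empty files in source_dir."""
    valid = []
    for name in sorted(os.listdir(source_dir)):
        if os.path.getsize(os.path.join(source_dir, name)) > 0:
            valid.append(name)
        else:
            print(f"{name} is zero length, so ignoring")

    shuffled = random.sample(valid, len(valid))   # shuffled copy of the list
    cut = int(len(shuffled) * split_size)         # index where validation starts
    return shuffled[:cut], shuffled[cut:]

# Demo: ten files, one of them empty
demo_dir = tempfile.mkdtemp()
for i in range(10):
    with open(os.path.join(demo_dir, f"{i}.jpg"), "w") as f:
        if i != 3:
            f.write("x")   # 3.jpg is left empty on purpose

train_names, val_names = filter_shuffle_split(demo_dir, 0.9)
print(len(train_names), len(val_names))
```

The function works on whatever directory it is given, so it can be called once with the cats directory and once with the dogs directory.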
I have been working on this assignment for a long time and I’m struggling to get anywhere with it.
How do I do it without using separate directories? The cat and dog images are in two separate directories, so how can I treat them as one without writing the same code for both directories?