Keep getting random train test splits from split data function

My split data function runs, but it produces a different number of test images and training images each time. Every time I run the code, I get a different split. This causes errors later on during training, because some of the zero-length files end up in the training set.

The output after I run the code is this:

666.jpg is zero length, so ignoring
11702.jpg is zero length, so ignoring
666.jpg is zero length, so ignoring
11702.jpg is zero length, so ignoring

Original cat’s directory has 12500 images
Original dog’s directory has 12500 images

There are 12365 images of cats for training
There are 12382 images of dogs for training
There are 2366 images of cats for validation
There are 2383 images of dogs for validation
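
A note on these counts: 12365 training plus 2366 validation images of cats adds up to 14731, which is more than the 12500 originals. That suggests the target directories still hold files copied by earlier runs, or that the training and validation lists overlap. Below is a minimal sketch for ruling out the first cause, assuming flat directories of image files. `clear_dir` and `count_files` are hypothetical helper names, not part of the notebook.

```python
import os
import tempfile


def clear_dir(directory):
    """Delete every regular file directly inside `directory` (hypothetical helper)."""
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            os.remove(path)


def count_files(directory):
    """Count the regular files directly inside `directory` (hypothetical helper)."""
    return sum(
        os.path.isfile(os.path.join(directory, name))
        for name in os.listdir(directory)
    )


# Demo on a throwaway directory: simulate leftovers from an earlier run.
demo_dir = tempfile.mkdtemp()
for i in range(3):
    with open(os.path.join(demo_dir, f"stale_{i}.jpg"), "wb") as f:
        f.write(b"x")

stale_count = count_files(demo_dir)  # files left behind by the "earlier run"
clear_dir(demo_dir)
fresh_count = count_files(demo_dir)  # directory is empty again
print(stale_count, fresh_count)
```

Emptying the training and validation directories before each run (or recreating them) makes the counts printed afterwards reflect only the latest split.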

Please fix the notebook permissions on Google Colab. I don’t have permission to access the link.

You should be able to access the code from this link:

[public link to colab removed - moderator]

Please look at the way the function is tested. The grader will explicitly invoke your code for cats and dogs separately. Please use the function parameters to copy valid images in the source directory into training and validation directories.

That is what I have done. I have created lists called train_dog, train_cat, val_dog and val_cat, filled with images. The lengths of these lists are what they should be. However, when I copy the images over to the directories, I get random numbers of cats and dogs in each directory.

Can you help me? I have been struggling with this for a long time.

Sure. Read the next cell before attempting the function that splits the data. You don’t have to pay attention to cats and dogs separately. Look only at the SOURCE_DIR.

I am really struggling with this function. I don’t understand how we are not required to do it separately, as the cat and dog images are in 2 separate folders. It is nearly the deadline and I have not managed to find a solution. What can I do now?

Hello @Maheed_Miah ,

As @balaji.ambresh has said, focus on SOURCE_DIR.

You don’t need to think about cat and dog images separately because, below the split_data function, there’s a cell marked "# Test your split_data function".
This is a pre-coded cell. Learners don’t need to think about this. Here, the paths for dogs and cats are already defined separately, and then the cell runs the split_data function once for each with the required parameters, which produces the required output directly.
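
To make this concrete, here is a minimal sketch of a single function that only looks at the directories it is given, followed by two calls in the style of a test cell. It is an illustration under assumptions, not the notebook's reference solution: `source_dir` stands for SOURCE_DIR, while the other parameter names, the directory layout and the 0.9 split size are hypothetical.

```python
import os
import random
import shutil
import tempfile


def split_data(source_dir, training_dir, validation_dir, split_size):
    """Copy the non-empty files of source_dir into training_dir and validation_dir.

    split_size is the fraction of valid files that goes to training.
    Parameter names other than the source directory are assumptions.
    """
    valid_files = []
    for name in os.listdir(source_dir):
        if os.path.getsize(os.path.join(source_dir, name)) > 0:
            valid_files.append(name)
        else:
            print(f"{name} is zero length, so ignoring")

    # Shuffle a copy of the list, then cut it once so the two parts never overlap.
    shuffled = random.sample(valid_files, len(valid_files))
    n_train = int(len(shuffled) * split_size)

    for name in shuffled[:n_train]:
        shutil.copyfile(os.path.join(source_dir, name),
                        os.path.join(training_dir, name))
    for name in shuffled[n_train:]:
        shutil.copyfile(os.path.join(source_dir, name),
                        os.path.join(validation_dir, name))


# Demo with throwaway directories: 10 fake files per class, one of them empty.
root = tempfile.mkdtemp()
dirs = {}
for label in ("cats", "dogs"):
    for part in ("source", "training", "validation"):
        dirs[label, part] = os.path.join(root, part, label)
        os.makedirs(dirs[label, part])
    for i in range(10):
        with open(os.path.join(dirs[label, "source"], f"{i}.jpg"), "wb") as f:
            f.write(b"" if i == 0 else b"fake image bytes")

# The same function is called once per class; only the arguments change.
split_data(dirs["cats", "source"], dirs["cats", "training"],
           dirs["cats", "validation"], 0.9)
split_data(dirs["dogs", "source"], dirs["dogs", "training"],
           dirs["dogs", "validation"], 0.9)

counts = {key: len(os.listdir(path)) for key, path in dirs.items()}
print(counts["cats", "training"], counts["cats", "validation"])
```

Because the function never mentions cats or dogs, nothing has to be written twice: the caller decides which class is processed by choosing the paths. With a split size of 0.9 (an assumption, the thread does not state it), 12499 valid images would give 11249 for training and 1250 for validation.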

By the way, I have looked at your notebook: you have hardcoded each case separately, including in the graded function create_train_val_dirs. Don’t do that.

Kindly go through the instructions of the notebook, which are given above the functions.

Since your deadline is near, if you still have doubts you can check this thread: "Test your split data function not printing expected output" (post #5 by nilosreesengupta).

With regards,
Nilosree Sengupta

Hi @nilosreesengupta, I am trying to approach the split data function in a different way. However, I am not sure how to do it so that it is not done separately.

I am going to try and do the following steps:

  1. Create a directory called ‘Copies’ with 2 sub-directories called ‘cats_copies’ and ‘dogs_copies’

  2. Copy the images of cats and dogs into these subdirectories so as not to change the original files.

  3. Remove images of zero length.

  4. Randomise the order of the contents of the file

  5. Split it into the training and validation sets

Can you tell me if this is the right approach and, if not, how I should go about it?

I have been doing this assignment for a long time and I’m struggling to get anywhere with it.

Hello @Maheed_Miah ,

You are advised not to create separate directories for dogs and cats, which you have again proposed in steps 1 and 2. Kindly rectify.

Follow the steps here: "Test your split data function not printing expected output" (post #5 by nilosreesengupta).

Hope this helps.

With regards,
Nilosree Sengupta

How do I do it without using separate directories? The cat and dog images are in 2 separate directories, so how can I treat them as one without writing the same code for both directories?