Colab run taking too long, can't get past split_data

I am currently doing the dogs vs cats assignment, but the #Test split_data function part is taking too long. I even left it running for more than 10 hours and it still has not finished. I only get the

666.jpg is zero length, so ignoring.

output so far, and it keeps executing without finishing.
I even signed up for Colab Pro and set the runtime to GPU and High-RAM, but nothing changed. It is still taking too long. Could you please give me some advice? Thank you.

Please click my name and send me your notebook as an attachment in a message.

Thanks for responding. Please check your message.

These are the steps to do inside split_data:

  1. For each file in the source directory, if the file has length 0, report it as skipped; otherwise, add it to all_files.
  2. Once you have the valid image files, shuffle them and split them into training and test sets.
  3. Copy the training and testing files to their respective directories.

Please pay attention to indentation. You are doing steps 2 and 3 from inside step 1, which is incorrect.
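To make the indentation point concrete, here is a minimal sketch of how the three steps can be laid out. It is not the assignment's reference solution: the parameter names and the returned counts are assumptions chosen for illustration. The part to notice is that steps 2 and 3 start at the same indentation level as the for loop of step 1, so they run once, after all files have been checked.

```python
import os
import random
import shutil


def split_data(source_dir, training_dir, validation_dir, split_size):
    # Parameter names are assumptions for this sketch, not the notebook's own.

    # Step 1: collect the names of the non-empty files.
    # Only the length check belongs inside this loop.
    all_files = []
    for name in os.listdir(source_dir):
        path = os.path.join(source_dir, name)
        if os.path.getsize(path) == 0:
            print(f"{name} is zero length, so ignoring.")
        else:
            all_files.append(name)

    # Step 2: back at the loop's indentation level, so this runs once.
    shuffled = random.sample(all_files, len(all_files))
    n_train = int(len(shuffled) * split_size)
    training_files = shuffled[:n_train]
    validation_files = shuffled[n_train:]

    # Step 3: copy each group to its destination directory.
    for name in training_files:
        shutil.copyfile(os.path.join(source_dir, name),
                        os.path.join(training_dir, name))
    for name in validation_files:
        shutil.copyfile(os.path.join(source_dir, name),
                        os.path.join(validation_dir, name))

    return len(training_files), len(validation_files)
```

If the shuffle and the copies sit inside the step 1 loop instead, they are repeated for every file in the source directory, which is why the cell appears to never finish.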

Oh you’re right! I stared at it for hours and did not notice that. So silly. Thank you so much for helping.

As a complete beginner, this was really helpful. However, it might help others like me to know that you don’t need to create a new folder named “all_files”. You can keep this information within a list named “all_files” and then you can manipulate the images according to this list and/or other intermediate lists that you might need in the future.

@SURENDRA_SRINIVAS
You have sent me the assignment notebook containing only the starter code. Please send the correct file.

@SURENDRA_SRINIVAS
The files variable you use to keep track of valid files in the source directory should be cleared after every function invocation. Make it a local variable instead of a global variable.
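A toy illustration of the difference, with made-up function names and file names (none of this comes from your notebook): a list defined outside the function keeps its contents between calls, so the second call still sees the entries added by the first.

```python
files = []  # module-level (global) list: survives between calls


def collect_with_global(names):
    # Appends to the shared list, so earlier calls leak into later ones.
    for name in names:
        files.append(name)
    return len(files)


def collect_with_local(names):
    # A fresh list is created on every call.
    files = []
    for name in names:
        files.append(name)
    return len(files)


cats = ["cat1.jpg", "cat2.jpg"]
dogs = ["dog1.jpg", "dog2.jpg"]

print(collect_with_global(cats), collect_with_global(dogs))  # 2 4
print(collect_with_local(cats), collect_with_local(dogs))    # 2 2
```

With the global version, the call for dogs would also shuffle, split and copy the cat file names left over from the previous call.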

Even after making it local, I get the following:

666.jpg is zero length, so ignoring.
11702.jpg is zero length, so ignoring.

Original cat’s directory has 12501 images
Original dog’s directory has 12501 images

There are 11250 images of cats for training
There are 11250 images of dogs for training
There are 1250 images of cats for validation
There are 1250 images of dogs for validation

It's not matching the expected output.
@balaji.ambresh

There is a step in an earlier part of the assignment that, when executed, will produce the correct results. If you have an active enrollment in this course, refresh the workspace and execute the cells sequentially to find the step that gives the right numbers.

@balaji.ambresh

I have figured out the issue, and now the accuracy is around 92%. But to complete the assignment, it should be above 95%. I tried different architectures, but none of them crossed 95%. What should I do now?

Please try different optimizers and learning rates as well.
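One way to organise that search is sketched below. The helper names candidate_configs and make_optimizer and the specific learning rates are assumptions for illustration, not values known to pass the grader. TensorFlow is imported lazily inside make_optimizer, so the grid itself can be built without it.

```python
from itertools import product


def candidate_configs(optimizer_names, learning_rates):
    # Every optimizer paired with every learning rate.
    return [
        {"optimizer": name, "learning_rate": lr}
        for name, lr in product(optimizer_names, learning_rates)
    ]


def make_optimizer(config):
    # Imported here so the grid above works even without TensorFlow installed.
    import tensorflow as tf

    optimizer_class = getattr(tf.keras.optimizers, config["optimizer"])
    return optimizer_class(learning_rate=config["learning_rate"])


# Example grid; the values are arbitrary starting points.
configs = candidate_configs(["Adam", "RMSprop", "SGD"], [1e-3, 1e-4])
for config in configs:
    print(config)
```

Each configuration can then be passed to model.compile(optimizer=make_optimizer(config), ...) on a freshly built model, and the validation accuracies compared across runs.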