Failed test case: incorrect number of (training, validation) images when using a split of 0.5 and 12 images

Hello!

I submitted my assignment, and apparently one of my functions failed.
The grader gave this reason for the failure:

Details of failed tests for split_data

Failed test case: incorrect number of (training, validation) images when using a split of 0.5 and 12 images (6 are zero-sized).
Expected:
(3, 3),
but got:
(4, 4).

I have no clue why the following function would split the 6 images into 4 and 2 with a split size of 0.5.

[code removed - moderator]

Any idea?

Please try creating 12 images, 6 of them with zero size (you can use the Unix touch command), and see how many end up getting copied over when the split is 0.5.
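If you would rather set up that test from Python, here is a minimal sketch. The helper name `make_test_images` and the `img_N.jpg` file names are hypothetical; `Path.touch()` creates an empty file just like the Unix `touch` command, and the non-empty files get dummy bytes, assuming only the byte count matters for the zero-size check.

```python
import tempfile
from pathlib import Path


def make_test_images(directory, total=12, empty=6):
    """Hypothetical helper: create `total` files, the first `empty` of them zero bytes."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    for i in range(total):
        path = directory / f"img_{i}.jpg"
        if i < empty:
            path.touch()  # zero-byte file, same effect as `touch img_i.jpg`
        else:
            path.write_bytes(b"dummy image bytes")  # any non-empty content
    return directory


source = make_test_images(tempfile.mkdtemp())
sizes = [p.stat().st_size for p in source.iterdir()]
print(len(sizes), sum(1 for s in sizes if s == 0))  # 12 6
```

Point your split function at `source` and count what lands in the training and validation folders.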

I was removing items from the same list that my for loop was iterating over :smiley:
So the problem is solved :)))
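To illustrate why that breaks the counts, here is a small sketch using plain file-name strings instead of real files (the function names and file names are made up for the example). Removing an item shifts the later items one position to the left, so the loop's internal index jumps over the item that slid into the removed slot. Building a new list avoids the problem.

```python
def buggy_filter(names, empty):
    """Remove empty names while iterating over the same list (the bug)."""
    for name in names:
        if name in empty:
            names.remove(name)  # later items shift left, so the next one is skipped
    return names


def safe_filter(names, empty):
    """Build a new list instead of mutating the one being iterated."""
    return [name for name in names if name not in empty]


files = ["empty1.jpg", "empty2.jpg", "cat1.jpg", "cat2.jpg"]
zero_sized = {"empty1.jpg", "empty2.jpg"}

print(buggy_filter(list(files), zero_sized))  # ['empty2.jpg', 'cat1.jpg', 'cat2.jpg']
print(safe_filter(files, zero_sized))         # ['cat1.jpg', 'cat2.jpg']
```

The leftover `empty2.jpg` is exactly the kind of zero-size file that then gets counted and copied.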


Can you please elaborate?
I am getting the same feedback and could not figure out what the issue is!

Hi, can I see your split function?

Here it is:

[code removed - moderator]

I think you should finish your while loop before starting the for loops.

In the if branch you check the first item in your list. After deleting it, you randomly sample and operate on a list in which not all items have been checked yet.

If you finish the while loop first and do not nest the for loops inside it, you can clean your list completely first and then perform the split afterwards.

I think you could try removing one of the img += 1 lines and unindenting your for loops :slight_smile:
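A minimal sketch of that suggestion, assuming a dict of byte counts stands in for `os.path.getsize` (the function and file names are hypothetical). When an item is deleted, the next item slides into the same position, so the index must stay put; it advances only when the current item is kept. The split loops then run afterwards on the already-cleaned list.

```python
def drop_empty_in_place(names, sizes):
    """Remove zero-size entries from `names` in place with a while loop.

    `sizes` maps each name to its byte count (a stand-in for os.path.getsize).
    """
    img = 0
    while img < len(names):
        if sizes[names[img]] == 0:
            del names[img]  # the next item moves into position img, so do not advance
        else:
            img += 1        # advance only when the current item is kept
    return names


files = ["e1.jpg", "e2.jpg", "c1.jpg", "c2.jpg"]
sizes = {"e1.jpg": 0, "e2.jpg": 0, "c1.jpg": 10, "c2.jpg": 12}
clean = drop_empty_in_place(list(files), sizes)
print(clean)  # ['c1.jpg', 'c2.jpg']
```

With a second `img += 1` after the `del`, `e2.jpg` would be skipped and survive the cleanup.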

Thanks, @Sule, for the good feedback.

The first img += 1 is when we delete an element from the list, i.e. the list becomes one element shorter. The second img += 1 is the loop counter.

However, I guess the indentation is the cause. I will try and let you know!

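Hi, I have the same problem. What should this function look like?
Here is how I implemented it:

import os
import random
from shutil import copyfile

def split_data(SOURCE_DIR, TRAINING_DIR, VALIDATION_DIR, SPLIT_SIZE):  # parameter names assumed
    files = os.listdir(SOURCE_DIR)
    shufflelist = random.sample(files, len(files))
    counter = 0
    for file in shufflelist:
        path = os.path.join(SOURCE_DIR, file)
        if os.path.getsize(path) == 0:
            print(f"{file} is zero length, so ignoring.")
        else:
            counter += 1
            # threshold uses len(files): every file in SOURCE_DIR, empty ones included
            if counter <= len(files) * SPLIT_SIZE:
                copyfile(path, os.path.join(TRAINING_DIR, file))
            else:
                copyfile(path, os.path.join(VALIDATION_DIR, file))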

Please filter out empty files and then split the remaining files between training and validation datasets.
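A minimal sketch of that two-step approach, not the assignment's reference solution; the parameter names and the temporary-directory demo are assumptions. The key point is that the training cut-off is computed from the number of files that survived the filter, not from the full directory listing.

```python
import os
import random
import tempfile
from shutil import copyfile


def split_data(source_dir, training_dir, validation_dir, split_size):
    """Filter out empty files first, then split the rest by split_size."""
    # Step 1: keep only non-empty files
    files = []
    for name in os.listdir(source_dir):
        if os.path.getsize(os.path.join(source_dir, name)) > 0:
            files.append(name)
        else:
            print(f"{name} is zero length, so ignoring.")

    # Step 2: shuffle, then compute the cut-off from the filtered count
    random.shuffle(files)
    n_train = int(len(files) * split_size)

    for name in files[:n_train]:
        copyfile(os.path.join(source_dir, name), os.path.join(training_dir, name))
    for name in files[n_train:]:
        copyfile(os.path.join(source_dir, name), os.path.join(validation_dir, name))


# Demo: 12 files, 6 of them zero-sized, split 0.5
root = tempfile.mkdtemp()
src, train, val = (os.path.join(root, d) for d in ("source", "training", "validation"))
for d in (src, train, val):
    os.makedirs(d)
for i in range(12):
    with open(os.path.join(src, f"img_{i}.jpg"), "wb") as f:
        f.write(b"" if i < 6 else b"data")  # six zero-size files, six non-empty
split_data(src, train, val, 0.5)
print(len(os.listdir(train)), len(os.listdir(val)))  # 3 3
```

With 6 non-empty files and a split of 0.5, `int(6 * 0.5)` gives 3 training and 3 validation files, which matches the (3, 3) the grader expects.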