Failed test case for split_data

Failed test case: incorrect number of (training, testing) images when using a split of 0.5 and 12 images (6 are zero-sized).
Expected:
(3, 3),
but got:
(6, 0).

Here is my code:

def split_data(SOURCE, TRAINING, TESTING, SPLIT_SIZE):

  ### START CODE HERE
  # Shuffle list
  shuffled_source = random.sample(os.listdir(SOURCE), len(os.listdir(SOURCE)))

  # Find total number of files in training dir
  training_number = int(len(shuffled_source) * SPLIT_SIZE)

  i = 0
  target = TRAINING

  for item in shuffled_source:
    item_source = os.path.join(SOURCE, item)
    if os.path.getsize(item_source) == 0:
      print(f'{item} is zero length, so ignoring.')
    else: 
      copyfile(item_source, os.path.join(target, item))
      i += 1

    # Switch copy target to TESTING
    if i == training_number:
      target = TESTING

Don’t really understand what went wrong here, any help appreciated.

Hello @Ikkakujuu ,

Thanks for reaching out.

The instructions say to use a split of 0.9, so the training data gets 90% of the images and the testing data gets 10%.

You can try it using 0.9 as the split.

With regards,
Nilosree Sengupta

The code already uses SPLIT_SIZE (0.9 in this case) to compute the number of training images.

But in the test case where the SPLIT_SIZE is 0.5 on 6 valid images, it doesn’t seem to be getting 3 as the number of training images.
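
A likely cause, going only by the code posted above: training_number is computed from every file in SOURCE, including the zero-length ones. With 12 files and a split of 0.5 that gives int(12 * 0.5) = 6, but only 6 files are valid, so all of them are copied to TRAINING before the target switches and nothing is left for TESTING. That matches the (6, 0) result. Below is a minimal sketch of one possible fix, which filters out the empty files first and computes the split from the valid ones only. The valid_files list and the loop structure are my own assumptions, not the official solution, and the grader may check other things not shown in this thread.

```python
import os
import random
from shutil import copyfile


def split_data(SOURCE, TRAINING, TESTING, SPLIT_SIZE):
    # Keep only non-empty files so the split is computed over valid images.
    # (valid_files is a name introduced for this sketch.)
    valid_files = []
    for item in os.listdir(SOURCE):
        if os.path.getsize(os.path.join(SOURCE, item)) == 0:
            print(f'{item} is zero length, so ignoring.')
        else:
            valid_files.append(item)

    # Shuffle the valid files, then count how many belong in training.
    shuffled_source = random.sample(valid_files, len(valid_files))
    training_number = int(len(shuffled_source) * SPLIT_SIZE)

    # The first training_number files go to TRAINING, the rest to TESTING.
    for i, item in enumerate(shuffled_source):
        target = TRAINING if i < training_number else TESTING
        copyfile(os.path.join(SOURCE, item), os.path.join(target, item))
```

With 6 valid images and SPLIT_SIZE = 0.5, training_number becomes int(6 * 0.5) = 3, so 3 files go to each directory.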