Convolutional Neural Networks in TensorFlow - week 1

Nishanth_Rajkumar · August 20, 2022, 3:00pm

For the week 1 assignment of this course, one of the graded function tasks was to split the image dataset into ‘Training’ and ‘Validation’.

The following is my code:
def split_data(SOURCE_DIR, TRAINING_DIR, VALIDATION_DIR, SPLIT_SIZE):
    """
    Splits the data into train and test sets

    Args:
      SOURCE_DIR (string): directory path containing the images
      TRAINING_DIR (string): directory path to be used for training
      VALIDATION_DIR (string): directory path to be used for validation
      SPLIT_SIZE (float): proportion of the dataset to be used for training

    Returns:
      None
    """

    source_images = os.listdir(SOURCE_DIR)
    for img in source_images:
        if os.path.getsize(SOURCE_DIR+img) == 0:
            print(f"{img} is zero length, so ignoring.")
            source_images.remove(img)
    train_sample = random.sample(source_images, int(len(source_images)*SPLIT_SIZE))
    for img in source_images:
        if img in train_sample:
            copyfile(SOURCE_DIR+img, TRAINING_DIR+img)
        else:
            copyfile(SOURCE_DIR+img, VALIDATION_DIR+img)

But this failed one of the test cases.
The test case:
Failed test case: incorrect number of (training, validation) images when using a split of 0.5 and 12 images (6 are zero-sized).
Expected:
(3, 3),
but got:
(4, 5).

I don’t understand why it failed. Could anyone help me understand the error in my code logic here?

paulinpaloalto · August 20, 2022, 5:46pm

You posted this under General Discussion. I don’t recognize the code from DLS, which is the specialization I’m familiar with. You’ll have better luck getting the attention of the right people if you post it in the right category for the specialization and course you’re taking. You can move it by using the little “edit pencil” on the title.

Nishanth_Rajkumar · August 20, 2022, 6:03pm

Thank you for the edit suggestion.
I have made the change accordingly.
This is my first time navigating through this community page.

balaji.ambresh · August 20, 2022, 6:30pm

Removing objects while iterating through the same collection is a bad idea.
Please use a separate list for those images whose size is not zero.

Here’s an example where removing an object while iterating over the same list produces incorrect results:

>>> l = [str(i) for i in range(10)]
>>> import random
>>> random.shuffle(l)
>>> l
['6', '2', '4', '8', '9', '3', '7', '0', '1', '5']
>>> for item in l:
...     if int(item) % 2 == 0:
...             l.remove(item)
... 
>>> l
['2', '8', '9', '3', '7', '1', '5']
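
For comparison, here is a minimal sketch of the safer pattern on the same list: collect the items to keep in a new list instead of calling remove() on the list being iterated. The names `original` and `odds_only` are only illustrative and do not come from the assignment.

```python
# Same starting list as in the session above.
original = ['6', '2', '4', '8', '9', '3', '7', '0', '1', '5']

# Collect the items to keep in a separate list; `original` is never modified.
odds_only = []
for item in original:
    if int(item) % 2 != 0:
        odds_only.append(item)

print(odds_only)  # ['9', '3', '7', '1', '5']
```

All five even values are dropped this time, whereas the in-place version left '2' and '8' behind.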

John_Pan · November 28, 2023, 8:03pm

I’m trying to understand why this fails.

Is it because:

1. The for loop iterates over a list and internally stores an incrementing index value.
2. The list being iterated over gets shorter with each .remove().
3. The for loop increments the index, which ends up skipping a value in the list since a value was removed.

?

paulinpaloalto · November 28, 2023, 8:24pm

Yes, that is the point that Balaji was making in his earlier response. The list is changing underneath you as you iterate. You should build a new separate output list to hold the things that you don’t want to delete.
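
A small simulation of the failing test case may make this concrete. It uses plain (name, size) pairs instead of real files, and it assumes the six zero-sized files sit next to each other in the listing, which is a guess about the grader's data. The helpers `buggy_filter`, `safe_filter` and `split_counts` are hypothetical names for this sketch only.

```python
def buggy_filter(files):
    """Mimics the original loop: removes from the list it is iterating over."""
    files = list(files)  # work on a copy so the caller's list is untouched
    for f in files:
        if f[1] == 0:
            files.remove(f)  # later items shift left, so the next one is skipped
    return files


def safe_filter(files):
    """Builds a new list and leaves the input alone."""
    return [f for f in files if f[1] > 0]


def split_counts(files, split_size):
    """Returns (training, validation) counts for a given split."""
    n_train = int(len(files) * split_size)
    return n_train, len(files) - n_train


# 6 non-empty files followed by 6 zero-sized ones (assumed ordering).
files = [(f"img{i}.jpg", 100) for i in range(6)]
files += [(f"empty{i}.jpg", 0) for i in range(6)]

print(split_counts(buggy_filter(files), 0.5))  # (4, 5)
print(split_counts(safe_filter(files), 0.5))   # (3, 3)
```

With this ordering the in-place loop removes only every other empty file, so 9 files survive instead of 6, and int(9 * 0.5) gives the (4, 5) split reported by the grader.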

Or better yet, think of ways to implement this without an explicit for loop, either by “logical indexing” or by using a Python list comprehension (an “enumeration” of the list). The comprehension is really a loop, but you can express it more simply, and it supports the idea of subsetting the list because the subsetting operation does not happen “in place”. It’s the “in-place-ness” that’s getting you in trouble with the current implementation.

paulinpaloalto · November 29, 2023, 8:39pm

I experimented a bit and I have not yet figured out how to get “logical indexing” to work with lists, but it works with arrays. Here’s an example:

import numpy as np

np.random.seed(42)
A = np.random.randint(0, 10, (12,))
print(type(A))
print(f"A = {A}")
B = A[A < 6]  # boolean mask keeps only the elements less than 6
print(type(B))
print(f"B = {B}")

<class 'numpy.ndarray'>
A = [6 3 7 4 6 9 2 6 7 4 3 7]
<class 'numpy.ndarray'>
B = [3 4 2 4 3]

Of course this is not exactly the problem you are trying to code, but I’m just showing the technique without writing out your solution for you. The point of that technique is how clean and expressive the code is.
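
For plain lists, one rough analogue of a boolean mask in the standard library is itertools.compress, which keeps the items whose matching selector is truthy. This is only a sketch of the idea, with `A_list`, `mask` and `B_list` as illustrative names; it is not part of the assignment.

```python
from itertools import compress

A_list = [6, 3, 7, 4, 6, 9, 2, 6, 7, 4, 3, 7]

# Build a boolean mask, one entry per element, then select with it.
mask = [x < 6 for x in A_list]
B_list = list(compress(A_list, mask))

print(B_list)  # [3, 4, 2, 4, 3]
```

The input list is left unchanged, which is the property that matters here.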

Here’s an example of how to use an enumeration (a list comprehension) that works with a list:

Alist = A.tolist()  # convert to a list of plain Python ints
print(type(Alist))
print(f"Alist = {Alist}")
C = [x for x in Alist if x < 6]  # builds a new list; Alist is untouched
print(type(C))
print(f"C = {C}")

<class 'list'>
Alist = [6, 3, 7, 4, 6, 9, 2, 6, 7, 4, 3, 7]
<class 'list'>
C = [3, 4, 2, 4, 3]

That code has the advantage that it does not use “in-place” operations on the input list. I guess it’s a matter of taste whether you think that’s cleaner code than explicitly writing it as a for loop that appends to a new output list under the appropriate condition. But it is pretty “pythonic” FWIW.
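
Applied to files, the same non-in-place filtering might look like the sketch below. It creates a throwaway directory with tempfile so it can run anywhere; the file names and the helper `non_empty_files` are made up for illustration and are not the graded solution.

```python
import os
import tempfile


def non_empty_files(directory):
    """Returns the names of files in `directory` whose size is greater than zero."""
    return [
        name
        for name in sorted(os.listdir(directory))
        if os.path.getsize(os.path.join(directory, name)) > 0
    ]


with tempfile.TemporaryDirectory() as tmp:
    # Three files with content and three empty ones.
    for i in range(3):
        with open(os.path.join(tmp, f"full{i}.jpg"), "wb") as fh:
            fh.write(b"data")
        open(os.path.join(tmp, f"empty{i}.jpg"), "wb").close()

    kept = non_empty_files(tmp)
    print(kept)  # ['full0.jpg', 'full1.jpg', 'full2.jpg']
```

Using os.path.join also avoids depending on whether the directory path ends with a slash, which plain string concatenation does.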
