C1W2 Ungraded Lab - ImageDataGenerator flow_from_directory Doubt

Hi!
In the following example taken from the exercise, we use class_mode='binary' with the flow_from_directory method even though there are 3 classes.

train_generator = train_datagen.flow_from_directory(
        '/tmp/data/imbalanced/train',
        target_size=(150, 150),
        batch_size=32,
        class_mode='binary')

What is the difference between using “binary” and “categorical” in this kind of project, where we have more than 2 classes? Shouldn’t it be “categorical”?

Thank you and best regards!


Your understanding of class_mode is correct. I’ve asked the staff to fix the notebook.

It turns out that train_generator encodes labels as integers when there are more than 2 classes. Here’s the output of np.unique(train_generator.labels):

0
1
2
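For context, here is a minimal sketch of where integer labels like these come from. It assumes, as the Keras documentation describes, that flow_from_directory treats each subdirectory as a class and numbers the classes in alphanumeric order. The folder names below are hypothetical, not the lab's actual ones, and the code only imitates the index assignment using the standard library.

```python
import os
import tempfile

# Hypothetical class folders; the lab's actual folder names may differ.
class_names = ["dogs", "birds", "cats"]

with tempfile.TemporaryDirectory() as root:
    for name in class_names:
        os.makedirs(os.path.join(root, name))

    # Collect the subdirectories and sort them, then number them in order.
    found = sorted(
        d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d))
    )
    class_indices = {name: index for index, name in enumerate(found)}

print(class_indices)
```

With three folders you end up with three distinct integer labels, regardless of what class_mode is called.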

The bug is in keras.preprocessing.image.DirectoryIterator since it doesn’t check for compatibility between the number of classes and binary class mode.

keras.preprocessing.image.DataFrameIterator does check for the condition you pointed out.

Could you please file a ticket with keras on github?

Thanks.


Hi @Vilabella! Did you try using categorical? Because I’m pretty sure it shouldn’t work with the loss function used, but let me know. I might be wrong about that.

I agree that this is a poor naming choice, because it gives the impression that only two classes are supported, while in reality it produces 1-D integer encodings for any number of classes. If you use categorical you will get one-hot (2-D) encodings of the classes, and if I remember correctly that is incompatible with the loss function being used. Which one to use really depends on your loss function.

@balaji.ambresh I don’t know if this is a bug per se or just a bad naming convention that they decided to use.

This is somewhat anecdotal, but the other day I was reading an article that said ImageDataGenerator was never meant to be used in production, just as a convenience tool for testing things, yet it ended up being used everywhere and is now pretty much the standard for these kinds of tasks. Looking at the docs, it looks like they are deprecating it, so we will probably update this notebook in the near future.
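To make the point about label encoding and loss functions concrete, here is a rough plain-Python sketch. The function names mirror the Keras loss names but are hypothetical stand-ins written for illustration, not the Keras implementations. The predicted probabilities are made up.

```python
import math


def one_hot(label, num_classes):
    """Turn an integer label into a one-hot list, e.g. 2 -> [0.0, 0.0, 1.0]."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def sparse_categorical_crossentropy(label, probs):
    """Cross-entropy for an integer label: pick the probability at that index."""
    return -math.log(probs[label])


def categorical_crossentropy(one_hot_label, probs):
    """Cross-entropy for a one-hot label: weighted sum over all classes."""
    return -sum(t * math.log(p) for t, p in zip(one_hot_label, probs))


probs = [0.1, 0.2, 0.7]  # made-up softmax output for one image, 3 classes
label = 2                # integer label, as produced for 3 class folders

sparse_loss = sparse_categorical_crossentropy(label, probs)
dense_loss = categorical_crossentropy(one_hot(label, 3), probs)

print(sparse_loss, dense_loss)
```

Both functions measure the same quantity. The difference is only the label format each one expects, which is why the label encoding and the loss have to be chosen together.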


@a-zarta

This is a bug since the expectation for the class labels is to be 1D binary labels as per the documentation: “binary” will be 1D binary labels. 0, 1 and 2 are not binary labels. Besides, the explanation of categorical says that the encoding is 2D one hot.

DataFrameIterator does check the number of classes (see the code reference in my previous post). When using the flow_from_dataframe method, say with the mnist digits dataset and class_mode="binary", you’ll notice an error message like ValueError: If class_model="binary" there must be 2 classes. Found 10 classes. This check is not performed when using flow_from_directory.
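As an illustration only, a check of this kind could look like the sketch below. check_class_mode is a hypothetical helper written for this post, not the actual keras code, and its error message is my own wording.

```python
def check_class_mode(class_mode, num_classes):
    """Hypothetical helper: allow class_mode="binary" only with exactly 2 classes."""
    if class_mode == "binary" and num_classes != 2:
        raise ValueError(
            f'class_mode="binary" needs exactly 2 classes, got {num_classes}.'
        )


# These combinations pass silently.
check_class_mode("binary", 2)
check_class_mode("sparse", 3)

# Three classes with "binary" is rejected.
try:
    check_class_mode("binary", 3)
    rejected = False
except ValueError as err:
    rejected = True
    print(err)
```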

ImageDataGenerator, as you mentioned, is not recommended for production use since it doesn’t use tensors to perform operations. You can read about it here, where the preprocessing package is deprecated in favor of tf.keras.utils.text_dataset_from_directory and tf.keras.utils.image_dataset_from_directory. Unfortunately, the exam requires the candidate to know the ImageDataGenerator API.

Please see this staff ticket.

It’d be good to let the keras team know about this bug. If either of you agrees that this is an issue, I’d be happy to file a ticket with the keras team.

Oh alright, then it sounds like this is a bug. Sure, I see no problem if you decide to file an issue with the keras team.

The notebook has been updated to use “sparse” instead.

Thanks for the update.