Doesn't Data Augmentation via Keras Preprocessing layers also augment the validation set?

As I understand it, data augmentation should only be used for the train set.

However, as I understand the code in the C4 Week 2 “Transfer Learning with MobileNetV2” exercise, it implements data augmentation on both the train and validation sets.

The data_augmenter() sequential model includes RandomFlip and RandomRotation layers.

This data_augmenter model then becomes a set of preprocessing layers placed in front of the frozen MobileNetV2 model used for transfer learning (see the model summary picture).
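Roughly, the relevant pieces look like the sketch below. The input shape, rotation factor, and classification head are assumptions for illustration; what matters here is that the augmentation layers are part of the model graph, in front of the frozen base:

```python
import tensorflow as tf
from tensorflow.keras.layers import RandomFlip, RandomRotation

def data_augmenter():
    # Sequential model of augmentation layers; the rotation factor is an assumed value
    return tf.keras.Sequential([
        RandomFlip('horizontal'),
        RandomRotation(0.2),
    ])

IMG_SHAPE = (160, 160, 3)  # assumed input size

# Frozen MobileNetV2 base used for transfer learning
base_model = tf.keras.applications.MobileNetV2(
    input_shape=IMG_SHAPE, include_top=False, weights='imagenet')
base_model.trainable = False

inputs = tf.keras.Input(shape=IMG_SHAPE)
x = data_augmenter()(inputs)                                 # augmentation baked into the model graph
x = tf.keras.applications.mobilenet_v2.preprocess_input(x)  # rescaling that MobileNetV2 expects
x = base_model(x, training=False)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(1)(x)                        # assumed binary-classification head
model = tf.keras.Model(inputs, outputs)
```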

When model.fit is called, aren’t both train_data and validation_data passed to it?

Doesn’t this mean that both train_data and validation_data are augmented?

Also, in the general case, now that Keras recommends using preprocessing layers instead of ImageDataGenerator (which can run independently on train and test data), doesn’t this mean that the train and validation data are always both augmented if you use the preprocessing-layer design paradigm?

In the TensorFlow Developer courses, you see the expected design pattern: you use ImageDataGenerator to preprocess the train data with augmentation (rotation, flip, etc.) and treat the validation data differently, with rescaling only.
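For reference, that older pattern looks roughly like this; the directory paths, image size, and augmentation parameters below are placeholders:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Training data: rescaling plus augmentation
train_datagen = ImageDataGenerator(rescale=1./255,
                                   rotation_range=40,
                                   horizontal_flip=True)

# Validation data: rescaling only, no augmentation
val_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory('data/train',
                                                    target_size=(160, 160),
                                                    batch_size=32,
                                                    class_mode='binary')
val_generator = val_datagen.flow_from_directory('data/validation',
                                                target_size=(160, 160),
                                                batch_size=32,
                                                class_mode='binary')
```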

Am I missing something? Do Keras preprocessing layers somehow not function on the validation dataset?


Ah. I figured it out.

It’s PEBCAK because I didn’t RTFM.

The documentation states that those specific layers are only active during training, not at inference time (when the validation dataset is being run).
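This is easy to check directly. A small sketch of the training-flag behavior (the image shape and rotation factor are arbitrary):

```python
import tensorflow as tf

aug = tf.keras.layers.RandomRotation(0.2)
images = tf.random.uniform((4, 160, 160, 3))

# During model.fit, training batches go through the layer with training=True,
# so the images are randomly rotated.
augmented = aug(images, training=True)

# Validation, evaluate(), and predict() call the layer with training=False,
# where RandomRotation (and RandomFlip) act as identity pass-throughs.
passed_through = aug(images, training=False)

print(tf.reduce_max(tf.abs(augmented - images)).numpy())      # > 0 in practice: images changed
print(tf.reduce_max(tf.abs(passed_through - images)).numpy())  # 0.0: images unchanged
```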


When model.fit is called, aren’t both train_data and validation_data passed to it?

Yes.
The use of the model.fit validation_data argument is described in the Keras Model documentation.
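Schematically, it is used like this (the directory path, split, and sizes are placeholders, and model is the transfer-learning model sketched earlier in the thread):

```python
import tensorflow as tf

# Placeholder directory, split, and sizes; variable names mirror the notebook's
train_dataset = tf.keras.utils.image_dataset_from_directory(
    'dataset/', validation_split=0.2, subset='training',
    seed=42, image_size=(160, 160), batch_size=32)
validation_dataset = tf.keras.utils.image_dataset_from_directory(
    'dataset/', validation_split=0.2, subset='validation',
    seed=42, image_size=(160, 160), batch_size=32)

model.compile(optimizer='adam',
              loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              metrics=['accuracy'])

# Both datasets are passed to fit; fit runs training batches with training=True
# and validation batches with training=False, which is what switches the
# augmentation layers off for the validation pass.
history = model.fit(train_dataset,
                    validation_data=validation_dataset,
                    epochs=5)
```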

I don’t see anywhere in the notebook that the validation set is augmented. Please post a screen capture image if you know of one.


I wasn’t saying that the validation data would be used for training; I was saying that, because the preprocessing is baked into the model pipeline rather than kept separate from the model, the validation data is run through the same preprocessing layers as the training data. I didn’t see a way around that.

However, the documentation states that those specific image data augmentation preprocessing layers do not run at inference time (see my comment above). I find that behavior confusing and unintuitive, but that’s how it works, I guess.
