Trained model on CIFAR-10 performs poorly on real images

So I’m trying to train a model on the CIFAR-10 dataset.

The problem is that while the model’s performance on the validation and test sets is good (about 95–96%), it fails to predict images downloaded from the internet (preprocessed the same way as the training inputs). I know there are a lot of similar questions already; in fact, I’ve already tried implementing those suggestions, but none of them has worked for me so far. I don’t know what I did wrong.

Here’s my approach:

  • Data: train – 80%, validation – 10% (passed as val_dataset to model.fit), test – 10% (passed to model.evaluate)
  • Model (transfer learning): I use ResNet50 as the base model with “imagenet” weights, and I didn’t freeze them.
    UpSampling2D((7,7)) to resize images from 32x32x3 to 224x224x3 → ResNet50 → Flatten → Dense (relu) → Dropout → Dense(10, softmax). (A rough sketch of this is below.)
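
A minimal Keras sketch of that architecture (the Dense width, dropout rate, optimizer and loss are assumptions here, since I haven’t listed them above):

import tensorflow as tf

base = tf.keras.applications.ResNet50(include_top=False, weights="imagenet",
                                      input_shape=(224, 224, 3))
base.trainable = True  # base layers are not frozen, as described above

model = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 3)),
    tf.keras.layers.UpSampling2D(size=(7, 7)),        # 32x32x3 -> 224x224x3
    base,
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation="relu"),    # width is an assumption
    tf.keras.layers.Dropout(0.5),                     # rate is an assumption
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",                        # assumption
              loss="sparse_categorical_crossentropy",  # assumption: integer labels
              metrics=["accuracy"])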

Is it because the resolution of the images in this dataset is too low, while the resolution of the images downloaded from the internet is much higher? I don’t know how to describe this, but even after resizing the downloaded images to 32x32x3 to feed into model.predict, they still look somewhat different from the images in the dataset.

An image from the dataset: [image]

Downloaded image (original): [image]

Downloaded image (resized to 32x32x3): [image]

Is that the issue? If not, can you tell me what I did wrong, or what I could do so that my model can perform well on real life images?

Many thanks.

Update: how I downsample the downloaded images (after the suggestions in the replies):

import numpy as np
import tensorflow as tf
from PIL import Image

# loading the image
img_path = "image.jpg"
im_array2 = np.array(Image.open(img_path))

IMG_SIZE = 32

# calculate the dimensions to rescale to while keeping the aspect ratio;
# tf.image.resize expects the size as (new_height, new_width)
r = IMG_SIZE / im_array2.shape[0]
dim = (IMG_SIZE, int(im_array2.shape[1] * r))

test_image = np.floor(tf.image.resize(im_array2, dim)).astype(int)

# center-crop (or pad) the image to 32x32
test_image = tf.image.resize_with_crop_or_pad(test_image, IMG_SIZE, IMG_SIZE)

# convert to float32 because that is the input dtype I set;
# expand_dims because the model expects a batch of images
im = np.float32(test_image)
im_array2 = np.expand_dims(im, axis=0)

# normalize the image array to [0, 1] to match the training inputs
im_array2 = im_array2 / 255

New result: [image]

But the performance still hasn’t improved.
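
For completeness, this is roughly how I read out the prediction on that preprocessed image (the class-name list assumes the standard CIFAR-10 label order):

# standard CIFAR-10 class names, in label order
class_names = ["airplane", "automobile", "bird", "cat", "deer",
               "dog", "frog", "horse", "ship", "truck"]

probs = model.predict(im_array2)           # shape (1, 10)
pred = int(np.argmax(probs, axis=-1)[0])   # index of the highest-probability class
print(class_names[pred], float(probs[0, pred]))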

Hello @CheeseCheddar

Can I know whether you froze all the layers in the base model? And where is your new model?

Also, in your model.fit step, how many epochs did you train for?

Also, try using a sigmoid activation.

Regards
DP

Hi @Deepti_Prasad, thanks for your interest in helping me.

I didn’t freeze the layers of my base model. In my previous attempts I did freeze them, but the model still failed to perform well on real images, so I tried not freezing them; the problem persists either way. (Is this the right problem-solving mindset for this task?) The model architecture I’m using is
UpSampling2D((7,7)) to resize images from 32x32x3 to 224x224x3 → ResNet50 (without freezing the layers) → Flatten → Dense (relu) → Dropout → Dense(10, softmax).
Or did you mean the detailed code for the model?

I implemented early stopping, so training stopped at epoch 32, with the best weights obtained at epoch 24 (val_accuracy: 0.9630; accuracy from model.evaluate on the unseen test dataset: 0.9589).
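
For reference, my early-stopping setup looks roughly like this (the monitored metric and patience value are assumptions here; restore_best_weights is what gives back the epoch-24 weights):

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_accuracy",     # assumption: which metric I monitor
    patience=8,                 # assumption: actual patience value
    restore_best_weights=True,  # restores the best (epoch-24) weights
)

history = model.fit(train_dataset,                # training tf.data.Dataset (name assumed)
                    validation_data=val_dataset,  # validation tf.data.Dataset (name assumed)
                    epochs=100,                   # upper bound; training stopped at epoch 32
                    callbacks=[early_stop])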

Please correct me if I’m wrong, but does the fact that this model also performs well on the test dataset (unseen data) mean that it is working as it is supposed to, and that the problem lies in the downloaded images not coming from the same distribution as the dataset, so that merely resizing them is not enough preprocessing? Or am I making some fundamental mistake somewhere in the model design?

Thank you in advance!

Hello @CheeseCheddar

Did you complete Course 4 of the TensorFlow Advanced specialisation?

Although you did share this part, I didn’t notice your model.compile statement where you apply the optimizer, metric and loss.

Also, the last Dense layer where you applied softmax means you have more than two classes; if your images are only about cats, that would be incorrect. I hope your images include two categories, such as cats and dogs, in which case you would apply a sigmoid activation and not softmax.

You are training on a small batch but using a more complex model. Even if you are using the same model, I am not sure your model architecture is completely correct.

In the assignment where we use CIFAR-10, the batch size and the shuffle buffer size were used as parameters, and the CIFAR-10 model was used.

Then, as you have used transfer learning, there is no bottleneck layer, and you are supposed to use a convolution layer with padding: no flattening layer or dense layer.

Then use model.compile and apply the optimiser, loss and metric.

Make sure that along with the epochs you also include training steps and validation steps, like this:

# parameters (feel free to change this)
train_steps = 50000 // BATCH_SIZE
val_steps = 10000 // BATCH_SIZE
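
A rough sketch of how those steps would go into model.fit (the dataset names and the epoch count are just placeholders):

model.fit(train_dataset,                # placeholder tf.data.Dataset
          validation_data=val_dataset,  # placeholder tf.data.Dataset
          epochs=10,                    # placeholder
          steps_per_epoch=train_steps,
          validation_steps=val_steps)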

Regards
DP