Colab 1 classifies every picture as a cat

In the first dog vs cat classifier colab, I uploaded a few images of dogs and all of them were classified as cats (the softmax probability of a dog is always minuscule, e.g. 1e-38).
Example of image: https://cdn.britannica.com/16/234216-050-C66F8665/beagle-hound-dog.jpg
Did I miss something, or is there a problem?

PS: The notebook will not run as is; I changed the following line of code:

# from keras.preprocessing import image
from tensorflow.keras.preprocessing import image

Hi @FunMiles,

I think that it is a problem with the format of the image. It is a webp, but if you convert the same image to png or jpg you will get the correct classification.

I think that all the images used in training are in .jpg format. It is good to use the same format at inference if you want reliable predictions.

Regards!

I made a jpeg and a png of this image. Neither one worked correctly. In fact, the probability was the same. And this image was just one example. I tried others, such as my own jpeg of a dog.
Did you try converting this picture to a jpeg and feeding it to the classifier?

Oh yes! I tried!

When I download the original image I get the wrong result: it is classified as a cat. When I took a screen capture of the image and saved it as a .png, I got the correct result: it was classified as a dog.

What accuracy are you getting on the validation images? I can’t remember mine exactly, but it was more than 90%.

Accuracy with the validation images was over 95%.
I am attaching my screenshot:


And it gave me this:

Can you try with that image?

PS: This cropped version of the image is identified as a dog. So the new question is: what goes wrong with the original image vs this one?


PPS: For the record, the same image in webp format is identified as a dog, which I expected. TensorFlow is Google, webp is a Google creation, so I would have been a tad surprised if TensorFlow could not handle webp images, and I would have expected an error message when loading the image if that had been the case.

I cropped the image too in my first try!

I think that we may be facing a little bias in the training images. Have you ever heard of the tank recognition legend in deep learning?

“A cautionary tale in artificial intelligence tells about researchers training an neural network (NN) to detect tanks in photographs, succeeding, only to realize the photographs had been collected under specific conditions for tanks/non-tanks and the NN had learned something useless like time of day. This story is often told to warn about the limits of algorithms and importance of data collection to avoid “dataset bias”/“data leakage” where the collected data can be solved using algorithms that do not generalize to the true data distribution, but the tank story is usually never sourced.”
https://gwern.net/tank

It’s difficult to say what the problem with the training dataset is. Maybe it contains more cats than dogs photographed against a green background on a sandy floor.

Sorry for introducing the confusion about the image format!

Yes, I heard that story a long time ago.
I am sure there is a bias of some form. It’s interesting, though frustrating, to hit it on the first lab.
Thanks for the help.
Cheers.

Object recognition is quite tricky, because the algorithm looks at every pixel, but your human brain has already learned to filter out specific characteristics and identify the subject of the image.

An algorithm that can do that is extremely sophisticated.

Another classic dilemma is the “cat or not cat?” example. There are a nearly infinite number of things which are not cats, so creating such a training set is problematic.

Please fix a defect in “C3_W1_Lab_1_transfer_learning_cats_dogs”. The scaling of x is missing during prediction, while it is applied during training and validation. Please add the scaling (the third line below):

x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x /= 255.  # added: rescale pixels to [0, 1], as is done for training and validation

I tested the fix, and it works: cats are cats, and dogs are dogs in my testing set!
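
For anyone wondering why a missing rescale can produce probabilities as extreme as 1e-38, here is a minimal toy sketch. It is not the notebook's model: the three weights, the bias, and the pixel values are made up for illustration, and a single sigmoid unit stands in for the whole network. It only shows that feeding 0-255 values to weights that were fit on 0-1 inputs inflates the pre-activation by roughly a factor of 255, which saturates the sigmoid.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Hypothetical weights and bias, imagined as having been fit on inputs in [0, 1].
weights = [-0.5, 0.3, -0.4]
bias = 1.0


def predict(pixels):
    """Dog probability from a single sigmoid unit (toy stand-in for the model)."""
    z = sum(w * p for w, p in zip(weights, pixels)) + bias
    return sigmoid(z)


raw_pixels = [200, 180, 220]                    # as returned by img_to_array (0-255)
scaled_pixels = [p / 255.0 for p in raw_pixels]  # what the model saw during training

print("scaled input  :", predict(scaled_pixels))  # moderate probability
print("unscaled input:", predict(raw_pixels))     # saturated, vanishingly small
```

With the scaled input the toy unit gives a moderate dog probability; with the raw input the same weights push the output to essentially zero, which matches the symptom reported at the top of this thread.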

This is much more satisfying. I am marking it as the solution.
There is also a need, as mentioned in my original post, to update the import to:

from tensorflow.keras.preprocessing import image

Now the beagle is happily a dog. :slight_smile:

@Pere_Martra, are you able to submit a ticket to have this issue fixed?

Thanks @TMosh and @James_J_Johnson and @FunMiles

I’m opening a ticket with your findings, hope they can update the notebook soon!

The fix proposed by James_J_Johnson worked for me.

New data URL:

#data_url = "https://download.microsoft.com/download/3/E/1/3E1C3F21-ECDB-4869-8368-6DEBA77B919F/kagglecatsanddogs_3367a.zip"
data_url = "https://download.microsoft.com/download/3/E/1/3E1C3F21-ECDB-4869-8368-6DEBA77B919F/kagglecatsanddogs_5340.zip"

Code fixes in last cell:

import numpy as np
from google.colab import files
# from keras.preprocessing import image  # Removed: replaced by tf.keras.utils below
import tensorflow as tf                # Added

uploaded = files.upload()

for fn in uploaded.keys():
 
  # predicting images
  path = '/content/' + fn
  #img = image.load_img(path, target_size=(150, 150))  # Removed
  img = tf.keras.utils.load_img(path, target_size=(150, 150)) # Added
  #x = image.img_to_array(img)            # Removed
  x = tf.keras.utils.img_to_array(img)    # Added
  x /= 255.                               # Added
  x = np.expand_dims(x, axis=0)

  image_tensor = np.vstack([x])
  classes = model.predict(image_tensor)
  print(classes)
  print(classes[0])
  if classes[0]>0.5:
    print(fn + " is a dog")
  else:
    print(fn + " is a cat")
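
The thresholding at the end of that cell can also be pulled into a small helper, which makes it easier to see how confident the model is either way. This is only a sketch: `label_from_score` is a hypothetical name, not part of the notebook, and it assumes the model emits a single sigmoid score where values above 0.5 mean dog, as the cell above does.

```python
def label_from_score(score, threshold=0.5):
    """Map a single sigmoid output to a (label, confidence) pair.

    Assumes scores above `threshold` mean "dog", mirroring the cell above.
    """
    score = float(score)
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"expected a probability in [0, 1], got {score}")
    if score > threshold:
        return "dog", score
    # For the "cat" side, report the complementary probability.
    return "cat", 1.0 - score


# Example scores: the first mimics the saturated output seen before the fix.
for s in (1e-38, 0.97):
    label, confidence = label_from_score(s)
    print(f"score={s!r}: {label} (confidence {confidence:.3f})")
```

In the notebook this would presumably be called as `label_from_score(classes[0][0])`, since `model.predict` returns one row per image in the batch.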