In the first dog vs cat classifier Colab, I uploaded a few images of dogs and all of them were classified as cats (the softmax probability of a dog is always minuscule, e.g. 1e-38).
Example of image: https://cdn.britannica.com/16/234216-050-C66F8665/beagle-hound-dog.jpg
Did I miss something, or is there a problem?
PS: It will not run as is; I had to change the following line of code:
# from keras.preprocessing import image
from tensorflow.keras.preprocessing import image
I think it is a problem with the format of the image. It's a WebP, but if you convert the same image to PNG or JPG you will get the correct classification.
I think that all the images used in training are in .jpg format. It's good to use the same format at inference time if you want reliable predictions.
I made a jpeg and a png of this image. Neither one worked correctly. In fact, the probability was the same. And this image was just one example. I tried others, such as my own jpeg of a dog.
Did you try converting this picture to a JPEG and feeding it to the classifier?
When I download the original image I get the wrong result: it is classified as a cat. But when I take a screen capture of the image and save it as a .png, I get the correct result: the image is classified as a dog.
What accuracy are you getting with the validation images? I can't remember mine exactly, but it was more than 90%.
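If you want to recompute it, something like this should work, assuming the compiled model and the lab's validation generator are still in scope (the variable names follow the lab's pattern and may differ in your copy):

# Re-run evaluation on the validation set; returns [loss, accuracy]
# when the model was compiled with metrics=['accuracy'].
loss, accuracy = model.evaluate(validation_generator)
print('validation accuracy:', accuracy)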
PPS: Of note, the same image in WebP format is identified as a dog, which I expected. TensorFlow is Google's, and WebP is a Google creation, so I would have been a tad surprised if TensorFlow could not handle WebP images, and I would have expected an error message when loading the image if that had been the case.
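One quick way to check whether the file format actually changes what the model sees is to decode both versions and compare the raw arrays. A minimal sketch; the file names are placeholders, and the 150x150 target size is an assumption based on the lab:

import numpy as np
from tensorflow.keras.preprocessing import image

# Load the same picture saved in two formats (hypothetical file names).
img_jpg = image.load_img('beagle.jpg', target_size=(150, 150))
img_png = image.load_img('beagle.png', target_size=(150, 150))

x_jpg = image.img_to_array(img_jpg)
x_png = image.img_to_array(img_png)

# If the mean absolute difference is near zero, both formats decode
# to (almost) the same pixels, so the format is not what changes the prediction.
print(np.abs(x_jpg - x_png).mean())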
I think we may be facing a little bias in the training images. Have you ever heard of the tank recognition legend in deep learning?
“A cautionary tale in artificial intelligence tells about researchers training a neural network (NN) to detect tanks in photographs, succeeding, only to realize the photographs had been collected under specific conditions for tanks/non-tanks and the NN had learned something useless like time of day. This story is often told to warn about the limits of algorithms and importance of data collection to avoid “dataset bias”/“data leakage” where the collected data can be solved using algorithms that do not generalize to the true data distribution, but the tank story is usually never sourced.” https://gwern.net/tank
It's difficult to say what problem we have with the training dataset; maybe we have more cats with a green background and a sandy floor than dogs.
Sorry for introducing the misunderstanding about the image format!
Yes, I heard that story a long time ago.
I am sure there is a bias of some form. It’s interesting, though frustrating, to hit it on the first lab.
Thanks for the help.
Cheers.
Object recognition is quite tricky, because the algorithm looks at every pixel, but your human brain has already learned to filter out specific characteristics and identify the subject of the image.
An algorithm that can do that is extremely sophisticated.
Another classic dilemma is the “cat or not cat?” example. There are a nearly infinite number of things which are not cats, so creating such a training set is problematic.
Please fix a defect in “C3_W1_Lab_1_transfer_learning_cats_dogs”. The scaling of x is missing during prediction, while it is applied during training and validation. Please add the scaling (the third line below):
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x /= 255.  # add this line to fix the defect: match the 1./255 rescaling applied during training and validation
I tested the fix, and it works: cats are cats, and dogs are dogs in my testing set!
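For context, this is roughly what the whole prediction cell looks like with the fix in place. A sketch only: the 150x150 input size, the single sigmoid output where values above 0.5 mean dog, and the file name are assumptions based on the lab's pattern:

import numpy as np
from tensorflow.keras.preprocessing import image

path = 'beagle.jpg'  # hypothetical test image
img = image.load_img(path, target_size=(150, 150))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x /= 255.  # rescale to [0, 1], matching the training/validation generators

# model is the trained classifier from the lab.
result = model.predict(x)
print('dog' if result[0][0] > 0.5 else 'cat')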