Alpaca not recognized as llama?

Why is an alpaca recognized mostly as ‘window screen’ or ‘jigsaw puzzle’ by MobileNet pretrained on ImageNet, but never as a llama (using just the original last block of code in section 3.1 of the Jupyter notebook)? The comment there says: “This is because MobileNet pretrained over ImageNet doesn’t have the correct labels for alpacas, so when you use the full model, all you get is a bunch of incorrectly classified images.” But that does not explain why it is not recognized even once as a llama, which is very similar to an alpaca, rather than as a bunch of nonsense. Is it because the image resolution, or the input distribution in general, is very different from what the original classifier was trained on, so that it sees the image as if through a window screen or a puzzle? Even then I would expect at least some llama hits, but there are none, even with top=5…
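
For reference, this is roughly what I’m running — not the notebook’s exact code, but a minimal reconstruction assuming the standard tf.keras MobileNetV2 API, with a placeholder image path:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.applications.mobilenet_v2 import MobileNetV2, decode_predictions

# Full pretrained classifier, ImageNet head included.
model = MobileNetV2(weights="imagenet")

# "alpaca.jpg" is a placeholder path; pixels stay raw in [0, 255].
img = tf.keras.preprocessing.image.load_img("alpaca.jpg", target_size=(224, 224))
x = np.expand_dims(tf.keras.preprocessing.image.img_to_array(img), axis=0)

# No normalization step -- this is the setup that produces the odd labels.
print(decode_predictions(model.predict(x), top=5))
```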

Interesting question! It’s not a resolution issue as such, because there is no such thing as a model that handles arbitrary resolutions. Every model is trained on a specific image size and format, and it either works on inputs of that size and format or it doesn’t. If you want to feed it images of a different size or format, you first have to convert them to the format it was trained on.
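
To make that concrete, here is a minimal sketch of such a conversion, assuming the tf.keras MobileNetV2 API (the helper name is mine):

```python
import tensorflow as tf

def to_model_format(image):
    # Resize to the spatial size the network expects.
    image = tf.image.resize(image, (224, 224))
    # Map raw [0, 255] pixels into [-1, 1], the range MobileNetV2 was trained with.
    return tf.keras.applications.mobilenet_v2.preprocess_input(image)
```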

You’re right that you would intuitively expect it to label alpacas as some other type of animal it was trained on, rather than as random unrelated objects. An alpaca looks more like a llama or a camel than it does a window screen. Are you sure you’re using the complete model? Did they tell us anything about the properties of that pre-trained model? What is its prediction accuracy on the training and test sets that were used to train it? It might be worth doing a visual comparison between a sampling of the training set and our alpaca images. Is there anything obviously different about the two datasets other than the animals themselves?

We need more information and investigation to answer your questions, but I agree there is something here that needs explaining.

Another possible source of trickery is the translation from class index to human-readable text at the very end. I can’t see the code since I don’t have an active subscription, but often a simple dictionary lookup is used for this at the end of an exercise. If the dictionary were different, the human-readable label would be wrong even if the numeric class index were correct.
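
Purely for illustration (I’m guessing at the mechanism, not quoting the course code): in Keras the mapping ships as imagenet_class_index.json, and decode_predictions is essentially this dictionary lookup plus sorting. Both file paths below are placeholders:

```python
import json
import numpy as np

# Assumed local copy of the mapping Keras downloads behind the scenes.
with open("imagenet_class_index.json") as f:
    class_index = json.load(f)   # e.g. {"355": ["n02437616", "llama"], ...}

preds = np.load("preds.npy")     # stand-in for the output of model.predict(...)
top5 = np.argsort(preds[0])[::-1][:5]
# If this dictionary were wrong, these labels would be wrong too,
# even with a correct numeric class index.
print([class_index[str(i)] for i in top5])
```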

Thanks for your prompt answers! I believe I know the answer now: the code in that section of the programming assignment does not apply any input preprocessing (normalization). If I simply divide the pixel values by 255, I get a lot of llama and camel predictions, as expected!
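
In case it helps anyone else, here is roughly the fix, sketched against the standard tf.keras MobileNetV2 API with a placeholder image path — either the simple division by 255, or the model’s own preprocess_input, which maps pixels to [-1, 1]:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.applications.mobilenet_v2 import (
    MobileNetV2, decode_predictions, preprocess_input)

model = MobileNetV2(weights="imagenet")

# "alpaca.jpg" is a placeholder path for one of the exercise images.
img = tf.keras.preprocessing.image.load_img("alpaca.jpg", target_size=(224, 224))
x = np.expand_dims(tf.keras.preprocessing.image.img_to_array(img), axis=0)

# Simple rescaling to [0, 1], as described above...
print(decode_predictions(model.predict(x / 255.0), top=5))
# ...or the model's own preprocessing, which maps pixels to [-1, 1].
print(decode_predictions(model.predict(preprocess_input(x.copy())), top=5))
```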

I think it may be useful to adjust the programming exercise accordingly, i.e. add input normalization in section 3.1 (and adjust the text as well)? Just to note: 3.1 is a section with no graded functions or exercises; it just shows the creation and predictions of the pre-trained model, but it fails to apply input normalization.
