I uploaded my own test image (2 fingers), but the pre-trained ResNet model gave an incorrect result. It outputs y = 2 while the correct output should be y = 1.
It is mentioned that the model might give incorrect results on one's own test images, and we are asked to identify the cause.
It gives us the following hint: it might be related to some distributions.
However, I am unable to figure out what this means. Can someone explain the cause of the error?
The training data looks pretty clean and clear. So, if we train our network on this data, then at inference time it expects inputs from the same distribution. That's what Tom suggested.
So, the next question is: what can we do during training to cover a broader range of input data?
Here is one paper that worked on sign recognition.
They used different models with different sizes of data, and found that the weight initializer and data augmentation could improve the F score, which is computed from true positives, false positives, and false negatives (it is the harmonic mean of precision and recall; higher is better).
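To make the metric concrete, here is a small sketch of the F1 score computed from confusion-matrix counts (the function name and the example counts are illustrative, not from the paper):

```python
def f1_score(tp, fp, fn):
    """F1 score from confusion-matrix counts: the harmonic mean of
    precision (tp / (tp + fp)) and recall (tp / (tp + fn))."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Equivalent closed form: 2*tp / (2*tp + fp + fn)
print(round(f1_score(90, 10, 20), 4))  # 0.8571
```

Note that true negatives do not appear in the formula, which is why F1 is often preferred over accuracy for imbalanced classes.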
So, we could try an initializer other than random_uniform for identity_block. And, we can also slightly transform the data with zoom, rotation, height/width shifts, and channel shifts (color).
Here is an example.
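A minimal sketch of such augmentation, assuming Keras's ImageDataGenerator (the specific ranges below are illustrative and should be tuned for your dataset):

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Hypothetical augmentation settings covering the transforms mentioned above.
datagen = ImageDataGenerator(
    rotation_range=15,         # random rotations up to 15 degrees
    width_shift_range=0.1,     # horizontal shift up to 10% of width
    height_shift_range=0.1,    # vertical shift up to 10% of height
    zoom_range=0.1,            # random zoom in/out by up to 10%
    channel_shift_range=20.0,  # random color (channel) shifts
)

# Stand-in for the (m, 64, 64, 3) training tensor; random data here.
X_train = np.random.randint(0, 256, size=(8, 64, 64, 3)).astype("float32")
y_train = np.random.randint(0, 6, size=(8,))

# flow() yields augmented batches that can be passed to model.fit(...)
batch_x, batch_y = next(datagen.flow(X_train, y_train, batch_size=4))
print(batch_x.shape)  # (4, 64, 64, 3)
```

Each epoch then sees slightly different versions of the same images, which broadens the input distribution without collecting new data.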
But I think this is not enough. A bigger challenge is the very small image size, 64x64x3.
Based on the above paper, we probably need 224x224x3, or at least 192x192x3, to get high-quality results.
Thank you @anon57530071 for the explanation.
Just to paraphrase what you said: we could use different initialization and data augmentation during training to improve our model.
I also think that the small size of the image might be a problem here.
Is it because the ResNet model we are using was pre-trained on higher-resolution images, so when we give it a lower-resolution image to predict, it is likely to predict incorrectly?
Yes, different weight initialization and data augmentation can improve our model.
And, as you thought, image size is a big problem. The default input size for ResNet50 in TensorFlow is 224x224x3. (VGG19 uses the same.)
For your second question: since we write the weight initializers ourselves, this is our own model trained from scratch, not a pre-trained model.
Of course, you can use a pre-trained model like this.
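A minimal transfer-learning sketch, assuming tf.keras and its applications module; the Dense head and the 6-class count (digits 0-5 in the sign dataset) are illustrative:

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import ResNet50

# Pass weights="imagenet" to download the pre-trained ImageNet weights;
# weights=None (used here so the sketch runs offline) gives random init.
base = ResNet50(weights=None, include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # freeze the backbone when fine-tuning pre-trained weights

# Hypothetical classification head for the 6-class sign dataset.
model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(6, activation="softmax"),
])
print(model.output_shape)  # (None, 6)
```

With the backbone frozen, only the small head is trained, which is much cheaper than training the full ResNet from scratch.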
Typically, educational courses use small images due to hardware and time constraints. For our own testing, it may be better to use slightly larger images.