Course 4 Week 2 Programming Assignment 1 Optional Question

Did anyone get reasonably correct results on their own images with the model that classifies images of numbers in sign language?

The outputs of the model on my own images are completely incorrect. Of the six images I tried, ALL were incorrect. I even tried rotating my images clockwise by 90 degrees, but got the same result. I am not sure if I am doing something wrong. It does not seem to be the lighting or clarity of the picture, as mentioned in the note for the question. That seems surprising for a model that claims to have greater than 70% accuracy.

Centering, lighting, scaling, color tones, and overall image characteristics are important when providing your own images.

Note that this is Week 1 Assignment 2, not Week 2 Assignment 1. I will edit the title using the little “edit pencil” to correct that and the tag.

There are several possibilities here. The first thing to check is that you performed the same scaling on your sample input images as was done with the training data, as shown in this cell that was given to you:

X_train = X_train_orig/255.
X_test = X_test_orig/255.
Y_train = convert_to_one_hot(Y_train_orig, 6).T
Y_test = convert_to_one_hot(Y_test_orig, 6).T
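
For example, here is a rough sketch of feeding a single custom image through a model with that same scaling applied. The file name is made up, `model` stands for whichever model you are testing, and the Keras `image` utilities are assumed to be the ones imported in the notebook:

```python
import numpy as np
from tensorflow.keras.preprocessing import image  # assumed to match the notebook's imports

img = image.load_img("my_hand_sign.jpg", target_size=(64, 64))  # hypothetical file name
x = image.img_to_array(img)      # shape (64, 64, 3), values 0..255
x = np.expand_dims(x, axis=0)    # shape (1, 64, 64, 3)
x = x / 255.                     # same scaling as the training data above
prediction = model.predict(x)    # `model` = whichever model you are testing
print("Predicted class:", np.argmax(prediction))
```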

If you did that correctly and still got no correct predictions, then the other more general possibility is that the dataset here is not large enough to get a generalizable model for an image problem like this. We have only 1080 training samples.

Also consider Tom’s points about how similar the composition of your sample images is to the ones in the training set. Print a few of both and visually compare them.
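
As a sketch of that visual comparison (assuming matplotlib, the notebook's X_train array, and a preprocessed image of your own in a (1, 64, 64, 3) array called `x`):

```python
import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 2)
axes[0].imshow(X_train[0])        # a sample from the training set (already scaled to 0..1)
axes[0].set_title("training sample")
axes[1].imshow(x[0])              # your own image after resizing and the same scaling
axes[1].set_title("my image")
plt.show()
```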

Thank you @paulinpaloalto and @TMosh for your responses.

I rechecked, and I am actually asking about Week 2 Assignment 2. This is the one for building a ResNet and training it on the SIGNS dataset.

I will try to compare and match up the scaling of my samples with the training set.

Oh, sorry, you’re right that the Residual Networks assignment also uses the same “signs” dataset as the W1 A2 assignment does. Residual Networks is W2 A1. I restored the title to the way you had it. Sorry again.

But the points that Tom and I made are still applicable. Let us know what you think after considering those points.

Just refreshing my memory on the Residual Networks assignment, the model we train in the notebook for 10 epochs gets 95% training accuracy and 80% test accuracy. Notice that they also give us a pre-trained model that produces 95% test accuracy. They don’t give very much detail about how that pre-trained model was trained. E.g. did it use a larger training set than the 1080 sample version we used? Or did they just run the training for a lot more epochs?

In your experiments, were you using the pre-trained model or the model that you actually trained in the notebook for 10 epochs?

Looks like this optional exercise uses the pre_trained_model. The code was already there. I was just updating the input image name and running the code.
prediction = pre_trained_model.predict(x2)
The code seems to set the target image size to 64 x 64. Maybe my aspect ratio is not correct, as the image was uploaded as-is from a mobile phone.
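
To see how much the 64 x 64 resize distorts a non-square phone photo, a quick check like this might help (a sketch using Pillow; the file name is made up):

```python
from PIL import Image

img = Image.open("my_phone_photo.jpg")  # hypothetical file name
print("original size:", img.size)       # e.g. (4032, 3024) for a typical phone photo
small = img.resize((64, 64))            # the same squash that a 64 x 64 target size applies
small.show()                            # eyeball whether the hand still looks plausible
```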

Well, that given code block displays the image in its 64 x 64 reduced form. What does that look like with your cellphone created image? You could paste a screenshot here. In principle, one wouldn’t think that aspect ratio would make that much difference in recognizing shapes that are pretty distinctive as in this case, but maybe my intuition there is just incorrect.

You could also perform some more direct experiments with aspect ratio: at least on an iPhone, you can crop an image in the Photos app. So take a picture of your hand and then crop the image so that it is essentially square, but still contains your complete hand. Then the downsampling to 64 x 64 shouldn't affect the aspect ratio. Does that make a difference?
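
If you would rather do the crop programmatically than in the Photos app, a rough Pillow sketch might look like this (file name is made up; adjust the crop box so the whole hand stays in frame):

```python
from PIL import Image

img = Image.open("my_phone_photo.jpg")   # hypothetical file name
w, h = img.size
side = min(w, h)                         # center-crop to a square to preserve aspect ratio
left = (w - side) // 2
top = (h - side) // 2
square = img.crop((left, top, left + side, top + side))
square.resize((64, 64)).save("my_phone_photo_square_64.jpg")
```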

Science! :nerd_face:

Supposed to be 0; Classified as 2 (not sure why?)

[Image: the "0" hand sign, shown in its 64 x 64 reduced form]

Original image: 1183x1183 resolution

Interesting. I would say that your picture looks very similar to the ones they show from the training dataset.

Do you have more examples of misclassifications? Is there any pattern? E.g. everything gets classified as 2?

I can provide examples if you would like, but each one is getting misclassified to a different class (not necessarily 2). Either there is a problem with the model (it is pre-trained, though, so nothing I could have done wrong there!), or I am doing something else wrong.

0 classified as 2 (incorrect), already shown above
1 classified as 1 (correct)
2 classified as 1 (incorrect)
3 classified as 2 (incorrect)
4 classified as 2 (incorrect)
5 classified as 2 (incorrect)

Thanks very much for the detailed response. It looks like you did a great job with creating the square images.

My next thought was to try the pre-trained model with some of the actual training images and see how it does. Then I took a look under the “images” subdirectory and they give you 3 other images for 1, 3 and 5. I tried predicting them and (just like in your examples) only the sample for 1 gets predicted correctly.
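
Roughly what I ran, as a sketch (the actual file names and extensions under the "images" directory may differ, so this just globs whatever is there rather than hard-coding names):

```python
import glob
import numpy as np
from tensorflow.keras.preprocessing import image

for img_path in sorted(glob.glob("images/*.jpg")):
    img = image.load_img(img_path, target_size=(64, 64))
    x = np.expand_dims(image.img_to_array(img), axis=0) / 255.
    pred = pre_trained_model.predict(x, verbose=0)
    print(img_path, "-> predicted class", np.argmax(pred))
```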

But there is also something odd going on with how the image library downsamples those images. I looked at them in their native state, and they are getting rotated by the "load + downsample" step. I am not sure whether that might be part of the problem. Do you see your images in the correct orientation when you run that prediction cell, as I showed above?
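
One hedged guess: phone photos often carry an EXIF orientation tag that some image loaders ignore, which can produce exactly this kind of rotation. If that is what is happening, normalizing the orientation before any resizing might be worth a try (a Pillow sketch, with a made-up file name):

```python
from PIL import Image, ImageOps

img = Image.open("my_phone_photo.jpg")  # hypothetical file name
img = ImageOps.exif_transpose(img)      # applies the EXIF orientation tag, if present
img.save("my_phone_photo_upright.jpg")
```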

Hello @paulinpaloalto, thank you for taking a detailed look at the issue. In the beginning, when I was uploading my images at a 16:9 aspect ratio (as-is from the mobile phone), they were getting rotated 90 degrees counterclockwise, exactly as you have shown.

I tried rotating that 16:9 image 90 degrees clockwise (because the output seemed to be rotated 90 degrees counterclockwise) to see if that would help. Interestingly, the image shown was still the same (now rotated 180 degrees counterclockwise).

When I tested completely square (1:1) aspect ratio images, there was no rotation. There was downsampling, which is expected, but the outputs were incorrect, as I mentioned before. Out of 0 through 5, only 1 was classified correctly.

I am a bit surprised, as the validation script shows close to 95% accuracy. It would be interesting to see the test set that was used to arrive at this 95% figure.
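
One sanity check would be to evaluate the pre-trained model directly on the X_test / Y_test arrays from the notebook (a rough sketch, assuming those variables are still loaded and the loaded model was compiled with an accuracy metric):

```python
# Assumes pre_trained_model, X_test, and Y_test are defined as in the notebook.
loss, acc = pre_trained_model.evaluate(X_test, Y_test, verbose=0)
print("test accuracy:", acc)
```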

You can display individual test set images like this:
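A rough sketch of that (assuming matplotlib and the notebook's X_test / Y_test arrays):

```python
import numpy as np
import matplotlib.pyplot as plt

i = 0                                     # pick any index into the test set
plt.imshow(X_test[i])                     # X_test is already scaled to 0..1
plt.title("label: " + str(np.argmax(Y_test[i])))
plt.show()
```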

I ran this code and the only output class I got was 0. There is definitely something wrong that I am not able to understand. This is with the pre_trained_model that is simply loaded. I get the same "Class = 0" result if I use the model that I built as part of the assignment and for which I got a 100% grade.

```python
print(Y_test.shape)
for i in range(len(X_test)):
    img = X_test[i, :, :, :]
    # imshow(img)
    # show()
    x = image.img_to_array(img)
    x = np.expand_dims(x, axis=0)
    x = x / 255.0
    x2 = x
    # print('Input image shape:', x.shape)
    prediction = pre_trained_model.predict(x2, verbose=0)
    # or:
    # prediction = model.predict(x2, verbose=0)
    # print("Class prediction vector [p(0), p(1), p(2), p(3), p(4), p(5)] = ", prediction)
    print("Sample #" + str(i + 1) + " = Class: " + str(np.argmax(prediction)) + "; Expected Y: " + str(Y_test[i]))
```

Try just using “p = model(X_test)”, without a for-loop or any of your pre-processing, and then use argmax on axis = 1.

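Something along these lines (a sketch of the suggestion above, assuming the notebook's X_test / Y_test are in scope):

```python
import numpy as np

p = model(X_test)                          # or pre_trained_model(X_test); X_test is already scaled
pred_classes = np.argmax(p, axis=1)
true_classes = np.argmax(Y_test, axis=1)
print(pred_classes[:20])
print("accuracy:", np.mean(pred_classes == true_classes))
```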
