Prediction accuracy is related to input size

Hello,
I created a simple neural network for handwritten digit recognition using the digits dataset in scikit-learn. I implemented functions like initialize_parameters, forward_propagation, gradient_descent and the other functions, like we did in the programming assignment, and my model's accuracy is 98 percent. But there is an issue with predictions: I wrote a digit and tested it, but the prediction was wrong. I tried the first 4 images from X_train[:4] and the result was wrong, like [2, 4, 7, 1], and when I then tried the first 5 images with X_train[:5] the result was wrong too, and the results for the previous images also changed: [3, 6, 1, 1, 5].
It turned out that when the size of the input is small the accuracy is very low, but when the input size is, for example, 100, the model can predict most of them correctly. I'm sure that I implemented all the functions correctly.
Do you know what the problem is?

The problem, as far as I understand it, is “when the input dataset is small, accuracy is bad”, right?

Assuming everything else is implemented correctly and the training, validation and testing dataset elements come from the same distribution (meaning they are of a similar nature)!

My guess is that when you use a small set, the model may not have seen those specific images during training, because the reported accuracy is only an aggregate over right and wrong outputs! I would say that training the model further, and making sure that the distribution of images is the same during all stages, should improve the accuracy for small sets too!
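For example, if you build the train/test split yourself, something like this (just a sketch using scikit-learn's stratified split, not the original code) keeps the digit distribution the same in both sets:

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

digits = load_digits()
# stratify keeps the proportion of each digit roughly equal in both splits
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, stratify=digits.target, random_state=0
)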

multi_layer_nn.pdf (494.8 KB)
If it's possible, please take a look at my code.

I have not looked at your code.

In order to identify handwritten digits, the standard training set has 500 examples per digit (5,000 examples in total).

This lets the model handle a lot of variation in the shapes of the digits.

If you don’t include enough training examples, the model will not be very useful.

I don't understand what you mean here. Looking at your output, it shows 100% accuracy on the training set and 98% on the test set. And then you are checking predictions on the training set, so how could they all be wrong if the model has 100% accuracy on the training set? The size of the training set is 1203, so the question is what the resolution is on the accuracy computation. Even if it's 0.5% (which would be pretty low resolution), 0.005 * 1203 ≈ 6. So it's not plausible that you randomly test 5 sample predictions and they are all wrong.

I think this indicates a bug somewhere. Either your accuracy numbers are wrong or there is something wrong with your methodology for testing predictions made by the model.
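One quick way to tell which of those it is (purely a sketch; predict, params, X_train and y_train are assumed to have the meanings from your post, with predict returning one class index per sample):

import numpy as np

# Recompute the accuracy directly from predict and compare it with spot
# checks on the first few samples; the two should tell the same story.
preds = np.asarray(predict(X_train, params))
print("recomputed training accuracy:", np.mean(preds == y_train))
print("first 5 predictions:", np.asarray(predict(X_train[:, :5], params)))
print("first 5 labels:     ", y_train[:5])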

Also just as a general point, notice that the data here are pretty low resolution. It’s not the standard MNIST handwritten digit dataset, in which the images are 28 x 28 greyscale images. The input images here are 8 x 8. Once you get things working here, it might be worth redoing this with the real MNIST data. That data set is quite a bit larger as well.
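For reference, a quick way to see the size difference (a sketch, not taken from the posted code):

from sklearn.datasets import load_digits
import tensorflow as tf

# scikit-learn digits: 8 x 8 images, 64 features each
print(load_digits().images.shape)              # (1797, 8, 8)

# real MNIST: 28 x 28 images, 784 features each
(x_mnist, _), _ = tf.keras.datasets.mnist.load_data()
print(x_mnist.shape)                           # (60000, 28, 28)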

Thanks for your reply. I have a predict function that takes data and parameters; the parameters are the weights and biases after training. As you can see in the image, y_train contains the correct labels, but when I predict the first two elements of X_train the result is wrong, whereas when I predict the first 10 elements of X_train only one of them is wrong. I wonder why this is happening. For example, as you can see, the result for the first two elements is [9, 2], but the result for the first three elements is [9, 6, 8]. I don't know why the label for the second image changed from 2 to 6 while the params are fixed.

There must be something wrong with the predict logic. Why doesn’t it give the same results if you run it twice with the same inputs? That logic must be broken. You have scientific evidence that it does not work in a predictable way (pun intended). Now your job as a scientist is to explain that behavior.

X_train[:,0] is the first input to every one of those invocations of predict and you get a total of 3 different answers for the first output prediction. So how can that happen?

My first step would be to run a really pure experiment. Just run this multiple times:

predict(X_train[:,0], params)
predict(X_train[:,0], params)
predict(X_train[:,0], params)

What happens when you do that? If that works reliably, then try this:

predict(X_train[:,:3], params)
predict(X_train[:,:3], params)
predict(X_train[:,:3], params)
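If that also behaves, a small loop like this (just a sketch, assuming predict returns one class index per input column) makes any inconsistency obvious:

import numpy as np

# The prediction for sample 0 should not depend on how many other
# samples are passed in alongside it.
for k in [1, 2, 3, 5, 10]:
    preds = np.asarray(predict(X_train[:, :k], params))
    print(f"batch of {k:2d} -> prediction for sample 0: {preds.flat[0]}")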

Science! :nerd_face:

The output of predict(X_train[:,:anynumber], params) is always fixed when I rerun it.
I think the problem is here, in y_hat (the output of softmax)
for predict(X_train[:,:2], params):

[[9.2785972e-01 7.2140321e-02]
 [9.3808633e-01 6.1913725e-02]
 [1.9603858e-03 9.9803966e-01]
 [1.9122391e-04 9.9980885e-01]
 [9.9997079e-01 2.9211138e-05]
 [3.3404419e-01 6.6595578e-01]
 [9.9990594e-01 9.4033923e-05]
 [8.9336297e-04 9.9910659e-01]
 [9.9303973e-01 6.9602509e-03]
 [2.7961735e-02 9.7203833e-01]]

We know that in softmax the sum of each column must be one, but I don't know why here the sum of each row is one.
And when the input size is one, as in predict(X_train[:,9:10], params),
the output is always zero, because the output of softmax is always:
[[1.]
 [1.]
 [1.]
 [1.]
 [1.]
 [1.]
 [1.]
 [1.]
 [1.]
 [1.]]

If you are using the TF softmax: in TF everything is in “samples first” orientation, at least by default, right? tf.nn.softmax normalizes over the last axis by default, and in your classes-by-samples layout the last axis is the samples axis, which is exactly why your rows sum to one instead of your columns.

But if your training worked, you must have handled that correctly in that case. So what is different about this case?

I took a look at the code you sent us yesterday and I’m now worried that the problem is actually earlier. Rather than use TF Dense layers, you construct them by hand with matrix multiply and do it the way Prof Ng does with features by samples orientation of the data. But then you use TF softmax and categorical_crossentropy with the default options, which I fear means you may not be getting the results you think you are. So before going any further with the predict code, I would put instrumentation into your forward prop code to make sure you have implemented that correctly.
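For example, one cheap piece of instrumentation (a sketch, assuming y_hat is the softmax output in the classes-by-samples layout):

import numpy as np

# In a (num_classes, num_samples) layout every column of the softmax
# output should sum to 1. If the rows sum to 1 instead, the softmax was
# applied over the wrong axis.
y_hat_np = np.asarray(y_hat)
print("column sums:", y_hat_np.sum(axis=0))    # expect all ~1.0
print("row sums:   ", y_hat_np.sum(axis=1))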

Also note that you convert y_train to be “one hot”, but I don’t see anywhere that you do that same thing to y_test.
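For example, a sketch of mirroring that conversion for y_test (assuming 10 classes and the same classes-by-samples orientation used for y_train):

import tensorflow as tf

# tf.one_hot produces shape (num_samples, 10); transpose it if the labels
# are kept in a (10, num_samples) orientation like y_train.
y_test_one_hot = tf.transpose(tf.one_hot(y_test, depth=10))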

I’m just looking at the PDF file that you posted earlier on this thread in all the above analysis.

I wrote the softmax function myself instead of using tf.nn.softmax, and the problem is solved:

def softmax(z):
    # normalize over axis 0, i.e. column-wise in the classes-by-samples layout
    return tf.exp(z)/tf.reduce_sum(tf.exp(z), axis=0)

I also found, with ChatGPT, the correct way to implement my code using tf.nn.softmax; it just changed the shapes, and it is a bit confusing.

also thanks for your answers :pray:

That’s great that you got things to work. I didn’t actually try coding it your way, but my reading of the documentation is that you could also have solved it by supplying the axis = 0 argument to tf.nn.softmax and to the cross entropy loss function as well.
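For anyone reading along later, that alternative would look roughly like this (a sketch with assumed variable names, not tested against the posted code):

import tensorflow as tf

# With activations in (num_classes, num_samples) orientation, pass axis=0
# explicitly instead of relying on the default axis=-1.
y_hat = tf.nn.softmax(z_out, axis=0)        # z_out: assumed final-layer pre-activations
loss = tf.reduce_mean(
    tf.keras.losses.categorical_crossentropy(y_one_hot, y_hat, axis=0)
)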
