Training model to identify digits - issue with incompatible shape

Hey all,

I’ve been working on training a simple model to identify digits - similar to the week 2 lab. However, I am using the MNIST dataset that can be imported with tensorflow.keras.datasets:

from tensorflow.keras.datasets import mnist
(X_train, Y_train), (X_test, Y_test) = mnist.load_data()

The resulting X_train array has a shape of (60000, 28, 28) because each image is 28x28 pixels. I then “un-roll” each image into a 1D array, so the new shape is (60000, 784). I do the same “un-rolling” procedure with the X_test testing data.

After doing this, I specify my model, compile it, and fit the training data. All of this seems to go fine.

However, when I try to predict a digit from the test set, like so:

prediction_p = model.predict(X_test2[0])

I run into the following issue:

WARNING:tensorflow:Model was constructed with shape (32, 784) for input KerasTensor(type_spec=TensorSpec(shape=(32, 784), dtype=tf.uint8, name='dense_18_input'), name='dense_18_input', description="created by layer 'dense_18_input'"), but it was called on an input with incompatible shape (None,).

It then proceeds to give me a further error.

As far as I can tell, all of the shapes of my training and testing data are what they should be, so I’m not sure what to do to fix this issue.

Here is my complete code so that anyone can reproduce the error I am receiving:

import math
import numpy
import matplotlib.pyplot as plot
import tensorflow as tf

from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.activations import linear, relu, sigmoid

from itertools import chain


#Load the dataset
(X_train, Y_train), (X_test, Y_test) = mnist.load_data()

#Show a few of the digits, just for the sake of a sanity check
for i in range(9):
    plot.subplot(330+1+i)
    plot.imshow(X_train[i], cmap=plot.get_cmap("gray"))
plot.show()

#Unroll the 2d arrays into 1d arrays
X_train_temp = []
for i in range(0, 60000):
    X_train_temp.append(numpy.array(list(chain.from_iterable(X_train[i]))))
X_train2 = numpy.array(X_train_temp)

Y_train2 = numpy.array([Y_train])
Y_train2 = Y_train2.T

X_test_temp = []
for i in range(0, 10000):
    X_test_temp.append(numpy.array(list(chain.from_iterable(X_test[i]))))
X_test2 = numpy.array(X_test_temp)

#Create a neural network model
model = Sequential([
    Dense(25, activation="relu"),
    Dense(15, activation="relu"),
    Dense(10, activation="linear"),
], name = "digit_id_model")

#Specify the loss function and 
#use the Adam optimizer with a learning rate of 0.001
model.compile(
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer=tf.keras.optimizers.Adam(0.001)
)

#Fit the model
model.fit(X_train2, Y_train2, epochs=40)

#Now let's try making some predictions
for i in range(0, 10):
    prediction_p = model.predict(X_test2[i])
    yhat = numpy.argmax(prediction_p)
    print(f"Label: {Y_test[i]}, Prediction: {yhat}")

Any help would be appreciated! Thanks!

So you loaded X_test from Keras, but you’re predicting with X_test2.

Why?

Because I reshaped both X_train and X_test, so the reshaped versions are X_train2 and X_test2. I trained on X_train2 and am testing with X_test2.

Try printing out the shapes of all of your training and test data sets.

Already done that. As I said previously, the shapes are as I expect them to be.

Hello David, I get a different error, but the source of the problem should be the same. Here is the error I see:

This is because you provided a dataset of shape (60000, 784) for training, but a dataset (even if it is just one sample) of shape (784,) for prediction. For the prediction to work, you need to pass a dataset of shape (1, 784) or (any number of samples, 784). For example, predictions are made on all test samples by:

Screenshot from 2022-09-10 07-57-33
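
The screenshot itself is not reproduced here, so the following is only a sketch of what predicting on a whole batch might look like. To keep it runnable without TensorFlow, StandInModel is a hypothetical stand-in (not part of Keras) that, like the trained network, expects a 2D input of shape (number of samples, 784); the zero-filled X_test2 is a placeholder for the flattened test images.

```python
import numpy as np


class StandInModel:
    """Hypothetical stand-in for the trained network: maps (n, 784) inputs to (n, 10) logits."""

    def __init__(self, seed=0):
        rng = np.random.default_rng(seed)
        self.weights = rng.normal(size=(784, 10))

    def predict(self, X):
        X = np.asarray(X, dtype=np.float64)
        # Like a Dense layer, insist on a 2D batch: (number of samples, 784).
        if X.ndim != 2 or X.shape[1] != 784:
            raise ValueError(f"expected shape (n, 784), got {X.shape}")
        return X @ self.weights


model = StandInModel()
X_test2 = np.zeros((100, 784), dtype=np.uint8)  # placeholder for the flattened test images

logits = model.predict(X_test2)      # one row of 10 logits per sample
yhat = np.argmax(logits, axis=1)     # index of the largest logit in each row

print(logits.shape, yhat.shape)
```

With a real Keras model the call would presumably be the same model.predict(X_test2); note the axis=1 in argmax, which picks one digit per row instead of a single index over the whole batch.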

If you prefer to predict on just one sample, you might do this

Screenshot from 2022-09-10 08-04-31

The additional square brackets make the indexing return a 2D array of shape (1, 784).
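
To make the difference concrete, here is a small NumPy-only sketch comparing a few ways of selecting one sample. The zero-filled array is a placeholder assumed to have the same layout as the flattened test set (28 * 28 = 784 values per image).

```python
import numpy as np

# Placeholder with the layout of the flattened test set: (samples, 784).
X_test2 = np.zeros((10000, 784), dtype=np.uint8)

single = X_test2[0]                   # integer index drops the sample axis
as_batch = X_test2[[0]]               # list index keeps the sample axis
as_slice = X_test2[0:1]               # a slice keeps it as well
with_newaxis = X_test2[0][np.newaxis, :]  # or add the axis back explicitly

print(single.shape)        # (784,)
print(as_batch.shape)      # (1, 784)
print(as_slice.shape)      # (1, 784)
print(with_newaxis.shape)  # (1, 784)
```

Any of the last three forms gives the model the (samples, features) layout it was trained on.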

Lastly, if I may make a small suggestion, you might “unroll” the images with a single line:
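
The original one-liner is not shown in the thread; one way it could look, assuming NumPy's reshape is what was meant, is sketched below. A small random array stands in for the MNIST images so the snippet runs on its own.

```python
import numpy as np

# Placeholder with the same layout as the MNIST images: (n_images, 28, 28), uint8.
rng = np.random.default_rng(0)
X_train = rng.integers(0, 256, size=(100, 28, 28), dtype=np.uint8)

# -1 lets NumPy infer the 28 * 28 = 784 columns; one row per image is kept.
X_train2 = X_train.reshape(X_train.shape[0], -1)

print(X_train2.shape)  # (100, 784)
```

reshape flattens each image row by row, which is the same element order that the chain.from_iterable loop produces, just without the Python-level loop.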

Nice work, keep trying!
Raymond

Awesome, thanks for that input. Since my X_test2 shape was already (10000, 784), I didn’t even think about having to reshape an individual element such as X_test2[0] to be something like X_test2[[0]]. Having done that, it does work.

Thanks!

You are welcome David!