How to use model2 to predict a single alpaca image instead of a batch

I finished the transfer learning programming assignment on the alpaca model and, out of curiosity, wanted to try predicting a new image. This is what I tried:
image_location = 'images/snowalpaca.png'

img = tf.keras.preprocessing.image.load_img(image_location, target_size=IMG_SIZE)
img_array = tf.keras.preprocessing.image.img_to_array(img)
img_array = tf.expand_dims(img_array, 0) # Create a batch
image_var = tf.Variable(img_array)
predictions = model2(image_var)
#predictions = model.predict(img)

The output is:

tf.Tensor([[-4.2238126]], shape=(1, 1), dtype=float32)

Is this correct? How do I make sense of this prediction: -4.2238126?
Do I need to use this: tf.keras.applications.mobilenet_v2.decode_predictions(predictions.numpy())?

Or should I do something like this:

score = tf.nn.softmax(predictions[0])

print("This image most likely belongs to {} with a {:.2f} percent confidence."
      .format(class_names[np.argmax(score)], 100 * np.max(score)))
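A quick way to see why the softmax route cannot work here: model2 has a single output unit, so predictions[0] holds exactly one value, and softmax over a one-element vector is always 1.0 whatever the logit is, which also makes np.argmax always return 0. The sketch below re-implements softmax in NumPy purely for illustration; it is not the TF function itself.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis (illustrative re-implementation)
    z = np.asarray(z, dtype=np.float64)
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# One logit per image, as in the outputs printed in this thread
for logit in (-4.2238126, 4.0539403):
    score = softmax([logit])
    # With a single element, score is always [1.] and argmax is always 0
    print(logit, score, np.argmax(score))
```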

Something is deeply wrong there. The predictions are a softmax output, so how could it possibly be a negative number or a number with absolute value > 1?

They gave an example of how to apply the earlier model to a batch of inputs. Try copying that for model2 with a couple of things in mind:

  1. The images need to be normalized, not “raw” pixel values. They gave you the code to do that.
  2. Just declaring one image to be a tf.Variable is not the same thing as a TF batch. That’s a different class.

Does preprocess_input normalize it? If so, do I add this code before creating the tf.Variable?
img_pred = preprocess_input(img_array)

I don’t have a batch, just a single image, so I put img_array through this first:

img_array = tf.expand_dims(img_array, 0) # Create a batch
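For what it’s worth, Keras models accept arrays or tensors whose first axis indexes samples, so adding that leading axis is usually all that turning one image into a "batch" of one requires. A NumPy sketch of the shape change, assuming the 160x160 RGB input size seen in the notebook output; the zeros array is only a stand-in for a real image:

```python
import numpy as np

# Stand-in for the result of img_to_array(img) on a 160x160 RGB image
img_array = np.zeros((160, 160, 3), dtype=np.float32)

# Two equivalent ways to add a leading "samples" axis of size 1
batch_a = np.expand_dims(img_array, axis=0)
batch_b = img_array[np.newaxis, ...]

print(batch_a.shape)  # (1, 160, 160, 3)
print(batch_b.shape)  # (1, 160, 160, 3)
```

tf.expand_dims(img_array, 0) produces the same (1, 160, 160, 3) shape, just as a tf.Tensor instead of a NumPy array.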

I tried this:

image_location = 'images/snowalpaca.png'
input_shape = image_shape + (3,)

img = tf.keras.preprocessing.image.load_img(image_location, target_size=IMG_SIZE)
img_array = tf.keras.preprocessing.image.img_to_array(img)
# data preprocessing using the same weights the model was trained on
img_array = preprocess_input(img_array)
img_pred = tf.expand_dims(img_array, 0) # Create a batch
image_var = tf.Variable(img_pred)
# img_preprocessed = tf.keras.applications.mobilenet_v2.preprocess_input(image_var)
predictions = model2(image_var)
#predictions = model.predict(img)

But I still can’t make sense of the prediction:

tf.Tensor([[4.0539403]], shape=(1, 1), dtype=float32)

Sorry it took me a couple of days to get back to this question.

Yes, the preprocess_input function normalizes the images to the range [-1, 1]. But note that we built that into the model2 function, right? Unlike the “base model” that we got from Keras, our model should take care of that for you. So you feed it “raw”, unnormalized images with pixel values in the range [0, 255].
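For reference, the MobileNetV2 normalization is a fixed rescaling, x / 127.5 - 1, which sends raw pixel values in [0, 255] to [-1, 1]. The function below is a NumPy re-implementation written for illustration (scale_like_mobilenet_v2 is a hypothetical name), not the Keras function itself:

```python
import numpy as np

def scale_like_mobilenet_v2(x):
    # Illustrative re-implementation of the [-1, 1] scaling: x / 127.5 - 1
    x = np.asarray(x, dtype=np.float32)
    return x / 127.5 - 1.0

raw = np.array([0.0, 127.5, 255.0])  # darkest, middle and brightest pixel values
print(scale_like_mobilenet_v2(raw))  # [-1.  0.  1.]
```

Because model2 already applies this step internally, the input it expects is the `raw` kind of array, not the scaled one.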

I think there are potentially two issues here:

Just adding a trivial “samples” dimension to the single image is not the same thing as making a “batch”. That is a different class in TF than a single Tensor, right? But maybe it just works and the real problem is the next one.

The other issue is that you can’t just directly print the return value of the model. They show us how to deal with that in one of the cells in the notebook:

base_model.trainable = False
image_var = tf.Variable(image_batch)
pred = base_model(image_var)

tf.keras.applications.mobilenet_v2.decode_predictions(pred.numpy(), top=2)

Try using the decode_predictions function the way they show and see if you get a more sensible result.

Sorry! I just tried the thing I suggested there and it doesn’t work. The decode_predictions function expects output from the ‘real’ model with 1000 output classes, i.e. an array of shape (samples, 1000) with one score per class, whereas model2 has a single output.

Ok, here’s my final recipe:

# Try running predict on a single image

image_var = tf.Variable(augmented_image)
print(f"image_var shape {tf.shape(image_var)}")
plt.imshow(augmented_image[0] / 255)

pred_logit = model2(image_var)
pred = tf.math.sigmoid(pred_logit)

print(f"pred shape {tf.shape(pred)}")
print(f"pred {pred}")

I get this output:

image_var shape [  1 160 160   3]
pred shape [1 1]
pred [[0.98496515]]


Note that I happened to pick an image that was already sitting in a 4D tensor in the notebook, with the samples dimension included, so I did not have to change that. The issue is that we defined the model with no activation at the output layer and then trained it with binary cross-entropy loss using from_logits=True. So when we use the model for prediction, we apparently need to apply the sigmoid ourselves. At least that’s my conclusion here.
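To make the numbers in this thread concrete: with from_logits=True, the raw model output is a logit, and the sigmoid of it is the predicted probability of label 1. The NumPy sketch below applies that to the logits printed earlier. The class_names order is an assumption: image_dataset_from_directory sorts class folders alphanumerically unless told otherwise, so with folders named alpaca and not alpaca, label 1 would be "not alpaca". Check train_dataset.class_names in your own notebook before trusting the names.

```python
import numpy as np

def sigmoid(z):
    # Maps a logit to the probability of label 1
    return 1.0 / (1.0 + np.exp(-np.asarray(z, dtype=np.float64)))

# Assumed order; verify with train_dataset.class_names in the notebook
class_names = ["alpaca", "not alpaca"]

# The first two logits are the ones printed in this thread; 0.0 is the decision boundary
logits = np.array([-4.2238126, 4.0539403, 0.0])
probs = sigmoid(logits)
labels = (probs >= 0.5).astype(int)  # probability >= 0.5 is the same as logit >= 0

for z, p, k in zip(logits, probs, labels):
    print(f"logit {z:+.4f} -> P(label 1) = {p:.4f} -> {class_names[k]}")
```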


Thank you! I also tried this:

> def prepare_image(file):
>     img = tf.keras.preprocessing.image.load_img(
>         file,  # use the function argument instead of the global image_location
>         target_size=IMG_SIZE)
>     img_array = tf.keras.preprocessing.image.img_to_array(img)
>     img_array_expanded_dims = np.expand_dims(img_array, axis=0)
>     return tf.keras.applications.mobilenet.preprocess_input(img_array_expanded_dims)
> image_location = 'images/snowalpaca.png'
> img_pred = prepare_image(image_location)
> predictions = model2.predict(img_pred).flatten()
> # Apply a sigmoid since our model returns logits
> predictions = tf.nn.sigmoid(predictions)
> predictions = tf.where(predictions < 0.5, 0, 1)
> print('Predictions:\n', predictions.numpy())

That looks good, but I’m a bit worried about using preprocess_input there. As I commented in my previous reply, our model2 implementation includes preprocess_input as an internal step. I’m not sure whether that provided function is smart enough not to do the normalization calculation if it can tell that the input is already normalized, but my guess is that it’s not that smart.
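A sketch of what happens if the [-1, 1] scaling is applied twice (once in prepare_image, again inside model2), using the same x / 127.5 - 1 rescaling as an illustrative stand-in rather than the Keras function itself. The second pass squeezes every pixel into a narrow band around -1, so almost all contrast is lost, which could plausibly explain the poor predictions.

```python
import numpy as np

def rescale(x):
    # Illustrative stand-in for MobileNetV2-style scaling: x / 127.5 - 1
    return np.asarray(x, dtype=np.float64) / 127.5 - 1.0

raw = np.array([0.0, 127.5, 255.0])
once = rescale(raw)    # what model2 expects to produce internally
twice = rescale(once)  # what it produces if the input was already scaled

print(once)   # [-1.  0.  1.]
print(twice)  # every value lies within about 0.008 of -1
print(twice.max() - twice.min())  # spread shrinks from 2.0 to 2 / 127.5
```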

Thank you for responding. Oh, I see what you mean: the input already gets normalized inside model2 when I call model2.predict(). I’ll try passing just img_array_expanded_dims as the input. Maybe that will improve things, since I was not getting good predictions.
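A minimal sketch of that plan, assuming model2 already contains the preprocess_input step and returns one logit per image. predict_single is a hypothetical helper, and model_fn can be model2 or any callable that takes a (1, H, W, 3) batch; the pixels stay in [0, 255] and no extra preprocessing is applied.

```python
import numpy as np

def predict_single(model_fn, img_array, threshold=0.5):
    """Run a one-logit binary classifier on a single (H, W, 3) image.

    img_array holds raw pixel values in [0, 255]; normalization is left
    to the model itself, so it is NOT passed through preprocess_input here.
    """
    # Add the leading samples axis: (H, W, 3) -> (1, H, W, 3)
    batch = np.expand_dims(np.asarray(img_array, dtype=np.float32), axis=0)
    # The model returns shape (1, 1); np.asarray also accepts an eager tf.Tensor
    logit = float(np.asarray(model_fn(batch)).reshape(-1)[0])
    prob = 1.0 / (1.0 + np.exp(-logit))  # sigmoid, since the model outputs a logit
    return prob, int(prob >= threshold)
```

In the notebook this would presumably be called as predict_single(model2, tf.keras.preprocessing.image.img_to_array(img)) after load_img(..., target_size=IMG_SIZE); that call is untested here, and the returned label still has to be mapped through train_dataset.class_names.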

I noticed there is also a mobilenet_v2.preprocess_input. Do you know the difference between the two?
Like this:
def preprocess(images, labels):
    return tf.keras.applications.mobilenet_v2.preprocess_input(images), labels