I used the following code, but each time I run it with different values of “m” I get the same prediction, i.e. 4. I don’t know what the problem is, because the test accuracy from the assignment is 95%. What is the reason for this?
You are misinterpreting the structure of Y_test. Print the shape:
print(f"Y_test.shape {Y_test.shape}")
Y_test.shape (120, 6)
So Y_test[0][m] has nothing to do with sample m. The actual prediction is a 6-element softmax output, and the corresponding Y_test value is a one-hot vector. Try this:
label_value = np.argmax(Y_test[m])
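To see concretely why that indexing is wrong, here’s a tiny standalone sketch with a made-up label matrix shaped like Y_test (the values are just for illustration):
import numpy as np

# A made-up stand-in for Y_test with 3 samples and 6 classes (one-hot rows):
Y_demo = np.array([[0., 0., 0., 0., 1., 0.],    # sample 0 -> class 4
                   [0., 1., 0., 0., 0., 0.],    # sample 1 -> class 1
                   [0., 0., 0., 1., 0., 0.]])   # sample 2 -> class 3

m = 2
print(Y_demo[0][m])          # element m of row 0 -- not the label of sample m
print(np.argmax(Y_demo[m]))  # -> 3, the actual class label of sample m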
Sir, I tried it but still got the same result, i.e. a constant output. I don’t know what the problem is, because the model has an accuracy of 94% in the assignment, but it still isn’t working out?
For some reason that I haven’t figured out yet, running predictions on a single sample with this model just doesn’t work. But if you run it with a batch of inputs, it’s fine. Try this sample code:
# Predict on multiple samples
print(f"type(X_test) {type(X_test)}")
print(f"X_test.shape {X_test.shape}")
# all_preds = model.predict(X_test)
# all_preds = model(X_test, training = False)
all_preds = model(X_test)
print(f"all_preds.shape {all_preds.shape}")
my_list = (5, 12, 23, 42, 100, 119)
for ii in my_list:
    print("ii = ", ii)
    print("Class prediction vector [p(0), p(1), p(2), p(3), p(4), p(5)] = ", all_preds[ii])
    print("Predicted class = ", np.argmax(all_preds[ii]))
    print("Label = ", np.argmax(Y_test[ii]))
    plt.imshow(X_test[ii])
    plt.show()
Notice that I have 3 different ways to invoke the model there and all work the same for me.
But if I rearrange things like this, with a single sample predicted at a time, it doesn’t work at all. Note that I don’t always get 4 as the output; on a few samples I get 5. Try it and let me know what you see:
# Predict on one sample at a time
print(f"X_test.shape {X_test.shape}")
my_list = (5, 12, 23, 42, 100, 119)
for ii in my_list:
    print("ii = ", ii)
    cur_sample = X_test[[ii]]
    print(f"type(X_test[[ii]]) {type(cur_sample)}")
    print(f"X_test[[ii]].shape {cur_sample.shape}")
    # one_pred = model.predict(cur_sample)
    # one_pred = model(cur_sample, training = False)
    one_pred = model(cur_sample)
    # one_pred = model.__call__(cur_sample)
    print(f"one_pred.shape {one_pred.shape}")
    print("Class prediction vector [p(0), p(1), p(2), p(3), p(4), p(5)] = ", one_pred)
    print("Predicted class = ", np.argmax(one_pred))
    print("Label = ", np.argmax(Y_test[ii]))
    plt.imshow(X_test[ii])
    plt.show()
It turns out that this issue came up a while back and we figured out that it was a problem in how the notebook code handles the training parameter for BatchNormalization. Here’s a big thread about it. I filed a bug about this with the course staff at the time, but it has not yet been acted on.
See here for why predict is recommended for prediction instead of relying on __call__. It’s safe to use predict irrespective of the batch size, as long as you don’t need to track the gradients.
Yes, the above code works when using a batch of images, but I still don’t get why it doesn’t work on a single array.
Did you read the other thread I linked? (I admit it’s pretty long and not easy to get through.) It turns out you need to change the code for the identity block and the residual block so that they do not hard-code training = True, although I’m not sure that is a complete solution.
The point is that BatchNormalization malfunctions if you are in training mode with only a single sample.
Thanks for the link. I agree it’s probably better to have just one preferred method for prediction mode. But there are conflicting stories here: the main Keras documentation for the Model class, in the section explaining the predict() method, specifically says to use predict() only for handling large amounts of input data. If that’s just a performance or resource usage question, then maybe we can ignore it and go with the simplicity strategy. Your link made what seemed like a much more important point to understand: you can’t use predict() if you need gradients computed. Not sure we’ll run into that case, but it’s something to be aware of.
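For example (a minimal sketch, assuming the model, X_test and Y_test from the notebook are in scope; not something this exercise actually requires):
import tensorflow as tf

# predict() returns a plain NumPy array, so there is nothing for a tape to differentiate.
# Calling the model directly keeps the forward pass on the tape:
with tf.GradientTape() as tape:
    preds = model(X_test, training=False)
    loss = tf.reduce_mean(
        tf.keras.losses.categorical_crossentropy(Y_test, preds))
grads = tape.gradient(loss, model.trainable_variables)   # gradients w.r.t. the weights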
Sir, I tried to read that but I wasn’t able to grasp the concept. What happens when I change training to False in the batch norm step? Can I get a summarized explanation of that?
Have you also read the Keras documentation for BatchNormalization? BatchNorm has its own learnable parameters at every layer where it is applied that are separate from the weight and bias values that are trained by “fit()”. There are two modes in which BatchNorm can operate:
- Training mode: in this case, it computes the mean and variance of the current batch and uses those to do the normalization at each layer where it is applied.
- Inference mode: it uses the saved values for mean and variance that it computed during training to normalize the current data.
The documentation also points out that there are two different ways you can trigger the “training” mode behavior:
- It is supposed to figure it out automatically when you are running “fit()” of the overall model.
- You can still cause it to recompute the values when not in “fit()” mode by passing training = True (see the sketch just below).
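To make those two modes concrete, here’s a tiny standalone sketch with a bare BatchNormalization layer (not the assignment code, just an illustration):
import numpy as np
import tensorflow as tf

bn = tf.keras.layers.BatchNormalization()
x = np.array([[10.], [12.], [14.], [16.]], dtype=np.float32)   # a batch of 4 samples, 1 feature

# Inference mode: uses the stored moving mean/variance (still the initial 0 and 1 here),
# so the data passes through essentially unchanged.
print(bn(x, training=False).numpy().ravel())   # ~[10. 12. 14. 16.]

# Training mode: uses the mean/variance of this batch, so the output is the batch
# re-centered to mean 0 and std ~1, and the moving statistics get updated.
print(bn(x, training=True).numpy().ravel())    # ~[-1.34 -0.45  0.45  1.34]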
I have not invested enough effort to really understand how option 1) works, or what the case is in which you’d want to use option 2). So I’m just giving my current understanding here of what is wrong in this exercise.
Notice that in the function definitions of identity_block and convolutional_block here, they define training as a named parameter with a default value of True. Then also notice that they never pass training as a parameter when they call those “block” functions from the higher-level model code. So that means they will always run BatchNorm in training mode, even during prediction (inference). My current belief is that this is simply a bug in the way they have implemented this. I filed this as a bug against the course back in January 2022, but nothing has happened on that bug report yet.
The place where you most clearly see the effect of this bug is exactly the case you mention: trying to predict on a single sample. If you compute the mean and variance from a single sample and then normalize with them, the mean is just the sample’s own values, so every normalized value comes out the same (zero, before the learned scale and offset are applied) no matter what the input was, right? So it basically eliminates the actual information in the single sample of data and always gives the same prediction. Not useful.
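Here’s a tiny numpy sketch of that degenerate case (just the normalization step, ignoring BatchNorm’s learnable scale and offset):
import numpy as np

def bn_training_mode(batch, eps=1e-3):
    # What training mode does: normalize with the mean/variance of the current batch.
    return (batch - batch.mean(axis=0)) / np.sqrt(batch.var(axis=0) + eps)

a = np.array([[1.0, 2.0, 3.0]])      # a "batch" containing one sample
b = np.array([[40.0, -7.0, 0.5]])    # a completely different single sample
print(bn_training_mode(a))           # [[0. 0. 0.]]
print(bn_training_mode(b))           # [[0. 0. 0.]] -- same output, the data is wiped out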
So now the question is what’s the right way to fix this bug? If you change the function definitions so that the default value is training = False, then at least it works when you predict on a single sample. But in just a few experiments, it looks like the training doesn’t give as high accuracy as it did before. So I’m worried that that somehow overrides the behavior in “fit()” mode and effectively disables BatchNorm, or somehow makes it not work as well. The other approach I can think of is to add the handling of the training parameter at all levels from the model on down and then explicitly control it, but that’s a much bigger source code change to the notebook.
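For what it’s worth, here’s a rough sketch of what that second approach could look like on a toy model (hypothetical code, not the actual notebook functions; the real identity_block and convolutional_block have much more structure):
import tensorflow as tf
from tensorflow.keras import layers

def tiny_block(X, filters, training=False):
    # Toy stand-in for identity_block: the training flag is forwarded, not hard-coded to True.
    X = layers.Conv2D(filters, 3, padding="same")(X)
    X = layers.BatchNormalization(axis=3)(X, training=training)
    return layers.Activation("relu")(X)

def tiny_model(input_shape=(64, 64, 3), classes=6, training=False):
    # Toy stand-in for ResNet50: the flag is set once at the top and passed down to every block.
    X_input = layers.Input(input_shape)
    X = tiny_block(X_input, 16, training=training)
    X = layers.GlobalAveragePooling2D()(X)
    outputs = layers.Dense(classes, activation="softmax")(X)
    return tf.keras.Model(inputs=X_input, outputs=outputs)

inference_model = tiny_model(training=False)   # BatchNorm will use its moving statistics
Note that this bakes the flag in when the graph is built, so it has the same caveat as above: you’d have to make sure it doesn’t override what “fit()” wants to do during training.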
Unfortunately I do not have time to investigate this further right now and probably won’t for at least another couple of weeks. My fear is that figuring this out will probably require examining some TF/Keras source code to understand how the behavior of “fit()” interacts with BatchNorm. If anyone else has the time and motivation to dig deeper on this, please share anything that you learn.