I agree with @paulinpaloalto - batch norm has to save the moving mean and variance it accumulates during training and then apply them statically during inference. It makes no sense to keep varying the batch-norm statistics after training, because - as we've seen here - you end up with a predict function that gives different predictions for the same example depending on how many examples you hand it.
I've gone ahead and removed the training=True option from all the identity_block and convolutional_block calls, and it now behaves as you'd expect: the same prediction for the same example regardless of whether you predict X_train[3] on its own or return all predictions and then select the third one.
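To see that batch-dependence concretely, here's a tiny standalone sketch (my own, not the assignment code): a single BatchNormalization layer called with and without training=True.
import numpy as np
import tensorflow as tf

bn = tf.keras.layers.BatchNormalization()
x = np.random.randn(8, 4).astype("float32")

# training=True: each example is normalised with the statistics of whatever
# batch it arrives in, so the output for x[0] depends on the batch contents
full_batch = bn(x, training=True).numpy()[0]
on_its_own = bn(x[:1], training=True).numpy()[0]
print(np.allclose(full_batch, on_its_own))   # False (almost surely)

# training=False (what predict() uses by default): the stored moving mean and
# variance are applied, so x[0] gives the same output however it is batched
full_batch = bn(x, training=False).numpy()[0]
on_its_own = bn(x[:1], training=False).numpy()[0]
print(np.allclose(full_batch, on_its_own))   # True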
One slightly unexpected side effect I've learnt from all this: with training=True hardwired (i.e. the code as given in the assignment) you can train for 10 epochs and get pretty good train and test accuracies (using the code below to compute them).
prediction_train = model.predict(X_train)
print("Train accuracy = ", np.mean( np.argmax(prediction_train, axis=1) == np.argmax(Y_train, axis=1)))
prediction_test = model.predict(X_test)
print("Test accuracy = ", np.mean( np.argmax(prediction_test, axis=1) == np.argmax(Y_test, axis=1)))
Running this gives an output of:
Epoch 10/10
34/34 [==============================] - 1s 24ms/step - loss: 0.0867 - accuracy: 0.9741
Train accuracy = 0.9833333333333333
Test accuracy = 0.9166666666666666
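As an aside (my addition, not from the notebook): since the model is compiled with metrics=['accuracy'], model.evaluate should report essentially the same numbers as the predict/argmax snippet above, so it makes a quick cross-check:
train_loss, train_acc = model.evaluate(X_train, Y_train, verbose=0)
test_loss, test_acc = model.evaluate(X_test, Y_test, verbose=0)
print("Train accuracy =", train_acc, " Test accuracy =", test_acc)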
But if you run the (I think) corrected code, with all the training=True calls removed, you have to train for longer. If you train for only 10 epochs you end up with apparently good accuracy in the model.fit() output but terrible accuracies reported afterwards - see below for the output of the 10th epoch's final batch and the result of running the above code straight after.
Epoch 10/10
34/34 [==============================] - 1s 23ms/step - loss: 0.2694 - accuracy: 0.9241
Train accuracy = 0.30925925925925923
Test accuracy = 0.2916666666666667
You have to run this version for longer - e.g. 20 epochs - to get good training (and test) accuracy.
My intuition for this pretty confusing behaviour: in the model as run in class, training is switched on all the time in the batch-norm calls, so even inside predict() each batch gets normalised with its own statistics, just as it did during fitting. That's why the train accuracy I compute straight after model.fit() (98%) lands so close to the last model.fit() number (97%).
But batch-norm statistics shouldn't keep changing at inference time - they should be running averages accumulated over the whole training process. My hunch is that once training=True is removed (i.e. running what I think is the correct code) those moving means and variances simply haven't settled down after 10 epochs, so even though the per-batch accuracy printed by fit() looks good, the inference-mode model isn't there yet. Training for longer and then evaluating accuracy on the full training set (not just the last mini-batch) fixes it. You can also drop the batch size to 16 and run for 10 epochs and get a stable result - presumably because twice as many batches per epoch means twice as many updates to the running averages.
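To put a rough number on that hunch, here's a toy, self-contained sketch (my own illustration, not the assignment code). It assumes the Keras BatchNormalization defaults - moving_mean initialised to 0, momentum = 0.99 - and the update rule from the Keras docs, moving_mean = momentum * moving_mean + (1 - momentum) * batch_mean, applied once per training step (34 batches per epoch, as in the fit output above):
# Toy illustration (not the assignment code) of why the running averages lag
target_mean, momentum = 5.0, 0.99       # 5.0 is an arbitrary stand-in for a true activation mean

moving_mean = 0.0                       # Keras initialises moving_mean to zero
for step in range(34 * 10):             # 34 batches/epoch * 10 epochs
    moving_mean = momentum * moving_mean + (1 - momentum) * target_mean
print(moving_mean)                      # ~4.84, still well short of 5.0

moving_mean = 0.0
for step in range(34 * 20):             # 20 epochs at batch size 32, or 10 epochs at batch size 16
    moving_mean = momentum * moving_mean + (1 - momentum) * target_mean
print(moving_mean)                      # ~4.99, much closer
On that arithmetic, 10 epochs at the default batch size is only ~340 updates, while 20 epochs (or 10 epochs at batch size 16) gives ~680 - which would line up with the two fixes that worked for me.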
If you've followed all that, well done - I've been puzzling over this for days now! Below is a full MWE that trains and saves what I think is the corrected version of the saved resnet50.h5 model from the lectures. Let me know what you think - interested to hear your responses and thoughts on this.
MWE - possible corrected resnet50.h5 model code:
import tensorflow as tf
import numpy as np
import scipy.misc
from tensorflow.keras.applications.resnet_v2 import ResNet50V2
from tensorflow.keras.preprocessing import image
from tensorflow.keras.applications.resnet_v2 import preprocess_input, decode_predictions
from tensorflow.keras import layers
from tensorflow.keras.layers import Input, Add, Dense, Activation, ZeroPadding2D, BatchNormalization, Flatten, Conv2D, AveragePooling2D, MaxPooling2D, GlobalMaxPooling2D
from tensorflow.keras.models import Model, load_model
from resnets_utils import *
from tensorflow.keras.initializers import random_uniform, glorot_uniform, constant, identity
from tensorflow.python.framework.ops import EagerTensor
from matplotlib.pyplot import imshow
from test_utils import summary, comparator
import public_tests
def identity_block_c(X, f, filters, initializer=random_uniform):
"""
Implementation of the identity block as defined in Figure 4
Arguments:
X -- input tensor of shape (m, n_H_prev, n_W_prev, n_C_prev)
f -- integer, specifying the shape of the middle CONV's window for the main path
filters -- python list of integers, defining the number of filters in the CONV layers of the main path
initializer -- to set up the initial weights of a layer. Equals to random uniform initializer
Returns:
X -- output of the identity block, tensor of shape (m, n_H, n_W, n_C)
"""
# Retrieve Filters
F1, F2, F3 = filters
# Save the input value. You'll need this later to add back to the main path.
X_shortcut = X
# First component of main path
X = Conv2D(filters = F1, kernel_size = 1, strides = (1,1), padding = 'valid', kernel_initializer = initializer(seed=0))(X)
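# Note: no training argument is passed to BatchNormalization here (or anywhere below),
# so Keras uses batch statistics while fitting and the stored moving averages in predict()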
X = BatchNormalization(axis = 3)(X) # Default axis
X = Activation('relu')(X)
### START CODE HERE
## Second component of main path (≈3 lines)
X = Conv2D(filters = F2, kernel_size = f, strides = (1,1), padding = 'same', kernel_initializer = initializer(seed=0))(X)
X = BatchNormalization(axis = 3)(X) # Default axis
X = Activation('relu')(X)
## Third component of main path (≈2 lines)
X = Conv2D(filters = F3, kernel_size = 1, strides = (1,1), padding = 'valid', kernel_initializer = initializer(seed=0))(X)
X = BatchNormalization(axis = 3)(X) # Default axis
## Final step: Add shortcut value to main path, and pass it through a RELU activation (≈2 lines)
X = tf.keras.layers.Add()([X, X_shortcut])
X = Activation('relu')(X)
### END CODE HERE
return X
def convolutional_block_c(X, f, filters, s = 2, initializer=glorot_uniform):
"""
Implementation of the convolutional block as defined in Figure 4
Arguments:
X -- input tensor of shape (m, n_H_prev, n_W_prev, n_C_prev)
f -- integer, specifying the shape of the middle CONV's window for the main path
filters -- python list of integers, defining the number of filters in the CONV layers of the main path
s -- Integer, specifying the stride to be used
initializer -- to set up the initial weights of a layer. Equals to Glorot uniform initializer,
also called Xavier uniform initializer.
Returns:
X -- output of the convolutional block, tensor of shape (m, n_H, n_W, n_C)
"""
# Retrieve Filters
F1, F2, F3 = filters
# Save the input value
X_shortcut = X
##### MAIN PATH #####
# First component of main path glorot_uniform(seed=0)
X = Conv2D(filters = F1, kernel_size = 1, strides = (s, s), padding='valid', kernel_initializer = initializer(seed=0))(X)
X = BatchNormalization(axis = 3)(X)
X = Activation('relu')(X)
### START CODE HERE
## Second component of main path (≈3 lines)
X = Conv2D(filters = F2, kernel_size = f, strides = (1, 1), padding='same', kernel_initializer = initializer(seed=0))(X)
X = BatchNormalization(axis = 3)(X)
X = Activation('relu')(X)
## Third component of main path (≈2 lines)
X = Conv2D(filters = F3, kernel_size = 1, strides = (1, 1), padding='valid', kernel_initializer = initializer(seed=0))(X)
X = BatchNormalization(axis = 3)(X)
##### SHORTCUT PATH ##### (≈2 lines)
X_shortcut = Conv2D(filters = F3, kernel_size = 1, strides = (s, s), padding='valid', kernel_initializer = initializer(seed=0))(X_shortcut)
X_shortcut = BatchNormalization(axis = 3)(X_shortcut)
### END CODE HERE
# Final step: Add shortcut value to main path (Use this order [X, X_shortcut]), and pass it through a RELU activation
X = Add()([X, X_shortcut])
X = Activation('relu')(X)
return X
def ResNet50_c(input_shape = (64, 64, 3), classes = 6):
"""
Stage-wise implementation of the architecture of the popular ResNet50:
CONV2D -> BATCHNORM -> RELU -> MAXPOOL -> CONVBLOCK -> IDBLOCK*2 -> CONVBLOCK -> IDBLOCK*3
-> CONVBLOCK -> IDBLOCK*5 -> CONVBLOCK -> IDBLOCK*2 -> AVGPOOL -> FLATTEN -> DENSE
Arguments:
input_shape -- shape of the images of the dataset
classes -- integer, number of classes
Returns:
model -- a Model() instance in Keras
"""
# Define the input as a tensor with shape input_shape
X_input = Input(input_shape)
# Zero-Padding
X = ZeroPadding2D((3, 3))(X_input)
# Stage 1
X = Conv2D(64, (7, 7), strides = (2, 2), kernel_initializer = glorot_uniform(seed=0))(X)
X = BatchNormalization(axis = 3)(X)
X = Activation('relu')(X)
X = MaxPooling2D((3, 3), strides=(2, 2))(X)
# Stage 2
X = convolutional_block_c(X, f = 3, filters = [64, 64, 256], s = 1)
X = identity_block_c(X, 3, [64, 64, 256])
X = identity_block_c(X, 3, [64, 64, 256])
### START CODE HERE
## Stage 3 (≈4 lines)
X = convolutional_block_c(X, f = 3, filters = [128, 128, 512], s = 2)
X = identity_block_c(X, 3, [128, 128, 512])
X = identity_block_c(X, 3, [128, 128, 512])
X = identity_block_c(X, 3, [128, 128, 512])
## Stage 4 (≈6 lines)
X = convolutional_block_c(X, f = 3, filters = [256, 256, 1024], s = 2)
X = identity_block_c(X, 3, [256, 256, 1024])
X = identity_block_c(X, 3, [256, 256, 1024])
X = identity_block_c(X, 3, [256, 256, 1024])
X = identity_block_c(X, 3, [256, 256, 1024])
X = identity_block_c(X, 3, [256, 256, 1024])
## Stage 5 (≈3 lines)
X = convolutional_block_c(X, f = 3, filters = [512, 512, 2048], s = 2)
X = identity_block_c(X, 3, [512, 512, 2048])
X = identity_block_c(X, 3, [512, 512, 2048])
## AVGPOOL (≈1 line). Use "X = AveragePooling2D(...)(X)"
X = AveragePooling2D((2, 2))(X)
### END CODE HERE
# output layer
X = Flatten()(X)
X = Dense(classes, activation='softmax', kernel_initializer = glorot_uniform(seed=0))(X)
# Create model
model = Model(inputs = X_input, outputs = X)
return model
# load data
X_train_orig, Y_train_orig, X_test_orig, Y_test_orig, classes = load_dataset()
# Normalize image vectors
X_train = X_train_orig / 255.
X_test = X_test_orig / 255.
# Convert training and test labels to one hot matrices
Y_train = convert_to_one_hot(Y_train_orig, 6).T
Y_test = convert_to_one_hot(Y_test_orig, 6).T
print ("number of training examples = " + str(X_train.shape[0]))
print ("number of test examples = " + str(X_test.shape[0]))
print ("X_train shape: " + str(X_train.shape))
print ("Y_train shape: " + str(Y_train.shape))
print ("X_test shape: " + str(X_test.shape))
print ("Y_test shape: " + str(Y_test.shape))
print(tf.__version__)
# run model
model = ResNet50_c(input_shape = (64, 64, 3), classes = 6)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train, Y_train, epochs = 20, batch_size = 16, verbose = 2)
# check train accuracy is stable - i.e. similar to final epoch of model.fit()
prediction_train = model.predict(X_train)
print("Train accuracy = ", np.mean( np.argmax(prediction_train, axis=1) == np.argmax(Y_train, axis=1)))
prediction_test = model.predict(X_test)
print("Test accuracy = ", np.mean( np.argmax(prediction_test, axis=1) == np.argmax(Y_test, axis=1)))
# save my model
model.save('SIGNS_resnet_model_20_epochs')
# load back in
# pre_trained_model = tf.keras.models.load_model('SIGNS_resnet_model_20_epochs')
# check that predict() function acts as you'd expect it to
i = 3
prediction_3_direct = model.predict(X_test[[i]])
prediction_3_from_all_preds = model.predict(X_test)[i]
print("Class prediction vector [p(0), p(1), p(2), p(3), p(4), p(5)] = ", prediction_3_direct)
print("Class prediction vector [p(0), p(1), p(2), p(3), p(4), p(5)] = ", prediction_3_from_all_preds)