Why choose mixed7 as the last layer?

The notebook for the first lab of week 1 (“C3_W1_Lab_1_transfer_learning_cats_dogs.ipynb”) assumes “mixed7” is the last layer.
However, looking at the model summary, we can see that the last layer is actually “mixed10” (see the partial output below).
Can someone please explain why?
I wish to understand how to identify which is the correct layer to pick when looking at a large model like this one.

Thanks,

Julius

Model: "inception_v3"


Layer (type) Output Shape Param # Connected to

input_1 (InputLayer) [(None, 150, 150, 3)] 0


conv2d (Conv2D) (None, 74, 74, 32) 864 input_1[0][0]


batch_normalization (BatchNorma (None, 74, 74, 32) 96 conv2d[0][0]

activation_68 (Activation) (None, 7, 7, 192) 0 batch_normalization_68[0][0]


activation_69 (Activation) (None, 7, 7, 192) 0 batch_normalization_69[0][0]


mixed7 (Concatenate) (None, 7, 7, 768) 0 activation_60[0][0]
activation_63[0][0]
activation_68[0][0]
activation_69[0][0]


conv2d_72 (Conv2D) (None, 7, 7, 192) 147456 mixed7[0][0]


batch_normalization_72 (BatchNo (None, 7, 7, 192) 576 conv2d_72[0][0]


activation_72 (Activation) (None, 7, 7, 192) 0 batch_normalization_72[0][0]


conv2d_73 (Conv2D) (None, 7, 7, 192) 258048 activation_72[0][0]


batch_normalization_73 (BatchNo (None, 7, 7, 192) 576 conv2d_73[0][0]


activation_73 (Activation) (None, 7, 7, 192) 0 batch_normalization_73[0][0]


conv2d_70 (Conv2D) (None, 7, 7, 192) 147456 mixed7[0][0]


conv2d_74 (Conv2D) (None, 7, 7, 192) 258048 activation_73[0][0]


batch_normalization_70 (BatchNo (None, 7, 7, 192) 576 conv2d_70[0][0]


batch_normalization_74 (BatchNo (None, 7, 7, 192) 576 conv2d_74[0][0]


activation_70 (Activation) (None, 7, 7, 192) 0 batch_normalization_70[0][0]


activation_74 (Activation) (None, 7, 7, 192) 0 batch_normalization_74[0][0]


conv2d_71 (Conv2D) (None, 3, 3, 320) 552960 activation_70[0][0]


conv2d_75 (Conv2D) (None, 3, 3, 192) 331776 activation_74[0][0]


batch_normalization_71 (BatchNo (None, 3, 3, 320) 960 conv2d_71[0][0]


batch_normalization_75 (BatchNo (None, 3, 3, 192) 576 conv2d_75[0][0]


activation_71 (Activation) (None, 3, 3, 320) 0 batch_normalization_71[0][0]


activation_75 (Activation) (None, 3, 3, 192) 0 batch_normalization_75[0][0]


max_pooling2d_3 (MaxPooling2D) (None, 3, 3, 768) 0 mixed7[0][0]


mixed8 (Concatenate) (None, 3, 3, 1280) 0 activation_71[0][0]
activation_75[0][0]
max_pooling2d_3[0][0]


conv2d_80 (Conv2D) (None, 3, 3, 448) 573440 mixed8[0][0]


batch_normalization_80 (BatchNo (None, 3, 3, 448) 1344 conv2d_80[0][0]


activation_80 (Activation) (None, 3, 3, 448) 0 batch_normalization_80[0][0]


conv2d_77 (Conv2D) (None, 3, 3, 384) 491520 mixed8[0][0]


conv2d_81 (Conv2D) (None, 3, 3, 384) 1548288 activation_80[0][0]


batch_normalization_77 (BatchNo (None, 3, 3, 384) 1152 conv2d_77[0][0]


batch_normalization_81 (BatchNo (None, 3, 3, 384) 1152 conv2d_81[0][0]


activation_77 (Activation) (None, 3, 3, 384) 0 batch_normalization_77[0][0]


activation_81 (Activation) (None, 3, 3, 384) 0 batch_normalization_81[0][0]


conv2d_78 (Conv2D) (None, 3, 3, 384) 442368 activation_77[0][0]


conv2d_79 (Conv2D) (None, 3, 3, 384) 442368 activation_77[0][0]


conv2d_82 (Conv2D) (None, 3, 3, 384) 442368 activation_81[0][0]


conv2d_83 (Conv2D) (None, 3, 3, 384) 442368 activation_81[0][0]


average_pooling2d_7 (AveragePoo (None, 3, 3, 1280) 0 mixed8[0][0]


conv2d_76 (Conv2D) (None, 3, 3, 320) 409600 mixed8[0][0]


batch_normalization_78 (BatchNo (None, 3, 3, 384) 1152 conv2d_78[0][0]


batch_normalization_79 (BatchNo (None, 3, 3, 384) 1152 conv2d_79[0][0]


batch_normalization_82 (BatchNo (None, 3, 3, 384) 1152 conv2d_82[0][0]


batch_normalization_83 (BatchNo (None, 3, 3, 384) 1152 conv2d_83[0][0]


conv2d_84 (Conv2D) (None, 3, 3, 192) 245760 average_pooling2d_7[0][0]


batch_normalization_76 (BatchNo (None, 3, 3, 320) 960 conv2d_76[0][0]


activation_78 (Activation) (None, 3, 3, 384) 0 batch_normalization_78[0][0]


activation_79 (Activation) (None, 3, 3, 384) 0 batch_normalization_79[0][0]


activation_82 (Activation) (None, 3, 3, 384) 0 batch_normalization_82[0][0]


activation_83 (Activation) (None, 3, 3, 384) 0 batch_normalization_83[0][0]


batch_normalization_84 (BatchNo (None, 3, 3, 192) 576 conv2d_84[0][0]


activation_76 (Activation) (None, 3, 3, 320) 0 batch_normalization_76[0][0]


mixed9_0 (Concatenate) (None, 3, 3, 768) 0 activation_78[0][0]
activation_79[0][0]


concatenate (Concatenate) (None, 3, 3, 768) 0 activation_82[0][0]
activation_83[0][0]


activation_84 (Activation) (None, 3, 3, 192) 0 batch_normalization_84[0][0]


mixed9 (Concatenate) (None, 3, 3, 2048) 0 activation_76[0][0]
mixed9_0[0][0]
concatenate[0][0]
activation_84[0][0]


conv2d_89 (Conv2D) (None, 3, 3, 448) 917504 mixed9[0][0]


batch_normalization_89 (BatchNo (None, 3, 3, 448) 1344 conv2d_89[0][0]


activation_89 (Activation) (None, 3, 3, 448) 0 batch_normalization_89[0][0]


conv2d_86 (Conv2D) (None, 3, 3, 384) 786432 mixed9[0][0]


conv2d_90 (Conv2D) (None, 3, 3, 384) 1548288 activation_89[0][0]


batch_normalization_86 (BatchNo (None, 3, 3, 384) 1152 conv2d_86[0][0]


batch_normalization_90 (BatchNo (None, 3, 3, 384) 1152 conv2d_90[0][0]


activation_86 (Activation) (None, 3, 3, 384) 0 batch_normalization_86[0][0]


activation_90 (Activation) (None, 3, 3, 384) 0 batch_normalization_90[0][0]


conv2d_87 (Conv2D) (None, 3, 3, 384) 442368 activation_86[0][0]


conv2d_88 (Conv2D) (None, 3, 3, 384) 442368 activation_86[0][0]


conv2d_91 (Conv2D) (None, 3, 3, 384) 442368 activation_90[0][0]


conv2d_92 (Conv2D) (None, 3, 3, 384) 442368 activation_90[0][0]


average_pooling2d_8 (AveragePoo (None, 3, 3, 2048) 0 mixed9[0][0]


conv2d_85 (Conv2D) (None, 3, 3, 320) 655360 mixed9[0][0]


batch_normalization_87 (BatchNo (None, 3, 3, 384) 1152 conv2d_87[0][0]


batch_normalization_88 (BatchNo (None, 3, 3, 384) 1152 conv2d_88[0][0]


batch_normalization_91 (BatchNo (None, 3, 3, 384) 1152 conv2d_91[0][0]


batch_normalization_92 (BatchNo (None, 3, 3, 384) 1152 conv2d_92[0][0]


conv2d_93 (Conv2D) (None, 3, 3, 192) 393216 average_pooling2d_8[0][0]


batch_normalization_85 (BatchNo (None, 3, 3, 320) 960 conv2d_85[0][0]


activation_87 (Activation) (None, 3, 3, 384) 0 batch_normalization_87[0][0]


activation_88 (Activation) (None, 3, 3, 384) 0 batch_normalization_88[0][0]


activation_91 (Activation) (None, 3, 3, 384) 0 batch_normalization_91[0][0]


activation_92 (Activation) (None, 3, 3, 384) 0 batch_normalization_92[0][0]


batch_normalization_93 (BatchNo (None, 3, 3, 192) 576 conv2d_93[0][0]


activation_85 (Activation) (None, 3, 3, 320) 0 batch_normalization_85[0][0]


mixed9_1 (Concatenate) (None, 3, 3, 768) 0 activation_87[0][0]
activation_88[0][0]


concatenate_1 (Concatenate) (None, 3, 3, 768) 0 activation_91[0][0]
activation_92[0][0]


activation_93 (Activation) (None, 3, 3, 192) 0 batch_normalization_93[0][0]


mixed10 (Concatenate) (None, 3, 3, 2048) 0 activation_85[0][0]
mixed9_1[0][0]
concatenate_1[0][0]
activation_93[0][0]

Total params: 21,802,784
Trainable params: 0
Non-trainable params: 21,802,784


Hi Jcplerm, after rerunning the notebook C3_W1_Lab_1_transfer_learning_cats_dogs.ipynb, I can see where the confusion comes from:

  1. The last layer is 'mixed10' because you ran 'pre_trained_model.summary()', which prints the full InceptionV3 model.

  2. The last layer from InceptionV3 will be 'mixed7', as expected, if you run 'model.summary()' on the model built in the notebook, since that model only uses the pre-trained graph up to 'mixed7' (the layers added on top are separate).

Moreover, choosing 'mixed7' as the last layer is only a reference choice; you may pick another layer instead (see the sketch below).
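For reference, here is a minimal sketch of how the notebook cuts the network at 'mixed7'. The head and hyperparameters shown are illustrative, not necessarily the exact ones used in the lab:

```python
from tensorflow.keras import Model, layers
from tensorflow.keras.applications.inception_v3 import InceptionV3

# Full InceptionV3 without its classification head; its summary() ends at 'mixed10'.
pre_trained_model = InceptionV3(input_shape=(150, 150, 3),
                                include_top=False,
                                weights='imagenet')
pre_trained_model.trainable = False  # freeze the pre-trained weights

# Cut the graph at 'mixed7' and attach a small classification head on top of it.
last_output = pre_trained_model.get_layer('mixed7').output
x = layers.Flatten()(last_output)
x = layers.Dense(1024, activation='relu')(x)
x = layers.Dense(1, activation='sigmoid')(x)

model = Model(pre_trained_model.input, x)
model.summary()  # the InceptionV3 part now stops at 'mixed7'
```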

Hope this helps.

Hi @jcplerm,

In addition to @lbt's answer: all the layers after 'mixed7' are very specific to the original dataset the base model was trained on. It is therefore important either to retrain those layers (from 'mixed8' down to the last layer) on the new dataset, or to remove them.

To make it clearer: in most CNNs, the first layers capture low-level features (lines, edges), the middle layers capture mid-level features, and the top layers, those just before the classification head, capture high-level features (faces, legs, generally the meaningful parts of an image). A sure way to improve the performance of a model built from a pretrained one is to retrain these top layers on the new dataset (here, cats and dogs), or to remove them, which is what the notebook does. So, in the notebook, 'mixed7' was taken as the reference layer and fully connected layers were added on top to complete the model. If you want to go the retraining route instead, a sketch follows below.
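As a sketch of that retraining (fine-tuning) option, assuming 'pre_trained_model' is the InceptionV3 base as in the lab (this is not what the notebook itself does):

```python
from tensorflow.keras.applications.inception_v3 import InceptionV3

pre_trained_model = InceptionV3(input_shape=(150, 150, 3),
                                include_top=False,
                                weights='imagenet')

# Unfreeze everything from 'mixed8' onward so those layers can adapt to the new
# dataset; the earlier, more generic layers stay frozen.
pre_trained_model.trainable = True
set_trainable = False
for layer in pre_trained_model.layers:
    if layer.name == 'mixed8':
        set_trainable = True
    layer.trainable = set_trainable
```

When fine-tuning like this, the model is usually recompiled with a much lower learning rate so the pre-trained weights are not destroyed.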

There is no single right answer for which layer to take as the last one; depending on the architecture, you generally look among those top layers and experiment with different choices. Take 'mixed6' or 'mixed8' and see what difference it makes in the results; a quick way to list the candidates is sketched below.
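For example, you could print every concatenation block and its output shape to see where the spatial resolution and channel count change (again assuming the InceptionV3 base from the lab):

```python
from tensorflow.keras.applications.inception_v3 import InceptionV3

pre_trained_model = InceptionV3(input_shape=(150, 150, 3),
                                include_top=False,
                                weights='imagenet')

# Print each 'mixed*' concatenation block and its output shape,
# e.g. mixed7 -> (None, 7, 7, 768), mixed10 -> (None, 3, 3, 2048).
for layer in pre_trained_model.layers:
    if layer.name.startswith('mixed'):
        print(layer.name, layer.output_shape)
```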

Here is also an official TensorFlow notebook that you can use to learn more about transfer learning and finetuning.
