The lab C1_W3_Lab_1_improving_accuracy_using_convolutions suggests removing the second of the two Conv2D layers to see how this affects training. I expected training to get faster and accuracy to drop. Instead, accuracy improved without the second conv layer, and the time per epoch stayed the same.
The base model was:
import tensorflow as tf

# Define the model
model = tf.keras.models.Sequential([
    # Add convolutions and max pooling
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    # Add the same layers as before
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])
When fit with the Adam optimizer and sparse categorical cross-entropy loss, it gave this performance:
MODEL TRAINING:
Epoch 1/5
1875/1875 [==============================] - 9s 4ms/step - loss: 0.4747 - accuracy: 0.8303
Epoch 2/5
1875/1875 [==============================] - 7s 4ms/step - loss: 0.3139 - accuracy: 0.8852
Epoch 3/5
1875/1875 [==============================] - 7s 4ms/step - loss: 0.2732 - accuracy: 0.9001
Epoch 4/5
1875/1875 [==============================] - 7s 4ms/step - loss: 0.2414 - accuracy: 0.9111
Epoch 5/5
1875/1875 [==============================] - 7s 4ms/step - loss: 0.2185 - accuracy: 0.9186
MODEL EVALUATION:
313/313 [==============================] - 1s 3ms/step - loss: 0.2717 - accuracy: 0.8993
However, upon removing the 2nd conv layer, I saw this improved performance:
MODEL TRAINING:
Epoch 1/5
1875/1875 [==============================] - 9s 4ms/step - loss: 0.4500 - accuracy: 0.8369
Epoch 2/5
1875/1875 [==============================] - 7s 4ms/step - loss: 0.3004 - accuracy: 0.8895
Epoch 3/5
1875/1875 [==============================] - 7s 4ms/step - loss: 0.2533 - accuracy: 0.9058
Epoch 4/5
1875/1875 [==============================] - 7s 4ms/step - loss: 0.2211 - accuracy: 0.9172
Epoch 5/5
1875/1875 [==============================] - 7s 4ms/step - loss: 0.1972 - accuracy: 0.9271
MODEL EVALUATION:
313/313 [==============================] - 1s 3ms/step - loss: 0.2692 - accuracy: 0.9001
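To put the two runs side by side, here is a small sketch that only redoes the arithmetic on the accuracies logged above. The test set size of 10,000 images is my assumption (313 evaluation steps at the Keras default batch size of 32), and the dictionary keys are just labels I made up.

```python
# Final-epoch training accuracy and evaluation accuracy, copied from the logs above.
runs = {
    "two conv layers": {"train_acc": 0.9186, "test_acc": 0.8993},
    "one conv layer": {"train_acc": 0.9271, "test_acc": 0.9001},
}

# Assumption: 10,000 test images (313 steps * batch size 32, last batch partial).
N_TEST = 10_000

# Gap between training and test accuracy for each run.
gaps = {name: round(r["train_acc"] - r["test_acc"], 4) for name, r in runs.items()}

# Difference in test accuracy between the two runs, also expressed in images.
test_diff = round(runs["one conv layer"]["test_acc"] - runs["two conv layers"]["test_acc"], 4)
images_diff = round(test_diff * N_TEST)

for name, gap in gaps.items():
    print(f"{name}: train/test gap = {gap:.4f}")
print(f"test accuracy difference = {test_diff:.4f} (about {images_diff} images)")
```

If that assumption holds, the test accuracies differ by only about 8 images, while the gap between training and test accuracy is larger for the one-conv model. A difference that small could plausibly come from random initialization alone, so repeating each run a few times would probably be needed before reading much into it.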
Why should I get a better fit to the training data with fewer free parameters? Why does the second conv layer appear to be harmful?
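One thing I have not verified is whether the smaller model really has fewer free parameters. The sketch below counts them by hand. It assumes 'valid' padding and stride 1 for Conv2D (the Keras defaults), and it assumes the MaxPooling2D that followed the second conv layer was removed along with it. The helper names are my own, not part of the lab.

```python
def conv_out(side, kernel=3):
    """Output side length of a 'valid', stride-1 convolution."""
    return side - kernel + 1


def pool_out(side, pool=2):
    """Output side length of non-overlapping max pooling (floor division)."""
    return side // pool


def conv_params(in_channels, filters, kernel=3):
    """Weights plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters


def dense_params(n_in, n_out):
    """Weights plus one bias per output unit."""
    return n_in * n_out + n_out


def count_params(n_conv_blocks, filters=32, side=28, channels=1):
    """Total parameters for n Conv2D+MaxPooling2D blocks, then Dense(128) and Dense(10)."""
    total = 0
    for _ in range(n_conv_blocks):
        total += conv_params(channels, filters)
        side = pool_out(conv_out(side))
        channels = filters
    flat = side * side * channels  # size of the Flatten output
    total += dense_params(flat, 128) + dense_params(128, 10)
    return total


two_blocks = count_params(2)
one_block = count_params(1)
print(two_blocks, one_block)  # 113386 693962
```

Under those assumptions the one-conv model actually has more parameters, not fewer: the Flatten output grows from 5x5x32 = 800 to 13x13x32 = 5408 values, and the Dense(128) layer that reads it dominates the count. Calling model.summary() on each variant should confirm or refute this for the real models.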