C1_W3_Lab_1: Odd Result When Removing Conv Layer

In C1_W3_Lab_1_improving_accuracy_using_convolutions, it is suggested that we try removing the 2nd of the 2 Conv2D layers to see how this affects training. I had expected training speed to increase but accuracy to decrease. Instead, I saw improved accuracy without the second conv layer.

The base model was:

import tensorflow as tf

# Define the model
model = tf.keras.models.Sequential([

  # Add convolutions and max pooling
  tf.keras.layers.Conv2D(32, (3,3), activation='relu', input_shape=(28, 28, 1)),
  tf.keras.layers.MaxPooling2D(2, 2),
  tf.keras.layers.Conv2D(32, (3,3), activation='relu'),
  tf.keras.layers.MaxPooling2D(2, 2),

  # Add the same layers as before
  tf.keras.layers.Flatten(),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dense(10, activation='softmax')
])
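
For reference, a minimal sketch of the compile/fit/evaluate calls behind the logs below; the Fashion MNIST loading and normalization come from the lab and are my assumption, since the original post does not show them:

# Load and normalize Fashion MNIST, as in the lab
(train_images, train_labels), (test_images, test_labels) = \
    tf.keras.datasets.fashion_mnist.load_data()
train_images, test_images = train_images / 255.0, test_images / 255.0

# Compile with the Adam optimizer and sparse categorical cross-entropy loss
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Train for 5 epochs, then evaluate on the held-out test set
model.fit(train_images, train_labels, epochs=5)
model.evaluate(test_images, test_labels)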

When fit with the Adam optimizer and sparse categorical cross-entropy loss, I saw this performance:

MODEL TRAINING:
Epoch 1/5
1875/1875 [==============================] - 9s 4ms/step - loss: 0.4747 - accuracy: 0.8303
Epoch 2/5
1875/1875 [==============================] - 7s 4ms/step - loss: 0.3139 - accuracy: 0.8852
Epoch 3/5
1875/1875 [==============================] - 7s 4ms/step - loss: 0.2732 - accuracy: 0.9001
Epoch 4/5
1875/1875 [==============================] - 7s 4ms/step - loss: 0.2414 - accuracy: 0.9111
Epoch 5/5
1875/1875 [==============================] - 7s 4ms/step - loss: 0.2185 - accuracy: 0.9186

MODEL EVALUATION:
313/313 [==============================] - 1s 3ms/step - loss: 0.2717 - accuracy: 0.8993

However, upon removing the 2nd conv layer, I saw the improved performance shown below.
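
For concreteness, here is a sketch of the modified model; the exact edit is not shown in the post, so I have assumed the second Conv2D and its paired MaxPooling2D are both removed:

# Modified model: second Conv2D (and, by assumption, its MaxPooling2D) removed
model = tf.keras.models.Sequential([
  tf.keras.layers.Conv2D(32, (3,3), activation='relu', input_shape=(28, 28, 1)),
  tf.keras.layers.MaxPooling2D(2, 2),
  tf.keras.layers.Flatten(),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dense(10, activation='softmax')
])

Compiled and trained the same way, this gave: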

MODEL TRAINING:
Epoch 1/5
1875/1875 [==============================] - 9s 4ms/step - loss: 0.4500 - accuracy: 0.8369
Epoch 2/5
1875/1875 [==============================] - 7s 4ms/step - loss: 0.3004 - accuracy: 0.8895
Epoch 3/5
1875/1875 [==============================] - 7s 4ms/step - loss: 0.2533 - accuracy: 0.9058
Epoch 4/5
1875/1875 [==============================] - 7s 4ms/step - loss: 0.2211 - accuracy: 0.9172
Epoch 5/5
1875/1875 [==============================] - 7s 4ms/step - loss: 0.1972 - accuracy: 0.9271

MODEL EVALUATION:
313/313 [==============================] - 1s 3ms/step - loss: 0.2692 - accuracy: 0.9001

Why should I get a better fit to training data with fewer free parameters? Why is the 2nd conv layer, apparently, harmful?

A change from 89% to 90% test accuracy is not really significant. You might see that much difference just by re-training the same model several times without changing it at all: deep models have non-convex cost functions, so the random weight initialization can lead each training run to a slightly different solution. Note also that the premise of "fewer free parameters" may not hold here: removing the conv layer (and its pooling) enlarges the flattened feature map feeding the first Dense layer from 5×5×32 = 800 values to 13×13×32 = 5408, so the single-conv model actually has more trainable parameters, not fewer.
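
To see this run-to-run variance directly, you can retrain the identical architecture a few times and compare test accuracies. A minimal sketch (tf.keras.utils.set_random_seed requires TF 2.7+; omit it to let every run vary freely):

import tensorflow as tf

# Load and normalize Fashion MNIST, as in the lab
(train_images, train_labels), (test_images, test_labels) = \
    tf.keras.datasets.fashion_mnist.load_data()
train_images, test_images = train_images / 255.0, test_images / 255.0

# Retrain the identical architecture several times; only the random
# weight initialization and data shuffling differ between runs
for trial in range(3):
    tf.keras.utils.set_random_seed(trial)  # different seed per run (TF >= 2.7)
    model = tf.keras.models.Sequential([
        tf.keras.layers.Conv2D(32, (3,3), activation='relu', input_shape=(28, 28, 1)),
        tf.keras.layers.MaxPooling2D(2, 2),
        tf.keras.layers.Conv2D(32, (3,3), activation='relu'),
        tf.keras.layers.MaxPooling2D(2, 2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation='relu'),
        tf.keras.layers.Dense(10, activation='softmax')
    ])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    model.fit(train_images, train_labels, epochs=5, verbose=0)
    _, test_acc = model.evaluate(test_images, test_labels, verbose=0)
    print(f"Trial {trial}: test accuracy = {test_acc:.4f}")

The spread across trials is typically on the same order as the 89% vs. 90% difference in question. You can also compare model.summary() for the two architectures to check the parameter counts directly.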