Advice on the best 2D convolution model architecture for image classification?

I have a binary classification problem on a small dataset of thermal images: detecting whether an image contains a human or not.

My data shapes are like this:

print('Train: X_train_images=%s, y_train_labels=%s' % (X_train_images.shape, y_train_labels.shape))
print('Validation: X_val_images=%s, y_val_labels=%s' % (X_val_images.shape, y_val_labels.shape))
Train: X_train_images=(3932, 100, 100, 3), y_train_labels=(3932, 1)
Validation: X_val_images=(800, 100, 100, 3), y_val_labels=(800, 1)

So I have set the input shape as

input_shape = X_train_images.shape[1:]
# (100, 100, 3)

and defined my model like this:

import tensorflow as tf
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model_cnn = tf.keras.Sequential([
    Conv2D(100, (3, 3), padding='same', activation='relu', input_shape=input_shape),
    Conv2D(60, (3, 3), padding='same', activation='relu'),
    MaxPooling2D(2, 2),
    Conv2D(30, (3, 3), padding='same', activation='relu'),
    Conv2D(20, (3, 3), padding='same', activation='relu'),
    MaxPooling2D(2, 2),
    Flatten(),
    Dense(20, activation='relu'),
    Dense(1, activation='sigmoid'),
])

model_cnn.summary()

Model: "sequential_3"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_13 (Conv2D)           (None, 100, 100, 100)     2800      
_________________________________________________________________
conv2d_14 (Conv2D)           (None, 100, 100, 60)      54060     
_________________________________________________________________
max_pooling2d_6 (MaxPooling2 (None, 50, 50, 60)        0         
_________________________________________________________________
conv2d_15 (Conv2D)           (None, 50, 50, 30)        16230     
_________________________________________________________________
conv2d_16 (Conv2D)           (None, 50, 50, 20)        5420      
_________________________________________________________________
max_pooling2d_7 (MaxPooling2 (None, 25, 25, 20)        0         
_________________________________________________________________
flatten_3 (Flatten)          (None, 12500)             0         
_________________________________________________________________
dense_6 (Dense)              (None, 20)                250020    
_________________________________________________________________
dense_7 (Dense)              (None, 1)                 21        
=================================================================
Total params: 328,551
Trainable params: 328,551
Non-trainable params: 0

model_cnn.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

Yet I am not sure whether this is the best architecture for the problem at hand.
My accuracy on the validation set and the test set is 71.9% and 95.3%, respectively.

Here are my validation accuracy and loss plots.

Should I add more convolutional layers, or regularisation layers (Dropout, etc.)?
Is it OK to add them right after the convolutional layers, like so?

Model: "sequential_5"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_21 (Conv2D)           (None, 100, 100, 100)     2800      
_________________________________________________________________
dropout_3 (Dropout)          (None, 100, 100, 100)     0         
_________________________________________________________________
conv2d_22 (Conv2D)           (None, 100, 100, 60)      54060     
_________________________________________________________________
dropout_4 (Dropout)          (None, 100, 100, 60)      0         
_________________________________________________________________
max_pooling2d_10 (MaxPooling (None, 50, 50, 60)        0         
_________________________________________________________________
conv2d_23 (Conv2D)           (None, 50, 50, 30)        16230     
_________________________________________________________________
dropout_5 (Dropout)          (None, 50, 50, 30)        0         
_________________________________________________________________
conv2d_24 (Conv2D)           (None, 50, 50, 20)        5420      
_________________________________________________________________
max_pooling2d_11 (MaxPooling (None, 25, 25, 20)        0         
_________________________________________________________________
flatten_5 (Flatten)          (None, 12500)             0         
_________________________________________________________________
dense_10 (Dense)             (None, 20)                250020    
_________________________________________________________________
dense_11 (Dense)             (None, 1)                 21        
=================================================================
Total params: 328,551
Trainable params: 328,551
Non-trainable params: 0
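For reference, here is a minimal sketch of the layer code that would produce a summary like the one above. The 0.25 dropout rates are placeholders I picked; any rate gives the same summary, since Dropout adds no parameters.

from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

model_cnn_dropout = tf.keras.Sequential([
    Conv2D(100, (3, 3), padding='same', activation='relu', input_shape=(100, 100, 3)),
    Dropout(0.25),  # dropout placed directly after a conv layer
    Conv2D(60, (3, 3), padding='same', activation='relu'),
    Dropout(0.25),
    MaxPooling2D(2, 2),
    Conv2D(30, (3, 3), padding='same', activation='relu'),
    Dropout(0.25),
    Conv2D(20, (3, 3), padding='same', activation='relu'),
    MaxPooling2D(2, 2),
    Flatten(),
    Dense(20, activation='relu'),
    Dense(1, activation='sigmoid'),
])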

I am also not sure whether I should be increasing the input dimensions. My original images were 300 x 400 pixels, and I have loaded them as 100 x 100 x 3 tensors via a manually defined function.
I have posted a separate question on dimensions.
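For context, a simplified sketch of that kind of resizing step using tf.image.resize. This is only an illustration: it assumes the raw images are already in memory as a uint8 array in the 0-255 range, which may not match what my actual loader does.

import tensorflow as tf

def resize_images(images, target_size=(100, 100)):
    # images: (num_images, 300, 400, 3) uint8 array
    # tf.image.resize returns float32; scale to [0, 1] for training
    resized = tf.image.resize(images, target_size)
    return (resized / 255.0).numpy()

# X_train_images = resize_images(raw_train_images)  # -> (num_images, 100, 100, 3)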

Thank you very much for any advice on how to improve my model.

Have you seen this?

Yes, I have done 3 weeks out of 4 of this course (I am now on week 4). It does not mention regularisation at all.
I am trying to apply what I have learned in this course to my university assignment.

Is it common to decrease the number of conv filters as you go deeper?

I was looking at this video, the CNN example in Week 1. It goes from 32x32x3 to 28x28x6 to 14x14x6 to 10x10x16 and 5x5x16.

That is why I tried to do a similar thing, decreasing the numbers in my model (from 100x100 to 50x50 to 25x25).

Is it wrong? I’m confused.

Please look closely at the output of the conv layers, written below each conv box. The number of channels in the output is equal to the number of filters used in that conv layer. For example, Conv 1 takes a 32x32x3 input and applies 6 filters, each of dimension 5x5 with a stride of 1, to output 28x28x6.


I understand all this. He is using the formula ((n + 2p - f) / s) + 1 to calculate sizes like 28x28 for the first convolutional layer.
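As a quick sanity check of that formula, here is a small helper that reproduces the lecture numbers (a 5x5 kernel, no padding, stride 1 for the first conv layer):

def conv_output_size(n, f, p=0, s=1):
    # ((n + 2p - f) / s) + 1, the standard conv output-size formula
    return (n + 2 * p - f) // s + 1

print(conv_output_size(32, 5))        # 28 -> a 32x32 input with a 5x5 filter gives 28x28
print(conv_output_size(100, 3, p=1))  # 100 -> 'same'-style padding (p=1 for 3x3) keeps 100x100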

I still don’t understand how I should be defining the dimensions of the convolutional layers in my example.

Should I be increasing the number of channels in each convolutional layer of my model, say from 3 to 6 to 16, as in his example?

Does it look better this way?


Model: "sequential_15"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_61 (Conv2D)           (None, 100, 100, 32)      2432      
_________________________________________________________________
max_pooling2d_33 (MaxPooling (None, 50, 50, 32)        0         
_________________________________________________________________
conv2d_62 (Conv2D)           (None, 50, 50, 64)        51264     
_________________________________________________________________
max_pooling2d_34 (MaxPooling (None, 25, 25, 64)        0         
_________________________________________________________________
conv2d_63 (Conv2D)           (None, 25, 25, 128)       73856     
_________________________________________________________________
max_pooling2d_35 (MaxPooling (None, 12, 12, 128)       0         
_________________________________________________________________
flatten_16 (Flatten)         (None, 18432)             0         
_________________________________________________________________
dense_33 (Dense)             (None, 40)                737320    
_________________________________________________________________
dense_34 (Dense)             (None, 20)                820       
_________________________________________________________________
dense_35 (Dense)             (None, 1)                 21        
=================================================================
Total params: 865,713
Trainable params: 865,713
Non-trainable params: 0
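For reference, a minimal sketch of the layer stack behind that summary (the kernel sizes, 5x5 for the first two conv layers and 3x3 for the third, are inferred from the parameter counts):

model_cnn_v3 = tf.keras.Sequential([
    Conv2D(32, (5, 5), padding='same', activation='relu', input_shape=(100, 100, 3)),
    MaxPooling2D(2, 2),
    Conv2D(64, (5, 5), padding='same', activation='relu'),
    MaxPooling2D(2, 2),
    Conv2D(128, (3, 3), padding='same', activation='relu'),
    MaxPooling2D(2, 2),
    Flatten(),
    Dense(40, activation='relu'),
    Dense(20, activation='relu'),
    Dense(1, activation='sigmoid'),
])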

My plots now look like this, with some spikes, and the model is still overfitting.

Thank you very much.

Since you’ve taken the Deep Learning Specialization, what should you do when a model overfits?

Add regularisation? Dropout?
That brings me back to what I asked earlier.
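Something like this, for example? A minimal sketch combining dropout with L2 weight regularisation on the dense layer (the 0.5 dropout rate and the 0.01 L2 factor are arbitrary starting points, not tuned values):

from tensorflow.keras import regularizers
from tensorflow.keras.layers import Dropout

model_reg = tf.keras.Sequential([
    Conv2D(32, (5, 5), padding='same', activation='relu', input_shape=(100, 100, 3)),
    MaxPooling2D(2, 2),
    Conv2D(64, (5, 5), padding='same', activation='relu'),
    MaxPooling2D(2, 2),
    Flatten(),
    Dropout(0.5),  # randomly drop half of the flattened features during training
    Dense(20, activation='relu', kernel_regularizer=regularizers.l2(0.01)),
    Dense(1, activation='sigmoid'),
])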

Finding a good architecture requires you to tune the hyperparameters yourself. There are also tools, like Google's AutoML, that will do the neural architecture search for you, at the cost of computing power.