C1W1_Difference between GlobalAveragePooling and AveragePooling

In the DenseNet architecture we used GlobalAveragePooling2D. Can someone explain the difference between GlobalAveragePooling2D and AveragePooling?


You can see the entire list of pooling layers here [Pooling layers]

If you compare the arguments accepted by AveragePooling2D [AveragePooling2D layer]
versus GlobalAveragePooling2D [GlobalAveragePooling2D layer], you will see that the former accepts a pool_size argument:
pool_size: integer or tuple of 2 integers, factors by which to downscale

GlobalAveragePooling2D does not.

Those linked pages also give the output shapes for the two layers.

Output shape (AveragePooling2D)

  • If data_format='channels_last': 4D tensor with shape (batch_size, pooled_rows, pooled_cols, channels).
  • If data_format='channels_first': 4D tensor with shape (batch_size, channels, pooled_rows, pooled_cols).

Output shape (GlobalAveragePooling2D)

  • If keepdims=False: 2D tensor with shape (batch_size, channels).
  • If keepdims=True:
    • If data_format='channels_last': 4D tensor with shape (batch_size, 1, 1, channels)
    • If data_format='channels_first': 4D tensor with shape (batch_size, channels, 1, 1)

@Maverick06, the main use of a pooling layer is to reduce the number of features by shrinking the height and width of the feature maps. In average pooling, the average is taken over small blocks of the input. In global average pooling, it is taken over the whole input.

For a single image, ignoring the batch size, consider an input of dimension 4 (height) x 4 (width) x 4 (channels). Global average pooling takes the average over the whole 4 (height) x 4 (width) area: it averages all 16 values, and does so for each of the 4 channels separately. The result is 1 x 4 (channels), that is, one value per channel. If you need to preserve the dimensions, set keepdims to True and you will get (1 x 1 x 4). Now you can see why it is called global. It is mainly used in place of fully connected layers in CNNs, since its output is already a flat vector with one value per channel.
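To make the arithmetic concrete, here is a minimal NumPy sketch of the 4 x 4 x 4 example above. It only mimics what global average pooling computes for a single channels-last image; it is not the Keras implementation, and the input values (0 to 63) are made up for illustration.

```python
import numpy as np

# Hypothetical single image: 4 (height) x 4 (width) x 4 (channels), channels last.
# The values 0..63 are arbitrary and only serve to make the averages easy to check.
x = np.arange(4 * 4 * 4, dtype=np.float32).reshape(4, 4, 4)

# Global average pooling: average all 16 spatial values, separately per channel.
gap = x.mean(axis=(0, 1))

# Same computation, but keeping the height and width axes with size 1.
gap_keepdims = x.mean(axis=(0, 1), keepdims=True)

print(gap.shape)           # (4,)
print(gap_keepdims.shape)  # (1, 1, 4)
print(gap)                 # [30. 31. 32. 33.]
```

With a batch dimension in front, the same averaging over the height and width axes would give the (batch_size, channels) shape listed in the docs.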

In average pooling, you do not average over the whole height and width. Instead you average over small blocks, whose size and spacing you set with the pool_size and strides arguments. With a stride of 2 and a pool_size of 2 x 2, this example gives an output of 2 (pooled_height) x 2 (pooled_width) x 4 (channels).
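The block-wise case can be sketched the same way. The helper below, average_pool_2x2, is a hypothetical name used only for this illustration. It assumes a channels-last image with even height and width, and it covers only the 2 x 2 pool with stride 2 from the example, not the general Keras layer.

```python
import numpy as np

# Same hypothetical 4 (height) x 4 (width) x 4 (channels) image as before.
x = np.arange(4 * 4 * 4, dtype=np.float32).reshape(4, 4, 4)

def average_pool_2x2(image):
    """Average non-overlapping 2x2 blocks (pool size 2, stride 2) of an H x W x C array.

    Assumes H and W are even; this is an illustration, not the Keras layer.
    """
    h, w, c = image.shape
    # Split height into (h // 2, 2) and width into (w // 2, 2), so that
    # axes 1 and 3 index the position inside each 2x2 block.
    blocks = image.reshape(h // 2, 2, w // 2, 2, c)
    # Average within each block, leaving one value per block per channel.
    return blocks.mean(axis=(1, 3))

pooled = average_pool_2x2(x)
print(pooled.shape)   # (2, 2, 4)
print(pooled[0, 0])   # [10. 11. 12. 13.]
```

Each of the 2 x 2 output positions is the average of one 2 x 2 block, so the spatial size is halved rather than collapsed to a single value per channel.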

Thanks @bharathikannan , got it.