I do not understand why we need to do the GlobalAveragePooling2D() before Flatten(). It seems we give up quite a lot of information by just collapsing the whole palette into one average number… How does it work?

Why not just flatten the ResNet50 output directly and feed it into a dense layer?

How much an ordinary pooling layer reduces the size of the input to the output depends on the parameters you choose for the stride and filter size. The typical case is to reduce each spatial dimension by half (e.g. f = 2 and s = 2). GlobalAveragePooling2D() is the extreme case: the pooling window covers the entire feature map, so each channel is reduced to a single average value. You can try implementing the network both ways (with and without the average pooling layer) and see how it performs in terms of training cost and prediction accuracy. Of course, if you omit the pooling layer you’ll have to adjust the parameters of the following dense layers in order to get to the defined size of the output of your network.
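To put rough numbers on the training-cost side of that comparison, here is a small sketch in plain Python. It assumes the feature map that ResNet50 produces for a 224x224 input with the top removed, which has shape (7, 7, 2048). The first dense layer of 1024 units is a made-up example, not something taken from the course code.

```python
# Rough parameter count of the first Dense layer placed after the ResNet50 base.
# Assumptions: feature map of shape (7, 7, 2048), as ResNet50 gives for a
# 224x224 input without its top; 1024 dense units is a hypothetical choice.

def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units

height, width, channels = 7, 7, 2048
units = 1024

# Flatten keeps every value: 7 * 7 * 2048 inputs to the dense layer.
flatten_inputs = height * width * channels
# Global average pooling keeps one average per channel: 2048 inputs.
gap_inputs = channels

flatten_params = dense_params(flatten_inputs, units)
gap_params = dense_params(gap_inputs, units)

print("Flatten -> Dense:", flatten_params)
print("GAP -> Dense:    ", gap_params)
print("ratio:", round(flatten_params / gap_params, 1))
```

Under these assumptions the dense layer after a direct Flatten has roughly 49 times as many parameters, which is one plausible reason the pooling layer is there. Whether the extra parameters help or hurt accuracy on your data is something only the experiment will tell you.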

With any decision like this that you see in a published network, what you can surmise is that the designers and authors probably did quite a bit of experimentation and tuning to arrive at an architecture that works well enough for their purposes and does not have training costs and storage costs that are prohibitive. But this is an experimental science and you can always do further research and experimentation to adapt one of these model architectures to a new problem that you are trying to solve. Maybe the original authors missed something, or your problem is more complex and you need more data resolution and more of the later layers in the network in order to solve it. Of course there is always Moore’s Law in play and that is not a trivial consideration: ResNet50 was first published in 2015 and what would have been considered a “prohibitive” cost in terms of CPU time then might well not be now.

Also @Dennis_Sinitsky

To extend the capabilities of the pretrained ResNet50 model, we introduce additional layers in the ‘classifier_layers’ function. Specifically, we incorporate a Global Average Pooling layer, which helps in spatially summarizing the features, followed by Fully Connected layers to increase the model’s learning capacity.

While Flatten() reshapes your tensor into a 1D vector, GlobalAveragePooling2D() performs an average pooling operation, reducing the size of your tensor. The choice between the two depends on your specific use case and the architecture of your neural network.
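A small NumPy sketch of what the two operations do to a single feature map may help here. This is not the Keras implementation, just the same arithmetic on a tiny made-up tensor of shape (2, 2, 3) with channels last.

```python
import numpy as np

# A made-up feature map: height 2, width 2, 3 channels (channels last).
features = np.arange(12, dtype=float).reshape(2, 2, 3)

# What Flatten() does: keep all 12 values, just lay them out in one row.
flattened = features.reshape(-1)

# What GlobalAveragePooling2D() does: average over height and width,
# leaving one number per channel.
pooled = features.mean(axis=(0, 1))

print(flattened.shape)  # (12,)
print(pooled.shape)     # (3,)
print(pooled)           # [4.5 5.5 6.5]
```

Note that the pooled result is one average per channel, not one number for the whole tensor, so for ResNet50 you still keep a vector of 2048 values. That vector is already 1D per example, so a Flatten() placed after it does not change anything.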

So the choice of using GlobalAveragePooling2D() is to first reduce the size of the tensor, rather than only reshaping it, and then flatten it, which leaves far fewer inputs for the dense layers to learn from.

Feel free to ask if you have more doubts.

Regards

DP