Week 1 lecture on ResNet50 usage

Dennis_Sinitsky · February 15, 2024, 3:57pm

I do not understand why we need GlobalAveragePooling2D() before Flatten(). It seems we give up a lot of information by collapsing the whole palette into one average number… How does it work?
Why not just flatten the ResNet50 output directly and feed it into a dense layer?

paulinpaloalto · February 15, 2024, 4:14pm

How much a pooling layer reduces the size of its input depends on the stride and filter size you choose. For an ordinary pooling layer the typical case is to halve the height and width (e.g. f = 2 and s = 2). GlobalAveragePooling2D is the extreme case: the window covers the whole feature map, so each channel is reduced to its single average value, but the channels themselves are kept. You can try implementing the network both ways (with and without the average pooling layer) and see how it performs in terms of training cost and prediction accuracy. Of course you’ll have to adjust the parameters of the following dense layers if you omit the pooling layer, in order to get to the defined output size of your network.
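To get a feel for the training-cost side before running anything, here is a rough back-of-the-envelope sketch in plain Python. It assumes the usual 7x7x2048 feature map that ResNet50 produces without its top for 224x224 inputs, and a hypothetical first dense layer of 1024 units (that number is only for illustration, it is not taken from the lab).

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of one fully connected layer."""
    return n_inputs * n_units + n_units


# Assumption: ResNet50 without its top, fed 224x224 images,
# outputs a 7x7x2048 feature map.
height, width, channels = 7, 7, 2048
units = 1024  # hypothetical size of the first dense layer

# Flatten keeps every value of the feature map.
flatten_features = height * width * channels
# Global average pooling keeps one value per channel.
gap_features = channels

flatten_params = dense_params(flatten_features, units)
gap_params = dense_params(gap_features, units)

print("Flatten only:", flatten_features, "features,", flatten_params, "dense parameters")
print("With pooling:", gap_features, "features,", gap_params, "dense parameters")
```

Under these assumptions the first dense layer alone is roughly 49 times larger without the pooling layer, which is the kind of storage and training cost the experiment would have to justify with better accuracy.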

With any decision like this that you see in a published network, what you can surmise is that the designers and authors probably did quite a bit of experimentation and tuning to arrive at an architecture that works well enough for their purposes and does not have prohibitive training and storage costs. But this is an experimental science and you can always do further research and experimentation to adapt one of these model architectures to a new problem that you are trying to solve. Maybe the original authors missed something, or your problem is more complex and you need more data resolution and more of the later layers in the network in order to solve it. Of course there is always Moore’s Law in play and that is not a trivial consideration: ResNet50 was first published in 2015, and what would have been considered a “prohibitive” cost in terms of CPU time then might well not be now.

Deepti_Prasad · February 16, 2024, 12:38pm

Also @Dennis_Sinitsky

To extend the capabilities of the pretrained ResNet50 model, we introduce additional layers in the ‘classifier_layers’ function. Specifically, we incorporate a Global Average Pooling layer, which helps in spatially summarizing the features, followed by Fully Connected layers to increase the model’s learning capacity.

While Flatten() only reshapes your tensor into a 1D vector and keeps every value, GlobalAveragePooling2D() averages each channel over its height and width, reducing a (batch, height, width, channels) tensor to (batch, channels). The choice between the two depends on your specific use case and the architecture of your neural network.
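To make the difference concrete, here is a minimal pure-Python sketch on a toy 2x2 feature map with 3 channels. The helper names global_average_pool and flatten are made up for this illustration and only mimic what the Keras layers do for a single example; they are not the Keras implementation.

```python
def global_average_pool(feature_map):
    """Average a height x width x channels nested list over its spatial positions."""
    height = len(feature_map)
    width = len(feature_map[0])
    channels = len(feature_map[0][0])
    return [
        sum(feature_map[i][j][c] for i in range(height) for j in range(width))
        / (height * width)
        for c in range(channels)
    ]


def flatten(feature_map):
    """Lay out every value of the feature map in one long list."""
    return [value for row in feature_map for pixel in row for value in pixel]


# Toy feature map: 2 rows x 2 columns, 3 channels per position.
feature_map = [
    [[1.0, 10.0, 100.0], [2.0, 20.0, 200.0]],
    [[3.0, 30.0, 300.0], [4.0, 40.0, 400.0]],
]

pooled = global_average_pool(feature_map)  # one average per channel
flat = flatten(feature_map)                # all 12 values, nothing averaged

print(pooled)
print(len(flat))
```

So the pooling does not collapse everything into one number: each channel keeps its own average, and only the spatial positions within a channel are merged.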

So the choice of GlobalAveragePooling2D is to reduce the size first, rather than only reshaping, and then flatten, for better model learning capacity.

Feel free to ask if you have more doubts.

Regards
DP
