Does ResNet50 work best with (224, 224, 3) images? I have images of size (160, 160, 3), and I think upscaling them to (224, 224) would degrade their quality. What should I do?
I have images from two different sources, of sizes (1024x1024) and (216x178). I applied augmentation, adding Gaussian noise, blur, and JPEG compression plus the usual rotations, and resized everything to 160x160. The reason I share these details is that I feel all of these augmentations have already reduced the image quality, and upscaling now would degrade it further. Am I correct?
Should I just resize them all to 224x224, or simply feed the 160x160 images to ResNet50?
Also, I only know the cv2.resize(…) method; is there any other efficient way of doing this?
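For reference, this is roughly how I resize at the moment (a minimal sketch; the file path is just a placeholder):

```python
import cv2

img = cv2.imread("example.jpg")  # placeholder path
# INTER_AREA is generally a good choice when downscaling
resized = cv2.resize(img, (160, 160), interpolation=cv2.INTER_AREA)
```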
Lots of questions in this thread, some of them I think have subjective answers (what quality is good enough?). But this one is easy, and the answer is ‘No’: you cannot just feed 160x160 input to a default ResNet50. What you can do, however, is treat the entire ResNet50 architecture as a single layer and add your own layer(s) in front of it that accept your custom input size and produce 224x224. You can treat this either as training from scratch or as a transfer-learning use case. Alternatively, you can modify the ResNet50 layer(s) to accept the different input size, but then you need to account for the 32x downsampling and make sure the dimensions still make sense throughout the architecture.
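For example, a minimal Keras sketch of the wrapper approach (assuming TensorFlow 2.x; the ten-class head is just a placeholder, and the pixel preprocessing expected by the ImageNet weights is omitted for brevity):

```python
import tensorflow as tf

inputs = tf.keras.Input(shape=(160, 160, 3))
# Map the custom input size up to the 224x224 that ResNet50 expects.
x = tf.keras.layers.Resizing(224, 224)(inputs)

# Treat the whole ResNet50 as a single layer; include_top=False drops the
# ImageNet classifier so you can attach your own head.
backbone = tf.keras.applications.ResNet50(
    include_top=False, weights="imagenet", input_shape=(224, 224, 3))
backbone.trainable = False  # freeze for transfer learning, or set True to train from scratch

x = backbone(x, training=False)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(10, activation="softmax")(x)  # placeholder: 10 classes
model = tf.keras.Model(inputs, outputs)
```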
Are you resizing down from original sizes to 160x160? Cropping? The choice to go to 160x160 with the augmented data isn’t obvious to me.
Thanks for replying, but I didn’t understand this part. Are you saying I should add some extra convolutional layers to upscale my images to 224x224 and then feed that to ResNet50? If so, how should I do that?
As far as the 160x160 size is concerned, I initially thought of using my custom Dense CNN (inspired by a paper), which required an input of 160x160. Moreover, as I said, some sources had images of size 216x178, so I thought of reducing them further so that I could save some compute as well.
Lastly, I didn’t crop the images and simply resized them because the dataset didn’t have much background in it.
Can you clarify whether you are considering starting with a pre-trained ResNet50 model, or just utilizing the architecture and doing all the training yourself?
In any case, the approach is similar; the difference is in how you incorporate the ResNet layers and weights. Regarding the use of inputs smaller than a defined model architecture, or dealing with irregularly sized inputs, see for example:
I went through this paper, and it suggests using zero-padding instead of interpolation. Are you recommending the same thing?
I will be using the pre-trained ResNet50 model without the top layer. Also, I will unfreeze the last block (the entire 5th stage). Is that fine? Any suggestions for me in this regard?
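Concretely, this is roughly what I have in mind (a sketch only; I’m assuming the Keras ResNet50 layer naming, where the final stage’s layers start with "conv5"):

```python
import tensorflow as tf

backbone = tf.keras.applications.ResNet50(
    include_top=False, weights="imagenet", input_shape=(160, 160, 3))

# Freeze everything except the final (conv5) stage.
for layer in backbone.layers:
    layer.trainable = layer.name.startswith("conv5")
```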
My recommendation is to do what works best. I don’t mean that sarcastically. I don’t believe there is one approach or rule that is right for all domains, data sources, engineering budgets, and operational environments. Rather, you should expect to arrive at an acceptable result through empirical analysis. Remember that what the image quality looks like to your eyes is irrelevant. What matters is prediction accuracy and its impact on the four quadrants of the confusion matrix, measured at a frame rate acceptable for your particular application.
Hi, so I tried fine-tuning ResNet50 and didn’t find much difference between using the 160x160 images directly and zero-padding them up to 224x224. However, when I resized them to 224x224 with interpolation, the accuracy actually decreased. Hence I worked with the original 160x160 images.
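For reference, the zero-padding I tried looked roughly like this (a minimal sketch, assuming TensorFlow; it just centers the 160x160 image on a 224x224 black canvas):

```python
import tensorflow as tf

def pad_to_224(image_160):
    # (224 - 160) // 2 = 32 pixels of zero padding on each side
    return tf.image.pad_to_bounding_box(image_160, 32, 32, 224, 224)
```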
But the model was overfitting, so I tried adding dropout layers and got a validation accuracy of 92%. Then I simply designed my own Dense CNN, which worked better than ResNet50 and got a validation accuracy of 96%.