Does the h:w ratio matter for image recognition?

In the cat and dog classification example, the images are resized to 150x150. However, the original height:width ratio may not be 1:1.


The ratio does not necessarily have to be 1:1. Cats and dogs is already a well-defined dataset, and resizing mainly makes computation easier. Resizing can also hurt performance, because you lose information when you downscale. Bigger images carry more information and can improve performance, but that comes at the expense of computation.

A simple method would be to make the H:W ratio 1:1 by filling the shorter side of the image with the nearest color, then rescaling to the target size. The computation cost should be lower than that of common augmentation, right? Is the reason for not doing this to introduce some kind of “augmentation” through aspect distortion, in order to help reduce overfitting?

It could be. Filling or cropping is usually considered a preprocessing step (typically done before augmentation). Once the preprocessing is done, you have a proper dataset that can be used directly, or with augmentation if necessary.
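
For reference, the "fill with the nearest color, then rescale" idea from the question can be sketched in a few lines of NumPy. This is only a minimal sketch: the helper names `pad_to_square` and `resize_nearest` are made up for illustration, the 300x200 input is random data, and the nearest-neighbour resize is a stand-in for whatever resize your image library provides.

```python
import numpy as np

def pad_to_square(img):
    """Pad the shorter side by replicating edge pixels until H == W."""
    h, w = img.shape[:2]
    size = max(h, w)
    top = (size - h) // 2
    left = (size - w) // 2
    pad = [(top, size - h - top), (left, size - w - left)]
    pad += [(0, 0)] * (img.ndim - 2)  # leave any channel axis untouched
    return np.pad(img, pad, mode="edge")

def resize_nearest(img, out_size):
    """Nearest-neighbour resize to out_size x out_size (hypothetical helper)."""
    h, w = img.shape[:2]
    rows = np.arange(out_size) * h // out_size
    cols = np.arange(out_size) * w // out_size
    return img[rows][:, cols]

# Illustrative input: a random 300x200 RGB image (H=300, W=200).
img = np.random.randint(0, 256, size=(300, 200, 3), dtype=np.uint8)
square = pad_to_square(img)          # aspect ratio is now 1:1, content undistorted
small = resize_nearest(square, 150)  # same 150x150 target as in the example
print(square.shape, small.shape)     # (300, 300, 3) (150, 150, 3)
```

The padded borders carry no new information, so whether this beats plain resizing probably depends on the dataset; it is cheap enough to try both.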

Again, there are multiple methods to help reduce overfitting, such as batch normalisation, dropout, regularisation, layer normalisation, etc. Data augmentation is done to increase the volume of data. Most of the time these kinds of decisions are specific to the application. There is no rule of thumb for this. In general, dropout is the one that I would go for to reduce overfitting.
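
In case it helps, here is roughly what dropout does, written as a plain NumPy sketch of the common "inverted dropout" formulation. The function name, the `rate` of 0.5, and the toy activations are assumptions for illustration; in practice you would use the dropout layer from your framework.

```python
import numpy as np

def dropout(x, rate, training, rng):
    """Inverted dropout: zero a fraction `rate` of units and rescale the rest."""
    if not training or rate == 0.0:
        return x  # at inference time the activations pass through unchanged
    keep = rng.random(x.shape) >= rate  # True for units that survive
    return x * keep / (1.0 - rate)      # rescale so the expected value is preserved

rng = np.random.default_rng(0)
acts = np.ones((4, 5))                      # toy activations
train_out = dropout(acts, 0.5, True, rng)   # entries are either 0.0 or 2.0
eval_out = dropout(acts, 0.5, False, rng)   # identical to acts
```

Because a different random subset of units is dropped on every training step, the network cannot rely on any single unit, which is why it tends to reduce overfitting.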