Image resolution for classification problem

Recently, I encountered a problem while building a logistic regression model… the problem made me try different resolutions of the same image data. Although it didn’t solve my problem, it got me thinking:

Q1: Is resolution a hyperparameter?

Q2: Is there any rationale behind choosing the resolution for a particular image classification problem?

Q3: Does high resolution improve the performance of the algorithm?

I noticed that at a resolution of 64×64, the images were almost “too blurred” for me to recognize, but the computational speed was fast compared to a resolution of 1000×1000 (which has much higher resolution).

Interesting questions! Here are some thoughts and I hope others will also chime in on this:

Yes, the resolution of the images is a crucial choice that you have to make as the system designer. So it is what Prof Ng calls a “hyperparameter”, meaning a value that you have to choose rather than one the algorithm can learn. Note that with all the types of Deep Learning algorithms we learn about here (Fully Connected Nets in Course 1, Convolutional Nets in Course 4 and Sequential Models in Course 5), the model must be trained on a single input format. That means you need to pick the type of image (RGB, CMYK, greyscale …) and the number of pixels in each dimension. So you need a systematic way to make that decision.
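To make the “single input format” point concrete, here is a small sketch (hypothetical helper, not code from the course) of why the format choice is locked in: once you pick the resolution and channel count, every training example must flatten to the same feature-vector length.

```python
# Hypothetical sketch: the flattened input length a fully connected net
# would see for a fixed image format (RGB = 3 channels by default).
def input_vector_length(height, width, channels=3):
    """Number of features after flattening an image of this format."""
    return height * width * channels

# A 64x64 RGB image flattens to 12288 features; every image fed to the
# model must be resized/converted so it matches this exact length.
n_x = input_vector_length(64, 64)
```

Changing the resolution later means retraining, because the very first weight matrix of the network is sized to this number.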

Typically with modern cameras, the raw input is quite high resolution. The fundamental tradeoff is between prediction accuracy and the overall compute and storage cost. If compute power and storage were free and you didn’t care how long the training took, then you would use the highest resolution that is available. But cost and time always do matter in the real world, so the question is figuring out how far you can go in terms of “downsampling” the images without sacrificing the prediction accuracy of your model. Of course all decent image processing libraries support “resize” as an operation.

The first “high level” approach is exactly what you alluded to: just look at the images in various resolutions with your human eyes. Can you distinguish the features that you want your algorithm to detect? This is an example of the concept of “Bayes Error” that Prof Ng discusses at several places in Course 2 and Course 3: you can’t expect an algorithm to do better than a human can do at the same task, at least in general. There is at least one famous case in which that’s not true, though it’s not clear whether it has anything to do with “resolution”: ophthalmologists have always believed that it was impossible to tell the sex of a patient by examining a retinal scan image, but a machine learning model was developed that can do it.

So the first step is to look at downsampled images at various scales and pick the smallest image size in which you can easily see enough to recognize whatever it is that you want to train your algorithm to recognize. You’ll then have lots more hyperparameters to choose, of course, before you can start running the training and seeing whether your algorithm works well enough or not. If you get through all the other aspects of tuning your model and still have accuracy problems, you can always come back and try again with higher resolution images and see if that helps or not.
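Before even looking at the images, a quick back-of-the-envelope calculation (a sketch, using the resolutions from your question) shows how steeply the per-example cost grows with resolution:

```python
# Rough storage/compute comparison for candidate resolutions (RGB).
# The per-example cost scales with the pixel count, so this is a quick
# sanity check before committing to a resolution for training.
candidate_sizes = [64, 128, 256, 1000]
costs = {s: s * s * 3 for s in candidate_sizes}  # features per image

# 1000x1000 carries roughly 244x the data of 64x64 per example.
ratio = costs[1000] / costs[64]
```

That factor of a couple hundred multiplies through your storage, your memory footprint, and (roughly) your training time, which is why it pays to find the smallest resolution that still works.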


Wow! This was enlightening! Trade-offs and choices… Thank you @paulinpaloalto