Running CNNs on rectangular images

Ayse_Burcu_Ozkaptan · June 6, 2022, 10:47am

Hello everyone,
From what I have seen, the CNNs we have been working on were trained on square images. In week 3’s first assignment, however, the predictions are made on rectangular images. It’s said that YOLO’s network was trained to run on 608x608 images. When we test on images of a different size (for example, the car detection dataset has 720x1280 images), the last step in the assignment rescales the boxes so that they can be plotted on top of the original 720x1280 image.

How does this actually work? What is happening under the hood? The filter sizes, strides, basically everything has been adjusted to a particular input size. So, if we were to run YOLO in one of our own projects, how would we go about it? What does the preprocessing do?

Many thanks and have a nice day!

balaji.ambresh · June 6, 2022, 11:23am

One way to use the existing YOLO model would be to resize your image to the input shape that’s accepted by the model. See the preprocess_image function defined in utils.py. The image is read and resized to (608, 608, 3).
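As a rough sketch of how the boxes can get back onto the original image: this assumes (it is not confirmed in this thread) that the model reports box corners as fractions of the image height and width, in (y1, x1, y2, x2) order. Such fractions do not depend on the resize, so multiplying by the original dimensions undoes the squash. The function names below are hypothetical, not the ones in utils.py.

```python
def resize_factors(orig_height, orig_width, model_size=608):
    """Per-axis scale factors applied when squashing the image to a square."""
    return model_size / orig_height, model_size / orig_width


def to_original_pixels(box_norm, orig_height, orig_width):
    """Map a (y1, x1, y2, x2) box given in [0, 1] fractions to original pixels.

    Assumption: the detector outputs normalized corner coordinates.
    """
    y1, x1, y2, x2 = box_norm
    return (y1 * orig_height, x1 * orig_width,
            y2 * orig_height, x2 * orig_width)


# The two axes are scaled by different amounts for a 720x1280 image.
fy, fx = resize_factors(720, 1280)

# A box covering the right half of the middle band of the image.
box = to_original_pixels((0.25, 0.5, 0.75, 1.0), 720, 1280)
print(fy, fx)
print(box)  # (180.0, 640.0, 540.0, 1280.0)
```

The unequal factors fy and fx are exactly the aspect ratio distortion; the box mapping itself is unaffected by it.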

Ayse_Burcu_Ozkaptan · June 6, 2022, 3:50pm

Thanks a lot. I took a look at the function. As far as I understood from its documentation, it downsamples the image and applies bicubic interpolation (it is still not clear to me why interpolation would be needed when downsampling; if we were enlarging the image, that would make a lot more sense).
The main thing that confuses me is that the image was shrunk, and so were all the objects in it, i.e., the aspect ratio was not preserved. So, if the algorithm was trained on “normal” looking objects, like cars for example, how can it work properly on cars that have different aspect ratios, which make them look longer and slimmer? Are image detection algorithms invariant to changes in aspect ratio?
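To put rough numbers on the distortion described above: squashing a 720x1280 image to 608x608 scales widths by 608/1280 and heights by 608/720, so every object's width-to-height ratio is multiplied by 720/1280 = 0.5625. A minimal sketch, where the function name and the 200x100 pixel car are made up for illustration:

```python
def squashed_aspect_ratio(box_w, box_h, orig_w, orig_h, model_size=608):
    """Width/height ratio of an object after the image is resized to a square."""
    new_w = box_w * model_size / orig_w   # widths shrink by model_size / orig_w
    new_h = box_h * model_size / orig_h   # heights shrink by model_size / orig_h
    return new_w / new_h


# Hypothetical car that is 200 px wide and 100 px tall in a 1280x720 frame.
before = 200 / 100
after = squashed_aspect_ratio(200, 100, orig_w=1280, orig_h=720)
print(before, after)  # ratio drops from 2.0 to about 1.125
```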

Thanks a lot in advance!

paulinpaloalto · June 6, 2022, 4:41pm

“Downsampling” can be done in a straightforward way or in a more sophisticated way by including interpolation. Just picking (e.g.) every other pixel doesn’t necessarily give you the best quality results, and you also have to deal with “quantization effects” (suppose the ratio of shrinkage does not evenly divide the number of pixels in a given dimension). Image processing is a whole discipline unto itself with lots of “prior art”. Google is your friend.
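A toy illustration of both points, on a single row of pixels. This is only a sketch: it uses a plain box average as a stand-in for interpolation, which is simpler than the bicubic filter mentioned earlier in the thread, and all function names are made up for the example.

```python
def subsample(row, step):
    """Naive downsampling: keep every `step`-th pixel and drop the rest."""
    return row[::step]


def box_average(row, step):
    """Downsample by averaging each group of `step` neighbouring pixels."""
    return [sum(row[i:i + step]) / step
            for i in range(0, len(row) - step + 1, step)]


def source_positions(src_len, dst_len):
    """Where each output pixel centre falls in the input (pixel-centre convention)."""
    scale = src_len / dst_len
    return [(i + 0.5) * scale - 0.5 for i in range(dst_len)]


# Fine black/white stripes: picking every other pixel loses them entirely.
stripes = [0, 255] * 4
print(subsample(stripes, 2))    # [0, 0, 0, 0]
print(box_average(stripes, 2))  # [127.5, 127.5, 127.5, 127.5]

# 720 -> 608 is not an integer ratio, so output pixels land between input
# pixels and some blending of neighbours is unavoidable.
print(source_positions(720, 608)[:3])
```

The naive version reports a solid black row, while the averaged version keeps the mid-grey that the stripes would look like from a distance.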

Generally speaking there is nothing about CNN algorithms that requires the inputs to be square. If you are training your algorithm from scratch, then it’s a decision you as the system designer need to make about the shape and resolution of your input images. If you’re using Transfer Learning, then you are obviously constrained by the input definition of the system you are using as your starting point. It’s a good question whether you’ll have problems if you have to resize your images from non-square to square with the resulting distortion of aspect ratios. Aspect ratios are a big deal in YOLO with the anchor boxes and all. I don’t have any experience applying YOLO, but I hope someone else listening here will be able to shed some actual light on that question.
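To illustrate the first point, the usual convolution output-size formula, floor((n + 2p - f) / s) + 1, is applied to height and width independently, so nothing in it assumes n_H equals n_W. The sketch below runs it through a hypothetical stack of five 3x3, stride-2, padding-1 convolutions; that stack is an assumption chosen to give an overall downsampling factor of 32, not the actual YOLO architecture.

```python
def conv_output_size(n, f, s, p):
    """Output length along one axis: floor((n + 2p - f) / s) + 1."""
    return (n + 2 * p - f) // s + 1


def grid_after_stack(height, width, layers=5, f=3, s=2, p=1):
    """Feature-map shape after `layers` identical conv layers (hypothetical stack)."""
    for _ in range(layers):
        height = conv_output_size(height, f, s, p)
        width = conv_output_size(width, f, s, p)
    return height, width


print(grid_after_stack(608, 608))   # (19, 19)
print(grid_after_stack(720, 1280))  # (23, 40)
```

The convolutions themselves accept the rectangular input and simply produce a rectangular grid. Any later layers that were built for one specific grid shape are what tie a pretrained model to its original input size.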

balaji.ambresh · June 6, 2022, 4:58pm

Does this post help?

Ayse_Burcu_Ozkaptan · June 6, 2022, 6:02pm

Thank you very much! The article you shared seems very interesting. Just like you said, Google is my friend, but IEEE certainly isn’t. I will try to get my hands on that article by asking around.

As for the second paragraph, it’s perfectly clear. Thank you very much for sharing your valuable insights!

paulinpaloalto · June 6, 2022, 6:06pm

Oh, sorry, I just read the Abstract and didn’t click through, so wasn’t aware it requires a login. But the point is this is a topic that’s been around for a while (note that the publication date on that paper is 2011). I’m sure a few searches will turn up some “open source” material on this topic.

Ayse_Burcu_Ozkaptan · June 6, 2022, 6:56pm

No worries! I did some research, though. I had one potentially good hit, but it had some problems. I will probably contact the authors. Thanks a lot for your kind help and time!
