YOLO vs Convolutional Sliding Window

ai_curious · August 17, 2024, 10:16pm

I have read the OverFeat paper referenced by Prof Ng in the discussion of Convolutional Implementation of Sliding Windows (CISW), but I have never worked with an implementation of it. I have spent a lot of time working to understand YOLO and implementing it myself from scratch. Here is my compare/contrast.

First, the basic network architecture of multiple convolutional layers successively downsampling the single input image is comparable. YOLO v1 had two fully connected layers after the convolutional layers, but I think they were removed in favor of a fully convolutional implementation with v2.

The biggest difference I see is in the number of predictions made, and what drives the number of them. In CISW, it appears that there can be a very large number of predictions made per object, and that this number is driven by the filter size (and stride?) of the convolutions.

This gaggle of bounding boxes is then “resolved” based on confidence down to a single box.
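To get a feel for how the window and stride drive the count, here is a rough sketch of the arithmetic. The function names are my own, and the 14-pixel window with an effective stride of 2 are illustrative values I picked, not figures taken from the OverFeat paper. It counts window positions at a single scale only; running at several scales would add the counts together.

```python
def window_positions(input_size: int, window_size: int, stride: int) -> int:
    """Number of window placements along one dimension (no padding)."""
    if window_size > input_size:
        return 0
    return (input_size - window_size) // stride + 1


def windows_per_image(height: int, width: int, window_size: int, stride: int) -> int:
    """Total window placements over a 2D image at a single scale."""
    return (window_positions(height, window_size, stride)
            * window_positions(width, window_size, stride))


if __name__ == "__main__":
    # Illustrative values (assumptions): 14x14 window, effective stride 2.
    print(windows_per_image(16, 16, 14, 2))  # 4
    print(windows_per_image(28, 28, 14, 2))  # 64
```

Each of those positions produces its own prediction, so the count grows quickly as the input gets larger relative to the window, or as the stride gets smaller.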

In YOLO, the number of predictions is driven by the grid size (S) and anchor box count (B). For each grid-cell-sized image region there can be at most B predictions, so at most S × S × B for the whole image, which I think is substantially lower than with CISW. Then, possible duplicates are pruned using non-max suppression. My understanding is that this results in at least as good localization and classification accuracy with much higher throughput. This would explain why YOLO caught the world’s attention for years while CISW was merely a step that was rather quickly subsumed and surpassed.
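Here is a minimal sketch of the YOLO side: the S × S × B count, followed by a plain greedy non-max suppression. This is my own simplified version, not code from any YOLO release; the helper names, the corner-format boxes `(x1, y1, x2, y2)`, and the 0.5 IoU threshold are all assumptions for illustration, and it ignores per-class handling.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), corner format


def yolo_prediction_count(S: int, B: int) -> int:
    """Upper bound on boxes per image: B boxes for each of S x S grid cells."""
    return S * S * B


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two corner-format boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes: Sequence[Box], scores: Sequence[float],
                        iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping it."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep box i only if it does not overlap any already-kept box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    print(yolo_prediction_count(7, 2))  # 98 (S=7, B=2 as in YOLO v1)
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
    scores = [0.9, 0.8, 0.7]
    print(non_max_suppression(boxes, scores))  # [0, 2]
```

In the example, the second box overlaps the first with an IoU of about 0.68, so it is suppressed, while the third box is far away and survives.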

Welcome your thoughts and corrections.

OverFeat paper on arXiv
https://arxiv.org/pdf/1312.6229

YOLO v1 on arXiv
https://arxiv.org/pdf/1506.02640

NOTE: the YOLO authors briefly mention/contrast with OverFeat and reference it in their bibliography

NOTE: the architectures of the ConvNets in each of these papers owe a debt to the seminal paper ImageNet Classification with Deep Convolutional Neural Networks by Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton

Krizhevsky et al paper at ACM
https://dl.acm.org/doi/pdf/10.1145/3065386
