Same Dataset for Classification and Object Detection?

Lakshmi_Narayana · March 13, 2023, 2:07am

In the object detection introduction, Andrew says we first need to train the model on classification and then on sliding windows object detection.

I am assuming we should do the same with YOLO. Is that correct?
Can we just crop the objects from the object detection dataset and use them for classification?

gent.spah · March 13, 2023, 11:19am

I think what he means is that in object detection you first have to find the zones of interest in the image and then classify the objects in those zones. The YOLO model does that by itself, so there is no need for you to determine the zones of interest beforehand.

Can you crop the objects and then use them for classification, you ask? As far as I remember, all the algorithms presented in that course do that by themselves, so there is no need for you to do it manually.
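
If you did want to build a classification set by cropping anyway, a minimal sketch might look like the one below. It is only an illustration under assumptions: the image is a plain nested list of pixel values, each annotation is a hypothetical `(label, x_min, y_min, x_max, y_max)` tuple in pixel coordinates with exclusive upper bounds, and `crop_objects` is a made-up helper name, not something from the course or from YOLO.

```python
def crop_objects(image, annotations):
    """Return (label, crop) pairs, one per annotated box.

    image: list of rows, each row a list of pixel values (assumed format).
    annotations: list of (label, x_min, y_min, x_max, y_max) tuples in pixel
    coordinates, with x_max and y_max exclusive (an assumption).
    """
    height = len(image)
    width = len(image[0]) if height else 0
    crops = []
    for label, x_min, y_min, x_max, y_max in annotations:
        # Clamp the box to the image bounds so slicing never wraps around.
        x0, x1 = max(0, x_min), min(width, x_max)
        y0, y1 = max(0, y_min), min(height, y_max)
        if x0 >= x1 or y0 >= y1:
            continue  # skip degenerate or fully out-of-bounds boxes
        crops.append((label, [row[x0:x1] for row in image[y0:y1]]))
    return crops


# Toy 4x6 "image" and two hypothetical annotations.
image = [[10 * r + c for c in range(6)] for r in range(4)]
annotations = [("cat", 1, 0, 3, 2), ("dog", 4, 2, 9, 4)]
samples = crop_objects(image, annotations)
```

Each `(label, crop)` pair could then be resized and fed to an ordinary classifier, although the crops lose the surrounding context that a detector sees.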

ai_curious · March 13, 2023, 11:39am

This was because pre-YOLO approaches treated the two as separate learning tasks: one classification, one regression. YOLO treated them both as regression, so both could be accomplished in a single forward pass. Notice, however, that while this worked well for runtime predictions, it complicated training, and at least the early versions of YOLO approached the problem using transfer learning. That is, they trained first on classification, then modified the head of the network and trained further for localization.

From the YOLO9000 / v2 paper…

Training for classification. We train the network on the standard ImageNet 1000 class classification dataset for 160 epochs using stochastic gradient descent with a starting learning rate of 0.1, polynomial rate decay with a power of 4, weight decay of 0.0005 and momentum of 0.9 using the Darknet neural network framework [13].

Training for detection. We modify this network for detection by removing the last convolutional layer and instead adding on three 3 × 3 convolutional layers with 1024 filters each followed by a final 1 × 1 convolutional layer with the number of outputs we need for detection. For VOC we predict 5 boxes with 5 coordinates each and 20 classes per box so 125 filters.
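
As a quick check of the 125 figure in the quote above, the arithmetic can be sketched as follows. The function and parameter names are hypothetical, and reading the "5 coordinates" as four box values plus one objectness score is my interpretation rather than something the quoted passage spells out.

```python
def detection_filters(num_boxes, num_classes, coords_per_box=5):
    """Filters needed in the final 1 x 1 conv layer.

    Each predicted box carries `coords_per_box` values (assumed here to be
    four box coordinates plus one objectness score) and one score per class.
    """
    return num_boxes * (coords_per_box + num_classes)


# VOC setting from the quote: 5 boxes, 20 classes.
voc_filters = detection_filters(num_boxes=5, num_classes=20)
print(voc_filters)  # 125
```

Changing the number of classes in your own dataset changes this output size, which is one reason the detection head is replaced while the pretrained backbone is kept.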

At runtime it uses the fully trained final classification + localization network architecture.

There is a reason almost all of the YOLO-related papers or blogs you find on the web include ‘…we started with a pretrained model…’
