Adapting YOLO for New Object Classes: Best Practices and Dataset Strategies

In one of the assignments involving YOLO, it was mentioned that the model is capable of detecting up to 80 predefined classes. But what if we need to detect a custom class not present in the original COCO dataset? For example, distinguishing pickup trucks as a separate class, or recognizing niche vehicles like golf carts or agricultural machinery, which may be important for autonomous driving applications.

What would be the best strategy for adapting YOLO to detect such custom classes?

Specifically:

  • Should we freeze most of the layers and fine-tune only the last few layers, or is it better to retrain the entire model from scratch (or from a pretrained checkpoint) on the new class?
  • How should we build the dataset? Is it better to start with object-centric images (e.g., the vehicle on a plain white or black background) and only later add context-rich images with real-world backgrounds? Or should we train on full scenes from the beginning?

I’d love to hear your insights on best practices for customizing YOLO for novel classes, especially in the context of real-world deployment like autonomous driving.

Generally speaking, for pre-trained models:

  1. You would freeze the early layers (up to a certain layer) and fine-tune the rest, rather than retraining from scratch, because low-level features are similar for most objects.
  2. The dataset should include real-life data similar to what the model will see at detection time.

Thanks for the answer! I’d like to clarify a few things about both the dataset strategy and layer freezing when adding a new class to YOLO (e.g., pickup trucks).

1. Dataset strategy:
My goal is to detect pickups in real-world conditions — for example, using CCTV footage in urban areas (low-res, occlusions, shadows, etc.).

Would it be better to:

  • (a) Start by annotating pickups directly in full-scene CCTV images (with other objects like pedestrians, cars, etc.), so the model learns them in real-world context?
  • (b) Or begin with object-centric images of pickups (on white/gray/black backgrounds), to help the model first understand their appearance, and only then fine-tune on complex scenes?

What would generalize better in deployment if the goal is reliable pickup detection in CCTV-like settings, especially with limited data?

2. Freezing layers in YOLOv8 training:
Let’s say I’m using YOLOv8 like this:

from ultralytics import YOLO

model = YOLO('yolov8x.pt')
model.train(data='data.yaml', ...)

If I just include my new class (e.g., “pickup”) in data.yaml and train like above, does this automatically mean YOLO will fine-tune all layers starting from the pretrained model?

Or should I explicitly freeze part of the backbone (e.g., freeze=10) to avoid overwriting the low-level features learned from COCO?
What is the tradeoff between freezing layers vs. training the full network when adding only one or two custom classes with limited training data?

Thanks in advance — I’d appreciate any insights or best practices from those who’ve done similar customization for real-world deployment.

I would go with option (a): the model needs to see data as it actually occurs in deployment, and it can learn the object's appearance from those scenes by itself.
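If you train on full-scene images, the dataset definition could look roughly like the sketch below. It builds a minimal `data.yaml` in the layout Ultralytics expects (`path`, `train`, `val`, `names`). The directory names and the `write_data_yaml` helper are hypothetical, and the file is written by hand so the snippet needs no YAML library.

```python
from pathlib import Path


def write_data_yaml(root: str, class_names: list[str], out_file: str = "data.yaml") -> str:
    """Write a minimal Ultralytics-style dataset YAML and return its text.

    Assumed (hypothetical) layout under `root`:
        images/train, images/val  -> full-scene CCTV frames
        labels/train, labels/val  -> one .txt label file per frame
    """
    lines = [
        f"path: {root}",
        "train: images/train",
        "val: images/val",
        "names:",
    ]
    # Class ids are the 0-based positions in `class_names`.
    lines += [f"  {idx}: {name}" for idx, name in enumerate(class_names)]
    text = "\n".join(lines) + "\n"
    Path(out_file).write_text(text, encoding="utf-8")
    return text


yaml_text = write_data_yaml("datasets/cctv_pickups", ["pickup"])
print(yaml_text)
```

One thing to keep in mind: as far as I know, a model fine-tuned on a `data.yaml` that lists only `pickup` will predict only that class. If you still want cars, pedestrians, and so on, those classes need to be listed and annotated as well.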

I haven't done that myself, but the usual practice is to freeze the bottom (early) layers.

The tradeoff is that with freezing and pretrained weights you don't need a large dataset to train. Full training needs a lot of data, and you have to go through the entire training process as was done for the original model. I suggest you check out the TensorFlow Advanced Techniques Specialization; there are sample cases there.
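To make the two options concrete, here is a minimal sketch assuming the `ultralytics` package and its `freeze` training argument (an integer freezes the first N modules; leaving it unset trains all layers starting from the pretrained weights). The helper names `finetune_kwargs` and `run_training`, and the values for epochs and image size, are illustrative assumptions rather than recommendations.

```python
from typing import Optional


def finetune_kwargs(data_yaml: str, freeze_layers: Optional[int] = None) -> dict:
    """Build keyword arguments for `YOLO.train`.

    freeze_layers=None -> fine-tune every layer from the pretrained checkpoint.
    freeze_layers=N    -> keep the first N modules fixed, train the rest.
    """
    kwargs = {"data": data_yaml, "epochs": 50, "imgsz": 640}  # illustrative values
    if freeze_layers is not None:
        kwargs["freeze"] = freeze_layers
    return kwargs


def run_training(weights: str, kwargs: dict):
    # Imported lazily so the helper above works without ultralytics installed.
    from ultralytics import YOLO

    model = YOLO(weights)         # start from COCO-pretrained weights
    return model.train(**kwargs)  # runs the actual fine-tuning


full = finetune_kwargs("data.yaml")                      # all layers trainable
frozen = finetune_kwargs("data.yaml", freeze_layers=10)  # first 10 modules fixed
print(full)
print(frozen)

# Uncomment to actually train (requires ultralytics and a prepared dataset):
# run_training("yolov8x.pt", frozen)
```

As far as I can tell, the first 10 modules (indices 0 to 9) in the YOLOv8 architecture file form the backbone, so `freeze=10` would leave only the neck and detection head trainable. With limited data, comparing validation mAP for both settings is probably the most reliable way to choose.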


Hi @pavelz,

As @gent.spah mentioned, freezing is usually applied to the initial layers, since these handle generic feature extraction, whereas the last few layers are the ones specific to object detection. Whether you freeze all layers or only selected ones therefore depends on the kind of data you are working with.
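To see what freezing the initial layers means at the parameter level, the sketch below mimics prefix-based freezing: parameters whose names start with `model.<i>.` for `i` below a cutoff are treated as frozen. My understanding is that the Ultralytics trainer selects layers in a similar way for its `freeze` argument, but treat that as an assumption. The parameter names are a made-up toy list, not read from a real checkpoint.

```python
def split_by_freeze(param_names: list[str], freeze: int) -> tuple[list[str], list[str]]:
    """Return (frozen, trainable) parameter names for a given freeze cutoff."""
    # The trailing dot keeps "model.1." from matching "model.10...".
    prefixes = tuple(f"model.{i}." for i in range(freeze))
    frozen = [n for n in param_names if n.startswith(prefixes)]
    trainable = [n for n in param_names if not n.startswith(prefixes)]
    return frozen, trainable


# Toy parameter names (hypothetical): early modules hold generic features,
# the last module stands in for the detection head.
names = [
    "model.0.conv.weight",
    "model.1.conv.weight",
    "model.9.cv1.conv.weight",
    "model.12.cv1.conv.weight",
    "model.22.cv3.0.2.weight",
]

frozen, trainable = split_by_freeze(names, freeze=10)
print("frozen:   ", frozen)
print("trainable:", trainable)
```

With a real PyTorch model, the same split could be applied by setting `requires_grad = False` on the parameters whose names end up in `frozen`.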

If your new dataset is quite different from the original dataset, unfreeze the initial layers and fine-tune them as well.

If you are using yolov8x, which comes with discriminative feature learning, you will probably find that freezing works well, as the pretrained model has already been trained and fine-tuned on numerous tasks.

For a custom dataset, the most important task is data annotation, since YOLO models are already well trained on the COCO dataset.
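Since annotation quality matters this much, it helps to be precise about the label format. The sketch below converts a pixel-space box to the YOLO text format (`class x_center y_center width height`, each coordinate normalized to the 0..1 range by the image size). The function name and the example box are hypothetical.

```python
def to_yolo_line(class_id: int, box_xyxy: tuple[float, float, float, float],
                 img_w: int, img_h: int) -> str:
    """Convert a pixel box (x_min, y_min, x_max, y_max) to one YOLO label line."""
    x_min, y_min, x_max, y_max = box_xyxy
    if not (0 <= x_min < x_max <= img_w and 0 <= y_min < y_max <= img_h):
        raise ValueError(f"box {box_xyxy} does not fit a {img_w}x{img_h} image")
    x_c = (x_min + x_max) / 2 / img_w  # normalized box center
    y_c = (y_min + y_max) / 2 / img_h
    w = (x_max - x_min) / img_w        # normalized box size
    h = (y_max - y_min) / img_h
    return f"{class_id} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}"


# Hypothetical pickup at pixels (320, 180)-(640, 360) in a 1280x720 CCTV frame.
line = to_yolo_line(0, (320, 180, 640, 360), img_w=1280, img_h=720)
print(line)  # 0 0.375000 0.375000 0.250000 0.250000
```

Each image gets one `.txt` file with one such line per object. Rejecting boxes that fall outside the image is a cheap way to catch annotation mistakes early.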

Freezing also helps manage computation cost when hardware is limited. Remember that, because fewer parameters are updated, freezing also lets you train on a larger dataset within the same compute budget.

If you check the Ultralytics GitHub repo, all the information on how to train a YOLO model on your own data is provided there.
