Programming Assignment - Applied YOLO


To start with, I am really impressed by this assignment. I feel it has strengthened what I learned in the lectures.

I decided to upload a picture from my diving trip, and I noticed that the shark was classified as a plane :slight_smile:

The reason I am writing this is that I am unsure how to approach training the model. I tried looking into the files in my lab environment, and many of them return an error when I try to open them.

What I am trying to say is that it was a great assignment in terms of learning how YOLO works, but I am still not sure I can apply this to a real-world problem yet. Many steps were given to us: anchor boxes, weights, and hyperparameters (grid size, conv layers, …).

Can I get more guidance on how to add pictures that our model has not been trained on yet?

As you can see, there is a class definition file, coco_classes.txt, under model_data. It defines 80 classes. Unfortunately, shark is not in there.
Training YOLO needs additional effort, of course. The most time-consuming part is drawing bounding boxes on your image files and labeling each box with a class name. Then you need to change some configuration files and start training. For details, please visit Darknet.
There are several derivatives: a PyTorch version, a Keras version, etc. One example is keras-yolo3.
You can also visit YOLOv3.
If you want to see the YOLO implementation in this assignment, look under yad2k/models/.
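A quick way to confirm a class is missing is to load the class list and check for it. This is a minimal sketch: the file path and format (one class name per line, as in the assignment's model_data/coco_classes.txt) are assumptions, and the inline list here is just an illustrative subset of the 80 COCO classes.

```python
def load_classes(path):
    """Read one class name per line, skipping blank lines."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]

# Illustrative subset of the COCO class list -- "shark" is not among them.
sample = ["person", "bicycle", "car", "aeroplane", "bird", "dog"]
print("aeroplane" in sample)  # True
print("shark" in sample)      # False -- the model cannot predict it
```

In the lab you would call `load_classes("model_data/coco_classes.txt")` and get the full list of 80 names.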

In addition, in this community, there is a good summary at Applying YOLO anchor boxes

The exercise you performed is exactly how to test against pictures the model has not been trained on. And to be honest, I would say it did quite well given the pictures, but more importantly the classes it had been trained on. That shark looks more like an airplane than anything else in the model’s limited experience of the world. As @anon57530071 correctly observes, because of the softmax() applied to the class prediction output from the network, it can only predict against a specific vector of classes on which it has been trained. It could never predict shark if it hasn’t been told that sharks exist and learned how to differentiate shark features from airplanes. Basically, softmax forces it to make its best guess, which it does, and airplane is the best it could have done under the circumstances.
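The "best guess" behavior of softmax can be seen in a few lines. This is a toy sketch: the class names and raw scores are invented for illustration, but the mechanics are exactly what happens at the model's class-prediction output.

```python
import math

def softmax(scores):
    """Normalize raw class scores into a probability distribution."""
    exps = [math.exp(s - max(scores)) for s in scores]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for a shark photo from a model that only knows
# these four classes -- "airplane" scores highest, so softmax must pick it.
classes = ["airplane", "bird", "boat", "surfboard"]
probs = softmax([2.1, 0.4, 1.3, 0.2])
best = classes[max(range(len(classes)), key=lambda i: probs[i])]
print(best)  # airplane
```

The probabilities always sum to 1, so some class in the trained vocabulary must receive the prediction; "none of the above" is not an option.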

Training YOLO from scratch is extremely non-trivial. It takes a lot of data and a lot of computation. And the more classes you include, the larger the output shape, the more parameters to learn…
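The growth of the output shape with the class count is easy to quantify. The sketch below uses the dimensions from this assignment (19×19 grid, 5 anchor boxes, 80 classes); each predicted box carries 5 values (x, y, w, h, objectness) plus one score per class.

```python
def yolo_output_size(grid, anchors, num_classes):
    """Total values in the YOLO output tensor: each grid cell predicts
    `anchors` boxes, each box carries (x, y, w, h, objectness) plus
    one confidence score per class."""
    return grid * grid * anchors * (5 + num_classes)

# The assignment's head: 19x19 grid, 5 anchors, 80 COCO classes.
print(yolo_output_size(19, 5, 80))   # 153425 output values
# Doubling the class count grows the head accordingly:
print(yolo_output_size(19, 5, 160))  # 297825
```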

You can get an idea from some text in the original papers where they mention many of the steps and tricks they learned to do, like separating the classification training from the localization training, playing around with input image resolution, performing several types of data augmentation, and training runtimes of ‘… about a week’ . Last year I spent a month trying to do it myself on my laptop and failed - you can find some of my experience in topics I created in this forum. Which is why many people start with a trained model as a baseline or at most do some fine tuning. Be sure to share your experience if you go down that path. Maybe I’ll learn what I was doing wrong all that time :sunglasses:

I had time to play with YOLO this week, and ported V3 (along with some other derivatives) to run in a Jupyter notebook on my local PC with an eGPU.

It now runs on the latest version of Tensorflow/Keras.

Then, back to your original question about adding one class: I tried to create a two-class classification model for "shark" and "aeroplane". Even though my version supports both PascalVOC and YOLO formats as input, it was difficult to find annotated "shark" images in any format. So I added YOLO annotations to shark images from the Internet using the "labelImg" program. I also used "aeroplane" annotated data from VOC2012, which is in PascalVOC format, and ImageNet data for both classes. Then everything was merged into a single TFRecord for training and validation.
The number of images is not large, just 200 in total for quick testing, so training was quite straightforward.
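Merging the two formats means converting PascalVOC XML boxes into YOLO's normalized representation. Here is a minimal, self-contained sketch of that conversion (the XML snippet and class list are toy examples, not the actual VOC2012 files); actual merging into a TFRecord would happen after this step.

```python
import xml.etree.ElementTree as ET

def voc_to_yolo(xml_text, class_names):
    """Convert one PascalVOC annotation to YOLO lines:
    'class_id x_center y_center width height', all normalized to [0, 1]."""
    root = ET.fromstring(xml_text)
    w = float(root.findtext("size/width"))
    h = float(root.findtext("size/height"))
    lines = []
    for obj in root.iter("object"):
        cls = class_names.index(obj.findtext("name"))
        box = obj.find("bndbox")
        xmin, ymin = float(box.findtext("xmin")), float(box.findtext("ymin"))
        xmax, ymax = float(box.findtext("xmax")), float(box.findtext("ymax"))
        lines.append("%d %.6f %.6f %.6f %.6f" % (
            cls, (xmin + xmax) / 2 / w, (ymin + ymax) / 2 / h,
            (xmax - xmin) / w, (ymax - ymin) / h))
    return lines

# Toy VOC-style annotation for one aeroplane bounding box.
xml_text = """<annotation>
  <size><width>500</width><height>375</height></size>
  <object><name>aeroplane</name>
    <bndbox><xmin>100</xmin><ymin>75</ymin><xmax>400</xmax><ymax>225</ymax></bndbox>
  </object>
</annotation>"""
print(voc_to_yolo(xml_text, ["shark", "aeroplane"]))
```

Note that VOC stores corner coordinates in pixels, while YOLO stores the box center and size as fractions of the image dimensions, which is why the division by width and height is needed.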

I used "transfer learning", i.e., I loaded pre-trained YOLO weights from Darknet.
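Conceptually, the transfer-learning setup freezes the pre-trained backbone and trains only the new detection head. The sketch below is a framework-free illustration of that idea (the layer names and parameter counts are invented), not the actual Keras/Darknet code.

```python
class Layer:
    """Stand-in for a network layer: a name, a parameter count, and a flag."""
    def __init__(self, name, n_params, trainable=True):
        self.name, self.n_params, self.trainable = name, n_params, trainable

def freeze_backbone(layers, head_prefix="head"):
    """Keep only the layers of the new head trainable; freeze everything else."""
    for layer in layers:
        layer.trainable = layer.name.startswith(head_prefix)
    return layers

# Hypothetical model: two pre-trained backbone layers plus a new two-class head.
model = [Layer("darknet_conv_1", 9_000), Layer("darknet_conv_2", 50_000),
         Layer("head_conv", 4_000), Layer("head_output", 1_200)]
freeze_backbone(model)
trainable = sum(l.n_params for l in model if l.trainable)
print(trainable)  # 5200 -- only the head's parameters are updated
```

Because the frozen backbone already knows generic visual features, only a small fraction of the parameters needs to be learned, which is why 200 images and 50 minutes can be enough.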

Here is the output observed in TensorBoard. It took about 50 minutes for 200 epochs.

Then I used this re-trained model for inference. Here is the result.

I still need to gather more images, but the result is not bad…

In short, if you select an appropriate version of YOLO for your environment and needs, you will quickly be able to get some level of results for object localization and classification. Of course, for commercial use, we do need fine-grained tuning…


I would really like to do this exercise on my own. I had always wondered how to go about labeling images before looking at labelImg just now.

Is there a resource, article, or blog you would recommend that walks us through the steps you took?

My approach may be slightly different from what you want. I just got the source code and read it. That way, I could learn everything I wanted.

My recommendation is to read the YOLO papers as a starting point. At least two are worth reading: YOLO v2 and v3.
The quality of v2 and v3 is not so high, so you may want to check v5, v7, and other newer versions. v5 is PyTorch-based; if you just want to use one, it may be the easiest option.

And, this is the home of YOLO :slight_smile: , YOLO: Real-Time Object Detection

Regarding the data format for training and testing, as you know, there are two major data formats that YOLO handles. (This also depends on the YOLO version, though…)
For PascalVOC, the V3 Keras TF2 version provides good information, i.e., which parts of the XML are used for YOLO. You can clone it; then ./yolov3-tf2/tools/ gives you the information required to handle PascalVOC. (As I did not create sample data in PascalVOC format, I did not use it, but VoTT is a popular annotation tool.)

For the YOLO format, of course, LabelImg is the most common choice. It's a very simple tool; I believe you can get started quickly with its README.
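For reference, a LabelImg YOLO .txt file holds one line per box: a class index followed by the box center and size, all normalized by the image dimensions. A minimal parser (the example annotation line and class list are hypothetical):

```python
def parse_yolo_line(line, class_names):
    """One LabelImg YOLO .txt line: '<class_id> <x_c> <y_c> <w> <h>',
    with coordinates normalized by image width/height."""
    fields = line.split()
    cls_id = int(fields[0])
    x_c, y_c, w, h = map(float, fields[1:])
    return class_names[cls_id], (x_c, y_c, w, h)

# A hypothetical annotation produced by LabelImg for a shark photo.
label, box = parse_yolo_line("0 0.48 0.52 0.35 0.20", ["shark", "aeroplane"])
print(label, box)
```

The class index refers to a companion classes.txt file that LabelImg writes alongside the annotations.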

The most important thing is your choice of YOLO version. It looks like v7 is the latest one(?). Each version has different characteristics, including supported language, data format, and more. Please choose carefully to fit your objective.

I do not have a good list of links to web articles, but I hope the above helps somewhat.


Thank you so much, you are the best! These are great starting points. That should be enough to get me started