Anyone training their own YOLO CNN? What do your losses look like?

I’ve spent the last week building my own implementation of a YOLO v2 CNN to run against the Berkeley Driving Data set I’ve mentioned in other threads. I’ll share some lessons learned in a bit. It seems to be running now and doing something, though exactly what I’m not sure yet. I instrumented the code using TensorBoard, and it is producing this chart for loss per epoch. The numbers feel really large, but I can’t find anything to compare against. Have you trained your own from scratch, that is, without starting from pre-trained weights of any sort? What did you see? Thoughts and suggestions welcome.

Happy with the general shape of the curve, but the last ~15 epochs are pretty flat.

After looking at this for a few days, I realized that the loss numbers for any CNN training are going to be proportional to the number of predictions being made, at least when the per-value losses are summed rather than averaged. My network outputs 19*19*8*6 = 17,328 values for each image. With 72 images in a batch, that’s about 1.25 million predictions (1,247,616). It doesn’t take much error on each one to produce some nominally large losses.
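A minimal NumPy sketch of that arithmetic, assuming a plain squared-error loss applied uniformly to every output value (the real YOLO v2 loss weights its terms differently) and reading 19*19*8*6 as a 19x19 grid, 8 anchors, and 6 values per anchor. The function `summed_and_mean_loss` is hypothetical, written only for illustration. It shows how a small error of 0.1 on every value sums to a five-digit batch loss while the per-value mean stays at 0.01.

```python
import numpy as np

# Output layout as read from the post (assumption): 19x19 grid, 8 anchors, 6 values each.
GRID_H, GRID_W, NUM_ANCHORS, VALUES_PER_ANCHOR = 19, 19, 8, 6
BATCH_SIZE = 72

values_per_image = GRID_H * GRID_W * NUM_ANCHORS * VALUES_PER_ANCHOR  # 17,328
values_per_batch = values_per_image * BATCH_SIZE                      # 1,247,616


def summed_and_mean_loss(error, shape):
    """Squared error when every prediction is off by the same amount.

    Returns (summed loss, mean loss) over all values in `shape`.
    """
    y_true = np.zeros(shape)
    y_pred = y_true + error
    squared = (y_pred - y_true) ** 2
    return squared.sum(), squared.mean()


batch_shape = (BATCH_SIZE, GRID_H, GRID_W, NUM_ANCHORS, VALUES_PER_ANCHOR)
total, mean = summed_and_mean_loss(0.1, batch_shape)
print(f"values per image: {values_per_image:,}")
print(f"values per batch: {values_per_batch:,}")
print(f"summed loss: {total:.2f}, mean loss: {mean:.4f}")
```

Dividing by the number of values (or reporting the mean) makes the number independent of grid size, anchor count, and batch size, though it still depends on how each value is scaled.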

Sometimes things just go very, very wrong…

Very inspiring @ai_curious! I haven’t had time to do my own implementation of YOLO yet, but it is definitely on my bucket list 😃

Batch and epoch losses are proportional to the number of training examples in each, as well as to the scale of the values being used. If you use softmax for classification, your classification loss per prediction will always be on the order of 1. Similarly, if you use a sigmoid activation on the object/no-object prediction, those losses will be on the order of 1. Coordinate losses are not as straightforward, though. They depend on whether you are predicting location and size directly or using an activation, such as sigmoid (σ), to constrain them, as well as on whether you are rescaling to image-relative coordinates. I also have anecdotal evidence suggesting that the loss differs across training examples depending on whether my driving images are daytime and clear weather, nighttime, or raining.
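To make the coordinate point concrete, here is a small sketch that scores the same mistakes under different conventions, assuming a plain squared-error term. The 608-pixel image size (19 cells of 32 px) and all variable names are hypothetical, chosen only for illustration. One half-cell width error is scored in pixels, grid cells, and image-relative units, and one center-offset error is scored with and without a sigmoid squashing the raw network output.

```python
import numpy as np

GRID_SIZE = 19                    # cells per side, as in the 19x19 output grid
IMAGE_SIZE = 608                  # pixels per side (hypothetical: 19 cells x 32 px)
CELL_PX = IMAGE_SIZE / GRID_SIZE  # 32.0 pixels per cell


def sigmoid(x):
    """Logistic function; maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def squared_error(pred, true):
    return float((pred - true) ** 2)


# One width mistake (3.0 cells predicted vs 2.5 true), scored in three unit systems.
true_w, pred_w = 2.5, 3.0
width_loss = {
    "pixels": squared_error(pred_w * CELL_PX, true_w * CELL_PX),
    "grid_cells": squared_error(pred_w, true_w),
    "image_relative": squared_error(pred_w / GRID_SIZE, true_w / GRID_SIZE),
}

# One center-offset mistake: raw network output 5.0, true offset within the cell 0.5.
raw_x, true_x = 5.0, 0.5
offset_loss = {
    "direct": squared_error(raw_x, true_x),            # unbounded in the raw output
    "sigmoid": squared_error(sigmoid(raw_x), true_x),  # stays below 1 for targets in [0, 1]
}

for name, value in {**width_loss, **offset_loss}.items():
    print(f"{name:>15}: {value:.6f}")
```

The same half-cell width error scores 256 in pixels, 0.25 in grid cells, or about 0.0007 in image-relative units, so the coordinate terms can dominate the total or almost vanish from it depending only on the chosen units.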

My takeaway is that loss values per batch or epoch are not comparable across projects, because they depend on exactly how you handle the data during training. HTH
