Instrumenting YOLO Training with TensorBoard

The YOLO loss function is complex. It combines losses for the bounding box coordinates, the class predictions, and the object presence (confidence) predictions. Those components are weighted and summed into a single loss value for the optimizer, and unfortunately that single value is the only thing training iterations report by default.

To better understand what was contributing to that loss value, I used TensorBoard to collect and report the individual components. First, here are the results:

The classification loss is 0! OK, I only have 1 type of object labelled in this data (cars). Softmax can’t screw that up even if it wanted to.

    predicted_class_probs = K.softmax(predicted[..., 5:])  # predicted class(es)
    truth_class_probs = K.softmax(truth[0:num_images, :, :, :, 5:])  # ground-truth class(es)
    classification_loss = classification_weights * K.square(matching_classes - predicted_class_probs)  # vectorized
    classification_loss_sum = K.sum(classification_loss)  # single value for this training batch
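
The zero is a property of softmax rather than of training: with a single class, softmax normalizes a one-element vector, and the result is 1.0 whatever the logit is. Here is a minimal pure-Python sketch of that; the `softmax` helper and the sample logits are illustrative stand-ins, not the Keras backend call or real model outputs:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a 1-D list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# With one class, any logit maps to probability 1.0.
predicted = softmax([-4.2])  # arbitrary predicted logit (made up)
truth = softmax([1.0])       # ground-truth value for the only class

squared_error = sum((t - p) ** 2 for t, p in zip(truth, predicted))
print(predicted, truth, squared_error)  # [1.0] [1.0] 0.0
```

So with only cars labelled, the classification term cannot contribute anything to the total, and it would only become informative once a second class is added.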

Figures (TensorBoard, YOLO object loss): Objects Confidence Loss, No Objects Confidence Loss, Total Confidence Loss

    no_objects_loss  = no_object_weights  * K.square(sigmoid0 - predicted_presence)
    objects_loss     = has_object_weights * K.square(sigmoid1 - predicted_presence)
    confidence_loss = objects_loss + no_objects_loss  # vectorized
    confidence_loss_sum = K.sum(confidence_loss)  # single value for this training batch
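
To make the split between the two confidence terms concrete, here is a toy pure-Python version. It assumes the targets are 1.0 for cells that contain an object and 0.0 for empty cells, and it replaces the weight tensors with two hypothetical scalar scales (`object_scale`, `no_object_scale`); the real loss works on tensors and may differ in detail:

```python
def confidence_losses(truth_presence, predicted_presence,
                      object_scale=1.0, no_object_scale=1.0):
    """Return (objects_loss_sum, no_objects_loss_sum) for flat lists of cells.

    truth_presence holds 1 where a cell contains an object, else 0.
    The *_scale arguments are hypothetical stand-ins for the weight tensors.
    """
    objects_sum = 0.0
    no_objects_sum = 0.0
    for t, p in zip(truth_presence, predicted_presence):
        if t == 1:
            objects_sum += object_scale * (1.0 - p) ** 2
        else:
            no_objects_sum += no_object_scale * (0.0 - p) ** 2
    return objects_sum, no_objects_sum

# Toy batch: 2 cells with an object, 98 empty cells, all predicting 0.5.
truth = [1, 1] + [0] * 98
predicted = [0.5] * 100

objects_sum, no_objects_sum = confidence_losses(truth, predicted)
print(objects_sum, no_objects_sum)  # 0.5 24.5
```

Because empty cells usually far outnumber occupied ones, the unscaled no-objects sum can dominate the total confidence loss. The original YOLO paper compensates by scaling the no-object term by 0.5 and the coordinate term by 5, which is one reason the weights are worth experimenting with.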

Figure (TensorBoard, YOLO coordinates loss): Coordinates Loss

    coordinates_loss = coordinates_weights * K.square(truth_boxes - predicted_boxes)  # vectorized
    coordinates_loss_sum = K.sum(coordinates_loss)  # single value for this training batch


Figure (TensorBoard): Total Loss

    total_loss = 0.5 * (confidence_loss_sum + classification_loss_sum + coordinates_loss_sum)  # total loss per batch; TF rolls this up per epoch automagically
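
Since the total is just half the sum of the three component sums, it can help to look at each component's share of it when reading the plots. A small sketch follows; the helper name `loss_breakdown` and the numbers are made up for illustration and are not results from this training run:

```python
def loss_breakdown(confidence_sum, classification_sum, coordinates_sum):
    """Combine component sums as in the post and report each one's share."""
    components = {
        "confidence": confidence_sum,
        "classification": classification_sum,
        "coordinates": coordinates_sum,
    }
    raw_total = sum(components.values())
    total_loss = 0.5 * raw_total
    # Guard against division by zero when every component is 0.
    shares = {name: (value / raw_total if raw_total else 0.0)
              for name, value in components.items()}
    return total_loss, shares

# Made-up numbers for illustration only.
total_loss, shares = loss_breakdown(30.0, 0.0, 10.0)
print(total_loss)  # 20.0
print(shares)      # {'confidence': 0.75, 'classification': 0.0, 'coordinates': 0.25}
```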

The TensorBoard code to produce these is fairly straightforward.

    %load_ext tensorboard

    import datetime
    import tensorflow as tf

    # TensorBoard housekeeping
    log_dir = './logs/' + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    file_writer = tf.summary.create_file_writer(log_dir + '/metrics')
    file_writer.set_as_default()

    tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1)

    # train model
    history = model.fit(x_train,
                        y_train,
                        batch_size=TRAINING_BATCH_SIZE,
                        epochs=20,
                        callbacks=[CustomTrainingCallbacks(),
                                   tensorboard_callback])

    # inside the loss function
    tf.summary.scalar('confidence_loss_sum', data=confidence_loss_sum, step=self.step)
    tf.summary.scalar('classification_loss_sum', data=classification_loss_sum, step=self.step)
    tf.summary.scalar('coordinates_loss_sum', data=coordinates_loss_sum, step=self.step)
    tf.summary.scalar('no_objects_loss_sum', data=no_objects_loss_sum, step=self.step)
    tf.summary.scalar('objects_loss_sum', data=objects_loss_sum, step=self.step)

I’ll use this visualization to examine why confidence and no_objects loss seem to be responding to training, but coordinates and objects loss are not. Is there a bug in the loss function? Can I influence that with different weights or hyperparameters? Does it differ across training sets?


Hi ai_curious,

Did you find a solution to this question? If so, please share it here so that others can see how you worked it out.

Thanks!