*no_objects_loss = no_object_weights * K.square( - pred_confidence)*

Anyone have a good explanation for why one would take the negative of values you’re about to square? Is it just me?

Can you give a reference to where you saw this?

Is it from a lecture, if so please give the title and a time mark.

from keras_yolo.py

The same code is in the original TF/Keras port of Darknet circa 2017. In two of the three cases it is clearly the square of a difference, so maybe the third one is just Python shorthand for (0 - pred\_confidence)? If so, it seems like an unneeded computation.

I agree, that seems unnecessary.

I reformatted the code a little and ended up with this:

```
no_object_loss = no_object_weights * K.square( - predicted_presence)
object_loss = has_object_weights * K.square(1 - predicted_presence)
confidence_loss = object_loss + no_object_loss
```

Here’s what I think is going on. *predicted_presence* represents the YOLO CNN output predicting whether an object is detected or not. I was confused that the loss function wasn’t explicitly comparing to ground truth in this step, as it does for location and classification. However, notice that the ground truth for “has object” is a 1, while ground truth for “has no object” is 0. I think the first line above is equivalent to:

```
no_object_loss = no_object_weights * K.square(true_presence - predicted_presence)
```

*but only for the cells where there is no object in the ground truth*, i.e. true\_presence = 0. For those cells, the squared error is (0 - predicted\_presence)^2
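The equivalence is easy to check numerically. A minimal NumPy sketch (the `predicted_presence` values are made up, and NumPy stands in for the Keras backend):

```python
import numpy as np

# Hypothetical sigmoid confidence outputs for three "no object" cells
predicted_presence = np.array([0.1, 0.5, 0.9])

# (-x)^2 == (0 - x)^2, so the two spellings are identical
a = np.square(-predicted_presence)
b = np.square(0.0 - predicted_presence)
assert np.allclose(a, b)

# Both equal square(true - pred) when true_presence = 0
true_presence = np.zeros_like(predicted_presence)
assert np.allclose(a, np.square(true_presence - predicted_presence))
```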

Similarly, the second line can be thought of as:

```
object_loss = object_weights * K.square(true_presence - predicted_presence)
```

for the locations where true\_presence = 1. Or:

```
object_loss = object_weights * K.square(1 - predicted_presence)
```
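One way to convince yourself the two masked terms really reproduce a single squared-error term is a quick NumPy sketch (the 1-D "grid" and the values are made up, and the scale weights are dropped for clarity):

```python
import numpy as np

# Made-up 1-D "grid" of cells; mask is 1 where the ground truth has an object
detectors_mask = np.array([1.0, 0.0, 1.0, 0.0])
predicted_presence = np.array([0.8, 0.2, 0.6, 0.4])

object_loss = detectors_mask * np.square(1.0 - predicted_presence)
no_object_loss = (1.0 - detectors_mask) * np.square(0.0 - predicted_presence)
confidence_loss = object_loss + no_object_loss

# Identical to a single squared error against the mask as ground truth
assert np.allclose(confidence_loss,
                   np.square(detectors_mask - predicted_presence))
```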

EDIT - removed the line where I said I think the code is correct. I am no longer confident in that.

So I spent way more time looking at this one line of code than is healthy, and am back to thinking there is a problem here. To validate my implementation, as well as isolate the source of any loss, I ran the *yolo_loss()* function with the same values for both *true* and *predict*. The exercise was helpful in removing the last hiccups in my code, but I wasn’t getting the expected output from *confidence_loss*, and this line of code is the reason.

Here is the equivalent line from the YAD2K toolkit code provided with the class exercise:

```
no_objects_loss = no_object_weights * K.square(-pred_confidence)
objects_loss = (object_scale * detectors_mask * K.square(1 - pred_confidence))
confidence_loss = objects_loss + no_objects_loss
```

Here is where values are assigned to *pred_confidence* inside yolo_head():

```
box_confidence = K.sigmoid(feats[..., 4:5])
```

Notice the use of *sigmoid()*. If the output of the YOLO CNN is 1, *box_confidence* is sigmoid(1). Let’s say, for the sake of argument, that this is a location that does indeed have an object in it, so the prediction is a ‘correct’ one. Here’s what happens inside the loss function (NOTE: *box_confidence* is returned from *yolo_head()* and assigned to *pred_confidence* inside *yolo_loss()*, though there is a(nother) bug in the code here: the return objects are in the wrong order). So for this thought experiment, the distance being computed in the subtraction is (1 - sigmoid(1)):

```
objects_loss = (object_scale * detectors_mask * K.square(1 - sigmoid(1)))
```

*object_scale* is a non-zero constant. We’re asserting that *detectors_mask* has a 1 in this cell since it is a ‘correct’ prediction in terms of object presence. So, spoiler alert, *objects_loss* ain’t 0 either.

Even if you have a 100% perfect prediction, you still get non-zero *objects_loss*, a non-zero *confidence_loss*, and non-zero total *loss* returned to the optimizer by the *yolo_loss()* function. I’m struggling to understand how that could be correct. Am I missing something here?

PS: for the record, sigmoid(1) ≈ 0.731058 and sigmoid(0) = 0.5
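The thought experiment is easy to reproduce numerically. A NumPy sketch (the `object_scale` value of 5 here is an assumption for illustration, not taken from the toolkit code):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Raw network output t_o = 1 at a cell that truly contains an object
pred_confidence = sigmoid(1.0)
detectors_mask = 1.0        # this cell really has an object
object_scale = 5.0          # assumed constant for illustration

objects_loss = object_scale * detectors_mask * np.square(1.0 - pred_confidence)

assert np.isclose(pred_confidence, 0.731058, atol=1e-5)
assert objects_loss > 0.3   # nonzero even for a "correct" prediction
```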

and here’s the bug related to yolo_head() return objects:

```
pred_xy, pred_wh, pred_confidence, pred_class_prob = yolo_head(...)

def yolo_head(feats, anchors, num_classes):
    """Convert final layer features to bounding box parameters.
    ...
    """
    ...
    return box_confidence, box_xy, box_wh, box_class_probs
```

This code could never have been used as-is
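A minimal sketch of the mismatch and its fix, with a stub standing in for *yolo_head()* (the stub’s string return values are placeholders, not the real tensors):

```python
# Stub with the same return order as the real yolo_head()
def yolo_head(feats, anchors, num_classes):
    box_confidence, box_xy, box_wh, box_class_probs = "conf", "xy", "wh", "probs"
    return box_confidence, box_xy, box_wh, box_class_probs

# As written in the toolkit, pred_xy silently receives box_confidence
pred_xy, pred_wh, pred_confidence, pred_class_prob = yolo_head(None, None, None)
assert pred_xy == "conf"  # the mismatch

# Fixed: unpack in the order yolo_head actually returns
pred_confidence, pred_xy, pred_wh, pred_class_prob = yolo_head(None, None, None)
assert pred_confidence == "conf" and pred_xy == "xy"
```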

@jonaslalin honestly I think I need to step away for a while until I can see this clearly. In the YOLO v3 paper JRedmon reiterates the prediction formulas as:

b_x = \sigma(t_x) + c_x

b_y = \sigma(t_y) + c_y

b_w = p_w e^{t_w}

b_h = p_h e^{t_h}

Pr(object) * IOU(b, object) = \sigma(t_o)

and then states:

*…ground truth … can be easily obtained by inverting the equations above*

So I would expect ground truth to look like this:

t_x = logit(b_x - c_x)

t_y = logit(b_y - c_y)

t_w = log(\frac{b_w}{p_w})

t_h = log(\frac{b_h}{p_h})

t_o = logit(Pr(object) * IOU(b, object))
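Those inversions can be sanity-checked numerically. A NumPy sketch with made-up values (`logit` is written inline rather than imported):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def logit(p):
    return np.log(p / (1.0 - p))

# Made-up raw predictions and priors
t_x, t_w = 0.3, -0.2
c_x, p_w = 4.0, 2.5

# Forward (prediction) equations from the paper
b_x = sigmoid(t_x) + c_x
b_w = p_w * np.exp(t_w)

# Inverting them recovers the raw (ground truth) values
assert np.isclose(logit(b_x - c_x), t_x)
assert np.isclose(np.log(b_w / p_w), t_w)
```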

But that isn’t what I find in the toolkit code. In keras_yolo.py the code looks like this:

```
detectors_mask[i, j, best_anchor] = 1
adjusted_box = np.array(
    [
        box[0] - j, box[1] - i,
        np.log(box[2] / anchors[best_anchor][0]),
        np.log(box[3] / anchors[best_anchor][1]), box_class
    ],
    dtype=np.float32)
matching_true_boxes[i, j, best_anchor] = adjusted_box
```

Only the (w,h) components match the ‘inverted’ prediction equations. I am struggling to understand why. Further, in the yolo_loss function, the computations for object and classification loss are done directly on the predicted values, e.g. t_w and t_h, while the coordinates loss uses e^{t_w} and e^{t_h}.

Object presence always uses \sigma(t_o) in the loss function, so how can that be compared with the value in the detectors_mask, which is 1? That will never yield a loss of 0, even if you pass the same data for both true and predict. Wouldn’t you expect feeding identical data for both into the loss function to produce 0 loss? It doesn’t, as written in the first code fragment above.

The papers are terse and the code is cryptic at best, fully undocumented at worst. Whatever time you initially think it will take to implement your own YOLO, budget for 10x

Lesson learned, unfortunately. I haven’t had the opportunity to dig into the details of the YOLO implementation myself. Hopefully someone else here can assist.

Otherwise, one option might be to open issues in their GitHub repos and request additional documentation/clarification of the implementation steps. Surely more people have the same questions. Have you tried that, @ai_curious?

I stared at and executed the yolo loss function locally enough times that I think I finally resolved my confusion over \sigma(t_o) versus t_o. The explanation is the same as the one for the location center coordinates t_x and t_y.

Object presence t_o is predicted such that 0 <= \sigma(t_o) <= 1. To derive the ground truth value \hat{t}_o you could invert that and say \hat{t}_o = logit(\sigma(t_o)). But it turns out you don’t actually ever need to use t_o directly. You only need \sigma(t_o), which we are defining to be the object presence prediction (or confidence… I think both terms are used in the various YOLO papers).

For ‘correct’ locations, we want to compute the squared error between the truth value, 1.0, and the predicted value \sigma(t_o); for the ‘incorrect’ locations, the squared error between the truth value, 0.0, and the predicted value \sigma(t_o). In code this looks like:

```
object_loss = … K.square(1 - predicted_presence)
no_object_loss = … K.square(0 - predicted_presence)
```

So really the only question is why bother to subtract *predicted_presence* from 0 in the *no_objects_loss* term since we square it anyway.
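As far as I can tell, it’s a no-op under the square. A one-line check (NumPy sketch with a made-up random grid of confidences):

```python
import numpy as np

# Made-up grid of predicted confidences
rng = np.random.default_rng(0)
predicted_presence = rng.random((7, 7, 5))

# The leading minus (or the 0 -) changes nothing once squared;
# it survives only as a reminder that the "no object" ground truth is 0
assert np.allclose(np.square(-predicted_presence),
                   np.square(predicted_presence))
assert np.allclose(np.square(0.0 - predicted_presence),
                   np.square(predicted_presence))
```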
