Week 3, Assignment 1: `box_class_probs` in `yolo_filter_boxes`

I am working on `yolo_filter_boxes()` (Exercise 1) and am confused by the input data.

The input includes box_class_probs, which is a tensor of shape (19,19,5,80). So if I understand correctly, that represents an image with 19x19 units. For each element of the 19x19 square array there are 80 5-long vectors with the elements representing obj_yes_or_no, b_x, b_y, b_h, b_w (one vector for each object type). For each 5-long vector for each element of the 19x19 square array there are 80 probability values.

So if I take the slice box_class_probs[0,0,0,:], the 80-long vector I get is the list of probabilities for a single one of the units in the 19x19 grid.

Is that all correct? I just want to make sure I haven’t gotten confused; I have a hard time thinking in more than three dimensions.

But when I print that vector, it doesn’t look like probabilities. Instead of being values between 0 and 1, as I would expect for probabilities, they look like this:

tf.Tensor(
[ 4.8387423   7.328443   -0.7113974   6.6432576  -2.409118    3.4563496
 -1.0087581   6.6609774  -0.32172585 -1.3336947   2.7046719  -1.0027521
  1.2326477   1.7546655   6.731361   -3.2655444   3.0346055  -4.070446
  2.7174702   2.3567047   8.899788   -3.7231026  -3.8897858  -1.5394197
 -2.0705626  -5.0586677   2.354822    1.4210849   0.9612765  -0.7146739
 -1.6633306   3.8747072   4.1295757   2.9190474  -2.3499708   0.54025215
 -4.309061    5.383872    0.80084455  1.3144112  -0.94119895  3.6818004
  2.1280382  -5.601334    4.6845837  -2.130794    5.486091    4.6491737
  1.4902996  -3.1898413   1.4414312  -1.8833437   5.5230894   5.149619
 -2.2734754   5.937464    5.5342307  -0.68562555  3.8225415  -4.2720127
 -3.2602382   2.4359722   4.1098633  -0.02034247  4.6250935  -1.3381572
  3.6894886   6.286919   -0.08070612  0.62090886  0.5780692  -4.2318873
  2.5384698   7.987104    7.524409    1.2095237  -0.2506355   0.7551234
 -1.9012783   1.1722656 ], shape=(80,), dtype=float32)

Can someone help me identify where my disconnect is?

Does this help?

Yes that does help, thanks! I did miss that note and I agree it would be useful to have it before the exercise (or to just use realistic values).

+1

Yes, visualizing things in more than 3 dimensions is always a challenge.

And in this case, you’ve got the additional obstacle that ai_curious’s link points out: the test cases here are synthetic and not realistic. Whoever created them must have been in a hurry and wasn’t thinking quite hard enough. They used random Gaussian distributions to create the test cases, so the numbers don’t really make sense as probability values: each 80-element vector should look like a softmax output, with values between 0 and 1 that add up to 1. But if you write the code correctly, it should still work. E.g. you just need to use argmax to find the class with the largest value of p_c * class probability. That still works, even if the test values don’t look like values you’d see with “real” data.
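To make that concrete, here is a minimal sketch of the score/argmax/threshold idea on Gaussian test values. It uses NumPy as a stand-in for the TensorFlow ops in the notebook, and the mean, standard deviation, seed and threshold are assumptions I picked for illustration, not necessarily what the test case uses.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic inputs in the style of the test case: Gaussian values, not real
# probabilities. Mean/stddev here are assumed for illustration.
box_confidence = rng.normal(1.0, 4.0, size=(19, 19, 5, 1))
box_class_probs = rng.normal(1.0, 4.0, size=(19, 19, 5, 80))

# p_c * class value for every class of every box (broadcast over the last axis).
box_scores = box_confidence * box_class_probs        # shape (19, 19, 5, 80)

# Best class and its score for each box.
box_classes = np.argmax(box_scores, axis=-1)         # shape (19, 19, 5)
box_class_scores = np.max(box_scores, axis=-1)       # shape (19, 19, 5)

# Keep only the boxes whose best score clears the (arbitrary) threshold.
threshold = 0.5
mask = box_class_scores >= threshold                 # boolean, shape (19, 19, 5)
kept_scores = box_class_scores[mask]
kept_classes = box_classes[mask]

print(box_scores.shape, box_classes.shape, kept_scores.shape)
```

Nothing in these steps requires the inputs to lie between 0 and 1: argmax and a threshold comparison are well defined for any real values, which is why the exercise still passes on unrealistic data.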

I’ll consider filing a suggestion that they improve the test cases here. It wouldn’t be that hard for them to rewrite the test cases to at least use a uniform distribution between 0 and 1 for the probability values. Making them an actual softmax output might be overkill and wouldn’t really add any value.
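For comparison, here is a rough sketch of what those two options could look like. This is my own illustration in NumPy, not the course’s test code; the variable names and seed are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Option 1: uniform values in [0, 1). Each entry looks like a probability,
# but the 80 values for a box do not sum to 1.
uniform_probs = rng.uniform(0.0, 1.0, size=(19, 19, 5, 80))

# Option 2: softmax over the class axis of Gaussian "logits".
logits = rng.normal(0.0, 1.0, size=(19, 19, 5, 80))
shifted = logits - logits.max(axis=-1, keepdims=True)   # numerical stability
exp_shifted = np.exp(shifted)
softmax_probs = exp_shifted / exp_shifted.sum(axis=-1, keepdims=True)

# Softmax keeps the ordering within each vector, so the winning class
# is the same as for the raw values.
same_winner = np.array_equal(
    np.argmax(logits, axis=-1), np.argmax(softmax_probs, axis=-1)
)

print(uniform_probs[0, 0, 0].sum(), softmax_probs[0, 0, 0].sum(), same_winner)
```

Only the softmax version gives 80 values per box that sum to 1, but either one would at least produce numbers that look like probabilities when printed.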

They do a pretty thorough job in the earlier parts of the notebook of walking through and describing the data, so that might also be worth another read through if you’re still feeling a bit uncertain.
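If it helps with the dimensions, this is how I read the shapes from the notebook (treat it as my summary, not official documentation): 19x19 grid cells, 5 anchor boxes per cell, and 80 classes. The sketch below just builds zero-filled NumPy arrays with those shapes so you can poke at the slices.

```python
import numpy as np

n_cells, n_anchors, n_classes = 19, 5, 80

# One confidence value p_c per anchor box in each grid cell.
box_confidence = np.zeros((n_cells, n_cells, n_anchors, 1))
# Four box coordinates per anchor box in each grid cell.
boxes = np.zeros((n_cells, n_cells, n_anchors, 4))
# Eighty class values per anchor box in each grid cell.
box_class_probs = np.zeros((n_cells, n_cells, n_anchors, n_classes))

# All anchor boxes of grid cell (0, 0): shape (5, 80).
one_cell = box_class_probs[0, 0]
# Class values for anchor box 0 of grid cell (0, 0): shape (80,).
one_box = box_class_probs[0, 0, 0, :]

print(one_cell.shape, one_box.shape)
```

So the third axis indexes the anchor boxes, and the last axis of box_class_probs indexes the classes for one specific box in one specific cell.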

Yeah, and come to think of it, the same goes for the pixel dimensions: it makes no sense that they aren’t integers. But anyway, if it works, I guess it doesn’t matter for the purpose of the exercise.