Autonomous_driving_application_Car_detection Exercise 1

Angel_Ortiz · October 31, 2022, 6:33pm

In exercise 1, I am having trouble creating the mask in this problem. I believe I have the correct dimensions for box_class_scores and box_classes. I printed the shapes. Here is the error I am receiving.

I'm not quite sure what I am doing wrong.

paulinpaloalto · October 31, 2022, 6:58pm

That probably means you’ve mixed up the computation of “scores” versus “classes”. Note that scores should be floating point values and classes would be integers. The error tells you that your box_class_scores are actually integers. So have a more careful look at that …

Angel_Ortiz · October 31, 2022, 8:14pm

I got it to work. I ended up using argmax for box_classes and np.max for box_class_scores. Also, I was using box_classes to calculate my box_class_scores. It took me a second to fully understand that we are supposed to be getting box_class_scores from box_scores. Can you provide further detail on why this is?

paulinpaloalto · November 1, 2022, 10:32pm

I added some print statements to my yolo_filter_boxes function to show the shapes and data types of the inputs and the generated values. Here’s what I get:

boxes.shape (19, 19, 5, 4)
boxes.dtype <dtype: 'float32'>
box_scores.shape (19, 19, 5, 80)
box_scores.dtype <dtype: 'float32'>
box_classes.shape (19, 19, 5)
box_classes.dtype <dtype: 'int64'>
box_class_scores.shape (19, 19, 5)
box_class_scores.dtype <dtype: 'float32'>
filtering_mask.shape (19, 19, 5)
filtering_mask.dtype <dtype: 'bool'>
sum(filtering_mask) = 1789

To understand why you use argmax to get the classes and reduce_max to get the scores, you need to understand the meaning of the data.

The input tensor box_class_probs has shape 19 x 19 x 5 x 80. For each combination of the first three indices (h, w, anchor_box) you get an 80-element vector that is essentially a softmax output for 80 types of object. So the entry out of those 80 that has the highest value tells us which class of object is the most likely one contained in the corresponding bounding box (which is different from the anchor box). What argmax gives you is the index of the entry with the highest value, so that’s an integer between 0 and 79 that identifies the class. The actual score corresponding to that class is the maximum floating point value itself, taken after the class probabilities have been multiplied by the corresponding “confidence” value (that product is box_scores). To get that maximum you use reduce_max.
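
Here is a minimal sketch of that logic, written with NumPy rather than the TensorFlow calls the notebook uses, so treat it as an illustration and not the assignment solution. The random inputs, the name box_confidence, its shape (19, 19, 5, 1), and the 0.6 threshold are all assumptions made for the example; only the other shapes come from the printout above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical random inputs; shapes follow the printout above.
# box_confidence and its (19, 19, 5, 1) shape are assumed for illustration.
box_confidence = rng.random((19, 19, 5, 1)).astype(np.float32)
box_class_probs = rng.random((19, 19, 5, 80)).astype(np.float32)
boxes = rng.random((19, 19, 5, 4)).astype(np.float32)
threshold = 0.6  # assumed value, chosen only for the example

# Confidence times class probabilities: one float score per class per box.
box_scores = box_confidence * box_class_probs        # (19, 19, 5, 80)

# argmax returns the INDEX of the best class: an integer in 0..79.
box_classes = np.argmax(box_scores, axis=-1)         # (19, 19, 5), integer

# max returns the VALUE of the best score: a float.
# It is computed from box_scores, not from box_classes.
box_class_scores = np.max(box_scores, axis=-1)       # (19, 19, 5), float32

# Boolean mask: keep boxes whose best score reaches the threshold.
filtering_mask = box_class_scores >= threshold       # (19, 19, 5), bool

# Apply the same mask to scores, classes and box coordinates.
scores = box_class_scores[filtering_mask]
classes = box_classes[filtering_mask]
kept_boxes = boxes[filtering_mask]

print(box_classes.dtype, box_class_scores.dtype, filtering_mask.dtype)
print(scores.shape, classes.shape, kept_boxes.shape)
```

If you compute the scores from box_classes by mistake, they come out as integers, which is the kind of mix-up described earlier in this thread. In TensorFlow the corresponding operations would be tf.math.argmax, tf.math.reduce_max and tf.boolean_mask.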

All this information about the structure and meaning of the data was explained in the notebook. If my explanation above is not enough, I suggest you read over the notebook from the beginning again with what I said above in mind.

Angel_Ortiz · November 2, 2022, 12:17pm

Thank you so much. I understand it a lot better now
