How does the viz_utils plot object detection output?

Hi all,

How does the viz_utils.visualize_boxes_and_labels_on_image_array plot the predicted bounding boxes? I noticed that there are 100 predicted boxes per image, each with its own score (sorted by highest score first). Does the viz_utils.visualize_boxes_and_labels_on_image_array plot the highest score bounding box?

import matplotlib.pyplot as plt

from object_detection.utils import visualization_utils as viz_utils

def plot_detections(image_np,
                    boxes,
                    classes,
                    scores,
                    category_index,
                    figsize=(12, 16),
                    image_name=None):
    """Wrapper function to visualize detections.

    image_np: uint8 numpy array with shape (img_height, img_width, 3)
    boxes: a numpy array of shape [N, 4]
    classes: a numpy array of shape [N]. Note that class indices are 1-based,
      and match the keys in the label map.
    scores: a numpy array of shape [N] or None.  If scores=None, then
      this function assumes that the boxes to be plotted are groundtruth
      boxes and plots all boxes as black with no classes or scores.
    category_index: a dict containing category dictionaries (each holding
      category index `id` and category name `name`) keyed by category indices.
    figsize: size for the figure.
    image_name: a name for the image file.
    """
    image_np_with_annotations = image_np.copy()
    viz_utils.visualize_boxes_and_labels_on_image_array(
        image_np_with_annotations,
        boxes,
        classes,
        scores,
        category_index,
        use_normalized_coordinates=True)
    if image_name:
        plt.imsave(image_name, image_np_with_annotations)
    else:
        plt.imshow(image_np_with_annotations)

import glob
import os

import numpy as np

test_image_dir = '../../data/processed/gt/test'
test_images_fps = glob.glob(os.path.join(test_image_dir, "*.tif"))
test_images_np = []
for i in range(1, 20):
    image_path = test_images_fps[i]
    test_images_np.append(np.expand_dims(load_image_into_numpy_array(image_path), axis=0))

# Again, uncomment this decorator if you want to run inference eagerly
@tf.function
def detect(input_tensor):
    """Run detection on an input image.

    input_tensor: A [1, height, width, 3] Tensor of type tf.float32.
      Note that height and width can be anything since the image will be
      immediately resized according to the needs of the model within this
      function.

    Returns a dict containing 3 Tensors (`detection_boxes`, `detection_classes`,
      and `detection_scores`).
    """
    preprocessed_image, shapes = detection_model.preprocess(input_tensor)
    prediction_dict = detection_model.predict(preprocessed_image, shapes)
    return detection_model.postprocess(prediction_dict, shapes)

# Note that the first frame will trigger tracing of the tf.function, which will
# take some time, after which inference should be fast.

label_id_offset = 1
for i in range(len(test_images_np)):
    input_tensor = tf.convert_to_tensor(test_images_np[i], dtype=tf.float32)
    detections = detect(input_tensor)

    plot_detections(
        test_images_np[i][0],
        detections['detection_boxes'][0].numpy(),
        detections['detection_classes'][0].numpy().astype(np.uint32) + label_id_offset,
        detections['detection_scores'][0].numpy(),
        category_index, figsize=(15, 20), image_name="test_image_output.jpg")





Hello Alex,

Visualization in the object detection API is controlled by a set of parameters.

When we run the model, the resulting values are usually tensor objects.

For a key such as detection_scores, the associated value is a tf.Tensor containing the model’s predictions.

Then we write code to convert the tensor values into regular numpy arrays.

We use a comprehension and call .numpy() on each tensor, which retrieves the numpy array for each of those tensors.
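As a small sketch of that conversion (the `results` dict below is made up for illustration; in practice it comes from running the detection model):

```python
import tensorflow as tf

# Hypothetical model output: a dict mapping key names to tf.Tensor values
# (the real dict would come from the detection model).
results = {
    'detection_scores': tf.constant([[0.9, 0.4]]),
    'detection_classes': tf.constant([[1.0, 2.0]]),
    'detection_boxes': tf.constant([[[0.1, 0.1, 0.5, 0.5],
                                     [0.2, 0.2, 0.6, 0.6]]]),
}

# Convert every tensor value into a regular numpy array with a comprehension.
results = {key: value.numpy() for key, value in results.items()}

print(results.keys())
```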

Then, if you call results.keys(), you can see the list of keys in the model’s output.

Three of these keys are standard, meaning they appear in the outputs of all object detection models using this API: detection_scores, detection_classes, and detection_boxes.

So then, if you want to draw your boxes with their labels and scores on the image, you can just call the visualization utils API for this.

In this case, it’s visualize_boxes_and_labels_on_image_array. At a minimum, you should set the parameters as follows.

The first is the numpy array containing the image. If it’s a single image in a batch, it is indexed at 0, as are the results.

Next are the detection_boxes, followed by the detection_classes. The class integers that are output by the model are integers that start counting from 0.

The actual data labels start counting from one. So in this particular case, you can add an offset of one to convert the model class integers to the class IDs that are used by the data set.
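As a quick numpy sketch of that offset (the class values here are made up):

```python
import numpy as np

# Hypothetical class integers as output by the model (0-based).
model_classes = np.array([0, 2, 1])

# The dataset's labels are 1-based, so add an offset of one
# to convert model class integers to dataset class IDs.
label_id_offset = 1
dataset_class_ids = model_classes + label_id_offset

print(dataset_class_ids)  # [1 3 2]
```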

The detection_scores say how confident the model is in its predicted classes. The category_index is the dict you created when you mapped the label numbers to their text names.

You’ll usually want to set use_normalized_coordinates to True. The bounding boxes stored in the results’ detection_boxes have coordinates whose values range from 0 to 1, so they’re considered normalized coordinates. For example, the bottom-right corner of a bounding box may be 1 across and 0.5 down.

To overlay these boxes on top of the image, the function converts the normalized coordinates into denormalized pixel coordinates.

For example, the normalized coordinates above may become 256 across and 128 down. Setting use_normalized_coordinates to True tells the function that the bounding boxes you give it are normalized, so it knows to scale them up to the coordinates of the image.
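Concretely, the scaling works like this (a numpy sketch assuming a 256 x 256 image, in the API’s [ymin, xmin, ymax, xmax] box convention):

```python
import numpy as np

img_height, img_width = 256, 256  # assumed image size for this example

# A normalized box whose bottom-right corner is 1.0 across and 0.5 down,
# in [ymin, xmin, ymax, xmax] order.
normalized_box = np.array([0.0, 0.0, 0.5, 1.0])

# Scale each coordinate by the image dimension along its axis.
ymin, xmin, ymax, xmax = normalized_box
pixel_box = (ymin * img_height, xmin * img_width,
             ymax * img_height, xmax * img_width)

print(pixel_box)  # (0.0, 0.0, 128.0, 256.0)
```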

The minimum score threshold (min_score_thresh) determines which labels are shown and which bounding boxes are drawn. You give the function the detection scores, which say how confident the model is in each predicted object.

For any scores that are below this threshold, the bounding box won’t be visualized by this function.
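The filtering amounts to a score mask. A numpy sketch (the scores and boxes are made up, with a 0.8 threshold for illustration):

```python
import numpy as np

min_score_thresh = 0.8
detection_scores = np.array([0.95, 0.81, 0.40, 0.10])
detection_boxes = np.array([[0.1, 0.1, 0.5, 0.5],
                            [0.2, 0.2, 0.6, 0.6],
                            [0.0, 0.0, 0.3, 0.3],
                            [0.4, 0.4, 0.9, 0.9]])

# Only boxes whose score meets the threshold are kept for drawing.
keep = detection_scores >= min_score_thresh
visualized_boxes = detection_boxes[keep]

print(len(visualized_boxes))  # 2
```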

So it is not a matter of plotting only the highest-scoring box: every one of the bounding boxes is passed through the function with the set parameters, and any box whose score falls below the threshold is simply not visualized.

Hope this clears your doubt!!



Hi DP,

Thanks for the detailed response! I will review it in full today, but it clarified a lot of my questions.
Can you expand on why there are 100 detections (with corresponding scores)? Is this because of the “region sampling” the OD API does? Sometimes there are many box predictions that pass the threshold (of 0.8), which then results in many boxes being drawn over the object.


Hello Alex,

Are you asking this as per the assignment for week 2?

If you read the previous comment, you will understand that the object detection API is built so that, even if there are 100 or more detections, the outputs detection_boxes, detection_classes and detection_scores (the three standard keys) are filtered by the threshold of your choice, e.g. 0.8 as you stated. Any bounding box whose score is below that threshold will not be visualized.

Yes, I am asking about the Week 2 assignment. I understand the plotting function very well now, thanks to your response. I am still unclear on why there are 100 separate predictions per image. Can I change this number to 50, or 150? Is this a parameter in the config file?

Hello Alex,

I just checked the Course 3 Week 2 assignment, which indicates the target size to be used is 150, so I am not sure which section of the config file you mean. Can you share a screenshot of where it shows 100 separate predictions per image in the assignment?


Hi DP,
I was actually able to locate in the config file where you can change the number of boxes predicted. For faster_rcnn, you can see the attached below:
I was just trying to wrap my head around why there were 100 predictions, but I realize this is a set parameter in the config file that can be changed. Thanks for your guidance!
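For anyone else reading: in a faster_rcnn pipeline config, the fields that cap the number of detections look roughly like this (field names are from the TF Object Detection API config format; the values are illustrative, and whatever your config sets here is where the 100 per image comes from):

```
model {
  faster_rcnn {
    first_stage_max_proposals: 100
    second_stage_post_processing {
      batch_non_max_suppression {
        score_threshold: 0.0
        iou_threshold: 0.6
        max_detections_per_class: 100
        max_total_detections: 100
      }
      score_converter: SOFTMAX
    }
  }
}
```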