Caltech Bird Detector

I have multiple doubts:

  1. What is the use of the 3rd bounding_boxes_on_image_array function or the 2nd function?
  2. What is the number N in the dimensions of boxes [N,4]?
  3. In read_image_tfds why do we divide the image by 127.5 and then subtract 1?
  4. Why is training_dataset an object but visualization_training_dataset used for length measurements?
  5. The boxes dimension does not signify whether the order is ymin, then xmin, and so on

Please find my reply below.

  1. What is the use of the 3rd bounding_boxes_on_image_array function or the 2nd function? - It is a utility function that draws bounding boxes (passed as a 2D NumPy array) on an image.
  2. What is the number N in the dimensions of boxes [N,4]? - N is the number of bounding boxes to draw on the image; each row of the array holds the four coordinates of one box.
  3. In read_image_tfds why do we divide the image by 127.5 and then subtract 1? - This normalizes the original image to the desired scale: pixel values in [0, 255] are mapped to [-1, 1], since 0 / 127.5 - 1 = -1 and 255 / 127.5 - 1 = 1.
  4. Why is training_dataset an object but visualization_training_dataset used for length measurements? - Could you clarify the question? I didn’t find the function visualization_training_dataset.
  5. The boxes dimension does not signify whether the order is ymin, then xmin, and so on - Could you rephrase the question?
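
To make the normalization in point 3 concrete, here is a minimal sketch using plain NumPy. It is not the notebook's read_image_tfds code; the function name normalize_image and the tiny sample image are hypothetical and only illustrate the arithmetic of dividing by 127.5 and subtracting 1.

```python
import numpy as np


def normalize_image(image):
    """Map pixel values from [0, 255] to floats in [-1, 1].

    Hypothetical helper for illustration; it only mirrors the
    "divide by 127.5, then subtract 1" step discussed above.
    """
    return image.astype(np.float32) / 127.5 - 1.0


# Hypothetical 2x2 grayscale image covering both ends of the uint8 range.
image = np.array([[0, 255], [127, 128]], dtype=np.uint8)
normalized = normalize_image(image)

# The darkest pixel lands on the lower bound, the brightest on the upper bound.
print(normalized.min(), normalized.max())
```

Pixel value 0 ends up at the lower end of the range and 255 at the upper end, so the midpoint 127.5 maps to 0. Inputs centered around zero are a common choice for networks that expect that range.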