When searching for papers or datasets related to object detection, you are likely to come across references to the Caltech 101 dataset. It is potentially interesting because it has both class labels and object bounding boxes. I downloaded and untarred it and took a quick look. Here are my initial observations.
Pro:
- roughly 9,000 image files with class and bounding box labels. More than some of the toy datasets out there that have only a few hundred images, but still manageable for doing quick experiments
- 101 classes (they call them categories). Again, more than many toy datasets, but still easier to consume than, say, ImageNet’s 1,000
- image files are small, roughly 300x300, so they don’t take much space on disk or in RAM, and they load and process through the CNN quickly
- adequate fidelity for most images I have looked at, though see below
Con:
- the labels, or annotations as they call them, are stored as MATLAB files. You’ll have to write a subroutine or use one from a library to open and read them. I used scipy.io
- the images are not all the same size, which is an extra headache to deal with. Resizing to a standard input shape for the CNN isn’t that big of a deal, but then you have to adjust the bounding boxes to match. A few examples:
./CalTech101/Images/beaver_0045.jpg is: (300, 175)
./CalTech101/Images/beaver_0046.jpg is: (300, 203)
./CalTech101/Images/bonsai_0001.jpg is: (280, 300)
./CalTech101/Images/bonsai_0002.jpg is: (265, 300)
./CalTech101/Images/crayfish_0055.jpg is: (300, 147)
./CalTech101/Images/flamingo_0002.jpg is: (80, 300)
- most of the images have reasonable foreground and background complexity, but some have a completely white background, and some are cartoons rather than photos.
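For the MATLAB annotation files mentioned above, a minimal sketch with `scipy.io.loadmat` might look like the following. It assumes each annotation file holds a `box_coord` array ordered as [y_min, y_max, x_min, x_max]; that key name and ordering are assumptions about the file layout, so print the keys of a few files and check before trusting it. The helper names `parse_box` and `load_box` are made up for illustration.

```python
def parse_box(mat):
    """Return (x_min, y_min, x_max, y_max) from a loaded annotation dict.

    Assumption: mat["box_coord"] is a 1x4 array ordered
    [y_min, y_max, x_min, x_max]. Verify this on your own copy.
    """
    y_min, y_max, x_min, x_max = (int(v) for v in mat["box_coord"][0])
    return x_min, y_min, x_max, y_max


def load_box(mat_path):
    """Read one annotation .mat file and return its bounding box."""
    # Imported here so parse_box can be used without SciPy installed.
    from scipy.io import loadmat

    return parse_box(loadmat(mat_path))
```

Calling `load_box` on the path of one annotation file should return the box as plain Python integers, which is easier to pass around than the nested array that `loadmat` returns.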
So far I have only found single-object images, so it may not be that interesting for YOLO.
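Adjusting the bounding boxes after a resize is mostly a matter of scaling each coordinate by the same factor as its axis. Here is a rough sketch, assuming sizes are given as (width, height) the way PIL reports them and boxes as (x_min, y_min, x_max, y_max). The function name `resize_box`, the 224x224 target, and the box values are all made up for illustration.

```python
def resize_box(box, old_size, new_size):
    """Scale an (x_min, y_min, x_max, y_max) box to match a resized image.

    old_size and new_size are (width, height) tuples (an assumption;
    swap them if your sizes are stored as (height, width)).
    """
    old_w, old_h = old_size
    new_w, new_h = new_size
    sx = new_w / old_w  # horizontal scale factor
    sy = new_h / old_h  # vertical scale factor
    x_min, y_min, x_max, y_max = box
    return (x_min * sx, y_min * sy, x_max * sx, y_max * sy)


# A (300, 175) image stretched to a hypothetical 224x224 network input.
box = resize_box((30, 35, 270, 140), old_size=(300, 175), new_size=(224, 224))
```

This covers a plain stretch only. If you pad the image to preserve the aspect ratio instead, the box also needs the padding offset added after scaling.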
Overall, it seems adequate for small experiments and self-enablement projects only. If you’re hoping to train your own autonomous vehicle, keep looking.
https://data.caltech.edu/records/mzrjq-6wc02
NOTE: TensorFlow offers a pre-built TF Dataset version, but it is only for classification; it doesn’t include the bounding boxes.