ML Question on extracting features from sequence of images

VIVEK_Mehrotra · January 22, 2023, 3:35pm

What model would you recommend for creating a single image to feed to a YOLO model? That input image should carry features from a sequence of images captured by a camera.
Thanks

Elemento · January 22, 2023, 5:26pm

Hey @VIVEK_Mehrotra,
Welcome to the community. Please help us understand your question a bit more.

As per the above statement, I thought that you wanted to artificially create data for an object detection model. But the next statement says that each of these generated images should have features from a sequence of images.

For starters, what kind of features are you referring to here? I haven’t come across the concept of extracting features from a sequence of images to create a new image.

Cheers,
Elemento

paulinpaloalto · January 22, 2023, 5:32pm

I think we need a bit more information to know how to answer this. Are you talking about training a YOLO model or just using a YOLO model that has already been trained? If the latter, then there is nothing special you have to do other than perhaps resize the input image to the size and format that the model expects. A trained YOLO model takes individual images and returns the identifications of the objects it is able to recognize in each image, in the complex form described in the lectures and the assignment.
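To illustrate the resize step: many YOLO implementations expect a fixed-size input and keep the aspect ratio by padding the shorter side ("letterboxing"). The sketch below only computes the geometry. The 640x640 target and the function name `letterbox_geometry` are assumptions for illustration, so check what your particular model actually expects.

```python
def letterbox_geometry(src_w, src_h, dst_w=640, dst_h=640):
    """Return (new_w, new_h, pad_left, pad_top) for an aspect-preserving resize.

    The 640x640 default is an assumption; use the size your model expects.
    """
    # Scale so the whole image fits inside the target without distortion.
    scale = min(dst_w / src_w, dst_h / src_h)
    new_w = round(src_w * scale)
    new_h = round(src_h * scale)
    # Split the leftover space evenly; any odd pixel goes to the right/bottom.
    pad_left = (dst_w - new_w) // 2
    pad_top = (dst_h - new_h) // 2
    return new_w, new_h, pad_left, pad_top


print(letterbox_geometry(1920, 1080))
```

The actual pixel resize and padding would then be done with whatever image library you already use.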

Training a YOLO model is (as you would expect) a much more complex task, given that it requires labelled images, and labelling for YOLO is pretty complex, involving anchor boxes, bounding boxes (not the same thing) and object classes. There are a number of really great threads here on the forums about YOLO from one of our fellow students who has invested serious time and effort working with YOLO and has shared the knowledge gained. If you want to dig deeper than what you can learn from the lectures and the YOLO assignment, here’s a good thread to start with and it provides links to some of the others.

VIVEK_Mehrotra · January 22, 2023, 6:13pm

Thank you for clearing up my understanding. One input image does not have all the information. A sequence of images may carry the complete information. I am talking about using the model, not training.
It looks like you are suggesting that each input image would be processed by the model, and the final outputs would then be examined and combined outside the ML model? Thank you for sharing.
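One way to picture combining per-frame outputs outside the model is a simple vote over the text read from each frame. This is only a sketch under assumptions: it presumes some later stage has already produced one plate string per frame, and `vote_plate` is a hypothetical helper, not part of YOLO or any library.

```python
from collections import Counter


def vote_plate(readings):
    """Combine per-frame plate strings with a per-position majority vote."""
    if not readings:
        return ""
    # Keep only readings of the most common length so positions line up.
    common_len = Counter(len(r) for r in readings).most_common(1)[0][0]
    aligned = [r for r in readings if len(r) == common_len]
    # For each character position, take the most frequent character.
    return "".join(
        Counter(chars).most_common(1)[0][0] for chars in zip(*aligned)
    )


# Hypothetical noisy readings of the same plate from five frames.
frames = ["ABC123", "A8C123", "ABC1Z3", "ABC12", "ABC123"]
print(vote_plate(frames))  # ABC123
```

Each frame is still processed on its own; only the final strings are merged, so a blurred or partly hidden frame can be outvoted by the clearer ones.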

ai_curious · January 22, 2023, 8:06pm

YOLO is designed to solve a different problem. It takes input images one at a time, runs each one through forward propagation, and outputs a set of predictions regarding object localization and classification from that image and that image alone. The YOLO pipeline does not deal with fusing features or information from multiple input images.
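A minimal sketch of that one-frame-at-a-time behaviour. The detector here is a hypothetical stand-in (`fake_detect`), not a real YOLO call, and frames are represented as plain dicts purely for illustration.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2)
Detection = Tuple[Box, str, float]        # (box, class label, confidence)


def detect_per_frame(
    frames: Iterable[Dict],
    detect: Callable[[Dict], List[Detection]],
) -> List[List[Detection]]:
    """Run the detector on each frame in isolation; no state is carried over."""
    return [detect(frame) for frame in frames]


def fake_detect(frame: Dict) -> List[Detection]:
    # Hypothetical stand-in for a trained detector: it reports one "plate"
    # box whose position depends only on the frame it is given.
    x = float(frame["plate_x"])
    return [((x, 10.0, x + 50.0, 30.0), "plate", 0.9)]


frames = [{"plate_x": 0}, {"plate_x": 5}, {"plate_x": 10}]
per_frame = detect_per_frame(frames, fake_detect)
for index, detections in enumerate(per_frame):
    print(index, detections)
```

Running frame 1 alone gives the same result as running it as part of the sequence, which is the point: nothing is shared between frames.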

+1 on what @paulinpaloalto suggests above: the community needs to know more about what information you expect to derive from multiple images. Object movement, perhaps?

If it is object movement, or tracking, you might find some useful ideas here: OpenCV Object Tracking - PyImageSearch

Looking forward to learning more

VIVEK_Mehrotra · January 22, 2023, 9:21pm

License plate for instance

Juan_Olano · January 22, 2023, 10:13pm

Once in production, will you be comparing the license plate with one stored in a database to, say, grant access?

There is a similar lab in the Deep Learning Specialization for detecting faces to grant access.

VIVEK_Mehrotra · January 22, 2023, 11:03pm

Not comparing against a database, but an ML problem: predicting license numbers on the fly, after training, from an input stream of frames.

VIVEK_Mehrotra · January 22, 2023, 11:04pm

I was wondering if a model could be applied to the stream first, and then a single image passed to YOLO.

Juan_Olano · January 22, 2023, 11:17pm

Curious about it: by “predicting license numbers on the fly”, do you mean predicting the next license plate to pass by a certain place?

For instance, in a shopping center with an entrance, would this system predict the next license plate coming in, or going out?

Juan_Olano · January 22, 2023, 11:20pm

You can have a video stream, yes; you can identify the license plates of the vehicles, yes; and, assuming enough resolution, you can read the license plates. All this with YOLO, and even more efficiently and accurately with YOLO v8. And this would be real-time identification.

Check out the image below - real time identification of objects.

This GitHub is an example of YOLO used to detect license plates.

And this other project reads the plates.

Another option for implementing what I think you are trying to do (read license plates) is OpenCV, a very good library for computer vision.

VIVEK_Mehrotra · January 23, 2023, 12:18am

Thank you very much, Juan. This ML community is so helpful…

VIVEK_Mehrotra · January 23, 2023, 12:19am

No, not predicting the next one. Just the current one from a stream.

Juan_Olano · January 23, 2023, 12:31am

Got it. Then, some of the above links will certainly help you.

ai_curious · January 23, 2023, 1:00am

It sounds to me like there is no information from the sequence of images. You grab a frame from the video, process it. Grab another frame, and process it independently. You are not trying to track object movement from frame to frame, which is something an autonomous vehicle, for example, needs to do. Do an interweb search on machine learning license plates and you’ll find a bunch of hits.

Juan_Olano · January 23, 2023, 1:21am

Check out this YouTube video:

Live and moving tracking of objects. This one was done with YOLO v3

Does it look like what you are looking for?

ai_curious · January 23, 2023, 2:23am

I’m not convinced that any version of YOLO does tracking, which requires localization, classification and identification. By that I mean it has to know that the object classified as a dog in frame 1 and the object classified as a dog in frame 2 are the same object, or be able to distinguish and follow all the individual humans at the subway from frame to frame, not just put a class-labelled box around them. In addition, I am not aware that any version of YOLO can read characters on a license plate. To the best of my knowledge it can only localize and classify. I am very confident of this for v1, v2 and YOLO 9000, but I will reread the v3 and v4 papers at the earliest opportunity.
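To make the identification step concrete, here is a sketch of the simplest kind of frame-to-frame association: greedy matching of boxes by intersection over union (IoU). It sits outside the detector and is an illustration under assumptions, not how any YOLO version works; `associate`, the 0.3 threshold and the box format `(x1, y1, x2, y2)` are all choices made for this example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def associate(prev_tracks, boxes, next_id, threshold=0.3):
    """Give each new box the id of the best-overlapping previous track.

    prev_tracks maps track id -> box from the previous frame.
    Returns (tracks for this frame, next unused id).
    """
    tracks = {}
    unused = dict(prev_tracks)          # previous tracks not yet matched
    for box in boxes:
        best_id, best_score = None, threshold
        for track_id, prev_box in unused.items():
            score = iou(prev_box, box)
            if score >= best_score:
                best_id, best_score = track_id, score
        if best_id is None:             # no overlap good enough: new object
            best_id = next_id
            next_id += 1
        else:
            del unused[best_id]         # each old track is used at most once
        tracks[best_id] = box
    return tracks, next_id


# Frame 1: two detections, both new. Frame 2: same objects, listed in a
# different order and shifted slightly.
tracks1, next_id = associate({}, [(0, 0, 10, 10), (50, 50, 60, 60)], 0)
tracks2, next_id = associate(tracks1, [(52, 50, 62, 60), (1, 0, 11, 10)], next_id)
print(tracks1)
print(tracks2)
```

The ids survive the reordering only because of this extra bookkeeping; the per-frame detections by themselves carry no notion of "same object".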

Juan_Olano · January 23, 2023, 2:52am

Maybe a combination of models?

With YOLO v8 you can certainly identify the license plates in real time, in motion. So the next step is to extract the data from each license plate. If YOLO cannot read the license plates, we can extract them and pass them to OpenCV to read them.

Since YOLO v8 was released, there have been plenty of posts with sample applications, including the license plate project linked above.

@ai_curious what do you think?

PS: I’ve been looking at previous versions of YOLO, and even from YOLO v3 there’s the capability to localize, classify and identify objects in real time, in motion. Here’s one more link, now with YOLO v3, showing how to do it.

ai_curious · January 23, 2023, 3:50am

I do think it will require an ensemble to read characters on license plates: one part or model to localize the plate, and one part to do OCR on the alphanumeric characters in each plate image region.
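A rough sketch of how such a two-stage ensemble could be wired together. Both stages are passed in as callables, and the stubs below (`fake_locate`, `fake_ocr`) are hypothetical placeholders for a real plate detector and a real OCR model; the image is a toy grid of characters so the example runs without any libraries.

```python
def crop(image, box):
    """Cut the region (x1, y1, x2, y2) out of a row-major 2D image."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in image[y1:y2]]


def read_plates(image, locate_plates, recognize_text):
    """Stage 1 localizes plates; stage 2 runs OCR on each cropped region."""
    return [recognize_text(crop(image, box)) for box in locate_plates(image)]


# Toy "image": a grid of characters standing in for pixels.
image = [
    list("......"),
    list(".AB1.."),
    list("......"),
]


def fake_locate(img):
    # Hypothetical detector output: one plate box as (x1, y1, x2, y2).
    return [(1, 1, 4, 2)]


def fake_ocr(region):
    # Hypothetical OCR: just joins the characters in the cropped region.
    return "".join(ch for row in region for ch in row)


print(read_plates(image, fake_locate, fake_ocr))  # ['AB1']
```

Keeping the two stages behind separate callables means either one can be swapped out without touching the other.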

On the linked v3 how-to video at the 22:50 mark the narrator says ‘running it as frames’ which is what I mention above. Each set of predictions is run independently on a single frame image input. Because the predicted bounding boxes from the sequence of frames appear to move when overlaid on the video, it looks like tracking is happening. But I don’t think it is…it is only localization (and visualization) happening quickly. I think it is only the human observer that is connecting the bounding box ‘movement’, like drawings on a stack of cards flipping past your eyes, and doing the ‘tracking’, not YOLO v3.

VIVEK_Mehrotra · January 23, 2023, 4:10am

Thank you both for this interesting piece of information. I am thinking detection followed by segmentation followed by OCR. Which models to use for the three pieces is still not clear.
