Estimate Distance To Closest Object

Hello,
I have an ML project. I need to estimate the distance to the closest object in a set of images. I can only use scikit-learn, and SVR is forbidden. I tried different things like KNeighbors, RandomForest, HistGradientBoosting and a lot of different image preprocessing. My best is a mean absolute error of around 12cm. My goal is 7.5cm. What do you guys think I should try?

What does your training data look like? Just looking at a 2D image with no indication of the scale, how is that even possible?

It’s around 1000 images with the right distance. And I have to train the model with those pictures and distances.

Please show us a few examples of the images and the labels. Just considering this in the abstract, you’ve got two problems to solve (in order): 1) identifying which is the closest object in the image and 2) estimating the distance from the camera lens to the object.

To solve 1, you need something like semantic segmentation, which identifies objects in the image, or an object detection system like YOLO.

Since I am a new user, I can only upload 1 thing. And somehow I can’t insert a link.

Well, this is the main file (only the template; my code is not included). There is another file where the image processing is. I have 3000 images and an Excel file with the right distance for each image.
main.py (632 Bytes)

With KNN and PowerTransformer, I achieved 12.5cm.

Sorry, there is no useful information in the main.py file. Note that all the functions being called there are custom functions that must also be provided with your assignment, meaning that there is nothing useful we can conclude from the fact that you call a function named load_dataset.

Maybe you could describe for us what some of the images look like. Are they all of similar scale? Do they contain recognizable objects for which there is a reasonable estimate of the size of the object (cars, people, cats)?

I’m really curious if there is something in the image quality (like the clarity of the edges of the labeled object) that the model might be associating with the distance.

Solving for distance using only a single flat 2D image seems like quite a difficult problem.

They are all like these two pictures: indoor, with or without people, not really sharp.

Thanks for posting the images.

If this is going to work at all, it may have to do with the model internally identifying the converging edges in the images, and that this visual feature is correlated to some geometric perspective.

If there are recognizable objects in the images, those could be used to assess the relative scale and make a guess at the distance.

Yes, thanks for the images. What are the labeled distances from those two images? In the first one, what is the closest object? Is it the shoe of the person in front? Or what appears to be a piece of art leaning on the wall to the right? Or does the floor or the baseboard count as an “object”? Inquiring minds :nerd_face: …

In the second picture it must be the front corner of the sign or whatever it is that is leaning on the wall to the right.

I agree with Tom that this seems like a pretty difficult problem. It doesn’t seem plausible that any “off the shelf” type of model is going to “just work” for this problem.

The distance for the first picture is 1.67m and for the second picture is 1.65m.
Yeah I think it does not make sense to give us this task and say that we have to use scikit (since there are other methods which are easier).
This file is for the image processing.
utils.py (2.8 KB)

They say it’s possible to get to 7.5cm, so I think there should be a way, but I tried for many hours and didn’t get under 12cm. That’s why I asked you guys for some help.

That seems a very challenging project.

But I would hesitate to blame any limitations on using scikit learn. It has a lot of tools available.

What would you recommend trying?
The pictures are 300x300 pixels and my best is with a downsampling-factor of 30, PCA of 99% and KNN.
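
As a rough sketch of the pipeline described above (downsampling factor 30, PCA keeping 99% of the variance, then KNN), something like the following could work. Block averaging is only one possible way to downsample, and the `downsample` helper, the random grayscale images, the distances and `n_neighbors=5` are all assumptions, not the code from utils.py.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline


def downsample(images, factor=30):
    """Block-average (n, H, W) grayscale images by `factor`, then flatten.

    H and W must be divisible by `factor`; 300x300 with factor 30 gives
    10x10 = 100 features per image.
    """
    n, h, w = images.shape
    blocks = images.reshape(n, h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(2, 4)).reshape(n, -1)


# Synthetic stand-in data (assumption): 80 random 300x300 grayscale images
# with made-up distances in cm.
rng = np.random.default_rng(0)
images = rng.random((80, 300, 300))
distances = rng.uniform(100.0, 250.0, size=80)

X = downsample(images, factor=30)

# A float n_components keeps enough components to explain that share of the
# variance; svd_solver="full" is required for this mode.
model = make_pipeline(
    PCA(n_components=0.99, svd_solver="full"),
    KNeighborsRegressor(n_neighbors=5),
)
model.fit(X[:60], distances[:60])
pred = model.predict(X[60:])
print(X.shape, pred.shape)
```

Keeping PCA and KNN in one pipeline means PCA is fitted on the training images only, so cross-validation scores are not inflated by information from the held-out images.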

First thing that comes to mind is trying some sort of Convolutional Neural Network. That’s the go-to method if images are involved.

Those images don’t have a lot of detail, so don’t downsample. And no one uses PCA anymore. Computers are now plenty powerful to handle the full image.

PCA was a way to shrink a dataset so it required less processing, back when computers were small and slow.

If it matters how long it takes to train your model, you’ll probably need access to a GPU. If you only have to train it a few times in order to pass your assignment, then you can afford to let your CPU run training overnight.

So a Neural Network, no downsampling and no PCA. Some sort of preprocessing or just the raw image?

Is MLP good?

Tom was suggesting Convolutional NNs, which are very frequently used in image problems. But I am not familiar with scikit learn, so I just googled “does scikit learn support convnets” and it looks like the answer is no. So try a multilayer feed forward net (which I assume is what they mean by MLP) and maybe play with the hyperparameters (how many layers and how many neurons per layer). Also note that this is not a classifier, but a regression, so you’ll have to choose either no activation or ReLU at the output layer and a distance based cost function (e.g. MSE). But I hope I’m telling you what you already know on that front. :nerd_face:
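
A minimal sketch of that suggestion in scikit-learn, assuming `MLPRegressor` is allowed for the assignment. `MLPRegressor` already uses an identity (no activation) output and a squared-error loss, so only the hidden layers and regularization need tuning. The synthetic data, the distances in metres, the input scaling step and the small parameter grid are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): 200 flattened 10x10 "images" and
# distances in metres that depend weakly on a few pixels.
rng = np.random.default_rng(0)
X = rng.random((200, 100))
y = 1.5 + 0.4 * X[:, :5].mean(axis=1) + rng.normal(0.0, 0.02, size=200)

# MLPs tend to train better on standardized inputs, so scale first.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("mlp", MLPRegressor(max_iter=500, random_state=0)),
])

# "How many layers and how many neurons per layer" becomes a grid over
# hidden_layer_sizes; alpha is the L2 regularization strength.
param_grid = {
    "mlp__hidden_layer_sizes": [(32,), (64, 32)],
    "mlp__alpha": [1e-4, 1e-2],
}

search = GridSearchCV(
    pipe, param_grid, cv=3, scoring="neg_mean_absolute_error"
)
search.fit(X, y)

print("best params:", search.best_params_)
print("cross-validated MAE (m):", -search.best_score_)
```

On real 300x300 images the grid would likely need larger layers and more iterations, and the search gets slow on a CPU, so it may help to start with a coarse grid.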