Also, BTW, I’ve got to say that I’m not sure I believe the training data is very good. In that second image of a hallway, I assume they mean the display case as the “nearest object”, and that looks to me like a lot more than 1.65m; I’d say more like 3m or 4m away. Or if they mean the sign that’s leaning on the wall on the right, that looks closer than 1m. But how can we infer the scale here? Hmmmmm.
Hey, thank you very much for your advice. I will run something overnight.
I agree that this is not the most suitable project for a beginner course, but I’m interested in it and want to get better.
Well, I let it run for around 24 hours with different parameters and it doesn’t get under 20cm. KNN is still better by 8cm.
Personally, I don’t see how KNN has a hope of giving a high-accuracy model for this dataset.
Hello @Anony,
We may also focus on data processing.
First, from your posts so far, I am supposing we are predicting the distance to the object at the center of the image, and that the same camera was used throughout. (I am imagining a device with a camera fixed to some invisible-light-based distance measurement unit, producing the image and the label at the same time.) This fixes which object is being predicted. This assumption may be wrong or not verifiable, and even though I base the following discussion on it, please also think about these ideas without my assumption, because they may apply in other cases too.
- Can we multiply our training data by rotating the images? The distance shouldn’t change with rotation, and your images look square, so it is easy to do all of the 90, 180, and 270 degree rotations (see the sketch after this list)!
- If the data have a wide range of brightness, then since brightness doesn’t affect distance either, we can further multiply the data by making copies at different brightnesses. Usually it is easier to dim bright images down, because brightening may saturate. Extreme dimming or brightening may not be good either.
- Is color really that important? Would grayscale be sufficient, getting rid of unnecessary noise?
The techniques above fall into the category of “data augmentation”, which is a keyword you can use for further searching.
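To make the list above concrete, here is a minimal NumPy sketch of the rotation and dimming ideas; the dimming factors (0.7, 0.85) and the `augment` helper name are illustrative assumptions of mine, not anything from the assignment:

```python
import numpy as np

def augment(image):
    """Return extra training copies of one (H, W, 3) image; the label is unchanged."""
    copies = []
    # Rotations: the distance to the center object shouldn't change.
    for k in (1, 2, 3):  # 90, 180, 270 degrees
        copies.append(np.rot90(image, k))
    # Dimming: easier than brightening, which can saturate pixel values.
    # The 0.7 and 0.85 factors are arbitrary illustrative choices.
    for factor in (0.7, 0.85):
        copies.append(np.clip(image * factor, 0, 255).astype(image.dtype))
    return copies
```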
If my assumption is correct, then even though sending the raw image to a model is not a bad idea, we can also try to send features instead. What kind of features?
- My initial consideration here is simple: we use the sizes of the surrounding objects to guess the distance, so any feature extraction technique that gets rid of the details but focuses on the objects may help. I suggest you search for what other tools are available to you, but if you can also use “scikit-image”, then Histogram of Oriented Gradients (HOG) may be one of your candidates, because if you look at the resulting image, it gets rid of many details (a sketch follows after this list).
- “Local binary pattern” (LBP), on the other hand, describes the distribution of “textures” in your images. It may be interesting, perhaps because closer objects tend to show clearer texture?
- You see, of the two techniques above, one produces a 2D map and one a 1D histogram, so when thinking about features, they don’t have to be of any particular dimension. A feature can even be 0D, meaning some scalar descriptor such as the overall brightness (the rationale being that if the lighting is rather constant, then some closer objects may appear darker), and in this case you may want to convert your image from RGB (if it is RGB) to something like HSV, because the “V” channel means brightness. Further, if my assumption is correct and the center point is the key, you may pass in the average brightness of that region. Or we can think differently: instead of just the brightness, we compute some “color moments” to see the spread of brightness, the skewness of brightness, and so on.
So we can have processed 2D images, 2D maps, 1D histograms, or scalar data. While we can blindly try any technique, we can also try to give each of them a rationale, like in my examples, to help us understand the outcome.
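Here is a rough sketch of what a combined feature extractor could look like, assuming scikit-image were available. Every parameter value here (HOG cell sizes, LBP’s P and R, the bin count, the center-region size) is an illustrative choice, not something tuned for this dataset:

```python
import numpy as np
from skimage.color import rgb2gray, rgb2hsv
from skimage.feature import hog, local_binary_pattern

def extract_features(image):
    """Turn one RGB image (H, W, 3) into a single 1D feature vector."""
    gray = rgb2gray(image)

    # 2D-map-like feature: HOG keeps local gradient structure
    # (object outlines) while discarding fine detail.
    hog_vec = hog(gray, orientations=8, pixels_per_cell=(16, 16),
                  cells_per_block=(1, 1))

    # 1D histogram feature: LBP summarizes the distribution of textures.
    # With method="uniform" and P=8, LBP codes range over 0..9, hence 10 bins.
    lbp = local_binary_pattern(gray, P=8, R=1, method="uniform")
    lbp_hist, _ = np.histogram(lbp, bins=10, density=True)

    # 0D scalar features: brightness statistics ("color moments") from the
    # V channel of HSV, overall and for the center third of the image.
    v = rgb2hsv(image)[..., 2]
    h, w = v.shape
    center = v[h // 3: 2 * h // 3, w // 3: 2 * w // 3]
    scalars = [v.mean(), v.std(), center.mean(), center.std()]

    return np.concatenate([hog_vec, lbp_hist, scalars])
```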
Don’t forget to explore other techniques too!
Cheers,
Raymond
PS: even though scikit-learn (extending my discussion to include scikit-image) doesn’t give us Convolutional Neural Networks, we can still benefit from the core idea of CNNs by using, for example, the two techniques I suggested above, HOG and LBP, because they both compute over local regions of the image.
I asked, and skimage is not allowed. He told us that “you dont need very advanced stuff”, whatever that means.
Here’s what I tried so far.
MLP, HistGradientBoosting, and KNN. I used these with a wide range of parameters: different downsampling factors (because 300 × 300 × 3 is a lot of features and it takes really long), different PCA settings, PowerTransformer, StandardScaler. I used HOG, I mirrored the images, and nothing works. Surprisingly, the best is KNN with a downsampling factor of 35, PowerTransformer, PCA(n_components=0.99), and no image preprocessing. To be completely honest, I don’t know why this is the best.
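For reference, the winning setup described above would look roughly like this as a scikit-learn pipeline. The `downsample` helper is a hypothetical reconstruction of the “downsampling factor” step; the exact choices are assumptions, not code from this thread:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PowerTransformer

def downsample(images, factor=35):
    # Hypothetical helper: keep every `factor`-th pixel of each
    # (300, 300, 3) image and flatten. Plain strided slicing, i.e. one
    # possible reading of "downsampling factor", not anti-aliased resizing.
    return images[:, ::factor, ::factor, :].reshape(len(images), -1)

# X_train: (n_samples, 300, 300, 3) image array; y_train: distances in meters.
model = make_pipeline(
    PowerTransformer(),        # make each feature's distribution more Gaussian
    PCA(n_components=0.99),    # keep components explaining 99% of the variance
    KNeighborsRegressor(),     # default k=5 neighbors
)
# model.fit(downsample(X_train), y_train)
# preds = model.predict(downsample(X_val))
```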
How many units and layers and what activations did you use in your multi-layer NN?
Hello, @Anony,
I understand your last response’s focus is on tuning the model parameters, but I can’t take that further without inspecting your data.
If I were you and were running out of modeling hyperparameters to try (though I probably wouldn’t really run out, because they are likely unbounded), I would look at data processing, for a few reasons:
- It’s part of the machine learning workflow
- Not all of it requires scikit-image or intensive work
- It’s my freedom to do whatever possible to learn. I don’t need permission for that.

Some think “I should let the model do all the processing for me”, but to me, data processing is a way to learn how I might model the data better in the future, because I can see how it changes the data and the model performance. Plus, traditional data processing approaches are mostly explainable and not black-box.
Just my opinion!
Cheers,
Raymond