LiDAR and camera fusion for object distance estimation in self-driving vehicles

Hey guys, I’m working on the perception stack of a self-driving car and I would like to estimate the distance between my car and the objects around it. I don’t know how to fuse the camera data and the lidar data so that I get the object detection and classification from the camera and the distance estimation from the lidar. I’m thinking of projecting the lidar 3D point cloud onto the camera image, but once I have that image, how can I calculate the distance using the image’s pixels?

I read somewhere that if I know an object’s real-world dimensions, like its width, I can calculate a scale factor and then the distance, but in my case I don’t know the objects’ dimensions.
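
Just to make concrete what I mean by the scale-factor idea (assuming a pinhole camera with the focal length expressed in pixels, which is how I understood it), it would be something like the sketch below, and it’s exactly the real-world width that I’m missing:

```python
def distance_from_known_width(focal_length_px, real_width_m, pixel_width):
    """Pinhole approximation: distance = f * W_real / w_pixels.

    Only usable if the object's real-world width is known,
    which is exactly what I don't have.
    """
    return focal_length_px * real_width_m / pixel_width

# e.g. a 1.8 m wide car that appears 120 px wide, with f = 1000 px,
# would be roughly 1000 * 1.8 / 120 = 15 m away
print(distance_from_known_width(1000.0, 1.8, 120.0))  # -> 15.0
```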

It’s an interesting question, of course. At a first-principles level, this is just 3D geometry. If the “PoV” of both sensors were the same point, it would be straightforward, wouldn’t it? In that case you could literally map the lidar points onto the image, use the object detection, and you’re basically done. Well, maybe there’s a bit of distortion created by the camera lens that you need to compensate for, and you’d probably need some sanity checks: e.g. lidar points with very different distance values mapping to the same object identified in the camera image. But things get a bit more complicated because the PoV of the two sensors is probably not the same and might even be quite different, say on the order of 1 meter or more.
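
To make the geometry concrete, here’s a rough sketch of that projection step in Python, assuming you already have the camera intrinsic matrix K, the lens distortion coefficients, and the rigid transform (R, t) from the lidar frame to the camera frame out of some calibration step. The function names and the (x_min, y_min, x_max, y_max) box format are just placeholders for whatever detector and tooling you actually use:

```python
import numpy as np
import cv2

def project_lidar_to_image(points_lidar, R, t, K, dist_coeffs):
    """Project Nx3 lidar points (lidar frame) into pixel coordinates.

    R, t        : rotation matrix / translation vector, lidar -> camera frame
    K           : 3x3 camera intrinsic matrix
    dist_coeffs : camera distortion coefficients (OpenCV convention)
    """
    # Transform the points into the camera frame
    points_cam = points_lidar @ R.T + t.reshape(1, 3)

    # Keep only points in front of the camera
    points_cam = points_cam[points_cam[:, 2] > 0.0]

    # Project with the lens distortion model; points are already in the
    # camera frame, so rvec and tvec are zero here
    rvec = np.zeros(3)
    tvec = np.zeros(3)
    pixels, _ = cv2.projectPoints(points_cam, rvec, tvec, K, dist_coeffs)
    pixels = pixels.reshape(-1, 2)

    # Euclidean distance from the camera for each surviving point
    ranges = np.linalg.norm(points_cam, axis=1)
    return pixels, ranges

def distance_for_box(pixels, ranges, box):
    """Median lidar range of the points that fall inside a detection box.

    box = (x_min, y_min, x_max, y_max) from your object detector.
    """
    x_min, y_min, x_max, y_max = box
    inside = ((pixels[:, 0] >= x_min) & (pixels[:, 0] <= x_max) &
              (pixels[:, 1] >= y_min) & (pixels[:, 1] <= y_max))
    if not np.any(inside):
        return None  # no lidar return landed on this object
    return float(np.median(ranges[inside]))
```

Taking the median range inside the box is a crude version of the sanity check I mentioned: it tolerates a few background points leaking into the box, although clustering the points first would be more robust.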

But it must be the case that you’re not the first person to be faced with this problem. Have you tried doing a search for papers on this? Maybe the key info is proprietary, but you’ve got to believe that Waymo, Aurora, Zoox and others have solved this problem. Surely they’ve written a paper or made some conference presentations to brag about their technology that would at least point you in a direction, even if they don’t release the design or the code.

@Ndulkalion I agree with what Paul said, as PoV is going to be one of your major points of concern.

My thought is: why not just calibrate it?

I.e. try to position the camera/lidar as close together as possible along your Z axis (or whatever axis you plan to use, but keeping them on the same axis to start with makes the relationship a lot simpler to find), fix them in place, and put together a target (something with reg marks or similar). Then capture an image from the camera and a scan from the lidar at the same time. Repeat this a number of times at known distances from the sensor(s).

Doing this should give you a pretty good dataset to work with for understanding the relationship between the two sensors.
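
One way to turn that dataset into something usable, assuming you can pick out each reg mark’s 3D position in the lidar scan and its pixel position in the camera image, and that the camera intrinsics are already calibrated, is a standard PnP solve with OpenCV for the lidar-to-camera extrinsics. Just a sketch; extracting the correspondences is the part you’ll actually spend time on:

```python
import numpy as np
import cv2

def calibrate_lidar_to_camera(lidar_points, image_points, K, dist_coeffs):
    """Estimate the rigid transform (R, t) mapping lidar coordinates into the
    camera frame, from matched reg-mark observations.

    lidar_points : (N, 3) reg-mark positions in the lidar frame, collected
                   over all captures at the known distances
    image_points : (N, 2) pixel positions of the same reg marks in the images
    K            : 3x3 camera intrinsic matrix (from a prior camera calibration)
    dist_coeffs  : camera distortion coefficients
    """
    ok, rvec, tvec = cv2.solvePnP(
        np.asarray(lidar_points, dtype=np.float64),
        np.asarray(image_points, dtype=np.float64),
        K, dist_coeffs,
    )
    if not ok:
        raise RuntimeError("solvePnP failed; check your correspondences")
    R, _ = cv2.Rodrigues(rvec)   # 3x3 rotation matrix, lidar -> camera
    t = tvec.reshape(3)
    return R, t
```

With (R, t) in hand, any lidar point p maps into the camera frame as R @ p + t, which is exactly what the projection step above needs, and you can sanity-check the calibration by re-projecting the reg marks and looking at the pixel error.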

P.s. The reg marks are so that you actually get some useful (i.e. easily interpretable) info from your camera image, rather than just “this object is there”.

Yes, I did some research on that… There is an interesting paper on the fusion of these sensors with some good results, so I’m trying to implement what they did, even though they didn’t release the code. The paper is “LiDAR and Camera Fusion Approach for Object Distance Estimation in Self-Driving Vehicles” by G Ajay Kumar and Jin-Hee Lee. But up to now I haven’t succeeded.
