The PXCProjection::ProjectDepthToCamera function

linhuan_h_ · 08-19-2016

Hi, all.

I want to know the actual formula behind the function ProjectDepthToCamera(): how does it map a point (u, v) in the image plane to a 3D point (x, y, z) in camera coordinates?

Can anyone help me?

Thanks a lot

samontab · 08-19-2016

Well, since the RealSense SDK is closed source, it's hard to know exactly what approach is used, but I'll try to shed some light on the subject for you.

In the ideal case, based on a perfect pinhole camera, you are back-projecting a point in the image plane, which only has two coordinates, into a 3D location. This of course means that you have an infinite number of solutions: all the 3D points along the line through the optical centre of the camera and the location of the pixel in the image plane. If you want to know the location of the object in 3D, you need more information.

This extra information can be encoded in many ways and represents depth, which is the missing piece that you need. You already know the direction in which the object lies, and with depth you know where on that line the object is in 3D.

This is the ideal case, where you have closed-form solutions for the location of the object. Have a look at the pinhole camera model for the equations.
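As a rough sketch of that ideal pinhole case, the back-projection could look like the Python below. The intrinsics fx, fy (focal lengths in pixels) and cx, cy (principal point) and both function names are placeholders for illustration, not RealSense SDK identifiers, and depth is assumed to be measured along the optical axis (z). This is not necessarily what ProjectDepthToCamera() does internally.

```python
def deproject_pixel(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) with a known depth into camera coordinates.

    Assumes an ideal pinhole camera (no lens distortion) and that
    depth is the z coordinate, in the same unit as the returned point.
    """
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    z = depth
    return (x, y, z)


def project_point(x, y, z, fx, fy, cx, cy):
    """Forward pinhole projection: camera-space point to pixel (u, v)."""
    u = fx * x / z + cx
    v = fy * y / z + cy
    return (u, v)
```

Without the depth value, deproject_pixel could only return a direction, which is the ambiguity described above.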

Now, in the real world, cameras have imperfect lenses, which create distorted images. There are distortion models that you can apply to correct barrel distortion and other lens effects. You need to apply these corrections first to create an image that is closer to the ideal case.
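To make this concrete, here is a minimal sketch of one common choice, a purely radial polynomial model with two coefficients. The coefficients k1 and k2 and the function names are assumptions for illustration; the model the RealSense SDK actually uses is not known from this thread. Coordinates are normalized, i.e. (u - cx) / fx and (v - cy) / fy.

```python
def distort(xn, yn, k1, k2):
    """Apply radial distortion to ideal normalized coordinates."""
    r2 = xn * xn + yn * yn
    scale = 1.0 + k1 * r2 + k2 * r2 * r2
    return (xn * scale, yn * scale)


def undistort(xd, yd, k1, k2, iterations=20):
    """Invert the radial model by fixed-point iteration.

    There is no closed-form inverse in general, so start from the
    distorted point and repeatedly divide by the estimated scale.
    This converges for mild distortion; strong distortion may need
    a more robust solver.
    """
    xn, yn = xd, yd
    for _ in range(iterations):
        r2 = xn * xn + yn * yn
        scale = 1.0 + k1 * r2 + k2 * r2 * r2
        xn, yn = xd / scale, yd / scale
    return (xn, yn)
```

The undistorted normalized coordinates can then be multiplied by the depth to get x and y, as in the ideal pinhole case.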

Then, you also have errors introduced by the depth estimation itself, which will depend on the actual method that you're using. You will need to calibrate the camera, and correct the images accordingly, to generate data closer to the ideal case.

There are different methods to model the distortions and calibrate your cameras. Some of them are not closed-form formulas; instead, they minimise a certain error metric to get a good enough solution.
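As a toy illustration of the error-minimising kind of calibration, the sketch below fits a focal length f and a single radial coefficient k with a few Gauss-Newton steps on the squared reprojection error. The 1-D model, the parameter names and the starting guess are all hypothetical; real calibration tools fit many more parameters from images of a known target.

```python
def model(x, f, k):
    """Hypothetical 1-D camera model: pixel offset for normalized coordinate x."""
    return f * x * (1.0 + k * x * x)


def fit_f_and_k(xs, us, f=500.0, k=0.0, iterations=10):
    """Estimate f and k by Gauss-Newton on the sum of squared residuals."""
    for _ in range(iterations):
        # Accumulate the 2x2 normal equations (J^T J) d = J^T r.
        a11 = a12 = a22 = b1 = b2 = 0.0
        for x, u in zip(xs, us):
            r = u - model(x, f, k)      # residual for this observation
            jf = x * (1.0 + k * x * x)  # d(model)/df
            jk = f * x ** 3             # d(model)/dk
            a11 += jf * jf
            a12 += jf * jk
            a22 += jk * jk
            b1 += jf * r
            b2 += jk * r
        det = a11 * a22 - a12 * a12
        if abs(det) < 1e-12:
            break  # degenerate data, stop updating
        # Solve the 2x2 system with Cramer's rule and update the estimate.
        f += (a22 * b1 - a12 * b2) / det
        k += (a11 * b2 - a12 * b1) / det
    return (f, k)
```

Here xs would be known normalized coordinates of calibration points and us their observed pixel offsets from the principal point.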

linhuan_h_ · 08-31-2016

Thanks, samontab. Your answer is really good. I thought the RealSense 3D camera had already been calibrated. Can I use the stream calibration results to deduce the formula that transforms a pixel into a 3D point?