Re: Coordinate system used for the color and the depth streams on SR300

KK7 · ‎08-24-2017

I'm using an SR300 and trying to correctly understand the output from `rs_extrinsics`.

When I get the transformation between two streams such as the color and the depth by using `get_extrinsics_to` or `get_device_extrinsics`, which coordinate system is used for these streams?

Is it the image coordinate system like this (https://software.intel.com/sites/landingpage/realsense/camera-sdk/v1.1/documentation/html/manuals_clip0138_zoom69.png), where Y is down,

or this default coordinate system (https://software.intel.com/sites/landingpage/realsense/camera-sdk/v1.1/documentation/html/cameracoordinatesystemuserfacing_zoom63.png), where Y is up?

The transformation I got from the color to the depth is as follows, so I guess it uses the default coordinate system, but I want to confirm that my understanding is correct.

rotation
[[ 0.99999791  0.00196758 -0.0005486 ]
 [-0.001966    0.99999398  0.00285708]
 [ 0.00055421 -0.00285599  0.99999577]]

translation
[[-0.02549983]
 [-0.00113199]
 [-0.00392998]]
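A quick sanity check on these numbers, as a minimal sketch in plain Python: it tests whether the printed rotation is orthonormal (a pure rotation, with no scaling or mirroring) and reports the rotation angle and the length of the translation. It assumes the matrix is laid out exactly as printed above; the helper names are illustrative only and are not part of librealsense.

```python
import math

# Values copied from the output above (layout assumed to be as printed).
R = [
    [0.99999791, 0.00196758, -0.0005486],
    [-0.001966, 0.99999398, 0.00285708],
    [0.00055421, -0.00285599, 0.99999577],
]
t = [-0.02549983, -0.00113199, -0.00392998]


def max_orthonormality_error(rot):
    """Largest absolute entry of (rot * rot^T - I)."""
    worst = 0.0
    for i in range(3):
        for j in range(3):
            dot = sum(rot[i][k] * rot[j][k] for k in range(3))
            target = 1.0 if i == j else 0.0
            worst = max(worst, abs(dot - target))
    return worst


def rotation_angle_deg(rot):
    """Rotation angle recovered from the trace of a rotation matrix."""
    trace = rot[0][0] + rot[1][1] + rot[2][2]
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))


baseline_m = math.sqrt(sum(c * c for c in t))
print("orthonormality error:", max_orthonormality_error(R))
print("rotation angle (deg):", rotation_angle_deg(R))
print("translation length (m):", baseline_m)
```

With these values the rotation is well under one degree away from identity and the translation is about 26 mm long, so the sign of the x component is what decides the interpretation.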

idata · ‎08-24-2017

Hello kazoo_kmt,

Thanks for reaching out!

Please let us check this, we'll get back to you as soon as we can.

Pedro M.

idata · ‎08-28-2017

The coordinate systems used by librealsense (which is where the call to rs_extrinsics comes from) are described here: https://github.com/IntelRealSense/librealsense/blob/master/doc/projection.md#extrinsic-camera-parameters

The following is an extract from that document:

"...Pixel Coordinates

Each stream of images provided by librealsense is associated with a separate 2D coordinate space, specified in pixels, with the coordinate [0,0] referring to the center of the top left pixel in the image, and [w-1,h-1] referring to the center of the bottom right pixel in an image containing exactly w columns and h rows. That is, from the perspective of the camera, the x-axis points to the right and the y-axis points down. Coordinates within this space are referred to as "pixel coordinates", and are used to index into images to find the content of particular pixels.

Point Coordinates

Each stream of images provided by librealsense is also associated with a separate 3D coordinate space, specified in meters, with the coordinate [0,0,0] referring to the center of the physical imager. Within this space, the positive x-axis points to the right, the positive y-axis points down, and the positive z-axis points forward. Coordinates within this space are referred to as "points", and are used to describe locations within 3D space that might be visible within a particular image.

Extrinsic Camera Parameters

The 3D coordinate systems of each stream may in general be distinct. For instance, it is common for depth to be generated from one or more infrared imagers, while the color stream is provided by a separate color imager. The relationship between the separate 3D coordinate systems of separate streams is described by their extrinsic parameters, contained in the rs_extrinsics struct. The basic set of assumptions is described below:

1. Imagers may be in separate locations, but are rigidly mounted on the same physical device

• The translation field contains the 3D translation between the imager's physical positions, specified in meters

2. Imagers may be oriented differently, but are rigidly mounted on the same physical device

• The rotation field contains a 3x3 orthonormal rotation matrix between the imager's physical orientations

3. All 3D coordinate systems are specified in meters

• There is no need for any sort of scaling in the transformation between two coordinate systems

4. All coordinate systems are right handed and have an orthogonal basis

• There is no need for any sort of mirroring/skewing in the transformation between two coordinate systems..."
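The assumptions quoted above amount to a rigid transform: a point in one stream's 3D space is rotated and then translated into the other stream's space. The following is a minimal sketch in plain Python, modeled on the `rs_transform_point_to_point` helper in librealsense's `rsutil.h`. The flat column-major layout of `rotation` is how the `rs_extrinsics` struct is understood to be stored here, so treat it as an assumption and check it against your SDK version. `transform_point` and the sample values are illustrative, not librealsense API.

```python
def transform_point(rotation, translation, from_point):
    """Return rotation * from_point + translation.

    rotation is a flat list of 9 floats, assumed column-major
    (element [row][col] is rotation[col * 3 + row]).
    """
    return [
        rotation[0 + r] * from_point[0]
        + rotation[3 + r] * from_point[1]
        + rotation[6 + r] * from_point[2]
        + translation[r]
        for r in range(3)
    ]


# Hypothetical extrinsics: no rotation, 25 mm offset along -x.
identity = [1.0, 0.0, 0.0,
            0.0, 1.0, 0.0,
            0.0, 0.0, 1.0]
offset = [-0.025, 0.0, 0.0]

# A point 1 m in front of the source imager.
print(transform_point(identity, offset, [0.0, 0.0, 1.0]))
# The origin of the source frame lands exactly at `translation`.
print(transform_point(identity, offset, [0.0, 0.0, 0.0]))
```

Under this form, the source frame's origin [0,0,0] maps to `translation`, so the translation can be read as the position of the source imager expressed in the target stream's coordinates.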

Hopefully this document will help clear up your doubts.

Pedro M.

KK7 · ‎08-28-2017

Hi Pedro, thank you for replying. Unfortunately, the answer is still unclear to me. Based on the link you shared, it says

The translation field contains the 3D translation between the imager's physical positions, specified in meters

However, I don't know how you defined `imager's physical positions`. It doesn't say whether this refers to `Pixel coordinates` or `Point coordinates`.

Based on what I got from the output of `rs_extrinsics`, the coordinate system should have the positive x-axis pointing to the left, the positive y-axis pointing up, and the positive z-axis pointing forward, but please confirm or correct this.

idata · ‎08-29-2017

According to https://github.com/IntelRealSense/librealsense/blob/master/doc/projection.md#extrinsic-camera-parameters, this is how pixel coordinates and point coordinates work:

Pixel coordinates:

[0, 0] => Top left pixel of the image.

[w-1, h-1] => Bottom right pixel of the image (in an image containing exactly w columns and h rows).

Point coordinates:

[0, 0, 0] => Center of the physical imager.

The positive x-axis points to the right, the positive y-axis points down, and the positive z-axis points forward.

Let me know if that helps, I'll be more than glad to answer any question you might have.

Pedro M.

KK7 · ‎08-29-2017

I was hoping you could answer my question directly. My point is that the document says `imager's physical positions`, but it's unclear which coordinate system that is based on.

Also, if we assume it's based on the point coordinates, the output from `rs_extrinsics` doesn't make sense, because the Depth/IR sensor is located about 25 mm to the right of the Color sensor, yet the output shows "-0.02549983", which means 25 mm to the left. That's why I think the coordinate system used for rs_extrinsics is not `Point coordinates` (and of course not `Pixel coordinates`).

idata · ‎08-30-2017

Thanks a lot for sharing this information with us. Please let us analyze it to see if we can determine what might be happening. If we are able to find anything useful, we'll make sure to share it with you in this thread.

Pedro M.

idata · ‎09-01-2017

Hi kazoo_kmt,

We have an update for your case.

The value is -0.025 because the reference point is the depth sensor (also known as the imager), which is to the right of the RGB sensor. The RGB sensor is to the left of the depth sensor, that is, in the negative x direction. The coordinate system is the point coordinate system, specified in meters.
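One way to check this reading, as a sketch under the assumption that the extrinsics map a point as `p_depth = R * p_color + t` (the form used by `rs_transform_point_to_point`): the color imager's origin then maps to `t` in depth coordinates, and the depth imager's origin expressed in color coordinates is `-R^T * t`. The code below uses the values posted earlier in this thread. `invert_extrinsics` is a hypothetical helper, and the matrix is taken as printed; for a rotation this close to identity, a row-major or column-major reading does not change the sign of the result.

```python
# Color-to-depth extrinsics as posted earlier in this thread.
R = [
    [0.99999791, 0.00196758, -0.0005486],
    [-0.001966, 0.99999398, 0.00285708],
    [0.00055421, -0.00285599, 0.99999577],
]
t = [-0.02549983, -0.00113199, -0.00392998]


def invert_extrinsics(rot, trans):
    """Invert p_to = rot * p_from + trans for an orthonormal rot.

    Returns (rot^T, -rot^T * trans).
    """
    rot_t = [[rot[c][r] for c in range(3)] for r in range(3)]
    inv_trans = [
        -sum(rot_t[r][k] * trans[k] for k in range(3)) for r in range(3)
    ]
    return rot_t, inv_trans


_, depth_origin_in_color = invert_extrinsics(R, t)

print("color origin in depth frame:", t)
print("depth origin in color frame:", depth_origin_in_color)
```

Under that assumption the two descriptions agree: the color imager sits at negative x in the depth frame, and the depth imager sits at positive x (about 25 mm to the right) in the color frame, with x pointing right in both.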

I hope this helps.

Pedro M.

KK7 · ‎09-01-2017

You could say that, but the output is from color to depth, not from depth to color. So I believe the reference point in this case should be the color sensor.