Mapping units of depth image and meters

Is it possible to get the scale for the depth value (pixel value) in order to convert it to meters?
For example: pixel with value of 62 * depth_scale => z value in real-world coordinates.

I am using the C# and Java wrappers.

Thank you

The 16-bit depth values are expressed in mm. So a value of 1600 means the pixel is 1.6m away from the sensor (or 1600mm).

Thanks for your reply,
but what if I only have 8-bit depth values?

The depth data from the camera is delivered as 16-bit values. If you only have 8-bit values then you’ve converted the data yourself (or used a library that converted it automatically). Without knowing how the data was converted (e.g. taking just the high or low 8 bits, linear re-scaling from the range [0,65535] to [0,255]) it’s impossible to suggest anything concrete.
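
If it helps, here's a rough C# sketch of what those two conversions would imply, assuming the original 16-bit values were in mm; which one (if either) applies depends entirely on how your 8-bit image was produced:

// Rough guesses only, assuming the original 16-bit depth values were in mm.
public static class EightBitDepthGuess {
    // If only the high byte was kept (value >> 8), each 8-bit step is 256mm.
    public static int MmFromHighByte(byte v) {
        return v << 8;      // e.g. 6 -> 1536mm
    }

    // If the data was linearly rescaled from [0, 65535] to [0, 255],
    // each 8-bit step is 65535 / 255 = 257mm.
    public static int MmFromLinearRescale(byte v) {
        return v * 257;
    }
}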

Hey, all I have to do is multiply the depth value by 8 and I get the z point in cm.
How can I calculate the x and y points in the real world?

Once you have the distance to the object, figuring out the XYZ coordinates is just a bit of fairly simple trigonometry. There may be a coordinate mapper class in the new SDK; I haven’t played around with it much yet (and Orbbec is kind of light on the API documentation). But here’s how to do it manually:


First off, we need two constants: the vertical and horizontal fields of view of the sensor. Conveniently these are given on the product pages:

  • 60° horizontal
  • 49.5° vertical

From that and the image size we can get the angular size of each pixel:

  • 60° / 640 = 0.09375 degrees per pixel horizontally = phiX
  • 49.5° / 480 = 0.103125 degrees per pixel vertically = phiY

(If you don’t need super-fine accuracy, you could probably round both of those to 0.1 degree per pixel, and it wouldn’t throw things off too badly.)

Now is where the math comes in. I’m going to assume that (0,0,0) is the camera’s position itself, and that the pixel we’re measuring to is at position (i,j) in the image with a value of d (in known units, whether that’s mm, cm, m, inches, furlongs, or whatever). I’m also going to assume that the image is 640x480 pixels. Finally, we define X as the horizontal distance to the right of the camera, Y as the vertical distance above the camera, and Z as the distance measured straight out from the front of the camera. As Jackson points out below in this thread, the 16-bit value of the pixel at (i,j) is the distance Z in mm.

Calculating X:
First, we need to get the angle to the pixel. That’s simply theta = phiX * (i-320); the degrees per pixel multiplied by the number of pixels from centre. This forms a right-angle triangle (drawn in crude ASCII art as a top-down view):

    X
.---------. <-- the object we're measuring to in the real world
|        /
|Z     /
|    / d
|**/ <-- theta is measured here
|/
. <-- the camera position

We know that tan(theta) = X/Z. Therefore we can calculate X by simply re-arranging the equation:
X = tan(theta) * Z

Calculating Y:
Exactly the same as above, only using phiY, and (j-240) instead:
theta = phiY * (j-240)
Y = tan(theta) * Z
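
Putting that together, here's a rough C# sketch of the whole conversion (the class and method names are my own, not from the SDK). One gotcha: Math.Tan works in radians, so the angles in degrees have to be converted first:

using System;

// Sketch only: converts pixel (i, j) of a 640x480 depth frame, with
// perpendicular distance z (in whatever units you want X and Y back in),
// into real-world X and Y, using the 60° x 49.5° field of view.
public static class DepthToWorld {
    const double PhiX = 60.0 / 640.0;   // degrees per pixel, horizontal
    const double PhiY = 49.5 / 480.0;   // degrees per pixel, vertical

    public static void Convert(int i, int j, double z,
                               out double x, out double y) {
        double thetaX = PhiX * (i - 320);   // degrees from the centre column
        double thetaY = PhiY * (j - 240);   // degrees from the centre row

        // Math.Tan expects radians, so convert the angles first.
        x = z * Math.Tan(thetaX * Math.PI / 180.0);
        y = z * Math.Tan(thetaY * Math.PI / 180.0);
    }
}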

EDIT, May 4: Updated the math to use Z in the calculations instead of d; I had previously assumed the sensor reported the straight-line distance to the object d, not the perpendicular distance from the camera plane Z. This has been corrected.

Sorry for the late response. Thank you very much for your effort, I will try it and let you know.

Hey, sorry again for the late response.
I am a bit confused: does the pixel depth value represent the z value of the pixel or the distance d from the sensor? And how did you calculate 60° / 640 = 0.09375?

The Astra has a horizontal field of view of 60°. You can calculate the angular size of each pixel by dividing the field of view by the number of pixels in the image. So, for example, if the image is 640 pixels wide, then each pixel represents 60°/640 = 0.09375°/pixel.

EDIT: removed incorrect information that was corrected by Jackson, below.

Hi av2016, chrisib,

The pixel value represents the Z value, which is the distance from the plane of the sensor, as if the sensor were an infinite wall. If there is a need for the straight-line distance to the object, a trigonometry calculation is required.
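
As a rough sketch of that last point, assuming you've already recovered X and Y with the trig described in this thread, the straight-line distance is just 3D Pythagoras:

// Sketch: straight-line (Euclidean) distance from the camera to the point,
// given real-world X, Y, Z in consistent units (uses System.Math).
static double StraightLineDistance(double x, double y, double z) {
    return Math.Sqrt(x * x + y * y + z * z);
}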

Ah! Good to know. I’ll revise my math from above to correct for my incorrect assumption.

Previous posts edited to remove the incorrect information. The math to calculate X and Y has been updated to use tan(theta) instead of sin(theta), as we’re working with Z as a known value instead of d.

Thanks for the correction, Jackson!

Bro, this is not working. This method is not up to the mark. I tried to implement it but am getting some random values.

There must be a way to use the CoordinateMapper, but I can find no documentation for it in Unity.

Astra.CoordinateMapper mapper = depthStream.CoordinateMapper();

Vector3D worldPoint = mapper.MapDepthPointToWorldSpace(new Vector3D(640 / 2.0f, 480 / 2.0f, 1500));

which gives an error like:

‘DepthStream’ does not contain a definition for ‘CoordinateMapper’ and no accessible extension method ‘CoordinateMapper’ accepting a first argument of type ‘DepthStream’ could be found (are you missing a using directive or an assembly reference?)

How do I solve this issue?

float phiX = (float)0.1875;
float phiY = (float)0.20625;
float thetaX = phiX * (p - 160);
float thetaY = phiY * (q - 120);

float x3D = (float)((depthVal) * (Math.Tan(thetaX)));
float y3D = (float)((depthVal) * (Math.Tan(thetaY)));

but this is not working.

Here’s a nicer example, using an actual screenshot. Let’s calculate the (X, Y, Z) coordinates of my fingertip, from this screenshot:

Ignore the white text at the bottom; it’s not used in this example

First, let’s go with what we currently know:

Z: the distance from the camera plane to my finger. This is the pixel value in the depth frame, and for this example let’s say it’s 1400. This means the Z coordinate is 1400mm.

x: the horizontal distance from the center of the frame to my finger in the depth frame, measured in pixels. In this example it’s 206 pixels.

y: the vertical distance from the center of the frame to my finger in the depth frame, measured in pixels. In this example it’s 198 pixels.

FoV_w: the horizontal field of view of the camera. This is provided by Orbbec, and is 60 degrees.

FoV_v: the vertical field of view of the camera. This is provided by Orbbec, and is 49.5 degrees.

W: the width of the frame, 1280

H: the height of the frame, 960

O: the origin for our calculations. It’s the center of the frame, and in this example is (640, 480), because the frame is 1280x960.

We have all the information we need, so let’s first calculate X, the real-world horizontal distance from the center of the camera’s field of view to my finger, in mm.

The horizontal angular distance from my finger to the center of the camera’s field of view, theta_w, is calculated by:
theta_w = FoV_w / W * x = 9.656°

Using some simple trigonometry we know that tan(theta_w) = X/Z. Therefore:
X = Z * tan(theta_w) = 238.206mm

Using the exact same process, but substituting for the vertical measurements we get:
theta_v = FoV_v / H * y = 10.209°
Y = Z * tan(theta_v) = 252.136mm

Rounding to the nearest mm, this gives us
(X, Y, Z) = (238, 252, 1400)

Converting to meters this means my finger is:

  • 1.4m away from the camera
  • 0.25m above the camera
  • 0.24m to the right of the camera (left/right may vary depending on whether you have mirroring on or not)
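
As a quick sanity check, the same numbers in C# (remembering that Math.Tan takes radians, not degrees) reproduce the values above:

double z = 1400;                            // mm, depth pixel value at the fingertip
double thetaW = 60.0 / 1280 * 206;          // 9.656 degrees
double thetaV = 49.5 / 960 * 198;           // 10.209 degrees

double x = z * Math.Tan(thetaW * Math.PI / 180);    // ~238.2mm
double y = z * Math.Tan(thetaV * Math.PI / 180);    // ~252.1mm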

@pkr97, in your case, where did you get phiX, phiY, thetaX, and thetaY from? When you say you get “random values”, how random are they?

Remember that the Astra’s depth stream may have holes, so you should ignore pixels where the Z value is 0. Similarly, you should expect some shimmer, as the depth values change from frame-to-frame within a few mm.

As for the CoordinateMapper class, where did you read that this class should exist? Can you post a link to the relevant documentation?

Thanks for the calculation and other stuff. I’m glad to have received a quick response.

I saw the CoordinateMapper class in the C++ as well as the Unity examples. There is an implementation of the coordinate mapper class in C++ but not in Unity.

I’m working in Unity.

The job is to get the depth as well as the 3D coordinates of an object in the scene that I’m able to detect.

Can you please tell me how to get the depth of a particular pixel position in Unity?

I used an indexing method but that’s also not working.

// just to clear the order of operation ambiguity.

Z = 1400mm

theta_w = (FoV_w / W) * x = (60° / 1280) * 206 = 9.656°
X = Z * tan(theta_w) = Z * tan(9.656°) = 238.206mm

theta_v = (FoV_v / H) * y = (49.5° / 960) * 198 = 10.209°
Y = Z * tan(theta_v) = Z * tan(10.209°) = 252.136mm

(X, Y, Z) = (238, 252, 1400) // Rounded values

I calculated phiX, phiY using

The last thing that I need is how to get the depth value from _depthFrameData.

Here the depth values are stored in _depthFrameData,

where _depthFrameData = new short[640 * 480];

frame.CopyData(ref _depthFrameData);

Of course I need to use indexing, but which way of indexing is correct?

I use index = (x * frame.Width) + y

Thanks

To get the depth value from a given pixel, you just need to read the value of that pixel in the DepthFrame. The easiest way to do that is probably to copy the raw depth data into a short[]. This simple class does that for you, and has a GetRangeAt(row,col) method which will return the Z value of the given pixel in mm.

public class DepthFrameWrapper {
    public DepthFrameWrapper(DepthFrame frame) {
        DepthData = new short[frame.ByteLength/2];
        frame.CopyData(ref DepthData);
        Height = frame.Height;
        Width = frame.Width;
    }

    private short[] DepthData {get; set;}
    public int Height { get; private set;}
    public int Width { get; private set;}

    // if you use x/y instead of row/column coordinates,
    // then column = x, row = y
    public short GetRangeAt(int row, int column) {
        return DepthData[row * Width + column];
    }
}
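
For example, usage might look something like this (a sketch only, assuming you already have a DepthFrame called frame in scope), together with the hole check mentioned earlier:

// Sketch: wrap a frame, read the range at the centre pixel, and skip
// holes where the reported Z value is 0.
var depth = new DepthFrameWrapper(frame);

int row = depth.Height / 2;
int col = depth.Width / 2;

short zMm = depth.GetRangeAt(row, col);
if (zMm > 0) {
    // zMm is the perpendicular distance in mm; feed it into the X/Y trig
    // from earlier in the thread if you need full world coordinates.
    Console.WriteLine("Depth at centre pixel: " + zMm + "mm");
}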

Hey @chrisib
I tried to implement it according to your advice and the following is the code.

/* ####################### */

                        _depthFrameData = new short[frame.ByteLength/2]; // 16bit image
                        frame.CopyData(ref _depthFrameData);

                        cx, cy = object position in RGB image

                        Width = frame.Width;
                        Height = frame.Height;
                        double FoV_W = 60.0;
                        double FoV_H = 49.5;

                        short depthVal = _depthFrameData [ cy *  Width + cx ];

                        double theta_W = (FoV_W / Width) * (cx - Width / 2);
                        double theta_H = (FoV_H / Height) * (cy - Height / 2);

                        //print(theta_W);
                        //print(theta_H);

                        double z3D = (double)depthVal;
                        double x3D = (double)(z3D * Math.Tan(theta_W));
                        double y3D = (double)(z3D * Math.Tan(theta_H));

                        print("x: "+x3D +", y: "+ y3D+", z: "+ z3D);

/* ####################### */

With the above code I think I’m able to get the depth value (depthVal, i.e. z3D).
After applying the formula for x3D & y3D I’m unable to get the exact position. The coordinates are neither stable nor acceptable.

The camera is placed 1.5m from the wall and the approximate height of the camera is 40cm.

With this setup I’m getting values like
x: -897.447922630559, y: -468.100220860615, z: 595

whereas the approximate values are +/-150mm, +/-150mm, 600mm.

What could be wrong? I believe the calculation formula is OK!

Thanks



Hi Chrisib,
I would like to know whether the joint.worldposition we get from Orbbec is completely in mm, or whether x and y are in pixels?