One of the main design aims of Robotic Ai Vision (RAiV) is the estimation of depth maps. In this post, we are finally going to estimate the depth map of a scene by using the OpenCV Python library.

Introduction to Depth from Stereo Image

Estimating a depth map is a difficult process, and RAiV is ready for this task out of the box.

The stereo camera pair of RAiV is synchronized and calibrated. The calibration parameters, which are key to estimating the depth map, are accessible from the user's Python code. So, all you have to do is:

  1. Get the calibration parameters
  2. Acquire the stereo image pair
  3. Estimate the depth map
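
As a toy illustration of step 3 (not RAiV's actual estimator), the sketch below matches small blocks between the left and right images along each scanline to find the horizontal pixel shift (the disparity) of each point:

```python
import numpy as np

def block_match_disparity(left, right, max_disp=16, block=5):
    """Toy disparity estimation for a rectified grayscale stereo pair.

    For each block in the left image, search the same scanline of the
    right image (shifted 0..max_disp pixels) for the lowest sum of
    absolute differences (SAD), and record the winning shift.
    """
    h, w = left.shape
    half = block // 2
    disp = np.zeros((h, w), dtype=np.float32)
    left_f = left.astype(np.float32)
    right_f = right.astype(np.float32)
    for y in range(half, h - half):
        for x in range(half + max_disp, w - half):
            patch = left_f[y - half:y + half + 1, x - half:x + half + 1]
            costs = [np.abs(patch - right_f[y - half:y + half + 1,
                                            x - d - half:x - d + half + 1]).sum()
                     for d in range(max_disp + 1)]
            disp[y, x] = np.argmin(costs)
    return disp
```

Real stereo matchers (e.g. OpenCV's StereoBM/StereoSGBM) add cost aggregation, sub-pixel refinement and filtering, but the principle is the same.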

If you would like to jump straight to depth estimation with Python, skip ahead to the chapter "Depth Mapping Made Easy". In the next sections, we take a deeper look at RAiV's design features and the theory behind depth estimation.

Why 65mm Baseline?

In stereo vision, we have two cameras that are horizontally separated from each other. This horizontal separation distance is called the baseline. When we capture images from these cameras, we see the same scene from two different viewpoints: both images contain the same objects/features, but at horizontally shifted pixel coordinates. The pixel-coordinate distance between the same object/feature in the two images is called the disparity. If an object/feature is close to the cameras, this distance is larger; if it is far away, the distance is smaller. This depth-disparity relation is expressed by the formula below:

$$ Z = \frac{B \times f}{d} $$ $$ Z: Depth\;(mm),\;f: Focal\;Length\;(mm),\;B: Baseline\;(mm),\;d: Disparity\;(mm) $$
Depth from Disparity Formula.
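
Plugging numbers into this formula is straightforward. The helper below is illustrative (all quantities in millimeters; the 0.91 mm disparity is just an example value):

```python
def depth_from_disparity(baseline_mm, focal_mm, disparity_mm):
    """Z = B * f / d, with all quantities in millimeters."""
    if disparity_mm <= 0:
        raise ValueError("disparity must be positive")
    return baseline_mm * focal_mm / disparity_mm

# Example with RAiV's defaults (B = 65 mm, f = 2.8 mm):
print(depth_from_disparity(65, 2.8, 0.91))  # ~200 mm
```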

In RAiV, we decided to mimic human vision. To achieve this, we designed RAiV's stereo camera pair with a baseline of 65 mm, the average baseline of human eyes (a.k.a. the average interpupillary distance).

With respect to the formula above, the selection of this baseline has direct consequences. To make this effect clearer, however, we need another formula that converts the focal length to pixels:

$$ f_{pixels} = \frac{f_{mm}}{pixelsize} $$ $$ f_{pixels}: Focal\:Length\:(pixels),\;f_{mm}: Focal\:Length\:(mm),\;pixelsize: Physical\:Size\:of\:One\:Pixel\:(mm) $$
Focal Length Conversion Formula.

By combining the formulas above, we can write a formula that calculates the disparity (in pixels) from the depth (in mm):

$$ d_{pixels} = \frac{B \times f_{mm}}{Z \times pixelsize} $$
Disparity(in pixels) from Depth (in mm) Formula.
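
This combined formula translates directly into code. A small sketch, using the default RAiV parameters listed below as defaults:

```python
def disparity_px(depth_mm, baseline_mm=65.0, focal_mm=2.8, pixel_size_mm=0.003):
    """d_pixels = (B * f_mm) / (Z * pixel_size); defaults are RAiV's."""
    focal_px = focal_mm / pixel_size_mm  # focal length in pixels (~933.33)
    return baseline_mm * focal_px / depth_mm

print(round(disparity_px(500), 2))  # 121.33
```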

The default RAiV parameters are:

Baseline        65 mm
Focal Length    2.8 mm
Pixel Size      0.0030 mm (3.0 µm)

RAiV can have lenses with the following focal lengths: 2.8mm, 3.6mm, 4mm, 6mm, 8mm, 12mm, 16mm and 25mm. Please mention your lens choice during the order.

With all of the information above, we can create the following Depth (mm) versus Disparity (pixels) table:

Depth (mm)   Disparity (pixels)
200          303.33
300          202.22
400          151.67
500          121.33
750          80.89
1000         60.67
2000         30.33
3000         20.22
4000         15.17
5000         12.13
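
The table values follow directly from the formula; this snippet regenerates them (rounded to two decimals) from the default parameters:

```python
B, f_mm, pixel_size = 65.0, 2.8, 0.003
f_px = f_mm / pixel_size  # ~933.33 pixels

for depth in (200, 300, 400, 500, 750, 1000, 2000, 3000, 4000, 5000):
    disparity = B * f_px / depth
    print(f"{depth:5d} mm -> {disparity:6.2f} px")
```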

The cameras used in RAiV have a resolution of 1600x1300 pixels. In all of our depth estimation example codes, we scale the images down to 800x650 pixels for speed. With this scaled resolution, the default RAiV configuration is suitable for estimating depths between 300 mm and 3000 mm.
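
This working range can be sanity-checked with the formulas above: halving the resolution also halves the focal length in pixels, so disparities shrink by the same factor. A quick check, assuming the default parameters:

```python
scale = 0.5                     # 1600x1300 scaled down to 800x650
B = 65.0                        # baseline in mm
f_px = (2.8 / 0.003) * scale    # focal length in pixels at the scaled resolution

for depth_mm in (300, 3000):
    print(f"{depth_mm} mm -> {B * f_px / depth_mm:.1f} px disparity")
# 300 mm  -> ~101.1 px (within a typical disparity search range)
# 3000 mm -> ~10.1 px  (still several pixels, so depth remains resolvable)
```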

Depth Mapping Made Easy

Prepare & Upload the Code

RAiV is ready to estimate depth out of the box. The example code below:

  • Initializes the data pipeline interface
  • Initializes the stereo depth estimator
  • In a loop continuously:
    • Gets stereo image pair from data pipeline
    • Estimates the depth map
    • Sends the depth map to a PC

You can find this example, with all the necessary modules, in our GitHub repository. Please download the example code from the repository and upload it to RAiV via the web interface.

import qCU_Net

# For accessing data pipeline
from qCU_Data import qCUData

# For Depth Estimation
from StereoDepthEstimator import StereoDepthEstimator
import depthUtils

# For sending data over TCP
import base64

# For idle waiting when no data is available
import time


def main():
    # Create interface
    theQCUData = qCUData()

    # Initialize shared memory
    if not theQCUData.init():
        print("Failed to initialize shared memory")
        return

    # Initialize OpenCV's depth estimation algorithms
    depthScale = 0.5
    depthMinMM = 300
    depthMaxMM = 600
    depthEstimator = StereoDepthEstimator(
        scale_factor=depthScale,
        # The depth values are in millimeters ("mm")
        min_depth=depthMinMM,
        max_depth=depthMaxMM,
    )


    # Enter depth estimation loop
    try:
        while True:
            # Get Ai data
            ai_data = theQCUData.getDataAi()
            if ai_data:
                if 'error' in ai_data:
                    print(f"Error occurred: {ai_data['error']}")
                else:
                    # NOTE: 1. We are processing AI output images. Stereo camera output can also be processed
                    #       2. Due to the stereo camera setup the output depth map size is 726x585

                    # Process AI processor output images
                    # (memConfig is the shared-memory configuration object; see
                    # the full example in the GitHub repository for its setup)
                    depthMap = depthUtils.getDepthFromStereo(ai_data, memConfig, depthEstimator)

                    # Get colored depth map
                    coloredDepthMap = depthEstimator.depth_to_colormap(depthMap)
                    coloredDepthMap_b64 = base64.b64encode(coloredDepthMap).decode('utf-8')
                    # Build payload for the transmission
                    payload = {
                        "width": 726,
                        "height": 585,
                        "depth": coloredDepthMap_b64,
                    }
                    qCU_Net.send_data_to_server("192.168.10.2", 12345, payload)
            else:
                # Wait to avoid high CPU utilization
                time.sleep(0.1)
                    
    except Exception as e:
        print(f"An error occurred: {e}")
    finally:
        print("Cleanup completed")


if __name__ == "__main__":
    main()
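
On the PC side, the received payload has to be unpacked. The helper below is a minimal sketch, assuming the payload arrives as the dict built above and the colored depth map is a raw height x width x 3 uint8 buffer; the exact wire format of qCU_Net.send_data_to_server may differ, so treat the transport details as a placeholder:

```python
import base64
import numpy as np

def decode_depth_payload(payload):
    """Rebuild the colored depth image from a received payload dict."""
    raw = base64.b64decode(payload["depth"])
    img = np.frombuffer(raw, dtype=np.uint8)
    # Assumes a 3-channel (e.g. BGR) colormap image
    return img.reshape(payload["height"], payload["width"], 3)

# Round-trip example with a dummy 2x2 image
dummy = np.zeros((2, 2, 3), dtype=np.uint8)
payload = {
    "width": 2,
    "height": 2,
    "depth": base64.b64encode(dummy.tobytes()).decode("utf-8"),
}
print(decode_depth_payload(payload).shape)  # (2, 2, 3)
```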

Live Action: Feed the Data Pipeline

Now, to trigger the data pipeline, press the "Snapshot" button. As soon as the image is displayed on the web interface, the PC side receives the depth map and displays it.

Depth Estimation with RAiV

What is Next?

To begin 3D object positioning:

3D Object Detection: Bringing Depth to YOLO on the Edge

Check our Python SDK:

RAiV Python SDK

Check our GitHub repository for sample code:

Our GitHub Repository