From Stereo to Depth: Building Your First Depth Map with OpenCV
One of the main design aims of Robotic Ai Vision (RAiV) is the estimation of depth maps. In this post, we will finally estimate a depth map of the scene using the OpenCV Python library.
Introduction to Depth from Stereo Images
Estimating a depth map is a difficult process, and RAiV is ready for this task out of the box.
The stereo camera pair of RAiV is synchronized and calibrated. The calibration parameters, which are key to estimating the depth map, are accessible from the user's Python code. So, all you have to do is:
- Get the calibration parameters
- Acquire the stereo image pair
- Estimate the depth map
If you would like to jump straight to depth estimation with Python, skip ahead to the section "Depth Mapping Made Easy". In the next part, we will take a deeper look at RAiV's design features and the theory behind depth estimation.
Why 65mm Baseline?
In stereo vision, we have two cameras that are horizontally separated from each other. This horizontal separation distance is called the baseline. When we capture images from these cameras, we get the same scene from different viewpoints. The captured images contain the same objects/features, but the objects/features will have horizontally shifted pixel coordinates. The pixel-coordinate distance between the same object/feature in the two images is called the disparity. If an object/feature is close to the cameras, this distance is larger; if it is far away, the distance is smaller. This depth-disparity relation is formulated with the formula below:

depth = (focal_length × baseline) / disparity

where depth, focal_length, baseline, and disparity are all expressed in mm.
In RAiV, we decided to mimic human vision. To achieve this, we designed RAiV's stereo camera pair with a baseline of 65 mm, which is the average baseline length of human eyes (i.e., the average interpupillary distance).
With respect to the formula above, the selection of this baseline has direct consequences. However, to make this effect clearer, we need another formula to convert the disparity from mm to pixels:

disparity (pixels) = disparity (mm) / pixel_size (mm)
By combining the formulas above, we can write a formula to calculate the disparity (in pixels) from the depth (in mm):

disparity (pixels) = (focal_length (mm) × baseline (mm)) / (depth (mm) × pixel_size (mm))
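As a quick sanity check, the formulas above can be put into a few lines of Python. The helper names are ours; the constant values are the default RAiV parameters listed in this post:

```python
# Depth-disparity relations for a stereo pair (helper names are ours).

FOCAL_LENGTH_MM = 2.8   # default RAiV lens
BASELINE_MM = 65.0      # average human interpupillary distance
PIXEL_SIZE_MM = 0.0030  # 3.0 um pixels

def disparity_mm_from_depth(depth_mm):
    """depth = f * B / disparity, rearranged for disparity (all in mm)."""
    return FOCAL_LENGTH_MM * BASELINE_MM / depth_mm

def disparity_px_from_depth(depth_mm):
    """Convert the metric disparity to pixels via the pixel size."""
    return disparity_mm_from_depth(depth_mm) / PIXEL_SIZE_MM

print(round(disparity_px_from_depth(200), 2))  # -> 303.33
```

Running this for a depth of 200 mm reproduces the first row of the depth versus disparity table further below.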
The default RAiV parameters are:
| Parameter | Value |
| --- | --- |
| Baseline | 65 mm |
| Focal Length | 2.8 mm |
| Pixel Size | 0.0030 mm (3.0 µm) |
RAiV can be equipped with lenses of the following focal lengths: 2.8mm, 3.6mm, 4mm, 6mm, 8mm, 12mm, 16mm and 25mm. Please mention your lens choice when placing your order.
With all of the information above, we can create the following depth (mm) versus disparity (pixels) table:
| Depth (mm) | Disparity (pixels) |
| --- | --- |
| 200 | 303.33 |
| 300 | 202.22 |
| 400 | 151.67 |
| 500 | 121.33 |
| 750 | 80.89 |
| 1000 | 60.67 |
| 2000 | 30.33 |
| 3000 | 20.22 |
| 4000 | 15.17 |
| 5000 | 12.13 |
The cameras used in RAiV have 1600x1300 pixel resolution. In all of our depth estimation example codes, we scale the images down to 800x650 pixels for speed reasons. With this scaled resolution, the default RAiV configuration is suitable for estimating depths between 300 mm and 3000 mm.
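Scaling the images also scales the disparity: at half resolution the focal length in pixels is halved, so the measured disparity in pixels is halved too. A small sketch of the numbers behind the 300 mm to 3000 mm working range (constants from this post, function name ours):

```python
FOCAL_LENGTH_MM = 2.8
BASELINE_MM = 65.0
PIXEL_SIZE_MM = 0.0030
SCALE = 0.5  # 1600x1300 scaled down to 800x650

def scaled_disparity_px(depth_mm, scale=SCALE):
    # Full-resolution disparity in pixels, scaled along with the image
    return scale * FOCAL_LENGTH_MM * BASELINE_MM / (PIXEL_SIZE_MM * depth_mm)

# Near end of the range: large but still searchable disparity
print(round(scaled_disparity_px(300), 2))   # -> 101.11
# Far end of the range: only ~10 px, so depth resolution degrades quickly
print(round(scaled_disparity_px(3000), 2))  # -> 10.11
```

Beyond 3000 mm the scaled disparity drops below roughly 10 pixels, which is why the far end of the usable range sits there.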
Depth Mapping Made Easy
Prepare & Upload the Code
RAiV is ready to estimate depth out of the box. The example code below:
- Initializes the data pipeline interface
- Initializes the stereo depth estimator
- In a loop continuously:
- Gets stereo image pair from data pipeline
- Estimates the depth map
- Sends the depth map to a PC
You can find this example in our GitHub repository with all the necessary modules. Please download the example code from the repository and upload it to RAiV via the web interface.
```python
import time

import qCU_Net
# For accessing the data pipeline
from qCU_Data import qCUData
# For depth estimation
from StereoDepthEstimator import StereoDepthEstimator
import depthUtils
# For encoding data sent over TCP
import base64


def main():
    # Create interface
    theQCUData = qCUData()

    # Initialize shared memory
    if not theQCUData.init():
        print("Failed to initialize shared memory")
        return

    # Initialize OpenCV's depth estimation algorithms
    depthScale = 0.5
    depthMinMM = 300
    depthMaxMM = 600
    depthEstimator = StereoDepthEstimator(
        scale_factor=depthScale,
        # The depth values are in millimeters ("mm")
        min_depth=depthMinMM,
        max_depth=depthMaxMM,
    )

    # Enter depth estimation loop
    try:
        while True:
            # Get AI data
            ai_data = theQCUData.getDataAi()
            if ai_data:
                if 'error' in ai_data:
                    print(f"Error occurred: {ai_data['error']}")
                else:
                    # NOTE: 1. We are processing AI output images. Stereo camera
                    #          output can also be processed.
                    #       2. Due to the stereo camera setup the output depth
                    #          map size is 726x585.
                    # Process AI processor output images
                    # (memConfig is the shared-memory configuration; see the
                    # full example in the repository)
                    depthMap = depthUtils.getDepthFromStereo(ai_data, memConfig, depthEstimator)

                    # Get colored depth map
                    coloredDepthMap = depthEstimator.depth_to_colormap(depthMap)
                    coloredDepthMap_b64 = base64.b64encode(coloredDepthMap).decode('utf-8')

                    # Build payload for the transmission
                    payload = {
                        "width": 726,
                        "height": 585,
                        "depth": coloredDepthMap_b64,
                    }
                    qCU_Net.send_data_to_server("192.168.10.2", 12345, payload)
            else:
                # Wait to avoid high CPU utilization
                time.sleep(0.1)
    except Exception as e:
        print(f"An error occurred: {e}")
    finally:
        print("Cleanup completed")


if __name__ == "__main__":
    main()
```
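The StereoDepthEstimator above handles the actual stereo matching for you. The underlying idea, though, is simple: for each pixel in the left image, find the horizontal shift (the disparity) at which the right image matches best. The sketch below illustrates this with a deliberately naive NumPy block matcher on a synthetic image pair; it is a teaching aid under our own assumptions, not RAiV's implementation:

```python
import numpy as np

def naive_disparity(left, right, max_disp=16, block=5):
    """Brute-force SAD block matching on rectified grayscale images."""
    h, w = left.shape
    half = block // 2
    disp = np.zeros((h, w), dtype=np.float32)
    left = left.astype(np.float32)
    right = right.astype(np.float32)
    for y in range(half, h - half):
        for x in range(half + max_disp, w - half):
            patch = left[y - half:y + half + 1, x - half:x + half + 1]
            best_sad, best_d = np.inf, 0
            # Try every candidate shift; keep the one with the lowest
            # sum of absolute differences (SAD)
            for d in range(max_disp):
                cand = right[y - half:y + half + 1, x - d - half:x - d + half + 1]
                sad = np.abs(patch - cand).sum()
                if sad < best_sad:
                    best_sad, best_d = sad, d
            disp[y, x] = best_d
    return disp

# Synthetic rectified pair: the right view is the left view shifted by 4 px
rng = np.random.default_rng(0)
left = rng.integers(0, 255, size=(40, 60), dtype=np.uint8)
true_d = 4
right = np.roll(left, -true_d, axis=1)

disp = naive_disparity(left, right)
# Away from the image borders, the recovered disparity equals the true shift
print(np.median(disp[10:30, 25:45]))  # -> 4.0
```

Real matchers such as OpenCV's StereoBM/StereoSGBM add cost aggregation, sub-pixel refinement, and uniqueness checks on top of this basic search, which is what makes dense depth maps usable in practice.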
Live Action: Feed the Data Pipeline
Now, to trigger the data pipeline, press the "Snapshot" button. As soon as the image is displayed on the web interface, the PC side receives the depth map and displays it.
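On the PC side, the received payload has to be decoded back into an image before it can be displayed. The exact wire format of qCU_Net.send_data_to_server is not shown in this post, so the sketch below assumes only what the payload dictionary in the example makes explicit (a width, a height, and a base64-encoded buffer) plus one assumption of ours: the colored map is raw height x width x 3 uint8 data.

```python
import base64
import numpy as np

def decode_depth_payload(payload):
    """Rebuild the colored depth map from a received payload dict.

    Assumes (our assumption) that 'depth' holds the base64 of a raw
    height x width x 3 uint8 buffer, as sent by the RAiV example.
    """
    raw = base64.b64decode(payload["depth"])
    img = np.frombuffer(raw, dtype=np.uint8)
    return img.reshape(payload["height"], payload["width"], 3)

# Round-trip check with a dummy image of the documented 726x585 map size
dummy = np.zeros((585, 726, 3), dtype=np.uint8)
payload = {
    "width": 726,
    "height": 585,
    "depth": base64.b64encode(dummy.tobytes()).decode("utf-8"),
}
decoded = decode_depth_payload(payload)
print(decoded.shape)  # -> (585, 726, 3)
```

The decoded array can then be shown with any image viewer on the PC, for example OpenCV's cv2.imshow.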
What is Next?
To begin 3D object positioning:
3D Object Detection: Bringing Depth to YOLO on the Edge
Check our Python SDK:
RAiV Python SDK
Check our Github Repository for sample codes:
Our Github Repository