Optical 3D Acquisition Methods: A Comprehensive Guide [Part 2]
Machine Learning Engineer
This blog post is the second part of our ongoing blog series about 3D computer vision. If you haven’t read the first blog post, you can check it out here. This second article (Part 2) provides an overview of optical 3D acquisition methods. We cover the differences between various types of sensors and how they can benefit specific use cases. We also cover different 3D data formats and storage options.
We’ve seen in the first blog post of this series how the ability to perceive and interpret the three-dimensional structure of the surrounding world is becoming increasingly important in a wide range of industries. But how can machines get this extra layer of information? There’s a wide range of optical 3D acquisition methods that enable them to capture (or estimate) depth and spatial information about their environments.
Figure 1 presents different acquisition techniques categorized into active and passive methods. Active methods use their own light source to emit a signal and then measure the reflected or returned signal, whereas passive methods rely only on the light already present in the scene.
Following the structure of the pipeline presented in the previous post (also seen in Figure 2), in this second part we will be focusing on the capturing and storing of 3D data. We will specifically focus on four prominent methods, namely Stereo Vision, Structured Light, Time of Flight, and LiDAR. For each method, we will explore the operating principle, advantages, disadvantages, and real-world use cases where the technique excels. Finally, we conclude by providing you with a decision map to help choose the most appropriate method based on various factors, as well as a brief discussion of future trends in 3D optical acquisition.
By providing a thorough understanding of these optical 3D acquisition methods, we aim to equip readers with the knowledge needed to make informed decisions when selecting the right technique for a specific application or industry.
TLDR: For those short on time or just too lazy to read the full blog post, at the end we provide a decision map that shows when to use each type of sensor based on different application requirements and external factors. It summarizes the article in a short, compact and visual way.
Stereo Vision
Stereo Vision, also known as stereoscopic vision, is a passive 3D acquisition method that mimics the way humans perceive depth. It utilizes two or more cameras, positioned at a certain distance apart (known as the baseline), to capture images of the same scene from slightly different viewpoints. These images, called stereo pairs, are then processed by a stereo matching algorithm to identify corresponding points (features) in both images. For each pair of corresponding points, the algorithm computes the disparity, which is the difference in their horizontal positions in the left and right images.¹
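To make the matching step more concrete, here is a minimal sketch of window-based matching on a single scanline. It assumes rectified images (so a point in the left image appears on the same row, shifted left, in the right image) and uses the sum of absolute differences (SAD) as the matching cost. The function name, the toy pixel values and the parameters are ours for illustration; real stereo matchers are considerably more sophisticated.

```python
def match_disparity(left_row, right_row, x, half_window=1, max_disparity=8):
    """Estimate the disparity (in pixels) of pixel x on one rectified scanline.

    A small window around left_row[x] is compared with windows in right_row
    shifted left by d pixels. The shift with the lowest sum of absolute
    differences (SAD) is returned.
    """
    lo, hi = x - half_window, x + half_window + 1
    if lo < 0 or hi > len(left_row):
        raise ValueError("window does not fit inside the row")
    patch = left_row[lo:hi]
    best_d, best_cost = 0, float("inf")
    for d in range(max_disparity + 1):
        if lo - d < 0:
            break  # the candidate window would leave the image
        candidate = right_row[lo - d:hi - d]
        cost = sum(abs(a - b) for a, b in zip(patch, candidate))
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


# Toy scanline: the same bright feature sits 3 pixels further left
# in the right image than in the left image.
left = [10, 10, 10, 10, 10, 10, 80, 200, 90, 10, 10, 10]
right = [10, 10, 10, 80, 200, 90, 10, 10, 10, 10, 10, 10]
print(match_disparity(left, right, x=7))  # prints 3
```

Running the same function on a flat region of the row (for example x=1) gives several equally good candidates, which is a small-scale version of the low-texture problem that affects stereo matching.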
By leveraging the geometry of the camera setup and using triangulation, the depth (or 3D coordinates) of each point in the scene can be determined. The details of how depth is estimated are out of scope for this blog post. However, if you are interested in a more mathematical treatment of the topic, you may want to check the following material.
Stereo vision sensors generate two primary types of data: stereo pairs (left and right images) and depth maps (derived from disparity maps). By combining the depth information from the depth map with the original 2D images, a 3D representation of the scene can be reconstructed.
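For the common case of two identical, rectified cameras, the triangulation reduces to the relation Z = f · B / d, where f is the focal length in pixels, B is the baseline and d is the disparity in pixels. The sketch below applies that relation; the function name and the example numbers (700 px focal length, 12 cm baseline) are made up for illustration and do not describe any particular sensor.

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Depth in meters of a point seen by a rectified stereo pair.

    Uses Z = f * B / d. A disparity of zero (or less) corresponds to a
    point at infinity or to a failed match, so infinity is returned.
    """
    if disparity_px <= 0:
        return float("inf")
    return focal_length_px * baseline_m / disparity_px


# Hypothetical camera: 700 px focal length, 12 cm baseline.
z = depth_from_disparity(disparity_px=42, focal_length_px=700, baseline_m=0.12)
print(f"{z:.2f} m")  # prints 2.00 m
```

Note how large disparities correspond to nearby points and small disparities to distant ones.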
Valuable characteristics of Stereo Vision systems are:
Flexibility: Stereo Vision systems can be relatively simple to set up and can work well in a wide range of lighting conditions.
Real-time Capabilities: They can process depth information in real-time, which is useful for applications requiring immediate feedback or rapid decision-making.
Cost-Effectiveness: Compared to active methods like LiDAR or Structured Light, Stereo Vision systems tend to be more affordable, making them an attractive option for budget-conscious projects.
On the other hand, these systems usually suffer from:
Dependence on Texture: Stereo Vision relies on identifying corresponding features in multiple images. In scenes with low texture or repetitive patterns, this can be challenging, resulting in inaccurate or missing depth information. Finding these matches is known as the stereo correspondence problem.
Sensitivity to Lighting Changes and Low-Light Conditions: Feature matching needs clear, consistently exposed images. Rapid or significant changes in illumination, or low light overall, degrade the images and, with them, the accuracy of the depth estimation.
Limited Range: The usable range depends on the baseline (the distance between the cameras), the resolution of the cameras and the algorithms used to estimate depth, but it is usually limited to a few meters. These sensors can usually scan objects accurately up to about 5 meters, and the accuracy of the measurements decreases significantly with range.
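The loss of accuracy with range follows from the stereo geometry. For rectified cameras, depth is Z = f · B / d, so a small matching error Δd in the disparity produces a depth error of roughly ΔZ ≈ Z² · Δd / (f · B). The sketch below evaluates this first-order approximation for a hypothetical camera; the focal length, baseline and quarter-pixel matching error are assumed values chosen only to show the trend.

```python
def depth_error(depth_m, focal_length_px, baseline_m, disparity_error_px=0.25):
    """First-order depth uncertainty of a rectified stereo pair, in meters.

    Derived from Z = f * B / d:  dZ ~= Z**2 * dd / (f * B).
    """
    return depth_m ** 2 * disparity_error_px / (focal_length_px * baseline_m)


# Hypothetical camera: 700 px focal length, 12 cm baseline.
for z in (1.0, 2.0, 5.0, 10.0):
    err = depth_error(z, focal_length_px=700, baseline_m=0.12)
    print(f"range {z:5.1f} m -> approx. error {err * 100:6.1f} cm")
```

Because the error grows with the square of the distance, doubling the range quadruples the uncertainty, while a wider baseline or a longer focal length reduces it.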
While the previously discussed drawbacks pertain to passive Stereo Vision, Active Stereo Vision techniques utilize a light source, such as a laser or structured light, to illuminate the scene being captured. This approach enhances stereo matching and enables the method to perform well in low-light settings. However, it comes at a higher cost because it requires an extra component: the projector.
Stereo Vision is a popular acquisition method, mainly due to its flexibility and low cost. Real-world applications of stereoscopic vision are numerous and can be seen in:
Autonomous Navigation: Stereo Vision allows robots and autonomous vehicles to interact with their surrounding environment and perform obstacle detection and avoidance, thanks to its real-time capabilities and suitability for outdoor environments.²
Spatial Analytics: These sensors can be used to monitor spaces with a higher degree of spatial awareness (compared to traditional 2D computer vision), allowing solutions that effectively model spatial relationships between people, places and objects.³
In summary, Stereo Vision is a versatile and cost-effective 3D acquisition method suitable for a range of applications, particularly when real-time depth information is required. However, its dependence on texture and sensitivity to lighting changes can pose challenges in certain scenarios.
Structured Light
Structured light is an active optical 3D acquisition method that involves projecting a known pattern (often a series of stripes or a grid) onto the scene or object being scanned. The deformation of the projected pattern on the object’s surface is captured by a camera placed at a known position and orientation relative to the projector. The relationship between the projector, camera, and the deformation of the pattern allows for the extraction of depth information.⁴
The data generated by structured light systems include the captured 2D image with the deformed pattern and the resulting 3D point cloud or depth map, which represent the 3D structure of the scanned object or scene. Depending on the characteristics of the projected/encoded pattern, different algorithms can be used to decode the deformed pattern and compute the depth information.
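As an illustration of what decoding can look like, the sketch below assumes one common family of encodings: a sequence of binary stripe patterns that spell out each projector column as a Gray code, so that neighboring columns differ in only one bit. This is an assumption made for the example, not a statement about how any specific sensor works. Each camera pixel observes a bright/dark sequence over the projected patterns, and decoding that sequence tells us which projector column illuminated it, which is the correspondence needed for triangulation.

```python
def binary_to_gray(n):
    """Gray code of a non-negative integer (used to build the patterns)."""
    return n ^ (n >> 1)


def gray_to_binary(gray):
    """Invert binary_to_gray by XOR-ing together all right shifts."""
    binary = gray
    shift = gray >> 1
    while shift:
        binary ^= shift
        shift >>= 1
    return binary


def decode_pixel(observed_bits):
    """Projector column seen by one camera pixel.

    observed_bits holds one 0/1 value per projected pattern, most
    significant bit first (1 = the pixel was lit in that pattern).
    """
    gray = 0
    for bit in observed_bits:
        gray = (gray << 1) | bit
    return gray_to_binary(gray)


# A pixel that was lit, lit, dark over three patterns sees column 4.
print(decode_pixel([1, 1, 0]))  # prints 4
```

In practice each bit has to be thresholded from noisy intensity images first, which is where surface reflectivity and ambient light start to matter.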
Structured Light setups benefit from:
High accuracy and resolution: Structured Light systems can produce highly accurate and detailed 3D point clouds or depth maps, making them suitable for applications requiring precise measurements.
Robustness to different lighting conditions: As an active method, structured light can perform well in a range of lighting conditions, as it relies on its own light source. However, direct sunlight or bright indirect sunlight can wash out the projected pattern, so these sensors usually work best in indoor settings.
Problems associated with these setups include:
Limited range: Structured Light systems typically have a limited working range and are more effective for close-range scanning or small objects. The optimal working distance is usually between 0.5 and 1.5 meters, but some sensors can go up to 3 meters.
Sensitivity to surface properties: The performance of Structured Light systems can be affected by the surface properties of the scanned object, such as reflectivity or transparency, which can distort the projected pattern and cause noise or artifacts in the reconstructed 3D data.
Occlusions and shadows: Since the projector and the camera are placed at different locations, structured light systems can struggle with occlusions and shadows, leading to incomplete or inaccurate 3D representations.
Calibration: Typically Structured Light sensors require careful calibration and alignment.
Real-world situations where this acquisition method thrives include:
Quality control and inspection: Structured light is widely used in manufacturing and industrial settings for quality control and inspection of components, ensuring accurate measurements and detecting defects.⁵
3D scanning for digital art and animation: Structured light systems are also employed in the creation of detailed 3D models for digital art, animation, and visual effects.
Dental and medical applications: In dentistry, structured light scanners are used to create accurate 3D models of teeth and jaw structures, while in medical applications, they can be utilized for body scanning and generating customized prosthetics.⁶
Time of Flight (ToF)
Time of Flight (ToF) is an active optical 3D acquisition method that measures the time it takes for emitted light, usually infrared (IR) light, to travel from the sensor to the object and back. The ToF sensor emits either light pulses (direct ToF sensors, which time the round trip directly) or continuous modulated waves (indirect ToF sensors, which infer the round-trip time from the phase shift of the returned signal). The light is reflected by the object’s surface, collected by the sensor’s imaging lens, and converted into depth data at each pixel of the array. The depth (or distance to the object) is calculated from the speed of light and the measured round-trip time. The resulting depth map is a 2D representation of the 3D structure of the scene, and it can be combined with additional data, such as RGB images from a separate camera, to create a more complete 3D representation.⁷
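The distance calculation itself is short. The sketch below shows the two standard textbook relations: d = c · t / 2 for a directly timed pulse, and d = c · φ / (4π · f_mod) for a continuous wave with modulation frequency f_mod and measured phase shift φ. The function names and the example values (a 20 ns round trip, a 20 MHz modulation frequency) are illustrative assumptions rather than the specification of any real sensor.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # meters per second


def direct_tof_distance(round_trip_s):
    """Distance from a directly timed pulse; the light travels out and back."""
    return SPEED_OF_LIGHT * round_trip_s / 2


def indirect_tof_distance(phase_shift_rad, modulation_hz):
    """Distance from the phase shift of a continuously modulated wave."""
    return SPEED_OF_LIGHT * phase_shift_rad / (4 * math.pi * modulation_hz)


def unambiguous_range(modulation_hz):
    """Largest distance before the phase wraps around (phase = 2 * pi)."""
    return SPEED_OF_LIGHT / (2 * modulation_hz)


print(f"{direct_tof_distance(20e-9):.3f} m")             # 20 ns round trip
print(f"{indirect_tof_distance(math.pi, 20e6):.3f} m")   # half-cycle shift
print(f"{unambiguous_range(20e6):.3f} m")                # wrap-around limit
```

The numbers also show why timing is demanding: one meter of depth corresponds to less than 7 nanoseconds of round-trip time.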
Good properties of ToF sensors are:
Real-time performance: ToF sensors can provide real-time depth information, making them suitable for applications requiring fast and dynamic 3D data acquisition, such as robotics or augmented reality. Some ToF sensors can function at up to 60 fps.
Wider range capabilities: ToF methods typically have longer range measurement capabilities compared to structured light or stereo vision, making them useful for navigation and object detection. Typically, these sensors can measure distances of up to 20 meters.
Robustness to environmental conditions: As an active method, ToF sensors can perform well in various lighting conditions, since they rely on their own light source.
Simplicity: ToF systems are generally simpler in design and implementation compared to other 3D acquisition methods, as they do not require complex matching algorithms or multiple cameras.
Looking at the downsides of these sensors, we have:
Limited accuracy and resolution: ToF sensors typically have lower accuracy and resolution compared to other 3D acquisition methods, such as structured light or stereo vision, so they may not be suitable for applications requiring highly detailed 3D models.
Sensitivity to surface properties: The performance of ToF sensors can be affected by the object’s surface properties, such as reflectivity, color, or transparency, leading to inaccurate depth measurements.
Time of Flight sensors are commonly seen in:
Robotics and automation: Used in systems for tasks such as obstacle detection, navigation, and collision avoidance.⁸
Gesture recognition and human-computer interaction: ToF sensors are employed in devices like gaming consoles and smartphones for gesture recognition and interactive experiences.
Augmented and Virtual Reality: ToF sensors are utilized in augmented and virtual reality systems to enable real-time tracking of objects and environments, enhancing the user experience.⁹
LiDAR
LiDAR (Light Detection and Ranging) operates on the time-of-flight (ToF) principle, similar to ToF sensors. This means that it determines distance by calculating the round-trip time of light and the speed of light. However, LiDAR generally uses multiple laser beams (high-power light sources) and a rotating or oscillating mechanism to cover a larger area or achieve a full 360-degree view of the surroundings. The laser beams are typically aimed in a specific direction and angle, and the distance is measured for those coordinates. Because of this, the resulting data is a point cloud (and not a depth map) and a direct representation of the environment, providing accurate spatial information.
The data generated by LiDAR sensors include the raw timing and intensity information for each laser pulse and the resulting 3D point cloud that represents the 3D structure of the scanned environment. The point cloud contains the X, Y, and Z coordinates of each point in the 3D space, and in some cases, additional information such as intensity or color can be included.
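To show how a single range measurement becomes a point in the cloud, the sketch below converts a measured distance plus the beam's azimuth and elevation angles into X, Y, Z coordinates. The axis convention (X forward, Y to the left, Z up, angles in degrees) is an assumption made for this example; actual sensors document their own conventions.

```python
import math


def lidar_return_to_xyz(range_m, azimuth_deg, elevation_deg):
    """Convert one LiDAR return from spherical to Cartesian coordinates.

    Assumed convention: azimuth is measured in the horizontal plane from
    the X axis towards the Y axis, elevation is measured up from that plane.
    """
    azimuth = math.radians(azimuth_deg)
    elevation = math.radians(elevation_deg)
    horizontal = range_m * math.cos(elevation)  # projection onto the ground plane
    x = horizontal * math.cos(azimuth)
    y = horizontal * math.sin(azimuth)
    z = range_m * math.sin(elevation)
    return (x, y, z)


# One sweep of a hypothetical scanner: four beams, all returning at 10 m.
cloud = [lidar_return_to_xyz(10.0, az, -5.0) for az in (0, 90, 180, 270)]
for point in cloud:
    print(tuple(round(c, 2) for c in point))
```

Repeating this for every beam and every rotation step yields the point cloud directly, without going through a depth map first.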
The benefits of LiDAR include:
High accuracy and resolution: LiDAR sensors can produce highly accurate and detailed 3D point clouds, making them suitable for applications requiring precise measurements.
Long-range capabilities: LiDAR sensors can operate effectively at longer ranges, sometimes up to kilometer range, depending on the specific sensor and its configuration.
Robustness to environmental conditions: As an active method, LiDAR can perform well in various lighting conditions and is generally less affected by environmental factors than other 3D acquisition methods, although heavy fog, rain, or dust can still scatter the laser light and degrade the measurements.
Less desirable properties of these sensors are:
Cost and complexity: LiDAR systems can be more expensive and complex compared to other 3D acquisition methods, especially those using rotating or oscillating mechanisms for scanning.
Limited vertical field of view: Some LiDAR sensors may have a limited vertical field of view, which can result in incomplete or inaccurate 3D reconstructions in certain scenarios.
LiDAR is commonly used in:
Autonomous vehicles: LiDAR sensors are widely used in autonomous vehicles for tasks such as mapping, obstacle detection, and navigation.¹¹
3D mapping and surveying: LiDAR sensors are employed in aerial, terrestrial, and mobile mapping systems for surveying and creating accurate 3D models of large areas, such as urban environments, forests, or infrastructure.
Agriculture and forestry: LiDAR sensors are utilized in precision agriculture and forestry applications to monitor crop health, estimate biomass, and assess forest resources.¹²
Now that we have looked into the different types of 3D acquisition methods, it is also important to think about the type of data these sensors generate and the best way to store it.
3D Data and Storage
The data collected by these sensors typically comes in one of these forms: depth maps or point clouds.
Depth maps: These are commonly generated by Stereo Vision, Structured Light and ToF sensors. Because they store distance information in a 2D grid, they can be limiting in terms of data representation and analysis.
Point clouds: These represent 3D data as a collection of individual points with X, Y, and Z coordinates, which allows for more flexibility in visualization and manipulation.
From Depth Maps to Point Clouds
To generate a point cloud from a 2D depth map, the depth information (Z coordinate) of each pixel in the depth map is combined with the corresponding spatial information (X and Y coordinates) of the pixel in the sensor’s field of view. This process is called “back-projection” or “unprojection.”
The back-projection process involves applying the intrinsic and extrinsic parameters of the sensor, such as focal length, sensor resolution, and sensor pose, to convert the 2D depth map information into 3D coordinates. This process is usually implemented in software and is available in various open-source libraries like Point Cloud Library (PCL), Open3D, and OpenCV.
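As a rough sketch of what these libraries do internally, the code below back-projects a depth map using the pinhole camera model: X = (u − cx) · Z / fx and Y = (v − cy) · Z / fy, where (u, v) is the pixel position, (fx, fy) are the focal lengths in pixels and (cx, cy) is the principal point. The function name and the tiny 2×2 depth map are ours for illustration, and the result is expressed in the camera's own coordinate frame (applying the sensor pose would be a further step).

```python
def depth_map_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth map (rows of depths in meters) to 3D points.

    Pixels with a depth of zero or less are treated as missing
    measurements and skipped.
    """
    points = []
    for v, row in enumerate(depth):        # v: pixel row (image Y axis)
        for u, z in enumerate(row):        # u: pixel column (image X axis)
            if z <= 0:
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# Toy 2x2 depth map with one missing pixel and made-up intrinsics.
depth = [
    [0.0, 2.0],
    [2.0, 2.0],
]
for point in depth_map_to_points(depth, fx=2.0, fy=2.0, cx=0.5, cy=0.5):
    print(point)
```

The output has one point fewer than the depth map has pixels, which is typical: a point cloud only contains the pixels for which a valid depth was measured.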
Point Cloud Storage Formats
There are two main categories of formats for storing Point Cloud data: ASCII and LAS/LAZ.¹³
ASCII formats use plain text files where the X, Y, and Z coordinates of each point are separated by a character, such as a space or a comma. These files may also include a table header with metadata and additional information for each point, such as intensity or amplitude. Common file extensions for ASCII files include TXT, XYZ, PTS, and PTX. OBJ files can also be used to store point cloud data, although this method can be inefficient for large datasets (OBJ is intended to store geometric properties of objects and will include unnecessary amounts of information for point cloud data).
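The simplest of these layouts, one point per line with three delimited coordinates, can be written and read with a few lines of code. The sketch below uses only the Python standard library and an in-memory text stream in place of a file; the helper names and the choice of three decimal places are our own assumptions, and files with header rows or extra per-point columns would need additional handling.

```python
import io


def write_xyz(points, stream, delimiter=" "):
    """Write one 'X Y Z' line per point with millimeter precision."""
    for x, y, z in points:
        stream.write(delimiter.join(f"{c:.3f}" for c in (x, y, z)) + "\n")


def read_xyz(stream, delimiter=None):
    """Read points back; any columns after the first three are ignored.

    With delimiter=None the line is split on any whitespace.
    """
    points = []
    for line in stream:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split(delimiter)
        points.append(tuple(float(value) for value in fields[:3]))
    return points


cloud = [(0.5, -0.5, 2.0), (1.25, 0.0, 3.5)]
buffer = io.StringIO()
write_xyz(cloud, buffer)
print(buffer.getvalue(), end="")

buffer.seek(0)
print(read_xyz(buffer))
```

Plain text is easy to inspect and exchange, but every coordinate is stored as a string of characters, which is one reason these files grow quickly for large scans.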
In contrast, LAS/LAZ formats are binary file formats specifically designed for LiDAR data storage and exchange. LAZ is the losslessly compressed variant of LAS.
Given that this data is unstructured, it is common to store it in a Data Lake either in the cloud or on-premises, depending on your setup. Cloud-based storage services like Google Cloud Storage, Amazon S3 and Azure Blob Storage can be used to store and manage large point cloud datasets.
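One idea behind the compactness of LAS is that coordinates are stored as 32-bit integers together with a scale factor and an offset kept in the file header, so that the real coordinate is stored_value · scale + offset. The sketch below illustrates only that encoding idea with the standard library; it is not a LAS reader or writer, and the scale, offset and coordinates are made-up example values.

```python
import struct


def encode_coordinate(value, scale, offset):
    """Real-world coordinate -> integer as stored in a point record."""
    return round((value - offset) / scale)


def decode_coordinate(stored, scale, offset):
    """Stored integer -> real-world coordinate."""
    return stored * scale + offset


# Example values (assumptions): millimeter resolution, offset near the data.
scale, offset = 0.001, 500_000.0
point = (512_345.678, 500_010.250, 500_123.456)

stored = [encode_coordinate(c, scale, offset) for c in point]
packed = struct.pack("<3i", *stored)   # three little-endian 32-bit integers
ascii_line = " ".join(f"{c:.3f}" for c in point) + "\n"

print(stored)
print(len(packed), "bytes as packed integers")
print(len(ascii_line), "bytes as an ASCII line")
print([round(decode_coordinate(s, scale, offset), 3) for s in stored])
```

The three coordinates fit in 12 bytes regardless of how many digits they have, while the equivalent text line is several times larger. The price is that precision is limited to the chosen scale factor.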
In this blog post, we have explored various optical 3D acquisition methods, including Stereo Vision, Structured Light, Time of Flight, and LiDAR. Each technique has its unique operating principles, advantages, and disadvantages, making them suitable for different applications and scenarios. The decision map below (Figure 9) provides an easy way to choose the most appropriate sensor to use, given a set of common business or practical requirements. Keep in mind that this decision map is a general guideline, and the best choice for a specific application may depend on various other factors.
In addition to the methods discussed, it is also worth noting the emergence of hybrid systems that combine multiple 3D acquisition techniques to overcome limitations and improve overall performance. Advancements in hardware and software will improve real-time processing of 3D data, enabling faster and more efficient analysis of scenes. Integration of 3D sensing technology and computer vision with other technologies, such as augmented reality, virtual reality, and robotics, will create new possibilities for interaction and automation. And of course, as machine learning techniques continue to improve, we can expect to see more accurate and robust algorithms that will facilitate 3D reconstruction of complex environments, as well as object detection and tracking with more spatial awareness.
We hope this blog post has provided you with valuable insights into the world of optical 3D acquisition methods and will help you make informed decisions when selecting the appropriate technique for your needs.
Sanja Fidler. Intro to Image Understanding: Depth from Stereo. University of Toronto, CSC420, 2021.
D. Scharstein and R. Szeliski, “High-accuracy stereo depth maps using structured light,” 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Madison, WI, USA, 2003, pp. I-I, doi: 10.1109/CVPR.2003.1211354.