Structured light

From VR & AR Wiki

Structured light is a depth sensing and 3D-scanning technique in which a known pattern of light, such as stripes, a grid, or a dense field of dots, is projected onto a scene, and the way the camera sees that pattern deform across surfaces is used to compute the distance to each point.[1][2] Because the projected pattern is fixed and known in advance, the projector and camera form two corners of a triangle whose third corner lies on the surface, so depth at every illuminated point follows from triangulation.[3] In virtual reality and augmented reality, structured light has been one of the foundational ways to give a device a three-dimensional model of faces, objects, and rooms rather than a flat camera image, and it powered two of the most influential consumer depth sensors of the 2010s: the original Microsoft Kinect and Apple's TrueDepth camera behind Face ID.[4][5]

Principle

Projecting a narrow band of light onto a three-dimensional surface creates a line of illumination that appears straight from the projector's viewpoint but distorted when viewed from a camera placed off to one side. That distortion encodes the shape of the surface, and analyzing it reconstructs the geometry.[6] A structured light system therefore consists of two main parts, a projector that emits the pattern and a camera that observes it, separated by a known distance called the baseline. Where each part of the pattern lands in the camera image depends on how far away the surface is, and because the projected position is already known, the depth of each point is recovered by triangulation between the projector ray and the camera ray.[3][7]
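
The triangulation step can be illustrated with a minimal sketch. It assumes an idealized, rectified pinhole projector-camera pair, which is a simplification the cited sources do not spell out. Under that assumption, depth is inversely proportional to how far a pattern feature shifts along the baseline direction. The function name and the focal length, baseline, and disparity values are hypothetical.

```python
def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth in meters of a surface point under a rectified pinhole model.

    disparity_px is how far (in pixels, along the baseline direction) the
    pattern feature appears from where it would appear at infinite distance.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point at finite depth")
    return focal_px * baseline_m / disparity_px


# Hypothetical numbers: 580 px focal length, 7.5 cm baseline.
near = depth_from_disparity(580.0, 0.075, 43.5)  # larger shift, closer surface
far = depth_from_disparity(580.0, 0.075, 14.5)   # smaller shift, farther surface
print(near, far)
```

A longer baseline or a longer focal length makes the same depth change produce a larger pixel shift, which is why both appear in the numerator.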

Unlike a time-of-flight camera, which measures the round-trip travel time of light, structured light does not time anything: it is a purely geometric method that infers distance from where the pattern appears, which is why it is closely related to passive stereo vision. The difference is that stereo relies on natural surface texture to find matching points between two cameras, whereas structured light supplies its own texture by projecting the pattern, so it can recover depth even on blank, featureless surfaces.[2][7] Most modern systems project in the near-infrared band so the pattern is invisible to the user and does not interfere with normal lighting.[4][5]
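
For contrast, the sketch below shows the quantity a direct time-of-flight measurement has to resolve. It is only an illustration of the round-trip relationship, not a model of any particular sensor, and the function names are hypothetical.

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0  # exact by definition of the meter


def tof_distance_m(round_trip_s: float) -> float:
    """Distance implied by a measured round-trip travel time."""
    return SPEED_OF_LIGHT_M_S * round_trip_s / 2.0


def round_trip_time_s(distance_m: float) -> float:
    """Time light needs to reach a surface and return."""
    return 2.0 * distance_m / SPEED_OF_LIGHT_M_S


# Timing difference that separates two surfaces 1 mm apart in depth.
dt = round_trip_time_s(1.001) - round_trip_time_s(1.000)
print(f"{dt * 1e12:.2f} ps")
```

A depth difference of one millimeter corresponds to a timing difference of only a few picoseconds, whereas structured light turns the same depth difference into a shift of the pattern in the image.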

Pattern coding

The pattern a system projects determines how reliably and precisely the camera can identify which projected feature it is looking at. Several coding strategies are used to make that identification unambiguous.[6]

| Coding method | How it works | Notes |
|---|---|---|
| Binary and Gray code stripes | A sequence of black-and-white stripe patterns is projected, each finer than the last; the on/off value of a pixel across the sequence forms a binary code that uniquely numbers each stripe[6] | Robust and unambiguous, but needs several projected frames, so the scene must hold still during capture[6] |
| Phase shifting | A smoothly varying (sinusoidal) fringe pattern is projected several times, each shifted slightly, and the phase at each pixel is solved to locate it within a fringe[6] | Reaches sub-stripe resolution, resolving detail at roughly one tenth of the stripe spacing[6] |
| Dense dot or speckle pattern | A single fixed field of pseudo-random infrared dots is projected and matched against a stored reference pattern by image correlation[7] | Can recover depth from one frame, which suits real-time and moving subjects; used by the original Kinect and Apple's TrueDepth camera[4][5] |

Multi-frame approaches such as Gray code and phase shifting are extremely accurate for static objects but require the subject to remain motionless while the sequence of patterns is captured.[6] Single-pattern dot or speckle approaches trade some precision for the ability to produce a depth map from a single exposure, which is what makes them usable for live interaction and for capturing people in motion.[4][7]
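
The two multi-frame schemes can be sketched as follows. This is an illustrative toy and not the decoding pipeline of any particular scanner. It assumes ideal, noise-free on/off readings for the Gray code part and three fringe images shifted by -120, 0, and +120 degrees for the phase part. All function names are hypothetical.

```python
import math


def gray_encode(n: int) -> int:
    """Binary-reflected Gray code of n."""
    return n ^ (n >> 1)


def gray_decode(g: int) -> int:
    """Invert gray_encode by folding the shifted code back in."""
    n = g
    shift = g >> 1
    while shift:
        n ^= shift
        shift >>= 1
    return n


def stripe_patterns(num_columns: int) -> list[list[int]]:
    """One on/off row per projected frame, coarsest stripes first."""
    bits = max(1, (num_columns - 1).bit_length())
    return [
        [(gray_encode(col) >> bit) & 1 for col in range(num_columns)]
        for bit in range(bits - 1, -1, -1)
    ]


def decode_column(observed_bits: list[int]) -> int:
    """Projector column seen by one camera pixel, from its on/off sequence."""
    code = 0
    for b in observed_bits:
        code = (code << 1) | b
    return gray_decode(code)


def three_step_phase(i1: float, i2: float, i3: float) -> float:
    """Wrapped phase from three intensities shifted by -120, 0, +120 degrees."""
    return math.atan2(math.sqrt(3.0) * (i1 - i3), 2.0 * i2 - i1 - i3)


patterns = stripe_patterns(8)                   # 3 frames number 8 columns
pixel_bits = [frame[5] for frame in patterns]   # what a pixel lit by column 5 sees
print(decode_column(pixel_bits))

true_phase = 0.7
shifts = (-2.0 * math.pi / 3.0, 0.0, 2.0 * math.pi / 3.0)
shots = [0.5 + 0.4 * math.cos(true_phase + d) for d in shifts]
print(three_step_phase(*shots))
```

Gray code is a common choice over plain binary because neighboring columns differ in exactly one bit, so a misread at a stripe boundary shifts the decoded column by one rather than by many. The phase result is wrapped to one fringe period, which is why systems often pair it with a coarse code that identifies the fringe.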

Microsoft Kinect

The original Microsoft Kinect for the Xbox 360, released in North America on November 4, 2010, is the best-known structured light depth sensor and the device that brought the technique to a mass consumer audience.[4] Its depth technology was licensed from the Israeli company PrimeSense, which called the approach "Light Coding."[4][8] A near-infrared laser projector cast a fixed speckle pattern of dots across the space in front of the sensor, and a separate infrared camera captured how that pattern was deformed by the depth of objects in the scene; software then estimated distance from the deformation.[4] The sensor produced a 640 by 480 depth image at 30 frames per second with 11-bit depth values, and its usable depth range for standard play was roughly 1.2 to 3.5 meters.[4]
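
The general idea of measuring how a speckle pattern has shifted can be sketched with a one-dimensional correlation search. This is a generic illustration and not PrimeSense's proprietary algorithm, whose details the sources here do not describe. The synthetic pattern, window size, search range, and function names are all assumptions.

```python
import math
import random


def ncc(a: list[float], b: list[float]) -> float:
    """Zero-mean normalized cross-correlation of two equal-length windows."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0.0:
        return 0.0  # flat window: no texture to match
    return sum(x * y for x, y in zip(da, db)) / denom


def best_shift(reference: list[float], observed: list[float],
               center: int, half_window: int, max_shift: int) -> int:
    """Shift in pixels at which the observed window best matches the reference."""
    window = observed[center - half_window:center + half_window + 1]
    scores = {}
    for shift in range(max_shift + 1):
        start = center - shift - half_window
        if start < 0:
            break
        candidate = reference[start:start + len(window)]
        scores[shift] = ncc(window, candidate)
    return max(scores, key=scores.get)


rng = random.Random(0)
reference_row = [rng.random() for _ in range(200)]   # stored pattern intensities
true_shift = 7
observed_row = [0.0] * true_shift + reference_row[:-true_shift]  # pattern moved right
print(best_shift(reference_row, observed_row, center=100, half_window=15, max_shift=20))
```

The recovered shift plays the role of the disparity in triangulation. A real sensor would search in two dimensions, refine the match to a fraction of a pixel, and use calibration data to convert the shift to depth.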

The Kinect made full-body motion tracking cheap and ubiquitous, and it was rapidly adopted beyond gaming by researchers and hobbyists for 3D scanning, robotics, and early VR and AR experiments, becoming a common bridge between the physical world and 3D software. The second-generation sensor, Kinect for Xbox One, released on November 22, 2013, abandoned structured light in favor of a time-of-flight camera, reflecting a broader industry shift toward time-of-flight for depth at the time.[4]

Apple TrueDepth and Face ID

Apple acquired PrimeSense, the company behind the Kinect's depth technology, in November 2013 for a reported 360 million US dollars, and went on to miniaturize structured light into a front-facing phone sensor.[8][5] The result was the TrueDepth camera, introduced with the iPhone X, which Apple announced on September 12, 2017 and released on November 3, 2017, where it powers the Face ID facial-authentication system.[5]

The TrueDepth system uses a dot projector built around a vertical-cavity surface-emitting laser to cast more than 30,000 invisible infrared dots onto the user's face, an infrared flood illuminator to light the face in the dark, and an infrared camera to read the resulting pattern and build a 3D depth map.[5] Apple describes the camera as "projecting and analyzing thousands of invisible dots to create a depth map of your face" together with an infrared image, which a portion of the device's Neural Engine inside the Secure Enclave converts into a mathematical representation for matching.[9] Because Face ID matches against depth information that does not exist in a flat photograph, the structured light map provides resistance to spoofing by printed images.[9] The same TrueDepth hardware also drives face-tracking features such as animated avatars (Animoji and Memoji) and ARKit's face mesh, directly connecting structured light to consumer AR.[5]

Use in VR, AR, and 3D capture

Structured light is widely used to digitize real faces, objects, and bodies into the 3D models that VR and AR content relies on. Because it can capture fine surface detail at high resolution, it is well suited to producing accurate scans for avatar creation, character models, and environment assets, as well as for industrial design, medical imaging, cultural heritage, and visual effects.[3][6] The whole field of view is patterned and captured at once rather than scanned point by point, so a structured light scanner can acquire a dense, detailed surface far faster than a single-point laser scanner.[3]

Several depth peripherals aimed at developers and 3D capture used structured light. Occipital's Structure Sensor, an infrared depth sensor that clips onto an iPad, was used for handheld 3D scanning of objects, rooms, and people and for AR applications.[6] Intel's early user-facing RealSense depth cameras, the F200 and SR300, used a coded-light (structured light) approach in which triangulation between projected coded patterns and a sensor produces the depth map, aimed at gesture, face, and short-range scanning.[10] Intel's later RealSense D400 series moved to active stereo.[11] In each case the projected infrared pattern lets the sensor recover depth on plain, untextured surfaces where passive stereo would fail.[7]

Comparison with other depth methods

Structured light is one of three dominant approaches to optical depth sensing, alongside passive stereo vision and the time-of-flight camera. Its defining strength is very high accuracy and spatial resolution at short range: in controlled, close-range conditions it typically delivers the best sub-millimeter detail of the three, which is why it dominates precision 3D scanning and facial capture.[7][2] Its accuracy is reference-distance-dependent, peaking near the calibration distance and degrading as objects move farther away, so it is generally best suited to working distances of at most a few meters.[7][2]
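
The loss of accuracy with distance can be illustrated with a minimal sketch. It assumes an idealized rectified pinhole model in which depth z equals focal length f times baseline b divided by disparity d. Differentiating that relation gives a depth step that grows with the square of the distance. The sketch does not model the calibration-distance effect mentioned above, and the sensor numbers and function name are hypothetical.

```python
def depth_resolution_m(depth_m: float, focal_px: float, baseline_m: float,
                       disparity_step_px: float) -> float:
    """Approximate depth change caused by one disparity step at a given depth.

    Follows from differentiating z = f * b / d: |dz| ~= z**2 / (f * b) * |dd|.
    """
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_step_px


# Hypothetical sensor: 580 px focal length, 7.5 cm baseline, 1/8 px matching precision.
for z in (0.5, 1.0, 2.0, 4.0):
    step_mm = depth_resolution_m(z, 580.0, 0.075, 0.125) * 1000.0
    print(f"{z:.1f} m -> {step_mm:.1f} mm")
```

Under this model, doubling the distance makes the smallest resolvable depth step four times larger, which is consistent with the technique being strongest at close range.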

The technique's main limitations come from its reliance on seeing the projected pattern clearly. Bright sunlight or strong ambient infrared can wash the pattern out, which makes structured light unreliable outdoors and best suited to indoor, controlled lighting.[7][2] Multi-frame coding schemes also struggle with fast motion because the subject must hold still across the captured sequence.[2] By contrast, a time-of-flight camera supplies its own light and captures a whole frame in one cycle, so it handles dynamic scenes and darkness better and holds accuracy more evenly over longer ranges, while passive stereo needs no emitter and works in daylight but fails on textureless surfaces.[2][7]

| Method | Principle | Strengths | Weaknesses |
|---|---|---|---|
| Structured light | A known projected pattern deforms over surfaces; depth comes from triangulation between projector and camera[1] | Highest spatial resolution and short-range accuracy; recovers depth on blank surfaces[7] | Pattern washed out by sunlight or ambient IR; limited range; multi-frame schemes are sensitive to motion[2] |
| Time of flight | The round-trip travel time of emitted light is measured directly[2] | Captures a full frame at once, works in darkness, more consistent over distance[2] | Lower fine-detail resolution than structured light at close range[7]; multipath reflections can corrupt readings[12] |
| Passive stereo | Disparity between two cameras a known baseline apart gives depth[2] | No emitter, works outdoors in daylight, energy efficient[2] | Fails on textureless or repetitive surfaces and in low light[2] |

References

  1. "Understanding Depth Cameras: Structured Light, TOF, Stereo". DFRobot Wiki. https://wiki.dfrobot.com/tutorial/20145.
  2. "Depth Sensing Technologies: Time-of-Flight vs. Structured Light vs. Stereo". PatSnap Eureka. https://eureka.patsnap.com/article/depth-sensing-technologies-time-of-flight-vs-structured-light-vs-stereo.
  3. "3D Scanning 101: Structured Light 3D Scanning". Polyga. https://www.polyga.com/blog/3d-scanning-101-structured-light-3d-scanning/.
  4. "Kinect". Wikipedia. https://en.wikipedia.org/wiki/Kinect.
  5. "Face ID". Wikipedia. https://en.wikipedia.org/wiki/Face_ID.
  6. "Structured-light 3D scanner". Wikipedia. https://en.wikipedia.org/wiki/Structured-light_3D_scanner.
  7. "Structured Light vs iToF Depth Cameras". Orbbec. https://www.orbbec.com/blog/structured-light-vs-itof-depth-cameras/.
  8. "PrimeSense". Wikipedia. https://en.wikipedia.org/wiki/PrimeSense.
  9. "About Face ID advanced technology". Apple. https://support.apple.com/en-us/102381.
  10. "Intel RealSense SR300 Coded Light Depth Camera". IEEE. https://ieeexplore.ieee.org/document/8712544/.
  11. "Intel RealSense". Wikipedia. https://en.wikipedia.org/wiki/Intel_RealSense.
  12. "What is multipath interference? How to minimize it in Time-of-Flight cameras?". e-con Systems. https://www.e-consystems.com/blog/camera/technology/what-is-multipath-interference-how-to-minimize-it-in-time-of-flight-cameras/.