Time-of-flight camera
A time-of-flight camera (ToF camera) is a range-imaging system that measures the distance between the camera and points in a scene by timing how long emitted light takes to travel to the scene and back to the sensor. Because every pixel records a distance, the device produces a depth map of the whole field of view at once rather than scanning point by point. ToF cameras typically illuminate the scene with their own near-infrared light, so they work in darkness and do not depend on surface texture or visible lighting.[1][2]
Time-of-flight sensors are one of the main hardware approaches to depth sensing used in virtual reality (VR), augmented reality (AR), and mixed reality (MR) systems. They feed environment reconstruction, spatial mapping, and hand tracking. The depth cameras in the Microsoft Kinect for Xbox One (Kinect v2) and in the Microsoft HoloLens 2 headset are both ToF sensors, and ToF depth modules have appeared in several AR-capable smartphones.[3][4]
Operating principle
The core idea is that light travels at a known, constant speed (about 300,000 km/s), so the distance to a surface can be recovered from the time light takes to make the round trip. For a measured round-trip delay t, the distance d is half the distance light covers in that time: d = c × t / 2, where c is the speed of light. The factor of one half accounts for the light travelling out to the surface and back.[5][2]
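The relation d = c × t / 2 can be checked numerically with a minimal Python sketch. The function name and the sample delays below are illustrative assumptions, not values from any particular sensor.

```python
# Speed of light in vacuum, in metres per second (exact SI value).
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def distance_from_round_trip(delay_s: float) -> float:
    """Return the one-way distance in metres for a round-trip delay in seconds."""
    if delay_s < 0:
        raise ValueError("round-trip delay cannot be negative")
    # Light covers the camera-to-surface path twice, hence the division by 2.
    return SPEED_OF_LIGHT_M_PER_S * delay_s / 2.0


if __name__ == "__main__":
    # Illustrative delays only (hypothetical, not measured data).
    for delay_ns in (1.0, 10.0, 20.0):
        d = distance_from_round_trip(delay_ns * 1e-9)
        print(f"{delay_ns:5.1f} ns round trip -> {d:.3f} m")
```

A round-trip delay of 10 ns corresponds to a surface about 1.5 m away, which shows why the timing has to be resolved far below a nanosecond for room-scale depth.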
A ToF camera combines an active light source with an image sensor and timing electronics. The illumination is usually a near-infrared LED or a vertical-cavity surface-emitting laser (VCSEL), chosen because infrared light is invisible to the user and can be separated from ambient light with an optical band-pass filter over the lens. The image sensor is a CMOS array of light-sensitive pixels, and on-camera electronics convert the raw timing data from each pixel into a per-pixel distance value.[1][3] Because the timing has to be precise, the driver electronics must synchronise the light and the sensor very tightly; a phase or timing error of only a few picoseconds corresponds to a depth error of roughly a millimetre.[5]
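The link between timing error and depth error follows from the same round-trip relation. The sketch below is a rough illustration under the assumption that the error is a pure offset in the measured round-trip time; the function names are hypothetical.

```python
# Speed of light in vacuum, in metres per second (exact SI value).
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def depth_error_from_timing_error(timing_error_s: float) -> float:
    """Depth error in metres caused by an offset in the round-trip time."""
    # Same factor of one half as in the distance formula.
    return SPEED_OF_LIGHT_M_PER_S * timing_error_s / 2.0


def timing_error_for_depth_error(depth_error_m: float) -> float:
    """Round-trip timing offset in seconds that produces a given depth error."""
    return 2.0 * depth_error_m / SPEED_OF_LIGHT_M_PER_S


if __name__ == "__main__":
    print(f"1 ps -> {depth_error_from_timing_error(1e-12) * 1e3:.3f} mm")
    print(f"1 mm -> {timing_error_for_depth_error(1e-3) * 1e12:.2f} ps")
```

One picosecond of round-trip error is about 0.15 mm of depth, so one millimetre corresponds to roughly 6.7 ps.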
There are two broad ways to implement the timing measurement:
- Direct time-of-flight (dToF), sometimes implemented as a flash LiDAR, emits short pulses of light and directly measures the elapsed time until the reflected pulse arrives, often using fast single-photon detectors. Direct ToF favours long range and is robust in high background light.[6][5]
- Indirect time-of-flight (iToF), also called continuous-wave or modulated ToF, does not time a single pulse. Instead it emits amplitude-modulated light, often modulated at tens of megahertz, and measures the phase shift between the emitted and returning signal; within one modulation period, that phase shift is proportional to distance. Most ToF depth cameras used in consumer electronics and XR are indirect, phase-based devices.[1][7]
LiDAR (light detection and ranging) is closely related: a LiDAR that measures distance from the travel time of light is itself a form of time-of-flight sensing, and direct ToF is the method behind many LiDAR scanners. The term ToF camera is most often used for the full-field, per-pixel imagers described here, while LiDAR is often used for systems that build up a point cloud, sometimes by steering a beam.[6][8]
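The phase-based measurement can be sketched in Python as follows. The conversion d = c × φ / (4π × f) follows from the round-trip relation for a modulation frequency f. The four-sample demodulation shown here is one commonly used scheme and is an assumption of this sketch, as are the sample model, its sign convention, the 20 MHz frequency, and all function names; the text above does not specify how any given camera recovers the phase.

```python
import math

SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def unambiguous_range_m(mod_freq_hz: float) -> float:
    """Distance at which the phase reaches 2*pi and the reading repeats."""
    return SPEED_OF_LIGHT_M_PER_S / (2.0 * mod_freq_hz)


def phase_to_distance_m(phase_rad: float, mod_freq_hz: float) -> float:
    """Convert a phase shift in [0, 2*pi) to a distance in metres."""
    return SPEED_OF_LIGHT_M_PER_S * phase_rad / (4.0 * math.pi * mod_freq_hz)


def distance_to_phase_rad(distance_m: float, mod_freq_hz: float) -> float:
    """Phase shift produced by a surface at the given distance (wrapped)."""
    phase = 4.0 * math.pi * mod_freq_hz * distance_m / SPEED_OF_LIGHT_M_PER_S
    return phase % (2.0 * math.pi)


def simulate_four_samples(phase_rad, amplitude=1.0, offset=2.0):
    """Hypothetical pixel samples taken at 0, 90, 180 and 270 degree offsets."""
    return [offset + amplitude * math.cos(phase_rad + k * math.pi / 2.0)
            for k in range(4)]


def phase_from_four_samples(c0, c1, c2, c3):
    """Recover the phase; the constant offset cancels in both differences."""
    return math.atan2(c3 - c1, c0 - c2) % (2.0 * math.pi)


if __name__ == "__main__":
    freq_hz = 20e6  # assumed modulation frequency
    samples = simulate_four_samples(distance_to_phase_rad(2.5, freq_hz))
    estimate = phase_to_distance_m(phase_from_four_samples(*samples), freq_hz)
    print(f"unambiguous range: {unambiguous_range_m(freq_hz):.2f} m")
    print(f"recovered distance: {estimate:.3f} m")
```

At 20 MHz the reading repeats about every 7.5 m, so a surface beyond that distance is indistinguishable from a nearer one unless a second modulation frequency is used.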
Comparison with structured light and stereo
ToF is one of three depth-imaging methods common in consumer and XR hardware. The other two are structured light, which projects a known pattern (for example a dense grid of dots) and computes depth from how the pattern deforms over the scene, and stereo vision, which uses two cameras and infers depth from the parallax between the two views, much as human binocular vision does.[9][10]
The methods trade off range, resolution, and computation differently. Structured light reaches the highest accuracy at short range, down to sub-millimetre depth resolution, which suits 3D scanning and face recognition, but it works best at close range, up to a few metres, and its projected pattern is washed out in bright sunlight. ToF works over a longer range and at high frame rate with low computational load, because each pixel yields a distance directly without heavy matching, at the cost of lower spatial detail than structured light. Stereo needs no active illumination, so it is power-efficient and unobtrusive and can use ordinary cameras, but it is computationally heavy, struggles on plain untextured surfaces, and its depth error grows with distance.[9][10][11]
| Method | How depth is found | Active light | Strengths | Weaknesses |
|---|---|---|---|---|
| Time-of-flight | Times the round trip of emitted light (pulse or phase shift) per pixel | Yes (near-IR) | Long range, high frame rate, low compute, works in the dark | Lower spatial detail; multipath and ambient-light errors |
| Structured light | Projects a known pattern; reads its deformation | Yes (projected IR pattern) | Very high short-range accuracy; fine detail | Short range; pattern washed out in sunlight; needs precise calibration |
| Stereo vision | Triangulates parallax between two cameras | No (passive) | No emitter; low power; cheap cameras | Heavy computation; fails on plain surfaces; error grows with distance |
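The claim that stereo depth error grows with distance can be illustrated with the standard relation for a rectified stereo pair, Z = f × B / d, where f is the focal length in pixels, B the baseline, and d the disparity in pixels. The Python sketch below uses that relation; the camera parameters and the quarter-pixel matching error are hypothetical values chosen only for illustration.

```python
def stereo_depth_m(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth of a point seen with the given disparity by a rectified stereo pair."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def stereo_depth_error_m(focal_px: float, baseline_m: float,
                         depth_m: float, disparity_error_px: float) -> float:
    """First-order depth error for a small disparity error at a given depth."""
    # Differentiating Z = f * B / d gives |dZ| = Z**2 / (f * B) * |dd|.
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_error_px


if __name__ == "__main__":
    focal_px, baseline_m = 600.0, 0.10  # hypothetical camera parameters
    for depth_m in (1.0, 2.0, 4.0):
        err = stereo_depth_error_m(focal_px, baseline_m, depth_m, 0.25)
        print(f"depth {depth_m:.1f} m -> error about {err * 100:.1f} cm")
```

Under this first-order model the error scales with the square of the distance, so doubling the distance quadruples the depth error for the same matching accuracy.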
Limitations
ToF cameras share several error sources. Strong background light, especially sunlight, can swamp the modulated signal of phase-based systems, which is one reason indoor use is more reliable than bright outdoor use for many iToF modules.[11][9] Running several ToF cameras in the same space can cause crosstalk unless they use different modulation frequencies or time-multiplexing.[3]
The most discussed artefact is multipath interference, where emitted light reaches a pixel by more than one route, for example bouncing between two walls or off a shiny or transparent object, so the returns blend and the measured distance no longer matches the true distance. It is most visible at corners and concave features and around reflective, glossy, glass, or curved surfaces, and it shows up in point clouds as distorted or displaced geometry.[12][7] Indirect, phase-based ToF is particularly prone to multipath, while direct ToF is more affected by specular surfaces.[3][7]
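For a phase-based camera, the blending of returns can be modelled by treating each return as a phasor and summing them; the pixel then reports the phase of the sum. The Python sketch below is a simplified two-path illustration. The path lengths, relative amplitudes, modulation frequency, and function names are all assumptions, and real scenes involve many more paths.

```python
import cmath
import math

SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def return_phasor(distance_m: float, amplitude: float, mod_freq_hz: float) -> complex:
    """Phasor of one return; distance_m is half of that path's total length."""
    phase = 4.0 * math.pi * mod_freq_hz * distance_m / SPEED_OF_LIGHT_M_PER_S
    return cmath.rect(amplitude, phase)


def measured_distance_m(paths, mod_freq_hz: float) -> float:
    """Distance reported by a pixel that receives several blended returns.

    paths is a list of (distance_m, amplitude) pairs.
    """
    total = sum(return_phasor(d, a, mod_freq_hz) for d, a in paths)
    phase = cmath.phase(total) % (2.0 * math.pi)
    return SPEED_OF_LIGHT_M_PER_S * phase / (4.0 * math.pi * mod_freq_hz)


if __name__ == "__main__":
    freq_hz = 20e6  # assumed modulation frequency
    direct_only = measured_distance_m([(1.0, 1.0)], freq_hz)
    # Hypothetical weaker second bounce whose longer route is equivalent to 1.6 m.
    with_bounce = measured_distance_m([(1.0, 1.0), (1.6, 0.3)], freq_hz)
    print(f"direct only: {direct_only:.3f} m, with bounce: {with_bounce:.3f} m")
```

In this example the extra bounce pulls the reading beyond the true 1.0 m, which is the displaced geometry seen in corners and near reflective surfaces.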
Use in VR and AR
In XR, a ToF depth camera supplies a live 3D model of the user's surroundings and of the user's hands, which the system uses for spatial mapping (building a mesh of the room), placing virtual content so it can be occluded by and rest on real surfaces, and hand tracking for controller-free input. Depth from a ToF sensor is commonly combined with the headset's cameras and inertial sensors through sensor fusion and SLAM to track the device and reconstruct the environment.[4][13]
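A first step in turning a depth map into geometry for spatial mapping is back-projecting each pixel into a 3D point. The Python sketch below assumes a simple pinhole camera model, assumes each depth value is the distance along the optical axis (some ToF cameras report radial distance instead), and treats non-positive values as invalid. The intrinsic parameters fx, fy, cx, cy and the tiny depth map are hypothetical.

```python
def depth_map_to_points(depth_m, fx: float, fy: float, cx: float, cy: float):
    """Back-project a depth map (rows of metres) into (x, y, z) camera-space points."""
    points = []
    for v, row in enumerate(depth_m):
        for u, z in enumerate(row):
            if z <= 0:
                continue  # skip pixels with no valid depth reading
            # Pinhole model: pixel offset from the principal point scales with depth.
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


if __name__ == "__main__":
    # Hypothetical 2 x 2 depth map in metres; 0.0 marks an invalid pixel.
    depth = [[0.0, 2.0],
             [1.0, 1.0]]
    for point in depth_map_to_points(depth, fx=2.0, fy=2.0, cx=0.5, cy=0.5):
        print(point)
```

The resulting point cloud is in the camera's own coordinate frame; a tracking system would still need to transform it by the device pose before merging it into a room mesh.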
Kinect
The Kinect motion sensor for Microsoft's Xbox consoles is the most widely cited example of consumer ToF, and its two generations illustrate the difference between the depth methods. The original Kinect (2010) used structured light, with depth-sensing technology developed by PrimeSense. The second generation, the Kinect for Xbox One (also called Kinect v2), released alongside the Xbox One in November 2013, switched to a time-of-flight depth sensor; Microsoft's 2010 acquisition of Canesta brought ToF expertise to the project.[14][15] The Kinect v2 depth camera has a 512 × 424 CMOS infrared ToF sensor running at about 30 Hz alongside a 1080p RGB camera, measures depth from roughly 0.8 to 4.2 m, and uses multiple modulation frequencies (about 10 to 130 MHz): higher frequencies give finer depth precision but a shorter unambiguous range, so readings at several frequencies are combined to unwrap the phase.[16][17] Although it was a games accessory, the Kinect v2 was widely repurposed for research, 3D scanning, and VR experiments because it delivered live depth at low cost.
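The reason a phase-based camera combines several modulation frequencies can be sketched in Python. A high frequency wraps around after a short distance, a low one after a long distance, and comparing the two resolves which wrap the high-frequency reading belongs to. The brute-force search below is a simplified, noise-free illustration and not the Kinect's actual algorithm; the 80 MHz and 16 MHz pair, the 9 m search limit, and the function names are assumptions.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def unambiguous_range_m(mod_freq_hz: float) -> float:
    """Distance after which a phase-based reading at this frequency repeats."""
    return SPEED_OF_LIGHT_M_PER_S / (2.0 * mod_freq_hz)


def wrapped_distance_m(true_distance_m: float, mod_freq_hz: float) -> float:
    """Ideal (noise-free) reading a single-frequency measurement would give."""
    return true_distance_m % unambiguous_range_m(mod_freq_hz)


def unwrap_two_frequencies(wrapped_hi_m, freq_hi_hz, wrapped_lo_m, freq_lo_hz,
                           max_range_m):
    """Pick the wrap counts whose candidate distances agree best.

    Returns the candidate from the higher frequency, assumed the more precise.
    """
    range_hi = unambiguous_range_m(freq_hi_hz)
    range_lo = unambiguous_range_m(freq_lo_hz)
    best_mismatch, best_distance = None, None
    for n_hi in range(int(max_range_m / range_hi) + 1):
        candidate_hi = wrapped_hi_m + n_hi * range_hi
        for n_lo in range(int(max_range_m / range_lo) + 1):
            candidate_lo = wrapped_lo_m + n_lo * range_lo
            mismatch = abs(candidate_hi - candidate_lo)
            if best_mismatch is None or mismatch < best_mismatch:
                best_mismatch, best_distance = mismatch, candidate_hi
    return best_distance


if __name__ == "__main__":
    hi_hz, lo_hz = 80e6, 16e6  # hypothetical frequency pair
    true_m = 4.2
    estimate = unwrap_two_frequencies(wrapped_distance_m(true_m, hi_hz), hi_hz,
                                      wrapped_distance_m(true_m, lo_hz), lo_hz,
                                      max_range_m=9.0)
    print(f"80 MHz alone reads {wrapped_distance_m(true_m, hi_hz):.3f} m")
    print(f"unwrapped estimate: {estimate:.3f} m")
```

At the ends of the quoted band, 10 MHz repeats about every 15 m and 130 MHz about every 1.15 m, which is why a single high frequency cannot cover a room on its own.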
Microsoft HoloLens 2 and Azure Kinect
Microsoft carried its ToF work forward into the Microsoft HoloLens 2 AR headset and the standalone Azure Kinect Developer Kit, which share the same depth-sensing module. The HoloLens 2 depth camera uses active infrared illumination and phase-based time-of-flight, and it runs in two modes: a high-frame-rate near mode (about 45 fps) used for hand tracking, and a low-frame-rate far mode (about 1 to 5 fps) used for spatial mapping of the wider environment.[4][18] Microsoft discontinued the Azure Kinect Developer Kit hardware in 2023 but continued to license the underlying ToF depth technology to partners such as Analog Devices and Orbbec, so the sensor design lived on beyond the original product.[19][20]
Magic Leap
Both Magic Leap AR headsets use a ToF depth sensor for world sensing. The original Magic Leap One has an infrared depth projector above the nose bridge for depth sensing, and the device performs real-time environment meshing and hand tracking.[13][21] The Magic Leap 2 (2022) uses an indirect time-of-flight depth sensor co-developed by Infineon and pmdtechnologies, based on Infineon's REAL3 image-sensor family, which builds a 3D map of the environment and captures hands and objects for gesture interaction; the partners tuned the sensor for low power consumption to reduce heat and extend battery life.[22][23]
Smartphones and tablets
Several AR-capable phones have included a rear ToF depth camera to improve camera autofocus and to support AR by helping apps gauge the size and shape of a room or object. Samsung used rear ToF modules on phones including the Galaxy S20+ and S20 Ultra.[24] Apple took the related direct-ToF route on its higher-end iPhone Pro and iPad Pro models, adding a LiDAR scanner starting with the 2020 iPad Pro and iPhone 12 Pro to speed up AR placement and improve low-light focus.[25] Earlier, Google's Project Tango platform paired motion-tracking and depth cameras on phones and tablets for AR before being retired. ToF and LiDAR depth on mobile devices feed the same XR tasks (occlusion, surface detection, and meshing) that headsets perform, just on a phone form factor.[24][25]
References
- [1] "What are depth-sensing cameras? How do they work?". https://www.e-consystems.com/blog/camera/technology/what-are-depth-sensing-cameras-how-do-they-work/.
- [2] "What is a Time of Flight Sensor and How does a ToF Sensor work?". 2020-01-08. https://www.seeedstudio.com/blog/2020/01/08/what-is-a-time-of-flight-sensor-and-how-does-a-tof-sensor-work/.
- [3] "An Overview of Depth Cameras and Range Scanners Based on Time-of-Flight Technologies". 2020. https://arxiv.org/abs/2012.06772.
- [4] "Microsoft HoloLens 2: Improved Research Mode to facilitate computer vision research". https://www.microsoft.com/en-us/research/blog/microsoft-hololens-2-improved-research-mode-to-facilitate-computer-vision-research/.
- [5] "What Is a Time of Flight Sensor and How Does ToF Work?". https://www.makeuseof.com/what-is-time-of-flight-sensor-how-does-tof-work/.
- [6] "Direct time-of-flight (D-TOF) image sensor for LiDAR applications". https://www.epfl.ch/labs/aqua/research/lidar/tof-lidar/.
- [7] "Multipath interference of indirect Time of Flight (iToF)". https://industry.goermicro.com/blog/tech-briefs/multipath-interference-of-indirect-time-of-flight-itof.html.
- [8] "LiDAR, optical distance and time of flight sensors". https://ams-osram.com/innovation/technology/depth-and-3d-sensing/lidar-optical-distance-and-time-of-flight-sensors.
- [9] "A Brief Analysis of the Principles of Depth Cameras: Structured Light, TOF, and Stereo Vision". https://wiki.dfrobot.com/brief_analysis_of_camera_principles.
- [10] "Comparing Three Prevalent 3D Imaging Technologies: ToF, Structured Light and Binocular Stereo Vision". https://www.revopoint3d.com/comparing-three-prevalent-3d-imaging-technologies-tof-structured-light-and-binocular-stereo-vision/.
- [11] "Time-of-Flight (ToF) Cameras vs. other 3D Depth Mapping Cameras". https://www.e-consystems.com/blog/camera/technology/how-time-of-flight-tof-compares-with-other-3d-depth-mapping-technologies/.
- [12] "What is multipath interference? How to minimize it in Time-of-Flight cameras?". https://www.e-consystems.com/blog/camera/technology/what-is-multipath-interference-how-to-minimize-it-in-time-of-flight-cameras/.
- [13] "Real-time World Sensing". https://developer-docs.magicleap.cloud/docs/guides/features/spatial-mapping/.
- [14] "Kinect". https://en.wikipedia.org/wiki/Kinect.
- [15] "Canesta". https://en.wikipedia.org/wiki/Canesta.
- [16] "Metrological Qualification of the Kinect V2 Time-of-Flight Camera". 2018. https://link.springer.com/chapter/10.1007/978-3-319-91761-0_4.
- [17] "A metrological characterization of the Kinect V2 time-of-flight camera". 2016. https://www.sciencedirect.com/science/article/abs/pii/S0921889015002195.
- [18] "Here are the main specs for Microsoft's HoloLens 2 headset, Azure Kinect". https://www.onmsft.com/news/here-are-the-main-specs-for-microsofts-hololens-2-headset-azure-kinect.
- [19] "Microsoft Kills Off Azure Kinect Products". 2023-08-17. https://rcpmag.com/articles/2023/08/17/microsoft-kills-azure-kinect.aspx.
- [20] "Azure Kinect". https://en.wikipedia.org/wiki/Azure_Kinect.
- [21] "Magic Leap One Teardown". https://www.ifixit.com/Teardown/Magic+Leap+One+Teardown/112245.
- [22] "Infineon and pmdtechnologies develop 3D depth-sensing technology for Magic Leap 2". 2022-05-30. https://www.infineon.com/cms/en/about-infineon/press/press-releases/2022/INFPSS202205-086.html.
- [23] "Infineon and pmdtechnologies develop 3D depth-sensing technology for Magic Leap 2". 2022-05-30. https://pmdtec.com/en/company/news/infineon-and-pmdtechnologies-develop-3d-depth-sensing-technology-for-magic-leap-2/.
- [24] "LiDAR vs. 3D ToF Sensors: How Apple Is Making AR Better for Smartphones". https://ios.gadgethacks.com/news/lidar-vs-3d-tof-sensors-apple-is-making-ar-better-for-smartphones-0280778/.
- [25] "6 ways Android phones could use the iPhone 12 Pro's LiDAR scanner tech". https://www.techradar.com/news/6-ways-android-phones-could-use-the-iphone-12-pros-lidar-scanner-tech.