Video see-through
- See also: Optical see-through head-mounted display and Passthrough
Video see-through (often abbreviated VST) is an approach to augmented reality (AR) and mixed reality (MR) in which one or more cameras capture a live view of the real world, that video feed is combined with computer-generated imagery, and the merged result is shown to the user on an opaque display. The wearer never looks directly at the physical surroundings; everything seen is a reconstructed video image. This contrasts with the optical see-through approach, in which the user views the real world directly through a transparent combiner that reflects virtual content into the eyes.[1][2]
Because the displayed scene is fully digital, the system can composite virtual objects on top of, or behind, real ones and can adjust the appearance of the real-world image. The same capability is the basis of the consumer feature commonly marketed as passthrough: a virtual reality headset with outward-facing cameras can switch from a fully synthetic VR scene to a video reconstruction of the room, then blend digital elements into it. Video see-through is the display method used by the Meta Quest 3 and the Apple Vision Pro, among other devices.[3][4]
How it works
A video see-through system has three stages: capture, composition, and display. Outward-facing cameras mounted on the head-mounted display record the scene in front of the user. The captured frames are passed to a processor that renders the virtual content and combines it with the camera image, then the composite is sent to the displays in front of each eye.[1] For a binocular stereoscopic result, the device needs a separate camera view for each eye, or it must synthesize per-eye views from the available cameras.[5]
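The composition stage can be illustrated with a minimal sketch. It assumes the virtual content has already been rendered to an RGBA layer aligned with the camera frame, and it uses a plain "over" blend with straight (non-premultiplied) alpha. The function name `composite_over` and the nested-list image representation are illustrative assumptions, not the method of any particular headset.

```python
from typing import List, Tuple

RGB = Tuple[float, float, float]
RGBA = Tuple[float, float, float, float]


def composite_over(camera: List[List[RGB]],
                   virtual: List[List[RGBA]]) -> List[List[RGB]]:
    """Blend a virtual RGBA layer over a camera RGB frame, pixel by pixel.

    Channel values are floats in [0, 1]; alpha is straight, not premultiplied.
    """
    if len(camera) != len(virtual):
        raise ValueError("camera frame and virtual layer must have the same height")
    out: List[List[RGB]] = []
    for cam_row, virt_row in zip(camera, virtual):
        if len(cam_row) != len(virt_row):
            raise ValueError("camera frame and virtual layer must have the same width")
        row: List[RGB] = []
        for (cr, cg, cb), (vr, vg, vb, alpha) in zip(cam_row, virt_row):
            keep = 1.0 - alpha  # fraction of the camera pixel that stays visible
            row.append((alpha * vr + keep * cr,
                        alpha * vg + keep * cg,
                        alpha * vb + keep * cb))
        out.append(row)
    return out


# Capture stand-in: a 1x2 grey camera frame (hypothetical data).
camera_frame = [[(0.5, 0.5, 0.5), (0.5, 0.5, 0.5)]]
# Rendered virtual layer: first pixel empty, second pixel opaque red.
virtual_layer = [[(0.0, 0.0, 0.0, 0.0), (1.0, 0.0, 0.0, 1.0)]]
# The composite is what would be sent on to the display stage.
display_frame = composite_over(camera_frame, virtual_layer)
```

Where the virtual layer is fully transparent the camera pixel passes through unchanged, and where it is opaque the virtual pixel replaces it. A real pipeline would do this per eye on the GPU.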
The cameras almost never sit exactly where the user's eyes are. They are typically mounted on the front of the headset, ahead of and offset from the pupils, so the raw camera image shows the world from a slightly displaced viewpoint. To correct this, the system reprojects, or warps, the camera image to approximate the geometry the eyes would have seen. Accurate reprojection requires knowledge of the distance to objects in the scene, so headsets that emphasize passthrough quality estimate per-pixel depth, sometimes with a dedicated depth sensor, and use that depth map to warp the color image into each eye's correct perspective.[6]
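The dependence of reprojection on depth can be seen in a small sketch for a single pixel. It assumes an ideal pinhole model, identical intrinsics for the camera and the virtual eye view, and a pure translation between them (no rotation, no lens distortion). The function `reproject_pixel` and all numeric values are hypothetical and chosen only for illustration.

```python
from typing import Tuple


def reproject_pixel(u: float, v: float, depth_m: float,
                    focal_px: float, cx: float, cy: float,
                    cam_offset_m: Tuple[float, float, float] = (0.0, 0.0, 0.0)
                    ) -> Tuple[float, float]:
    """Map a camera pixel with known depth to the pixel the eye would see.

    cam_offset_m is the camera position expressed in the eye's frame
    (x right, y down, z forward), in metres. Assumption: both views share
    the same orientation and the same pinhole intrinsics.
    """
    if depth_m <= 0:
        raise ValueError("depth must be positive")
    # Back-project the pixel into a 3D point in the camera frame.
    x = (u - cx) * depth_m / focal_px
    y = (v - cy) * depth_m / focal_px
    z = depth_m
    # Express the same point in the eye frame (translation only).
    tx, ty, tz = cam_offset_m
    xe, ye, ze = x + tx, y + ty, z + tz
    if ze <= 0:
        raise ValueError("point lies behind the eye")
    # Project the point into the eye's image.
    return (focal_px * xe / ze + cx, focal_px * ye / ze + cy)


# Hypothetical setup: 500 px focal length, principal point at (320, 240),
# camera mounted 3 cm to the side of the eye.
near = reproject_pixel(320, 240, 0.5, 500, 320, 240, (0.03, 0.0, 0.0))
far = reproject_pixel(320, 240, 5.0, 500, 320, 240, (0.03, 0.0, 0.0))
```

With these assumed numbers the same camera pixel moves by about 30 pixels when the surface is 0.5 m away but only about 3 pixels at 5 m, which is why a wrong depth estimate distorts nearby objects most.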
Two constraints dominate the engineering. The first is end-to-end latency: the delay between light entering a camera and the corresponding light leaving the display, often called photon-to-photon or see-through latency. Every step (camera exposure, transfer, processing, and display scan-out) adds delay, and a large total makes the real world appear to lag behind the user's head motion, which is uncomfortable and a known cause of motion sickness.[7] The second is image fidelity: the resolution, field of view, dynamic range, and color of the reconstructed world are limited by the cameras and the displays rather than by the human eye, so the passthrough view is generally lower in detail than direct sight.[1]
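The latency constraint can be made concrete with a short worked calculation. The sketch below sums per-stage delays into a photon-to-photon figure and converts it into the angle by which the displayed world trails the head during a turn. The stage values and the head speed are made-up illustrative numbers, not measurements of any device, and the function names are hypothetical.

```python
from typing import Dict


def photon_to_photon_ms(stages_ms: Dict[str, float]) -> float:
    """Total see-through latency as the sum of the pipeline stage delays."""
    return sum(stages_ms.values())


def angular_lag_deg(head_speed_deg_s: float, latency_ms: float) -> float:
    """Angle by which the displayed world trails the head during a turn."""
    return head_speed_deg_s * latency_ms / 1000.0


# Hypothetical per-stage delays in milliseconds (illustrative only).
stages = {
    "camera_exposure": 4.0,
    "transfer": 3.0,
    "processing": 8.0,
    "display_scan_out": 5.0,
}
total_ms = photon_to_photon_ms(stages)
# Assumed head rotation speed of 100 degrees per second.
lag_deg = angular_lag_deg(100.0, total_ms)
```

Under these assumptions the total is 20 ms and the displayed scene trails a 100 degrees-per-second head turn by 2 degrees. Halving the latency halves the lag, which is why each stage of the pipeline is a target for optimization.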
Comparison with optical see-through
The central distinction between the two paradigms is whether the user receives an aided (video) or unaided (optical) view of the real world. In optical see-through, real light reaches the eye directly through a combiner and only the virtual content is electronic; in video see-through, both the real and the virtual content are electronic and pass through the same display.[2][1]
Ronald Azuma's 1997 survey and the Rolland and Fuchs medical-visualization study set out the trade-offs that are still cited.[2][1] Reported advantages of video see-through include the ability to digitally match the brightness and contrast of real and virtual content because both are rendered through the same pipeline; the ability to produce correct mutual occlusion, since the system has (or estimates) depth for both the real scene and the virtual objects and can decide which is in front; the option to delay or buffer the real-world video so that it is synchronized with the virtual content, avoiding the registration mismatch that occurs when virtual graphics lag behind a directly viewed real world; and access to compositing techniques such as chroma keying.[1]
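Mutual occlusion reduces to a per-pixel depth comparison once both depth maps exist. The following is a minimal sketch assuming aligned real and virtual images, depths in metres, and `None` marking pixels with no virtual content. The function name `composite_with_occlusion` and the data layout are illustrative assumptions.

```python
from typing import List, Optional, Tuple

RGB = Tuple[float, float, float]


def composite_with_occlusion(real_rgb: List[List[RGB]],
                             real_depth_m: List[List[float]],
                             virt_rgb: List[List[RGB]],
                             virt_depth_m: List[List[Optional[float]]]
                             ) -> List[List[RGB]]:
    """Pick, per pixel, whichever of the real or virtual surface is nearer.

    A virtual depth of None means no virtual content covers that pixel.
    """
    out: List[List[RGB]] = []
    for r_row, rd_row, v_row, vd_row in zip(real_rgb, real_depth_m,
                                            virt_rgb, virt_depth_m):
        row: List[RGB] = []
        for r, rd, v, vd in zip(r_row, rd_row, v_row, vd_row):
            virtual_in_front = vd is not None and vd < rd
            row.append(v if virtual_in_front else r)
        out.append(row)
    return out


# Hypothetical 1x3 scene: a grey wall 1 m away and a red virtual object.
GREY, RED = (0.5, 0.5, 0.5), (1.0, 0.0, 0.0)
result = composite_with_occlusion(
    real_rgb=[[GREY, GREY, GREY]],
    real_depth_m=[[1.0, 1.0, 1.0]],
    virt_rgb=[[RED, RED, RED]],
    virt_depth_m=[[0.5, 2.0, None]],  # in front, behind, absent
)
```

In the example the virtual object is drawn only where it is closer than the wall. Where it lies behind the wall or is absent, the camera pixel is kept, so the real surface hides the virtual one.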
The disadvantages also follow from the all-digital path. The real world is seen at the resolution and field of view of the cameras and displays, not at the fidelity of the eye. The offset between camera and eye introduces a viewpoint displacement that must be corrected, and imperfect correction leaves parallax and distortion errors, most visible for objects close to the user. Latency is added to the real view, whereas optical see-through adds none. The device is also a single point of failure: if the cameras, processor, or displays stop, the wearer of an opaque video see-through headset sees nothing, whereas an optical see-through user still has a direct view.[1] Some authors have proposed hybrid optical and video designs to capture benefits of both.[1]
History
The technique dates to the early 1990s, when researchers attached a video camera to an otherwise opaque head-mounted display to give it see-through capability. In 1992, Michael Bajura, Henry Fuchs, and Ryutarou Ohbuchi of the University of North Carolina at Chapel Hill demonstrated merging virtual objects with the real world by mounting a small video camera in front of a head-mounted display and compositing live ultrasound imagery into the camera view so that it appeared inside the patient's body.[8] In 1993, Emily Edwards, Jannick Rolland, and Kurtis Keller described a dedicated video see-through design in which an optically opaque HMD is given see-through capability by mounting stereoscopic video cameras on the outside of the helmet and projecting their views to the screens inside.[5]
Azuma's 1997 survey formalized the comparison between video and optical blending, and Rolland and Fuchs revisited it in detail for medical use in 2000. For roughly two decades video see-through remained mostly a research and specialist tool, because cameras, processing, and displays could not yet deliver a convincing real-time reconstruction at consumer cost.[2][1]
Use in modern headsets
The approach reached consumers through VR headsets that added passthrough cameras. Early standalone headsets offered low-resolution grayscale passthrough intended mainly for safety and room orientation. The Meta Quest 3, released on 10 October 2023, shipped with two 4-megapixel RGB color cameras feeding full-color passthrough, which Meta described as a large step up from the black-and-white passthrough of the Meta Quest 2. The Quest 3 also carries a depth sensor that it uses to build a depth map and reproject the color image; at launch, reviewers noted visible warping of hands and nearby objects caused by the low resolution of that depth estimate, and Meta reduced the effect through later software updates.[3][6]
The Apple Vision Pro, announced in 2023 and first sold in 2024, is built around video see-through as its primary way of showing the real world. Apple's R1 chip handles input from the headset's cameras and sensors and targets a see-through latency of about 12 milliseconds. Independent photon-to-photon measurement by the test firm OptoFidelity found roughly 11 milliseconds on the Vision Pro, markedly lower than the 35 to 40 milliseconds it measured on the Quest 3, Quest Pro, and HTC Vive XR Elite.[7][4] In October 2025 Apple refreshed the device with the M5 chip; the updated model can raise the passthrough refresh rate to 120 Hz and renders about 10 percent more pixels on its Sony micro-OLED displays, and Apple continued to sell it at 3,499 US dollars rather than discontinuing the line.[9]
At the high end of the professional market, Varjo's mixed-reality headsets use video see-through tuned for image quality. Varjo states that the Varjo XR-4 Focal Edition reaches about 51 pixels per degree in passthrough and adds a gaze-driven autofocus camera system, which the company describes as the highest-quality passthrough of any shipping or announced headset.[10]
References
- ↑ 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 Rolland, Jannick P.; Fuchs, Henry (2000). "Optical Versus Video See-Through Head-Mounted Displays in Medical Visualization". Presence: Teleoperators and Virtual Environments. 9 (3): 287-309. https://direct.mit.edu/pvar/article/9/3/287/18346/Optical-Versus-Video-See-Through-Head-Mounted.
- ↑ 2.0 2.1 2.2 2.3 Azuma, Ronald (1997). "A Survey of Augmented Reality". Presence: Teleoperators and Virtual Environments. 6 (4): 355-385. https://www.cs.unc.edu/~azuma/azuma_AR.html. Retrieved 2026-06-15.
- ↑ 3.0 3.1 "Meta Releases the Quest 3, with Mixed Reality Full-Color Passthrough". 2023. https://www.bhphotovideo.com/explora/video/news/meta-releases-the-quest-3-with-mixed-reality-full-color-passthrough.
- ↑ 4.0 4.1 "Here is how Apple Vision Pro takes mixed reality to the next level". 2023. https://mixed-news.com/en/heres-how-apple-vision-pro-takes-mixed-reality-to-the-next-level/.
- ↑ 5.0 5.1 Edwards, Emily K.; Rolland, Jannick P.; Keller, Kurtis P. (1993). "Video see-through design for merging of real and virtual environments". IEEE Virtual Reality Annual International Symposium (VRAIS). https://www.semanticscholar.org/paper/214d176347c8870497f922a7dbea750ebbe5f6f5.
- ↑ 6.0 6.1 "New Update Fixes One of Quest 3's Most Noticeable Issues". 2024. https://roadtovr.com/quest-3-passthrough-warping-fix-update-v66/.
- ↑ 7.0 7.1 "Apple Vision Pro Benchmark Test 1: See-Through Latency, Photon-to-Photon". 2024. https://www.optofidelity.com/insights/blogs/apple-vision-pro-benchmark-test-1-see-through-latency-photon-to-photon.
- ↑ Bajura, Michael; Fuchs, Henry; Ohbuchi, Ryutarou (1992). "Merging virtual objects with the real world: seeing ultrasound imagery within the patient". ACM SIGGRAPH Computer Graphics. 26. pp. 203-210. https://www.semanticscholar.org/paper/0bbee958655fbc9ffe0689909a3ed89d260ae55d.
- ↑ Template:Cite news
- ↑ "Varjo XR-4 Promises Ultra High Resolution Passthrough". 2023. https://www.uploadvr.com/varjo-xr-4/.