Passthrough

The fundamental principle of passthrough involves a real-time processing pipeline (a code sketch follows the list):

# '''Capture:''' One or more outward-facing digital cameras mounted on the headset capture video of the external world. Early or basic systems might use a single camera (providing a monoscopic view), while more advanced systems use two or more cameras to capture [[stereoscopic]] video, enabling [[depth perception]].<ref name="StereoPassthrough">[https://ieeexplore.ieee.org/document/9191148 Example paper discussing stereoscopic passthrough challenges]</ref> Modern systems often use a combination of [[RGB]] color cameras and monochrome (grayscale) sensors for different purposes (for example, capturing color data vs. motion/detail).<ref name="MixedNews_Cambria">MIXED News – Project Cambria: Meta explains new passthrough technology (Tomislav Bezmalinović, May 16, 2022)</ref>
# '''Processing:''' The captured video footage is sent to the headset's [[processor]] (either an onboard [[System on a Chip|SoC]] or a connected PC's [[GPU]]). This stage is computationally intensive and critical for a usable and comfortable experience. It typically involves several steps:
#* '''Rectification/Undistortion:''' Correcting the [[lens distortion]] inherent in the wide-angle cameras typically used to maximize [[field of view|FOV]].
#* '''Reprojection/Warping:''' Adjusting the captured image perspective to align with the user's eye position inside the headset, rather than the camera's physical position on the outside. This difference in viewpoint causes [[parallax]], and correcting it ("perspective correction") is crucial for accurate spatial representation, correct scale perception, and minimizing [[motion sickness]].<ref name="PassthroughChallengesUploadVR">[https://uploadvr.com/passthrough-ar-technical-challenges/ Passthrough AR: The Technical Challenges of Blending Realities] - UploadVR article discussing latency, distortion, etc.</ref><ref name="KGuttag_Align">KGOnTech (Karl Guttag) – Perspective Correct Passthrough (Sept 26, 2023)</ref> Algorithms based on [[Computer Vision]] and potentially [[Inertial Measurement Unit|IMU]] sensor data are used. Some modern headsets, like the [[Meta Quest Pro]] and [[Meta Quest 3]], employ [[Machine Learning]] or [[Neural Network|neural networks]] to improve the realism and accuracy of this reconstruction.<ref name="QuestProPassthrough">[https://www.meta.com/blog/quest/meta-reality-passthrough-quest-pro/ Meta Blog: Inside Meta Reality and Passthrough on Quest Pro]</ref>
#* '''[[Sensor Fusion]]:''' Combining data from multiple cameras (for example, fusing monochrome detail with RGB color<ref name="MixedNews_Cambria"/>) and integrating tracking data (for example, from [[inside-out tracking]] sensors or [[depth sensor]]s) to ensure the passthrough view remains stable, depth-correct, and aligned with the user's head movements.
#* '''Color Correction & Enhancement:''' Adjusting colors, brightness, and contrast to appear more natural, especially under varying lighting conditions. This can also involve [[Artificial Intelligence|AI]]-based denoising or upscaling.<ref name="UploadVR_Q3Review">UploadVR – Quest 3 Review: Excellent VR With Limited MR (David Heaney, Oct 9, 2023)</ref>
# '''Display:''' The processed video feed is rendered onto the headset's internal [[display|displays]], replacing or being overlaid upon the virtual content. The primary goal is to achieve this entire pipeline with minimal [[latency (engineering)|latency]] (ideally under 20 milliseconds<ref name="LatencyThreshold">[https://research.nvidia.com/publication/2016-07_Latency-Requirements-Plausible-Interaction-Augmented-and-Virtual-Reality Latency Requirements for Plausible Interaction in Augmented and Virtual Reality] - Research discussing latency impact.</ref>) to avoid discomfort and maintain realism.
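
A minimal, illustrative sketch of this capture, process, and display loop, written with [[OpenCV]] in Python. The intrinsic matrix, distortion coefficients, depth value, and eye offset below are placeholder values rather than real headset calibration data, and a webcam stands in for the headset's outward-facing camera:

<syntaxhighlight lang="python">
import time
import cv2
import numpy as np

# Placeholder calibration for one wide-angle passthrough camera (assumed values).
K = np.array([[400.0, 0.0, 320.0],
              [0.0, 400.0, 240.0],
              [0.0, 0.0, 1.0]])            # pinhole intrinsics (fx, fy, cx, cy)
dist = np.array([-0.3, 0.1, 0.0, 0.0])     # radial/tangential distortion terms

def reproject_to_eye(frame, depth_m, eye_offset_px):
    """Crude perspective correction: shift pixels by parallax ~ offset / depth.
    Real systems warp per pixel using a dense depth map and the full
    camera-to-eye transform; a constant shift stands in for that here."""
    shift = eye_offset_px / max(depth_m, 0.1)
    M = np.float32([[1, 0, shift], [0, 1, 0]])   # horizontal translation
    return cv2.warpAffine(frame, M, (frame.shape[1], frame.shape[0]))

cap = cv2.VideoCapture(0)                  # webcam as stand-in headset camera
while cap.isOpened():
    t0 = time.perf_counter()
    ok, frame = cap.read()                           # 1. Capture
    if not ok:
        break
    frame = cv2.undistort(frame, K, dist)            # 2a. Rectification
    frame = reproject_to_eye(frame, 1.5, 40.0)       # 2b. Reprojection
    ms = (time.perf_counter() - t0) * 1000           # processing latency so far
    cv2.putText(frame, f"{ms:.1f} ms", (10, 30),
                cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
    cv2.imshow("passthrough", frame)                 # 3. Display
    if cv2.waitKey(1) == 27:                         # Esc exits
        break
cap.release()
cv2.destroyAllWindows()
</syntaxhighlight>

Note that the measured time covers only the processing stage; a real latency budget must also include camera exposure, readout, and display scan-out.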


===Color Passthrough===
Uses [[RGB]] color cameras for a full-color view of the real world, greatly enhancing realism and enabling use cases like reading phone screens or interacting with colored objects. The first widely available consumer example was the Meta Quest Pro.<ref name="MixedNews_Cambria"/> Quality varies significantly with camera resolution, processing, and calibration (for example, Quest 3 offers roughly 10x the passthrough pixels of Quest 2).<ref name="UploadVR_specs">UploadVR – Quest 3 Specs Compared To Quest 2 & Apple Vision Pro (David Heaney, Sep 27, 2023)</ref> High-quality color passthrough (for example, the Varjo XR series and Vision Pro) aims for near-photorealism.<ref name="Skarredghost_Varjo"/><ref name="VisionProPassthrough"/> It requires more powerful hardware and sophisticated software.


===Monoscopic vs. Stereoscopic===
*'''Monoscopic (2D):''' Uses a single camera view (or identical views) for both eyes (for example, the original HTC Vive and the initial Pico 4 implementation<ref name="Reddit_PicoMono"/>). Lacks [[binocular disparity]], resulting in a "flat" image without true depth perception. Scale and distance can feel incorrect or uncomfortable.
*'''Stereoscopic (3D):''' Uses two distinct camera viewpoints (one per eye, or reconstructed dual views) to create a 3D effect with depth perception. Requires cameras positioned roughly at the user's [[interpupillary distance]] (IPD) and careful calibration/reprojection (see the sketch below). Essential for comfortable MR and accurate spatial interaction. Implemented in the Rift S, PSVR2 (black-and-white stereo), Quest Pro, Quest 3, Vision Pro, the Varjo XR series, and others.<ref name="UploadVR_Q3Review"/> Achieving correct scale and geometry is key to avoiding discomfort.<ref name="UploadVR_Q3Review_MR"/>
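
The disparity such a stereo rig must reproduce follows the standard pinhole relation <math>d = f B / Z</math> (focal length times baseline, divided by depth). A small sketch, assuming an illustrative 400 px focal length and a 63 mm camera baseline matching a typical IPD:

<syntaxhighlight lang="python">
def disparity_px(focal_px: float, baseline_m: float, depth_m: float) -> float:
    """Pixel disparity between left/right views for a point at depth_m."""
    return focal_px * baseline_m / depth_m

# Nearby objects need large disparity; distant ones converge toward zero.
for z in (0.5, 1.0, 2.0, 5.0):
    print(f"{z:4.1f} m -> {disparity_px(400.0, 0.063, z):5.1f} px")
</syntaxhighlight>

This is also why monoscopic passthrough feels flat: both eyes receive the same zero-disparity view regardless of depth.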


*Virtual lighting affecting the appearance of the real world within the passthrough view (and vice versa).
*Using [[Machine Learning|ML]] for scene segmentation (identifying walls, floors, furniture, people) to enable complex interactions such as depth-correct occlusion (see the sketch below).<ref name="XRToday_def"/>
This requires high-quality color, stereoscopic depth, active depth sensing, low latency, and sophisticated rendering techniques (for example, real-time lighting estimation and environmental mapping). Devices like Quest 3 and Vision Pro heavily emphasize these capabilities.<ref name="UploadVR_Q3Review_MR"/><ref name="Verge_VisionPro"/>
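
A minimal sketch of the depth-based occlusion that such segmentation and depth data enable: a virtual object is drawn only where it is nearer than the real scene, so real furniture can hide virtual content. All arrays here are synthetic stand-ins for sensor output:

<syntaxhighlight lang="python">
import numpy as np

h, w = 480, 640
real_rgb = np.zeros((h, w, 3), np.uint8)        # passthrough color frame
real_depth = np.full((h, w), 3.0, np.float32)   # metres, e.g. from a depth sensor
real_depth[:, 400:] = 1.0                       # a "wall" 1 m away on the right

virt_rgb = np.zeros((h, w, 3), np.uint8)
virt_rgb[200:300, 300:500] = (0, 200, 255)      # a rendered virtual object
virt_depth = np.full((h, w), np.inf, np.float32)
virt_depth[200:300, 300:500] = 2.0              # the object sits 2 m away

visible = virt_depth < real_depth               # per-pixel depth test
composite = real_rgb.copy()
composite[visible] = virt_rgb[visible]          # occluded pixels stay real
</syntaxhighlight>

In this toy scene, the object's right half falls behind the 1 m wall and is clipped, while its left half correctly overlays the 3 m background.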


==Technical Challenges==
Engineers employ various techniques to address passthrough challenges:


*'''Multi-Camera Sensor Fusion:''' Using multiple cameras with different strengths (for example, high-resolution RGB for color, fast monochrome for low-latency motion/detail) and fusing their data computationally (see the first sketch after this list).<ref name="MixedNews_Cambria"/> Overlapping camera views help compute stereo depth and increase effective FOV.<ref name="XRToday_def"/>
*'''Active Depth Sensing:''' Incorporating dedicated depth sensors (IR time-of-flight, structured light, [[LiDAR]]) provides robust, real-time 3D geometry information about the environment, improving reprojection accuracy, occlusion handling, and spatial anchoring.<ref name="UploadVR_Q3Review"/><ref name="VisionProPassthrough"/> This enables features like quick room meshing via APIs (for example, Meta's Spatial Anchors and Apple's ARKit/RoomPlan).
*'''[[Machine Learning]] Enhancements:''' Using AI/ML for various tasks:
**Image upscaling and denoising to improve clarity, especially in low light.
**Improving [[Simultaneous localization and mapping|SLAM]] for more stable tracking and anchoring of virtual objects.
*'''Reprojection and Virtual Cameras:''' Software techniques that warp the captured camera images based on depth data to synthesize a view from the user's actual eye positions ("virtual cameras"<ref name="KGuttag_Align"/>). [[Asynchronous TimeWarp|Time-warping]] techniques can further reduce perceived latency by adjusting the image based on last-moment head movements (see the second sketch after this list).
*'''Improved Optics and Displays:''' [[Pancake lens|Pancake lenses]] allow for thinner headsets where cameras can potentially be placed closer to the eyes, reducing offset. Higher-resolution, higher-[[dynamic range]] (for example, [[Micro-OLED]] in Vision Pro), and faster-refresh-rate displays improve the fidelity of the displayed passthrough feed. Careful calibration of lens distortion profiles is also applied.<ref name="RoadToVR_PSVR2"/>
*'''User Experience (UX) Improvements:''' Features like a dedicated passthrough toggle button (PSVR2<ref name="RoadToVR_PSVR2"/>), automatic passthrough activation when nearing boundaries (Quest Guardian<ref name="UploadVR_Q3Review_MR"/>), and boundaryless MR modes enhance usability and seamlessly blend real/virtual interactions.
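
A sketch of one simple mono+RGB fusion scheme of the kind the first bullet describes: take luminance (detail) from a high-resolution monochrome sensor and chrominance (color) from a lower-resolution RGB camera. The frames below are synthetic, and a real pipeline would also have to align the two sensors geometrically before merging:

<syntaxhighlight lang="python">
import cv2
import numpy as np

# Synthetic stand-ins: a sharp mono frame and a low-res color frame.
mono = np.random.randint(0, 256, (960, 1280), dtype=np.uint8)
rgb_lo = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)

rgb_up = cv2.resize(rgb_lo, (1280, 960), interpolation=cv2.INTER_LINEAR)
ycrcb = cv2.cvtColor(rgb_up, cv2.COLOR_BGR2YCrCb)
ycrcb[:, :, 0] = mono                     # keep mono luma, RGB chroma
fused = cv2.cvtColor(ycrcb, cv2.COLOR_YCrCb2BGR)
</syntaxhighlight>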
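And a sketch of rotational time-warp as mentioned under reprojection: just before display, the already-processed frame is re-warped by the head rotation that occurred since capture, using the pure-rotation homography <math>H = K R K^{-1}</math>. The intrinsics and rotation angle are illustrative values:

<syntaxhighlight lang="python">
import cv2
import numpy as np

K = np.array([[400.0, 0.0, 320.0],
              [0.0, 400.0, 240.0],
              [0.0, 0.0, 1.0]])           # assumed display-space intrinsics

def timewarp_yaw(frame, yaw_rad):
    """Re-project a frame for a small late yaw (horizontal) head rotation."""
    c, s = np.cos(yaw_rad), np.sin(yaw_rad)
    R = np.array([[c, 0.0, s],
                  [0.0, 1.0, 0.0],
                  [-s, 0.0, c]])          # rotation about the vertical axis
    H = K @ R @ np.linalg.inv(K)          # image-space homography
    return cv2.warpPerspective(frame, H, (frame.shape[1], frame.shape[0]))

frame = np.zeros((480, 640, 3), np.uint8)  # stand-in processed frame
warped = timewarp_yaw(frame, 0.01)         # ~0.6 degrees of late rotation
</syntaxhighlight>

Because pure rotation needs no depth information, this correction is cheap enough to apply at the last moment before scan-out.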


===Consumer Uses===
*'''Safety and Convenience:''' Defining play boundaries ([[Guardian system]], [[Chaperone (virtual reality)]]), avoiding obstacles, checking phones, finding controllers, or interacting briefly with people or pets without removing the headset.<ref name="PCMag_passthrough"/>
*'''[[Mixed Reality]] Gaming and Entertainment:''' Games where virtual elements interact with the user's physical room (for example, characters hiding behind real furniture, or virtual objects placed on real tables).<ref name="UploadVR_Q3Review_MR"/> Creative apps allow virtual painting on real walls.
*'''Productivity and Utility:''' Using [[virtual desktop]]s or multiple virtual monitors while still seeing the real keyboard, mouse, and desk.<ref name="TechTarget_ARdef">[https://www.techtarget.com/whatis/definition/augmented-reality-AR TechTarget: What is augmented reality (AR)?]</ref>
*'''Social Presence:''' Reducing isolation during VR use by allowing users to see others in the same physical space, and enabling co-located MR experiences where multiple users interact with shared virtual content in the same room.
===Enterprise and Professional Uses===
*'''Collaboration:''' Design reviews where virtual prototypes are viewed in a real meeting room alongside physical mockups or colleagues.<ref name="XRToday_enterprise">XR Today – VR Passthrough in Enterprise (Immersive Learning News)</ref> Remote collaboration where experts guide on-site technicians using virtual annotations overlaid on the real equipment view.
*'''Training and Simulation:''' Combining virtual scenarios with physical controls or environments (for example, flight simulation using a real cockpit visible via passthrough, or medical training on physical manikins with virtual overlays).<ref name="VIVE_Blog_Sauce"/>
*'''Visualization:''' Architects visualizing 3D models on a real site; designers overlaying virtual concepts onto physical products.
*'''Productivity:''' Creating expansive virtual workspaces integrated with the physical office environment, improving multitasking while maintaining awareness.<ref name="XRToday_benefits">XR Today – What is VR Passthrough... (on benefits of passthrough)</ref>
===Video Passthrough (VST)===
*Uses cameras and opaque displays to show a reconstructed view of the real world.
*'''Pros:''' Virtual elements can be fully opaque and seamlessly blended. Potential for a wider FOV matching the VR display. Can computationally modify the real-world view (for example, brightness enhancement or selective filtering; see the sketch below). Better blocking of ambient light for virtual content.
*'''Cons:''' The real-world view is mediated by technology and subject to its limitations (latency, resolution, color, dynamic range, distortion). Higher power consumption. Potential for discomfort (motion sickness, eye strain) if not implemented well. Real-world objects might appear less "solid" due to latency or artifacts.<ref name="SkarbezVSTvsOST"/>
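
A small sketch of that computational advantage: brightening a dim passthrough frame with contrast-limited adaptive histogram equalization (CLAHE), something an optical see-through display cannot do to the real light reaching the eye. The input frame is a synthetic stand-in:

<syntaxhighlight lang="python">
import cv2
import numpy as np

# Synthetic dim frame: a dark horizontal gradient, converted to 3 channels.
row = np.linspace(10, 60, 640, dtype=np.uint8)
frame = cv2.cvtColor(np.tile(row, (480, 1)), cv2.COLOR_GRAY2BGR)

lab = cv2.cvtColor(frame, cv2.COLOR_BGR2LAB)
l, a, b = cv2.split(lab)
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
enhanced = cv2.cvtColor(cv2.merge((clahe.apply(l), a, b)), cv2.COLOR_LAB2BGR)
</syntaxhighlight>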


*Achieving even lower latency and higher resolution/FOV, approaching the fidelity of human vision.
*Improving camera [[dynamic range]], color fidelity, and low-light performance.
*Developing more sophisticated and efficient [[depth sensing]] and real-time 3D reconstruction (for example, using [[LiDAR]], advanced [[Computer Vision|CV]], or [[Neural Radiance Fields|NeRFs]]).
*Integrating [[Artificial Intelligence|AI]] for enhanced scene understanding, object recognition, segmentation, and interaction modeling (realistic physics, occlusion).
*Implementing selective passthrough (showing only specific real-world elements like hands or keyboards) and potentially "augmented reality" filters applied to the real-world view.
*Utilizing [[eye tracking]] for [[foveated rendering]] of the passthrough feed or dynamic depth-of-field adjustments (see the sketch after this list).
*Exploring novel camera technologies like light field cameras (for example, Meta's "Flamera" concept<ref name="KGuttag_Flamera">KGOnTech (Karl Guttag) – Meta Flamera Light Field Passthrough</ref>) to better solve perspective issues.
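
A sketch of foveated passthrough as mentioned in the eye-tracking bullet above: composite a full-resolution fovea region around the tracked gaze point over a cheaply downsampled periphery. The gaze coordinates and fovea radius are placeholders for real eye-tracker output:

<syntaxhighlight lang="python">
import cv2
import numpy as np

def foveate(frame, gaze_xy, fovea_radius=120):
    """Full resolution near the gaze point, quarter resolution elsewhere."""
    h, w = frame.shape[:2]
    periphery = cv2.resize(cv2.resize(frame, (w // 4, h // 4)), (w, h))
    mask = np.zeros((h, w), np.uint8)
    cv2.circle(mask, gaze_xy, fovea_radius, 255, -1)
    out = periphery.copy()
    out[mask > 0] = frame[mask > 0]       # paste the sharp fovea back in
    return out

frame = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)
out = foveate(frame, gaze_xy=(320, 240))  # placeholder gaze at image center
</syntaxhighlight>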


As technology matures, VST passthrough aims to provide a near-seamless blend between the virtual and physical worlds, potentially unifying VR and AR capabilities into single, versatile devices.