Depth cue: Difference between revisions - VR & AR Wiki - Virtual Reality & Augmented Reality Wiki

A '''depth cue''' is any of a variety of perceptual signals that allow the [[human visual system]] to infer the distance or depth of objects in a scene, enabling the brain to transform two-dimensional retinal images into a perception of three-dimensional space. <ref name="HowardRogers2012">Howard, I. P., & Rogers, B. J. (2012). *Perceiving in Depth, Volume 1: Basic Mechanisms*. Oxford University Press.</ref> These cues are crucial for navigating the three-dimensional world and are fundamental to creating convincing, immersive, and comfortable experiences in [[Virtual Reality]] (VR) and [[Augmented Reality]] (AR), where reproducing accurate depth perception presents significant technical challenges. <ref name="HowardRogers1995">Howard, Ian P., and Brian J. Rogers. (1995). *Binocular vision and stereopsis*. Oxford University Press.</ref> The brain automatically fuses multiple available depth cues to build a robust model of the spatial layout of the environment. <ref name="HITLCues1">(2014-06-20) Visual Depth Cues - Human Interface Technology Laboratory. Retrieved April 25, 2025, from https://www.hitl.washington.edu/projects/knowledge-base/virtual-worlds/EVE/III.A.1.b.VisualDepthCues.html</ref>

== Classification of Depth Cues ==

=== [[Binocular Disparity]] (Stereopsis) ===

Because the two eyes are horizontally separated (by the [[interpupillary distance]], or IPD, typically around 6-7 cm), they receive slightly different images of the world. This difference in the image location of an object seen by the left and right eyes is called '''binocular disparity'''. The brain's visual cortex processes this disparity to generate the perception of depth, a phenomenon known as '''[[stereopsis]]'''. <ref name="BlakeWilson2011">Blake, R., & Wilson, H. R. (2011). Binocular vision. *Vision Research, 51*(7), 754–770. doi:10.1016/j.visres.2010.10.009</ref> <ref name="ParkerStereo2007">Parker, Andrew J. (2007). Binocular depth perception and the cerebral cortex. *Nature Reviews Neuroscience, 8*(5), 379-391.</ref> VR headsets exploit this by presenting a separate image with the correct perspective offset to each eye, simulating the natural disparity an observer would experience. It is an especially powerful depth cue for near to mid-range distances. <ref name="HITLCues1"/>
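
The per-eye perspective offset described above can be sketched with a simple pinhole projection. This is a minimal illustration, not how any particular headset renders: the function names, the 63 mm IPD (chosen from within the 6-7 cm range above), and the unit focal length are all assumptions made for the example.

```python
def project_x(point_x, point_z, eye_x, focal_length=1.0):
    """Horizontal image coordinate of a point seen by a pinhole eye at (eye_x, 0)."""
    return focal_length * (point_x - eye_x) / point_z


def horizontal_disparity(point_x, point_z, ipd=0.063, focal_length=1.0):
    """Left-minus-right image position for eyes placed at -ipd/2 and +ipd/2.

    ipd and distances are in meters; the IPD default is an assumed typical value.
    """
    left = project_x(point_x, point_z, -ipd / 2, focal_length)
    right = project_x(point_x, point_z, +ipd / 2, focal_length)
    return left - right


# Disparity shrinks in proportion to 1/distance, which is one way to see
# why the cue is strongest at near to mid-range distances.
for z in (0.5, 2.0, 10.0):
    print(f"z = {z:5.1f} m  disparity = {horizontal_disparity(0.0, z):.5f}")
```

In this simplified model the disparity reduces to `focal_length * ipd / point_z`, independent of the point's horizontal position.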

=== [[Convergence]] (Vergence) ===

This refers to the simultaneous movement of both eyes in opposite directions to maintain single binocular vision. The eyes rotate inward ('''convergence''') to fixate a nearby object, or rotate outward ('''divergence''') for a distant object. The [[extraocular muscles]] that control eye movement provide feedback to the brain about the degree of convergence, which acts as a cue to the object's distance. <ref name="HowardRogers2012"/> <ref name="WattFocusCues2005">Watt, S. J., Akeley, K., Ernst, M. O., & Banks, M. S. (2005). Focus cues affect perceived depth. *Journal of vision, 5*(10), 834-862.</ref> In VR/AR, the required convergence angle changes naturally as a user looks at virtual objects simulated at different distances. Convergence is most effective as a cue at close ranges (within a few meters) and diminishes significantly for distant objects (beyond ~10 meters, the lines of sight are nearly parallel). <ref name="HITLCues1"/> [[Eye tracking]] technology can measure the vergence angle directly.
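
The fall-off of convergence with distance can be made concrete with basic trigonometry. The sketch below assumes symmetric fixation straight ahead and a 63 mm IPD; the function name and default are illustrative assumptions, not values from the cited sources.

```python
import math


def vergence_angle_deg(distance_m, ipd_m=0.063):
    """Angle between the two lines of sight when fixating a point straight ahead.

    Assumes symmetric fixation; ipd_m is an assumed typical interpupillary distance.
    """
    return math.degrees(2.0 * math.atan(ipd_m / (2.0 * distance_m)))


# The angle changes quickly at close range and barely at all beyond ~10 m.
for d in (0.25, 0.5, 1.0, 2.0, 10.0, 100.0):
    print(f"{d:6.2f} m -> {vergence_angle_deg(d):6.3f} deg")
```

Under these assumptions the angle is roughly 7 degrees at 0.5 m but well under half a degree at 10 m, consistent with the lines of sight becoming nearly parallel.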

== Monocular Cues ==

==== [[Accommodation]] ====

This refers to the automatic adjustment of the eye's [[lens (anatomy)|lens]] focus to maintain a clear image (retinal focus) of an object as its distance changes. The [[ciliary muscle]] controls the lens shape; the muscular tension or effort involved provides the brain with a cue to the object's distance. <ref name="CuttingVishton1995">Cutting, J. E., & Vishton, P. M. (1995). Perceiving layout and knowing distances: The integration, relative potency, and contextual use of different information about depth. In W. Epstein & S. Rogers (Eds.), *Handbook of perception and cognition: Vol. 5. Perception of space and motion* (pp. 69–117). Academic Press.</ref> <ref name="FisherAccommodation1988">Fisher, Scott K., and Kenneth J. Ciuffreda. (1988). Accommodation and apparent distance. *Perception, 17*(5), 609-621.</ref> This cue is primarily effective for objects within approximately 2 meters and is relatively weak compared to other cues, often working in conjunction with them. <ref name="HITLCues2">(2014-06-20) Accommodation and Convergence - Human Interface Technology Laboratory. Retrieved April 25, 2025, from https://www.hitl.washington.edu/projects/knowledge-base/virtual-worlds/EVE/III.A.1.a.AccommodationConvergence.html</ref> <ref name="HITLCues1"/>

=== Pictorial (Static) Monocular Cues ===

==== [[Relative Height]] (Elevation in the Visual Field) ====

For objects resting on the same ground plane, those that are higher in the visual field (closer to the horizon line) are typically perceived as being farther away. For objects above the horizon line (e.g., clouds), those lower in the visual field are perceived as farther. <ref name="CuttingVishton1995"/> <ref name="OoiHeight2001">Ooi, Teng Leng, Bing Wu, and Zijiang J. He. (2001). Distance determined by the angular declination below the horizon. *Nature, 414*(6860), 197-200.</ref>

==== [[Linear Perspective]] ====

Parallel lines, such as railway tracks or the edges of a straight road, appear to converge towards a single [[vanishing point]] as they recede into the distance. The degree of convergence provides a strong cue to distance and spatial layout. <ref name="Palmer1999"/> <ref name="SchwartzPerspective2009">Schwartz, Steven H. (2009). *Visual perception: A clinical orientation*. McGraw Hill Professional.</ref>

==== [[Texture Gradient]] ====

==== [[Atmospheric Perspective]] (Aerial Perspective) ====

Objects at great distances appear less saturated, lower in contrast, hazier, and often shifted towards a bluish hue. This is due to light scattering by particles (dust, water vapor) in the atmosphere. The farther the object, the more pronounced the effect. <ref name="Palmer1999"/> <ref name="OSheaContrast1994">O'Shea, Robert P., Simon J. Blackburn, and Hiroshi Ono. (1994). Contrast as a depth cue. *Vision research, 34*(12), 1595-1604.</ref> <ref name="HITLCues1"/>
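
Renderers commonly approximate this effect with distance fog. The following is a minimal sketch assuming a simple exponential attenuation model; the scattering coefficient `beta_per_m`, the haze color, and the function names are illustrative assumptions rather than measured values.

```python
import math


def transmittance(distance_m, beta_per_m):
    """Fraction of an object's own light that survives scattering over distance_m."""
    return math.exp(-beta_per_m * distance_m)


def apply_haze(color, haze_color, distance_m, beta_per_m=0.002):
    """Blend an RGB color toward the haze color as distance increases.

    beta_per_m is an assumed scattering coefficient (per meter).
    """
    t = transmittance(distance_m, beta_per_m)
    return tuple(t * c + (1.0 - t) * h for c, h in zip(color, haze_color))


red = (1.0, 0.0, 0.0)
bluish_haze = (0.6, 0.7, 0.9)  # assumed sky/haze color
for d in (0.0, 100.0, 1000.0, 5000.0):
    print(d, tuple(round(v, 3) for v in apply_haze(red, bluish_haze, d)))
```

As distance grows, the object's color loses saturation and contrast and drifts toward the bluish haze color, matching the qualitative description above.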

==== [[Shading]] and [[Lighting]] ====

The way light falls on objects creates patterns of light and shadow (shading) that provide crucial cues about their three-dimensional shape, surface curvature, and relative position to light sources and other objects. <ref name="Palmer1999"/> <ref name="RamachandranShading1988">Ramachandran, Vilayanur S. (1988). Perception of shape from shading. *Nature, 331*(6152), 163-166.</ref> Assumptions, such as light typically coming from above, help interpret these cues. Shadows cast by one object onto another also indicate relative position. <ref name="HITLCues1"/>

==== [[Relative Clarity]] ====

Objects that appear clearer, sharper, and more detailed are often perceived as being closer than objects that appear hazier or less distinct (related to atmospheric perspective but can also apply at shorter distances due to factors like fog or focus). <ref name="FryFog1976">Fry, Glenn A., Kerr, K. E., Trezona, P. W., & Westerberg, C. F. (1976). The effect of fog on the perception of distance. *Human Factors, 18*(4), 342-346.</ref>

=== Dynamic Monocular Cues ===

==== [[Motion Parallax]] ====

As an observer moves their head or body, objects at different distances move at different apparent speeds across the visual field. Closer objects sweep across the visual field faster than distant ones. When the observer fixates a point in the scene, objects nearer than the fixation point appear to move against the observer's direction of travel, while objects beyond it appear to move with it. <ref name="Gibson1950"/> <ref name="RogersMotionParallax1979">Rogers, Brian, and Maureen Graham. (1979). Motion parallax as an independent cue for depth perception. *Perception, 8*(2), 125-134.</ref> For example, when looking out the side window of a moving car, nearby posts zip by while distant trees move slowly. This is a powerful depth cue, effectively utilized in VR/AR systems through [[head tracking]]. <ref name="HITLCues1"/> <ref name="ScienceLearnParallax">Depth perception. Science Learning Hub – Pokapū Akoranga Pūtaiao. Retrieved April 25, 2025, from https://www.sciencelearn.org.nz/resources/107-depth-perception</ref>
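
The car-window example can be put into numbers. The sketch below assumes the simplest geometry: the observer moves in a straight line and the object is directly abeam (perpendicular to the direction of travel), in which case the angular speed is the travel speed divided by the distance. The function name and the sample speeds and distances are illustrative assumptions.

```python
import math


def angular_speed_rad_s(observer_speed_m_s, distance_m):
    """Angular speed of a stationary object directly abeam of a moving observer.

    Valid only at the instant the object is perpendicular to the direction of travel.
    """
    return observer_speed_m_s / distance_m


car_speed = 20.0  # m/s, assumed for the example
for label, distance in (("roadside post", 5.0), ("distant trees", 500.0)):
    omega = angular_speed_rad_s(car_speed, distance)
    print(f"{label:14s} {distance:6.1f} m -> {math.degrees(omega):8.2f} deg/s")
```

With these numbers the post sweeps past a hundred times faster than the trees, because angular speed is inversely proportional to distance.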

==== [[Kinetic Depth Effect]] ====

When a rigid, unfamiliar object rotates, the resulting changes in its two-dimensional projection onto the retina provide information about its three-dimensional structure. <ref name="WallachOConnell1953">Wallach, H., & O'Connell, D. N. (1953). The kinetic depth effect. *Journal of Experimental Psychology, 45*(4), 205–217. doi:10.1037/h0058000</ref>

==== Relative Motion (Motion Perspective) ====

Related to motion parallax, this refers specifically to the observation that objects farther away appear to move more slowly across the visual field compared to closer objects when the observer is in motion. <ref name="NawrotMotion2003">Nawrot, Mark. (2003). Eye movements provide the extra-retinal signal required for the perception of depth from motion parallax. *Vision research, 43*(14), 1553-1562.</ref>

==== [[Ocular Parallax]] ====

A subtle cue resulting from the slight shift in perspective that occurs when the eye rotates around its center within the eye socket (distinct from head movement). Objects at different depths shift relative to the retina in slightly different ways during eye rotation, providing potential depth information. <ref name="KudoOcularParallax1988">Kudo, Hiromi, and Hirohiko Ono. (1988). Depth perception, ocular parallax, and stereopsis. *Perception, 17*(4), 473-480.</ref>

== Depth Cues in VR and AR ==

==== The [[Vergence-Accommodation Conflict]] (VAC) ====

A major limitation in most current VR/AR displays is the mismatch between vergence and accommodation cues. Most headsets use [[fixed-focus display]]s, meaning the optics present the virtual image at a fixed focal distance (often 1.5-2 meters or optical infinity), regardless of the simulated distance of the virtual object. <ref name="ARInsiderVAC">(2024-01-29) Understanding Vergence-Accommodation Conflict in AR/VR Headsets - AR Insider. Retrieved April 25, 2025, from https://arinsider.co/2024/01/29/understanding-vergence-accommodation-conflict-in-ar-vr-headsets/</ref> <ref name="WikiVAC">Vergence-accommodation conflict - Wikipedia. Retrieved April 25, 2025, from https://en.wikipedia.org/wiki/Vergence-accommodation_conflict</ref> <ref name="DeliverContactsFocus">(2024-07-18) Exploring the Focal Distance in VR Headsets - Deliver Contacts. Retrieved April 25, 2025, from https://delivercontacts.com/blog/exploring-the-focal-distance-in-vr-headsets</ref> While the user's eyes converge appropriately for the virtual object's simulated distance (e.g., 0.5 meters), their eyes must maintain focus (accommodate) at the fixed optical distance of the display itself to keep the image sharp. This mismatch between the distance signaled by vergence and the distance signaled by accommodation is known as the '''[[vergence-accommodation conflict]]''' (VAC). <ref name="HoffmanVAC2008">Hoffman, D. M., Girshick, A. R., Akeley, K., & Banks, M. S. (2008). Vergence–accommodation conflicts hinder visual performance and cause visual fatigue. *Journal of Vision, 8*(3), 33. doi:10.1167/8.3.33</ref> <ref name="FacebookVAC2019">Facebook Research. (2019, March 28). *Vergence-Accommodation Conflict: Facebook Research Explains Why Varifocal Matters For Future VR*. YouTube. [https://www.youtube.com/watch?v=YWA4gVibKJE]</ref> <ref name="KramidaVAC2016">Kramida, Gregory. (2016). Resolving the vergence-accommodation conflict in head-mounted displays. *IEEE transactions on visualization and computer graphics, 22*(7), 1912-1931.</ref>
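
The size of the conflict is conveniently expressed in diopters (the reciprocal of distance in meters), since both vergence and accommodation demand scale that way. The following is a minimal sketch; the function name is an assumption, and the sample distances reuse the figures from the paragraph above.

```python
import math


def vac_mismatch_diopters(vergence_distance_m, focal_distance_m):
    """Absolute difference between vergence demand and accommodation demand.

    Distances are in meters; pass math.inf for optics focused at optical infinity.
    """
    vergence_demand = 1.0 / vergence_distance_m
    accommodation_demand = 1.0 / focal_distance_m  # 1 / inf evaluates to 0.0
    return abs(vergence_demand - accommodation_demand)


# Virtual object simulated at 0.5 m, viewed through optics focused at 2 m.
print(vac_mismatch_diopters(0.5, 2.0))       # 1.5
# Same object with optics focused at optical infinity.
print(vac_mismatch_diopters(0.5, math.inf))  # 2.0
```

The mismatch is zero only when the simulated distance equals the display's focal distance, and it grows fastest for near-field content because diopters change rapidly at short range.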

The VAC forces the brain to deal with conflicting depth information, potentially leading to several issues:

* Visual fatigue, discomfort, and eye strain <ref name="HoffmanVAC2008"/> <ref name="ARInsiderVAC"/>

* Headaches or [[simulator sickness]] symptoms (nausea, disorientation) <ref name="ARInsiderVAC"/> <ref name="VosVAC2005">Vos, G. A., Barfield, W., & Yamamoto, T. (2005). The Virtual Vertical: Depth Perception and Discomfort in Stereoscopic Displays. *Presence: Teleoperators & Virtual Environments, 14*(6), 649-664.</ref>

* Difficulty fusing stereoscopic images

* Inaccurate depth and size perception, particularly for near-field objects (within arm's reach) <ref name="JonesVAC2008">Jones, J. A., Swan II, J. E., Singh, G., & Ellis, S. R. (2008). The effects of virtual reality, augmented reality, and motion parallax on egocentric depth perception. *Proceedings of the 5th symposium on Applied perception in graphics and visualization*, 9-16.</ref>

* Reduced realism and immersion

To mitigate or eliminate the VAC and provide more accurate depth cues, researchers and companies are actively developing advanced display technologies:

* '''[[Varifocal Displays]]''': These displays dynamically adjust the focal distance of the display optics (e.g., using physically moving lenses/screens, [[liquid lens]] technology, or [[deformable mirror]] devices) to match the simulated distance of the object the user is currently looking at. <ref name="KonradVAC2016">Konrad, R., Cooper, E. A., & Banks, M. S. (2016). Towards the next generation of virtual and augmented reality displays. *Optics Express, 24*(15), 16800-16809. doi:10.1364/OE.24.016800</ref> <ref name="DunnVarifocal2017">Dunn, David, et al. (2017). Wide field of view varifocal near-eye display using see-through deformable membrane mirrors. *IEEE transactions on visualization and computer graphics, 23*(4), 1322-1331.</ref> This typically requires fast and accurate [[eye tracking]] to determine the user's point of gaze and intended focus depth. Varifocal systems often simulate [[Depth of Field]] effects computationally, blurring parts of the scene not at the current focal distance. <ref name="ARInsiderVAC"/> Prototypes like Meta Reality Labs' "Half Dome" series have demonstrated this approach. <ref name="ARInsiderVAC"/>

* '''[[Multifocal Displays]] (Multi-Plane Displays)''': Instead of a single, continuously adjusting focus, these displays present content on multiple discrete focal planes simultaneously or in rapid succession. <ref name="AkeleyMultifocal2004">Akeley, Kurt, Watt, S. J., Girshick, A. R., & Banks, M. S. (2004). A stereo display prototype with multiple focal distances. *ACM transactions on graphics (TOG), 23*(3), 804-813.</ref> The visual system can then accommodate to the plane closest to the target object's depth. Examples include stacked display panels or systems using switchable lenses. Magic Leap 1 used a two-plane system. <ref name="ARInsiderVAC"/> While reducing VAC, they can still exhibit quantization effects if an object lies between planes, and complexity increases with the number of planes.

* '''[[Light Field Displays]]''': These displays aim to reconstruct the [[light field]] of a scene – the distribution of light rays in space – more completely. By emitting rays with the correct origin and direction, they allow the viewer's eye to naturally focus at different depths within the virtual scene, as if viewing a real 3D environment. <ref name="WetzsteinLightField2011">Wetzstein, Gordon, et al. (2011). Computational plenoptic imaging. *Computer Graphics Forum, 30*(8), 2397-2426.</ref> <ref name="Lanman2013">Lanman, D., & Luebke, D. (2013). Near-eye light field displays. *ACM Transactions on Graphics (TOG), 32*(6), 1-10. doi:10.1145/2508363.2508366</ref> This can potentially solve the VAC without requiring eye tracking. However, generating the necessary dense light fields poses significant computational and hardware challenges, often involving trade-offs between resolution, field of view, and form factor. <ref name="ARInsiderVAC"/> Companies like CREAL are developing light field modules for AR/VR. <ref name="WikiVAC"/>

* '''[[Holographic Displays]]''': True [[holography|holographic]] displays aim to reconstruct the wavefront of light from the virtual scene using diffraction, which would inherently provide all depth cues, including accommodation, correctly and continuously. <ref name="MaimoneHolo2017">Maimone, A., Georgiou, A., & Kollin, J. S. (2017). Holographic near-eye displays for virtual and augmented reality. *ACM Transactions on Graphics (TOG), 36*(4), 1-16. doi:10.1145/3072959.3073610</ref> This is often considered an ultimate goal for visual displays. However, current implementations suitable for near-eye displays face major challenges in computational load, achievable [[field of view]], image quality (e.g., [[speckle noise]]), and component size. <ref name="MaimoneHolo2017"/> <ref name="ARInsiderVAC"/>
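
The quantization effect noted for multifocal displays can be illustrated by snapping a target depth to the nearest available focal plane and measuring the leftover focus error. This is a sketch under stated assumptions: the four plane powers below are hypothetical and do not describe any shipping device, and nearest-plane selection in diopters is only one possible policy.

```python
def nearest_focal_plane(target_distance_m, plane_powers_d):
    """Return (chosen plane power, residual focus error), both in diopters.

    plane_powers_d lists the optical power of each focal plane (1 / distance in meters).
    """
    target_d = 1.0 / target_distance_m
    best = min(plane_powers_d, key=lambda p: abs(p - target_d))
    return best, abs(best - target_d)


# Hypothetical planes at optical infinity, 1 m, 0.5 m and 0.33 m.
planes = [0.0, 1.0, 2.0, 3.0]
for distance in (0.5, 0.8, 4.0):
    plane, error = nearest_focal_plane(distance, planes)
    print(f"object at {distance} m -> plane {plane} D, residual {error:.2f} D")
```

An object that sits exactly on a plane has no residual error, while one that falls between planes keeps a small mismatch. Adding planes shrinks the worst-case error at the cost of the added complexity mentioned above.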

In AR, correctly rendering depth cues is arguably even more critical and complex than in VR because virtual objects must appear convincingly integrated with the real-world environment, which already provides a rich and consistent set of depth cues. Key challenges include:

* **Occlusion:** Virtual objects must realistically occlude real objects behind them, and be occluded by real objects in front of them. This requires accurate real-time 3D reconstruction of the surrounding environment, often using depth sensors and [[Simultaneous Localization and Mapping]] (SLAM) techniques. Without correct occlusion, virtual objects may appear as semi-transparent "ghosts" overlaid on reality. <ref name="HowardRogers2012Vol3">Howard, I. P., & Rogers, B. J. (2012). *Perceiving in Depth, Volume 3: Other Mechanisms of Depth Perception*. Oxford University Press.</ref> <ref name="PubMedOcclusionAR">Kiyokawa, K., Billinghurst, M., Hayes, S. E., & Gupta, A. (2003). An occlusion-capable optical see-through head mount display for supporting co-located collaboration. *Proceedings. ISMAR 2003. Second IEEE and ACM International Symposium on Mixed and Augmented Reality*, 133-141. doi:10.1109/ISMAR.2003.1240688</ref>

* **Lighting and Shadows:** Virtual objects should be lit consistently with real-world lighting conditions and cast plausible shadows onto real surfaces (and receive shadows from real objects) to appear grounded in the environment. <ref name="HowardRogers2012Vol3"/>

* **Perspective and Scale:** Virtual objects must be rendered with perspective and size that are consistent with their intended location within the real scene. <ref name="HowardRogers2012Vol3"/>

* **Focus:** In optical see-through AR, the fixed focus of virtual objects often conflicts with the user's ability to focus naturally on real objects at different distances, leading to [[focal rivalry]] in addition to VAC. <ref name="ARInsiderVAC"/>
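The occlusion requirement above can be illustrated with a per-pixel depth test. The following is a minimal sketch, assuming a video see-through pipeline in which the reconstructed real-world depth map and the rendered virtual depth map are already aligned pixel for pixel; the function name and the nested-list image layout are hypothetical and chosen only for illustration.

```python
def composite_with_occlusion(real_rgb, real_depth, virt_rgb, virt_depth):
    """Composite a virtual layer over a camera image using a depth test.

    All four arguments are row-major nested lists of equal size.
    Depths are distances from the viewer in meters. A virtual depth
    of None means there is no virtual content at that pixel.
    """
    out = []
    for r_row, rd_row, v_row, vd_row in zip(real_rgb, real_depth,
                                            virt_rgb, virt_depth):
        out_row = []
        for r_px, rd, v_px, vd in zip(r_row, rd_row, v_row, vd_row):
            # Show the virtual pixel only where it is closer to the
            # viewer than the real surface; otherwise the real object
            # occludes it and the camera pixel is kept.
            if vd is not None and vd < rd:
                out_row.append(v_px)
            else:
                out_row.append(r_px)
        out.append(out_row)
    return out


# One row, three pixels: a virtual object at 1.5 m in front of a scene
# whose real surfaces lie at 1.0 m, 2.0 m and 3.0 m.
real = [["R", "R", "R"]]
real_d = [[1.0, 2.0, 3.0]]
virt = [["V", "V", None]]
virt_d = [[1.5, 1.5, None]]
print(composite_with_occlusion(real, real_d, virt, virt_d))
```

Skipping the depth test and always drawing the virtual pixel would paint the object over the nearer real surface at the first pixel, which is the "ghost" overlay described above. Optical see-through displays cannot simply replace pixels in this way, so the sketch does not apply to them directly.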

=== [[Ocular Parallax]], Eye-Tracking and Eye-Box Considerations ===

In the context of VR/AR optics, the term 'ocular parallax' is sometimes used differently from the monocular depth cue described earlier. It can refer to the apparent shift in the virtual image relative to the user's eye pupil as the eye moves within the viewing zone (the '[[eye-box]]') of the headset's optics. If not well-managed, this can cause the virtual world to appear unstable or "swim," impacting depth perception and comfort, especially in AR where alignment with the real world is critical. Accurate [[eye tracking]] can help systems compensate for these effects by adjusting the rendering based on precise eye position ("gaze-contingent rendering"). <ref name="ChangEyeTrack2020">Chang, Jen-Hao Rick, et al. (2020). Toward a unified framework for hand-eye coordination in virtual reality. *2020 IEEE Conference on Virtual Reality and 3D User Interfaces (VR)*. IEEE.</ref>

== Health and Comfort Implications ==

* **Visual Fatigue and Discomfort:** The [[vergence-accommodation conflict]] is a primary contributor to eye strain, headaches, blurred vision, and general visual discomfort, especially during prolonged use. <ref name="HoffmanVAC2008"/> <ref name="ARInsiderVAC"/>

* **Spatial Perception Errors:** Inaccurate or conflicting depth cues can lead to misjudgments of distance, size, and the spatial relationships between objects, potentially affecting user performance in tasks requiring precise spatial awareness or interaction. <ref name="JonesVAC2008"/> <ref name="WillemsenHMD2009">Willemsen, Peter, Colton, M. B., Creem-Regehr, S. H., & Thompson, W. B. (2009). The effects of head-mounted display mechanical properties and field of view on distance judgments in virtual environments. *ACM Transactions on Applied Perception (TAP), 6*(2), 1-14.</ref>

* **[[Simulator Sickness]]:** Inconsistencies between visual depth cues and other sensory information (e.g., vestibular signals from the inner ear) can contribute to symptoms like nausea, disorientation, and dizziness. <ref name="VosVAC2005"/> <ref name="WannAdaptation1995">Wann, John P., Simon Rushton, and Mark Mon-Williams. (1995). Natural problems for stereoscopic depth perception in virtual environments. *Vision research, 35*(19), 2731-2736.</ref>

== Design Considerations for VR/AR Developers ==

* **Leverage Multiple Cues:** Rely on a combination of available cues (stereo, motion parallax, strong pictorial cues) to create a robust sense of depth. Enhance monocular cues like shadows, perspective, and texture gradients to compensate for limitations in physiological cues. <ref name="CuttingVishton1995"/>

* **Manage VAC Impact:**

* **Comfort Zones:** Place critical interactive content primarily within the zone of comfortable viewing (often suggested as roughly 0.75-3.5 meters in VR) where VAC effects may be less severe for many users. <ref name="ShibataComfortZone2011">Shibata, Takashi, Kim, J., Hoffman, D. M., & Banks, M. S. (2011). The zone of comfort: Predicting visual discomfort with stereo displays. *Journal of vision, 11*(8), 11-11.</ref> Avoid sustained focus on very near objects (closer than 0.5 m).

* **Depth Budget:** Limit the overall range of depths presented simultaneously, and avoid rapid, large shifts in depth between near and far objects that force quick vergence changes against a fixed accommodation state.

* **Guide Attention:** Use composition, lighting, and visual design to guide the user's focal attention appropriately within the scene.

* **Simulated Depth of Field:** Strategically apply computationally rendered blur (simulated [[Depth of Field]]) based on estimated user focus or salient objects to help guide accommodation, mask focus limitations, or enhance realism. <ref name="DuchowskiDoF2014">Duchowski, Andrew T., et al. (2014). Reducing visual discomfort with HMDs using dynamic depth of field. *IEEE computer graphics and applications, 34*(5), 34-41.</ref>

* **Consider Interaction Distance:** Be aware that applications requiring precise manipulation or inspection of virtual objects at close range are most susceptible to VAC issues and benefit most from advanced display technologies that address it.
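The comfort-zone guidance above can be turned into a quick numeric check. The sketch below is illustrative only: it computes the vergence angle for an object at a given distance, the vergence-accommodation mismatch in diopters relative to a fixed focal plane, and whether the distance falls inside the 0.75-3.5 m range quoted above. The default IPD of 63 mm and the 2 m focal distance used in the example are assumptions, not values from any particular headset.

```python
import math


def vergence_angle_deg(distance_m, ipd_m=0.063):
    """Angle between the two lines of sight when fixating a point
    straight ahead at distance_m (ipd_m is an assumed default)."""
    return math.degrees(2.0 * math.atan(ipd_m / (2.0 * distance_m)))


def vac_diopters(vergence_distance_m, focal_distance_m):
    """Mismatch between where the eyes converge and where the display
    optics force them to focus, in diopters (1/m)."""
    return abs(1.0 / vergence_distance_m - 1.0 / focal_distance_m)


def in_comfort_zone(distance_m, near_m=0.75, far_m=3.5):
    """True if the content distance lies inside the suggested range."""
    return near_m <= distance_m <= far_m


# Example: content at several distances on a display whose focal plane
# is assumed to sit at 2 m.
for d in (0.4, 1.0, 3.0):
    print(d,
          round(vergence_angle_deg(d), 2),
          round(vac_diopters(d, focal_distance_m=2.0), 2),
          in_comfort_zone(d))
```

Because the mismatch is measured in diopters (inverse meters), it grows quickly as content moves closer than the focal plane, which is consistent with the advice to avoid sustained focus on very near objects.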

* **Perceptual Adaptation:** Studying how users adapt to inconsistent or unnatural depth cues over time, potentially leading to training paradigms or design strategies that improve comfort on current hardware. <ref name="WannAdaptation1995"/>

* **Personalized Depth Rendering:** Calibrating depth cue presentation based on individual user characteristics (e.g., IPD, visual acuity, refractive error, sensitivity to VAC) for optimized comfort and performance. <ref name="WillemsenHMD2009"/>

* **[[Cross-modal interaction|Cross-Modal Integration]]:** Investigating how integrating depth information from other senses (e.g., [[spatial audio]], [[haptic feedback]]) can enhance or reinforce visual depth perception. <ref name="ErnstCrossModal2002">Ernst, Marc O., and Martin S. Banks. (2002). Humans integrate visual and haptic information in a statistically optimal fashion. *Nature, 415*(6870), 429-433.</ref>

* **[[Neural rendering|Neural Rendering]] and AI:** Utilizing machine learning techniques (e.g., [[Neural Radiance Fields]] (NeRF)) to potentially render complex scenes with perceptually accurate depth cues more efficiently by learning implicit scene representations. <ref name="MildenhallNeRF2020">Mildenhall, Ben, et al. (2020). Nerf: Representing scenes as neural radiance fields for view synthesis. *European conference on computer vision*. Springer, Cham.</ref>
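The "statistically optimal" integration named in the cross-modal reference above is commonly modeled as a reliability-weighted average, in which each cue is weighted by the inverse of its variance. The following is a minimal sketch of that model, assuming independent Gaussian noise on each cue; the function name and the example numbers are hypothetical.

```python
def combine_cues(estimates):
    """Fuse independent depth estimates by inverse-variance weighting.

    estimates: list of (depth_m, variance) pairs, one per cue.
    Returns (fused_depth_m, fused_variance).
    """
    weights = [1.0 / var for _, var in estimates]
    total = sum(weights)
    fused = sum(w * d for w, (d, _) in zip(weights, estimates)) / total
    # The fused variance is never larger than that of the best single cue.
    return fused, 1.0 / total


# Example: a reliable visual estimate (2.0 m, variance 0.04) combined
# with a noisier haptic estimate (2.4 m, variance 0.16).
depth, variance = combine_cues([(2.0, 0.04), (2.4, 0.16)])
print(round(depth, 3), round(variance, 3))
```

Under this model the fused estimate lies closer to the more reliable cue, and adding even a noisy second cue reduces the overall uncertainty.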

== Conclusion ==

Depth cues are fundamental to human visual perception and represent both a cornerstone and a significant challenge for virtual and augmented reality systems. While current technology effectively simulates many cues like binocular disparity, motion parallax, and various pictorial cues, the inability of most displays to correctly reproduce the physiological cue of accommodation leads to the vergence-accommodation conflict, impacting user comfort, performance, and the overall realism of immersive experiences. Ongoing research and the development of advanced display technologies like varifocal, multifocal, light field, and holographic systems promise to overcome these limitations, paving the way for VR and AR experiences with more natural and complete depth perception. A thorough understanding of the interplay and limitations of depth cues remains essential for researchers and developers pushing the boundaries of immersive technologies.

==References==

<ref name="VosVAC2005">Vos, G. A., Barfield, W., & Yamamoto, T. (2005). The Virtual Vertical: Depth Perception and Discomfort in Stereoscopic Displays. *Presence: Teleoperators & Virtual Environments, 14*(6), 649-664.</ref>

<ref name="JonesVAC2008">Jones, J. A., Swan II, J. E., Singh, G., & Ellis, S. R. (2008). The effects of virtual reality, augmented reality, and motion parallax on egocentric depth perception. *Proceedings of the 5th symposium on Applied perception in graphics and visualization*, 9-16.</ref>

<ref name="WillemsenHMD2009">Willemsen, Peter, Colton, M. B., Creem-Regehr, S. H., & Thompson, W. B. (2009). The effects of head-mounted display mechanical properties and field of view on distance judgments in virtual environments. *ACM Transactions on Applied Perception (TAP), 6*(2), 1-14.</ref>

<ref name="WannAdaptation1995">Wann, John P., Simon Rushton, and Mark Mon-Williams. (1995). Natural problems for stereoscopic depth perception in virtual environments. *Vision research, 35*(19), 2731-2736.</ref>

<ref name="KonradVAC2016">Konrad, R., Cooper, E. A., & Banks, M. S. (2016). Towards the next generation of virtual and augmented reality displays. *Optics Express, 24*(15), 16800-16809. doi:10.1364/OE.24.016800</ref>

<ref name="DunnVarifocal2017">Dunn, David, et al. (2017). Wide field of view varifocal near-eye display using see-through deformable membrane mirrors. *IEEE transactions on visualization and computer graphics, 23*(4), 1322-1331.</ref>

<ref name="Lanman2013">Lanman, D., & Luebke, D. (2013). Near-eye light field displays. *ACM Transactions on Graphics (TOG), 32*(6), 1-10. doi:10.1145/2508363.2508366</ref>

<ref name="MaimoneHolo2017">Maimone, A., Georgiou, A., & Kollin, J. S. (2017). Holographic near-eye displays for virtual and augmented reality. *ACM Transactions on Graphics (TOG), 36*(4), 1-16. doi:10.1145/3072959.3073610</ref>

<ref name="HowardRogers2012Vol3">Howard, I. P., & Rogers, B. J. (2012). *Perceiving in Depth, Volume 3: Other Mechanisms of Depth Perception*. Oxford University Press.</ref>

<ref name="PubMedOcclusionAR">Kiyokawa, K., Billinghurst, M., Hayes, S. E., & Gupta, A. (2003). An occlusion-capable optical see-through head mount display for supporting co-located collaboration. *Proceedings. ISMAR 2003. Second IEEE and ACM International Symposium on Mixed and Augmented Reality*, 133-141. doi:10.1109/ISMAR.2003.1240688</ref>

<ref name="ChangEyeTrack2020">Chang, Jen-Hao Rick, et al. (2020). Toward a unified framework for hand-eye coordination in virtual reality. *2020 IEEE Conference on Virtual Reality and 3D User Interfaces (VR)*. IEEE.</ref>

<ref name="ShibataComfortZone2011">Shibata, Takashi, Kim, J., Hoffman, D. M., & Banks, M. S. (2011). The zone of comfort: Predicting visual discomfort with stereo displays. *Journal of vision, 11*(8), 11-11.</ref>

<ref name="DuchowskiDoF2014">Duchowski, Andrew T., et al. (2014). Reducing visual discomfort with HMDs using dynamic depth of field. *IEEE computer graphics and applications, 34*(5), 34-41.</ref>