Stereopsis

From VR & AR Wiki

Stereopsis is the perception of depth that the visual system computes from binocular disparity, the small difference between the two retinal images that arises because the eyes are separated horizontally on the face.[1] Because each eye views a scene from a slightly different vantage point, the same object projects to slightly different positions on the two retinas; the brain measures these positional differences and converts them into a sensation of relative distance.[1][2] Stereopsis cannot be obtained from one eye alone, which distinguishes it from the monocular depth cues such as perspective, occlusion and motion parallax.[1]

Stereopsis is the perceptual basis of all stereoscopic three-dimensional media, including the stereoscope, 3D cinema and the head-mounted display used in virtual reality (VR) and augmented reality (AR). A VR or AR headset presents a separate image to each eye so that the artificial binocular disparity drives stereopsis and the virtual scene appears to have depth.[3] The same physiology produces the vergence-accommodation conflict, a known source of visual fatigue in headsets, because the eyes converge to the simulated distance of an object while their lenses stay focused on the fixed display surface.[4]

Binocular disparity and the geometry of depth

The two human eyes are separated by the interpupillary distance (IPD), on average about 63 mm in adults, with individual values ranging from roughly 50 mm to 75 mm.[5] This lateral separation means the left and right eyes receive two slightly different views of the same scene. The difference between corresponding points in the two retinal images is the binocular disparity, and the magnitude and sign of that disparity vary with how far an object is from the point the eyes are fixating.[1][2]

When both eyes fixate a point, that point and a curved surface of points around it project to corresponding (geometrically matching) locations on the two retinas and carry zero disparity. This surface is called the horopter. Points nearer than the horopter produce crossed disparity and points farther than it produce uncrossed disparity, and the sign tells the brain whether an object is nearer or farther than the fixation distance.[1][2] Around the horopter lies a narrow region, Panum's fusional area, within which the two retinal images can be fused into a single percept; objects whose disparity exceeds this range are seen as double (physiological diplopia).[1]
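The sign rule above can be illustrated with a small calculation. The sketch below is a simplification that assumes the object and the fixation point both lie straight ahead of the viewer, and it uses the 63 mm average IPD as a default; the function names and the sign convention (positive for crossed disparity) are choices made for this example, not a standard API.

```python
import math


def vergence_angle_rad(distance_m, ipd_m=0.063):
    """Angle between the two lines of sight for a point straight ahead."""
    return 2.0 * math.atan(ipd_m / (2.0 * distance_m))


def relative_disparity_arcsec(object_m, fixation_m, ipd_m=0.063):
    """Disparity of an object relative to the fixation point, in arcsec.

    Convention assumed for this sketch: positive = crossed (object nearer
    than fixation), negative = uncrossed (farther), zero = same distance.
    """
    delta = vergence_angle_rad(object_m, ipd_m) - vergence_angle_rad(fixation_m, ipd_m)
    return math.degrees(delta) * 3600.0


# Fixating at 1 m: a nearer object gives crossed (positive) disparity,
# a farther object gives uncrossed (negative) disparity.
for object_distance in (0.5, 1.0, 2.0):
    print(object_distance, relative_disparity_arcsec(object_distance, fixation_m=1.0))
```

The model ignores eccentric gaze and the curvature of the horopter, so it is only meant to show how the sign and rough size of disparity change with distance.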

The relationship between disparity and distance is the same triangulation used in computer stereo vision: for a given eye separation, the depth recovered from a measured disparity follows approximately Z = b f / d, where Z is distance, b is the baseline between the two viewpoints, f is the focal length and d is the disparity, measured on the image plane in the same units as f.[3] Because the disparity produced by a given depth interval falls off approximately with the square of viewing distance, stereopsis is most precise for nearby objects and contributes little beyond several metres, where monocular cues such as relative size and aerial perspective dominate.[6]
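The triangulation formula Z = b f / d can be written out as a short sketch. The function and parameter names are illustrative, and the focal length and disparity are assumed to be expressed in pixels, as is common for a rectified camera pair. The second function differentiates the same formula to show why precision degrades with distance.

```python
def depth_from_disparity(baseline_m, focal_px, disparity_px):
    """Depth Z = b * f / d for a rectified stereo pair."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point at finite depth")
    return baseline_m * focal_px / disparity_px


def depth_error_per_pixel(baseline_m, focal_px, depth_m):
    """Magnitude of dZ/dd = Z**2 / (b * f): depth change per pixel of disparity error."""
    return depth_m ** 2 / (baseline_m * focal_px)


# Hypothetical camera pair: 63 mm baseline, 1000 px focal length.
for disparity in (63.0, 31.5, 6.3):
    z = depth_from_disparity(0.063, 1000.0, disparity)
    print(disparity, z, depth_error_per_pixel(0.063, 1000.0, z))
```

Doubling the distance quadruples the depth error caused by the same disparity error, which is the square-law falloff described above.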

Stereopsis is one of two binocular depth cues. The other is vergence, the inward or outward rotation of the eyes to fixate objects at different distances; vergence provides usable depth information out to roughly 10 m, while the monocular cue of accommodation (lens focusing) is effective only within about 2 m.[6] Vergence and stereoscopic disparity are linked: when the eyes verge on an object they bring its images onto corresponding retinal points, setting the reference against which disparities of other objects are measured.[1]

Stereoacuity

The finest depth difference a person can detect through stereopsis is called stereoacuity and is expressed as an angular disparity in seconds of arc (arcsec). Under optimal laboratory conditions the best human observers resolve disparities of only a few arcseconds; on standard clinical tests the best obtainable scores are about 5 arcsec on the revised FD2 test and 20 arcsec on the Frisby test, and the median for healthy younger adults is around 10 arcsec on the FD2 and 20 arcsec on the Frisby.[7] Clinical screening instruments such as the Titmus stereo test use much larger disparities, from about 3,600 arcsec for the gross "fly" target down to 40 arcsec for the finest ring patterns.[1]
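To give a feel for what these angular thresholds mean in physical terms, the sketch below converts a stereoacuity value into the smallest detectable depth difference at a given viewing distance. It relies on the small-angle approximation (depth difference ≈ disparity × distance² / IPD) and assumes the 63 mm average IPD; the function name is made up for this example.

```python
import math

ARCSEC_TO_RAD = math.pi / (180.0 * 3600.0)


def depth_threshold_m(stereoacuity_arcsec, distance_m, ipd_m=0.063):
    """Approximate smallest detectable depth step at a viewing distance.

    Small-angle estimate: delta_depth ~= disparity_rad * distance**2 / IPD.
    """
    return stereoacuity_arcsec * ARCSEC_TO_RAD * distance_m ** 2 / ipd_m


# A 10 arcsec threshold evaluated at several viewing distances (in metres).
for distance in (0.5, 1.0, 5.0, 10.0):
    print(distance, depth_threshold_m(10.0, distance))
```

Under these assumptions a 10 arcsec threshold corresponds to a depth step of under a millimetre at 1 m, and the step grows with the square of the distance.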

Not everyone has functional stereopsis. People whose two eyes never developed coordinated binocular vision, often because of childhood strabismus or amblyopia, may be stereoblind or stereo-impaired. Prevalence estimates vary with the test and the population, because stereopsis is not a single measurable quantity. A 2019 best-evidence synthesis nevertheless found that four different analytical approaches converged on a stereoblindness prevalence of about 7 percent in adults under 60.[8] Stereoblind users can still perceive depth in VR and AR through monocular cues and motion parallax, but they do not gain the additional depth precision that stereopsis provides.[6]

History

The dependence of depth perception on the two eyes' differing views was established by the English scientist Charles Wheatstone, who described stereoscopic vision in a paper read to the Royal Society on 21 June 1838 and built the first stereoscope, a device using angled mirrors to present a separate drawing to each eye.[9][10] Viewing two flat drawings of a cube drawn from slightly offset viewpoints, Wheatstone saw a single solid form, demonstrating that horizontal disparity alone is sufficient to evoke depth.[9] Stereoscopes became a popular form of home entertainment in the second half of the 19th century once photography made stereo image pairs cheap to reproduce.[9]

A second turning point came in 1960, when the Hungarian-born engineer and psychologist Bela Julesz, working at Bell Telephone Laboratories, introduced the random-dot stereogram. In "Binocular Depth Perception of Computer-Generated Patterns" (Bell System Technical Journal, vol. 39, no. 5, pp. 1125-1162, September 1960), Julesz generated pairs of fields of random dots that were identical except that a central region was shifted horizontally in one image. Each field looked like meaningless noise to either eye alone, but when the pair was viewed through a stereoscope the shifted region floated in depth.[11] Because the depth shape was invisible to either eye on its own and only emerged when the brain combined the two images, Julesz called the percept "cyclopean," after the one-eyed Cyclops of Greek myth, and developed the idea in his 1971 book Foundations of Cyclopean Perception.[12] The experiment showed that stereopsis can be computed purely from disparity, without any recognisable objects, perspective or other monocular cue, and that depth processing can precede the recognition of form.[11][12]
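The construction described above is simple enough to sketch in a few lines. The following is an illustrative reimplementation, not Julesz's original program: the image size, patch size, shift and function name are arbitrary choices for the example, and the strip uncovered by the shift is refilled with fresh random dots so that neither image contains a visible outline.

```python
import random


def random_dot_stereogram(size=64, patch=24, shift=4, seed=0):
    """Return (left, right) binary images as lists of rows.

    The images are identical except that a central square patch is moved
    `shift` pixels to the left in the right-eye image.
    """
    rng = random.Random(seed)
    left = [[rng.randint(0, 1) for _ in range(size)] for _ in range(size)]
    right = [row[:] for row in left]
    start = (size - patch) // 2
    end = start + patch
    for y in range(start, end):
        # Copy the patch from the left image into its shifted position.
        for x in range(start, end):
            right[y][x - shift] = left[y][x]
        # Refill the strip uncovered by the shift with new random dots.
        for x in range(end - shift, end):
            right[y][x] = rng.randint(0, 1)
    return left, right


left_image, right_image = random_dot_stereogram()
```

Shifting the patch leftward in the right-eye image gives it crossed disparity, so under the usual viewing arrangement the square should appear to float in front of the background; reversing the shift direction would place it behind.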

Role in virtual and augmented reality

Stereoscopic displays exploit stereopsis by feeding each eye a distinct image. In a head-mounted display the left and right eyes look at separate views (or separate halves of one panel) through their own lenses, so that the difference between the two images is interpreted by the visual system as binocular disparity and the scene gains apparent depth.[3] Rendering such content is called stereoscopic rendering: the engine places two virtual cameras in the scene, separated by a baseline that should match the user's IPD, and renders the entire scene twice, once for each eye.[3][13] Setting the virtual camera separation equal to the viewer's real IPD yields a one-to-one scale world; a mismatch between the rendered separation and the user's actual IPD distorts perceived scale and depth and can add to visual discomfort.[3][5] For this reason many headsets allow physical or software IPD adjustment.[5]
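The placement of the two virtual cameras can be sketched as a simple vector offset. This is a minimal illustration that assumes a head position and a "right" direction vector are already known; the function name is hypothetical and does not correspond to any particular SDK, and real runtimes typically supply per-eye poses and projection matrices directly.

```python
import math


def eye_positions(head_pos, right_dir, ipd_m=0.063):
    """Return (left_eye, right_eye) positions offset by half the IPD.

    head_pos and right_dir are 3-component tuples; right_dir need not be
    normalised. The 63 mm default is the average adult IPD.
    """
    norm = math.sqrt(sum(c * c for c in right_dir))
    if norm == 0.0:
        raise ValueError("right_dir must be a non-zero vector")
    unit = tuple(c / norm for c in right_dir)
    half = ipd_m / 2.0
    left_eye = tuple(p - half * u for p, u in zip(head_pos, unit))
    right_eye = tuple(p + half * u for p, u in zip(head_pos, unit))
    return left_eye, right_eye


# Head 1.7 m above the origin, facing so that +x is to the viewer's right.
left_eye, right_eye = eye_positions((0.0, 1.7, 0.0), (1.0, 0.0, 0.0), ipd_m=0.064)
print(left_eye, right_eye)
```

Passing the user's measured IPD as `ipd_m` corresponds to the one-to-one scale case; any other value models the mismatch described above.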

Presenting separate per-eye images is not the same as reproducing natural binocular viewing, and the difference is the vergence-accommodation conflict (VAC). In natural viewing the distance to which the eyes converge and the distance at which they focus normally coincide. In a conventional headset the imagery can simulate any vergence distance through disparity, but the light physically comes from a display at a fixed optical distance, so accommodation stays fixed while vergence changes.[4][14] Hoffman, Girshick, Akeley and Banks measured the effects of this conflict in a 2008 Journal of Vision study using a multi-plane bench display that could present correct or incorrect focus cues. When focus cues matched the simulated depth, observers fused stereoscopic images faster, achieved higher stereoacuity in time-limited tasks, showed smaller distortions of perceived depth, and reported less fatigue and discomfort.[4] The conflict is most pronounced for virtual objects close to the viewer, where the required change in focus is largest.[4][14]
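One common way to quantify the conflict is as the difference between the vergence and focal distances expressed in diopters (the reciprocal of distance in metres). The sketch below uses that measure; the 1.5 m focal plane is a hypothetical value chosen for illustration, not the specification of any particular headset.

```python
def vac_diopters(vergence_distance_m, focal_distance_m):
    """Vergence-accommodation mismatch in diopters (1 / metres)."""
    return abs(1.0 / vergence_distance_m - 1.0 / focal_distance_m)


ASSUMED_FOCAL_PLANE_M = 1.5  # hypothetical fixed focal distance

# Mismatch for virtual objects at several simulated distances (in metres).
for object_distance in (0.3, 0.5, 1.5, 10.0):
    print(object_distance, vac_diopters(object_distance, ASSUMED_FOCAL_PLANE_M))
```

Because the measure is reciprocal in distance, the mismatch grows quickly as a virtual object approaches the viewer and stays bounded for distant objects, which matches the observation that near content is the most demanding.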

Several display approaches aim to relieve the conflict while preserving stereopsis. Varifocal displays use eye tracking to estimate where the user is looking and mechanically or optically shift the focal distance to match the vergence distance; Meta's Half Dome research prototypes demonstrated this, moving from motorised displays in the 2018 Half Dome to a solid-state stack of switchable liquid-crystal lenses in Half Dome 3.[15][16][14] Other proposed solutions include multifocal, holographic and light-field displays, which attempt to reproduce focus cues more directly.[14] As of 2026 these remain largely research or prototype technologies, and most shipping consumer headsets still use a single fixed focal plane and therefore exhibit the conflict.[14]

Stereopsis is one cue among several that a headset uses to create presence and a convincing sense of depth. Motion parallax from head tracking, occlusion, relative size and perspective all contribute, and these monocular cues allow even stereoblind users to navigate virtual environments.[6] The distinctive contribution of stereopsis is fine relative-depth judgement of nearby objects, which is why it matters most for close interaction such as reaching for or manipulating virtual objects with the hands.[4][6]

References

  1. Howard, I.P. (revised in Webvision). The Perception of Depth. Webvision: The Organization of the Retina and Visual System, NCBI Bookshelf. https://www.ncbi.nlm.nih.gov/books/NBK11512/
  2. Wikipedia. Binocular disparity. https://en.wikipedia.org/wiki/Binocular_disparity
  3. Arm Limited. Introduction to Stereo Rendering. VR SDK for Android documentation. https://arm-software.github.io/vr-sdk-for-android/IntroductionToStereoRendering.html
  4. Hoffman, D.M., Girshick, A.R., Akeley, K. and Banks, M.S. (2008). Vergence-accommodation conflicts hinder visual performance and cause visual fatigue. Journal of Vision, 8(3):33, pp. 1-30. doi:10.1167/8.3.33. https://jov.arvojournals.org/article.aspx?articleid=2122611
  5. VR & AR Wiki. Interpupillary distance. https://vrarwiki.com/wiki/Interpupillary_distance
  6. Wikipedia. Depth perception. https://en.wikipedia.org/wiki/Depth_perception
  7. Bohr, I. and Read, J.C.A. (2013). Stereoacuity with Frisby and Revised FD2 Stereo Tests. PLOS ONE, 8(12):e82999. doi:10.1371/journal.pone.0082999. https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0082999
  8. Chopin, A., Bavelier, D. and Levi, D.M. (2019). The prevalence and diagnosis of "stereoblindness" in adults less than 60 years of age: a best evidence synthesis. Ophthalmic and Physiological Optics, 39(2), pp. 66-85. doi:10.1111/opo.12607. https://pubmed.ncbi.nlm.nih.gov/30776852/
  9. The Royal Society (2018). 180 years of 3D. https://royalsociety.org/blog/2018/08/180-years-of-3d/
  10. Wade, N.J. (1987). On the late invention of the stereoscope. Perception, 16(6), pp. 785-818. https://pubmed.ncbi.nlm.nih.gov/3331425/
  11. Julesz, B. (1960). Binocular Depth Perception of Computer-Generated Patterns. Bell System Technical Journal, 39(5), pp. 1125-1162. doi:10.1002/j.1538-7305.1960.tb03954.x. https://archive.org/details/bstj39-5-1125
  12. Papathomas, T.V. (2004). Choices: The Science of Bela Julesz. PLOS Biology, 2(6):e172. https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.0020172
  13. Meta Platforms. Rendering to the Rift. Meta Horizon OS Developers documentation. https://developers.meta.com/horizon/documentation/native/pc/dg-render/
  14. VR & AR Wiki. Vergence-accommodation conflict. https://vrarwiki.com/wiki/Vergence-accommodation_conflict
  15. Heaney, D. (2019). Facebook Explains Why It Engineered The Half Dome Varifocal VR Headset. UploadVR. https://www.uploadvr.com/display-week-half-dome-facebook/
  16. Meta (2019). Half Dome Updates: FRL Explores More Comfortable, Compact VR Prototypes for Work. Meta Quest Blog. https://www.meta.com/blog/half-dome-updates-frl-explores-more-comfortable-compact-vr-prototypes-for-work/