Near-eye display

A near-eye display (NED), sometimes called a near-to-eye display, is a display placed close to the eye that uses optics to form a magnified virtual image the eye can comfortably focus on. It is the core optical and image-forming component of any head-mounted display, including virtual reality (VR) headsets, augmented reality (AR) glasses and mixed reality headsets. Near-eye displays are normally provided as a stereoscopic pair, one channel for each eye, so that the left and right eyes see slightly different images and the user perceives depth.^[1]

Near-eye displays fall into two broad families. Immersive displays fill the user's field of view with rendered imagery and are used for VR; they are usually opaque and block light from the surrounding scene. See-through displays overlay synthetic images on the user's view of the real world and are used for AR. Both families address the same underlying optical problem, but with different hardware.^[1]

The central optical problem

The human eye cannot focus on an object only a few centimeters away. The closest distance at which a typical adult eye can form a sharp image, the near point, is conventionally taken as about 25 cm and recedes with age. A display panel sitting just in front of the eye, as in a headset, is far closer than this, so without intervening optics its image would be an unfocused blur. Near-eye optics solve this by forming a virtual image of the panel at a comfortable apparent distance, often optical infinity, while also magnifying it to cover a wide angular field.^[1]

The simplest immersive design places the display at or just inside the front focal plane of a single positive lens. The lens then forms an upright, magnified virtual image at or near infinity, on the display side of the lens, which the relaxed eye can focus on naturally.^[2] Every near-eye display, immersive or see-through, must manage a set of competing requirements at once: resolution, eye box, form factor, correct focus cues, field of view, eye relief, brightness and full color. Improving one of these usually degrades another, so practical designs are a balance of trade-offs.^[1]
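As a rough illustration of the single-lens magnifier described above, the thin-lens equation gives the virtual-image distance and magnification for a panel placed just inside the focal plane. This is a paraxial sketch only; the function name and the 40 mm focal length and 38 mm panel distance are illustrative assumptions, not values taken from any cited headset.

```python
import math

def virtual_image(focal_mm: float, panel_mm: float) -> tuple[float, float]:
    """Return (image_distance_mm, lateral_magnification) for a thin positive lens.

    Uses 1/s_o + 1/s_i = 1/f with the real-is-positive sign convention, so a
    negative image distance means a virtual image on the panel side of the lens.
    """
    if focal_mm <= 0 or panel_mm <= 0:
        raise ValueError("focal length and panel distance must be positive")
    if math.isclose(panel_mm, focal_mm):
        # Panel exactly at the focal plane: the image sits at optical infinity.
        return -math.inf, math.inf
    image_mm = 1.0 / (1.0 / focal_mm - 1.0 / panel_mm)
    magnification = -image_mm / panel_mm
    return image_mm, magnification

# Hypothetical values: 40 mm lens with the panel 2 mm inside the focal plane.
image_mm, magnification = virtual_image(focal_mm=40.0, panel_mm=38.0)
print(f"virtual image {abs(image_mm):.0f} mm from the lens, magnified {magnification:.1f}x")
```

With these assumed numbers the virtual image lands about 760 mm from the lens at roughly 20x magnification; moving the panel closer to the focal plane pushes the image toward infinity.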

Immersive (virtual reality) architectures

In an immersive near-eye display, a microdisplay or flat panel (OLED, LCD or liquid crystal on silicon) sits behind a magnifying optic for each eye. Several optic types are used.

Fresnel lenses

A Fresnel lens collapses a conventional convex lens into a thin sheet of concentric prismatic rings, keeping most of the focusing power while removing bulk and weight. Fresnel optics were common in earlier consumer VR headsets because they are light and cheap to mold, costing on the order of 20 to 30 US dollars per headset. Their drawbacks are stray-light artifacts, in particular visible concentric "god rays" and glare around bright objects on a dark background, and softer focus toward the edges.^[3]

Pancake lenses

A pancake lens folds the optical path back on itself inside a thin multi-layer stack to shorten the distance between panel and eye. Light from the display enters the module and meets a partially reflective surface; a quarter-wave plate and a reflective polarizer then route the polarized light so that it bounces back and forth inside the stack before exiting toward the eye. Folding the path lets a pancake module be roughly 40 percent thinner than a Fresnel lens of the same magnification, with better edge-to-edge sharpness and less chromatic aberration, which is why most recent slim VR headsets use it.^[3] The main penalty is light efficiency: light is lost at the partially reflective surface and in the polarization optics on each pass, so a pancake stack typically loses about 65 to 75 percent of the panel's output and demands a much brighter display to compensate.^[3]
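The brightness penalty quoted above can be turned into a simple budget. The sketch below assumes that the luminance reaching the eye scales linearly with the fraction of light the optics pass; the function name and the 100-nit target are illustrative assumptions, and only the 65 to 75 percent loss range comes from the text.

```python
def required_panel_luminance(target_nits: float, optical_loss: float) -> float:
    """Panel luminance needed so that target_nits survives a given fractional loss."""
    if not 0.0 <= optical_loss < 1.0:
        raise ValueError("optical_loss must be in the range [0, 1)")
    return target_nits / (1.0 - optical_loss)

# Hypothetical target of 100 nits at the eye, over the quoted loss range.
for loss in (0.65, 0.75):
    nits = required_panel_luminance(100.0, loss)
    print(f"loss {loss:.0%}: panel must emit about {nits:.0f} nits")
```

Under these assumptions a 75 percent loss means the panel must be four times brighter than the image the user sees.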

Optical see-through (augmented reality) architectures

See-through near-eye displays must add bright virtual content while still passing light from the real world, so they rely on an optical combiner that merges the two. A see-through unit generally pairs a small light engine (an LCOS, micro-OLED or laser microprojector) with a combiner that relays its image to the eye.^[1] The main combiner families are:

Combiner	Operating principle	Notes
Birdbath	A curved partially reflective mirror plus a beam splitter magnify and fold the image toward the eye	Gives a relatively wide field of view but is light-inefficient because rays cross half-mirrors several times, and adds depth in front of the eye^[1]
Reflective waveguide	An array of embedded semi-reflective mirrors couples light out of a transparent substrate	A single reflector's size sets the field of view, so wider fields need a bulkier waveguide; works with LCD, LCOS and OLED engines^[1]
Diffractive waveguide	Surface-relief or volume gratings couple light in and out of the substrate by diffraction	Good thin form factor, but prone to chromatic ("rainbow") artifacts; field of view is capped by the total-internal-reflection angle, which depends on the substrate refractive index^[1]
Holographic optical element	A recorded holographic film acts as a wavelength-selective lens or mirror	Can be very thin and support a large field of view, but full color usually needs three stacked films (red, green, blue), which can cause color crosstalk^[1]
Freeform prism	A molded freeform surface both magnifies and combines the image	Flexible to optimize, but typically has fixed optical power, which worsens the vergence-accommodation conflict^[1]

In a waveguide the projected image is coupled into a thin transparent substrate, propagates by total internal reflection, and is coupled out over an extended area. The out-coupling grating replicates the exit pupil many times across the lens, which expands the eye box so the image stays visible as the eye moves or rotates.^[4] Reported diffractive-waveguide designs have reached around 70 degrees diagonal field of view using dual-channel pupil expansion, still short of the full human visual field.^[1]
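Because guiding relies on total internal reflection, the substrate's refractive index sets the smallest internal angle at which light stays trapped. A minimal sketch using Snell's law follows; the function name and the sample index values are illustrative assumptions, and the calculation ignores gratings, coatings and dispersion.

```python
import math

def critical_angle_deg(n_substrate: float, n_outside: float = 1.0) -> float:
    """Smallest internal incidence angle (from the surface normal) that is totally reflected."""
    if n_substrate <= n_outside:
        raise ValueError("total internal reflection needs a denser substrate")
    return math.degrees(math.asin(n_outside / n_substrate))

# Hypothetical substrate indices, from ordinary glass up to high-index glass.
for n in (1.5, 1.8, 2.0):
    theta_c = critical_angle_deg(n)
    print(f"n = {n}: critical angle {theta_c:.1f} deg, guided range {90.0 - theta_c:.1f} deg")
```

A higher index lowers the critical angle (about 41.8 degrees at n = 1.5 versus 30 degrees at n = 2.0), which widens the range of internal angles that can be guided and is one reason field of view depends on the substrate material.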

Retinal and virtual retinal projection

A virtual retinal display, also called a retinal scan display or a Maxwellian-view display, does not form an image on a panel and then relay it. Instead it scans or projects light so that it converges through a single point at the eye's pupil and lands directly on the retina. The principle dates to James Clerk Maxwell, who noted in the nineteenth century that light focused at the pupil produces a bright spot on the retina.^[5]

Because each image point passes through one point of the eye's crystalline lens, the retinal image is almost independent of the eye's own focus state. A Maxwellian-view image therefore appears sharp regardless of viewing distance and even for users with myopia, hyperopia or astigmatism, and it sidesteps the focus problem that drives much of near-eye optical design. The cost is a very small eye box: if the eye moves so the convergence point leaves the pupil, the image vanishes, which is one reason retinal projection has stayed mostly in research and niche products.^[5]^[1]

The vergence-accommodation conflict and advanced approaches

A stereoscopic near-eye display drives the eyes to verge (rotate inward or outward) toward the simulated depth of an object, but the light still comes from a panel at one fixed optical distance, so the eye accommodates (focuses) to that fixed plane. The mismatch between where the eyes point and where they focus is the vergence-accommodation conflict (VAC). It is a leading cause of visual discomfort, eye strain and fatigue in headsets and limits comfortable use, especially for near content.^[6]^[1]
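One common way to quantify the mismatch is in diopters (inverse meters), as the difference between the vergence distance and the fixed focal distance. The sketch below follows that convention; the function names, the 2 m focal plane and the 0.5 m content distance are illustrative assumptions, and the 63 mm default is the adult mean interpupillary distance.

```python
import math

def vergence_angle_deg(distance_m: float, ipd_mm: float = 63.0) -> float:
    """Angle between the two lines of sight when fixating a point straight ahead."""
    half_ipd_m = ipd_mm / 2000.0  # half the IPD, converted from mm to m
    return math.degrees(2.0 * math.atan(half_ipd_m / distance_m))

def vac_diopters(vergence_distance_m: float, focal_distance_m: float) -> float:
    """Magnitude of the vergence-accommodation mismatch in diopters."""
    return abs(1.0 / vergence_distance_m - 1.0 / focal_distance_m)

# Hypothetical headset with a fixed 2 m focal plane showing content at 0.5 m.
print(f"vergence angle: {vergence_angle_deg(0.5):.2f} deg")
print(f"mismatch: {vac_diopters(0.5, 2.0):.2f} D")
```

In this example the mismatch is 1.5 D, and it shrinks to zero when content sits at the focal plane, which matches the observation that near content is the hardest case.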

Several research display architectures try to supply correct or near-correct focus cues so that accommodation matches vergence:

Varifocal and multifocal displays change the focal distance of the virtual image, often per frame and driven by eye tracking, using tunable lenses, Alvarez lenses or movable mirrors. They present content at the right focal depth for where the user is looking.^[1]^[6]
Light field displays reproduce the directions of light rays, not just a flat image, so the eye receives the focus cues of a real scene and can accommodate naturally across a range of depths.^[6]
Holographic displays use a spatial light modulator to reconstruct a wavefront, allowing several depth planes or continuous depth and, in principle, true object-image conjugacy that eliminates VAC across a working range.^[7]

Despite many validated prototypes, no single focus-correct approach has yet reached broad commercial adoption, and most shipping headsets still use a single fixed focal plane.^[6]

Key metrics

Near-eye displays are characterized by a recurring set of parameters, several of which trade off against each other.

Metric	Meaning
Field of view	Angular extent of the image at the eye, usually in degrees. The human binocular field is about 200 degrees horizontal by 130 degrees vertical, with roughly 120 degrees of binocular overlap, so even wide headsets cover only part of it^[8]
Resolution and angular resolution	Pixel count and the resulting pixels per degree. Around 60 pixels per degree (about one arcminute per pixel) matches 20/20 visual acuity and is often called eye-limiting or retinal resolution^[8]
Eye box	The volume in front of the eye within which the full image is visible. A larger eye box tolerates eye movement and a range of users; pupil-replicating waveguides exist mainly to enlarge it^[4]^[1]
Interpupillary distance	Spacing between the two eyes' pupils. The adult mean is about 63 mm, with most adults between 50 and 75 mm. The display's two channels and its eye box must accommodate this spread, or part of the image is clipped (vignetted) for users at the extremes; an eye box of 16 mm or more may be needed to serve both narrow and wide IPDs^[9]^[1]
Eye relief	Distance from the last optical surface to the eye, which determines how close the optics sit to the face and whether eyeglasses fit between them and the eye^[1]
Form factor and brightness	Overall size and weight, and the panel luminance needed to feed the optics; folded and see-through paths in particular lose a large fraction of light^[3]
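The angular-resolution figures in the table reduce to simple arithmetic. The sketch below assumes pixels are spread evenly across the field of view, which real lenses only approximate; the function names and the 2000-pixel, 100-degree example are illustrative assumptions.

```python
import math

def pixels_per_degree(pixels_across: int, fov_deg: float) -> float:
    """Average angular resolution, assuming pixels are spread evenly over the field."""
    return pixels_across / fov_deg

def arcmin_per_pixel(ppd: float) -> float:
    """Angular size of one pixel in arcminutes (60 arcminutes per degree)."""
    return 60.0 / ppd

def pixels_for_target(fov_deg: float, target_ppd: float = 60.0) -> int:
    """Pixels needed across the field to reach a target angular resolution."""
    return math.ceil(fov_deg * target_ppd)

# Hypothetical panel: 2000 pixels across a 100 degree horizontal field.
ppd = pixels_per_degree(2000, 100.0)
print(f"{ppd:.0f} pixels per degree, {arcmin_per_pixel(ppd):.0f} arcmin per pixel")
print(f"pixels needed for 60 ppd over 100 degrees: {pixels_for_target(100.0)}")
```

For this hypothetical panel the result is 20 pixels per degree, or 3 arcminutes per pixel, and reaching 60 pixels per degree over the same field would take 6000 pixels across.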

References

[1] Xiong, Jianghao; Hsiang, En-Lin; He, Ziqian; Zhan, Tao; Wu, Shin-Tson. "Challenges and Advancements for AR Optical See-Through Near-Eye Displays: A Review". https://www.frontiersin.org/journals/virtual-reality/articles/10.3389/frvir.2022.838237/full.
[2] "Optical system of near-eye see-through head-mounted display". https://image-ppubs.uspto.gov/dirsearch-public/print/downloadPdf/11360307.
[3] "Fresnel vs Pancake Lenses VR: Expert Comparison and Guide". https://www.propelrc.com/esnel-vs-pancake-lenses-vr/.
[4] "Dynamic control of waveguide eye box". https://image-ppubs.uspto.gov/dirsearch-public/print/downloadPdf/12320973.
[5] Lin, Tong; Zhan, Tao; Zou, Junyu. "Retinal projection head-mounted display". https://www.researchgate.net/publication/311162981_Retinal_projection_head-mounted_display.
[6] "Vergence-Accommodation Conflict". https://www.emergentmind.com/topics/vergence-accommodation-conflict-vac.
[7] Chang, Chenliang; Bang, Kiseung; Wetzstein, Gordon; Lee, Byoungho; Gao, Liang. "Toward the next-generation VR/AR optics: a review of holographic near-eye displays from a human-centric perspective". https://opg.optica.org/optica/fulltext.cfm?uri=optica-7-11-1563.
[8] "What is Field of View (FOV)?". https://www.rayneo.com/blogs/news/what-is-field-of-view-fov.
[9] Dodgson, Neil A. "Variation and extrema of human interpupillary distance". https://www.researchgate.net/publication/229084829_Variation_and_extrema_of_human_interpupillary_distance.
