Computer graphics

From VR & AR Wiki

Computer graphics is the field of computing concerned with generating, manipulating and displaying images with a computer. It covers both two-dimensional images (such as typography, drawings and user interfaces) and three-dimensional graphics, in which geometric models of objects and scenes are stored in a computer and converted into images through a process called rendering. The term was coined in 1960 by William Fetter and Verne Hudson of the aircraft manufacturer Boeing.[1]

In virtual reality (VR) and augmented reality (AR), computer graphics is the technology that produces the images a user sees through a head-mounted display. The defining difference from graphics for a monitor or a film is that VR and AR are interactive and head-tracked: the system must render a separate image for each eye, keep the frame rate high (typically 90 frames per second or more), and hold the delay between head movement and updated image very low, all on hardware that is often a battery-powered standalone headset. These constraints have driven techniques such as stereoscopic rendering optimizations, reprojection (asynchronous timewarp and asynchronous spacewarp), and foveated rendering.[2][3]

Subfields

Computer graphics is usually divided into a few overlapping areas. Modeling is the creation of a mathematical representation of an object or scene, most commonly a polygonal mesh of triangles, or smooth surfaces defined by NURBS (non-uniform rational B-splines). Rendering converts such a model into a raster image of pixels. Animation produces moving images by changing the model, camera or lighting over a sequence of frames. Closely related areas include image processing, geometry processing and physics simulation.[1]
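
As a rough illustration of the polygonal mesh representation described above, the following Python sketch stores a mesh as a list of vertex positions plus a list of triangles that index into it. The names `Mesh` and `surface_area` are hypothetical, chosen for this example only, and do not correspond to any particular modeling tool or engine.

```python
import math
from dataclasses import dataclass


@dataclass
class Mesh:
    """Indexed triangle mesh (hypothetical structure, for illustration only)."""
    vertices: list   # list of (x, y, z) tuples
    triangles: list  # list of (i, j, k) indices into `vertices`


def surface_area(mesh):
    """Sum the triangle areas, each computed as half the length of the
    cross product of two edge vectors."""
    total = 0.0
    for i, j, k in mesh.triangles:
        a, b, c = mesh.vertices[i], mesh.vertices[j], mesh.vertices[k]
        e1 = (b[0] - a[0], b[1] - a[1], b[2] - a[2])
        e2 = (c[0] - a[0], c[1] - a[1], c[2] - a[2])
        cx = e1[1] * e2[2] - e1[2] * e2[1]
        cy = e1[2] * e2[0] - e1[0] * e2[2]
        cz = e1[0] * e2[1] - e1[1] * e2[0]
        total += 0.5 * math.sqrt(cx * cx + cy * cy + cz * cz)
    return total


# A unit square built from two triangles that share two of its four vertices.
quad = Mesh(
    vertices=[(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)],
    triangles=[(0, 1, 2), (0, 2, 3)],
)
```

Storing indices rather than repeating vertex positions means shared corners are kept once, which is one common reason indexed meshes are used.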

3D content for VR and AR is built in 3D modeling tools (for example Blender, Autodesk Maya or 3ds Max) and then rendered in real time by a game engine, most often Unity or Unreal Engine.[4]

Rendering

Rendering is the step that generates a two-dimensional image from a description of a scene. Two broad families of algorithms dominate.[5]

Rasterization is an object-order method: each triangle in the scene is projected onto the image plane, and the pixels it covers are filled in and shaded. It is fast and maps well to graphics hardware, so it is the basis of nearly all real-time 3D engines and is the approach used for VR and AR rendering today.[6]
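
The object-order idea can be sketched in a few lines of Python. This is a minimal CPU illustration, not how a GPU or any specific engine implements it: the triangle is assumed to be already projected to 2D pixel coordinates, and the function names `edge` and `rasterize_triangle` are made up for this example. Each pixel center inside the triangle's bounding box is tested against the three edges.

```python
import math


def edge(ax, ay, bx, by, px, py):
    """Edge function: its sign tells on which side of the directed edge A->B
    the point P lies (positive = left of the edge when y points up)."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def rasterize_triangle(v0, v1, v2, width, height):
    """Return the set of (x, y) pixels whose centers are covered by the triangle."""
    area = edge(*v0, *v1, *v2)
    if area == 0:
        return set()  # degenerate triangle covers nothing

    # Only visit pixels inside the triangle's bounding box, clamped to the image.
    min_x = max(math.floor(min(v0[0], v1[0], v2[0])), 0)
    max_x = min(math.ceil(max(v0[0], v1[0], v2[0])), width - 1)
    min_y = max(math.floor(min(v0[1], v1[1], v2[1])), 0)
    max_y = min(math.ceil(max(v0[1], v1[1], v2[1])), height - 1)

    covered = set()
    for y in range(min_y, max_y + 1):
        for x in range(min_x, max_x + 1):
            px, py = x + 0.5, y + 0.5  # sample at the pixel center
            w0 = edge(*v1, *v2, px, py)
            w1 = edge(*v2, *v0, px, py)
            w2 = edge(*v0, *v1, px, py)
            # Accept either winding order by comparing against the sign of the area.
            if area > 0:
                inside = w0 >= 0 and w1 >= 0 and w2 >= 0
            else:
                inside = w0 <= 0 and w1 <= 0 and w2 <= 0
            if inside:
                covered.add((x, y))
    return covered


pixels = rasterize_triangle((0, 0), (4, 0), (0, 4), width=4, height=4)
```

The sketch counts pixel centers that lie exactly on an edge as covered. Hardware rasterizers apply a tie-breaking fill rule so that two triangles sharing an edge do not shade the same pixel twice, which is omitted here.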

Ray tracing is an image-order method: rays are traced from the camera through each pixel into the scene, and the renderer computes how light reflects, refracts and is blocked along those paths. It can produce more physically accurate reflections, refractions and shadows than rasterization, but at much higher computational cost. Arthur Appel first described ray casting for rendering in 1968, and Turner Whitted extended it to recursive ray tracing for mirrors and transparency in 1980. Path tracing, a Monte Carlo method for solving the rendering equation that James Kajiya introduced in 1986, is now the standard technique for photorealistic offline rendering in film.[5][7]
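
To make the image-order structure concrete, the sketch below casts one primary ray per pixel against a single sphere and records whether it hits. It covers only the ray casting step, with no shading, reflection or recursion. The camera setup (eye at the origin, looking along -z, image plane at z = -1) and the function names are assumptions made for this example.

```python
import math


def intersect_sphere(origin, direction, center, radius):
    """Return the nearest positive ray parameter t at which the ray
    origin + t * direction hits the sphere, or None if it misses."""
    oc = tuple(o - k for o, k in zip(origin, center))
    a = sum(d * d for d in direction)
    b = 2.0 * sum(o * d for o, d in zip(oc, direction))
    c = sum(o * o for o in oc) - radius * radius
    disc = b * b - 4.0 * a * c
    if disc < 0:
        return None  # the quadratic has no real roots: the ray misses
    root = math.sqrt(disc)
    for t in ((-b - root) / (2.0 * a), (-b + root) / (2.0 * a)):
        if t > 0:
            return t  # nearest intersection in front of the ray origin
    return None


def render_mask(width, height, center=(0.0, 0.0, -3.0), radius=1.0):
    """Loop over pixels (image order) and cast one ray through each pixel center.
    Returns rows of booleans, True where the sphere is hit."""
    mask = []
    for py in range(height):
        row = []
        for px in range(width):
            # Map the pixel center to [-1, 1] on the image plane at z = -1.
            u = (px + 0.5) / width * 2.0 - 1.0
            v = 1.0 - (py + 0.5) / height * 2.0
            hit = intersect_sphere((0.0, 0.0, 0.0), (u, v, -1.0), center, radius)
            row.append(hit is not None)
        mask.append(row)
    return mask


mask = render_mask(5, 5)
```

The outer loops run over pixels and the inner test runs over scene objects, the reverse of the rasterization loop order. A recursive ray tracer or path tracer would spawn further rays from each hit point, which is where the additional cost comes from.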

A practical distinction is between real-time and offline rendering. Real-time rendering generates each frame fast enough to be displayed immediately as the user interacts, which for VR at 90 Hz means roughly 11 milliseconds per frame. Offline rendering, used for visual effects and animated films, can spend minutes or hours on a single frame to achieve higher quality. VR and AR are firmly in the real-time category, which is why they rely on rasterization rather than full path tracing.[5]
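
The per-frame time available to a real-time renderer follows directly from the display refresh rate. The short sketch below works through that arithmetic. The 72 Hz and 120 Hz values are included only as illustrative comparison points, and `frame_budget_ms` is a name invented for this example.

```python
def frame_budget_ms(refresh_hz):
    """Time available to produce one frame, in milliseconds."""
    return 1000.0 / refresh_hz


# Budget per frame at a few example refresh rates, rounded to two decimals.
budgets = {hz: round(frame_budget_ms(hz), 2) for hz in (72, 90, 120)}
# budgets == {72: 13.89, 90: 11.11, 120: 8.33}
```

Both eye images, plus any compositor work, have to fit inside this budget, so the time available per eye view is smaller still.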

History

The first interactive computer graphics system is generally taken to be Sketchpad, demonstrated by Ivan Sutherland in 1963, which let a user draw shapes on a vector display with a light pen. Through the late 1960s and 1970s the University of Utah became a center of graphics research, producing foundational work on hidden-surface removal and shading. The same period saw the shift from vector displays, which drew lines between points, to raster displays, which fill a grid of pixels; an early bitmapped raster system was the Xerox Alto in the 1970s.[1][5]

Sutherland is also directly connected to VR and AR history: in 1968 he built an early head-mounted display, the system later nicknamed the Sword of Damocles, which rendered simple wireframe graphics that updated as the wearer moved their head. That work links the origins of interactive computer graphics to the origins of head-tracked displays.[1]

Graphics demands of VR and AR

Rendering for a head-mounted display is harder than rendering for a flat screen for several reasons that compound one another.

Stereoscopic rendering

A VR headset shows a different image to each eye to create stereo depth, so the scene must be rendered twice per frame, once per eye, which roughly doubles the work compared with a single view. To reduce this cost, engines use single-pass stereo techniques: instead of issuing two full render passes, the scene is submitted once and drawn to both eye images together, which cuts the CPU draw-call overhead and reduces redundant work on the GPU. NVIDIA introduced a hardware approach called Single Pass Stereo, and both Unity (single-pass instanced rendering, with a Multiview variant on supported devices) and Unreal Engine (Instanced Stereo) implement single-pass stereo rendering.[3][4][8]
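
The following sketch is a simplified analogy for single-pass stereo, not an engine or driver implementation: the scene is traversed once, and each point is projected into both eye images inside the same loop. It assumes a pinhole camera with z > 0 in front of the viewer, eyes offset only horizontally, and an eye separation of 0.064 m. That value and the function names are assumptions for illustration.

```python
def project(point, eye_x, focal=1.0):
    """Pinhole projection of a camera-space point (x, y, z), z > 0, as seen
    from an eye shifted by eye_x along the horizontal axis."""
    x, y, z = point
    return (focal * (x - eye_x) / z, focal * y / z)


def render_stereo_single_pass(points, eye_separation=0.064):
    """Traverse the scene once and emit every point into both eye images."""
    left, right = [], []
    for p in points:  # one traversal of the scene, two outputs per item
        left.append(project(p, -eye_separation / 2.0))
        right.append(project(p, +eye_separation / 2.0))
    return left, right


# Two points straight ahead of the viewer, at 2 m and 4 m.
left, right = render_stereo_single_pass([(0.0, 0.0, 2.0), (0.0, 0.0, 4.0)])
disparity_near = left[0][0] - right[0][0]
disparity_far = left[1][0] - right[1][0]
```

The horizontal offset between the two projections (the disparity) works out to focal × eye_separation / z, so nearer points differ more between the eyes. Everything that does not depend on the eye, such as the scene traversal itself, is done once rather than twice, which is the saving the single-pass techniques aim for.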

Frame rate and latency

Low frame rate and high latency are both linked to discomfort and motion sickness in VR, so headsets target a high, steady refresh rate, commonly 90 Hz or more, and a low motion-to-photon latency (the delay between a head movement and the corresponding change on the display). When an application cannot render a fresh frame in time, headset software fills the gap by reprojecting an existing frame using the latest tracking data. The basic version, timewarp, corrects a finished frame for head rotation just before the display refreshes; it was added to Oculus software by John Carmack in 2014. Asynchronous timewarp runs this correction in parallel with rendering so a warped frame is always ready, shipping on the Gear VR in 2014 and on the PC Rift in 2016; Valve added an equivalent Asynchronous Reprojection to SteamVR in 2016. Asynchronous spacewarp, introduced by Oculus in 2016, additionally extrapolates object and positional motion (using depth information in its 2.0 version) so an application can run at half the display rate while the compositor synthesizes the intermediate frames; Valve's comparable feature is called Motion Smoothing.[2][9]
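
The rotational correction at the heart of timewarp can be illustrated for a single pixel. The sketch below is a simplified, yaw-only version under stated assumptions: a pinhole camera looking along -z, a 90 degree horizontal field of view, and a head rotation about the vertical axis only. A real compositor uses the full 3D orientation from tracking and warps the whole image on the GPU. The function name `reproject_pixel` is hypothetical.

```python
import math


def reproject_pixel(px, py, yaw_delta, width, height, fov_x_deg=90.0):
    """Map a pixel of an already rendered frame to where it should appear after
    the head has turned by yaw_delta radians (positive = turning left)."""
    cx, cy = width / 2.0, height / 2.0
    f = cx / math.tan(math.radians(fov_x_deg) / 2.0)  # focal length in pixels

    # View ray through the pixel in the render-time camera frame.
    x, y, z = (px - cx) / f, (py - cy) / f, -1.0

    # Express the same world-fixed direction in the display-time camera frame
    # by applying the inverse of the head rotation about the vertical axis.
    c, s = math.cos(yaw_delta), math.sin(yaw_delta)
    xr = c * x - s * z
    zr = s * x + c * z
    if zr >= 0:
        return None  # the direction is no longer in front of the camera

    # Project back onto the image plane.
    return (cx + f * xr / (-zr), cy + f * y / (-zr))


# A 1 degree turn to the left moves the image center to the right.
shifted = reproject_pixel(500, 500, math.radians(1.0), 1000, 1000)
```

Because only orientation is used, this kind of correction needs no depth information, but it cannot account for head translation or moving objects, which is the gap that spacewarp-style extrapolation addresses.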

Foveated rendering

The human eye sees in high detail only at the fovea, the small central region of the retina, and acuity falls off sharply toward the periphery. Foveated rendering exploits this by rendering the area a user is looking at in full detail and reducing quality in the periphery, which lowers the number of pixels that must be shaded without a visible loss of quality. The idea was formalized for 3D graphics by Brian Guenter and colleagues at Microsoft Research, whose 2012 paper "Foveated 3D Graphics" reported a 5 to 6 times speedup on an HD display by rendering layered regions of decreasing resolution around the gaze point.[10][11]
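
The saving from layered rendering can be estimated by counting shaded pixels. The layer sizes below are hypothetical numbers chosen for illustration. They are not the configuration used in the paper and do not reproduce its measured speedup. A reduction in pixel count is also only a rough proxy for frame-time speedup.

```python
def layered_pixel_count(layers):
    """Total pixels shaded when each layer covers a (width, height) region of
    the display but is rendered at 1/scale resolution along each axis."""
    return sum((w // scale) * (h // scale) for w, h, scale in layers)


full_frame = 1920 * 1080

# Hypothetical layers: (region width, region height, downscale factor per axis).
layers = [
    (400, 400, 1),    # inner layer around the gaze point, full resolution
    (800, 800, 2),    # middle layer, half resolution per axis
    (1920, 1080, 4),  # outer layer covering the whole frame, quarter resolution
]

shaded = layered_pixel_count(layers)  # 160000 + 160000 + 129600
reduction = full_frame / shaded       # ratio of full-frame pixels to shaded pixels
```

With these made-up sizes the layered approach shades about 4.6 times fewer pixels than a full-resolution frame. The actual factor depends on display resolution, field of view and how quickly the layers fall off with eccentricity.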

Two forms exist. Fixed foveated rendering does not track the eye and simply renders the edges of each frame at lower detail, exploiting the fact that the optics of a headset already make the screen edges harder to see; Qualcomm's mobile chips have supported this since the Snapdragon 835 era (2017). Dynamic, or eye-tracked, foveated rendering moves the high-detail region with the user's gaze and so requires eye tracking hardware in the headset. Eye-tracked foveated rendering is used by the HTC Vive Pro Eye (2019), the Meta Quest Pro (2022), PlayStation VR2 (2023) and the Apple Vision Pro (2024).[10]

A related GPU feature is variable rate shading (VRS), which NVIDIA introduced with its Turing GPUs in 2018. VRS decouples the rate at which pixels are shaded from the rate at which they are rasterized, letting an application shade one sample for a block of pixels in regions that need less detail. Because the shading rate can be varied per screen region, VRS can be driven by a headset's gaze data to implement foveated rendering on the desktop.[12]
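
A gaze-driven shading-rate map of the kind described above might be built as in the sketch below. It is a CPU-side illustration only and does not use any graphics API. The 16-pixel tile size, the pixel-distance thresholds and the function names are assumptions for this example, and real systems would typically work from angular eccentricity rather than pixel distance.

```python
import math


def shading_rate_map(width, height, gaze, tile=16, inner=200.0, outer=500.0):
    """Return rows of block sizes (1, 2 or 4), one per screen tile. A block
    size of n means one shaded sample is shared by an n x n block of pixels."""
    cols = math.ceil(width / tile)
    rows = math.ceil(height / tile)
    rate_map = []
    for ty in range(rows):
        row = []
        for tx in range(cols):
            # Distance from the tile center to the gaze point, in pixels.
            dx = (tx + 0.5) * tile - gaze[0]
            dy = (ty + 0.5) * tile - gaze[1]
            dist = math.hypot(dx, dy)
            row.append(1 if dist < inner else 2 if dist < outer else 4)
        rate_map.append(row)
    return rate_map


def shader_invocations(rate_map, tile=16):
    """Approximate shading work: a tile with block size n shades (tile / n)^2 samples."""
    return sum((tile // n) ** 2 for row in rate_map for n in row)


# Gaze at the center of a 1600 x 800 image.
rates = shading_rate_map(1600, 800, gaze=(800, 400))
foveated_cost = shader_invocations(rates)
full_cost = 1600 * 800
```

Passing the frame center as `gaze` every frame corresponds to the fixed form of foveated rendering, while updating it from eye-tracking samples corresponds to the dynamic form.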

References