
Rolling shutter

From VR & AR Wiki
See also: Global shutter and Inside-out tracking

Rolling shutter is a method of image capture in which a camera sensor is exposed and read out one row of pixels at a time rather than all at once, so the top of a frame is recorded slightly earlier than the bottom. Because each row samples the scene at a slightly different instant, fast-moving subjects, or a moving camera, produce characteristic distortions: straight edges lean (skew), vibrating cameras make the image wobble (the "jello" effect), and a brief flash can light only part of the frame.[1][2] The alternative is a global shutter, which exposes every pixel simultaneously and captures the whole frame at a single point in time.[1]

Rolling shutter is the default behaviour of most CMOS image sensors, which are cheap, low-power and high-resolution, so it is common in phone cameras, action cameras and the cameras built into virtual reality and augmented reality headsets.[3] The distinction matters for headsets because the same cameras that drive inside-out tracking, hand tracking and passthrough also have to cope with the rapid head and hand motion that rolling shutter handles poorly. For that reason some headset tracking cameras use global shutter sensors instead, and computer-vision pipelines that do rely on rolling shutter cameras add software to model and remove the distortion.[4][5]

How it works

In a rolling shutter sensor the rows of the pixel array are not exposed together. The sensor resets and then reads out one line (or a small group of lines) at a time, and the read-out point "rolls" down the array from top to bottom.[1][3] The time taken to read a single row is the line time, and the time to scan the whole array is the frame time. For a hypothetical 2048 by 2048 sensor with a 10 microsecond line time, the read-out takes 2048 × 10 µs ≈ 20.5 milliseconds to travel from the top row to the bottom row.[1] During that interval each row is effectively a snapshot of the scene at a different moment, and if there is relative motion between the camera and the subject the rows no longer line up into a single coherent image.
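The timing arithmetic above can be written out directly. This is a minimal sketch that assumes a constant line time and ignores exposure duration; the function names `row_read_offset_s` and `frame_scan_time_s` are illustrative, not part of any sensor API.

```python
def row_read_offset_s(row: int, line_time_s: float) -> float:
    """Delay between reading the top row (row 0) and `row`."""
    if row < 0:
        raise ValueError("row must be non-negative")
    return row * line_time_s


def frame_scan_time_s(rows: int, line_time_s: float) -> float:
    """Time for the read-out to step through all `rows` rows."""
    return rows * line_time_s


if __name__ == "__main__":
    ROWS = 2048           # hypothetical 2048 x 2048 sensor from the text
    LINE_TIME_S = 10e-6   # 10 microsecond line time
    print(f"bottom row read {row_read_offset_s(ROWS - 1, LINE_TIME_S) * 1e3:.2f} ms "
          "after the top row")
    print(f"full scan: {frame_scan_time_s(ROWS, LINE_TIME_S) * 1e3:.2f} ms")
```

This prints 20.47 ms and 20.48 ms: the last row is read 2,047 line times after the first, while the full scan spans 2,048 line times.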

The architecture is a consequence of how CMOS sensors are built. Reading the array row by row needs fewer transistors and simpler control electronics than reading every pixel at once, which makes rolling shutter sensors cheaper, lower-power and quieter (less read noise and heat) than equivalent global shutter parts.[3][1] CCD sensors, by contrast, historically used a global shutter but read out through a single analogue-to-digital converter, which limited their speed; true global shutter CMOS sensors exist but remain less common and more expensive than the rolling shutter variety.[1]

The same effect predates digital sensors. A mechanical focal-plane shutter, in which two curtains travel across the film or sensor with a moving slit between them, also exposes one part of the frame slightly before another, and at fast shutter speeds it produces the same kind of skew of moving subjects.[2][6]

Visual artifacts

Rolling shutter produces several distinct, commonly described artifacts:[2][6]

Artifact | Cause | Typical appearance
--- | --- | ---
Skew | Camera or subject moves laterally during the row-by-row scan | Vertical edges lean diagonally; a fast-moving object looks slanted or sheared
Wobble (jello effect) | Camera vibrates while scanning, as in handheld or vehicle-mounted footage | The image ripples and wobbles unnaturally
Spatial aliasing / smear | A subject rotates or moves near the scan rate, e.g. an aircraft propeller | Blades appear bent, detached or of varying thickness; car wheels can look oval
Partial exposure (flash banding) | A short flash or strobe fires while only some rows are exposed | One band of the frame is bright and the rest dark
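The skew entry can be reproduced with a toy model. The sketch below is an illustration, not a sensor model: it assumes each row is sampled instantaneously at its read-out time and draws a two-pixel-wide vertical bar moving to the right. The function name `render_moving_bar` and all numbers are hypothetical, chosen so the bar shifts exactly one pixel per row; times are integer microseconds to keep the arithmetic exact.

```python
def render_moving_bar(width, height, bar_x0, bar_width, speed_px_per_s,
                      line_time_us, rolling=True):
    """Return one '#'/'.' string per row for a vertical bar moving right."""
    image = []
    for row in range(height):
        # Rolling shutter: row r is sampled r line times after row 0.
        # Global shutter: every row is sampled at the same instant (t = 0).
        t_us = row * line_time_us if rolling else 0
        left = bar_x0 + speed_px_per_s * t_us / 1_000_000
        image.append("".join(
            "#" if left <= col < left + bar_width else "."
            for col in range(width)))
    return image


if __name__ == "__main__":
    args = dict(width=12, height=6, bar_x0=1, bar_width=2,
                speed_px_per_s=100_000, line_time_us=10)
    print("global shutter:")
    print("\n".join(render_moving_bar(**args, rolling=False)))
    print("rolling shutter:")
    print("\n".join(render_moving_bar(**args, rolling=True)))
```

The global-shutter output is an upright bar; the rolling-shutter output leans one column per row, which is the diagonal lean described in the table.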

Because every artifact comes from the gap between reading the first and last rows, a sensor that scans a frame faster shows less distortion, and the effect tends to be more visible on moving subjects at lower frame rates.[3][1] A global shutter avoids all of these effects by exposing the whole array at once, which also means its exposure can be cleanly synchronised to a pulsed light source so the flash lands on every row equally.[1]
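Why a short flash bands a rolling-shutter frame but can light a global-shutter frame evenly can be shown by treating each row's exposure as a time window. This sketch assumes integer-microsecond times, half-open windows, and that a row counts as lit if any part of its window overlaps the flash (real rows at the edge of the band would only be partly brighter). The function `rows_lit_by_flash` is hypothetical.

```python
def rows_lit_by_flash(rows, line_time_us, exposure_us, flash_start_us,
                      flash_len_us, rolling=True):
    """Return one bool per row: does that row's exposure overlap the flash?"""
    flash_end_us = flash_start_us + flash_len_us
    lit = []
    for row in range(rows):
        # Rolling shutter staggers each row's window by one line time;
        # a global shutter gives every row the same window.
        start = row * line_time_us if rolling else 0
        end = start + exposure_us
        # Half-open intervals [start, end) and [flash_start, flash_end) overlap?
        lit.append(start < flash_end_us and flash_start_us < end)
    return lit


if __name__ == "__main__":
    # 10 rows, 100 us line time, 300 us exposure, 50 us flash at t = 500 us.
    rolling = rows_lit_by_flash(10, 100, 300, 500, 50)
    print("rolling, lit rows:", [i for i, on in enumerate(rolling) if on])
    print("global, flash at 500 us:", any(rows_lit_by_flash(10, 100, 300, 500, 50, rolling=False)))
    print("global, flash at 100 us:", all(rows_lit_by_flash(10, 100, 300, 100, 50, rolling=False)))
```

With these assumed numbers only rows 3 to 5 catch the flash in rolling mode. In global mode every row shares the window from 0 to 300 µs, so a flash at 500 µs is missed entirely, while one timed at 100 µs reaches every row.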

Relevance to VR and AR

Modern headsets carry several outward-facing cameras for positional tracking, hand tracking and video passthrough, and the choice between rolling and global shutter shapes how well those cameras work under motion.

Tracking cameras

Cameras used for positional tracking often have to find small, bright infra-red landmarks while the headset and controllers swing quickly. The original Oculus Rift DK2 external tracking sensor was built around an Aptina MT9V034, a Wide-VGA (752 by 480) CMOS image sensor with a global shutter that reaches up to 60 frames per second at full resolution.[7] Independent reverse-engineering of the DK2 tracker, contributed to by VR researcher Oliver Kreylos among others, showed the camera's exposure being synchronised to the headset so it could capture the blink patterns of the infra-red LEDs, although that synchronisation was not enabled by default and had to be corrected by hand.[8] A global shutter is well suited to this task: the whole sensor can be exposed for a very short window (roughly tens of microseconds) timed to the moment the LEDs pulse, so the LEDs show up as crisp dots on a dark background and the brief exposure freezes their motion without rolling-shutter smear.[1] The same principle continues in current devices, whose controllers carry rings of infra-red LEDs that pulse in step with the headset's tracking-camera exposure.[9]
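How much a very short exposure helps can be put into numbers with a back-of-the-envelope blur estimate. This sketch assumes a pinhole camera and a point near the optical axis, where image speed is roughly angular speed times focal length in pixels; the 4 rad/s swing and 400-pixel focal length are hypothetical values, not DK2 specifications, and the function names are illustrative.

```python
def image_speed_px_per_s(angular_speed_rad_s: float, focal_length_px: float) -> float:
    """Approximate image-plane speed of a point near the optical axis.

    Pinhole-camera small-angle approximation: pixels/s ~ omega * f.
    """
    return angular_speed_rad_s * focal_length_px


def motion_blur_px(speed_px_per_s: float, exposure_us: int) -> float:
    """Distance an image point travels while the shutter is open."""
    return speed_px_per_s * exposure_us / 1_000_000


if __name__ == "__main__":
    speed = image_speed_px_per_s(4.0, 400.0)   # hypothetical: 4 rad/s swing, f = 400 px
    for exposure_us in (50, 8_000):            # short LED-synchronised vs 8 ms exposure
        print(f"{exposure_us:>5} us exposure -> {motion_blur_px(speed, exposure_us):.2f} px of smear")
```

Under these assumptions an LED smears by under a tenth of a pixel in a 50 µs exposure but by roughly 13 pixels in an 8 ms one, which is why a brief, LED-synchronised global exposure yields crisp dots.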

Inside-out tracking and SLAM

Inside-out tracking on standalone headsets uses computer vision: the headset's cameras find natural features in the room and combine them with inertial data through visual-inertial odometry and SLAM to estimate head pose. Rolling shutter is a recognised problem for these algorithms, because pixels in the same frame are captured at different times and therefore from slightly different camera positions, which breaks the usual assumption that one image corresponds to one camera pose.[5] Researchers handle this in two ways: by using global shutter cameras, or by explicitly modelling the rolling shutter in the estimator. Reported methods include correcting the distortion with gyroscope or IMU measurements, representing the camera trajectory as a continuous-time curve so each row gets its own pose, and learning the correction with neural networks; the 2020 paper by Jiawei Mo, Md Jahidul Islam and Junaed Sattar, for example, predicts a row-by-row pose from a single image with IMU assistance and validates it by running visual odometry on the corrected frames.[5]
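The idea of giving each row its own pose can be sketched with the simplest continuous-time model: a constant-velocity trajectory between two known poses, using spherical linear interpolation (slerp) for rotation and linear interpolation for translation. This is an illustrative stand-in, not the method of any paper cited here (published estimators often use splines and fuse IMU data); the function names, the (w, x, y, z) quaternion convention, and the numbers are assumptions.

```python
import math


def slerp(q0, q1, u):
    """Spherical linear interpolation between unit quaternions (w, x, y, z)."""
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:                     # take the shorter arc
        q1 = tuple(-c for c in q1)
        dot = -dot
    if dot > 0.9995:                  # nearly identical: normalised lerp is stable
        q = tuple(a + u * (b - a) for a, b in zip(q0, q1))
        n = math.sqrt(sum(c * c for c in q))
        return tuple(c / n for c in q)
    theta = math.acos(dot)
    s0 = math.sin((1.0 - u) * theta) / math.sin(theta)
    s1 = math.sin(u * theta) / math.sin(theta)
    return tuple(s0 * a + s1 * b for a, b in zip(q0, q1))


def row_pose(row, frame_start_s, line_time_s, t0, pose0, t1, pose1):
    """Pose (quaternion, translation) at the instant `row` was read out.

    pose0 and pose1 are (quaternion, translation) pairs at times t0 < t1;
    motion between them is assumed to have constant velocity.
    """
    t = frame_start_s + row * line_time_s
    u = (t - t0) / (t1 - t0)
    q = slerp(pose0[0], pose1[0], u)
    p = tuple(a + u * (b - a) for a, b in zip(pose0[1], pose1[1]))
    return q, p


if __name__ == "__main__":
    LINE_TIME_S = 30e-6                       # hypothetical line time
    ROWS = 480
    t0, t1 = 0.0, (ROWS - 1) * LINE_TIME_S    # poses known at first and last row
    half = 0.01                               # 0.02 rad of yaw across the scan
    pose0 = ((1.0, 0.0, 0.0, 0.0), (0.0, 0.0, 0.0))
    pose1 = ((math.cos(half), 0.0, math.sin(half), 0.0), (0.01, 0.0, 0.0))
    for row in (0, ROWS // 2, ROWS - 1):
        q, p = row_pose(row, 0.0, LINE_TIME_S, t0, pose0, t1, pose1)
        yaw = 2.0 * math.acos(min(1.0, q[0]))
        print(f"row {row:3d}: yaw {yaw:.5f} rad, x {p[0]:.5f} m")
```

An estimator built on this idea would project each feature using the pose of the row it was observed in, rather than one pose for the whole frame.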

Passthrough

Mixed-reality passthrough, which shows the user a live camera feed of the real world, makes rolling shutter directly visible because the wearer sees the raw camera image (after reprojection) rather than a tracking abstraction. In a January 2023 analysis of the Meta Quest Pro, display analyst Karl Guttag noted that motion in the passthrough view rolled and banded, attributing part of the rolling to the camera's rolling shutter and part of the banding to the headset's row-by-row infra-red illumination.[4] From its 2023 launch, the Meta Quest 3's passthrough showed pronounced warping of moving objects, such as the wearer's own hands or a handheld phone; Meta's v66 software update in June 2024 substantially reduced that distortion and improved the alignment of the wearer's real and virtual hands.[10][11] Such passthrough warping arises from a combination of camera rolling shutter, the depth reprojection that warps the camera image to the user's eye position, and the system's emphasis on low motion-to-photon latency over absolute image fidelity, rather than from rolling shutter alone.[4][10]

References