RGB camera

From VR & AR Wiki

An RGB camera is a digital camera that records visible light as a color image, capturing the three additive primary colors (red, green, and blue) at each point in the scene. Most consumer RGB cameras use a single image sensor (CCD or, more commonly today, CMOS) overlaid with a color filter array, together with image-processing electronics that reconstruct a full-color picture from the sensor's raw output.[1][2] In virtual reality (VR), augmented reality (AR), and mixed reality (MR) headsets, RGB cameras are the sensors that supply color information for video passthrough of the real world, for spatial photo and video capture, and, in some designs, as one input to environment tracking.[3][4]

The term distinguishes a color camera from the monochrome (black and white) and infrared cameras that headsets also use, mainly for positional tracking and depth sensing. Because a color sensor spends part of its resolution and light sensitivity on separating colors, headset makers often pair RGB passthrough cameras with separate monochrome cameras tuned for tracking, rather than using one camera type for everything.[5][3]

How an RGB camera works

A digital image sensor is a grid of light-sensitive photosites (pixels). A bare photosite measures only the intensity of the light that reaches it, not its color, so an unmodified sensor produces a grayscale image.[6] To record color, almost all single-sensor cameras place a color filter array over the pixel grid so that each pixel sees light of only one primary color.[1]

The most common color filter array is the Bayer filter, patented by Bryce Bayer of Eastman Kodak in 1976 (U.S. patent 3,971,065).[1] In a Bayer mosaic, half of the pixels carry green filters and a quarter each carry red and blue filters, arranged in a repeating 2x2 pattern of one red, two green, and one blue. Green is favored because the human eye is more sensitive to green light, and the extra green samples yield an image that appears sharper and less noisy.[1][2]
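As a rough illustration of the layout described above, the following Python sketch builds a small Bayer mosaic and counts the share of each color. The RGGB ordering within the 2x2 tile and the helper names `bayer_color` and `bayer_mosaic` are assumptions made for this example; real sensors may start the tile on a different corner.

```python
from collections import Counter

# One 2x2 tile of a Bayer mosaic: one red, two green, one blue.
# The RGGB ordering is an assumption; other orderings exist.
BAYER_TILE = (("R", "G"),
              ("G", "B"))


def bayer_color(row: int, col: int) -> str:
    """Return the filter color over the photosite at (row, col)."""
    return BAYER_TILE[row % 2][col % 2]


def bayer_mosaic(height: int, width: int) -> list:
    """Build the filter layout for a sensor of the given size."""
    return [[bayer_color(r, c) for c in range(width)] for r in range(height)]


mosaic = bayer_mosaic(4, 4)
for row in mosaic:
    print(" ".join(row))

# Share of photosites under each filter color.
counts = Counter(color for row in mosaic for color in row)
fractions = {color: counts[color] / 16 for color in "RGB"}
print(fractions)
```

On a 4x4 grid this yields half green and a quarter each of red and blue, matching the proportions given in the text.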

Each pixel therefore measures only one of the three colors directly; the two missing color values at every pixel are estimated from neighboring pixels by an interpolation process called demosaicing.[1] The unprocessed sensor output before demosaicing is called raw image data.[1] Simple demosaicing copies or averages nearby same-color pixels, while more advanced methods follow image edges to reduce color artifacts.[1] After demosaicing, the camera applies white balance, color correction, gamma encoding, and noise reduction to produce the final RGB image.
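The simple averaging approach can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular camera: it assumes an RGGB layout, an image of at least 2x2 pixels, and a 3x3 averaging window, and the function names are hypothetical.

```python
def bayer_color(row, col):
    """Filter color at (row, col) for an assumed RGGB layout."""
    return (("R", "G"), ("G", "B"))[row % 2][col % 2]


def demosaic_average(raw):
    """Estimate (R, G, B) at every pixel of a raw Bayer image.

    Missing colors are the mean of same-color samples in the surrounding
    3x3 window, clipped at the image border. The image must be at least
    2x2 so that every window contains all three colors.
    """
    height, width = len(raw), len(raw[0])
    rgb = [[None] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            sums = {"R": 0.0, "G": 0.0, "B": 0.0}
            counts = {"R": 0, "G": 0, "B": 0}
            for rr in range(max(r - 1, 0), min(r + 2, height)):
                for cc in range(max(c - 1, 0), min(c + 2, width)):
                    color = bayer_color(rr, cc)
                    sums[color] += raw[rr][cc]
                    counts[color] += 1
            # Keep the directly measured value for this pixel's own color.
            own = bayer_color(r, c)
            sums[own], counts[own] = raw[r][c], 1
            rgb[r][c] = tuple(sums[k] / counts[k] for k in "RGB")
    return rgb


# A flat scene of one color: each photosite records only its own channel.
scene = {"R": 200, "G": 100, "B": 50}
raw = [[scene[bayer_color(r, c)] for c in range(4)] for r in range(4)]
image = demosaic_average(raw)
print(image[0][0], image[1][2])
```

For a uniformly colored scene the averages reproduce the original color exactly. On real images with edges, this kind of averaging produces the color artifacts that edge-aware methods are designed to reduce.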

A variant relevant to headsets is the RGB-IR sensor, whose filter array adds infrared-passing pixels alongside the red, green, and blue ones, so a single camera can output both a color image and a near-infrared image. RGB-IR modules use a dual-bandpass lens filter that admits both visible and near-infrared light, instead of the infrared-cut filter used on ordinary color cameras, and separate the two images with dedicated processing.[7]

RGB cameras versus monochrome and depth cameras

A VR or AR headset typically carries several camera types whose jobs differ.

An RGB camera records color and is used where the image is shown to the user or stored as a photo or video, principally video passthrough.[3]

A monochrome camera records only brightness. Removing the color filter array lets every pixel collect light of all visible wavelengths, which improves low-light sensitivity and avoids the resolution loss of demosaicing, and the smaller data stream is faster to process. These properties suit head and controller tracking, where geometry matters more than color.[5] Tracking cameras commonly use a global shutter, which exposes the whole frame at once and so avoids the skew that a rolling shutter introduces during fast head or hand motion.[5] The OmniVision OG0TC, announced in July 2024, is an example of a dedicated tracking-camera sensor: a 400x400 pixel (about 0.16 megapixel) black-and-white global-shutter CMOS sensor in a 1/14.46-inch format, marketed primarily for inward-facing tracking cameras such as eye and face tracking in AR, VR, and MR headsets.[8]
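The size of rolling-shutter skew can be estimated with simple arithmetic. The sketch below assumes a constant yaw rate, a pinhole camera, and the small-angle approximation near the image center; the function name and all numbers are hypothetical and do not describe any specific headset.

```python
import math


def rolling_shutter_skew_px(yaw_rate_deg_s, focal_length_px, readout_time_s):
    """Horizontal offset between the first and last image rows.

    A rolling shutter reads rows one after another, so the last row is
    captured readout_time_s later than the first. While the camera pans,
    the scene moves across the image at roughly
    focal_length_px * yaw_rate (in radians per second).
    """
    image_speed_px_s = focal_length_px * math.radians(yaw_rate_deg_s)
    return image_speed_px_s * readout_time_s


# Hypothetical values: a 180 deg/s head turn, a focal length of
# 1000 pixels, and a 10 ms top-to-bottom readout.
skew = rolling_shutter_skew_px(180, 1000, 0.010)
print(f"rolling shutter: {skew:.1f} px of skew")

# A global shutter exposes all rows together, so the readout delay
# between rows is effectively zero.
print(f"global shutter:  {rolling_shutter_skew_px(180, 1000, 0.0):.1f} px")
```

Under these assumed numbers, vertical edges lean by roughly 31 pixels from top to bottom, which is the kind of distortion a tracking algorithm would otherwise have to model or correct.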

A depth camera, or depth sensor, measures the distance to surfaces rather than their color, using techniques such as structured light, time of flight, or LiDAR. When depth data is combined with a color image, as in an RGB-D camera, the result is an RGB-D image, which pairs each pixel's color with a distance value.[9] The Microsoft Kinect popularized low-cost RGB-D sensing and made the format common in research on three-dimensional scene reconstruction, where multiple RGB-D views are merged into colored meshes or point clouds for VR.[9][10]

Because these roles trade off against one another, modern headsets often combine separate camera types and merge their outputs through sensor fusion so the passthrough view stays color-correct, stable, and aligned to the user's head.[3][5]
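To show how an RGB-D image becomes a colored point cloud, the following Python sketch back-projects each pixel through a pinhole camera model. The intrinsic parameters `fx`, `fy`, `cx`, and `cy`, the convention that a non-positive depth means "no measurement", and the function name are assumptions for illustration; they are not taken from the Kinect or any headset SDK.

```python
def rgbd_to_point_cloud(color, depth, fx, fy, cx, cy):
    """Convert an RGB-D image into a list of (x, y, z, rgb) points.

    color: rows of (r, g, b) tuples
    depth: rows of distances along the optical axis, same size as color
    fx, fy: focal lengths in pixels; cx, cy: principal point in pixels
    """
    points = []
    for v, (color_row, depth_row) in enumerate(zip(color, depth)):
        for u, (rgb, z) in enumerate(zip(color_row, depth_row)):
            if z <= 0:  # assumed convention: no depth reading here
                continue
            # Pinhole back-projection from pixel (u, v) at depth z.
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z, rgb))
    return points


# A tiny 2x2 RGB-D image with one missing depth sample.
color = [[(255, 0, 0), (0, 255, 0)],
         [(0, 0, 255), (255, 255, 255)]]
depth = [[1.0, 0.0],
         [2.0, 2.0]]
cloud = rgbd_to_point_cloud(color, depth, fx=2.0, fy=2.0, cx=0.5, cy=0.5)
for point in cloud:
    print(point)
```

Merging several such clouds from different viewpoints additionally requires the camera pose for each view, which this sketch does not cover.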

Role in VR and AR

Color video passthrough and mixed reality

Video passthrough shows the wearer the outside world on the headset's internal displays using its outward-facing cameras, which lets a fully enclosed VR headset present mixed reality by blending camera video with rendered graphics.[3] The quality of that view depends directly on the RGB cameras: their resolution sets how sharp the real world looks, and their light sensitivity sets how well it holds up in dim rooms.

Early standalone headsets used only monochrome cameras for passthrough. The Meta Quest 2 (2020) carried four monochrome fisheye cameras at the front corners, intended mainly for tracking, and reused them for a low-resolution black-and-white passthrough view used to define the play boundary and to glance at the real world.[11] The Meta Quest 3 (2023) moved to color passthrough using two 4-megapixel RGB cameras plus a depth sensor, which Meta said gave roughly ten times the passthrough resolution of the Quest 2.[3] Reviewers found the color view clear enough to read a phone screen, though still somewhat soft.[3]

The Quest 3's camera layout shows the division of labor described above: two forward-facing rolling-shutter RGB cameras handle color passthrough, while two forward-facing global-shutter grayscale cameras handle visual tracking, supported by infrared cameras and an infrared depth projector.[5][3]

High-end and standalone examples

Higher-resolution RGB passthrough is used in headsets aimed at professional and industrial work. The Varjo XR-3 (2021) used dual 12-megapixel passthrough cameras running at 90 Hz with under 20 ms of latency, combined with LiDAR-and-RGB depth fusion over a 0.4 to 5 m range.[12] The Apple Vision Pro (2024) has a stereoscopic main camera system rated at 6.5 stereo megapixels with an 18 mm, f/2.00 lens for spatial photo and video, part of a sensor array that Apple lists as two main cameras, six world-facing tracking cameras, four eye-tracking cameras, a TrueDepth camera, and a LiDAR scanner.[4] Apple states that a pair of high-resolution cameras send more than one billion pixels per second to the displays so the wearer can see the surroundings.[4]
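For a sense of scale, the nominal pixel rate of a passthrough camera pair can be computed from the figures quoted above. This is a back-of-the-envelope sketch: it assumes both sensors are read at full resolution on every frame, which the cited specifications do not confirm, and the function name is hypothetical.

```python
def sensor_pixel_rate(cameras, megapixels, frame_rate_hz):
    """Nominal pixels per second read from all cameras combined."""
    return cameras * megapixels * 1_000_000 * frame_rate_hz


# Varjo XR-3 figures quoted in the text: two 12 MP cameras at 90 Hz.
xr3_rate = sensor_pixel_rate(cameras=2, megapixels=12, frame_rate_hz=90)
print(f"{xr3_rate / 1e9:.2f} billion pixels per second")
```

Under that assumption the XR-3 pair would produce about 2.16 billion raw samples per second, the same order of magnitude as the figure of more than one billion pixels per second that Apple quotes for the Vision Pro.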

The table below summarizes the outward-facing RGB passthrough cameras of several headsets. Values for tracking and depth sensors are omitted.

| Headset | Year | RGB passthrough cameras | Notes |
| --- | --- | --- | --- |
| Meta Quest 2 | 2020 | None (monochrome only) | Low-resolution black-and-white passthrough from tracking cameras[11] |
| Varjo XR-3 | 2021 | 2 x 12 MP at 90 Hz | Under 20 ms latency; LiDAR + RGB depth fusion[12] |
| Meta Quest 3 | 2023 | 2 x 4 MP | Color passthrough plus a depth sensor; about 10x Quest 2 resolution[3] |
| Apple Vision Pro | 2024 | 2 main cameras, 6.5 stereo MP | 18 mm f/2.00; over 1 billion pixels/sec to displays[4] |

Spatial capture and tracking input

Beyond live passthrough, RGB cameras let headsets record spatial photos and videos that store a stereo color image of a scene for later viewing in three dimensions; the Apple Vision Pro's main camera system is built for this.[4] RGB images can also feed tracking and scene understanding. Visible-light features support computer vision tasks such as hand tracking and recognizing objects and surfaces, and color frames can be one input to a SLAM (simultaneous localization and mapping) pipeline, although headsets often prefer monochrome global-shutter cameras for the core inside-out tracking loop because of their speed and low-light behavior.[5][3]

Limitations in headsets

RGB passthrough cameras have practical limits. They need adequate light: the Meta Quest 3's color cameras require reasonable room lighting to avoid a grainy image.[3] The cameras sit a few centimeters in front of and apart from the eyes, so their viewpoint does not match the eyes' natural one, and the system must reproject the images to the correct perspective, a step that can add blur, warping, and depth errors.[13] Demosaicing and the color filter array mean a color sensor delivers less effective spatial resolution and less light per pixel than a monochrome sensor of the same pixel count, one reason tracking is usually handled by separate cameras.[6][5] Finally, passthrough adds latency between the camera and the display, which Varjo, for example, holds below 20 ms on the XR-3 to keep the view comfortable.[12]
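The viewpoint mismatch can be illustrated with the standard pinhole parallax relation: shift in pixels equals focal length in pixels times viewpoint offset divided by depth. The sketch below considers only a sideways offset between camera and eye, ignores the forward offset, and uses hypothetical numbers and a hypothetical function name.

```python
def parallax_shift_px(focal_length_px, offset_m, depth_m):
    """Image shift needed to move a point from the camera's viewpoint
    to a viewpoint displaced sideways by offset_m."""
    if depth_m <= 0:
        raise ValueError("depth must be positive")
    return focal_length_px * offset_m / depth_m


# Hypothetical setup: focal length of 500 pixels and a camera mounted
# 3 cm to the side of the eye.
for depth_m in (0.5, 2.0, 10.0):
    shift = parallax_shift_px(500, 0.03, depth_m)
    print(f"object at {depth_m:4.1f} m -> shift {shift:4.1f} px")
```

Because the required shift grows as objects get closer, an error in the estimated depth of a nearby object, such as the wearer's hands, produces a larger reprojection error than the same error for a distant wall.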

References

  1. "Bayer filter". https://en.wikipedia.org/wiki/Bayer_filter.
  2. "FAQ: e-CAM40_CUMI4682_MOD - 4 MP OV4682 RGB IR Camera Module". https://www.e-consystems.com/OV4682-4MP-MIPI-IR-camera-module-faq.asp.
  3. "Meta Quest 3". https://en.wikipedia.org/wiki/Meta_Quest_3.
  4. "Apple Vision Pro - Technical Specifications". https://www.apple.com/apple-vision-pro/specs/.
  5. "OV9281 Global Shutter USB Camera for Augmented Reality and Virtual Reality Tracking Systems with Low Latency". https://www.sinoseen.com/ov9281-global-shutter-usb-camera-for-augmented-reality-and-virtual-reality-tracking-systems-with-low-latency.
  6. "The Next Generation of Image Sensor Tech - Beyond the Bayer CFA". 2023. https://www.cined.com/the-future-of-image-sensor-technology-beyond-the-bayer-cfa/.
  7. "e-con Systems Launches 4K RGB-IR USB Camera Powered by Proprietary RGB-IR Separation Tech". 2024-10. https://www.edge-ai-vision.com/2024/10/e-con-systems-launches-4k-rgb-ir-usb-camera-powered-by-proprietary-rgb-ir-separation-tech-for-diverse-embedded-vision-applications/.
  8. "OMNIVISION Launches Newest Generation of the World's Smallest, Lowest-Power Global Shutter Image Sensor for AR/VR/MR Tracking Cameras". 2024-07-09. https://www.ovt.com/press-releases/omnivision-launches-newest-generation-of-the-worlds-smallest-lowest-power-global-shutter-image-sensor-for-ar-vr-mr-tracking-cameras/.
  9. Vaitheeswaran, S. M. (2016). "Point Cloud Mapping Measurements Using Kinect RGB-D Sensor and Kinect Fusion for Visual Odometry". Vol. 89, pp. 209-212. doi:10.1016/j.procs.2016.06.044. https://www.sciencedirect.com/science/article/pii/S1877050916311097.
  10. Cazamias, Jordan. "Virtualized Reality Using Depth Camera Point Clouds". https://web.stanford.edu/class/cs231a/prev_projects_2016/cs231a-project-report.pdf.
  11. "Appreciating the Differences: Meta Quest 2 and Meta Quest 3 Front-Facing Cameras". https://bernoullium.com/appreciating-the-differences-meta-quest-2-and-meta-quest-3-front-facing-cameras/.
  12. "Varjo XR-3 - The First True Mixed Reality Headset". https://varjo.com/products/varjo-xr-3.
  13. Nieuwoudt, Arthur (2020). "Passthrough+: Real-time Stereoscopic View Synthesis for Mobile Mixed Reality". Vol. 3(1), pp. 1-17. doi:10.1145/3384540. https://dl.acm.org/doi/10.1145/3384540.