Neural Radiance Fields
Neural radiance fields (NeRF) is a method in computer graphics and computer vision for synthesizing novel views of a three-dimensional scene from a set of input photographs. A NeRF represents a scene as a continuous volumetric function, stored in the weights of a small neural network, that maps a 5D coordinate (a 3D spatial position plus a 2D viewing direction) to a color and a volume density. New images are rendered by querying this function along camera rays and compositing the results with classical volume rendering. The technique was introduced in 2020 by Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng.[1][2]
For virtual reality (VR) and augmented reality (AR), NeRF and the broader family of radiance field methods are studied as a way to capture real places and objects photorealistically and replay them as interactive, view-dependent three-dimensional content, an alternative to traditional polygon meshes built by photogrammetry. Because a NeRF reproduces effects such as specular reflections and semi-transparent surfaces that depend on the viewer's position, the rendered scene shifts correctly as a user moves their head, which suits stereoscopic head-tracked viewing. Rendering a NeRF fast enough for a headset is the central obstacle, and several research systems target real-time on-device AR/VR rendering specifically.[3][4]
Origin
The NeRF paper was first posted to arXiv on 19 March 2020 and was presented as an oral at the European Conference on Computer Vision (ECCV) in 2020, where it received a Best Paper Honorable Mention.[1][2] At the time of publication four of the six authors were at the University of California, Berkeley (Mildenhall, Srinivasan, Tancik, and Ng), with Barron at Google Research and Ramamoorthi at the University of California, San Diego; Mildenhall, Srinivasan, and Tancik are marked as equal contributors.[1] An expanded version was republished as a research highlight in Communications of the ACM in January 2022.[5]
The method built on earlier work in view synthesis and learned scene representations, but it differed in storing the entire scene implicitly in the parameters of one fully connected network rather than as an explicit grid, mesh, or point set. The original results outperformed prior neural rendering and view synthesis methods on both synthetic path-traced objects and real scenes captured from inward-facing photographs.[1][5]
How it works
A neural radiance field is a function approximated by a multilayer perceptron, a fully connected, non-convolutional network. The input is a continuous 5D coordinate: a spatial location (x, y, z) and a viewing direction expressed as two angles. The output is an RGB color and a scalar volume density at that point. Density depends only on position, so geometry stays consistent from every viewpoint, while color is allowed to depend on viewing direction, which is what lets the model reproduce view-dependent appearance such as highlights and reflections.[1][5]
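The input and output split described above can be illustrated with a small sketch. It is not a trained network: `toy_radiance_field` is a hypothetical analytic stand-in (a soft-edged ball with a view-dependent highlight) whose only purpose is to show the interface, with density computed from position alone and color computed from position and direction. The helper `direction_from_angles` assumes the two angles are a polar angle and an azimuth.

```python
import math


def direction_from_angles(theta, phi):
    """Convert a polar angle theta and azimuth phi (radians) to a unit vector."""
    return (
        math.sin(theta) * math.cos(phi),
        math.sin(theta) * math.sin(phi),
        math.cos(theta),
    )


def toy_radiance_field(position, direction):
    """Hypothetical stand-in for the MLP: returns (rgb, density).

    Density uses the position only; color also uses the viewing direction.
    """
    x, y, z = position
    radius = math.sqrt(x * x + y * y + z * z)
    # Density: a soft-edged ball of radius 1. The exponent is clamped so
    # that far-away points do not overflow math.exp.
    density = 10.0 / (1.0 + math.exp(min(50.0, 20.0 * (radius - 1.0))))
    # Color: a reddish base plus a highlight that is strongest when the
    # viewing direction points along -z.
    highlight = max(0.0, -direction[2]) ** 8
    rgb = (0.8 + 0.2 * highlight, 0.2 + 0.8 * highlight, 0.2 + 0.8 * highlight)
    return rgb, density


point = (0.5, 0.0, 0.0)
looking_down = direction_from_angles(math.pi, 0.0)       # along -z
looking_side = direction_from_angles(math.pi / 2.0, 0.0)  # along +x
rgb_down, density_down = toy_radiance_field(point, looking_down)
rgb_side, density_side = toy_radiance_field(point, looking_side)
```

Querying the same point from the two directions gives the same density but different colors, which is the property that keeps geometry consistent while allowing highlights to move with the viewer.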
To render a pixel, the system casts a ray from the camera through that pixel, samples points along the ray, queries the network at each sample for its color and density, and accumulates these into a single color using classical volume rendering (an alpha-compositing integral along the ray). Because this rendering step is differentiable, the network can be trained by gradient descent so that its rendered images match the input photographs; the only data required is a set of images with known camera poses. The poses are usually recovered beforehand with a structure-from-motion tool. A trained NeRF is specific to the one scene it was optimized on.[1][5]
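The compositing step can be sketched as follows, assuming the usual discrete approximation of the volume rendering integral: each sample gets an opacity of 1 - exp(-density × interval length), and is weighted by the transmittance remaining in front of it. The function name `composite_ray` and its argument layout are illustrative choices, not taken from the paper's code.

```python
import math


def composite_ray(densities, colors, t_values, far):
    """Alpha-composite samples along one ray, front to back.

    densities: volume density at each sample
    colors:    (r, g, b) at each sample
    t_values:  increasing distances of the samples along the ray
    far:       distance at which the last sample's interval ends
    Returns the pixel color and the per-sample compositing weights.
    """
    transmittance = 1.0  # fraction of light not yet absorbed
    pixel = [0.0, 0.0, 0.0]
    weights = []
    for i, (sigma, rgb) in enumerate(zip(densities, colors)):
        t_next = t_values[i + 1] if i + 1 < len(t_values) else far
        delta = t_next - t_values[i]
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this interval
        weight = transmittance * alpha
        weights.append(weight)
        for channel in range(3):
            pixel[channel] += weight * rgb[channel]
        transmittance *= 1.0 - alpha
    return tuple(pixel), weights


# Empty space, then a dense red surface, then a blue one hidden behind it.
pixel, weights = composite_ray(
    densities=[0.0, 1000.0, 1000.0],
    colors=[(0.0, 1.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)],
    t_values=[1.0, 2.0, 3.0],
    far=4.0,
)
```

In this example the pixel comes out red: the empty sample contributes nothing and the blue sample is fully occluded. Every operation is a smooth function of the densities and colors, which is why gradients can flow from the rendered pixel back to the network.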
The original paper introduced two components needed to make this work well. The first is positional encoding: feeding the raw coordinates directly into the network produced blurry results, so the inputs are first mapped into a higher-dimensional space with sine and cosine functions at several frequencies, which lets the network represent high-frequency detail. The second is hierarchical volume sampling, which uses a coarse network to find where a ray is likely to hit surfaces and then concentrates a second, fine network's samples there, avoiding wasted computation in empty space.[1][5]
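Both components can be sketched in a few lines. The encoding below applies sines and cosines at frequencies that double at each step, to one coordinate at a time; the number of frequencies is a free parameter here. The fine-sampling function treats the coarse pass's compositing weights as a piecewise-constant distribution over depth bins and draws new depths by inverting its cumulative distribution. Two simplifications are assumptions of this sketch: evenly spaced quantiles replace random draws so the output is deterministic, and a small constant is added to the weights so the distribution is never all zero.

```python
import math


def positional_encoding(p, n_freqs):
    """Map a scalar coordinate to [sin, cos] pairs at doubling frequencies."""
    encoded = []
    for k in range(n_freqs):
        freq = (2.0 ** k) * math.pi
        encoded.append(math.sin(freq * p))
        encoded.append(math.cos(freq * p))
    return encoded


def sample_fine(bin_edges, coarse_weights, n_fine):
    """Draw n_fine depths concentrated where the coarse weights are large.

    bin_edges has one more entry than coarse_weights.
    """
    padded = [w + 1e-5 for w in coarse_weights]  # avoid an all-zero PDF
    total = sum(padded)
    cdf = [0.0]
    for w in padded:
        cdf.append(cdf[-1] + w / total)
    samples = []
    for i in range(n_fine):
        u = (i + 0.5) / n_fine  # evenly spaced quantile in (0, 1)
        b = 0
        while b < len(padded) - 1 and cdf[b + 1] < u:
            b += 1
        frac = (u - cdf[b]) / (cdf[b + 1] - cdf[b])
        samples.append(bin_edges[b] + frac * (bin_edges[b + 1] - bin_edges[b]))
    return samples


encoded = positional_encoding(0.5, n_freqs=10)  # 20 values for one coordinate
fine_depths = sample_fine(
    bin_edges=[2.0, 3.0, 4.0, 5.0, 6.0],
    coarse_weights=[0.0, 0.0, 1.0, 0.0],  # surface found in the third bin
    n_fine=8,
)
```

With the coarse weights above, all eight fine samples land between depths 4 and 5, so the fine pass spends its network queries near the surface instead of in empty space.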
Limitations and acceleration
The original NeRF is slow. Optimizing a single scene took on the order of one to two days on a GPU, and rendering a new frame took seconds rather than the milliseconds an interactive display needs, in part because the naive ray-marching procedure samples many points in empty or occluded regions that contribute nothing to the image.[5][6]
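A rough count of network queries shows why frame times were measured in seconds. The figures below are assumptions for illustration: an 800 × 800 image with 64 coarse samples and 128 additional fine samples per ray (with the fine network evaluated at all 192 points), and a hypothetical stereo headset with 2000 × 2000 pixels per eye at 90 frames per second.

```python
def queries_per_frame(width, height, n_coarse, n_fine):
    """Network evaluations for one image with coarse and fine passes."""
    rays = width * height
    # The coarse network sees n_coarse points per ray; the fine network
    # sees the coarse points plus the n_fine extra ones.
    per_ray = n_coarse + (n_coarse + n_fine)
    return rays * per_ray


single_image = queries_per_frame(800, 800, n_coarse=64, n_fine=128)
stereo_frame = 2 * queries_per_frame(2000, 2000, n_coarse=64, n_fine=128)
stereo_per_second = stereo_frame * 90
```

Under these assumptions one image needs about 164 million evaluations of the network, and the hypothetical headset would need roughly 184 billion per second, which is the gap the acceleration work aims to close.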
A large body of follow-up work attacked these costs. Nvidia's Instant Neural Graphics Primitives (Instant-NGP), published at SIGGRAPH 2022 by Thomas Müller, Alex Evans, Christoph Schied, and Alexander Keller, augments a small network with a multiresolution hash table of trainable feature vectors. This cut NeRF training from hours to seconds on a single GPU and enabled rendering at interactive rates.[6] For headsets specifically, FoV-NeRF folds the limits of human visual acuity into the representation to perform gaze-contingent foveated rendering, reporting up to a 99% reduction in render time versus the original NeRF while keeping perceived quality close to full resolution.[3] RT-NeRF, presented at ICCAD 2022, describes an algorithm and hardware co-design aimed at real-time (above 30 frames per second) NeRF rendering on AR/VR devices.[4]
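The multiresolution hash lookup can be sketched as follows. This is a simplified illustration rather than Nvidia's implementation: the per-axis multipliers in `PRIMES`, the geometric spacing of grid resolutions, and the trilinear interpolation follow the scheme as commonly described, and should be read as assumptions. Every level is hashed here, the input point is assumed to lie in the unit cube, and the feature tables are plain Python lists that a real system would train.

```python
import math

# Assumed per-axis multipliers for the spatial hash.
PRIMES = (1, 2654435761, 805459861)


def level_resolutions(n_levels, n_min, n_max):
    """Grid resolutions growing geometrically from n_min towards n_max."""
    if n_levels == 1:
        return [n_min]
    growth = math.exp((math.log(n_max) - math.log(n_min)) / (n_levels - 1))
    return [int(math.floor(n_min * growth ** level)) for level in range(n_levels)]


def hash_corner(ix, iy, iz, table_size):
    """Map an integer grid corner to a slot in the feature table."""
    return ((ix * PRIMES[0]) ^ (iy * PRIMES[1]) ^ (iz * PRIMES[2])) % table_size


def encode(point, tables, resolutions):
    """Concatenate trilinearly interpolated features from every level."""
    features = []
    for table, res in zip(tables, resolutions):
        scaled = [c * res for c in point]
        base = [int(math.floor(s)) for s in scaled]
        frac = [s - b for s, b in zip(scaled, base)]
        acc = [0.0] * len(table[0])
        for corner in range(8):  # the 8 corners of the enclosing cell
            offsets = [(corner >> axis) & 1 for axis in range(3)]
            weight = 1.0
            for o, f in zip(offsets, frac):
                weight *= f if o else (1.0 - f)
            slot = hash_corner(
                base[0] + offsets[0], base[1] + offsets[1], base[2] + offsets[2],
                len(table),
            )
            for k, value in enumerate(table[slot]):
                acc[k] += weight * value
        features.extend(acc)
    return features


resolutions = level_resolutions(n_levels=4, n_min=16, n_max=128)
# Constant tables make the result easy to check by hand.
tables = [[[0.5, -0.25] for _ in range(64)] for _ in resolutions]
features = encode((0.3, 0.7, 0.1), tables, resolutions)
```

The resulting feature vector (two values per level in this example) is what gets fed to the small network in place of a frequency encoding. Because most of the scene's detail lives in the tables, the network itself can be much smaller, which is where the speedup comes from.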
VR and AR relevance
The practical appeal for immersive media is photorealistic capture. With ordinary photos or phone video, a NeRF can reconstruct a real environment that a user then walks around inside a head-mounted display, with parallax and reflections that look correct from any angle, which is difficult to achieve with hand-built geometry. Comparative studies in digital heritage have found that NeRF can reconstruct texture-less, metallic, highly reflective, and transparent objects that confound conventional photogrammetry meshes, though photogrammetry can still produce metrically accurate, lower-noise surfaces and remains better for some fine geometric detail.[7]
Beyond static capture, research has targeted the specific demands of headset viewing: high field of view, high resolution, and stereoscopic, head-tracked rendering, which strain a method designed for offline single-image synthesis. Examples include foveated rendering tuned to VR, on-device acceleration, and work on streaming and interactive radiance fields for immersive viewing.[3][4][8] Commercial 3D-capture tools, such as Luma AI, brought NeRF-style capture from phone footage to consumers before largely shifting to the related Gaussian splatting representation, which can be rendered directly on mobile devices and in browsers.[9]
Relationship to Gaussian splatting
Gaussian splatting (more fully, 3D Gaussian splatting) is a later radiance field method, introduced by Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, and George Drettakis at SIGGRAPH 2023. Instead of an implicit neural function evaluated by ray marching, it represents a scene as a set of anisotropic 3D Gaussian primitives and renders them with a fast rasterizer. It targets the same novel-view-synthesis problem as NeRF and reaches high visual quality with real-time rendering, reporting 100 or more frames per second at 1080p.[10] Because it rasterizes an explicit point-based representation rather than evaluating a network per sample, Gaussian splatting is generally faster to render than the original NeRF, and by the mid-2020s much VR/AR capture tooling had adopted it for that reason. NeRF remains in active research use, and the two are often grouped together as radiance field methods.[10][8][9]
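The difference in rendering style can be illustrated with a sketch of how one pixel is shaded from already-projected Gaussians. It assumes each primitive has been reduced to a 2D center, an inverse 2D covariance, an opacity, a color, and a depth; the projection from 3D, the tiling used by the real rasterizer, and view-dependent color are all omitted. The 0.99 opacity cap and the early-exit threshold are assumed values, and the dictionary layout is a hypothetical choice for this example.

```python
import math


def splat_alpha(pixel, center, inv_cov, opacity):
    """Opacity that one projected 2D Gaussian contributes at a pixel."""
    dx = pixel[0] - center[0]
    dy = pixel[1] - center[1]
    (a, b), (c, d) = inv_cov
    power = -0.5 * (a * dx * dx + (b + c) * dx * dy + d * dy * dy)
    return min(0.99, opacity * math.exp(power))


def blend_pixel(pixel, splats):
    """Blend depth-sorted splats front to back; no network is evaluated."""
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0
    for splat in sorted(splats, key=lambda s: s["depth"]):
        alpha = splat_alpha(pixel, splat["center"], splat["inv_cov"], splat["opacity"])
        for k in range(3):
            color[k] += transmittance * alpha * splat["color"][k]
        transmittance *= 1.0 - alpha
        if transmittance < 1e-4:  # pixel is effectively opaque
            break
    return tuple(color)


identity = ((1.0, 0.0), (0.0, 1.0))
splats = [
    {"depth": 5.0, "center": (10.0, 10.0), "inv_cov": identity,
     "opacity": 0.9, "color": (0.0, 0.0, 1.0)},  # blue, farther away
    {"depth": 2.0, "center": (10.0, 10.0), "inv_cov": identity,
     "opacity": 0.9, "color": (1.0, 0.0, 0.0)},  # red, nearer
]
shaded = blend_pixel((10.0, 10.0), splats)
```

The blending rule is the same front-to-back compositing used for NeRF samples, but the per-sample network query is replaced by evaluating a closed-form Gaussian, which is why the method maps well onto a rasterizer.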
References
- ↑ 1.0 1.1 1.2 1.3 1.4 1.5 1.6 Mildenhall, Ben; Srinivasan, Pratul P.; Tancik, Matthew; Barron, Jonathan T.; Ramamoorthi, Ravi; Ng, Ren (2020). "NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis". European Conference on Computer Vision (ECCV) 2020. pp. 405-421. https://www.matthewtancik.com/nerf.
- ↑ 2.0 2.1 Template:Cite arXiv
- ↑ 3.0 3.1 3.2 Deng, Nianchen; He, Zhenyi; et al. (2022-11). "FoV-NeRF: Foveated Neural Radiance Fields for Virtual Reality". IEEE Transactions on Visualization and Computer Graphics. 28(11): 3854-3864. doi:10.1109/TVCG.2022.3203102. https://pubmed.ncbi.nlm.nih.gov/36044494/. Retrieved 2026-06-15.
- ↑ 4.0 4.1 4.2 Li, Chaojian; Li, Sixu; Zhao, Yang; Zhu, Wenbo; Lin, Yingyan (2022). "RT-NeRF: Real-Time On-Device Neural Radiance Fields Towards Immersive AR/VR Rendering". IEEE/ACM International Conference on Computer-Aided Design (ICCAD) 2022. https://licj15.github.io/rt-nerf/.
- ↑ 5.0 5.1 5.2 5.3 5.4 5.5 Mildenhall, Ben; Srinivasan, Pratul P.; Tancik, Matthew; Barron, Jonathan T.; Ramamoorthi, Ravi; Ng, Ren (2022-01). "NeRF: Representing scenes as neural radiance fields for view synthesis". Communications of the ACM. 65(1): 99-106. doi:10.1145/3503250. https://dl.acm.org/doi/10.1145/3503250. Retrieved 2026-06-15.
- ↑ 6.0 6.1 Müller, Thomas; Evans, Alex; Schied, Christoph; Keller, Alexander (2022-07). "Instant Neural Graphics Primitives with a Multiresolution Hash Encoding". ACM Transactions on Graphics. 41(4). doi:10.1145/3528223.3530127. https://nvlabs.github.io/instant-ngp/. Retrieved 2026-06-15.
- ↑ (2024). "Comparative Assessment of Neural Radiance Fields and Photogrammetry in Digital Heritage: Impact of Varying Image Conditions on 3D Reconstruction". Remote Sensing. 16(2): 301. doi:10.3390/rs16020301. https://www.mdpi.com/2072-4292/16/2/301. Retrieved 2026-06-15.
- ↑ 8.0 8.1 Template:Cite arXiv
- ↑ 9.0 9.1 "Luma Interactive Scenes Announced: Gaussian Splatting". 2023-10-03. https://radiancefields.com/luma-interactive-scenes-announced.
- ↑ 10.0 10.1 Kerbl, Bernhard; Kopanas, Georgios; Leimkühler, Thomas; Drettakis, George (2023-07). "3D Gaussian Splatting for Real-Time Radiance Field Rendering". ACM Transactions on Graphics. 42(4). https://repo-sam.inria.fr/fungraph/3d-gaussian-splatting/. Retrieved 2026-06-15.