Gaussian splatting
- See also: Photogrammetry and SLAM
Gaussian splatting, most commonly in the form of 3D Gaussian splatting (3DGS), is a technique for representing and rendering a three-dimensional scene as a large collection of semi-transparent 3D Gaussians that are projected (splatted) onto the image plane and combined by rasterization. It produces photorealistic novel views of a captured scene and renders them in real time. The method was introduced in the 2023 paper "3D Gaussian Splatting for Real-Time Radiance Field Rendering" by Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler and George Drettakis of Inria, Université Côte d'Azur and the Max Planck Institute for Informatics, published in ACM Transactions on Graphics and presented at SIGGRAPH 2023.[1][2]
Gaussian splatting is a method for radiance-field rendering, the same problem addressed by Neural Radiance Fields (NeRF). Where NeRF stores a scene implicitly inside a neural network and renders by sampling many points along each camera ray, 3D Gaussian splatting stores the scene explicitly as millions of Gaussian primitives, similar to a richly attributed point cloud, and draws them with a rasterizer rather than ray marching. This explicit, rasterized approach is the main reason it can render at interactive frame rates, which has made it relevant to virtual reality (VR) and augmented reality (AR), where photoreal scenes must be drawn twice (once per eye) at high resolution and high refresh rate.[1][3]
How it works
A 3D Gaussian splat scene is built from a set of ordinary photographs of a real place or object. The images are first run through Structure-from-Motion (the reference implementation uses COLMAP) to recover the camera poses and a sparse point cloud, and that sparse cloud is used to initialize the Gaussians.[4]
Each Gaussian is a small fuzzy blob in space described by a handful of learnable parameters: a 3D position (its mean), an anisotropic covariance that gives it an ellipsoidal shape and orientation, an opacity value, and a color. The covariance is stored as a scale vector plus a rotation quaternion rather than a raw covariance matrix, which keeps it a valid (positive semi-definite) covariance while gradient descent adjusts it. To reproduce view-dependent effects such as glossy highlights, the color is not a single fixed value but a set of spherical harmonic coefficients (up to degree 3 in the original work), so a surface can look different from different angles.[1][4]
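The scale-plus-quaternion parameterization can be illustrated with a short sketch. It assumes the common factorization Σ = R S Sᵀ Rᵀ, where R is the rotation matrix of the quaternion and S is a diagonal matrix of the scale vector; the function names `quat_to_rotation` and `covariance` are illustrative and are not taken from the reference implementation.

```python
import math

def quat_to_rotation(q):
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    w, x, y, z = q
    # Normalize first so that any non-zero quaternion gives a pure rotation.
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]

def covariance(scale, quat):
    """Build Sigma = R S S^T R^T with S = diag(scale)."""
    R = quat_to_rotation(quat)
    # M = R @ S: multiplying by a diagonal matrix scales each column of R.
    M = [[R[i][j] * scale[j] for j in range(3)] for i in range(3)]
    # Sigma = M @ M^T, which is symmetric and positive semi-definite by construction.
    return [[sum(M[i][k] * M[j][k] for k in range(3)) for j in range(3)]
            for i in range(3)]

# With the identity rotation the covariance is simply diag(scale squared).
sigma_axis_aligned = covariance((2.0, 1.0, 0.5), (1.0, 0.0, 0.0, 0.0))

# A 90-degree turn about the z axis swaps the x and y extents.
half = math.radians(90.0) / 2.0
sigma_rotated = covariance((2.0, 1.0, 0.5), (math.cos(half), 0.0, 0.0, math.sin(half)))
```

Because the matrix is assembled as M Mᵀ, any values of the scale and quaternion yield a valid covariance, which is the practical reason an optimizer can update these parameters freely.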
Rendering projects every visible 3D Gaussian into 2D, sorts the resulting 2D ellipses by depth, and blends them front to back. The authors describe this as a "fast visibility-aware rendering algorithm that supports anisotropic splatting"; in practice it is a tile-based GPU rasterizer that the whole pipeline is built around. Because the rasterizer is differentiable, the scene can be trained by ordinary gradient descent: rendered images are compared against the input photos and the error is back-propagated into every Gaussian's parameters.[1][2]
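The sort-and-blend step can be sketched for a single pixel as follows. This is a simplified, CPU-only illustration and not the tile-based GPU rasterizer itself: the helper names, the tuple layout of a splat and the early-exit threshold `stop_transmittance` are assumptions made for the example.

```python
import math

def splat_alpha(opacity, inv_cov2d, dx, dy):
    """Opacity of a projected Gaussian at offset (dx, dy) from its 2D center.

    inv_cov2d holds (a, b, c) of the symmetric inverse 2D covariance [[a, b], [b, c]].
    """
    a, b, c = inv_cov2d
    power = -0.5 * (a * dx * dx + 2.0 * b * dx * dy + c * dy * dy)
    return opacity * math.exp(power)

def composite_front_to_back(splats, stop_transmittance=1e-4):
    """Blend (depth, (r, g, b), alpha) splats covering one pixel, nearest first."""
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light still passing through to the eye
    for _depth, rgb, alpha in sorted(splats, key=lambda s: s[0]):
        weight = alpha * transmittance
        for channel in range(3):
            color[channel] += weight * rgb[channel]
        transmittance *= 1.0 - alpha
        # Once the pixel is effectively opaque, farther splats cannot contribute.
        if transmittance < stop_transmittance:
            break
    return tuple(color), transmittance

# A half-transparent red splat in front of an opaque blue one, given out of order.
pixel_color, remaining = composite_front_to_back([
    (2.0, (0.0, 0.0, 1.0), 1.0),
    (1.0, (1.0, 0.0, 0.0), 0.5),
])
# pixel_color is (0.5, 0.0, 0.5) and remaining is 0.0
```

Every operation here is a sum or product of smooth functions of the Gaussian parameters, which is what allows the image error to be back-propagated to positions, shapes, opacities and colors.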
A second part of training is adaptive density control. The optimization periodically densifies the scene, cloning small Gaussians in under-reconstructed regions and splitting large Gaussians that cover too much detail, and prunes Gaussians that have become nearly transparent or too large, so that detail is added where the scene needs it and empty space carries almost no cost.[1][4] The original paper reported state-of-the-art visual quality, matching or exceeding Mip-NeRF 360, with competitive training times and real-time rendering (the abstract cites figures of at least 30 fps and up to 100 fps) at 1080p resolution on the Mip-NeRF 360, Tanks and Temples and Deep Blending datasets.[1][2]
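The clone, split and prune decisions of adaptive density control can be sketched as a per-Gaussian rule. This is a loose illustration rather than the authors' code: it assumes that a large accumulated positional gradient marks a region that needs more detail, and every threshold below (`grad_threshold`, `size_threshold`, `min_opacity`, `max_size`) is a hypothetical value chosen for the example.

```python
from dataclasses import dataclass

@dataclass
class Gaussian:
    max_scale: float   # longest axis of the ellipsoid, in scene units
    opacity: float     # in [0, 1]
    grad_norm: float   # accumulated positional gradient magnitude

def density_action(g, grad_threshold=0.0002, size_threshold=0.1,
                   min_opacity=0.005, max_size=1.0):
    """Return 'prune', 'split', 'clone' or 'keep' for one Gaussian.

    All thresholds are illustrative assumptions, not values from the source.
    """
    # Nearly transparent or oversized Gaussians are removed.
    if g.opacity < min_opacity or g.max_scale > max_size:
        return "prune"
    # A large gradient suggests the region is poorly reconstructed.
    if g.grad_norm >= grad_threshold:
        # Large Gaussians are split into smaller ones; small ones are duplicated.
        return "split" if g.max_scale > size_threshold else "clone"
    return "keep"

actions = [
    density_action(Gaussian(max_scale=0.01, opacity=0.5, grad_norm=0.001)),   # clone
    density_action(Gaussian(max_scale=0.5, opacity=0.5, grad_norm=0.001)),    # split
    density_action(Gaussian(max_scale=0.01, opacity=0.001, grad_norm=0.0)),   # prune
    density_action(Gaussian(max_scale=0.01, opacity=0.5, grad_norm=0.0)),     # keep
]
```

In a real training loop a rule of this kind would run every few hundred iterations over all Gaussians at once, and the new Gaussians created by cloning or splitting would then be optimized like any others.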
Comparison with NeRF
3D Gaussian splatting and NeRF both solve novel-view synthesis from photographs, but they differ in representation and in how they draw a frame.
| Property | Neural Radiance Fields (NeRF) | 3D Gaussian splatting |
|---|---|---|
| Scene representation | Implicit, a multilayer perceptron (neural network) | Explicit, millions of 3D Gaussian primitives |
| Rendering method | Volume rendering by ray marching, sampling many points per pixel | Differentiable rasterization, projecting and blending Gaussians |
| Rendering speed | Typically slow, often far below interactive rates | Real-time (the original paper cites 30 to 100+ fps at 1080p) |
| Storage / memory | Compact (network weights) | Large (per-Gaussian attributes), often hundreds of megabytes |
| View-dependent color | Learned inside the network | Per-Gaussian spherical harmonics |
A 2025 comparison in Sensors summarizes the trade-off as NeRF being slower to train and render but more storage-efficient, while Gaussian splatting trains and renders much faster at the cost of substantially higher storage, with comparable or better view-synthesis quality.[3]
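The storage figure in the table can be checked with a back-of-the-envelope estimate. The sketch below assumes each attribute is stored as an uncompressed 32-bit float and counts only the parameters described in this article (position, scale, rotation quaternion, opacity and spherical harmonic color); real file formats may add fields or compress them.

```python
def floats_per_gaussian(sh_degree=3):
    """Count the floating-point values stored for one Gaussian."""
    position = 3
    scale = 3
    rotation = 4                       # quaternion
    opacity = 1
    sh_coeffs = (sh_degree + 1) ** 2   # 16 coefficients for degree 3
    color = 3 * sh_coeffs              # one coefficient set per RGB channel
    return position + scale + rotation + opacity + color

def scene_megabytes(num_gaussians, sh_degree=3, bytes_per_float=4):
    """Uncompressed size in megabytes (10^6 bytes), assuming float32 attributes."""
    return num_gaussians * floats_per_gaussian(sh_degree) * bytes_per_float / 1e6

per_gaussian = floats_per_gaussian()        # 59 floats
one_million = scene_megabytes(1_000_000)    # 236.0 MB
three_million = scene_megabytes(3_000_000)  # 708.0 MB
```

Under these assumptions the spherical harmonic coefficients account for 48 of the 59 values per Gaussian, which is why color data dominates the size of an uncompressed splat scene.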
Relevance to VR and AR
The main appeal of Gaussian splatting for virtual reality and augmented reality is that it can render a photoreal, free-viewpoint capture of a real place at frame rates a headset can use. Fixed-viewpoint immersive formats such as 180-degree video and stereoscopic spatial video cannot match this, because they do not let the viewer move through the scene.[5]
Running 3DGS well in VR is harder than on a flat monitor: the scene must be drawn for each eye, at high per-eye resolution and a refresh rate of 72 Hz or more, and depth-sorting errors that are tolerable on a 2D screen show up as "popping" Gaussians or floaters that break stereo. The VRSplat method (Tu, Radl, Steiner, Steinberger, Kerbl and De la Torre), published in the Proceedings of the ACM on Computer Graphics and Interactive Techniques in 2025, was presented as the first systematically evaluated 3DGS approach able to support modern VR applications, reporting 72+ fps while eliminating popping and stereo-disrupting floaters and validating the result in a user study with 25 participants on a Meta Quest 3.[6] A related line of work targets foveated rendering, drawing the area the eye is looking at in full detail and the periphery more cheaply to save GPU time.[6]
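The rendering load described above can be put into rough numbers. The sketch below compares the pixel throughput of a 1080p monitor at 30 fps with stereo rendering at 72 Hz; the per-eye resolution of 2064 by 2208 pixels is an assumption used for illustration (a headset's actual render target may be larger or smaller than its panel), and the function names are hypothetical.

```python
def frame_budget_ms(refresh_hz):
    """Time available to render one frame, in milliseconds."""
    return 1000.0 / refresh_hz

def pixels_per_second(width, height, eyes, refresh_hz):
    """Pixels that must be shaded each second."""
    return width * height * eyes * refresh_hz

# Flat monitor baseline: one 1080p view at 30 fps.
monitor = pixels_per_second(1920, 1080, eyes=1, refresh_hz=30)

# Stereo headset: two views at an assumed 2064 x 2208 per eye, 72 Hz.
headset = pixels_per_second(2064, 2208, eyes=2, refresh_hz=72)

budget = frame_budget_ms(72)   # about 13.9 ms for both eyes together
ratio = headset / monitor      # roughly 10.5 times the monitor workload
```

Under these assumptions both eye views must be finished in under 14 milliseconds, at roughly ten times the pixel rate of the 1080p, 30 fps baseline.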
On the consumer side, the startup Gracia AI released a Gaussian-splat viewer for the Meta Quest 3, distributed through the Quest Store, with a pipeline it said renders splats about ten times faster than other solutions so they can run on a standalone headset; early versions showed visible artifacts and limited resolution, indicating the hardware was near its limit.[5] Gaussian splatting has also been applied to human avatars: the SqueezeMe project demonstrated distilled full-body Gaussian avatars running on a Meta Quest-class device, with three avatars rendered concurrently at 72 fps.[7] Because a splat scene is captured from ordinary photos rather than modeled by hand, the technique overlaps with photogrammetry as a way to bring real environments into VR and AR.[5][3]
Capture tools and file formats
Several mobile apps create Gaussian splats from phone photos, including Niantic's Scaniverse, Polycam, Luma AI and KIRI Engine. Scaniverse, a free iOS and Android app from Niantic that combines 3D scanning, LiDAR and Gaussian splatting on the device, exports the SPZ format.[8]
Storage size is the main practical obstacle to distributing splats, so compression has become its own research area. In October 2024 Niantic open-sourced SPZ ("Splat Zip") under the MIT license, describing it as "JPG for 3D Gaussian splats"; the company reported roughly a 90 percent reduction in file size versus the common PLY format with little visible quality loss.[8] In August 2025 the Khronos Group, with the OGC and partners including Niantic, announced two glTF extensions to bring Gaussian splatting into the format, KHR_gaussian_splatting for the base data and KHR_gaussian_splatting_compression_spz, which adopts SPZ as a compressed encoding.[9]
Limitations
The original reference implementation is released under a research-only license held by Inria and the Max Planck Institute for Informatics, which does not permit commercial use without the licensors' explicit consent; this is separate from the many later open-source reimplementations.[4] Reconstruction quality depends on having enough well-distributed input photographs, and a static 3DGS scene captures only a single moment in time. Extending the technique to moving (dynamic) scenes, often called 4D Gaussian splatting, multiplies the number of primitives and their attributes and can require many gigabytes of storage for a short clip; a 2025 study noted one dynamic scene needing about 13 million points and roughly 7.8 GB of storage, which is why memory-efficient and compressed variants remain an active topic.[10]
References
- [1] Kerbl, Bernhard; Kopanas, Georgios; Leimkühler, Thomas; Drettakis, George (July 2023). "3D Gaussian Splatting for Real-Time Radiance Field Rendering". ACM Transactions on Graphics. 42(4). https://repo-sam.inria.fr/fungraph/3d-gaussian-splatting/. Retrieved 2026-06-15.
- [2] Template:Cite arXiv
- [3] (2025). "Comparative Assessment of Neural Radiance Fields and 3D Gaussian Splatting for Point Cloud Generation from UAV Imagery". Sensors. 25(10). https://www.mdpi.com/1424-8220/25/10/2995. Retrieved 2026-06-15.
- [4] "graphdeco-inria/gaussian-splatting: Original reference implementation". https://github.com/graphdeco-inria/gaussian-splatting.
- [5] Template:Cite news
- [6] Template:Cite arXiv
- [7] Template:Cite arXiv
- [8] "Open-sourcing .SPZ: it's .JPG for 3D Gaussian splats". 2024-10-29. https://dev.scaniverse.com/news/spz-gaussian-splat-open-source-file-format.
- [9] "Khronos, OGC, and Geospatial Leaders Add 3D Gaussian Splats to the glTF Asset Standard". 2025-08-07. https://www.khronos.org/blog/khronos-ogc-and-geospatial-leaders-add-3d-gaussian-splats-to-the-gltf-asset-standard.
- [10] Template:Cite arXiv