Point cloud

A point cloud is a set of data points in three-dimensional space, each defined by Cartesian coordinates (X, Y, Z) and optionally by extra attributes such as color, intensity, surface normal, or a timestamp. The points sample the external surfaces of objects or an environment rather than describing them as continuous surfaces, so a point cloud is a discrete, unstructured representation of three-dimensional geometry.^[1]^[2]

Point clouds are produced by LiDAR sensors, depth cameras, other 3D scanners, and photogrammetry software that measure many points on the surfaces around a device.^[1] In virtual reality (VR) and augmented reality (AR) they are a common intermediate form of spatial data: head-worn and handheld devices reconstruct the geometry of a room as a point cloud, then use it for tracking, spatial mapping, occlusion, and 3D reconstruction. The same representation underlies volumetric capture and recent radiance-field rendering methods.

How a point cloud is structured

A point cloud is a collection of independent points with no inherent connectivity or ordering, which distinguishes it from a polygon mesh (vertices joined into faces) or a raster image (pixels on a fixed 2D grid). Because the data is unstructured it captures three-dimensional geometry without enforcing a fixed spatial layout, at the cost of leaving surface topology implicit.^[2]

Each point always carries its position. Depending on the capture method and use case, points may also store RGB color, a reflectance or laser-return intensity value, a surface normal, a timestamp, and per-point classification labels.^[1] A cloud can be dense (millions of closely spaced points covering a surface) or sparse (a thinner sampling, such as the feature points an AR device tracks), and that density distinction drives how the data is processed, stored, and compressed.^[3]
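The structure described above can be illustrated with a minimal Python sketch. The field names are illustrative assumptions, not those of any particular file format or library. Position is mandatory and every other attribute is optional; because the cloud is an unordered collection, operations such as the centroid give the same result however the points are ordered. The brute-force mean nearest-neighbour spacing is one simple way to quantify how dense or sparse a cloud is.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Point:
    # Position is always present.
    x: float
    y: float
    z: float
    # Optional per-point attributes (illustrative names, not a standard schema).
    rgb: Optional[Tuple[int, int, int]] = None
    intensity: Optional[float] = None
    normal: Optional[Tuple[float, float, float]] = None
    timestamp: Optional[float] = None
    label: Optional[int] = None


def centroid(cloud: Sequence[Point]) -> Tuple[float, float, float]:
    """Mean position of the cloud; independent of the order of the points."""
    n = len(cloud)
    return (sum(p.x for p in cloud) / n,
            sum(p.y for p in cloud) / n,
            sum(p.z for p in cloud) / n)


def mean_spacing(cloud: Sequence[Point]) -> float:
    """Average distance from each point to its nearest neighbour (O(n^2) brute force).
    Smaller values indicate a denser sampling of the surface."""
    total = 0.0
    for i, p in enumerate(cloud):
        nearest = min(math.dist((p.x, p.y, p.z), (q.x, q.y, q.z))
                      for j, q in enumerate(cloud) if j != i)
        total += nearest
    return total / len(cloud)
```

Real systems replace the quadratic neighbour search with a spatial index such as a k-d tree or voxel grid; the brute-force loop is kept here only for clarity.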

Capture methods

The main ways to generate a point cloud differ in range, density, and the conditions they need.

LiDAR (light detection and ranging) emits laser pulses and measures the return time to compute distance, producing accurate measurements at long range but comparatively sparse data.^[2]
Depth cameras (for example structured-light or time-of-flight sensors such as Microsoft Kinect) combine an infrared illuminator with an imager to recover per-pixel depth, which back-projects to a point cloud.^[2]
Photogrammetry and structure-from-motion reconstruct a 3D scene from many overlapping 2D photographs, yielding dense, colored reconstructions but requiring good lighting and textured surfaces.^[2]

The two methods are therefore complementary rather than interchangeable: LiDAR favors long-range accuracy at the cost of density, whereas photogrammetry favors dense, colored output at the cost of depending on lighting and surface texture.^[2]
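The LiDAR principle of timing a laser pulse can be sketched in Python. The pulse travels to the surface and back, so the range is half the round-trip path. Converting a return to Cartesian coordinates assumes that the sensor also reports the beam's azimuth and elevation; the angle convention below is an assumption, since real sensors define their own frames.

```python
import math
from typing import Tuple

SPEED_OF_LIGHT = 299_792_458.0  # metres per second (in vacuum)


def range_from_round_trip(t_seconds: float) -> float:
    """Distance to the surface from the measured round-trip time of one pulse."""
    return SPEED_OF_LIGHT * t_seconds / 2.0


def return_to_xyz(rng: float, azimuth: float, elevation: float) -> Tuple[float, float, float]:
    """Convert one return (range in metres, angles in radians) to sensor-frame X, Y, Z.

    Assumed convention: azimuth is measured in the X-Y plane from +X toward +Y,
    and elevation is measured from that plane toward +Z.
    """
    horizontal = rng * math.cos(elevation)
    return (horizontal * math.cos(azimuth),
            horizontal * math.sin(azimuth),
            rng * math.sin(elevation))
```

For example, a round trip of about 66.7 nanoseconds corresponds to a surface roughly 10 metres away. Sweeping the beam across many azimuth and elevation angles and collecting the resulting points produces the cloud.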

File formats

Several formats are used to exchange and archive point clouds.

Format	Origin	Notes
PLY (Polygon File Format / Stanford Triangle Format)	Developed at Stanford University by Greg Turk and colleagues in the mid-1990s	Designed to store data from 3D scanners; holds vertices with optional color, transparency, surface normals and texture coordinates, plus faces; has ASCII and binary variants.^[4]
LAS (LASer)	Maintained by the American Society for Photogrammetry and Remote Sensing (ASPRS)	Open binary format for interchange and archiving of LiDAR point cloud data; each record stores X, Y, Z plus fields such as GPS time, intensity, return number, color and classification. Version 1.4 R15 was released on 9 July 2019.^[5]
PCD (Point Cloud Data)	Native format of the Point Cloud Library (PCL)	Header declares per-point FIELDS (for example x y z rgb), SIZE, TYPE and COUNT; stores data as ASCII, binary, or binary_compressed.^[6]
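The PCD header layout listed in the table can be illustrated with a minimal Python sketch that writes an XYZ-only cloud as ASCII PCD. The header keywords follow the PCL documentation; the function name is hypothetical, and attributes such as packed rgb are left out for brevity.

```python
from typing import Iterable, Tuple


def to_ascii_pcd(points: Iterable[Tuple[float, float, float]]) -> str:
    """Serialize XYZ points as an unorganized ASCII PCD (v0.7) document."""
    pts = list(points)
    header = [
        "# .PCD v0.7 - Point Cloud Data file format",
        "VERSION .7",
        "FIELDS x y z",
        "SIZE 4 4 4",        # bytes per field: 32-bit values
        "TYPE F F F",        # F = floating point
        "COUNT 1 1 1",       # one element per field
        f"WIDTH {len(pts)}",
        "HEIGHT 1",          # HEIGHT 1 marks an unorganized cloud (WIDTH = point count)
        "VIEWPOINT 0 0 0 1 0 0 0",  # acquisition pose: translation then quaternion
        f"POINTS {len(pts)}",
        "DATA ascii",
    ]
    # 9 significant digits are enough to round-trip a 32-bit float.
    body = [f"{x:.9g} {y:.9g} {z:.9g}" for x, y, z in pts]
    return "\n".join(header + body) + "\n"
```

Switching the DATA line to binary or binary_compressed changes only how the point records after the header are encoded; the header fields stay the same.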

Conversion to a mesh

For rendering and physics, a point cloud is often converted into a polygon surface. Common surface-reconstruction approaches build a triangle network directly from the points (Delaunay triangulation, alpha shapes, or ball pivoting) or first convert the points into a volumetric distance field and then extract a surface with the marching cubes algorithm.^[1] In AR specifically, a point cloud assembled across frames is used as the input to generate a triangle mesh that gives a more complete and renderable model of the surroundings, which the application can then use for occlusion and collision against virtual objects.^[7]
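The volumetric route can be sketched in Python under stated assumptions. Each point is assumed to already carry an oriented unit normal. The distance field uses the signed distance from a grid vertex to the tangent plane of its nearest sample, which is one of the simplest choices. Vertices on the side the normals face get positive values and the others negative, so the surface lies on the zero crossing that marching cubes would extract. That extraction step is not shown, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def signed_distance(q: Vec, points: Sequence[Vec], normals: Sequence[Vec]) -> float:
    """Signed distance from q to the tangent plane of its nearest sample point.
    Nearest neighbour is found by brute force; normals are assumed unit length."""
    i = min(range(len(points)), key=lambda k: math.dist(q, points[k]))
    p, n = points[i], normals[i]
    return sum(n[a] * (q[a] - p[a]) for a in range(3))


def sample_grid(points: Sequence[Vec], normals: Sequence[Vec],
                origin: Vec, step: float,
                dims: Tuple[int, int, int]) -> List[List[List[float]]]:
    """Evaluate the signed distance at every grid vertex; index as field[i][j][k]."""
    nx, ny, nz = dims
    return [[[signed_distance((origin[0] + i * step,
                               origin[1] + j * step,
                               origin[2] + k * step), points, normals)
              for k in range(nz)]
             for j in range(ny)]
            for i in range(nx)]
```

Production pipelines usually replace this nearest-plane field with smoother formulations, such as Poisson or truncated signed distance fusion. The grid-then-isosurface structure stays the same.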

Use in AR tracking and spatial mapping

Markerless AR systems build and track against a point cloud as part of SLAM (simultaneous localization and mapping). Google's ARCore detects visually distinct features in the camera image, called feature points, and uses them to compute its change in location; it combines this visual information with the device IMU to estimate the camera pose over time. ARCore then looks for clusters of feature points that lie on common horizontal or vertical surfaces, such as tables or walls, and exposes those surfaces to applications as geometric planes. Because plane detection depends on feature points, untextured surfaces such as a blank white wall may not be detected reliably.^[8]
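ARCore's internal plane-detection algorithm is not described in the cited documentation. The general idea of finding a cluster of feature points that share a surface can be illustrated with RANSAC plane fitting, a generic technique swapped in here for illustration. The sketch repeatedly fits a plane through three random points, keeps the plane with the most inliers, and then labels it horizontal or vertical by comparing its normal with an assumed up axis (+Y, as in a gravity-aligned world frame). The thresholds and names are illustrative assumptions.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Vec = Tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def ransac_plane(points: Sequence[Vec], threshold: float = 0.01, iterations: int = 200,
                 seed: int = 0) -> Tuple[Optional[Tuple[Vec, float]], List[Vec]]:
    """Return ((unit normal n, offset d), inliers) for the plane n.x + d = 0
    supported by the most points within `threshold` metres."""
    pts = list(points)
    rng = random.Random(seed)
    best: Optional[Tuple[Vec, float]] = None
    best_inliers: List[Vec] = []
    for _ in range(iterations):
        a, b, c = rng.sample(pts, 3)
        n = _cross(_sub(b, a), _sub(c, a))
        length = math.sqrt(_dot(n, n))
        if length < 1e-12:
            continue  # the three samples were collinear; no unique plane
        n = (n[0] / length, n[1] / length, n[2] / length)
        d = -_dot(n, a)
        inliers = [p for p in pts if abs(_dot(n, p) + d) < threshold]
        if len(inliers) > len(best_inliers):
            best, best_inliers = (n, d), inliers
    return best, best_inliers


def classify(normal: Vec, up: Vec = (0.0, 1.0, 0.0), tol_deg: float = 10.0) -> str:
    """'horizontal' if the normal is close to the up axis, 'vertical' if nearly perpendicular."""
    angle = math.degrees(math.acos(min(1.0, abs(_dot(normal, up)))))
    if angle < tol_deg:
        return "horizontal"
    if angle > 90.0 - tol_deg:
        return "vertical"
    return "other"
```

This also shows why untextured surfaces are a problem. If a blank wall yields too few feature points, no candidate plane gathers enough inliers to be accepted.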

Apple's ARKit exposes a comparable sparse cloud of tracked points through the rawFeaturePoints property of each ARFrame. Apple documents this ARPointCloud as the current intermediate results of the scene analysis that ARKit uses for world tracking. Its main purpose is to provide a debug visualization of what the underlying tracking algorithm is processing, not a stable map for applications to build on.^[9] By combining many such per-frame observations, an AR system builds a spatial map of the environment that lets the application reason about the 3D layout of the space.^[7] The technique generalizes across markerless inside-out tracking systems used on devices such as HoloLens, Magic Leap, and the Apple Vision Pro, and historically on Google's Project Tango.
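Per-frame observations can be fused into a persistent map in several ways. A simple approach is sketched below in Python: it assumes that each frame supplies parallel lists of feature identifiers and world-space positions, loosely modeled on the points and identifiers of an ARPointCloud. The FeatureMap class and its methods are hypothetical and are not part of ARKit. Repeated sightings of the same identifier are averaged, and features seen too few times can be filtered out as unreliable.

```python
from typing import Dict, List, Sequence, Tuple

Vec = Tuple[float, float, float]


class FeatureMap:
    """Hypothetical accumulator that fuses repeated observations of the same tracked
    feature (same identifier) into a running-average world-space position."""

    def __init__(self) -> None:
        self._sums: Dict[int, List[float]] = {}
        self._counts: Dict[int, int] = {}

    def add_frame(self, identifiers: Sequence[int], points: Sequence[Vec]) -> None:
        # identifiers[i] labels points[i]; positions are assumed to share one world frame.
        for fid, p in zip(identifiers, points):
            s = self._sums.setdefault(fid, [0.0, 0.0, 0.0])
            for axis in range(3):
                s[axis] += p[axis]
            self._counts[fid] = self._counts.get(fid, 0) + 1

    def points(self, min_observations: int = 1) -> Dict[int, Vec]:
        # Features seen only once are often unstable, so callers may require more sightings.
        result: Dict[int, Vec] = {}
        for fid, s in self._sums.items():
            n = self._counts[fid]
            if n >= min_observations:
                result[fid] = (s[0] / n, s[1] / n, s[2] / n)
        return result
```

Plain averaging is the simplest choice. Full SLAM systems instead refine point positions and camera poses jointly, for example with bundle adjustment.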

Use with depth-sensing headsets

Devices with a dedicated depth sensor turn raw depth into denser geometry. The LiDAR Scanner that Apple introduced on the 2020 iPad Pro feeds ARKit, whose scene-reconstruction feature aggregates depth data across frames (available at 60 Hz, tied to each AR frame) into a 3D mesh. The mesh is assembled from several smaller submeshes stored in ARMeshAnchor objects; regions the device is close to are tessellated more finely than distant ones, and areas the device is no longer facing are progressively coarsened or discarded.^[10] One independent analysis reported roughly 30,000 mesh vertices on average for a detailed room.^[11] Beyond the reconstructed mesh, ARKit also gives developers a LiDAR-derived depth map directly: on LiDAR-capable devices the scene depth API populates each AR frame with an ARDepthData object containing a per-pixel depth map (in meters) and a confidence map, updated at 60 Hz, which back-projects to a point cloud.^[12]
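Back-projecting a depth map into a point cloud follows the pinhole camera model. A pixel (u, v) with depth Z maps to X = (u - cx)·Z / fx and Y = (v - cy)·Z / fy, where fx, fy, cx and cy are the camera intrinsics. The intrinsics must match the depth map's resolution. The Python sketch below works on plain nested lists, not ARKit types. It assumes confidence values 0, 1 and 2 for low, medium and high, mirroring the three levels of ARKit's confidence map, and uses a computer-vision axis convention (x right, y down, z forward) that differs from ARKit's own camera frame.

```python
from typing import List, Sequence, Tuple


def depth_to_points(depth: Sequence[Sequence[float]],
                    confidence: Sequence[Sequence[int]],
                    fx: float, fy: float, cx: float, cy: float,
                    min_confidence: int = 2) -> List[Tuple[float, float, float]]:
    """Back-project a per-pixel depth map (metres) into camera-frame points.

    Pixels with confidence below `min_confidence` (assumed scale: 0 low, 1 medium,
    2 high) or with non-positive depth are skipped.
    """
    points: List[Tuple[float, float, float]] = []
    for v, (depth_row, conf_row) in enumerate(zip(depth, confidence)):
        for u, (z, c) in enumerate(zip(depth_row, conf_row)):
            if c < min_confidence or z <= 0.0:
                continue
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points
```

Raising min_confidence trades coverage for accuracy. Low-confidence pixels tend to cluster at object edges and on dark or reflective surfaces.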

Use in radiance-field rendering

Point clouds also seed modern photorealistic scene capture. The method 3D Gaussian Splatting, presented by Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, and George Drettakis in ACM Transactions on Graphics in July 2023, starts from the sparse points produced during camera calibration (a structure-from-motion point cloud) and represents the scene with 3D Gaussians, then optimizes them to reproduce the input views and renders the result in real time.^[13] The studio Magnopus describes the pipeline the same way (estimate a point cloud by structure from motion, convert each point to a Gaussian) and reports that the real-time rasterization and detail quality make the approach useful for virtual reality backdrops and immersive scenes, citing Meta's experiments using Gaussian splatting for Codec Avatars toward photorealistic telepresence.^[14]
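The step that turns each structure-from-motion point into a Gaussian can be sketched in Python. The sketch follows what we understand the reference implementation's initialization to be: the mean is copied from the point, the scale is isotropic and derived from the mean squared distance to the three nearest neighbours, the rotation is the identity, and the initial opacity is low. Treat these specifics as assumptions. The reference code also stores scale and opacity in log and logit space and color as spherical-harmonic coefficients, all of which this sketch omits. The optimization that follows is not shown.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


@dataclass
class Gaussian:
    mean: Vec                                    # centre, copied from the SfM point
    scale: Vec                                   # per-axis standard deviation
    rotation: Tuple[float, float, float, float]  # unit quaternion (w, x, y, z)
    opacity: float                               # in [0, 1]
    color: Vec                                   # RGB in [0, 1]


def init_gaussians(points: Sequence[Vec], colors: Sequence[Vec],
                   k: int = 3, initial_opacity: float = 0.1) -> List[Gaussian]:
    """Create one isotropic Gaussian per input point (brute-force neighbour search)."""
    if len(points) < 2:
        raise ValueError("need at least two points to estimate neighbour spacing")
    gaussians = []
    for i, p in enumerate(points):
        # Squared distances to the k nearest other points.
        d2 = sorted(math.dist(p, q) ** 2 for j, q in enumerate(points) if j != i)[:k]
        # Clamp to avoid zero-size Gaussians when points coincide.
        s = math.sqrt(max(sum(d2) / len(d2), 1e-7))
        gaussians.append(Gaussian(mean=tuple(p), scale=(s, s, s),
                                  rotation=(1.0, 0.0, 0.0, 0.0),
                                  opacity=initial_opacity, color=tuple(colors[i])))
    return gaussians
```

Tying the initial size to local point spacing means sparse regions start with larger splats that cover the gaps, while dense regions start with smaller ones.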

Compression for immersive media

Because dense point clouds are large, the Moving Picture Experts Group (MPEG) standardized two point cloud compression methods within the ISO/IEC 23090 (MPEG-I) family for immersive media. Video-based Point Cloud Compression (V-PCC) targets dense point clouds by segmenting the cloud into surface patches, projecting them into 2D atlases, and coding those with a video codec; Geometry-based Point Cloud Compression (G-PCC) targets sparse point clouds by encoding geometry and attributes directly in 3D using an octree.^[3]^[15] The overview paper by Graziosi and colleagues (2020) lists the target applications as 6-degree-of-freedom VR/AR, immersive telepresence, autonomous-vehicle LiDAR data, and cultural-heritage archival, and notes the first versions were planned for release in 2020.^[3]
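The octree idea behind G-PCC geometry coding can be illustrated with a Python sketch. This is not the standard's bitstream. Points are assumed to be already voxelized to integer coordinates in [0, 2^depth). The cube is split recursively into eight children, and each occupied node emits one byte whose bits mark which children contain points, visited breadth-first. The child bit order is an assumption chosen for illustration, and the real codec adds context-based entropy coding and many other tools on top of this idea.

```python
from typing import Iterable, List, Tuple


def octree_occupancy(points: Iterable[Tuple[int, int, int]], depth: int) -> List[int]:
    """Breadth-first occupancy bytes for integer points in [0, 2**depth) per axis.

    Assumed child index: (xbit << 2) | (ybit << 1) | zbit at the current level;
    bit i of a node's byte is set when child i contains at least one point.
    """
    nodes = [sorted(set(points))]  # root node holds every distinct voxel
    codes: List[int] = []
    for level in range(depth - 1, -1, -1):
        next_nodes = []
        for node in nodes:
            children: List[List[Tuple[int, int, int]]] = [[] for _ in range(8)]
            for x, y, z in node:
                idx = (((x >> level) & 1) << 2) | (((y >> level) & 1) << 1) | ((z >> level) & 1)
                children[idx].append((x, y, z))
            code = 0
            for i, child in enumerate(children):
                if child:
                    code |= 1 << i
                    next_nodes.append(child)
            codes.append(code)
        nodes = next_nodes
    return codes
```

Empty regions cost nothing below the level where they become empty, which is why octree coding suits sparse clouds. V-PCC instead exploits the 2D redundancy of dense surfaces through video coding.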

References

[1] "Point cloud". Wikipedia. https://en.wikipedia.org/wiki/Point_cloud.

[2] "Comprehensive Guide to Point Cloud Data in Computer Vision". Voxel51. https://voxel51.com/blog/comprehensive-guide-point-cloud-data.

[3] Graziosi, D.; Nakagami, O.; et al. (2020). "An overview of ongoing point cloud compression standardization activities: video-based (V-PCC) and geometry-based (G-PCC)". APSIPA Transactions on Signal and Information Processing. https://www.cambridge.org/core/journals/apsipa-transactions-on-signal-and-information-processing/article/an-overview-of-ongoing-point-cloud-compression-standardization-activities-videobased-vpcc-and-geometrybased-gpcc/56FCAF660DD44348BCB1BCA9B5EC56CF.

[4] "PLY (file format)". Wikipedia. https://en.wikipedia.org/wiki/PLY_(file_format).

[5] "LAS file format". Wikipedia. https://en.wikipedia.org/wiki/LAS_file_format.

[6] "The PCD (Point Cloud Data) file format". Point Cloud Library documentation. https://pointclouds.org/documentation/tutorials/pcd_file_format.html.

[7] "Point Cloud in VR/AR: Ultimate Guide". Number Analytics. https://www.numberanalytics.com/blog/point-cloud-vr-ar-ultimate-guide.

[8] "Fundamental concepts". ARCore developer documentation, Google. https://developers.google.com/ar/develop/fundamentals.

[9] "rawFeaturePoints". Apple Developer Documentation. https://developer.apple.com/documentation/arkit/arframe/2887449-rawfeaturepoints.

[10] "ARKit 911 - Scene Reconstruction with a LiDAR Scanner". Medium. https://medium.com/macoclock/arkit-911-scene-reconstruction-with-a-lidar-scanner-57ff0a8b247e.

[11] "Analyzing Apple's LiDAR Scanner". Nomtek. https://www.nomtek.com/blog/lidar-scanner-research.

[12] "ARDepthData". Apple Developer Documentation. https://developer.apple.com/documentation/arkit/ardepthdata.

[13] "3D Gaussian Splatting for Real-Time Radiance Field Rendering (reference implementation)". GitHub. https://github.com/graphdeco-inria/gaussian-splatting.

[14] "The rise of 3D Gaussian Splatting". Magnopus. https://www.magnopus.com/blog/the-rise-of-3d-gaussian-splatting.

[15] "MPEG-I - Point Cloud Compression (V-PCC, G-PCC and branches)". mpeg.expert. https://mpeg.expert/v-g-pcc/index.html.
