3D scanning

From VR & AR Wiki

3D scanning is the process of analyzing a real-world object or environment to capture data about its shape and, in many cases, its surface appearance such as color and texture. The captured data is assembled into a digital three-dimensional representation, usually a point cloud of sampled surface coordinates or a polygon mesh derived from it.[1] Scanning methods range from physical contact probes to non-contact optical techniques including laser triangulation, time-of-flight LiDAR, structured light, and photogrammetry.[2]

In virtual reality (VR) and augmented reality (AR), 3D scanning serves two purposes. It produces digital replicas of real objects, people, and places that become assets inside virtual environments, and it maps the user's surroundings so that headsets can place virtual content correctly in the physical world, a function closely tied to depth sensing and spatial mapping.[3][4]

How it works

3D scanning techniques are commonly grouped into contact and non-contact methods. Contact scanners, such as coordinate measuring machines and articulated measuring arms, physically probe a surface point by point. They reach very high precision but are slow and can deform soft objects, so they are mostly used in industrial metrology rather than VR/AR content capture.[1][2]

Non-contact methods are further split into active systems, which project their own light or other radiation, and passive systems, which rely on ambient light. The main techniques are summarized below.

| Technique | How it works | Typical accuracy and range | Notes |
|---|---|---|---|
| Laser triangulation | A laser dot or stripe is projected onto the surface and a camera observes where it lands; the laser source, the camera, and the lit point form a triangle whose geometry gives the distance.[1][5] | Tens of micrometres; range of a few metres.[1] | High detail on small to medium objects. |
| Time-of-flight / LiDAR | A laser pulse is emitted and the round-trip time of the reflection is timed; because the speed of light is known, distance equals c times t divided by two.[1][5] | Millimetre level; range up to kilometres.[1] | Suited to rooms, buildings, and outdoor scenes. |
| Structured light | A known pattern of stripes or grids is projected onto the object and one or more cameras measure how the pattern deforms across the surface.[6][1] | High accuracy on small to medium objects; captures a whole field at once.[1] | Fast enough to capture moving subjects. |
| Photogrammetry | Many overlapping photographs are taken from different angles and software reconstructs the 3D shape and texture from corresponding features across the images.[1][7] | Variable; depends on camera, lighting, and image count.[7] | Low hardware cost; produces detailed color textures. |
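
The two distance relations described for laser triangulation and time-of-flight can be sketched in a few lines of Python. This is an illustrative sketch only: the function names, the choice of interior angles measured from the laser-camera baseline, and the example values are assumptions made here, not taken from any particular scanner.

```python
import math

C = 299_792_458.0  # speed of light in vacuum, metres per second


def tof_distance(round_trip_s: float) -> float:
    """Time-of-flight: the pulse travels out and back, so d = c * t / 2."""
    return C * round_trip_s / 2.0


def triangulation_depth(baseline_m: float,
                        laser_angle_rad: float,
                        camera_angle_rad: float) -> float:
    """Perpendicular distance from the laser-camera baseline to the lit point.

    Both angles are interior angles of the triangle, measured between the
    baseline and the ray from the laser (or camera) to the lit point.
    From the law of sines: depth = b * sin(a) * sin(c) / sin(a + c).
    """
    a, c = laser_angle_rad, camera_angle_rad
    return baseline_m * math.sin(a) * math.sin(c) / math.sin(a + c)


if __name__ == "__main__":
    # A 20 ns round trip corresponds to a surface roughly 3 m away.
    print(tof_distance(20e-9))
    # A 0.2 m baseline with both rays at 45 degrees puts the point 0.1 m away.
    print(triangulation_depth(0.2, math.radians(45), math.radians(45)))
```

The triangulation formula shows why that method suits short ranges: as the point moves away, the two rays become nearly parallel and small angle errors produce large depth errors.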

The feature-matching step at the heart of photogrammetry is structure from motion, a computer vision technique that also recovers the camera positions. All of these methods first produce a point cloud, a set of measured surface samples. Mesh-processing software then converts the cloud into a triangle mesh, fills gaps, and exports the result to formats such as OBJ, glTF, or STL.[8]
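
The path from measured samples to an exportable mesh can be illustrated with a deliberately simplified Python sketch. It assumes the samples arrive as an organized depth grid from a pinhole camera, so neighbouring pixels can be joined directly into triangles; real scans are often unorganized clouds that need a surface-reconstruction step instead. The function names and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical, chosen here for illustration.

```python
def backproject(depth, fx, fy, cx, cy):
    """Turn a depth grid (metres along the camera Z axis) into 3D points.

    Assumes a pinhole camera: x = (u - cx) * z / fx, y = (v - cy) * z / fy.
    Points are returned row by row, matching the grid layout.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


def grid_triangles(rows, cols):
    """Split every grid cell into two triangles (0-based vertex indices)."""
    tris = []
    for v in range(rows - 1):
        for u in range(cols - 1):
            i = v * cols + u                              # top-left corner of the cell
            tris.append((i, i + cols, i + 1))             # top-left, bottom-left, top-right
            tris.append((i + 1, i + cols, i + cols + 1))  # same winding as the first
    return tris


def depth_to_obj(depth, fx, fy, cx, cy):
    """Return Wavefront OBJ text for the mesh built from a depth grid."""
    points = backproject(depth, fx, fy, cx, cy)
    tris = grid_triangles(len(depth), len(depth[0]))
    lines = [f"v {x:.4f} {y:.4f} {z:.4f}" for x, y, z in points]
    # OBJ face indices are 1-based.
    lines += [f"f {a + 1} {b + 1} {c + 1}" for a, b, c in tris]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    flat_wall = [[1.0, 1.0], [1.0, 1.0]]  # a 2x2 patch, one metre away
    print(depth_to_obj(flat_wall, fx=1.0, fy=1.0, cx=0.5, cy=0.5))
```

Gap filling and texture mapping are left out; the sketch covers only the cloud-to-triangles-to-file part of the pipeline.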

History

Early non-contact 3D scanning systems using lights, cameras, and projectors date to the 1960s, and laser triangulation devices were commercialized in the 1980s.[9] Cyberware Laboratories, based in California, built influential laser-triangulation hardware in that decade, including head and full-body scanners.[9]

A widely cited demonstration of large-scale scanning was the Digital Michelangelo Project, led by Marc Levoy of Stanford University during the 1998-1999 academic year. The team digitized ten Michelangelo statues, including the David, and two building interiors, using a custom Stanford Large Statue Scanner fabricated by Cyberware. The David dataset comprised roughly two billion polygons and 7,000 color images.[10]

A turning point for real-time, consumer-grade scanning came with the Microsoft Kinect, a structured-light depth camera released for the Xbox 360 in 2010. In 2011 a Microsoft Research team published KinectFusion, which let a user create a detailed 3D reconstruction of an indoor scene in real time simply by moving a handheld Kinect, tracking the sensor's pose from depth data alone and building a dense volumetric model on commodity graphics hardware. The authors listed geometry-aware augmented reality and physics-based interaction among its uses.[11]

Use in VR and AR

Capturing assets

Scanning lets creators reproduce real objects, people, and locations as digital models instead of building them by hand. Studios scan physical sculpts, props, and actors, then import the meshes into engines such as Unity and Unreal Engine for use in games and immersive experiences.[1][3] Artec 3D, a scanner maker, documents cases in which handheld and LiDAR scanners captured farm equipment for a video-game add-on and large industrial machines for VR exhibition models, removing the need to ship the physical items to trade shows.[3]

Photographic capture pipelines have expanded alongside dedicated scanners. Traditional photogrammetry reconstructs an explicit textured mesh, while newer view-synthesis methods (neural radiance fields and 3D Gaussian splatting) optimize a learned scene representation from the same kind of photo or video input and can render novel viewpoints in real time. Gaussian splatting represents a scene as many small 3D Gaussian primitives and generally needs more computation than classical photogrammetry, since it trains a model over many iterations per scene.[12][13]

Spatial mapping on headsets and phones

For AR and mixed reality, devices scan the user's surroundings so that virtual content can be anchored to real geometry, be occluded by real objects, and collide with real surfaces. On the Quest 3 and Quest 3S, the integrated depth sensor drives a Space Setup step that reconstructs walls, ceiling, floor, and furniture into a single triangle-based scene mesh, which apps query through Meta's Scene and Mesh APIs; a separate Depth API supplies a real-time per-frame depth map for occlusion.[14] A scan captures the room only at the moment it is taken, so if furniture is moved the scan must be edited or the room re-scanned; Meta said it planned to make Quest room scanning more automatic.[15]
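
The idea behind depth-based occlusion can be sketched as a per-pixel comparison between the real depth map and the depth of the rendered virtual content. The Python below is a conceptual illustration only, not Meta's Depth API: the function name, the use of `None` for pixels without virtual content, and the string "colors" are all assumptions made for the example.

```python
def composite_with_occlusion(real_depth, virtual_depth, virtual_color, camera_color):
    """Choose, per pixel, between the camera image and the virtual render.

    real_depth     -- grid of distances (metres) to the real surface
    virtual_depth  -- grid of distances to the virtual surface, or None
                      where no virtual content covers the pixel
    virtual_color  -- grid of rendered virtual colors
    camera_color   -- grid of passthrough camera colors

    A virtual pixel is shown only when it is closer than the real surface;
    otherwise the real object occludes it and the camera pixel is kept.
    """
    out = []
    for rd_row, vd_row, vc_row, cc_row in zip(
            real_depth, virtual_depth, virtual_color, camera_color):
        out_row = []
        for rd, vd, vc, cc in zip(rd_row, vd_row, vc_row, cc_row):
            visible = vd is not None and vd < rd
            out_row.append(vc if visible else cc)
        out.append(out_row)
    return out


if __name__ == "__main__":
    real = [[2.0, 1.0, 2.0]]      # middle pixel: a real object 1 m away
    virtual = [[1.5, 1.5, None]]  # virtual object 1.5 m away, absent on the right
    print(composite_with_occlusion(
        real, virtual, [["V", "V", "V"]], [["R", "R", "R"]]))
```

On a headset this comparison would normally run on the GPU for every frame; the nested loops here only make the rule explicit.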

Mobile devices brought 3D scanning to a wide audience. On March 18, 2020, Apple announced an iPad Pro fitted with a LiDAR Scanner, a time-of-flight sensor that measures distance to objects up to 5 metres away and operates, in Apple's description, at the photon level at nano-second speeds; the same sensor later appeared in the iPhone 12 Pro and subsequent Pro models. Apple exposed the depth data through ARKit, enabling instant AR object placement, improved motion capture, and people occlusion.[16] Third-party apps such as Polycam and Scaniverse use this LiDAR and camera data, or photogrammetry on non-LiDAR phones, to produce full-color scans that ARKit triangulates into a mesh.[17] ARCore provides comparable depth and environment understanding on Android.[4]

Capturing whole spaces

For larger environments, dedicated capture systems build navigable replicas, sometimes called digital twins. Matterport cameras are placed at successive positions in a space, rotating at each to record high-dynamic-range photography together with infrared depth, after which a cloud service aligns the scans into a single 3D model that can be toured online or in a headset; supported hardware ranges from the Matterport Pro3 to certain iPhones and the Leica BLK360.[18] Such scans support virtual property tours, remote inspection, and reconstructions of locations for VR experiences.[18][4]

Output and limitations

The raw output of any scanner is a point cloud; downstream processing turns it into a mesh and, where color is captured, a texture map. Each technique trades off accuracy, capture speed, range, and cost. Laser triangulation and structured light give fine detail on objects but short range; time-of-flight LiDAR covers large scenes at lower per-point accuracy; photogrammetry is inexpensive and texture-rich but sensitive to lighting and surface texture.[1][2] Shiny, transparent, or dark surfaces are difficult for optical methods because they reflect or scatter light unpredictably.[2] Because any single capture records only what was visible from the chosen viewpoints, optical scanning generally requires multiple angles or passes to cover an object fully; on headsets, a room scan likewise reflects only the layout at the time it was taken.[2][15]

References

  1. "3D scanning". https://en.wikipedia.org/wiki/3D_scanning.
  2. "3D scanning technologies - what is 3D scanning and how does it work?". https://www.aniwaa.com/guide/3d-scanners/3d-scanning-technologies-and-the-3d-scanning-process/.
  3. "How to create VR content". https://www.artec3d.com/learning-center/how-to-make-vr-content.
  4. "LiDAR Scanner: Mapping Your Reality". https://knowledge.vr-expert.com/kb/lidar-scanner-mapping-your-reality/.
  5. "Types of 3D Scanners: Laser Triangulation, Structured-Light, Time-of-Flight, and Phase-Shift". https://www.3dmag.com/3d-wikipedia/types-of-3d-scanners-laser-triangulation-structured-light-time-of-flight-phase-shift/.
  6. "How does structured-light 3D scanning work?". https://www.artec3d.com/learning-center/structured-light-3d-scanning.
  7. "Photogrammetry vs 3D Scanning". https://www.photomodeler.com/photogrammetry-vs-3d-scanning/.
  8. "Point clouds explained: scanning, processing, 3D models". https://www.wevolver.com/article/point-cloud-to-3d-model.
  9. "History of 3D scanners". https://www.modena-aec.co.za/history-of-3d-scanners/.
  10. Levoy, M. et al. (2000). "The Digital Michelangelo Project: 3D Scanning of Large Statues". SIGGRAPH 2000. https://graphics.stanford.edu/papers/dmich-sig00/dmich-sig00-nogamma.pdf.
  11. Izadi, S.; Kim, D.; Hilliges, O.; Molyneaux, D.; Newcombe, R.; Kohli, P.; Shotton, J.; Hodges, S.; Freeman, D.; Davison, A.; Fitzgibbon, A. (2011). "KinectFusion: Real-time 3D Reconstruction and Interaction Using a Moving Depth Camera". Proceedings of the 24th annual ACM symposium on User interface software and technology (UIST '11). https://www.microsoft.com/en-us/research/publication/kinectfusion-real-time-3d-reconstruction-and-interaction-using-a-moving-depth-camera/.
  12. "Gaussian splatting vs. photogrammetry vs. NeRFs". https://get.teleport.varjo.com/blog/photogrammetry-vs-nerfs-gaussian-splatting-pros-and-cons.
  13. "Gaussian Splatting vs. Photogrammetry - The Basics". https://www.pix-pro.com/blog/gaussian-vs-photogrammetry.
  14. "Build Believable Mixed Reality Experiences with Mesh API and Depth API". https://developers.meta.com/horizon/blog/mesh-depth-api-meta-quest-3-developers-mixed-reality/.
  15. "Meta aims to make Quest 3's room scanning more automatic". https://mixed-news.com/en/quest-3-automatic-room-scanning/.
  16. "Apple unveils new iPad Pro with LiDAR Scanner and trackpad support in iPadOS". 2020-03-18. https://www.apple.com/newsroom/2020/03/apple-unveils-new-ipad-pro-with-lidar-scanner-and-trackpad-support-in-ipados/.
  17. "Polycam Launches High-Speed LIDAR 3D Scanning App for Apple's 2020 iPad Pro Family". https://www.hackster.io/news/polycam-launches-high-speed-lidar-3d-scanning-app-for-apple-s-2020-ipad-pro-family-06cc09e086b3.
  18. "How Does Matterport Work?". https://www.hometrack.net/blog/how-does-matterport-work.