{{see also|Terms|Technical Terms}}
[[SLAM]] ('''S'''imultaneous '''L'''ocalization '''A'''nd '''M'''apping) is a computational problem and a set of [[algorithms]] used primarily in robotics and autonomous systems, including [[VR headset]]s and [[AR headset]]s.<ref name="DurrantWhyte2006">H. Durrant‑Whyte & T. Bailey, “Simultaneous Localization and Mapping: Part I,” ''IEEE Robotics & Automation Magazine'', 13 (2), 99–110, 2006. https://www.doc.ic.ac.uk/~ajd/Robotics/RoboticsResources/SLAMTutorial1.pdf</ref> The core challenge SLAM addresses is often described as a "chicken-and-egg problem": to know where you are, you need a map, but to build a map, you need to know where you are.<ref name="Cadena2016">C. Cadena ''et al.'', “Past, Present, and Future of Simultaneous Localization and Mapping: Toward the Robust‑Perception Age,” ''IEEE Transactions on Robotics'', 32 (6), 1309–1332, 2016. https://rpg.ifi.uzh.ch/docs/TRO16_cadena.pdf</ref> SLAM solves this by enabling a device, using data from its onboard [[sensors]] (like [[cameras]], [[IMU]]s, and sometimes [[depth sensors]] like [[Time-of-Flight|Time-of-Flight (ToF)]]), to construct a [[map]] of an unknown [[environment]] while simultaneously determining its own position and orientation ([[pose]]) within that newly created map.<ref name="AIInsight">A. Ranganathan, “The Oculus Insight positional tracking system,” AI Accelerator Institute, 27 Jun 2022. https://www.aiacceleratorinstitute.com/the-oculus-insight-positional-tracking-system-2/</ref> This self-contained process enables [[inside-out tracking]], meaning the device tracks its position in [[3D space]] without needing external sensors or markers (like [[Lighthouse]] base stations).<ref name="QuestInsight2018">Meta, “Introducing Oculus Quest, Our First 6DOF All‑in‑One VR System,” Developer Blog, 26 Sep 2018. https://developers.meta.com/horizon/blog/introducing-oculus-quest-our-first-6dof-all-in-one-vr-system/</ref>


==How SLAM Works==
SLAM systems typically involve several key components working together in a continuous feedback loop:
* '''[[Feature Detection|Feature Detection/Tracking]]:''' Identifying salient points or features (often called [[landmarks]]) in the sensor data (for example, corners in camera images using methods like the [[ORB feature detector]]). These features are tracked frame-to-frame as the device moves (see the sketch after this list).<ref name="ORB2">R. Mur‑Artal & J. D. Tardós, “ORB‑SLAM2: an Open‑Source SLAM System for Monocular, Stereo and RGB‑D Cameras,” ''IEEE Transactions on Robotics'', 33 (5), 2017. https://arxiv.org/abs/1610.06475</ref>
* '''[[Mapping]]:''' Using the tracked features and the device's estimated movement (odometry) to build and update a representation (the map) of the environment. This map might consist of sparse feature points (common for localization-focused SLAM) or denser representations like [[point cloud]]s or [[mesh]]es (useful for environmental understanding).<ref name="RTABMap">M. Labbé & F. Michaud, “RTAB‑Map as an Open‑Source Lidar and Visual SLAM Library for Large‑Scale and Long‑Term Online Operation,” ''Journal of Field Robotics'', 36 (2), 416–446, 2019. https://arxiv.org/abs/2403.06341</ref>
* '''[[Localization]] (or Pose Estimation):''' Estimating the device's current position and orientation (pose) relative to the map it has built, often by observing how known landmarks appear from the current viewpoint.
* '''[[Loop Closure]]:''' Recognizing when the device has returned to a previously visited location by matching current sensor data to earlier map data (for example using appearance-based methods like [[bag-of-words]]). This is crucial for correcting accumulated [[Drift (tracking)|drift]] (incremental errors) in the map and pose estimate, leading to a globally consistent map.<ref name="ORB3">C. Campos ''et al.'', “ORB‑SLAM3: An Accurate Open‑Source Library for Visual, Visual‑Inertial and Multi‑Map SLAM,” ''IEEE Transactions on Robotics'', 2021. https://arxiv.org/abs/2007.11898</ref>
* '''[[Sensor Fusion]]:''' Often combining data from multiple sensors. [[Visual Inertial Odometry|Visual‑Inertial Odometry (VIO)]] is extremely common in modern SLAM, fusing camera data with [[IMU]] data.<ref name="ARKitVIO">Apple Inc., “Understanding World Tracking,” Apple Developer Documentation, accessed 3 May 2025. https://developer.apple.com/documentation/arkit/understanding-world-tracking</ref> The IMU provides high-frequency motion updates, improving robustness against fast motion, motion blur, or visually indistinct (textureless) surfaces where camera tracking alone might struggle.
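
The feature-tracking and pose-estimation steps above can be illustrated with a minimal two-frame visual-odometry sketch using OpenCV: detect ORB features in two consecutive camera frames, match them, and recover the relative camera motion from the essential matrix. This is a hedged, simplified example rather than the pipeline of any particular headset; the function name, the input frames, and the 3×3 camera intrinsic matrix <code>K</code> are assumptions supplied for illustration.

<syntaxhighlight lang="python">
import cv2
import numpy as np

def two_frame_relative_pose(img_prev, img_curr, K):
    """Estimate relative camera motion between two grayscale frames.

    A minimal visual front end: ORB feature detection, brute-force
    descriptor matching, and pose recovery from the essential matrix.
    K is the 3x3 camera intrinsic matrix (assumed known from calibration).
    """
    orb = cv2.ORB_create(nfeatures=1000)
    kp1, des1 = orb.detectAndCompute(img_prev, None)
    kp2, des2 = orb.detectAndCompute(img_curr, None)

    # Hamming distance suits ORB's binary descriptors; cross-checking
    # keeps only mutually-best matches.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des1, des2)

    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

    # RANSAC rejects outlier matches while estimating the essential matrix.
    E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                                      prob=0.999, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)
    return R, t  # rotation and unit-scale translation of the camera

# Hypothetical usage (file names and intrinsics are placeholders):
#   K = np.array([[525.0, 0.0, 320.0], [0.0, 525.0, 240.0], [0.0, 0.0, 1.0]])
#   f0 = cv2.imread("frame0.png", cv2.IMREAD_GRAYSCALE)
#   f1 = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)
#   R, t = two_frame_relative_pose(f0, f1, K)
</syntaxhighlight>

A full SLAM system goes beyond this sketch by triangulating matched features into a persistent map, resolving scale (from stereo baselines, depth sensors, or IMU data), and running the loop-closure and optimization steps described above.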


==SLAM vs. [[Visual Inertial Odometry]] (VIO)==
While related and often used together, SLAM and [[Visual Inertial Odometry]] (VIO) have different primary goals:
* '''[[VIO]]''' primarily focuses on estimating the device's ego-motion (how it moves relative to its immediate surroundings) by fusing visual data from cameras and motion data from an [[IMU]]. It is excellent for short-term, low-latency tracking but can accumulate [[Drift (tracking)|drift]] over time and does not necessarily build a persistent, globally consistent map optimized for re-localization or loop closure. Systems like Apple's [[ARKit]]<ref name="ARKitVIO" /> and Google's [[ARCore]]<ref name="ARCore">Google LLC, “ARCore Overview,” Google for Developers, accessed 3 May 2025. https://developers.google.com/ar</ref> rely heavily on VIO for tracking, adding surface detection and limited mapping but typically without the global map optimization and loop closure found in full SLAM systems.
* '''SLAM''' focuses on building a map of the environment and localizing the device within that map. It aims for global consistency, often incorporating techniques like loop closure to correct drift. Many modern VR/AR tracking systems use VIO for the high-frequency motion estimation component within a larger SLAM framework that handles mapping, persistence, and drift correction. Essentially, VIO provides the odometry, while SLAM builds and refines the map using that odometry and sensor data.
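
The distinction can be made concrete with a toy example: chaining noisy relative-motion estimates, as pure odometry does, accumulates drift, while a loop-closure constraint lets a SLAM back end correct it. The sketch below is a simplified illustration with hypothetical 2D odometry values and a deliberately naive correction in place of real pose-graph optimization or bundle adjustment.

<syntaxhighlight lang="python">
import numpy as np

def se2(x, y, theta):
    """Homogeneous 2D rigid transform (a stand-in for a full 6DoF pose)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, x],
                     [s,  c, y],
                     [0.0, 0.0, 1.0]])

# Hypothetical odometry: four noisy "drive 1 m, turn 90 degrees" steps that
# should return the device exactly to its starting pose.
rng = np.random.default_rng(0)
steps = [se2(1.0 + rng.normal(0, 0.02), rng.normal(0, 0.02),
             np.pi / 2 + rng.normal(0, 0.01)) for _ in range(4)]

# Odometry-style dead reckoning: chain the relative transforms.
poses = [np.eye(3)]
for T in steps:
    poses.append(poses[-1] @ T)

drift = poses[-1][:2, 2].copy()  # would be (0, 0) with perfect odometry
print("accumulated drift (m):", np.linalg.norm(drift))

# A loop-closure constraint says the final pose should equal the first.
# A real SLAM back end feeds this residual into pose-graph optimization;
# here the translational error is simply spread along the trajectory.
for i, P in enumerate(poses):
    P[:2, 2] -= drift * (i / (len(poses) - 1))
print("drift after naive correction (m):", np.linalg.norm(poses[-1][:2, 2]))
</syntaxhighlight>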


==Importance in VR/AR==
SLAM (often incorporating VIO) is fundamental technology for modern standalone [[VR headset]]s and [[AR headset]]s/[[Smart Glasses|glasses]]:
* '''[[6DoF]] Tracking:''' Enables full six-degrees-of-freedom tracking (positional and rotational) without external base stations, allowing users to move freely within their [[Playspace|playspace]].<ref name="QuestInsight2018" />
* '''[[World Locking|World-Locking]]:''' Ensures virtual objects appear stable and fixed in the real world (for AR/[[Mixed Reality|MR]]) or that the virtual environment remains stable relative to the user's playspace (for VR); see the sketch after this list.
* '''[[Roomscale VR|Roomscale]] Experiences & Environment Understanding:''' Defines boundaries (like [[Meta Quest Insight|Meta's Guardian]]) and understands the physical playspace (surfaces, obstacles) for safety, interaction, and realistic occlusion (virtual objects hidden by real ones).
* '''[[Passthrough AR|Passthrough]] and [[Mixed Reality]]:''' Helps align virtual content accurately with the real-world view captured by device cameras.
* '''Persistent Anchors & Shared Experiences:''' Allows digital content to be saved and anchored to specific locations in the real world ([[Spatial Anchor|spatial anchors]]), enabling multi-user experiences where participants see the same virtual objects in the same real-world spots across different sessions or devices.
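
World-locking is, at its core, a change of coordinates: the tracking system supplies the headset pose in the world (map) frame every frame, and each anchored virtual object, stored in world coordinates, is re-expressed in the device frame before rendering. The sketch below shows that transform chain with hypothetical 4×4 matrices and made-up positions; real runtimes expose this through higher-level constructs such as [[Spatial Anchor|spatial anchors]] rather than raw matrices.

<syntaxhighlight lang="python">
import numpy as np

def pose_matrix(R, t):
    """Build a 4x4 homogeneous transform from a rotation matrix and translation."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# Hypothetical SLAM output: the device pose in the world/map frame.
yaw = np.deg2rad(30.0)
R_world_device = np.array([[np.cos(yaw), -np.sin(yaw), 0.0],
                           [np.sin(yaw),  np.cos(yaw), 0.0],
                           [0.0,          0.0,         1.0]])
T_world_device = pose_matrix(R_world_device, t=[0.5, 0.0, 1.6])

# A virtual object anchored 2 m from the world origin at table height.
T_world_object = pose_matrix(np.eye(3), t=[2.0, 0.0, 0.8])

# World-locking: re-express the anchored object in the device frame each
# frame. As the SLAM pose estimate updates, this product keeps the object
# fixed in the real world rather than fixed to the display.
T_device_object = np.linalg.inv(T_world_device) @ T_world_object
print(T_device_object[:3, 3])  # object position relative to the headset
</syntaxhighlight>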
 
==Types and Algorithms==
SLAM systems can be categorized based on the primary sensors used and the algorithmic approach:
* '''Visual SLAM (vSLAM):''' Relies mainly on [[cameras]]. Can be monocular (one camera), stereo (two cameras), or RGB-D (using a [[depth sensor]]). Often fused with [[IMU]] data ([[Visual Inertial Odometry|VIO-SLAM]]).<ref name="Cadena2016" />
** '''[[ORB-SLAM2]]''': A widely cited open-source library using [[ORB feature detector|ORB features]]. It supports monocular, stereo, and RGB-D cameras but is purely vision-based (no IMU). Known for robust relocalization and creating sparse feature maps.<ref name="ORB2" />
** '''[[ORB-SLAM3]]''': An evolution of ORB-SLAM2 (released c. 2020/21) adding tight visual-inertial fusion (camera + IMU) for significantly improved accuracy and robustness, especially during fast motion.<ref name="ORB3" />
** '''[[RTAB-Map]]''' (Real-Time Appearance-Based Mapping): An open-source graph-based SLAM approach focused on long-term and large-scale mapping, often used with RGB-D or stereo cameras to build dense maps.<ref name="RTABMap" />
* '''[[LiDAR]] SLAM:''' Uses Light Detection and Ranging sensors. Common in robotics and autonomous vehicles, and used in some high-end AR/MR devices (like [[Apple Vision Pro]]),<ref name="AppleVision2023">Apple Inc., “Introducing Apple Vision Pro,” Newsroom, 5 Jun 2023. https://www.apple.com/newsroom/2023/06/introducing-apple-vision-pro/</ref><ref name="WiredVisionPro">L. Bonnington, “Apple’s Mixed‑Reality Headset, Vision Pro, Is Here,” ''Wired'', 5 Jun 2023. https://www.wired.com/story/apple-vision-pro-specs-price-release-date</ref> often fused with cameras and IMUs for enhanced mapping and tracking robustness.
* '''Filter-based vs. Optimization-based:''' Historically, methods like [[Extended Kalman Filter|EKF‑SLAM]] were common (filter‑based).<ref name="EKF">J. Sun ''et al.'', “An Extended Kalman Filter for Magnetic Field SLAM Using Gaussian Process,” ''Sensors'', 22 (8), 2833, 2022. https://www.mdpi.com/1424-8220/22/8/2833</ref> Modern systems often use graph-based optimization techniques (like [[bundle adjustment]]) which optimize the entire trajectory and map simultaneously, especially after loop closures, generally leading to higher accuracy.
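
As a hedged illustration of the filter-based family, the toy example below runs a Kalman filter over a state containing one device coordinate and one landmark coordinate in 1D. Because the motion and measurement models are linear here, the EKF reduces to the ordinary Kalman filter; a real EKF-SLAM system linearizes nonlinear models with Jacobians and tracks many landmarks jointly. All numbers are hypothetical.

<syntaxhighlight lang="python">
import numpy as np

# State: [device position, landmark position] on a 1D line.
x = np.array([0.0, 5.0])            # initial guess: device at 0, landmark near 5
P = np.diag([0.0, 4.0])             # device pose known, landmark uncertain

F = np.eye(2)                       # state transition (the landmark is static)
B = np.array([[1.0], [0.0]])        # control input moves only the device
Q = np.diag([0.05, 0.0])            # motion (process) noise
H = np.array([[-1.0, 1.0]])         # measurement: range z = landmark - device
R = np.array([[0.1]])               # measurement noise

def predict(x, P, u):
    """Propagate the state with the (noisy) motion command u."""
    x = F @ x + (B @ np.array([u])).ravel()
    P = F @ P @ F.T + Q
    return x, P

def update(x, P, z):
    """Correct the state with a range observation of the landmark."""
    y = np.array([z]) - H @ x                 # innovation
    S = H @ P @ H.T + R                       # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)            # Kalman gain
    x = x + (K @ y).ravel()
    P = (np.eye(2) - K @ H) @ P
    return x, P

# Move one unit per step and re-observe the landmark after each move.
for z in [3.9, 3.1, 1.95]:                    # noisy range readings
    x, P = predict(x, P, u=1.0)
    x, P = update(x, P, z)
print("estimated [device, landmark] positions:", x)
</syntaxhighlight>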
 
==Examples in VR/AR Devices==
Many consumer VR/AR devices utilize SLAM or SLAM-like systems, often incorporating VIO:
* '''[[Meta Quest]] Headsets ([[Meta Quest 2]], [[Meta Quest 3]], [[Meta Quest Pro]]):''' Use [[Meta Quest Insight|Insight tracking]], a sophisticated inside‑out system based heavily on VIO with SLAM components.<ref name="QuestInsight2018" />
* '''[[Microsoft HoloLens|HoloLens 1]] (2016) & [[Microsoft HoloLens 2|HoloLens 2]]:''' Employ advanced SLAM systems using multiple cameras, a [[Time-of-Flight|ToF]] depth sensor, and an IMU for robust spatial mapping.<ref name="HoloLens2">Microsoft, “HoloLens 2 hardware,” Microsoft Learn, accessed 3 May 2025. https://learn.microsoft.com/hololens/hololens2-hardware</ref>
* '''[[Magic Leap 1]] (2018) & [[Magic Leap 2]]:''' Utilize SLAM (“Visual Perception”) with an array of cameras and sensors for environment mapping and head tracking.<ref name="MagicLeap2">Magic Leap, “Spatial Mapping for Magic Leap 2,” 29 Mar 2025. https://www.magicleap.com/legal/spatial-mapping-ml2</ref>
* '''[[Apple Vision Pro]]:''' Features an advanced tracking system fusing data from numerous cameras, [[LiDAR]], depth sensors, and IMUs.<ref name="AppleVision2023" />
* Many [[Windows Mixed Reality]] headsets.
* [[Pico Neo 3 Link|Pico Neo 3]], [[Pico 4]].
 
==References==
<references />
 
[[Category:Terms]]
[[Category:Technical Terms]]
[[Category:Tracking]]
[[Category:Computer Vision]]
[[Category:Core Concepts]]
[[Category:Algorithms]]