{{see also|Terms|Technical Terms}}
[[SLAM]] ('''S'''imultaneous '''L'''ocalization '''A'''nd '''M'''apping) is a computational problem and a set of [[algorithms]] used primarily in robotics and autonomous systems, including [[VR headset]]s and [[AR headset]]s.<ref name="DurrantWhyte2006">H. Durrant‑Whyte & T. Bailey, “Simultaneous Localization and Mapping: Part I,” ''IEEE Robotics & Automation Magazine'', 13 (2), 99–110, 2006. https://www.doc.ic.ac.uk/~ajd/Robotics/RoboticsResources/SLAMTutorial1.pdf</ref> The core challenge SLAM addresses is often described as a "chicken-and-egg problem": to know where you are, you need a map, but to build a map, you need to know where you are.<ref name="Cadena2016">C. Cadena ''et al.'', “Past, Present, and Future of Simultaneous Localization and Mapping: Toward the Robust‑Perception Age,” ''IEEE Transactions on Robotics'', 32 (6), 1309–1332, 2016. https://rpg.ifi.uzh.ch/docs/TRO16_cadena.pdf</ref> SLAM solves this by enabling a device, using data from its onboard [[sensors]] (like [[cameras]], [[IMU]]s, and sometimes [[depth sensors]] like [[Time-of-Flight|Time-of-Flight (ToF)]]), to construct a [[map]] of an unknown [[environment]] while simultaneously determining its own position and orientation ([[pose]]) within that newly created map.<ref name="AIInsight">A. Ranganathan, “The Oculus Insight positional tracking system,” AI Accelerator Institute, 27 Jun 2022. https://www.aiacceleratorinstitute.com/the-oculus-insight-positional-tracking-system-2/</ref> This self-contained process enables [[inside-out tracking]], meaning the device tracks its position in [[3D space]] without needing external sensors or markers (like [[Lighthouse]] base stations).<ref name="QuestInsight2018">Meta, “Introducing Oculus Quest, Our First 6DOF All‑in‑One VR System,” Developer Blog, 26 Sep 2018. https://developers.meta.com/horizon/blog/introducing-oculus-quest-our-first-6dof-all-in-one-vr-system/</ref>
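In probabilistic terms (using the common textbook notation rather than any particular system's), SLAM is usually posed as estimating a joint posterior over the device's poses and the map. Writing <math>x_{1:t}</math> for the poses up to time <math>t</math>, <math>m</math> for the map, <math>z_{1:t}</math> for the sensor observations, and <math>u_{1:t}</math> for the odometry or control inputs, the full SLAM problem is

:<math>p(x_{1:t}, m \mid z_{1:t}, u_{1:t}),</math>

while online SLAM estimates only the current pose and map, <math>p(x_t, m \mid z_{1:t}, u_{1:t})</math>, marginalizing out past poses.<ref name="DurrantWhyte2006" />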
==How SLAM Works==
SLAM systems typically involve several key components working together in a continuous feedback loop (a simplified, illustrative sketch of the loop follows this list):
* '''[[Feature Detection|Feature Detection/Tracking]]:''' Identifying salient points or features (often called [[landmarks]]) in the sensor data (for example corners in camera images using methods like the [[ORB feature detector]]). These features are tracked frame-to-frame as the device moves.<ref name="ORB2">R. Mur‑Artal & J. D. Tardós, “ORB‑SLAM2: an Open‑Source SLAM System for Monocular, Stereo and RGB‑D Cameras,” ''IEEE Transactions on Robotics'', 33 (5), 2017. https://arxiv.org/abs/1610.06475</ref>
* '''[[Mapping]]:''' Using the tracked features and the device's estimated movement (odometry) to build and update a representation (the map) of the environment. This map might consist of sparse feature points (common for localization-focused SLAM) or denser representations like [[point cloud]]s or [[mesh]]es (useful for environmental understanding).<ref name="RTABMap">M. Labbé & F. Michaud, “RTAB‑Map as an Open‑Source Lidar and Visual SLAM Library for Large‑Scale and Long‑Term Online Operation,” ''Journal of Field Robotics'', 36 (2), 416–446, 2019. https://arxiv.org/abs/2403.06341</ref>
* '''[[Localization]] (or Pose Estimation):''' Estimating the device's current position and orientation (pose) relative to the map it has built, often by observing how known landmarks appear from the current viewpoint. | |||
* '''[[Loop Closure]]:''' Recognizing when the device has returned to a previously visited location by matching current sensor data to earlier map data (for example using appearance-based methods like [[bag-of-words]]). This is crucial for correcting accumulated [[Drift (tracking)|drift]] (incremental errors) in the map and pose estimate, leading to a globally consistent map.<ref name="ORB3">C. Campos ''et al.'', “ORB‑SLAM3: An Accurate Open‑Source Library for Visual, Visual‑Inertial and Multi‑Map SLAM,” ''IEEE Transactions on Robotics'', 2021. https://arxiv.org/abs/2007.11898</ref>
* '''[[Sensor Fusion]]:''' Often combining data from multiple sensors. [[Visual Inertial Odometry|Visual‑Inertial Odometry (VIO)]] is extremely common in modern SLAM, fusing camera data with [[IMU]] data.<ref name="ARKitVIO">Apple Inc., “Understanding World Tracking,” Apple Developer Documentation, accessed 3 May 2025. https://developer.apple.com/documentation/arkit/understanding-world-tracking</ref> The IMU provides high-frequency motion updates, improving robustness against fast motion, motion blur, or visually indistinct (textureless) surfaces where camera tracking alone might struggle.
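The following is a deliberately simplified sketch of this loop in Python. It is illustrative only and not taken from any real SLAM implementation: the function names (<code>detect_landmarks</code>, <code>update_map</code>, <code>apply_odometry</code>), the simulated 2D world, and the noise values are invented placeholders, and there is no loop-closure or uncertainty handling.

<syntaxhighlight lang="python">
# Illustrative 2D sketch of the SLAM loop: localize (dead reckoning),
# observe landmarks, and update the map with the current pose estimate.
# All names and values here are hypothetical placeholders.
import math

def apply_odometry(pose, v, w, dt=1.0):
    """Localization step: integrate (possibly noisy) odometry."""
    x, y, theta = pose
    return (x + v * dt * math.cos(theta),
            y + v * dt * math.sin(theta),
            theta + w * dt)

def detect_landmarks(true_pose, world_landmarks, max_range=5.0):
    """Simulated sensor: return range/bearing to nearby landmarks."""
    x, y, theta = true_pose
    observations = []
    for lid, (lx, ly) in world_landmarks.items():
        dx, dy = lx - x, ly - y
        r = math.hypot(dx, dy)
        if r <= max_range:
            observations.append((lid, r, math.atan2(dy, dx) - theta))
    return observations

def update_map(est_pose, observations, est_map):
    """Mapping step: place observed landmarks using the pose estimate."""
    x, y, theta = est_pose
    for lid, r, bearing in observations:
        est_map[lid] = (x + r * math.cos(theta + bearing),
                        y + r * math.sin(theta + bearing))

world = {0: (2.0, 1.0), 1: (4.0, -1.0), 2: (6.0, 2.0)}  # ground-truth landmarks
true_pose, est_pose, est_map = (0.0, 0.0, 0.0), (0.0, 0.0, 0.0), {}

for step in range(5):
    true_pose = apply_odometry(true_pose, v=1.0, w=0.05)
    est_pose = apply_odometry(est_pose, v=1.02, w=0.06)  # odometry error -> drift
    update_map(est_pose, detect_landmarks(true_pose, world), est_map)

print("estimated landmark map:", est_map)
</syntaxhighlight>

A real system would track image features instead of idealized landmarks, maintain uncertainty estimates, and run loop-closure detection to correct the drift this sketch deliberately leaves in.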
==SLAM vs. [[Visual Inertial Odometry]] (VIO)==
While related and often used together, SLAM and [[Visual Inertial Odometry]] (VIO) have different primary goals:
* '''[[VIO]]''' primarily focuses on estimating the device's ego-motion (how it moves relative to its immediate surroundings) by fusing visual data from cameras and motion data from an [[IMU]]. It is excellent for short-term, low-latency tracking but can accumulate [[Drift (tracking)|drift]] over time and does not necessarily build a persistent, globally consistent map optimized for re-localization or loop closure. Systems like Apple's [[ARKit]]<ref name="ARKitVIO" /> and Google's [[ARCore]]<ref name="ARCore">Google LLC, “ARCore Overview,” Google for Developers, accessed 3 May 2025. https://developers.google.com/ar</ref> rely heavily on VIO for tracking, adding surface detection and limited mapping but typically without the global map optimization and loop closure found in full SLAM systems.
* '''SLAM''' focuses on building a map of the environment and localizing the device within that map. It aims for global consistency, often incorporating techniques like loop closure to correct drift. Many modern VR/AR tracking systems use VIO for the high-frequency motion estimation component within a larger SLAM framework that handles mapping, persistence, and drift correction. Essentially, VIO provides the odometry, while SLAM builds and refines the map using that odometry and sensor data, as sketched below.
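The NumPy sketch below illustrates this division of labour in miniature: chaining noisy relative motions (the VIO role) makes the endpoint of a closed loop drift away from its true position, and recognizing the loop closure lets the error be pushed back through the trajectory (the SLAM back-end role). The step values and noise level are invented, and the even redistribution of the error is a deliberately naive stand-in for real pose-graph optimization.

<syntaxhighlight lang="python">
# Drift from chained odometry, corrected by a loop-closure constraint.
# Values and the correction scheme are illustrative only.
import numpy as np

# Ground truth: the device walks a closed loop and returns to the start.
true_steps = [(1, 0), (1, 0), (0, 1), (-1, 0), (-1, 0), (0, -1)]

rng = np.random.default_rng(0)
noisy_steps = [np.array(s, dtype=float) + rng.normal(0.0, 0.03, size=2)
               for s in true_steps]

# VIO-style dead reckoning: integrate the noisy relative motions.
poses = [np.zeros(2)]
for step in noisy_steps:
    poses.append(poses[-1] + step)

drift = np.linalg.norm(poses[-1] - poses[0])  # should be 0 for a closed loop
print(f"drift before loop closure: {drift:.3f}")

# Loop closure: the final pose is recognized as the starting point, so the
# residual error is spread evenly along the trajectory (a naive stand-in
# for pose-graph optimization / bundle adjustment).
error = poses[-1] - poses[0]
n = len(poses) - 1
corrected = [p - error * (i / n) for i, p in enumerate(poses)]
print(f"drift after correction: {np.linalg.norm(corrected[-1] - corrected[0]):.3f}")
</syntaxhighlight>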
==Importance in VR/AR==
SLAM (often incorporating VIO) is a fundamental technology for modern standalone [[VR headset]]s and [[AR headset]]s/[[Smart Glasses|glasses]]:
* '''[[6DoF]] Tracking:''' Enables full six-degrees-of-freedom tracking (positional and rotational) without external base stations, allowing users to move freely within their [[Playspace|playspace]].<ref name="QuestInsight2018" />
* '''[[World Locking|World-Locking]]:''' Ensures virtual objects appear stable and fixed in the real world (for AR/[[Mixed Reality|MR]]) or that the virtual environment remains stable relative to the user's playspace (for VR).
* '''[[Roomscale VR|Roomscale]] Experiences & Environment Understanding:''' Defines boundaries (like [[Meta Quest Insight|Meta's Guardian]]) and understands the physical playspace (surfaces, obstacles) for safety, interaction, and realistic occlusion (virtual objects hidden by real ones).
* '''[[Passthrough AR|Passthrough]] and [[Mixed Reality]]:''' Helps align virtual content accurately with the real-world view captured by device cameras.
* '''Persistent Anchors & Shared Experiences:''' Allows digital content to be saved and anchored to specific locations in the real world ([[Spatial Anchor|spatial anchors]]), enabling multi-user experiences where participants see the same virtual objects in the same real-world spots across different sessions or devices (see the frame-transform sketch after this list).
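As a minimal illustration of how anchored content stays world-locked, the Python sketch below (with invented poses, and a 2D rigid transform standing in for a full 6DoF pose) stores a virtual object's position in the map frame rather than the device frame, so that when SLAM later refines the device pose the object's rendered position in the world does not jump:

<syntaxhighlight lang="python">
# Anchors live in the map frame, not the device frame: a 2D illustration.
# All poses here are invented example values.
import numpy as np

def device_to_map(position_xy, yaw):
    """Homogeneous 2D rigid transform taking device-frame points to map frame."""
    c, s = np.cos(yaw), np.sin(yaw)
    T = np.eye(3)
    T[:2, :2] = [[c, -s], [s, c]]
    T[:2, 2] = position_xy
    return T

# The user places a hologram 1 m in front of the device; it is stored in
# MAP coordinates using the current pose estimate.
T_map_device = device_to_map([2.0, 0.5], yaw=0.10)
anchor_map = T_map_device @ np.array([1.0, 0.0, 1.0])   # homogeneous point

# Later, loop closure slightly corrects the device pose estimate.
T_map_device_refined = device_to_map([1.95, 0.48], yaw=0.08)

# Rendering transforms the stored map-frame anchor into the refined device
# frame; the hologram stays fixed in the world rather than moving with
# the pose correction.
anchor_device = np.linalg.inv(T_map_device_refined) @ anchor_map
print("anchor in map frame:   ", anchor_map[:2])
print("anchor in device frame:", anchor_device[:2])
</syntaxhighlight>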
==Types and Algorithms==
SLAM systems can be categorized based on the primary sensors used and the algorithmic approach:
* '''Visual SLAM (vSLAM):''' Relies mainly on [[cameras]]. Can be monocular (one camera), stereo (two cameras), or RGB-D (using a [[depth sensor]]). Often fused with [[IMU]] data ([[Visual Inertial Odometry|VIO-SLAM]]).<ref name="Cadena2016" />
** '''[[ORB-SLAM2]]''': A widely cited open-source library using [[ORB feature detector|ORB features]]. It supports monocular, stereo, and RGB-D cameras but is purely vision-based (no IMU). Known for robust relocalization and creating sparse feature maps.<ref name="ORB2" /> A minimal example of ORB feature matching appears after this list.
** '''[[ORB-SLAM3]]''': An evolution of ORB-SLAM2 (released c. 2020/21) adding tight visual-inertial fusion (camera + IMU) for significantly improved accuracy and robustness, especially during fast motion.<ref name="ORB3" />
** '''[[RTAB-Map]]''' (Real-Time Appearance-Based Mapping): An open-source graph-based SLAM approach focused on long-term and large-scale mapping, often used with RGB-D or stereo cameras to build dense maps.<ref name="RTABMap" />
* '''[[LiDAR]] SLAM:''' Uses Light Detection and Ranging sensors. Common in robotics and autonomous vehicles, and used in some high-end AR/MR devices (like [[Apple Vision Pro]]),<ref name="AppleVision2023">Apple Inc., “Introducing Apple Vision Pro,” Newsroom, 5 Jun 2023. https://www.apple.com/newsroom/2023/06/introducing-apple-vision-pro/</ref><ref name="WiredVisionPro">L. Bonnington, “Apple’s Mixed‑Reality Headset, Vision Pro, Is Here,” ''Wired'', 5 Jun 2023. https://www.wired.com/story/apple-vision-pro-specs-price-release-date</ref> often fused with cameras and IMUs for enhanced mapping and tracking robustness.
* '''Filter-based vs. Optimization-based:''' Historically, methods like [[Extended Kalman Filter|EKF‑SLAM]] were common (filter‑based).<ref name="EKF">J. Sun ''et al.'', “An Extended Kalman Filter for Magnetic Field SLAM Using Gaussian Process,” ''Sensors'', 22 (8), 2833, 2022. https://www.mdpi.com/1424-8220/22/8/2833</ref> Modern systems often use graph-based optimization techniques (like [[bundle adjustment]]) which optimize the entire trajectory and map simultaneously, especially after loop closures, generally leading to higher accuracy.
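As a concrete example of the visual front end such feature-based systems build on, the short sketch below uses OpenCV's ORB implementation to detect and match features between two camera frames. The image filenames are placeholders; a real pipeline would follow the matching with outlier rejection and pose estimation (for example via the essential matrix or PnP) before updating the map.

<syntaxhighlight lang="python">
# ORB feature detection and matching between two frames with OpenCV.
# Requires the opencv-python package; frame filenames are placeholders.
import cv2

img1 = cv2.imread("frame_000.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("frame_001.png", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=1000)           # ORB detector + binary descriptor
kp1, des1 = orb.detectAndCompute(img1, None)   # keypoints and descriptors
kp2, des2 = orb.detectAndCompute(img2, None)

# Brute-force Hamming matching with cross-checking, sorted by descriptor
# distance; these correspondences would feed pose estimation and mapping.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
print(f"{len(matches)} tentative feature correspondences between frames")
</syntaxhighlight>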
==Examples in VR/AR Devices==
Many consumer VR/AR devices utilize SLAM or SLAM-like systems, often incorporating VIO:
* '''[[Meta Quest]] Headsets ([[Meta Quest 2]], [[Meta Quest 3]], [[Meta Quest Pro]]):''' Use [[Meta Quest Insight|Insight tracking]], a sophisticated inside‑out system based heavily on VIO with SLAM components.<ref name="QuestInsight2018" />
* '''[[Microsoft HoloLens|HoloLens 1]] (2016) & [[Microsoft HoloLens 2|HoloLens 2]]:''' Employ advanced SLAM systems using multiple cameras, a [[Time-of-Flight|ToF]] depth sensor, and an IMU for robust spatial mapping.<ref name="HoloLens2">Microsoft, “HoloLens 2 hardware,” Microsoft Learn, accessed 3 May 2025. https://learn.microsoft.com/hololens/hololens2-hardware</ref>
* '''[[Magic Leap 1]] (2018) & [[Magic Leap 2]]:''' Utilize SLAM (“Visual Perception”) with an array of cameras and sensors for environment mapping and head tracking.<ref name="MagicLeap2">Magic Leap, “Spatial Mapping for Magic Leap 2,” 29 Mar 2025. https://www.magicleap.com/legal/spatial-mapping-ml2</ref>
* '''[[Apple Vision Pro]]:''' Features an advanced tracking system fusing data from numerous cameras, [[LiDAR]], depth sensors, and IMUs.<ref name="AppleVision2023" />
* '''[[Windows Mixed Reality]] headsets:''' Many use camera-based inside-out tracking with onboard IMUs.
* '''[[Pico Neo 3 Link|Pico Neo 3]] & [[Pico 4]]:''' Use inside-out tracking with onboard cameras and IMUs.
==References== | |||
<references /> | |||
[[Category:Terms]] | |||
[[Category:Technical Terms]] | |||
[[Category:Tracking]] | |||
[[Category:Computer Vision]] | |||
[[Category:Core Concepts]] | |||
[[Category:Algorithms]] |