SLAM: Difference between revisions

[[SLAM]] ('''S'''imultaneous '''L'''ocalization '''A'''nd '''M'''apping) is a computational problem, and the family of [[algorithms]] that address it, used in robotics, autonomous systems, and [[VR headset]]s and [[AR headset]]s. The core challenge SLAM addresses is often described as a "chicken-and-egg problem": to know where you are, you need a map, but to build a map, you need to know where you are. SLAM resolves this by having a device use data from its onboard [[sensors]] (such as [[cameras]], [[IMU]]s, and sometimes [[depth sensors]] like [[Time-of-Flight|Time-of-Flight (ToF)]]) to construct a [[map]] of an unknown [[environment]] while simultaneously estimating its own position and orientation ([[pose]]) within that newly created map. This self-contained process enables [[inside-out tracking]], meaning the device tracks its position in [[3D space]] without needing external sensors or markers (like [[Lighthouse]] base stations).


==How SLAM Works==
SLAM systems typically involve several key components working together in a continuous feedback loop:
* '''[[Feature Detection|Feature Detection/Tracking]]:''' Identifying salient points or features (often called [[landmarks]]) in the sensor data (e.g., corners in camera images using methods like the [[ORB feature detector]]). These features are tracked frame-to-frame as the device moves.
* '''[[Sensor Fusion]]:''' Often combining data from multiple sensors. [[Visual Inertial Odometry|Visual-Inertial Odometry (VIO)]] is extremely common in modern SLAM, fusing camera data with [[IMU]] data. The IMU provides high-frequency motion updates, improving robustness against fast motion, motion blur, or visually indistinct (textureless) surfaces where camera tracking alone might struggle.
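The sensor fusion idea above can be illustrated with a deliberately simplified one-dimensional sketch. It is not how any particular headset implements VIO (real systems typically use Kalman filters or nonlinear optimization over full 3D poses). The function name <code>track_1d</code>, the gains <code>kp</code> and <code>kv</code>, and the idealized camera that reports the true position every tenth IMU tick are all assumptions made for illustration.

```python
def track_1d(accel_samples, dt, camera_every=None, camera_pos=0.0, kp=0.5, kv=1.0):
    """Track a 1D position from IMU samples, optionally corrected by camera fixes.

    camera_every, camera_pos, kp and kv are hypothetical parameters for this
    sketch: every `camera_every` IMU ticks an idealized camera reports the
    position `camera_pos`, and kp/kv are hand-picked correction gains.
    """
    pos, vel = 0.0, 0.0
    history = []
    for i, accel in enumerate(accel_samples):
        # High-rate IMU step: integrate acceleration into velocity and position.
        vel += accel * dt
        pos += vel * dt
        # Low-rate camera step: nudge the estimate toward the visual fix.
        if camera_every and i % camera_every == camera_every - 1:
            error = camera_pos - pos
            pos += kp * error
            vel += kv * error
        history.append(pos)
    return history


# A stationary device whose accelerometer has a constant 0.1 m/s^2 bias,
# sampled at 100 Hz for 5 seconds (assumed numbers, chosen for illustration).
samples = [0.1] * 500
imu_only = track_1d(samples, dt=0.01)
fused = track_1d(samples, dt=0.01, camera_every=10)

print(f"IMU-only error after 5 s: {abs(imu_only[-1]):.3f} m")
print(f"Fused error after 5 s:    {abs(fused[-1]):.3f} m")
```

In this toy setup the IMU-only estimate drifts by more than a metre, while the occasional camera fixes keep the fused estimate within a couple of centimetres. The IMU fills the gaps between camera frames, and the camera bounds the drift that pure integration accumulates.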


==SLAM vs. [[Visual Inertial Odometry]] (VIO)==
While related and often used together, SLAM and [[Visual Inertial Odometry]] (VIO) have different primary goals:
* '''[[VIO]]''' primarily focuses on estimating the device's ego-motion (how it moves relative to its immediate surroundings) by fusing visual data from cameras and motion data from an [[IMU]]. It's excellent for short-term, low-latency tracking but can accumulate [[Drift (tracking)|drift]] over time and doesn't necessarily build a persistent, globally consistent map optimized for re-localization or loop closure. Systems like Apple's [[ARKit]] and Google's [[ARCore]] rely heavily on VIO for tracking, adding surface detection and limited mapping but typically without the global map optimization and loop closure found in full SLAM systems.
* '''SLAM''' focuses on building a map of the environment and localizing the device within that map. It aims for global consistency, often incorporating techniques like loop closure to correct drift. Many modern VR/AR tracking systems use VIO for the high-frequency motion estimation component within a larger SLAM framework that handles mapping, persistence, and drift correction. Essentially, VIO provides the odometry, while SLAM builds and refines the map using that odometry and sensor data.
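The difference between drifting odometry and loop-closure correction can be sketched in a few lines. The example below is a toy model under stated assumptions: the device walks a 1 m square, its heading estimate picks up a hypothetical 0.5 degree bias per step, and the "loop closure" simply assumes the device has recognized that it is back at the start. The residual gap is spread linearly along the path, which is a crude stand-in for the map optimization a full SLAM system would perform.

```python
import math


def integrate_odometry(steps, heading_bias=0.0):
    """Dead-reckon (x, y) positions from (distance, turn) odometry steps."""
    x = y = heading = 0.0
    path = [(x, y)]
    for distance, turn in steps:
        heading += turn + heading_bias  # the bias models accumulated drift
        x += distance * math.cos(heading)
        y += distance * math.sin(heading)
        path.append((x, y))
    return path


def close_loop(path):
    """Spread the end-to-start gap linearly along the path.

    Assumes the last pose was recognized as the same place as the first.
    """
    n = len(path) - 1
    gap_x = path[0][0] - path[-1][0]
    gap_y = path[0][1] - path[-1][1]
    return [(x + gap_x * i / n, y + gap_y * i / n) for i, (x, y) in enumerate(path)]


def gap(path):
    """Distance between the first and last position of a path."""
    return math.hypot(path[-1][0] - path[0][0], path[-1][1] - path[0][1])


# A 1 m square: four sides of ten 0.1 m steps, turning 90 degrees at each corner.
steps = []
for side in range(4):
    for k in range(10):
        turn = math.pi / 2 if (side > 0 and k == 0) else 0.0
        steps.append((0.1, turn))

drifted = integrate_odometry(steps, heading_bias=math.radians(0.5))
corrected = close_loop(drifted)

print(f"Gap with odometry only:  {gap(drifted):.3f} m")
print(f"Gap after loop closure:  {gap(corrected):.3f} m")
```

With odometry alone the path fails to return to its starting point, while the corrected path does. The linear correction is only an illustration: it closes the loop but does not weigh individual measurements the way an optimization-based back end would.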


==Importance in VR/AR==
SLAM (often incorporating VIO) is fundamental technology for modern standalone [[VR headset]]s and [[AR headset]]s/[[Smart Glasses|glasses]]:
* '''[[6DoF]] Tracking:''' Enables full six-degrees-of-freedom tracking (positional and rotational) without external base stations, allowing users to move freely within their [[Playspace|playspace]].
* '''Persistent Anchors & Shared Experiences:''' Allows digital content to be saved and anchored to specific locations in the real world ([[Spatial Anchor|spatial anchors]]), enabling multi-user experiences where participants see the same virtual objects in the same real-world spots across different sessions or devices.
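To make the relationship between pose tracking and anchored content concrete, the following sketch shows how an anchor stored in map (world) coordinates might be converted into a device's local frame. It is a minimal illustration, not the API of any anchor system: the helper names are hypothetical, rotation is restricted to yaw for brevity (a real 6DoF pose carries a full 3D rotation), and the device's x axis is assumed to point forward.

```python
import math


def yaw_rotation(yaw):
    """3x3 rotation matrix for a rotation of `yaw` radians about the z axis."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, -s, 0.0],
            [s, c, 0.0],
            [0.0, 0.0, 1.0]]


def world_to_device(point_world, rotation, translation):
    """Express a world-frame point in the device frame.

    (rotation, translation) is the device pose in the world frame, so
    p_device = R^T (p_world - t).
    """
    d = [point_world[i] - translation[i] for i in range(3)]
    # Multiplying by the transpose: output component c sums over rows r.
    return [sum(rotation[r][c] * d[r] for r in range(3)) for c in range(3)]


# One shared anchor, stored once in map coordinates (metres).
anchor = (2.0, 0.0, 1.0)

# Device A sits at the map origin with no rotation.
seen_by_a = world_to_device(anchor, yaw_rotation(0.0), (0.0, 0.0, 0.0))

# Device B stands at (2, -3, 1) and is turned 90 degrees to face the anchor.
seen_by_b = world_to_device(anchor, yaw_rotation(math.pi / 2), (2.0, -3.0, 1.0))

print("Device A sees the anchor at", [round(v, 3) for v in seen_by_a])
print("Device B sees the anchor at", [round(v, 3) for v in seen_by_b])
```

Both devices refer to the same stored anchor but obtain different local coordinates because their poses differ. This is why accurate localization within a shared map is a precondition for multi-user experiences.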


==Types and Algorithms==
SLAM systems can be categorized based on the primary sensors used and the algorithmic approach:
* '''Visual SLAM (vSLAM):''' Relies mainly on [[cameras]]. Can be monocular (one camera), stereo (two cameras), or RGB-D (using a [[depth sensor]]). Often fused with [[IMU]] data ([[Visual Inertial Odometry|VIO-SLAM]]).
* '''Filter-based vs. Optimization-based:''' Historically, filter-based methods such as [[Extended Kalman Filter|EKF-SLAM]] were common. Modern systems often use graph-based optimization techniques (such as [[bundle adjustment]]) that optimize the entire trajectory and map together, especially after loop closures, which generally yields higher accuracy.
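The optimization-based approach can be sketched with a tiny one-dimensional pose graph. The example is illustrative only: the function name, the measurement values, and the use of plain gradient descent are assumptions chosen for readability, whereas production systems rely on sparse nonlinear least-squares solvers over full 3D poses and landmarks. The device moves out two steps and back two steps, the odometry overestimates the outbound steps, and a loop-closure constraint states that the final pose coincides with the first.

```python
def optimize_pose_graph(odometry, loop_closure, iterations=500, step=0.1):
    """Least-squares fit of 1D poses to odometry and one loop-closure constraint.

    odometry[i] is the measured offset from pose i to pose i + 1.
    loop_closure is the measured offset from the first pose to the last pose.
    Pose 0 is held fixed at 0.0 to anchor the solution.
    """
    # Initial guess: plain dead reckoning.
    poses = [0.0]
    for u in odometry:
        poses.append(poses[-1] + u)

    n = len(poses)
    for _ in range(iterations):
        grad = [0.0] * n
        # Each odometry edge penalizes (x[i+1] - x[i] - u)^2.
        for i, u in enumerate(odometry):
            residual = poses[i + 1] - poses[i] - u
            grad[i + 1] += 2 * residual
            grad[i] -= 2 * residual
        # The loop-closure edge penalizes (x[last] - x[0] - loop_closure)^2.
        residual = poses[-1] - poses[0] - loop_closure
        grad[-1] += 2 * residual
        grad[0] -= 2 * residual
        # Gradient step on every pose except the fixed first one.
        for i in range(1, n):
            poses[i] -= step * grad[i]
    return poses


# Two steps out (overestimated as 1.1 m each), two steps back (1.0 m each).
odometry = [1.1, 1.1, -1.0, -1.0]
dead_reckoned_end = sum(odometry)
optimized = optimize_pose_graph(odometry, loop_closure=0.0)

print(f"End pose from odometry only: {dead_reckoned_end:.3f} m")
print("Optimized poses:", [round(p, 3) for p in optimized])
```

Because all constraints are weighted equally here, the optimizer spreads the 0.2 m inconsistency across the four odometry edges and the loop-closure edge, adjusting every pose in the trajectory at once. A filter, by contrast, keeps only a current estimate and updates it as each measurement arrives.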


==Examples in VR/AR Devices==
Many consumer VR/AR devices utilize SLAM or SLAM-like systems, often incorporating VIO:
* '''[[Meta Quest]] Headsets ([[Meta Quest 2]], [[Meta Quest 3]], [[Meta Quest Pro]]):''' Use [[Meta Quest Insight|Insight tracking]], a sophisticated inside-out system based heavily on VIO (using 4 low-light [[fisheye lens|fisheye]] cameras and an IMU on Quest 2/Pro/3) with SLAM components for mapping (a sparse feature map), boundary definition (Guardian), persistence, and features such as Passthrough and Space Sense. It is considered a breakthrough for affordable, high-quality consumer VR tracking.