Markerless outside-in tracking
{{see also|Outside-in tracking|Markerless tracking|Positional tracking}}
==Introduction==
'''[[Markerless outside-in tracking]]''' is a form of [[positional tracking]] for [[virtual reality]] (VR) and [[augmented reality]] (AR) that estimates a user’s six-degree-of-freedom pose from externally mounted [[depth sensing|depth-sensing]] or RGB cameras without requiring any [[fiducial marker]]s. Instead, per-frame depth or colour images are processed by [[computer vision]] algorithms that segment the scene, classify body parts and fit a kinematic skeleton, enabling real-time [[motion capture]] and interaction.<ref name="Shotton2011" />
==Underlying technology==
A typical pipeline combines specialised hardware with software-based human-pose estimation (a code sketch follows the list):
* '''Sensing layer''' – One or more fixed [[RGB-D]] or [[infra-red]] depth cameras stream point clouds. The original Microsoft Kinect projects a near-IR [[structured light]] pattern, whereas Kinect V2 and Azure Kinect use [[time-of-flight camera|time-of-flight]] ranging.<ref name="Zhang2012" /> The effective operating range for Kinect v1 is ≈ 0.8 – 4.5 m (specification upper limit 5 m).<ref name="DepthRange2012" />
* '''Segmentation''' – Foreground extraction isolates user pixels from background geometry.
* '''Per-pixel body-part classification''' – A Randomised Decision Forest labels each pixel (head, hand, torso, …).<ref name="Shotton2011" />
* '''Skeletal reconstruction and filtering''' – Joint positions are inferred and temporally filtered to reduce jitter, producing head- and hand-pose data consumable by VR/AR engines.
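These stages can be condensed into a short sketch. This is an illustrative toy, not any SDK’s implementation: it assumes a pre-trained per-pixel classifier <code>clf</code> with a scikit-learn-style <code>predict</code> method, and its hand-rolled features merely stand in for the randomised depth-offset comparisons actually used by Shotton ''et al.'' (2011).

<syntaxhighlight lang="python">
import numpy as np

BODY_PARTS = ["head", "torso", "left_hand", "right_hand"]  # toy label set

def segment_foreground(depth, near=0.8, far=4.5):
    """Segmentation: keep pixels inside the sensor's reliable range
    (metres); real systems also subtract a static background model."""
    return (depth > near) & (depth < far)

def pixel_features(depth, mask):
    """Toy per-pixel features: raw depth plus local depth gradients.
    Shotton et al. (2011) use randomised depth-offset comparisons."""
    dy, dx = np.gradient(depth)
    feats = np.stack([depth, dx, dy], axis=-1)   # (H, W, 3)
    return feats[mask]                           # (N, 3) foreground rows

def infer_joints(depth, clf):
    """Per-pixel body-part classification, then a crude skeletal stage:
    the 3-D centroid of each part serves as its joint estimate."""
    mask = segment_foreground(depth)
    labels = clf.predict(pixel_features(depth, mask))  # one label per pixel
    ys, xs = np.nonzero(mask)                          # same order as feats
    joints = {}
    for i, part in enumerate(BODY_PARTS):
        sel = labels == i
        if sel.any():
            joints[part] = (xs[sel].mean(), ys[sel].mean(),
                            depth[ys[sel], xs[sel]].mean())
    return joints
</syntaxhighlight>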
Although a single depth camera can suffice, multi-camera rigs expand coverage and reduce occlusions. Open-source and proprietary middleware packages (e.g., [[OpenNI]]/NiTE 2, Microsoft Kinect SDK) expose joint-stream APIs for developers.<ref name="NiTE2013" />
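In application code, consuming such a joint stream usually reduces to a polling loop. The sketch below is hypothetical throughout: <code>SkeletonStream</code>, the frame/user layout and the joint names are placeholders for whatever the chosen middleware actually exposes, not the real OpenNI/NiTE or Kinect SDK API.

<syntaxhighlight lang="python">
# Hypothetical joint-stream consumer; every name below is a placeholder,
# not an actual middleware API.
from middleware import SkeletonStream  # assumed wrapper, not a real package

CONFIDENCE_MIN = 0.5  # drop joints the middleware is unsure about

def pump_poses(engine):
    """Forward head and hand joints from the sensor middleware to a VR engine."""
    with SkeletonStream(device=0) as stream:
        for frame in stream:              # one skeleton frame per sensor tick
            for user in frame.users:      # outside-in rigs track several users
                head = user.joints.get("head")
                if head and head.confidence >= CONFIDENCE_MIN:
                    engine.set_head_pose(user.id, head.position, head.orientation)
                for name in ("left_hand", "right_hand"):
                    joint = user.joints.get(name)
                    if joint and joint.confidence >= CONFIDENCE_MIN:
                        engine.set_hand_pose(user.id, name, joint.position)
</syntaxhighlight>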
Measured end-to-end skeleton latency for Kinect ranges from 60 to 90 ms, depending on model and SDK settings.<ref name="Livingston2012" />
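Part of that budget is filtering. Below is a minimal sketch of the jitter-reduction step mentioned above, assuming simple per-joint exponential smoothing (real SDKs offer more sophisticated filters):

<syntaxhighlight lang="python">
# Minimal per-joint exponential smoothing, the simplest form of the temporal
# filtering step above. alpha = 1.0 passes data through unfiltered; smaller
# values suppress jitter but add lag on top of the sensing/processing latency.
class JointSmoother:
    def __init__(self, alpha=0.5):
        self.alpha = alpha
        self.state = {}  # joint name -> last filtered (x, y, z)

    def update(self, joints):
        """joints: dict mapping joint name -> (x, y, z) for the current frame."""
        for name, pos in joints.items():
            prev = self.state.get(name, pos)  # seed the filter on first sight
            self.state[name] = tuple(
                self.alpha * new + (1.0 - self.alpha) * old
                for new, old in zip(pos, prev)
            )
        return dict(self.state)
</syntaxhighlight>

Lower values of <code>alpha</code> smooth more aggressively but make the reported joints lag further behind true motion, which is one reason measured latency varies with SDK settings.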
==Markerless vs. marker-based tracking==
Marker-based outside-in systems such as '''Vicon''' optical mocap or HTC Vive '''Lighthouse''' either attach retro-reflective spheres to the user or use on-device photodiodes that detect sweeping IR lasers from the base stations, achieving sub-millimetre precision and motion-to-photon latency below 10 ms.<ref name="ViconSpec" /><ref name="Lighthouse2016" />
Markerless alternatives remove physical targets, improving comfort and setup time, but at the cost of:
* '''Lower positional accuracy and higher latency''' – Depth-sensor noise plus the 60 – 90 ms processing pipeline produce millimetre- to centimetre-level error (see the worked example after this list).<ref name="Guffanti2020" />
* '''Sensitivity to occlusion''' – Body parts outside the camera’s line-of-sight are temporarily lost.
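The practical impact of the latency gap is easy to bound: a tracked point moving at speed ''v'' is reported roughly ''v'' × latency behind its true position. A back-of-the-envelope check (the speeds are illustrative):

<syntaxhighlight lang="python">
# Dynamic error from latency: a point moving at speed v is reported roughly
# v * latency behind its true position until the pipeline catches up.
for latency_ms in (10, 60, 90):            # lighthouse-class vs. Kinect-class
    for speed in (0.5, 1.0, 2.0):          # m/s: slow gesture .. fast swing
        lag_cm = speed * (latency_ms / 1000.0) * 100.0
        print(f"{latency_ms:>3} ms at {speed} m/s -> {lag_cm:4.1f} cm lag")
</syntaxhighlight>

At 1 m/s, the 60 – 90 ms markerless pipeline implies roughly 6 – 9 cm of lag-induced error, versus under 1 cm for a sub-10 ms marker-based system.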
==History and notable systems==
{| class="wikitable"
! Year !! System !! Notes
|-
| 2003 || [[EyeToy]] (PlayStation 2) || 2-D silhouette tracking with a single RGB camera.<ref name="EyeToy2004" />
|-
| 2010 || [[Kinect]] for Xbox 360 || First consumer structured-light depth sensor with real-time full-body skeletons (up to six users).<ref name="Microsoft2010" />
|-
| 2014 – 2016 || Research prototypes || Academic work showed Kinect V2 could deliver 6-DOF head- and hand-pose input for DIY VR HMDs.<ref name="Livingston2012" />
|-
| 2017 || Kinect production ends || Microsoft discontinued Kinect hardware as commercial VR shifted toward inside-out and marker-based solutions.<ref name="Microsoft2017" />
|}
==Applications==
* '''Gaming & entertainment''' – Titles such as ''Kinect Sports'' map whole-body actions to avatars; some VR chat platforms still use Kinect skeletons.
* '''Rehabilitation & exercise''' – Clinicians monitor range-of-motion without attaching markers.<ref name="Wade2022" />
* '''Interactive installations''' – Depth cameras create “magic-mirror” AR exhibits in museums.
* '''Telepresence''' – Multi-camera arrays stream volumetric avatars into shared virtual spaces.
==Advantages==
* No wearable markers.
* Rapid single-sensor setup; no lighthouse calibration.
* Simultaneous multi-user support.
* Lower hardware cost than professional optical mocap rigs.
==Disadvantages==
* Occlusion sensitivity – furniture or other players can block tracking.
* Reduced accuracy and 60 – 90 ms latency compared with lighthouse or Vicon systems.<ref name="Guffanti2020" /><ref name="Lighthouse2016" />
* Environmental constraints – bright sunlight or glossy surfaces degrade depth quality.
* Limited range and FOV – reliable only within ≈ 0.8 – 4.5 m for Kinect-class sensors.<ref name="DepthRange2012" />
==References==
<ref name="Shotton2011">Shotton | <references> | ||
<ref name="Shotton2011">Shotton J. ''et al.'' (2011). “Real-Time Human Pose Recognition in Parts from a Single Depth Image.” *CVPR 2011.*</ref> | |||
.</ref> | <ref name="Zhang2012">Zhang Z. (2012). “Microsoft Kinect Sensor and Its Effect.” *IEEE MultiMedia* 19 (2): 4–10.</ref> | ||
<ref name="Zhang2012"> | <ref name="DepthRange2012">Khoshelham K.; Elberink S. (2012). “Accuracy and Resolution of Kinect Depth Data for Indoor Mapping Applications.” *Sensors* 12 (2): 1437 – 1454.</ref> | ||
<ref name="NiTE2013">OpenNI Foundation (2013). “NiTE 2.0 User Guide.”</ref> | |||
.</ref> | <ref name="Livingston2012">Livingston M. A. ''et al.'' (2012). “Performance Measurements for the Microsoft Kinect Skeleton.” *IEEE VR 2012 Workshop.*</ref> | ||
<ref name=" | <ref name="Guffanti2020">Guffanti D. ''et al.'' (2020). “Accuracy of the Microsoft Kinect V2 Sensor for Human Gait Analysis.” *Sensors* 20 (16): 4405.</ref> | ||
<ref name=" | <ref name="ViconSpec">Vicon Motion Systems. “Vicon Tracker – Latency down to 2.5 ms.” Product sheet.</ref> | ||
<ref name="Lighthouse2016">Malventano A. (2016). “SteamVR HTC Vive In-depth – Lighthouse Tracking System Dissected.” *PC Perspective.*</ref> | |||
.</ref> | <ref name="EyeToy2004">Pham A. (2004-01-18). “EyeToy Springs From One Man’s Vision.” *Los Angeles Times.*</ref> | ||
<ref name=" | <ref name="Microsoft2010">Microsoft News Center (2010-11-04). “The Future of Entertainment Starts Today as Kinect for Xbox 360 Leaps and Lands at Retailers Nationwide.”</ref> | ||
<ref name="Wade2022">Wade L. ''et al.'' (2022). “Applications and Limitations of Current Markerless Motion Capture Methods for Clinical Gait Biomechanics.” *PeerJ* 10:e12995.</ref> | |||
<ref name="Microsoft2017">Good O. S. (2017-10-25). “Kinect is officially dead. Really. Officially. It’s dead.” *Polygon.*</ref> | |||
<ref name="Microsoft2010">Microsoft News Center (2010-11-04). | </references> | ||
<ref name=" | |||
.</ref> | |||
<ref name="Microsoft2017">Good | |||
[[Category:Terms]]