Markerless outside-in tracking
==Introduction==
'''[[Markerless outside-in tracking]]''' is a subtype of [[positional tracking]] used in [[virtual reality]] (VR) and [[augmented reality]] (AR). In this approach, external [[camera]]s or other [[depth sensing]] devices positioned in the environment estimate the six-degree-of-freedom ([[6DOF]]) [[pose]] of a user or object without relying on any [[fiducial marker]]s. Instead, [[computer vision]] algorithms analyse the incoming colour or depth stream to detect and follow natural scene features or the user’s own body, enabling real-time [[motion capture]] and interaction.<ref name="Shotton2011" />
==Underlying technology==
A typical markerless outside-in pipeline combines specialised hardware with software-based human-pose estimation; a simplified code sketch follows the list:
* '''Sensing layer''' – One or more fixed [[RGB-D]] or [[infrared]] depth cameras acquire per-frame point clouds. Commodity devices such as the Microsoft Kinect project a [[structured light]] pattern or use [[time-of-flight]] methods to compute depth maps.<ref name="Zhang2012" />
* '''Segmentation''' – Foreground extraction or person segmentation isolates user pixels from the static background.
* '''Per-pixel body-part classification''' – A machine-learning model labels each pixel as “head”, “hand”, “torso”, and so on (e.g., the Randomised Decision Forest used in the original Kinect).<ref name="Shotton2011" />
* '''Skeletal reconstruction and filtering''' – The system fits a kinematic skeleton to the classified pixels and applies temporal filtering to reduce jitter, producing smooth head- and hand-pose data that can drive VR/AR applications.
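The sketch below illustrates this flow in miniature. It is a minimal, illustrative pipeline rather than production code: the segmentation threshold, the classifier, and the skeleton fit are simplified stand-ins for the real components described above, and the frame source is synthetic.

<syntaxhighlight lang="python">
import numpy as np

# Synthetic stand-ins for the sensing layer (a real system would read
# depth frames from an RGB-D camera at 30 Hz or more).
background = np.full((240, 320), 2000.0)      # flat wall 2 m away (mm)

def depth_camera_frames(n=3):
    for _ in range(n):
        frame = background.copy()
        frame[80:200, 120:200] = 1200.0       # a 'user' standing 1.2 m away
        yield frame

def segment_user(depth, background, threshold_mm=50.0):
    """Foreground extraction: pixels that depart from the static background."""
    return np.abs(depth - background) > threshold_mm

def classify_body_parts(depth, mask):
    """Stand-in for per-pixel body-part classification (e.g., a decision forest)."""
    labels = np.full(depth.shape, -1, dtype=np.int8)
    labels[mask] = 0                          # a real model assigns per-part labels
    return labels

def fit_skeleton(depth, labels):
    """Stand-in for kinematic skeleton fitting: here, one centroid 'joint'."""
    ys, xs = np.nonzero(labels >= 0)
    if xs.size == 0:
        return None                           # user not visible this frame
    return {"torso": (xs.mean(), ys.mean(), float(depth[ys, xs].mean()))}

def smooth(prev, curr, alpha=0.5):
    """Temporal filtering (exponential smoothing) to reduce jitter."""
    if prev is None or curr is None:
        return curr
    return {j: tuple(alpha * c + (1 - alpha) * p
                     for c, p in zip(curr[j], prev[j])) for j in curr}

# Per-frame loop: sense -> segment -> classify -> reconstruct -> filter.
skeleton = None
for frame in depth_camera_frames():
    mask = segment_user(frame, background)
    labels = classify_body_parts(frame, mask)
    skeleton = smooth(skeleton, fit_skeleton(frame, labels))
    print(skeleton)
</syntaxhighlight>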
Although a single camera can suffice, multi-camera rigs extend coverage and mitigate occlusion problems. Open-source and proprietary middleware (e.g., [[OpenNI]]/NITE, the [[Microsoft Kinect]] SDK) exposes joint-stream APIs for developers.<ref name="OpenNI2013" />
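Such joint streams generally take the shape shown below: the application polls per-frame skeleton data and filters joints by a tracking-confidence score. The class and method names here are illustrative stand-ins only, not the actual OpenNI/NITE or Kinect SDK API.

<syntaxhighlight lang="python">
from dataclasses import dataclass
from typing import Iterator, List, Tuple

@dataclass
class Joint:
    name: str
    position: Tuple[float, float, float]   # camera-space coordinates (mm)
    confidence: float                      # 0.0 (lost) .. 1.0 (fully tracked)

@dataclass
class TrackedUser:
    user_id: int
    joints: List[Joint]

class FakeUserTracker:
    """Stand-in for a middleware user tracker; a real SDK wraps the camera."""
    def read_frame(self) -> List[TrackedUser]:
        return [TrackedUser(1, [Joint("head", (10.0, 350.0, 2100.0), 0.9),
                                Joint("left_hand", (-200.0, 0.0, 1900.0), 0.4)])]

def poll_joints(tracker) -> Iterator[Tuple[int, Joint]]:
    """One frame of the joint stream as (user id, joint) pairs."""
    for user in tracker.read_frame():
        for joint in user.joints:
            yield user.user_id, joint

for user_id, joint in poll_joints(FakeUserTracker()):
    if joint.confidence > 0.5:             # skip poorly tracked joints
        print(user_id, joint.name, joint.position)
</syntaxhighlight>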
==Markerless vs. marker-based tracking==
[[Outside-in tracking|Marker-based outside-in systems]] ([[HTC Vive]] [[Lighthouse]], [[PlayStation VR]]) attach active LEDs or retro-reflective spheres to the headset or controllers; external sensors triangulate these explicit targets, achieving sub-millimetre precision and sub-10 ms latency. Markerless alternatives dispense with physical targets, improving user comfort and reducing setup time, but at the cost of:
* '''Lower positional accuracy and higher latency''' – Depth-sensor noise and computational overhead introduce millimetre- to centimetre-level error and ~20–30 ms end-to-end latency.
* '''Sensitivity to occlusion''' – If a body part leaves the camera’s line of sight, the model loses track until the part re-enters view; a common software mitigation is sketched after this list.
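Applications typically soften both limitations in software: joints are gated on the tracker’s confidence score, brief occlusions are bridged by holding the last good pose, and exponential smoothing trades residual jitter against added latency. The sketch below, with hypothetical names and simplified logic, shows one such strategy.

<syntaxhighlight lang="python">
class JointFilter:
    """Confidence-gated exponential smoothing for one tracked joint.

    alpha sets the jitter/latency trade-off: a smaller alpha yields a
    smoother but laggier estimate.
    """
    def __init__(self, alpha=0.3, min_confidence=0.5):
        self.alpha = alpha
        self.min_confidence = min_confidence
        self.last_good = None                    # last accepted (x, y, z)

    def update(self, position, confidence):
        if confidence < self.min_confidence:
            return self.last_good                # occluded: hold last good pose
        if self.last_good is None:
            self.last_good = position
        else:
            self.last_good = tuple(self.alpha * c + (1 - self.alpha) * p
                                   for c, p in zip(position, self.last_good))
        return self.last_good

# A hand passes behind the body for two frames, then reappears.
f = JointFilter()
samples = [((0.0, 0.0, 1000.0), 0.9), ((5.0, 2.0, 1002.0), 0.9),
           ((0.0, 0.0, 0.0), 0.1),    ((0.0, 0.0, 0.0), 0.1),   # occluded
           ((12.0, 4.0, 1005.0), 0.8)]
for pos, conf in samples:
    print(f.update(pos, conf))
</syntaxhighlight>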
==History and notable systems==
==Applications==
* '''Gaming and entertainment''' – Titles like ''Kinect Sports'' mapped whole-body actions directly onto avatars. Enthusiast VR chat platforms still use Kinect skeletons to animate full-body avatars.
* '''Rehabilitation and exercise''' – Clinicians employ depth-based pose tracking to monitor range-of-motion exercises without encumbering patients with sensors.
* '''Interactive installations''' – Museums deploy wall-mounted depth cameras to create “magic-mirror” AR exhibits that overlay virtual costumes onto visitors in real time.
* '''Telepresence''' – Multi-Kinect arrays stream volumetric representations of remote participants into shared virtual spaces.
==Advantages==