Markerless outside-in tracking
{{stub}}
:''See also [[Outside-in tracking]], [[Markerless tracking]], [[Positional tracking]]''
==Introduction==
'''[[Markerless outside-in tracking]]''' is a subtype of [[positional tracking]] used in [[virtual reality]] (VR) and [[augmented reality]] (AR). In this approach, external [[camera]]s or other [[depth sensing]] devices positioned in the environment estimate the six-degree-of-freedom pose of a user or object without relying on any [[fiducial marker]]s. Instead, [[computer vision]] algorithms analyse the incoming colour or depth stream to detect and follow natural scene features or the user’s own body, enabling real-time [[motion capture]] and interaction.<ref name="Shotton2011" />
==Underlying technology==
A typical markerless outside-in pipeline combines specialised hardware with software-based human-pose estimation:
* '''Sensing layer''' – One or more fixed [[RGB-D]] or [[infrared]] depth cameras acquire per-frame point clouds. Commodity devices such as the Microsoft Kinect project a [[structured light]] pattern or use [[time-of-flight]] methods to compute depth maps.<ref name="Zhang2012" />
* '''Segmentation''' – Foreground extraction or person segmentation isolates user pixels from the static background.
* '''Per-pixel body-part classification''' – A machine-learning model labels each pixel as “head”, “hand”, “torso”, and so on (e.g., the Randomised Decision Forest used in the original Kinect).<ref name="Shotton2011" />
* '''Skeletal reconstruction and filtering''' – The system fits a kinematic skeleton to the classified pixels and applies temporal filtering to reduce jitter, producing smooth head- and hand-pose data that can drive VR/AR applications.
Although a single camera can suffice, multi-camera rigs extend coverage and mitigate occlusion problems. Open-source and proprietary middleware (e.g., [[OpenNI]]/NITE, the Microsoft Kinect SDK) expose joint-stream APIs for developers.<ref name="OpenNI2013" />
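The temporal-filtering stage of the pipeline above can be sketched in a few lines. This is a minimal illustration using simple exponential smoothing of per-joint 3-D positions; the function and joint names are hypothetical and not taken from any particular SDK:

```python
def smooth_skeleton(prev, current, alpha=0.3):
    """Blend the previously filtered pose with the new raw pose.

    prev, current: dicts mapping joint name -> (x, y, z) in metres.
    alpha: weight of the new measurement; lower values smooth more
    aggressively at the cost of added perceived latency.
    """
    if prev is None:  # first frame: nothing to blend with yet
        return dict(current)
    return {
        joint: tuple(alpha * c + (1 - alpha) * p
                     for p, c in zip(prev[joint], pos))
        for joint, pos in current.items()
    }
```

Real systems typically use more sophisticated filters (e.g., velocity-aware or Kalman-style filtering), but the trade-off is the same: stronger smoothing reduces jitter while increasing response lag.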
==Markerless vs. marker-based tracking==
Marker-based outside-in systems (HTC Vive Lighthouse, PlayStation VR) attach active LEDs or retro-reflective spheres to the headset or controllers; external sensors triangulate these explicit targets, achieving sub-millimetre precision and sub-10 ms latency. Markerless alternatives dispense with physical targets, improving user comfort and reducing setup time, but at the cost of:
* '''Lower positional accuracy and higher latency''' – Depth-sensor noise and computational overhead introduce millimetre- to centimetre-level error and ~20–30 ms end-to-end latency.<ref name="Baker2016" />
* '''Sensitivity to occlusion''' – If a body part leaves the camera’s line of sight, the model loses track until the part re-enters view.
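A common mitigation for the occlusion problem is to gate each joint on a per-frame tracking-confidence value and hold the last reliable estimate while the joint is hidden. The sketch below assumes a hypothetical API in which the sensor reports such a confidence score; the function name and threshold are illustrative:

```python
def gated_joint_update(last_good, joint, pos, confidence, min_conf=0.5):
    """Update `last_good[joint]` only when the sensor is confident.

    Returns (position, is_fresh): the position to use this frame, and
    whether it came from a fresh measurement or a held fallback value.
    """
    if confidence >= min_conf:
        last_good[joint] = pos  # trustworthy measurement: store and use it
        return pos, True
    # Occluded or low-confidence joint: reuse the last reliable value
    # (None if the joint has never been seen).
    return last_good.get(joint), False
```

Consumer SDKs expose similar ideas, e.g. distinguishing directly tracked joints from positions inferred during occlusion.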
==History and notable systems==
{| class="wikitable"
! Year !! System !! Notes
|-
| 2003 || [[EyeToy]] (PlayStation 2) || 2-D silhouette tracking with a single RGB camera for casual gesture-based games.<ref name="Sony2003" />
|-
| 2010 || [[Kinect]] for Xbox 360 || Consumer launch of a structured-light depth sensor delivering real-time full-body skeletons (up to six users).<ref name="Microsoft2010" />
|-
| 2014–2016 || Research prototypes || Studies showed the Kinect V2 could supply 6-DOF head, hand, and body input to DIY VR HMDs.<ref name="KinectVRStudy" />
|-
| 2017 || Kinect production ends || Microsoft discontinued Kinect hardware as commercial VR shifted toward marker-based and inside-out solutions.<ref name="Microsoft2017" />
|}
==Applications==
* '''Gaming and entertainment''' – Titles like ''Kinect Sports'' mapped whole-body actions directly onto avatars. Enthusiast VR chat platforms still use Kinect skeletons to animate full-body avatars.
* '''Rehabilitation and exercise''' – Clinicians employ depth-based pose tracking to monitor range-of-motion exercises without encumbering patients with sensors.<ref name="Baker2016" />
* '''Interactive installations''' – Museums deploy wall-mounted depth cameras to create “magic-mirror” AR exhibits that overlay virtual costumes onto visitors in real time.
* '''Telepresence''' – Multi-Kinect arrays stream volumetric representations of remote participants into shared virtual spaces.
==Advantages==
* '''No wearable markers''' – Users remain unencumbered, enhancing comfort and lowering entry barriers.
* '''Rapid setup''' – A single sensor covers an entire play area; no lighthouse calibration or reflector placement is necessary.
* '''Multi-user support''' – Commodity depth cameras distinguish and skeletonise several people simultaneously.
* '''Lower hardware cost''' – RGB or RGB-D sensors are inexpensive compared with professional optical-mocap rigs.
==Disadvantages==
* '''Occlusion sensitivity''' – Furniture or other players can block the line of sight, causing intermittent loss of tracking.
* '''Reduced accuracy and jitter''' – Compared with marker-based solutions, joint estimates exhibit higher positional noise, especially during fast or complex motion.<ref name="Baker2016" />
* '''Environmental constraints''' – Bright sunlight, glossy surfaces, and feature-poor backgrounds degrade depth or feature extraction quality.
* '''Limited range and FOV''' – Most consumer depth cameras operate effectively only within 0.8–5 m; beyond that, depth resolution and skeleton stability decrease.
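The range limitation above can be made concrete with a short sketch that flags which depth samples fall inside a sensor's usable window. The 0.8–5 m bounds echo the figures quoted in the list; the function name is illustrative:

```python
def usable_depth_mask(depths_m, near=0.8, far=5.0):
    """Flag which depth samples (in metres) lie in the reliable range.

    Samples nearer than `near` or farther than `far` are typically too
    noisy for stable skeleton fitting and are masked out upstream.
    """
    return [near <= d <= far for d in depths_m]
```

In practice a pipeline would apply such a mask per pixel before segmentation, so that out-of-range returns never reach the body-part classifier.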
==References==
<references />
[[Category:Terms]]
[[Category:Technical Terms]]