Markerless outside-in tracking
- See also: Outside-in tracking, Markerless tracking and Positional tracking
Introduction
Markerless outside-in tracking is a form of positional tracking for virtual reality (VR) and augmented reality (AR) that estimates a user’s six-degree-of-freedom pose from externally mounted depth-sensing or RGB cameras without requiring any fiducial markers. Instead, per-frame depth or colour images are processed by computer vision algorithms that segment the scene, classify body parts and fit a kinematic skeleton, enabling real-time motion capture and interaction.[1]
Underlying technology
A typical pipeline combines specialised hardware with software-based human-pose estimation:
- Sensing layer – One or more fixed RGB-D or infra-red depth cameras stream point clouds. The original Microsoft Kinect projects a near-IR structured-light pattern, whereas the Kinect v2 and Azure Kinect use time-of-flight ranging.[2] The effective operating range for Kinect v1 is ≈ 0.8 – 4.5 m (specification upper limit 5 m).[3]
- Segmentation – Foreground extraction isolates user pixels from background geometry.
- Per-pixel body-part classification – A Randomised Decision Forest labels each pixel (head, hand, torso, …).[1]
- Skeletal reconstruction and filtering – Joint positions are inferred from the labelled pixels and temporally filtered to reduce jitter, producing head- and hand-pose data consumable by VR/AR engines (a simplified sketch of the classification and smoothing steps follows this list).
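As a rough illustration of the classification and filtering stages, the Python sketch below approximates the published approach[1] with a depth-difference feature, a centroid-based joint proposal and exponential smoothing. The body-part labels, probe offsets, classifier choice and smoothing constant are illustrative assumptions, not the actual Kinect pipeline.

```python
# Illustrative sketch of per-pixel body-part features, joint proposals and
# temporal smoothing. Labels, offsets and the classifier are placeholders,
# not the Kinect implementation.
import numpy as np

PARTS = {1: "head", 2: "torso", 3: "left_hand", 4: "right_hand"}  # toy label set

def depth_difference_features(depth, pixels, offsets):
    """Shotton-style features: the depth difference between two probe points,
    with offsets scaled by 1/depth so the feature is roughly depth-invariant.
    `pixels` is an (N, 2) array of foreground (row, col) coordinates from the
    segmentation step; `offsets` is a list of ((dy1, dx1), (dy2, dx2)) pairs."""
    h, w = depth.shape
    feats = np.empty((len(pixels), len(offsets)), dtype=np.float32)
    for i, (y, x) in enumerate(pixels):
        d = max(float(depth[y, x]), 1e-3)  # guard against invalid (zero) depth
        for j, (u, v) in enumerate(offsets):
            p1 = np.clip([y + u[0] / d, x + u[1] / d], 0, [h - 1, w - 1]).astype(int)
            p2 = np.clip([y + v[0] / d, x + v[1] / d], 0, [h - 1, w - 1]).astype(int)
            feats[i, j] = depth[p1[0], p1[1]] - depth[p2[0], p2[1]]
    return feats

# A per-pixel classifier (e.g. sklearn.ensemble.RandomForestClassifier) trained
# offline on labelled depth images would consume these features per frame.

def joints_from_labels(depth, pixels, labels):
    """Propose one joint per part as the centroid of its labelled pixels
    (the published method uses mean shift on a density estimate instead)."""
    joints = {}
    for part, name in PARTS.items():
        mask = labels == part
        if mask.any():
            pts = pixels[mask]
            z = float(depth[pts[:, 0], pts[:, 1]].mean())
            joints[name] = (float(pts[:, 0].mean()), float(pts[:, 1].mean()), z)
    return joints

class JointSmoother:
    """Exponential smoothing of joint positions to reduce frame-to-frame jitter."""
    def __init__(self, alpha=0.5):
        self.alpha, self.state = alpha, {}
    def update(self, joints):
        for name, pos in joints.items():
            prev = self.state.get(name, pos)
            self.state[name] = tuple(self.alpha * p + (1 - self.alpha) * q
                                     for p, q in zip(pos, prev))
        return dict(self.state)
```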
Although a single depth camera can suffice, multi-camera rigs expand coverage and reduce occlusions. Open-source and proprietary middleware (e.g., OpenNI/NiTE 2, Microsoft Kinect SDK) expose joint-stream APIs for developers.[4] Measured end-to-end skeleton latency for Kinect ranges from roughly 60 to 90 ms, depending on model and SDK settings.[5]
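A minimal sketch of how such a rig might be fused is shown below, assuming each camera's middleware reports named joints with confidence values and that camera-to-world extrinsics are known from calibration; all names and numbers are placeholders rather than any particular SDK's API.

```python
# Illustrative fusion of skeletons reported by two fixed depth cameras.
# The extrinsics, joint dictionaries and confidences are placeholders for
# whatever calibration and middleware output a real rig would provide.
import numpy as np

def to_world(joint_xyz, extrinsic):
    """Apply a 4x4 camera-to-world transform to a 3-D joint position."""
    p = np.append(np.asarray(joint_xyz, dtype=float), 1.0)
    return (extrinsic @ p)[:3]

def fuse_skeletons(skeletons, extrinsics):
    """Average joints seen by several cameras, weighted by confidence, so a
    part occluded in one view can still be tracked from another."""
    fused = {}
    for skel, ext in zip(skeletons, extrinsics):
        for name, (xyz, conf) in skel.items():
            w_xyz = to_world(xyz, ext)
            if name in fused:
                old_xyz, old_conf = fused[name]
                total = old_conf + conf
                fused[name] = ((old_xyz * old_conf + w_xyz * conf) / total, total)
            else:
                fused[name] = (w_xyz, conf)
    return {name: xyz for name, (xyz, _) in fused.items()}

# Toy example: camera A defines the world frame; camera B is rotated 90° about
# the vertical axis. The left hand is occluded from camera B's viewpoint.
cam_a = {"head": ((0.1, 1.6, 2.0), 0.9), "left_hand": ((-0.3, 1.1, 1.8), 0.7)}
cam_b = {"head": ((-2.0, 1.6, 0.1), 0.8)}
ext_a = np.eye(4)
ext_b = np.array([[0., 0., 1., 0.],
                  [0., 1., 0., 0.],
                  [-1., 0., 0., 0.],
                  [0., 0., 0., 1.]])
print(fuse_skeletons([cam_a, cam_b], [ext_a, ext_b]))
```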
Markerless vs. marker-based tracking
Marker-based outside-in systems such as **Vicon** optical motion capture or the HTC Vive **Lighthouse** system rely on retro-reflective spheres attached to the tracked object, or on-device photodiodes that detect sweeping IR lasers from the base stations, achieving sub-millimetre precision and motion-to-photon latency below 10 ms.[6][7] Markerless alternatives remove physical targets, improving comfort and setup time, but at the cost of:
- Lower positional accuracy and higher latency – Depth-sensor noise plus the 60 – 90 ms processing pipeline produce millimetre- to centimetre-level error (see the worked example after this list).[8]
- Sensitivity to occlusion – Body parts outside the camera’s line-of-sight are temporarily lost.
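To put the latency figure in perspective (an illustrative calculation, not a measured value): a hand moving at about 1 m/s is displaced by roughly 1 m/s × 0.075 s ≈ 7.5 cm over a 75 ms markerless pipeline, whereas a sub-10 ms marker-based pipeline keeps the corresponding lag below about 1 cm.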
History and notable systems
| Year | System | Notes |
| --- | --- | --- |
| 2003 | EyeToy (PlayStation 2) | 2-D silhouette tracking with a single RGB camera.[9] |
| 2010 | Kinect for Xbox 360 | First consumer structured-light depth sensor with real-time full-body skeletons (up to six users detected, two tracked with full skeletons).[10] |
| 2014 – 2016 | Research prototypes | Academic work showed the Kinect v2 could deliver 6-DOF head- and hand-pose input for DIY VR HMDs.[5] |
| 2017 | Kinect production ends | Microsoft discontinued Kinect hardware as commercial VR shifted toward inside-out and marker-based solutions.[11] |
Applications
- **Gaming & entertainment** – Titles such as Kinect Sports map whole-body actions to avatars; some VR chat platforms still use Kinect skeletons.
- **Rehabilitation & exercise** – Clinicians monitor range-of-motion without attaching markers (see the joint-angle sketch after this list).[12]
- **Interactive installations** – Depth cameras create “magic-mirror” AR exhibits in museums.
- **Telepresence** – Multi-camera arrays stream volumetric avatars into shared virtual spaces.
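As a hedged sketch of how skeleton output supports range-of-motion assessment, the fragment below computes an elbow flexion angle from three 3-D joint positions; the joint names and coordinates are hypothetical and not tied to a specific SDK.

```python
# Illustrative range-of-motion measure: an elbow flexion angle computed from
# three 3-D joint positions. Joint names and coordinates are placeholders.
import numpy as np

def joint_angle(a, b, c):
    """Angle (degrees) at joint b formed by the segments b->a and b->c."""
    a, b, c = (np.asarray(p, dtype=float) for p in (a, b, c))
    u, v = a - b, c - b
    cos_theta = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0))))

# Example frame (metres): shoulder, elbow and wrist of the right arm.
shoulder, elbow, wrist = (0.20, 1.45, 2.0), (0.25, 1.20, 2.0), (0.45, 1.05, 2.0)
print(f"Elbow flexion: {joint_angle(shoulder, elbow, wrist):.1f} deg")
```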
Advantages
- No wearable markers.
- Rapid single-sensor setup; no Lighthouse base-station calibration.
- Simultaneous multi-user support.
- Lower hardware cost than professional optical mocap rigs.
Disadvantages
- Occlusion sensitivity – furniture or other players can block tracking.
- Reduced accuracy and 60 – 90 ms latency compared with Lighthouse or Vicon systems.[8][7]
- Environmental constraints – bright sunlight or glossy surfaces degrade depth quality.
- Limited range and FOV – reliable only within ≈ 0.8 – 4.5 m for Kinect-class sensors.[3]
References
1. Shotton J. et al. (2011). “Real-Time Human Pose Recognition in Parts from a Single Depth Image.” *CVPR 2011.*
2. Zhang Z. (2012). “Microsoft Kinect Sensor and Its Effect.” *IEEE MultiMedia* 19 (2): 4–10.
3. Khoshelham K.; Elberink S. (2012). “Accuracy and Resolution of Kinect Depth Data for Indoor Mapping Applications.” *Sensors* 12 (2): 1437–1454.
4. OpenNI Foundation (2013). “NiTE 2.0 User Guide.”
5. Livingston M. A. et al. (2012). “Performance Measurements for the Microsoft Kinect Skeleton.” *IEEE VR 2012 Workshop.*
6. Vicon Motion Systems. “Vicon Tracker – Latency down to 2.5 ms.” Product sheet.
7. Malventano A. (2016). “SteamVR HTC Vive In-depth – Lighthouse Tracking System Dissected.” *PC Perspective.*
8. Guffanti D. et al. (2020). “Accuracy of the Microsoft Kinect V2 Sensor for Human Gait Analysis.” *Sensors* 20 (16): 4405.
9. Pham A. (2004-01-18). “EyeToy Springs From One Man’s Vision.” *Los Angeles Times.*
10. Microsoft News Center (2010-11-04). “The Future of Entertainment Starts Today as Kinect for Xbox 360 Leaps and Lands at Retailers Nationwide.”
11. Good O. S. (2017-10-25). “Kinect is officially dead. Really. Officially. It’s dead.” *Polygon.*
12. Wade L. et al. (2022). “Applications and Limitations of Current Markerless Motion Capture Methods for Clinical Gait Biomechanics.” *PeerJ* 10: e12995.