Markerless outside-in tracking

See also: Outside-in tracking, Markerless tracking, Positional tracking

Introduction

Markerless outside-in tracking is a subtype of positional tracking used in both virtual reality (VR) and augmented reality (AR). It places external cameras or other depth sensing devices around the play area and estimates a user’s six-degree-of-freedom pose without any worn fiducial markers. Instead, the system runs computer vision algorithms—most famously the per-pixel body-part classifier introduced for Microsoft’s Kinect—to create a real-time motion capture skeleton.^[1]

Underlying technology

A typical markerless outside-in pipeline includes:

Sensing layer – One or more fixed RGB-D or infrared depth cameras (e.g., the first-generation Kinect) acquire depth frames, which can be back-projected into point clouds. Depth is measured with structured light or time-of-flight illumination.^[2]^[3]
Segmentation – Foreground extraction isolates user pixels from the static background.
Body-part classification – A decision-forest classifier labels each depth pixel as head, hand, torso, and so on, following Shotton et al.^[1]
Skeletal fitting and filtering – Joint hypotheses are fitted to a kinematic model and temporally smoothed, generating continuous head- and hand-pose streams.
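The segmentation and classification stages above can be sketched in a few lines. The following is a minimal illustration, not the Kinect implementation: it assumes depth frames arrive as nested lists of metres with 0.0 meaning "no reading", uses a hypothetical empty-room background frame for foreground extraction, and computes the depth-comparison feature described by Shotton et al.^[1], in which two probe offsets are scaled by the inverse depth of the pixel being classified. The names `segment_foreground`, `depth_feature`, and `LARGE_DEPTH_M`, and the 10 cm threshold, are assumptions chosen for this example.

```python
from typing import List, Tuple

DepthImage = List[List[float]]  # depth in metres; 0.0 marks "no reading"
Mask = List[List[bool]]

# Probes that land on the background or outside the image get a large depth.
LARGE_DEPTH_M = 1e6


def segment_foreground(frame: DepthImage, background: DepthImage,
                       min_diff_m: float = 0.10) -> Mask:
    """Mark pixels that are valid and at least min_diff_m closer than the
    empty-room background (simple background subtraction)."""
    mask = []
    for frame_row, bg_row in zip(frame, background):
        mask.append([
            d > 0.0 and b > 0.0 and (b - d) >= min_diff_m
            for d, b in zip(frame_row, bg_row)
        ])
    return mask


def _probe(depth: DepthImage, mask: Mask, x: int, y: int) -> float:
    """Depth at (x, y), or LARGE_DEPTH_M if off-image or not foreground."""
    if 0 <= y < len(depth) and 0 <= x < len(depth[0]) and mask[y][x]:
        return depth[y][x]
    return LARGE_DEPTH_M


def depth_feature(depth: DepthImage, mask: Mask, x: int, y: int,
                  u: Tuple[float, float], v: Tuple[float, float]) -> float:
    """Depth-comparison feature f = d(p + u/d(p)) - d(p + v/d(p)).

    Dividing the offsets by d(p) makes the feature roughly invariant to how
    far the user stands from the camera. (x, y) must be a foreground pixel.
    """
    d = depth[y][x]
    ux, uy = x + round(u[0] / d), y + round(u[1] / d)
    vx, vy = x + round(v[0] / d), y + round(v[1] / d)
    return _probe(depth, mask, ux, uy) - _probe(depth, mask, vx, vy)


# Toy scene: a wall at 3 m with a two-pixel "user" at 2 m in the middle row.
background = [[3.0] * 5 for _ in range(3)]
frame = [row[:] for row in background]
frame[1][2] = 2.0
frame[1][3] = 2.0

mask = segment_foreground(frame, background)
on_body = depth_feature(frame, mask, 2, 1, u=(2.0, 0.0), v=(0.0, 0.0))
off_body = depth_feature(frame, mask, 2, 1, u=(4.0, 0.0), v=(0.0, 0.0))
```

In this toy scene `on_body` is small because both probes land on the user, while `off_body` is very large because one probe falls on the background. A decision forest would threshold many such features, each with its own pair of offsets, to vote on a body-part label for every foreground pixel.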

Open software stacks such as OpenNI/NITE expose these joint streams to developers.^[4]
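Whatever middleware supplies the joints, applications usually smooth them before driving an avatar. The sketch below is a plain exponential filter, shown only to illustrate the temporal-smoothing idea; it is not the filter used by NITE or any Kinect SDK. The frame format (a dict mapping a joint name to an (x, y, z) position in metres) and the class name `JointSmoother` are assumptions made for this example.

```python
from typing import Dict, Tuple

Vec3 = Tuple[float, float, float]


class JointSmoother:
    """Exponentially smooths per-joint positions across frames.

    alpha close to 1.0 follows the raw data closely (less lag, more jitter);
    alpha close to 0.0 smooths heavily (less jitter, more lag).
    """

    def __init__(self, alpha: float = 0.5) -> None:
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self._state: Dict[str, Vec3] = {}

    def update(self, joints: Dict[str, Vec3]) -> Dict[str, Vec3]:
        """Blend one frame of raw joints into the running estimate.

        Joints missing from this frame (for example, occluded ones) keep
        their previous estimate.
        """
        a = self.alpha
        for name, pos in joints.items():
            prev = self._state.get(name)
            if prev is None:
                self._state[name] = pos  # first sighting: no history to blend
            else:
                self._state[name] = (
                    a * pos[0] + (1.0 - a) * prev[0],
                    a * pos[1] + (1.0 - a) * prev[1],
                    a * pos[2] + (1.0 - a) * prev[2],
                )
        return dict(self._state)


smoother = JointSmoother(alpha=0.5)
first = smoother.update({"head": (0.0, 0.0, 0.0)})
second = smoother.update({"head": (1.0, 2.0, 0.0)})
held = smoother.update({})  # head occluded: last estimate is retained
```

The trade-off is visible in the single parameter: every bit of smoothing adds lag, which matters for head tracking in VR more than for slow whole-body gestures.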

Markerless vs. marker-based tracking

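Marker-based outside-in systems attach active LEDs (e.g., PlayStation VR) or retroreflective spheres (optical motion-capture studios) to the headset, controllers, or body, and can reach millimetre-level accuracy. HTC Vive Lighthouse is often grouped with them, although its photosensors sit on the tracked devices and the external base stations only emit light. Markerless systems remove that hardware layer but incur: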

Susceptibility to occlusion and environmental lighting.
Higher positional noise and latency (~20–30 ms end-to-end).^[5]

History and notable systems

Year	System	Technical note
2003	EyeToy (PlayStation 2)	2-D silhouette tracking with a single RGB webcam.^[6]
2010	Kinect for Xbox 360	Structured-light depth sensor that detects up to six people and tracks full-body skeletons for two of them.^[7]
2011	Kinect + FAAST middleware	Demonstrated low-cost VR interaction with markerless tracking.^[8]
2017	Kinect production ends	Microsoft ceased manufacturing Kinect as industry moved to other tracking paradigms.^[9]

Applications

**Gaming and entertainment** – Titles such as Kinect Sports map whole-body gestures to avatars; hobbyists still use Kinect for full-body VR chat avatars.
**Rehabilitation and exercise** – Depth-based pose tracking supports remote physiotherapy and balance-training systems.^[5]
**Interactive exhibits** – Museums mount depth cameras to create “magic-mirror” AR overlays.
**Telepresence** – Multi-camera arrays stream volumetric avatars into shared virtual spaces.

Advantages

No wearable markers, enhancing comfort.
Quick single-sensor setup and lower hardware cost.
Ability to track multiple users at once.

Disadvantages

Occlusion sensitivity and limited camera field-of-view.
Lower accuracy than marker-based alternatives.^[10]
Performance degradation in bright sunlight or on reflective surfaces.
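The field-of-view limit can be made concrete with basic trigonometry: a camera with horizontal field of view θ sees a strip of width 2·d·tan(θ/2) at distance d. The sketch below assumes an ideal pinhole camera and ignores minimum and maximum sensing range; the 57° figure is the horizontal field of view commonly quoted for the first-generation Kinect and is used here as an assumption, and `coverage_width_m` is a name invented for this example.

```python
import math


def coverage_width_m(distance_m: float, horizontal_fov_deg: float) -> float:
    """Width of the strip visible to the camera at a given distance."""
    if not 0.0 < horizontal_fov_deg < 180.0:
        raise ValueError("field of view must be between 0 and 180 degrees")
    half_angle = math.radians(horizontal_fov_deg) / 2.0
    return 2.0 * distance_m * math.tan(half_angle)


# A user standing 2 m from a sensor with an assumed 57 degree horizontal FOV
# has roughly 2.17 m of side-to-side room before leaving the frame.
width_at_2m = coverage_width_m(2.0, 57.0)
```

Because the visible width grows linearly with distance while depth precision generally falls off with it, a single sensor forces a compromise between play-area size and tracking quality.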

References

[1] Shotton, J.; Fitzgibbon, A.; Cook, M.; Sharp, T.; Finocchio, M.; Moore, R.; Kipman, A.; Blake, A. “Real-Time Human Pose Recognition in Parts from a Single Depth Image.” Proceedings of CVPR 2011. IEEE, 2011.
[2] Zeng, W.; Zhang, Z. “Microsoft Kinect Sensor and Its Effect.” IEEE MultiMedia, 19 (2), 2012, pp. 4–10.
[3] “Structured-light 3D scanner.” Wikipedia. Accessed 1 May 2025.
[4] OpenNI Foundation. OpenNI 1.5.2 User Guide. 2013.
[5] Pfister, A.; West, N.; et al. “Applications and limitations of current markerless motion capture methods for clinical gait biomechanics.” Journal of Biomechanics, 129 (2022) 110844.
[6] Pham, A. “EyeToy Springs From One Man’s Vision.” Los Angeles Times, 27 Nov 2003.
[7] Microsoft News Center. “The Future of Entertainment Starts Today as Kinect for Xbox 360 …”, 4 Nov 2010.
[8] Lange, B.; Rizzo, A.; Chang, C-Y.; Suma, E.; Bolas, M. “Markerless Full Body Tracking: Depth-Sensing Technology within Virtual Environments.” I/ITSEC 2011.
[9] Good, O. “Kinect is officially dead. Really. Officially. It’s dead.” Polygon, 25 Oct 2017.
[10] Remocapp. “Marker vs Markerless Motion Capture by Accuracy and Detail Level.” Blog post, 2024.
