Markerless outside-in tracking: Difference between revisions

Latest revision as of 21:30, 7 May 2025

See also: Terms and Technical Terms

See also Outside-in tracking, Markerless tracking, Positional tracking

Introduction

Markerless outside-in tracking is a subtype of positional tracking used in virtual reality (VR) and augmented reality (AR). In this approach, external cameras or other depth sensing devices positioned in the environment estimate the six-degree-of-freedom (6DOF) pose of a user or object without relying on any fiducial markers. Instead, computer vision algorithms analyse the incoming colour or depth stream to detect and follow natural scene features or the user’s own body, enabling real-time motion capture and interaction.^[1]

Underlying technology

A typical markerless outside-in pipeline combines specialised hardware with software-based human-pose estimation:

Sensing layer – One or more fixed RGB-D or infrared depth cameras acquire per-frame point clouds. Commodity devices such as the Microsoft Kinect project a structured light pattern or use time-of-flight methods to compute depth maps.^[2]
Segmentation – Foreground extraction or person segmentation isolates user pixels from the static background.
Per-pixel body-part classification – A machine-learning model labels each pixel as “head”, “hand”, “torso”, and so on (for example the Randomised Decision Forest used in the original Kinect).^[1]
Skeletal reconstruction and filtering – The system fits a kinematic skeleton to the classified pixels and applies temporal filtering to reduce jitter, producing smooth head- and hand-pose data that can drive VR/AR applications.
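
The segmentation and filtering stages listed above can be illustrated with a short sketch. This is not the Kinect implementation: it assumes a depth frame is a list of rows of millimetre values with 0 meaning "no reading", that an empty-room background frame was captured beforehand, and that an exponential moving average is an acceptable stand-in for the temporal filter. The names `segment_foreground` and `JointSmoother`, and the threshold values, are hypothetical.

```python
from typing import Dict, List, Tuple

Depth = List[List[int]]              # depth in millimetres, 0 = no reading (assumed)
Point = Tuple[float, float, float]   # joint position in metres (assumed)


def segment_foreground(frame: Depth, background: Depth,
                       min_gap_mm: int = 100) -> List[List[bool]]:
    """Mark pixels that are clearly closer to the camera than the static background."""
    mask = []
    for row, bg_row in zip(frame, background):
        mask_row = []
        for d, bg in zip(row, bg_row):
            valid = d > 0 and bg > 0          # ignore pixels without a depth reading
            mask_row.append(valid and (bg - d) >= min_gap_mm)
        mask.append(mask_row)
    return mask


class JointSmoother:
    """Per-joint exponential moving average; alpha near 1 follows raw data closely."""

    def __init__(self, alpha: float = 0.5) -> None:
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.state: Dict[str, Point] = {}

    def update(self, joints: Dict[str, Point]) -> Dict[str, Point]:
        a = self.alpha
        for name, raw in joints.items():
            prev = self.state.get(name)
            if prev is None:
                self.state[name] = raw        # first observation: nothing to blend with
            else:
                self.state[name] = tuple(a * r + (1 - a) * p
                                         for r, p in zip(raw, prev))
        return dict(self.state)
```

A lower `alpha` suppresses more jitter but makes the reported pose lag further behind the user's actual motion, so the value is a trade-off between smoothness and latency.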

Although a single camera can suffice, multi-camera rigs extend coverage and mitigate occlusion problems. Open source and proprietary middleware (for example OpenNI/NITE, the Microsoft Kinect SDK) expose joint-stream APIs for developers.^[3]
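
One way a multi-camera rig might combine its views is sketched below. It assumes each camera's extrinsic calibration (a 3×3 rotation and a translation into a shared world frame) is already known, and that every camera reports a confidence weight alongside each joint position; both are assumptions for illustration, and `to_world` and `fuse_joint` are hypothetical helper names rather than part of any SDK mentioned here.

```python
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]


def to_world(p: Vec3, rotation: Mat3, translation: Vec3) -> Vec3:
    """Apply a rigid transform: world = R * p + t."""
    return tuple(
        sum(rotation[i][k] * p[k] for k in range(3)) + translation[i]
        for i in range(3)
    )


def fuse_joint(observations: List[Tuple[Vec3, float]]) -> Optional[Vec3]:
    """Confidence-weighted mean of world-space estimates.

    Returns None when no camera saw the joint (all weights are zero).
    """
    total = sum(w for _, w in observations)
    if total <= 0.0:
        return None
    return tuple(sum(p[i] * w for p, w in observations) / total
                 for i in range(3))
```

A camera whose view of a joint is blocked contributes a weight of zero, so the fused estimate falls back on the cameras that can still see it.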

Markerless vs. marker-based tracking

Marker-based outside-in systems (HTC Vive Lighthouse, PlayStation VR) attach active LEDs or retro-reflective spheres to the headset or controllers; external sensors triangulate these explicit targets, achieving sub-millimetre precision and sub-10 ms latency. Markerless alternatives dispense with physical targets, improving user comfort and reducing setup time, but at the cost of:

Lower positional accuracy and higher latency – Depth-sensor noise and computational overhead introduce millimetre- to centimetre-level error and ~20–30 ms end-to-end latency.
Sensitivity to occlusion – If a body part leaves the camera’s line of sight, the model loses track until the part re-enters view.
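
An application can soften brief occlusions by holding the last confident pose for a few frames before declaring the joint lost. The sketch below is one possible policy, not a feature of any particular tracker; the class name `HeldJoint`, the confidence threshold, and the hold duration are all hypothetical.

```python
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


class HeldJoint:
    """Keeps the last confident position of one joint and reports its status."""

    def __init__(self, min_confidence: float = 0.5,
                 max_hold_frames: int = 15) -> None:
        self.min_confidence = min_confidence
        self.max_hold_frames = max_hold_frames
        self.last: Optional[Vec3] = None
        self.missed = 0   # consecutive frames without a confident reading

    def update(self, position: Optional[Vec3],
               confidence: float) -> Tuple[Optional[Vec3], str]:
        if position is not None and confidence >= self.min_confidence:
            self.last, self.missed = position, 0
            return position, "tracked"
        self.missed += 1
        if self.last is not None and self.missed <= self.max_hold_frames:
            return self.last, "held"      # reuse the stale pose for a short while
        return None, "lost"               # occluded too long, or never seen
```

Exposing the status string lets the application decide how to present a held or lost joint, for example by fading out a virtual hand instead of freezing it in place.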

History and notable systems

Year	System	Notes
2003	EyeToy (PlayStation 2)	2-D silhouette tracking with a single RGB camera for casual gesture-based games.
2010	Kinect for Xbox 360	Consumer launch of a structured-light depth sensor that detects up to six people and delivers real-time full-body skeletons for two of them.^[4]
2014 – 2016	Research prototypes	Studies showed Kinect V2 could supply 6-DOF head, hand, and body input to DIY VR HMDs.
2017	Kinect production ends	Microsoft discontinued Kinect hardware as commercial VR shifted toward marker-based and inside-out solutions.^[5]

Applications

Gaming and Entertainment – Titles like Kinect Sports mapped whole-body actions directly onto avatars. Enthusiast VR chat platforms still use Kinect skeletons to animate full-body avatars.
Rehabilitation and Exercise – Clinicians employ depth-based pose tracking to monitor range-of-motion exercises without encumbering patients with sensors.
Interactive installations – Museums deploy wall-mounted depth cameras to create “magic-mirror” AR exhibits that overlay virtual costumes onto visitors in real time.
Telepresence – Multi-Kinect arrays stream volumetric representations of remote participants into shared virtual spaces.

Advantages

No wearable markers – Users remain unencumbered, enhancing comfort and lowering entry barriers.
Rapid setup – A single sensor can cover a small play area; no base-station calibration or reflector placement is necessary.
Multi-user support – Commodity depth cameras distinguish and skeletonise several people simultaneously.
Lower hardware cost – RGB or RGB-D sensors are inexpensive compared with professional optical-mocap rigs.

Disadvantages

Occlusion sensitivity – Furniture or other players can block the line of sight, causing intermittent loss of tracking.
Reduced accuracy and jitter – Compared with marker-based solutions, joint estimates exhibit higher positional noise, especially during fast or complex motion.
Environmental constraints – Bright sunlight, glossy surfaces, and feature-poor backgrounds degrade depth or feature extraction quality.
Limited range and FOV – Most consumer depth cameras operate effectively only within 0.8–5 m; beyond that, depth resolution and skeleton stability decrease.
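
The working range quoted above can be enforced in software by discarding readings outside it before any pose estimation runs. The following is a minimal sketch that assumes depth values arrive in millimetres with 0 meaning "no reading"; `gate_depth` and `usable_fraction` are hypothetical helper names, and the bounds simply restate the 0.8–5 m figure.

```python
from typing import List, Optional

NEAR_MM = 800    # 0.8 m lower bound, taken from the range quoted above
FAR_MM = 5000    # 5 m upper bound, taken from the range quoted above


def gate_depth(row: List[int], near_mm: int = NEAR_MM,
               far_mm: int = FAR_MM) -> List[Optional[int]]:
    """Replace readings outside the working range with None."""
    return [d if near_mm <= d <= far_mm else None for d in row]


def usable_fraction(row: List[int], near_mm: int = NEAR_MM,
                    far_mm: int = FAR_MM) -> float:
    """Share of pixels in a row that fall inside the working range."""
    if not row:
        return 0.0
    gated = gate_depth(row, near_mm, far_mm)
    return sum(d is not None for d in gated) / len(row)
```

A low usable fraction over the user's silhouette is a cheap signal that the person is standing too close to or too far from the sensor.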

References

[1] Shotton, J.; Fitzgibbon, A.; Cook, M.; Sharp, T.; Finocchio, M.; Moore, R.; Kipman, A.; Blake, A. “Real‑Time Human Pose Recognition in Parts from a Single Depth Image.” *Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)*, 2011, pp. 1297–1304. DOI: 10.1109/CVPR.2011.5995316. Available at: https://ieeexplore.ieee.org/document/5995316 (accessed 3 May 2025).
[2] Zhang, Z. “Microsoft Kinect Sensor and Its Effect.” *IEEE MultiMedia*, vol. 19, no. 2, 2012, pp. 4–10. DOI: 10.1109/MMUL.2012.24. Available at: https://dl.acm.org/doi/10.1109/MMUL.2012.24 (accessed 3 May 2025).
[3] OpenNI Foundation. *OpenNI 1.5.2 User Guide*, 2010. PDF. Available at: https://www.cs.rochester.edu/courses/577/fall2011/kinect/openni-user-guide.pdf (accessed 3 May 2025).
[4] Microsoft News Center. “The Future of Entertainment Starts Today as Kinect for Xbox 360 Leaps and Lands at Retailers Nationwide.” Press release, 4 Nov 2010. Available at: https://news.microsoft.com/2010/11/04/the-future-of-entertainment-starts-today-as-kinect-for-xbox-360-leaps-and-lands-at-retailers-nationwide/ (accessed 3 May 2025).
[5] Good, O. S. “Kinect Is Officially Dead. Really. Officially. It’s Dead.” *Polygon*, 25 Oct 2017. Available at: https://www.polygon.com/2017/10/25/16543192/kinect-discontinued-microsoft-announcement (accessed 3 May 2025).
[6] Pfister, A.; West, N.; et al. “Applications and Limitations of Current Markerless Motion Capture Methods for Clinical Gait Biomechanics.” *Journal of Biomechanics*, vol. 129, 2022, Article 110844. DOI: 10.1016/j.jbiomech.2021.110844. Available at: https://pubmed.ncbi.nlm.nih.gov/35237469/ (accessed 3 May 2025).
[7] Pham, A. “EyeToy Springs From One Man’s Vision.” *Los Angeles Times*, 18 Jan 2004. Available at: https://www.latimes.com/archives/la-xpm-2004-jan-18-fi-eyetoy18-story.html (accessed 3 May 2025).
[8] Lange, B.; Rizzo, A.; Chang, C.-Y.; Suma, E. A.; Bolas, M. “Markerless Full Body Tracking: Depth‑Sensing Technology within Virtual Environments.” *Interservice/Industry Training, Simulation and Education Conference (I/ITSEC)*, 2011. PDF. Available at: http://ict.usc.edu/pubs/Markerless%20Full%20Body%20Tracking-%20Depth-Sensing%20Technology%20within%20Virtual%20Environments.pdf (accessed 3 May 2025).


@@ Line 1: / Line 1: @@
-{{stub}}
+{{see also|Terms|Technical Terms}}
 :''See also [[Outside-in tracking]], [[Markerless tracking]], [[Positional tracking]]''
 ==Introduction==
-[[Positional tracking]] is an essential component of both [[virtual reality]] (VR) and [[augmented reality]] (AR), contributing to a greater sense of [[immersion]] and [[presence]]. It determines the position and orientation of an object within the environment. In VR, this allows for the movements of the user to be translated into the virtual environment, and in AR it is essential for the placement of digital content into real objects or spaces. Markerless outside-in tracking is a composite term that defines a form of positional tracking that uses two specific methods: [[markerless tracking]] and [[outside-in tracking]]. <ref name="Boger"> Boger, Y. (2014). Overview of positional tracking technologies for virtual reality. Retrieved from http://www.roadtovr.com/overview-of-positional-tracking-technologies-virtual-reality/</ref> <ref name="Ziegler"> Ziegler, E. (2010). Real-time markerless tracking of objects on mobile devices. Bachelor Thesis, University of Koblenz and Landau</ref>
+'''[[Markerless outside-in tracking]]''' is a subtype of [[positional tracking]] used in [[virtual reality]] (VR) and [[augmented reality]] (AR). In this approach, external [[camera]]s or other [[depth sensing]] devices positioned in the environment estimate the six-degree-of-freedom ([[6DOF]]) [[pose]] of a user or object without relying on any [[fiducial marker]]s. Instead, [[computer vision]] algorithms analyse the incoming colour or depth stream to detect and follow natural scene features or the user’s own body, enabling real-time [[motion capture]] and interaction.<ref name="Shotton2011" />
+==Underlying technology==
+A typical markerless outside-in pipeline combines specialised hardware with software-based human-pose estimation:
+* '''Sensing layer''' – One or more fixed [[RGB-D]] or [[infrared]] depth cameras acquire per-frame point clouds. Commodity devices such as the Microsoft Kinect project a [[structured light]] pattern or use [[time-of-flight]] methods to compute depth maps.<ref name="Zhang2012" />
+* '''Segmentation''' – Foreground extraction or person segmentation isolates user pixels from the static background.
+* '''Per-pixel body-part classification''' – A machine-learning model labels each pixel as “head”, “hand”, “torso”, and so on (for example the Randomised Decision Forest used in the original Kinect).<ref name="Shotton2011" />
+* '''Skeletal reconstruction and filtering''' – The system fits a kinematic skeleton to the classified pixels and applies temporal filtering to reduce jitter, producing smooth head- and hand-pose data that can drive VR/AR applications.
+Although a single camera can suffice, multi-camera rigs extend coverage and mitigate occlusion problems. Open source and proprietary middleware (for example [[OpenNI]]/NITE, the [[Microsoft Kinect]] SDK) expose joint-stream APIs for developers.<ref name="OpenNI2013" />
+==Markerless vs. marker-based tracking==
+[[Outside-in tracking|Marker-based outside-in systems]] ([[HTC Vive]] [[Lighthouse]], [[PlayStation VR]]) attach active LEDs or retro-reflective spheres to the headset or controllers; external sensors triangulate these explicit targets, achieving sub-millimetre precision and sub-10 ms latency. Markerless alternatives dispense with physical targets, improving user comfort and reducing setup time, but at the cost of:
+* '''Lower positional accuracy and higher latency''' – Depth-sensor noise and computational overhead introduce millimetre- to centimetre-level error and ~20–30 ms end-to-end latency.
+* '''Sensitivity to occlusion''' – If a body part leaves the camera’s line of sight, the model loses track until the part re-enters view.
+==History and notable systems==
+{| class="wikitable"
+! Year !! System !! Notes
+|-
+| 2003 || [[EyeToy]] (PlayStation 2) || 2-D silhouette tracking with a single RGB camera for casual gesture-based games.
+|-
+| 2010 || [[Kinect]] for Xbox 360 || Consumer launch of a structured-light depth sensor that detects up to six people and delivers real-time full-body skeletons for two of them.<ref name="Microsoft2010" />
+|-
+| 2014 – 2016 || Research prototypes || Studies showed Kinect V2 could supply 6-DOF head, hand, and body input to DIY VR HMDs.
+|-
+| 2017 || Kinect production ends || Microsoft discontinued Kinect hardware as commercial VR shifted toward marker-based and inside-out solutions.<ref name="Microsoft2017" />
+|}
-Markerless tracking is a method of motion tracking that avoids the use of markers (also known as [[fiducial markers]]). These markers are usually placed in the environment or in the head-mounted displays (HMDs), helping the system determine the users or camera position. The Markerless method uses instead natural features already present in the environment, for tracking purposes. <ref name="Virtual Reality Society"> Virtual Reality Society. Virtual reality motion tracking technology has all the moves. Retrieved from https://www.vrs.org.uk/virtual-reality-gear/motion-tracking</ref> <ref name="Klein"> Klein, G. (2006)</ref>.
+==Applications==
+* '''Gaming and Entertainment''' – Titles like ''Kinect Sports'' mapped whole-body actions directly onto avatars. Enthusiast VR chat platforms still use Kinect skeletons to animate full-body avatars.
+* '''Rehabilitation and Exercise''' – Clinicians employ depth-based pose tracking to monitor range-of-motion exercises without encumbering patients with sensors.
+* '''Interactive installations''' – Museums deploy wall-mounted depth cameras to create “magic-mirror” AR exhibits that overlay virtual costumes onto visitors in real time.
+* '''Telepresence''' – Multi-Kinect arrays stream volumetric representations of remote participants into shared virtual spaces.
-Markerless outside-in tracking is a technology that was used prior to the wide availability of consumer VR devices. Two popular non-VR systems based on this form of tracking are the PlayStation [https://en.wikipedia.org/wiki/EyeToy EyeToy], released in October 2003, and the Xbox [https://en.wikipedia.org/wiki/Kinect Kinect], released in November 2010.
+==Advantages==
+* '''No wearable markers''' – Users remain unencumbered, enhancing comfort and lowering entry barriers.
+* '''Rapid setup''' – A single sensor can cover a small play area; no base-station calibration or reflector placement is necessary.
+* '''Multi-user support''' – Commodity depth cameras distinguish and skeletonise several people simultaneously.
+* '''Lower hardware cost''' – RGB or RGB-D sensors are inexpensive compared with professional optical-mocap rigs.
-With markerless outside-in tracking, cameras are mounted in the environment, such as on top of a television set, and aimed at the user. The user's movements are tracked without requiring any kind of markers or other hardware. The disadvantage of this system is that lacks the fine spacial accuracy and low-latency of marker-based systems.
+==Disadvantages==
+* '''Occlusion sensitivity''' – Furniture or other players can block the line of sight, causing intermittent loss of tracking.
+* '''Reduced accuracy and jitter''' – Compared with marker-based solutions, joint estimates exhibit higher positional noise, especially during fast or complex motion.
+* '''Environmental constraints''' – Bright sunlight, glossy surfaces, and feature-poor backgrounds degrade depth or feature extraction quality.
+* '''Limited range and FOV''' – Most consumer depth cameras operate effectively only within 0.8–5 m; beyond that, depth resolution and skeleton stability decrease.
 ==References==
-<references />
+<ref name="Shotton2011">Shotton, J.; Fitzgibbon, A.; Cook, M.; Sharp, T.; Finocchio, M.; Moore, R.; Kipman, A.; Blake, A. “Real‑Time Human Pose Recognition in Parts from a Single Depth Image.” *Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)*, 2011, pp. 1297–1304. DOI: 10.1109/CVPR.2011.5995316. Available at: https://ieeexplore.ieee.org/document/5995316 (accessed 3 May 2025).</ref>
+<ref name="Zhang2012">Zhang, Z. “Microsoft Kinect Sensor and Its Effect.” *IEEE MultiMedia*, vol. 19, no. 2, 2012, pp. 4–10. DOI: 10.1109/MMUL.2012.24. Available at: https://dl.acm.org/doi/10.1109/MMUL.2012.24 (accessed 3 May 2025).</ref>
+<ref name="OpenNI2013">OpenNI Foundation. *OpenNI 1.5.2 User Guide*, 2010. PDF. Available at: https://www.cs.rochester.edu/courses/577/fall2011/kinect/openni-user-guide.pdf (accessed 3 May 2025).</ref>
+<ref name="Pfister2022">Pfister, A.; West, N.; et al. “Applications and Limitations of Current Markerless Motion Capture Methods for Clinical Gait Biomechanics.” *Journal of Biomechanics*, vol. 129, 2022, Article 110844. DOI: 10.1016/j.jbiomech.2021.110844. Available at: https://pubmed.ncbi.nlm.nih.gov/35237469/ (accessed 3 May 2025).</ref>
+<ref name="Pham2004">Pham, A. “EyeToy Springs From One Man’s Vision.” *Los Angeles Times*, 18 Jan 2004. Available at: https://www.latimes.com/archives/la-xpm-2004-jan-18-fi-eyetoy18-story.html (accessed 3 May 2025).</ref>
+<ref name="Microsoft2010">Microsoft News Center. “The Future of Entertainment Starts Today as Kinect for Xbox 360 Leaps and Lands at Retailers Nationwide.” Press release, 4 Nov 2010. Available at: https://news.microsoft.com/2010/11/04/the-future-of-entertainment-starts-today-as-kinect-for-xbox-360-leaps-and-lands-at-retailers-nationwide/ (accessed 3 May 2025).</ref>
+<ref name="Lange2011">Lange, B.; Rizzo, A.; Chang, C.-Y.; Suma, E. A.; Bolas, M. “Markerless Full Body Tracking: Depth‑Sensing Technology within Virtual Environments.” *Interservice/Industry Training, Simulation and Education Conference (I/ITSEC)*, 2011. PDF. Available at: http://ict.usc.edu/pubs/Markerless%20Full%20Body%20Tracking-%20Depth-Sensing%20Technology%20within%20Virtual%20Environments.pdf (accessed 3 May 2025).</ref>
+<ref name="Microsoft2017">Good, O. S. “Kinect Is Officially Dead. Really. Officially. It’s Dead.” *Polygon*, 25 Oct 2017. Available at: https://www.polygon.com/2017/10/25/16543192/kinect-discontinued-microsoft-announcement (accessed 3 May 2025).</ref>
-[[Category:Terms]] [[Category:Technical Terms]]
+[[Category:Terms]]
+[[Category:Technical Terms]]
+[[Category:Tracking]]
+[[Category:Tracking Types]]