Markerless outside-in tracking
* '''Sensing layer''' – One or more fixed [[RGB-D]] or [[infrared]] depth cameras acquire per-frame point clouds. Commodity devices such as the Microsoft Kinect project a [[structured light]] pattern or use [[time-of-flight]] methods to compute depth maps.<ref name="Zhang2012" />
* '''Segmentation''' – Foreground extraction or person segmentation isolates user pixels from the static background.
* '''Per-pixel body-part classification''' – A machine-learning model labels each pixel as “head”, “hand”, “torso”, and so on (for example, the randomised decision forest used in the original Kinect).<ref name="Shotton2011" />
* '''Skeletal reconstruction and filtering''' – The system fits a kinematic skeleton to the classified pixels and applies temporal filtering to reduce jitter, producing smooth head- and hand-pose data that can drive VR/AR applications.
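Two of the stages above lend themselves to a short illustration: background-difference segmentation and temporal filtering of joint positions. The sketch below is a minimal, simplified version of these ideas, not the actual Kinect pipeline; the function names, the fixed depth tolerance, and the exponential moving-average filter (a stand-in for the more elaborate filters production systems use, such as a Kalman or one-euro filter) are all assumptions for illustration.

```python
import numpy as np

def segment_foreground(depth, background, tol=50):
    """Mark pixels whose depth differs from a static background model.

    depth, background: 2-D arrays of per-pixel depth in millimetres.
    tol: illustrative tolerance; real systems tune this per sensor.
    Returns a boolean mask of foreground (user) pixels.
    """
    diff = depth.astype(np.int32) - background.astype(np.int32)
    return np.abs(diff) > tol

class JointFilter:
    """Exponential moving average over a joint's 3-D position,
    smoothing frame-to-frame jitter at the cost of a little lag."""

    def __init__(self, alpha=0.4):
        self.alpha = alpha   # higher alpha = more responsive, less smooth
        self.state = None

    def update(self, joint_xyz):
        joint_xyz = np.asarray(joint_xyz, dtype=float)
        if self.state is None:
            self.state = joint_xyz          # initialise on first frame
        else:
            self.state = self.alpha * joint_xyz + (1 - self.alpha) * self.state
        return self.state
```

For example, a hand joint that jumps between noisy estimates on successive frames is pulled toward a running average, which is what lets the reconstructed pose drive a VR viewpoint without visible shake.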


Although a single camera can suffice, multi-camera rigs extend coverage and mitigate occlusion problems. Open-source and proprietary middleware (for example, [[OpenNI]]/NITE and the [[Microsoft Kinect]] SDK) exposes joint-stream APIs for developers.<ref name="OpenNI2013" />
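A joint-stream API of this kind typically delivers, per frame, a set of named joints with positions and confidence values, and applications filter out low-confidence joints before using them. The sketch below mimics that shape with an entirely hypothetical stream; the `Joint` type, the simulated frames, and the `reliable_joints` helper are invented here and do not correspond to the actual OpenNI or Kinect SDK types.

```python
from dataclasses import dataclass

@dataclass
class Joint:
    """Hypothetical per-joint record, loosely modelled on what
    skeletal-tracking middleware exposes per frame."""
    name: str
    x: float
    y: float
    z: float
    confidence: float  # tracker's confidence in this estimate, 0..1

def joint_stream():
    """Stand-in for a middleware joint stream: yields one list of
    joints per frame (here, two hard-coded frames for illustration)."""
    frames = [
        [Joint("head", 0.00, 1.60, 2.0, 0.90),
         Joint("hand_right", 0.30, 1.00, 1.8, 0.40)],   # occluded hand
        [Joint("head", 0.01, 1.61, 2.0, 0.95),
         Joint("hand_right", 0.31, 1.02, 1.8, 0.80)],
    ]
    yield from frames

def reliable_joints(frame, min_conf=0.5):
    """Drop low-confidence joints before handing poses to the app."""
    return [j for j in frame if j.confidence >= min_conf]
```

The confidence threshold is the typical way an application copes with partial occlusion: a briefly hidden hand simply drops out of the pose for a frame or two rather than snapping to a bad estimate.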


==Markerless vs. marker-based tracking==