Tracking
- See also: Positional tracking
Tracking lets a VR or AR system know where the user's head (HMD), hands, and other body parts (Input Devices) are and how they are moving. Accurate tracking is needed to render the virtual world so that it matches the user's physical position and movements. When tracking is accurate and low-latency, the virtual scene stays locked to the real world as the user moves; when it lags or drifts, the mismatch between what the user feels and sees is a common cause of motion sickness.[1]
Tracking can either be 3DOF (rotation only) or 6DOF (position and rotation). A system's degrees of freedom describe how many independent ways a tracked object can move. The three rotational axes are pitch, yaw, and roll, and the three translational axes are forward/back, up/down, and left/right.[2] A 3DOF system knows only how the user is oriented, so the wearer can look around but cannot lean, crouch, or walk through the scene. A 6DOF system tracks orientation and position together, so the user can physically move and have that movement reflected in the virtual world.[2][3]
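As a rough illustration of the difference, the sketch below models the two kinds of pose as plain data records. The class names, field names, axis labels, and the `apply_motion` helper are all hypothetical and chosen only for this example; real runtimes usually store orientation as a quaternion rather than three angles.

```python
from dataclasses import dataclass


@dataclass
class Pose3DOF:
    """Orientation only, in degrees (illustrative representation)."""
    pitch: float = 0.0
    yaw: float = 0.0
    roll: float = 0.0


@dataclass
class Pose6DOF(Pose3DOF):
    """Orientation plus position, in meters (axis naming is an assumption)."""
    x: float = 0.0  # left/right
    y: float = 0.0  # up/down
    z: float = 0.0  # forward/back


def apply_motion(pose, d_yaw=0.0, d_x=0.0):
    """Apply a head turn and a sideways lean to a pose.

    A 3DOF pose has nowhere to store translation, so the lean is dropped.
    """
    pose.yaw += d_yaw
    if isinstance(pose, Pose6DOF):
        pose.x += d_x
    return pose


# The same physical motion: turn 30 degrees and lean 20 cm to the side.
head_3dof = apply_motion(Pose3DOF(), d_yaw=30.0, d_x=0.2)
head_6dof = apply_motion(Pose6DOF(), d_yaw=30.0, d_x=0.2)
```

Both poses record the turn, but only the 6DOF pose records the lean, which is why a 3DOF headset cannot reflect leaning or walking.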
Early VR tracking in the 1990s relied on magnetic tracking systems that used either AC or DC magnetic fields, depending on the provider. Commercial electromagnetic tracking dates back further still, to Polhemus and its Space-Tracker work in 1969, followed by the FASTRAK line in the 1980s and Ascension's pulsed-DC "Flock of Birds" in 1991.[4]
The Magic Leap 1 uses 6DOF magnetic tracking for its controller, though the tracking quality is poor.
Rotational tracking
Rotational tracking follows an object's movement in all 3 rotational directions: pitch, yaw, and roll. It is usually performed by an IMU, which combines sensors such as accelerometers, gyroscopes, and magnetometers.
These sensors play complementary roles. The gyroscope measures angular velocity and gives excellent short-term rotation data, but its readings accumulate small errors over time, causing the orientation estimate to slowly drift. The accelerometer measures linear acceleration and senses the constant pull of gravity, which provides an absolute reference for tilt, though during fast motion it cannot separate movement from gravity. The magnetometer measures the surrounding magnetic field and acts as a compass, giving a reference for heading relative to magnetic north.[5]
To turn these noisy individual signals into one stable orientation, headsets use sensor fusion algorithms such as Kalman filters or complementary filters. These combine high-frequency gyroscope data with the accelerometer's gravity reference and the magnetometer's heading reference to continuously correct drift.[5] Rotational tracking is cheap and very fast: the IMU in a modern headset typically updates between 500 and 1000 times per second, much faster than camera-based tracking alone can manage, so the IMU is what keeps latency low.[5] On its own, however, an IMU cannot reliably determine absolute position, because integrating acceleration twice to estimate position makes small errors grow quickly.[4]
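The fusion idea can be sketched with a single-axis complementary filter. This is a minimal illustration, not how any particular headset implements it: the blend weight `alpha`, the gyroscope bias, the 1000 Hz sample rate, and the convention of deriving pitch from `atan2(ax, az)` are all assumptions made for the example.

```python
import math


def complementary_filter(gyro_rates, accel_samples, dt, alpha=0.98):
    """Estimate pitch (radians) from gyro rates (rad/s) and (ax, az) samples.

    Each step trusts the integrated gyro for fast changes (weight alpha)
    and pulls gently toward the accelerometer's gravity-based tilt.
    """
    pitch = 0.0
    history = []
    for rate, (ax, az) in zip(gyro_rates, accel_samples):
        gyro_pitch = pitch + rate * dt    # integrate angular velocity
        accel_pitch = math.atan2(ax, az)  # tilt implied by gravity
        pitch = alpha * gyro_pitch + (1.0 - alpha) * accel_pitch
        history.append(pitch)
    return history


# A stationary headset sampled at 1000 Hz for 10 seconds.
dt = 0.001
n = 10_000
bias = 0.01                  # assumed constant gyro bias, rad/s
gyro = [bias] * n            # true rotation rate is zero
accel = [(0.0, 9.81)] * n    # gravity along z, so true pitch is zero

gyro_only = sum(rate * dt for rate in gyro)           # drifts to about 0.1 rad
fused = complementary_filter(gyro, accel, dt)[-1]     # stays near zero
```

With these numbers, integrating the gyroscope alone drifts by roughly 0.1 rad (about 5.7 degrees) in ten seconds, while the fused estimate settles below a thousandth of a radian because the gravity reference keeps cancelling the accumulated error.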
Positional tracking
Positional tracking follows an object's movement in all 3 translational directions: forward/back, up/down, and left/right. It is usually more difficult than rotational tracking and is accomplished through different types of tracking and tracking systems.
A key distinction in positional tracking is where the sensors live. Outside-in tracking places cameras or laser emitters in the room and tracks markers or sensors on the headset and controllers. This approach can offer high precision and low latency, but setup is more involved, the play area is bounded by the external hardware, and tracking can break if the user blocks the sensors' line of sight.[6] Inside-out tracking flips this around: the cameras sit on the headset and look outward at the environment. It is more portable and far easier to set up since there is no external hardware, but it leans on heavy on-device computation and can struggle in poor lighting, in rooms with blank walls, or where textures repeat.[6][7]
Tracking technologies
Several distinct technologies are used to recover position and orientation, and most modern headsets combine more than one.
Optical tracking uses cameras and infrared light. In marker-based optical tracking the tracked object carries known reference points, such as visible patterns or infrared LEDs, often blinking in sync with the camera. Markerless optical tracking instead finds and follows natural features in the scene.[4] Laser tracking, as used by Valve's Lighthouse, sweeps infrared laser planes across the room and times when they hit photosensors on the headset and controllers (see below).
Inertial tracking relies on the IMU described above and is fast but drift-prone.
Magnetic tracking uses a base station that generates electromagnetic fields picked up by coils on the tracked object; because each frame is solved independently it suffers no cumulative drift and latencies are only a few milliseconds, but it works poorly near metal and conductive objects, degrades with distance, and is limited to roughly a 5 meter area.[4] The Razer Hydra controller from 2011, built by Sixense, used electromagnetic tracking and offered roughly 1 mm and 1 degree precision near its base station.[4]
Ultrasonic or acoustic tracking places multiple speakers and receivers in the environment and calculates position from the time of flight of timed sound bursts, working like echolocation; the sensors are small and cheap but range is short, line of sight is required, and ambient noise can interfere.[4]
Radio methods such as Ultra Wideband triangulate a tag's position from fixed anchors and can reach around 5 mm accuracy at 200 Hz when fused with other sensors.[4]
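The time-of-flight ranging used by acoustic tracking, and the anchor-based positioning used by radio methods, can be sketched as follows. This is a simplified 2D example under stated assumptions: a fixed speed of sound of 343 m/s, exactly three anchors, and noise-free measurements. The function names are hypothetical, and a real system would work in 3D and use a least-squares or filtered solution.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C (assumed constant)


def tof_to_distance(seconds):
    """Convert a one-way time of flight into a distance in meters."""
    return SPEED_OF_SOUND * seconds


def trilaterate_2d(anchors, distances):
    """Locate a point from three (x, y) anchors and the distance to each.

    Subtracting the first circle equation from the other two cancels the
    squared unknowns and leaves a 2x2 linear system, solved by Cramer's rule.
    """
    (x1, y1), (x2, y2), (x3, y3) = anchors
    d1, d2, d3 = distances
    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = d1**2 - d2**2 + x2**2 - x1**2 + y2**2 - y1**2
    b2 = d1**2 - d3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("anchors are collinear; position is ambiguous")
    return (b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det


# Simulate a tag at (1.0, 2.0) heard by three receivers in the room.
anchors = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
true_pos = (1.0, 2.0)
times = [math.hypot(true_pos[0] - ax, true_pos[1] - ay) / SPEED_OF_SOUND
         for ax, ay in anchors]
estimate = trilaterate_2d(anchors, [tof_to_distance(t) for t in times])
```

Because the distances here are exact, the estimate matches the true position; with real measurements, timing noise and changes in the speed of sound shift each range, which is one reason such systems are usually fused with other sensors.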
Visual-inertial odometry (VIO) is the workhorse behind most modern inside-out headsets. It estimates full 6DOF pose by fusing one or more cameras with one or more IMUs, detecting and tracking visual features frame to frame while the IMU fills in fast motion.[1] SLAM (simultaneous localization and mapping) extends this idea by building a map of the unknown environment at the same time as it locates the device within that map, recognizing landmarks to reduce error.[1] A related technique, loop closure, recognizes places the device has already visited and corrects accumulated drift; the terms odometry and SLAM are sometimes used to distinguish systems without and with loop closure.[1] Markerless tracking depends on these methods, finding natural landmarks in the camera feed rather than relying on placed markers. By contrast, fiducial markers are images or patterns deliberately placed in the environment so the camera can recover its pose; toolkits such as ARToolKit recognize these markers and compute the camera position and orientation relative to them, an approach long used in AR.[8]
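The effect of loop closure on accumulated drift can be shown with a toy 2D example. This is a deliberately simplified sketch: drift is modeled as a constant per-step bias, the revisit is assumed to be recognized perfectly, and the correction spreads the end-point error linearly along the path. Production SLAM systems typically solve a pose-graph optimization instead; the function names here are hypothetical.

```python
def integrate(steps, bias=(0.0, 0.0)):
    """Dead-reckon a 2D path by summing per-frame displacement estimates.

    `bias` is a constant per-step error standing in for odometry drift.
    """
    x, y = 0.0, 0.0
    path = [(x, y)]
    for dx, dy in steps:
        x += dx + bias[0]
        y += dy + bias[1]
        path.append((x, y))
    return path


def close_loop(path):
    """Correct a path whose last pose is known to revisit its first pose.

    The gap between the end and the start is the accumulated drift; it is
    distributed back along the trajectory in proportion to step index.
    """
    err_x = path[-1][0] - path[0][0]
    err_y = path[-1][1] - path[0][1]
    n = len(path) - 1
    return [(x - err_x * i / n, y - err_y * i / n)
            for i, (x, y) in enumerate(path)]


# Walk a 1 m square back to the start; every step drifts by 1 cm per axis.
square = [(1, 0), (0, 1), (-1, 0), (0, -1)]
drifted = integrate(square, bias=(0.01, 0.01))
corrected = close_loop(drifted)
```

Without loop closure the path ends about 4 cm away from where it began on each axis; after the correction the end point coincides with the start, and the intermediate poses are pulled back toward the true square.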
Tracking systems
Lighthouse - laser-based system developed by Valve for SteamVR. Each base station contains infrared LEDs plus two rotating infrared laser emitters on orthogonal axes. The station flashes its LEDs as a sync pulse, then sweeps one laser across the room and then the other; photosensors on the headset and controllers record exactly when each laser reaches them, and because the position of every sensor on the device is known, those timings are used to compute the device's pose. The approach is computationally light and has been measured to track within roughly 10 mm at about 2 meters from the base stations.[9]
Constellation - optical-based system developed by Oculus VR for Oculus Rift (Platform). It is an outside-in system: precisely positioned infrared LEDs are embedded through the front, sides, and back of the headset and into the Oculus Touch controllers, blinking in a set pattern, while external Oculus Sensors, each an infrared camera behind a filter that blocks visible light, watch the LEDs to recover full 6DOF position and orientation.[10]
WorldSense - markerless inside-out tracking system developed by Google.
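The timing step in the Lighthouse description above can be sketched as a conversion from hit time to sweep angle, and from two sweep angles to a direction. Several details are assumptions made for illustration: a rotor speed of 60 revolutions per second, angles measured from the base station's forward axis, and the function names themselves. The sketch yields only one ray per photosensor; recovering a full pose requires combining many sensors whose layout on the device is known.

```python
import math

ROTOR_HZ = 60.0                # assumed rotor speed, revolutions per second
SWEEP_PERIOD = 1.0 / ROTOR_HZ  # seconds per full rotation


def sweep_angle(t_sync, t_hit):
    """Angle (radians) the rotor turned between the sync flash and the hit."""
    return 2.0 * math.pi * (t_hit - t_sync) / SWEEP_PERIOD


def ray_from_angles(horizontal, vertical):
    """Unit direction from the base station toward a photosensor.

    Both angles are taken relative to the station's forward (z) axis.
    The horizontal sweep constrains x = z * tan(horizontal) and the
    vertical sweep constrains y = z * tan(vertical).
    """
    x = math.tan(horizontal)
    y = math.tan(vertical)
    norm = math.sqrt(x * x + y * y + 1.0)
    return (x / norm, y / norm, 1.0 / norm)


# A sensor hit a quarter rotation after the sync pulse.
angle = sweep_angle(t_sync=0.0, t_hit=1.0 / 240.0)

# Assume the sweep crosses the forward axis a quarter turn after sync,
# so subtracting pi/2 centers the angle on that axis.
centered = angle - math.pi / 2.0
direction = ray_from_angles(centered, 0.0)
```

Because each angle comes from a single timestamp and one multiplication, very little computation is needed per sensor, which matches the description of the approach as computationally light.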
Tracking subtypes
The same underlying technologies are applied to track different parts of the user.
- Head and HMD tracking is the most fundamental: the headset's own position and orientation drive the rendered viewpoint, so it must be both accurate and low-latency to keep the scene stable.
- Controller tracking follows the hand-held Input Devices. The Rift's Oculus Touch controllers carry their own infrared LEDs for Constellation, while Lighthouse controllers carry photosensors.[10]
- Hand tracking uses the headset cameras and computer vision to follow the user's bare hands and fingers, removing the need to hold a controller. It is offered on headsets such as the VIVE Focus Vision.[11]
- Eye tracking follows where the user is looking. Its best-known use is foveated rendering, which renders full detail only where the eyes are pointed and reduces detail in peripheral vision to save GPU load; it also enables gaze-based input and automatic IPD adjustment. Headsets with onboard eye tracking include the Vive Pro Eye (2019), Meta Quest Pro (2022), PlayStation VR2 (2023), and Apple Vision Pro (2024).[11][12]
- Face tracking captures movements of the lips, jaw, and cheeks to drive an avatar's expressions; some headsets add this through a separate facial tracker that captures motion at around 60 Hz.[11]
- Full-body tracking adds tracked points on the waist and limbs so the user's whole body can be represented. With HTC Vive Tracker hardware, HTC recommends three or more trackers for a full-body setup, commonly one on the waist and one on each foot, and up to nine Vive Tracker 3.0 units can run in a single play area alongside two controllers.[13]
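The foveated rendering idea mentioned under eye tracking can be sketched as a lookup from angular distance to a resolution scale. The 10 and 30 degree thresholds, the three scale levels, and the function names are hypothetical values picked for illustration and are not taken from any headset or graphics API.

```python
import math


def angle_between(gaze, direction):
    """Angle in degrees between the gaze vector and a view direction."""
    dot = sum(g * d for g, d in zip(gaze, direction))
    norm = (math.sqrt(sum(g * g for g in gaze))
            * math.sqrt(sum(d * d for d in direction)))
    # Clamp to guard against rounding pushing the cosine outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))


def shading_scale(eccentricity_deg, inner=10.0, outer=30.0):
    """Resolution scale for a screen region given its distance from gaze.

    Full detail inside `inner` degrees, half between `inner` and `outer`,
    and a quarter in the far periphery (all thresholds are illustrative).
    """
    if eccentricity_deg <= inner:
        return 1.0
    if eccentricity_deg <= outer:
        return 0.5
    return 0.25


gaze = (0.0, 0.0, 1.0)  # user looking straight ahead
center = shading_scale(angle_between(gaze, (0.0, 0.0, 1.0)))
near = shading_scale(angle_between(gaze, (0.2, 0.0, 1.0)))
periphery = shading_scale(angle_between(gaze, (1.0, 0.0, 1.0)))
```

Regions far from the gaze direction are rendered at a fraction of full resolution, which is where the GPU savings come from; the scheme depends on the eye tracker reporting gaze quickly enough that the full-detail region keeps up with eye movement.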
Comparison of tracking systems
- See also: Comparison of tracking systems
There are several consumer-level tracking systems currently available. Originally, these were used for interaction with regular non-VR video games, but more recent tracking systems have been used for VR systems.
| Brand & Model | Tracking system | Inside-out | Outside-in | Marker-based | Marker light frequency | IMU | Spatial resolution (mm) | Latency (ms) |
|---|---|---|---|---|---|---|---|---|
| Facebook/Oculus Rift | Constellation | No | Yes | Yes | Infrared | Yes | ? | ? |
| IndoTraq | HSVT | Yes | No | No | Infrared | Yes | 0.3 | 10 |
| HTC Vive/SteamVR | Lighthouse | Yes | No | Yes | Infrared | Yes | 0.3 | 15 |
| Microsoft HoloLens | ?? | Yes | No | No | Infrared | Yes | ? | ? |
| Nintendo Wii Remote | ?? | Yes | No | Yes | Infrared | Yes | ? | ? |
| Sony PSVR | ?? | No | Yes | Yes | Red/Green/Blue | Yes | ? | 18 |
| Google | WorldSense | Yes | No | No | ? | Yes | ? | ? |
References
- [1] "Visual and Inertial Odometry". https://www.ifi.uzh.ch/en/rpg/research/research_vo.html.
- [2] "Degrees of Freedom (DoF): 3-DoF vs 6-DoF for VR Headset Selection". https://virtualspeech.com/blog/degrees-of-freedom-vr.
- [3] "What is 6DoF". https://www.classvr.com/resource-hub/what-is-6dof/.
- [4] "VR positional tracking". https://en.wikipedia.org/wiki/VR_positional_tracking.
- [5] "How IMU Data Works: From Sensors to Real-World Uses". https://scienceinsights.org/how-imu-data-works-from-sensors-to-real-world-uses/.
- [6] "Pose Tracking Methods: Outside-in VS Inside-out Tracking in VR". https://pimax.com/blogs/blogs/pose-tracking-methods-outside-in-vs-inside-out-tracking-in-vr.
- [7] "What types of tracking systems are used in VR (e.g., inside-out vs. outside-in)?". https://milvus.io/ai-quick-reference/what-types-of-tracking-systems-are-used-in-vr-eg-insideout-vs-outsidein.
- [8] "Robust Tracking Through the Design of High Quality Fiducial Markers: An Optimization Tool for ARToolKit". https://ieeexplore.ieee.org/document/8287815/.
- [9] "Analysis of Valve's 'Lighthouse' Tracking System Reveals Accuracy". https://roadtovr.com/analysis-of-valves-lighthouse-tracking-system-reveals-accuracy/.
- [10] "Oculus Rift CV1". https://en.wikipedia.org/wiki/Oculus_Rift_CV1.
- [11] "What Is Eye Tracking in VR, and Which Headsets Have It?". https://blog.vive.com/us/vr-eye-tracking-what-is-it-which-vr-headsets-have-it/.
- [12] "Foveated rendering". https://en.wikipedia.org/wiki/Foveated_rendering.
- [13] "VIVE Tracker (3.0)". https://www.vive.com/us/accessory/tracker3/.