Motion capture

Motion capture (often shortened to mocap) is the process of recording the movement of people or objects and translating it into digital data, usually to drive an animated character, a simulation, or an analysis of how something moves. In film, video games, and animation the captured motion of a live performer is mapped onto a three-dimensional model so the model moves the way the performer did. When motion capture records facial expression and full-body performance at the same time it is often called performance capture.^[1]

There are three main families of motion capture technology. Optical systems track the position of markers, or of the body directly, with cameras. Inertial systems use wearable inertial measurement units (IMUs) on the body and need no cameras. Hybrid systems combine the two to offset their individual weaknesses.^[1]^[2] In virtual and augmented reality the same underlying ideas are used to put a user's body into a virtual scene as an avatar, a topic usually discussed as body tracking and positional tracking.

History

The roots of motion capture lie in nineteenth-century photographic studies of movement. In 1878 Eadweard Muybridge published The Horse in Motion, a sequence of photographs taken by a line of cameras triggered as a horse galloped past, which settled a dispute about whether all four hooves leave the ground at once.^[3] In 1882 Etienne-Jules Marey developed chronophotography, recording successive phases of a motion on a single plate; by the 1890s he was dressing subjects in dark suits with shiny buttons at the joints and connecting bands, so that the recorded image isolated the path of the limbs. This use of marked points on the body to make movement easier to measure anticipates modern marker-based capture.^[3]

In the 1910s the industrial engineers Frank and Lillian Gilbreth attached small incandescent bulbs to workers' hands and used long-exposure photography to trace the path of their movements, a technique they called the chronocyclegraph.^[3] A parallel thread came from animation: in 1915 Max Fleischer built the rotoscope, patented in 1917, which projected live-action film frame by frame onto glass so that artists could trace realistic movement into a drawing. Rotoscoping is not motion capture in the digital sense, but it established the practice of deriving animation from recorded human movement.^[3]

Computer-based motion capture emerged from biomechanics and medicine. In 1983 Tom Calvert built an exoskeleton fitted with potentiometers to measure a patient's joint angles and feed them to a computer for animation.^[3] Optical systems that tracked markers with cameras were developed through the 1970s and 1980s and entered film and games in the 1990s.^[4] Two landmarks in cinema show how the field matured. The Polar Express (2004), directed by Robert Zemeckis, was the first feature film made entirely with performance capture, using a Sony Pictures Imageworks system to record an actor's facial and body motion together in a single session.^[5] James Cameron's Avatar (2009), with visual effects by Weta Digital, introduced head-mounted cameras into the process so that facial and body performance could be captured together while actors retained freedom of movement within the capture volume.^[6]

How it works

Optical, marker-based

Marker-based optical capture is generally regarded as the most accurate method and serves as a reference standard for measuring human movement.^[1] A performer wears retroreflective markers, and a ring of synchronized infrared cameras surrounds the capture volume. Each camera measures the two-dimensional image position of every marker it sees; whenever two or more cameras observe the same marker from different angles, the system reconstructs its three-dimensional position by triangulation. Combining the reconstructed markers yields the pose of the whole body frame by frame. This is a form of outside-in tracking: fixed cameras observe markers on a moving subject, the inverse of the inside-out tracking used by standalone headsets.^[4] The cost of this accuracy is that the system needs a calibrated rig, controlled lighting, and a dedicated space, and markers can be lost to occlusion when the body blocks a camera's line of sight.^[1] Markers may also be active, emitting light in a known pattern, rather than passive and retroreflective; the same active/passive distinction appears in VR tracking that uses fiducial markers.
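
The triangulation step can be sketched in Python. This is a minimal illustration under stated assumptions, not the method of any particular vendor: it assumes camera calibration has already converted each 2D marker detection into a ray in world coordinates (the camera centre plus a direction toward the marker), and it finds the point that minimises the summed squared distance to all rays. The function names `triangulate`, `solve3`, and `det3` and the camera positions are hypothetical.

```python
import math

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def solve3(a, b):
    """Solve the 3x3 linear system a x = b with Cramer's rule."""
    d = det3(a)
    if abs(d) < 1e-12:
        raise ValueError("degenerate rays: the marker must be seen from different angles")
    x = []
    for col in range(3):
        replaced = [row[:] for row in a]
        for r in range(3):
            replaced[r][col] = b[r]
        x.append(det3(replaced) / d)
    return x

def triangulate(rays):
    """Least-squares intersection of camera rays.

    rays: list of (origin, direction) pairs in world coordinates, where origin
    is the camera centre and direction points toward the detected marker.
    Returns the point minimising the summed squared perpendicular distance
    to every ray, found from sum(I - d d^T) p = sum((I - d d^T) c).
    """
    a = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for origin, direction in rays:
        norm = math.sqrt(sum(v * v for v in direction))
        d = [v / norm for v in direction]
        for i in range(3):
            for j in range(3):
                # Projector onto the plane perpendicular to this ray
                m = (1.0 if i == j else 0.0) - d[i] * d[j]
                a[i][j] += m
                b[i] += m * origin[j]
    return solve3(a, b)

# Three hypothetical calibrated cameras whose rays all pass through (1, 2, 3)
cameras = [
    ((0.0, 0.0, 0.0), (1.0, 2.0, 3.0)),
    ((5.0, 0.0, 0.0), (-4.0, 2.0, 3.0)),
    ((0.0, 5.0, 5.0), (1.0, -3.0, -2.0)),
]
print(triangulate(cameras))  # approximately [1.0, 2.0, 3.0]
```

With noise-free rays the estimate coincides with the marker up to rounding; with real detections the rays do not meet exactly, and the least-squares point is a compromise between them. A single camera, or cameras looking along parallel lines, leaves the system singular, which is why a marker must be seen from at least two different angles.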

Optical, markerless

Markerless optical capture removes the suit and markers and instead estimates body pose directly from camera images using computer-vision algorithms. Multi-camera markerless setups can reconstruct a performer without any worn hardware, but they are sensitive to occlusion and generally work only under the conditions for which the cameras were set up.^[2] Research has pushed markerless capture toward lighter setups, including egocentric capture from cameras worn on the head and methods that fuse a few body-worn sensors with video.^[7] The vision-based body tracking now built into VR headsets is a consumer application of the markerless approach.

Inertial

Inertial capture places a network of inertial measurement units on the body, each combining accelerometers, gyroscopes, and usually magnetometers, and computes joint orientations from the sensor data through sensor fusion. Because nothing external is required, inertial suits are portable and work outdoors or in cramped spaces where a camera rig cannot go. Commercial inertial systems typically use on the order of 17 to 19 sensors, while research has reduced this to as few as six on the head, wrists, pelvis, and legs.^[1]^[2] The trade-off is that an IMU measures only relative motion, so it cannot determine absolute position in space, and the orientation estimate drifts over time and needs periodic recalibration.^[1]^[2] Hybrid systems address both problems by fusing inertial data with optical input, recovering drift-free global position together with the limb rotations that cameras alone can miss.^[2]
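
The drift problem, and how a second reference corrects it, can be illustrated with a one-axis complementary filter in Python. This is a simplified stand-in chosen for clarity: commercial suits estimate full 3D orientation with more elaborate fusion filters, and the sensor values, gyroscope bias, sample rate, and `alpha` weight below are assumptions. A stationary sensor is simulated with a small constant gyroscope bias; integrating the gyroscope alone lets the angle wander without limit, while blending in the tilt implied by gravity keeps the estimate close to the true value.

```python
import math

def accel_pitch(ax, az):
    """Pitch angle implied by the measured gravity vector.

    Valid only while the sensor is not otherwise accelerating; the axis and
    sign conventions here are an assumption and differ between devices.
    """
    return math.atan2(ax, az)

def complementary_filter(samples, dt, alpha=0.98):
    """Fuse gyroscope rate and accelerometer tilt into one pitch estimate.

    samples: iterable of (gyro_rate_rad_per_s, ax, az)
    alpha:   weight on the integrated gyro; (1 - alpha) pulls the
             estimate back toward the accelerometer's drift-free tilt.
    """
    pitch = None
    for rate, ax, az in samples:
        measured = accel_pitch(ax, az)
        if pitch is None:
            pitch = measured  # initialise from gravity on the first sample
        else:
            pitch = alpha * (pitch + rate * dt) + (1 - alpha) * measured
        yield pitch

# Stationary sensor tilted 0.2 rad: the true rate is zero, so the gyro
# reports only its assumed bias of 0.01 rad/s. 6000 samples at 100 Hz = 60 s.
dt, bias, true_pitch = 0.01, 0.01, 0.2
samples = [(bias, math.sin(true_pitch), math.cos(true_pitch))] * 6000

gyro_only = true_pitch + sum(rate * dt for rate, _, _ in samples)
fused = list(complementary_filter(samples, dt))[-1]
print(f"gyro only: {gyro_only:.3f} rad, fused: {fused:.3f} rad")
# gyro only drifts to about 0.8 rad; fused stays within about 0.005 rad of 0.2
```

Gravity constrains only tilt, so heading drift has to be corrected another way (magnetometers in many suits), and neither reference recovers absolute position, which is the gap hybrid optical-inertial systems fill.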

Facial capture

Facial motion capture records expression separately from, or together with, body motion. Marker-based facial capture tracks dots on the face, while markerless facial capture estimates expression from video. The captured expression is commonly represented as blendshapes, a set of named, artist-controllable facial poses whose weighted mix reproduces the performance.^[8] The same idea drives avatars in video chat and live streaming, where a person's face puppets a virtual character in real time.^[8]
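
The usual linear ("delta") blendshape model can be written in a few lines of Python: each target shape stores a full copy of the face mesh, and the output is the neutral mesh plus the weighted sum of each target's offset from neutral, v = v0 + Σ w_k (b_k − v0). The two-vertex "mesh" and the shape names `smile` and `jawOpen` below are hypothetical placeholders; production rigs use thousands of vertices and many shapes, and some add corrective shapes that this sketch omits.

```python
def blend(neutral, targets, weights):
    """Delta blendshape model: neutral + sum_k w_k * (target_k - neutral).

    neutral: list of (x, y, z) vertex positions for the resting face
    targets: dict of shape name -> vertex list with the same length
    weights: dict of shape name -> weight, usually in the range [0, 1]
    """
    result = [list(v) for v in neutral]
    for name, w in weights.items():
        target = targets[name]
        if len(target) != len(neutral):
            raise ValueError(f"shape {name!r} has a different vertex count")
        for i, (v0, vt) in enumerate(zip(neutral, target)):
            for axis in range(3):
                result[i][axis] += w * (vt[axis] - v0[axis])
    return [tuple(v) for v in result]

# Hypothetical two-vertex mouth: left and right corners
neutral = [(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
targets = {
    "smile":   [(-1.2, 0.3, 0.0), (1.2, 0.3, 0.0)],    # corners move out and up
    "jawOpen": [(-1.0, -0.5, 0.0), (1.0, -0.5, 0.0)],  # corners drop
}
# One frame of captured weights, e.g. from a face tracker
weights = {"smile": 0.5, "jawOpen": 0.2}
face = blend(neutral, targets, weights)
print(face)  # corners at roughly (-1.1, 0.05, 0.0) and (1.1, 0.05, 0.0)
```

Because the model is linear, a tracker only has to output one weight per shape per frame to drive the face.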

Relevance to VR and AR

Consumer VR rarely uses a full mocap rig, but it faces the same problem of estimating body pose, and it borrows hardware and techniques from professional capture. A standard headset and two controllers provide only three tracked points: the head and the two hands. Software fills in the rest of the body from those three points using inverse kinematics (IK), which calculates plausible positions for the arms, spine, and legs consistent with the known points.^[9] On the social platform VRChat, this three-point setup is the baseline, and IK estimates the avatar's pose from the head and hands; its IK 2.0 system also exposes options for which points to lock when the estimate produces an unnatural spine angle.^[9]
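
The simplest building block of such solvers is analytic two-bone IK, which places an elbow (or knee) so that a chain of two fixed-length bones reaches a target. The Python sketch below works in a plane with the shoulder at the origin and uses the law of cosines; it is a textbook technique offered for illustration, not VRChat's implementation, and the function names and bone lengths are hypothetical. A real full-body solver also chooses the plane of the bend (often from a "pole" hint) and estimates the spine and legs, which this sketch does not attempt.

```python
import math

def two_bone_ik(target_x, target_y, upper, lower):
    """Analytic two-bone IK in a plane, with the shoulder at the origin.

    Returns (shoulder_angle, bend) in radians: the upper arm's direction,
    and how far the elbow bends away from straight (0 = fully extended).
    """
    dist = math.hypot(target_x, target_y)
    # Clamp to the reachable range; out-of-reach targets give a straight arm
    dist = min(max(dist, abs(upper - lower) + 1e-9), upper + lower)

    def clamped_acos(c):
        return math.acos(max(-1.0, min(1.0, c)))

    # Law of cosines: interior angle at the elbow between the two bones
    elbow_interior = clamped_acos((upper**2 + lower**2 - dist**2) / (2 * upper * lower))
    # Law of cosines: angle at the shoulder between upper arm and target line
    offset = clamped_acos((upper**2 + dist**2 - lower**2) / (2 * upper * dist))
    # Adding the offset picks one of the two mirror-image elbow solutions
    shoulder = math.atan2(target_y, target_x) + offset
    return shoulder, math.pi - elbow_interior

def forward(shoulder, bend, upper, lower):
    """Forward kinematics: where the hand ends up for the given angles."""
    ex, ey = upper * math.cos(shoulder), upper * math.sin(shoulder)
    hand_angle = shoulder - bend
    return ex + lower * math.cos(hand_angle), ey + lower * math.sin(hand_angle)

# Hypothetical arm: 0.30 m upper arm, 0.27 m forearm, hand target from a controller
shoulder, bend = two_bone_ik(0.36, 0.15, 0.30, 0.27)
print(forward(shoulder, bend, 0.30, 0.27))  # approximately (0.36, 0.15)
```

Running forward kinematics on the solved angles, as in the last line, is a quick check that they actually put the hand on the target.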

Adding more tracked points moves a VR avatar from estimated to measured body motion, which is what full-body tracking (FBT) does. Extra trackers placed on the hips, feet, and sometimes the knees, chest, and elbows feed real movement to the avatar instead of an IK guess.^[10] The trackers themselves come in the same three technology families as professional mocap: lighthouse-based optical trackers such as the HTC Vive Tracker and the Tundra Tracker, which need external base stations; IMU-based trackers such as SlimeVR; and self-contained camera-based trackers such as the HTC Vive Ultimate Tracker.^[10] The Vive Tracker is marketed for both VRChat full-body tracking and motion capture, illustrating how closely the two uses overlap.^[11]

Standalone headsets have begun to capture body pose with their own cameras, applying markerless techniques without any worn trackers. Meta's Inside-Out Body Tracking (IOBT), released for public use in December 2023 in the v60 software update and exclusive to the Meta Quest 3, uses the headset's cameras to track the wrists, elbows, and torso rather than estimating them with IK.^[12] Meta pairs it with Generative Legs, which works across all Quest devices and uses an AI model to infer leg movement from the upper body without extra hardware, although it is an estimator that detects actions such as jumping and crouching rather than tracking the legs directly.^[12]^[13] Facial capture has likewise moved onto consumer hardware, where headset or phone cameras drive a face-tracked avatar in real time.^[8]

Beyond avatars, professional optical mocap is used inside VR and AR production. Systems that surround a volume with infrared cameras can track multiple objects with low latency, and the same rigs are used to track props, devices, and people for research and for virtual cinematography, including the kind of camera and body tracking used to make Avatar.^[14]^[6]

Major systems

A handful of companies dominate professional motion capture, split along the optical and inertial lines.

System / company	Technology	Notes
Vicon	Optical, marker-based	Founded around a 1984 management buyout of Oxford Dynamics, with roots in an Oxford Instruments product first sold in 1979; widely used in film, games, biomechanics, and VR research.^[15]
OptiTrack (NaturalPoint)	Optical, marker-based	Infrared camera systems with sub-millimeter precision used in film visual effects, game studios, and VR/AR tracking.^[14]^[1]
Xsens	Inertial (IMU)	Founded in 2000 by Casper Peeters and Per Slycke; the Xsens MVN suits are widely used in game development and film. The brand traded under the Movella name after a 2021 rebrand, then returned to the Xsens name in 2026.^[16]^[17]
Rokoko	Inertial (IMU)	The Smartsuit Pro is a lower-cost inertial suit aimed at independent creators.^[18]
Perception Neuron (Noitom)	Inertial (IMU)	A portable IMU suit used for animation and VR body tracking.^[1]

Captured motion is consumed by game engines and animation tools such as Unity and Unreal Engine, which retarget the recorded skeleton onto a character rig.^[1]
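
A very simplified form of retargeting can be sketched in Python: rename each captured bone to the matching bone on the target rig, copy its local rotation, and scale the root (hip) translation by the ratio of the two characters' hip heights, a common heuristic so that stride length suits the new body. The bone names, frame layout, and the assumption that both skeletons share the same rest pose and axis conventions are all hypothetical; the retargeting tools in engines such as Unity and Unreal Engine do considerably more, for example reconciling different rest poses and proportions, and their APIs are not shown here.

```python
def retarget_frame(frame, bone_map, source_hip_height, target_hip_height):
    """Map one frame of captured motion onto a differently sized rig.

    frame:    {"root_position": (x, y, z), "rotations": {bone: (w, x, y, z)}}
    bone_map: source bone name -> target bone name
    Assumes both skeletons share a rest pose and axis conventions, so local
    joint rotations can be copied across unchanged.
    """
    scale = target_hip_height / source_hip_height
    rotations = {
        bone_map[name]: quat
        for name, quat in frame["rotations"].items()
        if name in bone_map  # bones the target rig lacks are dropped
    }
    x, y, z = frame["root_position"]
    return {"root_position": (x * scale, y * scale, z * scale),
            "rotations": rotations}

# Hypothetical captured frame from a performer with 1.0 m hip height
frame = {
    "root_position": (0.2, 1.0, 0.4),
    "rotations": {
        "Hips": (1.0, 0.0, 0.0, 0.0),             # identity quaternion
        "LeftUpLeg": (0.7071, 0.7071, 0.0, 0.0),  # about 90 degrees about x
        "Tail": (1.0, 0.0, 0.0, 0.0),             # no counterpart on the target rig
    },
}
bone_map = {"Hips": "pelvis", "LeftUpLeg": "thigh_l"}
# Retarget onto a smaller character whose hips are 0.5 m off the ground
print(retarget_frame(frame, bone_map, 1.0, 0.5))
```

Joint rotations are angles and transfer without scaling; only the root translation depends on body size, which is why it is the part that gets rescaled here.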

Current status

As of 2026 the professional market remains split between high-precision optical systems for studios and portable inertial suits for smaller productions and on-location work, with hybrid optical-inertial methods and AI-driven markerless capture the main directions of active research.^[2]^[1] One market estimate put the wearable 3D motion capture systems segment at about USD 558 million in 2025.^[19] In consumer VR the trend is toward capturing more of the body with the headset's own cameras and AI estimation, reducing the need for external trackers while still falling short of the accuracy of a full studio rig.^[12]^[13]

References

[1] "Motion Capture Technology: Types & Applications in 2025". https://mocaponline.com/blogs/mocap-news/motion-capture-technology-guide.

[2] arXiv preprint.

[3] "Art imitates life: The surprising origins of motion capture". https://www.scienceandmediamuseum.org.uk/objects-and-stories/surprising-origins-motion-capture.

[4] "Optical Motion Capture - an overview". https://www.sciencedirect.com/topics/computer-science/optical-motion-capture.

[5] "The Polar Express by Zemeckis". https://history.siggraph.org/animation-video-pod/the-polar-express-by-zemeckis/.

[6] "A visual history of performance capture at Weta Digital". 2019-08-21. https://beforesandafters.com/2019/08/21/a-visual-history-of-performance-capture-at-weta-digital/.

[7] arXiv preprint.

[8] arXiv preprint.

[9] "IK 2.0 Features and Options". https://docs.vrchat.com/docs/ik-20-features-and-options.

[10] "Full-Body Tracking". https://wiki.vrchat.com/wiki/Full-Body_Tracking.

[11] "VIVE Tracker (3.0)". https://www.vive.com/us/accessory/tracker3/.

[12] "Create More Natural Movements Using Inside-Out Body Tracking and Generative Legs". 2023-12-20. https://developers.meta.com/horizon/blog/inside-out-body-tracking-and-generative-legs/.

[13] "New Quest Dev Tools to Add Leg Estimation for More Convincing Avatars". https://www.roadtovr.com/quest-3-dev-tools-body-tracking-avatars/.

[14] "Motion Capture Systems". https://www.optitrack.com/.

[15] "About Us - The Vicon Difference". https://www.vicon.com/about-us/.

[16] "Xsens". https://en.wikipedia.org/wiki/Xsens.

[17] "Movella". https://www.xsens.com/movella.

[18] "Xsens vs Rokoko: Honest Motion Capture Comparison". https://www.rokoko.com/insights/xsens-vs-rokoko-honest-motion-capture-comparison-for-creators.

[19] "Wearable 3D Motion Capture Systems Market Outlook 2026-2034". https://www.intelmarketresearch.com/wearable-d-motion-capture-systems-market-27638.
