Digital human
Digital human is a term for a computer-generated, three-dimensional representation of a person, designed to look and behave like a real human and used as a character, presenter, or interactive avatar. The term covers a range of fidelity, from stylized characters to photorealistic faces, and a range of control methods, from pre-scripted animation to performance capture to real-time generative artificial intelligence. The terms virtual human and digital human are used interchangeably in research and industry, and Epic Games markets its own toolset under the name MetaHuman.[1][2]
Digital humans are relevant to virtual and augmented reality because they supply the embodied figures that populate social spaces, the on-screen agents used for telepresence and customer service, and the avatars that represent users themselves. They draw on several established fields: human figure and facial animation, photogrammetry and 3D scanning, real-time rendering, motion capture, and, more recently, large language models and audio-driven facial animation.[1][3][4]
Terminology and scope
There is no single agreed definition. In the academic literature the established term is virtual human, described in a 2002 survey by Jonathan Gratch and colleagues as software characters that "look like, act like and interact with" people, combining human figure animation, face-to-face conversation, and models of emotion and personality.[3] In commercial use, "digital human" is often applied more narrowly to high-fidelity, photoreal figures intended for real-time engines, film, or branded virtual presenters. Wikipedia's article on the subject treats virtual human, digital human, and metahuman as synonyms and lists their uses across simulation, video games, film production, ergonomic studies, telecommunications avatars, and medicine.[1]
A digital human is distinct from a plain text or voice chatbot in that it has a rendered body and face, and it is distinct from a generic avatar mainly by degree: digital humans tend toward realism and, in interactive systems, toward perceived intelligence and emotional expression. The same underlying assets can serve either as a user-controlled avatar or as an autonomous, AI-driven agent.[1][4]
History
Computer models of the human figure predate real-time graphics by decades. William Fetter, working at Boeing in the early 1960s, produced a computer-drawn human figure for cockpit ergonomics that became known as the "Boeing Man," and through the 1960s and 1970s aerospace and automotive firms built articulated mannequin models such as Boeman and Chrysler's Cyberman for fit and reach studies. Frederic Parke pioneered parametric facial animation at the University of Utah in the early 1970s.[1]
Through the 1980s and 1990s the field moved from static ergonomic models toward animated and then real-time, interactive characters, drawing on motion capture, behavioral animation, and crowd simulation, with Nadia Magnenat-Thalmann and Daniel Thalmann among the researchers who produced early synthetic film actors.[1] A separate research thread merged virtual reality with artificial intelligence to create embodied conversational agents, virtual figures that hold face-to-face dialogue using speech, gaze, and gesture. Gratch and co-authors' 2002 IEEE Intelligent Systems paper "Creating Interactive Virtual Humans: Some Assembly Required" surveyed this work across face-to-face conversation, emotion and personality, and figure animation, and groups such as the University of Southern California Institute for Creative Technologies built interactive virtual humans for training and simulation.[3]
The term entered wide mainstream use in the 2020s as real-time engines made photoreal faces accessible and as generative AI made digital humans conversational without hand-scripted dialogue.[2][4]
How digital humans are made
Building a realistic digital human involves several stages that can be combined or substituted depending on the target fidelity:
- Geometry and appearance. The head and body mesh, skin texture, hair, and clothing are either modeled by hand, derived from photogrammetry and 3D scans of a real person, or generated from a template. Skin is rendered with physically based shading to approximate subsurface scattering.[1][2]
- Rigging. A skeletal and facial control rig lets the geometry move; facial rigs are built around expression controls so that captured or generated performances drive the same set of parameters across different characters.[2][5]
- Animation and performance capture. Motion is supplied by keyframe animation, body and facial tracking, or audio-driven systems. Facial performance can be captured with a head-mounted camera rig or, more cheaply, with a consumer smartphone, then solved onto the rig.[5][6]
- Real-time rendering. For interactive and VR/AR use the character is rendered every frame inside a game engine, which constrains polygon counts, shading, and animation to a real-time budget.[2]
- Interactivity (optional). Autonomous digital humans add speech recognition, a language model for responses, text-to-speech, and audio-to-animation so the figure can converse and react.[4][6]
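The rigging stage above can be illustrated with a linear blendshape model, one common way to build a facial rig; the sources cited here do not say which rig type any particular toolset uses, so this is a minimal sketch under that assumption. Every name in it (`Character`, `pose`, the `smile` and `jaw_open` controls, and the two-vertex toy meshes) is hypothetical. It shows only the idea that one set of expression weights can drive different characters.

```python
from dataclasses import dataclass

Vec3 = tuple[float, float, float]


@dataclass
class Character:
    """A toy face rig: a neutral mesh plus one per-vertex offset list per control."""
    neutral: list[Vec3]
    deltas: dict[str, list[Vec3]]  # control name -> per-vertex displacement

    def pose(self, weights: dict[str, float]) -> list[Vec3]:
        """Return neutral + sum(weight * delta) for every vertex."""
        posed = [list(v) for v in self.neutral]
        for name, weight in weights.items():
            w = min(1.0, max(0.0, weight))  # clamp each control to [0, 1]
            for vertex, delta in zip(posed, self.deltas[name]):
                for axis in range(3):
                    vertex[axis] += w * delta[axis]
        return [tuple(v) for v in posed]


# Two hypothetical characters with different proportions but the same controls.
alice = Character(
    neutral=[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    deltas={"smile": [(0.0, 0.5, 0.0), (0.0, 0.5, 0.0)],
            "jaw_open": [(0.0, -1.0, 0.0), (0.0, 0.0, 0.0)]},
)
bob = Character(
    neutral=[(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    deltas={"smile": [(0.0, 1.0, 0.0), (0.0, 1.0, 0.0)],
            "jaw_open": [(0.0, -2.0, 0.0), (0.0, 0.0, 0.0)]},
)

# One captured or generated performance frame, expressed only as control weights.
performance = {"smile": 0.5, "jaw_open": 0.5}
alice_pose = alice.pose(performance)
bob_pose = bob.pose(performance)
```

Because the performance is stored as control weights rather than vertex positions, the same frame produces a proportionally different but equivalent expression on each mesh.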
MetaHuman
Unreal Engine maker Epic Games describes MetaHuman as a framework for "creating and animating highly realistic and emotive digital human characters for real-time 3D projects, as well as for film and television content."[2] Epic launched the cloud-based MetaHuman Creator in early access on 14 April 2021, aiming to let users build a fully rigged, photoreal, animation-ready character in under an hour rather than the weeks or months such work traditionally took.[7] The toolset is built on technology from companies Epic acquired: 3Lateral (digital human and facial rig technology, 2019), Quixel (the Megascans photogrammetry asset library, November 2019), and Cubic Motion (automated performance-driven facial animation, 2020).[8][9]
Two later additions extended the pipeline. Mesh to MetaHuman converts an existing scanned or sculpted head mesh into an editable MetaHuman. MetaHuman Animator, released on 15 June 2023, captures a facial performance with an iPhone (model 12 or newer) or a stereo head-mounted camera and produces facial animation locally on GPU hardware in minutes; the resulting animation can be played back in real time on any MetaHuman, though the solving step itself is not real-time.[5]
AI-driven digital humans
A second strand uses generative AI to make digital humans interactive. NVIDIA's Avatar Cloud Engine (ACE), released as a set of generative AI microservices at COMPUTEX on 2 June 2024, packages models for each part of a digital human: Riva for speech recognition and text-to-speech, the Nemotron family of language models for understanding and responses, Audio2Face for facial animation driven by an audio track, and Omniverse RTX for path-traced skin and hair rendering.[4] Audio2Face uses deep neural networks to animate a 3D face to match a voice track and can run in real time for interactive use or be baked into a clip; NVIDIA has also released Audio2Face animation models as open source.[6] NVIDIA lists customer service, gaming, healthcare, and telehealth as early uses, naming adopters including Aww Inc., Perfect World Games, ServiceNow, Hippocratic AI, and UneeQ.[4]
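The division of labor described above (speech recognition, a language model, text-to-speech, then audio-driven facial animation) can be sketched as a chain of four stages. The sketch below does not use any NVIDIA API: `DigitalHumanPipeline`, `TurnResult`, and the four `fake_*` stand-ins are hypothetical names invented for illustration, and a real system would replace each stand-in with a call to an actual service.

```python
from dataclasses import dataclass
from typing import Callable

FaceFrame = dict[str, float]  # rig control name -> weight for one animation frame


@dataclass
class TurnResult:
    user_text: str
    reply_text: str
    reply_audio: bytes
    face_frames: list[FaceFrame]


@dataclass
class DigitalHumanPipeline:
    """Chains four stages; each is a plain callable so it can be swapped out."""
    recognize_speech: Callable[[bytes], str]
    generate_reply: Callable[[str], str]
    synthesize_speech: Callable[[str], bytes]
    animate_face: Callable[[bytes], list[FaceFrame]]

    def run_turn(self, user_audio: bytes) -> TurnResult:
        user_text = self.recognize_speech(user_audio)
        reply_text = self.generate_reply(user_text)
        reply_audio = self.synthesize_speech(reply_text)
        face_frames = self.animate_face(reply_audio)
        return TurnResult(user_text, reply_text, reply_audio, face_frames)


# Hypothetical stand-ins so the sketch runs without any external service.
def fake_asr(audio: bytes) -> str:
    return audio.decode("utf-8")  # pretend the audio bytes are already text


def fake_llm(text: str) -> str:
    return f"You said: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


def fake_audio_to_face(audio: bytes) -> list[FaceFrame]:
    # One frame per byte, with jaw opening scaled from the byte value.
    return [{"jaw_open": b / 255} for b in audio]


pipeline = DigitalHumanPipeline(fake_asr, fake_llm, fake_tts, fake_audio_to_face)
result = pipeline.run_turn(b"hello")
```

Keeping each stage behind a plain callable reflects the modular packaging the paragraph describes: any one model can be replaced without touching the others.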
Relevance to VR and AR
In virtual and augmented reality, digital humans appear in three main roles.
The first is the user's own avatar in social VR and the metaverse. Real-time avatars can mirror a user's expressions and gestures using the eye-tracking, face-tracking, and hand-tracking sensors built into newer XR headsets, which lets a digital human convey some of the wearer's behavior and emotional state to others in a shared space, contributing to social presence.[10]
The second is the autonomous agent or virtual presenter: an AI-driven digital human acting as a guide, receptionist, salesperson, or customer-service assistant inside an application or kiosk. The same technology underpins enterprise deployments where companies use lifelike avatars as digital assistants, with proposed future roles spanning healthcare and financial advice.[10][4]
The third is telepresence. A live person can be represented to remote participants as a 3D figure or hologram, reconstructed from cameras and reinforced with AI-driven animation, so that conversations across distance keep a sense of a present, embodied counterpart.[10]
Outside headset experiences, photoreal digital humans have also become public-facing media figures. The CGI virtual influencer Lil Miquela, created by the startup Brud and launched on Instagram in April 2016, built a following in the millions and signed brand deals with companies including Prada, Calvin Klein, and Samsung, an early demonstration that an entirely synthetic persona could function as a marketable character.[11]
Limitations
The main perceptual obstacle for realistic digital humans is the uncanny valley: as a synthetic face approaches but does not reach human likeness, small errors in skin, eyes, or expression can make it look unsettling rather than convincing, an effect repeatedly noted in coverage of photoreal virtual influencers and avatars.[11] Real-time rendering on headset-class hardware also limits the geometry, shading, and simulation budget available for skin, hair, and cloth, so the most photoreal results are still easier to achieve in offline film rendering or on powerful workstations than inside a standalone VR/AR device. Convincing interactive behavior depends in turn on the quality of the underlying speech, language, and animation models.[2][4]
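The real-time constraint can be made concrete with simple arithmetic: the time available per frame is the reciprocal of the display refresh rate. The refresh rates used below (72, 90, and 120 Hz) and the 25% share given to the character are illustrative assumptions, not figures from the sources cited here, and the function names are hypothetical.

```python
def frame_budget_ms(refresh_hz: float) -> float:
    """Time available to produce one frame, in milliseconds."""
    if refresh_hz <= 0:
        raise ValueError("refresh_hz must be positive")
    return 1000.0 / refresh_hz


def character_budget_ms(refresh_hz: float, share: float) -> float:
    """Slice of the frame time reserved for one digital human (share in 0..1)."""
    if not 0.0 <= share <= 1.0:
        raise ValueError("share must be between 0 and 1")
    return frame_budget_ms(refresh_hz) * share


for hz in (72, 90, 120):  # assumed example refresh rates
    print(f"{hz} Hz: {frame_budget_ms(hz):.2f} ms per frame, "
          f"{character_budget_ms(hz, 0.25):.2f} ms for the character")
```

Under these assumptions a higher refresh rate leaves only a few milliseconds per frame for skin, hair, and cloth, which is why offline rendering, with no such deadline, can afford more detail.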
References
- [1] "Virtual human". https://en.wikipedia.org/wiki/Virtual_human.
- [2] "MetaHuman - High-Fidelity Digital Humans Made Easy". https://www.metahuman.com/.
- [3] Gratch, Jonathan; Rickel, Jeff; et al. (July 2002). "Creating Interactive Virtual Humans: Some Assembly Required". IEEE Intelligent Systems. 17(4): 54-63. doi:10.1109/MIS.2002.1024753.
- [4] "NVIDIA Releases Digital Human Microservices, Paving Way for Future of Generative AI Avatars". 2024-06-02. https://nvidianews.nvidia.com/news/digital-humans-ace-generative-ai-microservices.
- [5] "Delivering high-quality facial animation in minutes, MetaHuman Animator is now available". 2023-06-15. https://www.metahuman.com/news/delivering-high-quality-facial-animation-in-minutes-metahuman-animator-is-now-available.
- [6] "Omniverse Audio2Face AI Powered Application". https://www.nvidia.com/en-us/omniverse/apps/audio2face.md/.
- [7] "Epic Games launches MetaHuman Creator". 2021-04-14. https://www.cgchannel.com/2021/04/epic-games-announces-metahuman-creator/.
- [8] Template:Cite news
- [9] Template:Cite news
- [10] "Digital Human Avatars in Enterprise: Roles and Use Cases". https://augmentedenterprisesummit.com/human-digital-twins-role-of-avatars-in-enterprise/.
- [11] Template:Cite news