glTF

From VR & AR Wiki

glTF (GL Transmission Format) is a royalty-free, JSON-based open standard for the efficient transmission and loading of 3D scenes and models by engines and applications.[1] It is developed and maintained by the Khronos Group, the same standards consortium responsible for OpenGL, Vulkan and WebGL.[1] A glTF asset minimizes both the size of 3D content and the runtime processing needed to unpack and render it, which has made the format the de facto delivery container for 3D models on the web and a common interchange target across digital content creation tools.[1][2] Because it aims to do for 3D models what JPEG did for images, its creators and the wider community frequently describe glTF as "the JPEG of 3D."[1][3]

For virtual reality and augmented reality, glTF is the format in which most ready-to-display 3D content arrives. Web 3D engines such as Three.js and Babylon.js load glTF natively and render it through WebGL, and immersive web experiences built on the WebXR Device API present glTF and its binary variant (.glb) as the assets a user places and interacts with in a scene.[1][4] glTF was the first widely adopted 3D delivery format to standardize physically based rendering, so models authored once look consistent across the many renderers and headsets that consume them.[3][5]

History

The format originated within the COLLADA working group at Khronos, which conceived a compact runtime counterpart to its heavier authoring format; an early demonstration was shown at SIGGRAPH 2012 under the name WebGL Transmissions Format, later shortened to glTF.[6]

Khronos released the finalized glTF 1.0 specification on 19 October 2015. It defined a vendor- and runtime-neutral format that paired an easily parsed JSON scene description with binary geometry, textures and animations, designed so that content could be loaded into WebGL applications with minimal additional parsing.[7] Companies and groups involved in the 1.0 effort included Analytical Graphics, Autodesk, Adobe, Microsoft, Cesium, Fraunhofer IGD and the MPEG consortium, and loaders shipped at launch for Three.js, Babylon.js, Cesium and X3DOM.[7]

glTF 2.0 followed on 5 June 2017, announced at the Web3D 2017 conference, as a substantial overhaul of the format. Its central addition was physically based rendering (PBR): a portable, consistent way to describe materials, based on a proposal from Fraunhofer presented at SIGGRAPH 2016, that replaced the WebGL-style GLSL shaders glTF 1.0 had used for materials.[5][6] By moving from API-specific shaders to a small set of PBR parameters, glTF 2.0 made the same material render consistently across rendering back-ends such as OpenGL, Direct3D and Metal.[5] Support for 2.0 was announced by engines and tools including Babylon.js, Three.js, Cesium and Sketchfab, with industry backing from companies including Adobe, Google, Microsoft, NVIDIA, Oculus and UX3D.[5] glTF 2.0 remains the current major version and is the basis of essentially all modern glTF tooling.[6]

In July 2022, glTF 2.0 was published as an international standard, ISO/IEC 12113:2022, with Khronos stating that it would make regular submissions to fold newly adopted functionality into refreshed versions of the standard.[6]

| Version | Finalized | Key characteristics |
|---|---|---|
| glTF 1.0 | 19 October 2015 | First release; JSON scene description plus binary buffers; materials expressed with GLSL/WebGL shaders |
| glTF 2.0 | 5 June 2017 | Major overhaul; adds metallic-roughness physically based rendering, replacing per-API shaders; published as ISO/IEC 12113:2022 |

File formats and packaging

glTF content is delivered in one of two packagings that share the same underlying data model.[1][8]

The text form uses the .gltf extension: a JSON document that describes the scene and references external resources, typically a binary buffer file (.bin) holding geometry and animation data and separate image files (such as PNG or JPEG) for textures.[1][6]
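One practical consequence of the text form is that a .gltf file is only complete together with the files it points to. As a minimal sketch, the Python below lists those files by reading the `uri` fields of the `buffers` and `images` arrays. The function name `external_resources` and the sample file names are illustrative assumptions, and embedded `data:` URIs are skipped because they carry their payload inline.

```python
import json


def external_resources(document: dict) -> list[str]:
    """List the URIs of files that must be shipped alongside a .gltf file."""
    uris = []
    for array_name in ("buffers", "images"):
        for entry in document.get(array_name, []):
            uri = entry.get("uri")
            # Entries without a uri, or with a data: URI, need no extra file.
            if uri and not uri.startswith("data:"):
                uris.append(uri)
    return uris


# Hypothetical asset: one external buffer, one external image, one embedded image.
sample_gltf = json.loads("""
{
  "asset": {"version": "2.0"},
  "buffers": [{"uri": "chair.bin", "byteLength": 1024}],
  "images": [
    {"uri": "chair_basecolor.png"},
    {"uri": "data:image/png;base64,AAAA"}
  ]
}
""")

print(external_resources(sample_gltf))
```

A packaging or upload script could use such a list to check that every referenced file is present before publishing the model.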

The binary form uses the .glb extension, which bundles the JSON description, the binary buffers and (optionally) the images into a single self-contained file. This is generally preferred for distribution because it travels as one download.[1][8] A .glb file begins with a 12-byte header containing three little-endian 32-bit values: a magic number equal to the ASCII string "glTF" (0x46546C67), a container version, and the total file length. The header is followed by aligned chunks: a mandatory first JSON chunk carrying the same content that a .gltf file would hold, and an optional binary buffer chunk (BIN) containing the geometry, animation key frames, skins and image data.[8]
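The container layout described above can be sketched in a few lines of Python. This is an illustrative reader and writer, not a validating implementation. The chunk framing (a 32-bit length and a 32-bit type before each payload), the chunk type constants and the padding bytes are taken from the GLB section of the glTF 2.0 specification rather than from this article, and the function names `parse_glb` and `build_glb` are assumptions chosen for the example.

```python
import json
import struct

GLB_MAGIC = 0x46546C67   # ASCII "glTF" read as a little-endian uint32
CHUNK_JSON = 0x4E4F534A  # ASCII "JSON"
CHUNK_BIN = 0x004E4942   # ASCII "BIN" followed by a zero byte


def parse_glb(data: bytes):
    """Return (container_version, json_document, bin_payload_or_None)."""
    if len(data) < 12:
        raise ValueError("too short to hold a GLB header")
    magic, version, length = struct.unpack_from("<III", data, 0)
    if magic != GLB_MAGIC:
        raise ValueError("bad magic number: not a GLB file")
    if length != len(data):
        raise ValueError("header length does not match the data size")

    offset, document, binary = 12, None, None
    while offset + 8 <= length:
        # Each chunk starts with its payload length and a type tag.
        chunk_length, chunk_type = struct.unpack_from("<II", data, offset)
        offset += 8
        if offset + chunk_length > length:
            raise ValueError("chunk runs past the end of the file")
        payload = data[offset:offset + chunk_length]
        offset += chunk_length
        if chunk_type == CHUNK_JSON and document is None:
            document = json.loads(payload.decode("utf-8"))
        elif chunk_type == CHUNK_BIN and binary is None:
            binary = payload
    if document is None:
        raise ValueError("GLB has no JSON chunk")
    return version, document, binary


def _pad(payload: bytes, fill: bytes) -> bytes:
    """Pad a chunk payload to a multiple of four bytes."""
    return payload + fill * (-len(payload) % 4)


def build_glb(document: dict, binary: bytes = b"") -> bytes:
    """Pack a JSON document and an optional binary buffer into GLB bytes."""
    json_payload = _pad(json.dumps(document).encode("utf-8"), b" ")
    chunks = struct.pack("<II", len(json_payload), CHUNK_JSON) + json_payload
    if binary:
        bin_payload = _pad(binary, b"\x00")
        chunks += struct.pack("<II", len(bin_payload), CHUNK_BIN) + bin_payload
    return struct.pack("<III", GLB_MAGIC, 2, 12 + len(chunks)) + chunks
```

Round-tripping a small document through `build_glb` and `parse_glb` is a quick way to check the framing. The binary payload that comes back may be up to three bytes longer than the input because of alignment padding, which is why the `buffers` entry in the JSON records the true byte length.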

Data model

A glTF asset is organized as a JSON document made up of top-level arrays that reference one another by index. The core arrays describe a complete scene: scenes and a hierarchy of nodes that position objects in space; meshes built from accessors, bufferViews and buffers that point into the binary data; materials, textures, images and samplers that define appearance; skins and animations that drive skeletal and keyframe motion; and cameras that define viewpoints.[1][8] The scene graph of nodes is the fundamental structure, with meshes, cameras and skins attached to nodes in a transform hierarchy.[6]
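The index-based references can be followed by hand to see how the arrays fit together. The sketch below builds a hypothetical one-triangle asset and walks from a node to its mesh, accessor, bufferView and buffer to recover the vertex positions. It assumes the buffer bytes have already been loaded (from a .bin file or a GLB binary chunk) and handles only 32-bit float `VEC3` accessors; the helper name `read_positions` is illustrative.

```python
import struct

FLOAT = 5126  # glTF componentType code for 32-bit floats

# Three XYZ positions forming one triangle, packed as little-endian float32.
positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
blob = b"".join(struct.pack("<3f", *p) for p in positions)

asset = {
    "asset": {"version": "2.0"},
    "scene": 0,
    "scenes": [{"nodes": [0]}],
    "nodes": [{"name": "triangle", "mesh": 0}],
    "meshes": [{"primitives": [{"attributes": {"POSITION": 0}}]}],
    "accessors": [{
        "bufferView": 0, "componentType": FLOAT, "count": 3, "type": "VEC3",
        "min": [0.0, 0.0, 0.0], "max": [1.0, 1.0, 0.0],
    }],
    "bufferViews": [{"buffer": 0, "byteOffset": 0, "byteLength": len(blob)}],
    "buffers": [{"byteLength": len(blob)}],
}


def read_positions(asset: dict, buffers_data: list, node_index: int) -> list:
    """Follow node -> mesh -> accessor -> bufferView -> buffer by index."""
    node = asset["nodes"][node_index]
    mesh = asset["meshes"][node["mesh"]]
    accessor_index = mesh["primitives"][0]["attributes"]["POSITION"]
    accessor = asset["accessors"][accessor_index]
    if accessor["componentType"] != FLOAT or accessor["type"] != "VEC3":
        raise ValueError("this sketch only handles float32 VEC3 accessors")
    view = asset["bufferViews"][accessor["bufferView"]]
    data = buffers_data[view["buffer"]]
    start = view.get("byteOffset", 0) + accessor.get("byteOffset", 0)
    stride = view.get("byteStride") or 12  # 12 bytes when tightly packed
    return [struct.unpack_from("<3f", data, start + i * stride)
            for i in range(accessor["count"])]


print(read_positions(asset, [blob], 0))
```

Storing references as integer indices keeps the JSON compact and lets several nodes share one mesh, or several accessors share one buffer, without duplicating data.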

Materials in glTF 2.0 use a metallic-roughness PBR model in which a material is defined by a few concise parameters, principally a base color, a metallic value and a roughness value, optionally supplied as textures, that can be used to generate shaders for any rendering API.[5][8] Because the model is parametric rather than tied to a specific shading language, a glTF material is meant to look the same regardless of which engine or device renders it.[5]
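As an illustration of how small this parameter set is, the following sketch shows a hypothetical material entry and a helper that fills in the default factors a renderer would assume when a property is omitted. The property names and defaults (`baseColorFactor` of opaque white, `metallicFactor` and `roughnessFactor` of 1.0) follow the glTF 2.0 specification; the material itself and the helper name `resolve_pbr` are invented for the example, and texture slots are left out.

```python
# Default factors defined by the glTF 2.0 specification.
PBR_DEFAULTS = {
    "baseColorFactor": [1.0, 1.0, 1.0, 1.0],  # RGBA, opaque white
    "metallicFactor": 1.0,
    "roughnessFactor": 1.0,
}


def resolve_pbr(material: dict) -> dict:
    """Return the metallic-roughness factors with defaults filled in."""
    pbr = material.get("pbrMetallicRoughness", {})
    return {key: pbr.get(key, default) for key, default in PBR_DEFAULTS.items()}


# Hypothetical material: a fairly smooth gold-coloured metal.
brushed_gold = {
    "name": "brushed_gold",
    "pbrMetallicRoughness": {
        "baseColorFactor": [1.0, 0.77, 0.34, 1.0],
        "roughnessFactor": 0.3,
    },
}

print(resolve_pbr(brushed_gold))
```

Because `metallicFactor` is omitted here, the resolved material is fully metallic; a renderer would feed these three values into whatever shader it generates for its own graphics API.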

The format is extensible through named extensions, with ratified Khronos extensions carrying the KHR_ prefix and vendor extensions using their own prefixes.[1][6] Widely used extensions add capabilities such as Draco mesh compression to shrink geometry and KTX 2.0 textures with Basis Universal supercompression to reduce texture size and GPU memory use, both of which matter for streaming large 3D content to mobile and standalone headsets.[1][6]
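A loader typically decides whether it can open an asset by comparing the extension names the asset declares against the ones it implements. The sketch below assumes the top-level `extensionsUsed` and `extensionsRequired` arrays defined by the glTF 2.0 specification and uses the real extension names `KHR_draco_mesh_compression` and `KHR_texture_basisu`; `ACME_example_extension` and the helper `classify_extensions` are hypothetical.

```python
def classify_extensions(asset: dict, supported: set) -> dict:
    """Group declared extensions by prefix and report unsupported required ones."""
    used = asset.get("extensionsUsed", [])
    required = asset.get("extensionsRequired", [])
    return {
        "khronos": sorted(name for name in used if name.startswith("KHR_")),
        "other": sorted(name for name in used if not name.startswith("KHR_")),
        # A loader that lacks any of these cannot load the asset correctly.
        "missing_required": sorted(set(required) - set(supported)),
    }


sample_asset = {
    "asset": {"version": "2.0"},
    "extensionsUsed": [
        "KHR_draco_mesh_compression",
        "KHR_texture_basisu",
        "ACME_example_extension",  # hypothetical vendor extension
    ],
    "extensionsRequired": ["KHR_draco_mesh_compression"],
}

# Hypothetical loader that can decode Basis textures but not Draco geometry.
report = classify_extensions(sample_asset, supported={"KHR_texture_basisu"})
print(report)
```

Extensions listed only in `extensionsUsed` are optional, so a loader may ignore them and still display the asset, whereas anything in `extensionsRequired` must be understood.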

Use in virtual and augmented reality

glTF reaches VR and AR almost entirely through higher-level engines and components rather than being parsed by applications directly. Three.js, a foundational WebGL library for browser 3D and WebXR, ships a glTF loader and actively upstreams glTF rendering improvements, and Babylon.js similarly treats glTF as a first-class import format.[5][3]

On the immersive web, glTF is the asset that WebXR sessions display. Google's open-source <model-viewer> web component, which is built on Three.js, loads models in the .gltf and .glb formats and lets a page move from an in-page 3D view into augmented reality.[4][3] On ARCore-capable Android devices, <model-viewer> places the glTF/GLB model in AR through the browser's WebXR functionality (or Google's Scene Viewer), keeping the experience in the browser and avoiding a second download of the model.[4]

Apple's platforms are the main exception to glTF's ubiquity on the web. Safari on iOS presents web AR through AR Quick Look, which uses USDZ, the packaged form of Pixar's Universal Scene Description (USD), served with the MIME type model/vnd.usdz+zip rather than glTF; the feature shipped with iOS 12 in 2018 and is triggered by a link carrying the rel="ar" attribute.[9] To cover both ecosystems, cross-platform web AR commonly supplies a glTF/GLB model for Android and WebXR and a parallel USDZ file for AR Quick Look on iOS; <model-viewer>, for example, accepts a separate USDZ source for its iOS path.[4][9]
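A site that serves both files side by side needs to label each with the right media type. The mapping below is a minimal sketch: the USDZ type is the one given above, while `model/gltf+json` and `model/gltf-binary` are the IANA-registered types for the two glTF packagings and are not stated in this article. The function name `content_type_for` and the fallback type are assumptions.

```python
from pathlib import PurePosixPath

MODEL_MIME_TYPES = {
    ".gltf": "model/gltf+json",
    ".glb": "model/gltf-binary",
    ".usdz": "model/vnd.usdz+zip",
}


def content_type_for(path: str) -> str:
    """Pick a Content-Type header value from a model file's extension."""
    suffix = PurePosixPath(path).suffix.lower()
    # Unknown extensions fall back to a generic binary type.
    return MODEL_MIME_TYPES.get(suffix, "application/octet-stream")


for name in ("models/chair.glb", "models/chair.usdz"):
    print(name, "->", content_type_for(name))
```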

Relationship to USD

glTF is frequently compared with Universal Scene Description (USD), Pixar's open framework, but the two are designed for different points in a 3D pipeline.[10][11] glTF is a transmission and final-delivery format: it is compact, fast to load and intended to carry a finished asset to an end user's engine or browser, and Khronos positions it as complementary to authoring formats, "a common, interoperable distillation target for publishing 3D assets to a wide audience of end users."[1][10] USD, by contrast, is an open framework invented by Pixar for creating, editing and rendering 3D worlds, with a composition engine that enables collaborative, non-destructive assembly of data from multiple sources, capabilities that are valuable during production but unnecessary once an asset is final.[11] In practice a production may author and compose in USD and then export glTF (or, for Apple's AR Quick Look, the USDZ packaging of USD) as the delivery format, so the two formats are often described as complementary rather than as direct competitors.[10][9]

References