We present Meta 3D AssetGen (AssetGen), a significant advancement in text-to-3D generation that produces faithful, high-quality meshes with texture and material control. Unlike works that bake shading into the 3D object's appearance, AssetGen outputs physically-based rendering (PBR) materials, supporting realistic relighting. AssetGen first generates several views of the object with factored shaded and albedo appearance channels, and then reconstructs colours, metalness and roughness in 3D, using a deferred shading loss for efficient supervision. It also uses a signed-distance function to represent 3D shape more reliably and introduces a corresponding loss for direct shape supervision. This is implemented using fused kernels for high memory efficiency. After mesh extraction, a texture refinement transformer operating in UV space significantly improves sharpness and detail. AssetGen achieves a 17% improvement in Chamfer Distance and 40% in LPIPS over the best concurrent work for few-view reconstruction, and a human preference of 72% over the best industry competitors of comparable speed, including those that support PBR.
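Chamfer Distance, one of the headline geometry metrics above, compares two point sets sampled from the predicted and reference surfaces by averaging nearest-neighbour distances in both directions; lower is better. The sketch below is a minimal brute-force version, assuming the common convention of mean squared nearest-neighbour distances summed over both directions. Conventions differ (squared vs. unsquared distances, sum vs. mean), and the exact normalisation used in the AssetGen evaluation is not stated here; the function name `chamfer_distance` is our own.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def _sq_dist(p: Point, q: Point) -> float:
    # Squared Euclidean distance between two 3D points.
    return sum((a - b) ** 2 for a, b in zip(p, q))


def chamfer_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Symmetric Chamfer distance between point sets a and b.

    Assumed convention: mean squared nearest-neighbour distance from a to b,
    plus the same quantity from b to a. Brute force, O(|a| * |b|); practical
    evaluations use a KD-tree or a GPU kernel instead.
    """
    if not a or not b:
        raise ValueError("both point sets must be non-empty")
    a_to_b = sum(min(_sq_dist(p, q) for q in b) for p in a) / len(a)
    b_to_a = sum(min(_sq_dist(q, p) for p in a) for q in b) / len(b)
    return a_to_b + b_to_a


if __name__ == "__main__":
    pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    ref = [(0.0, 0.0, 0.0)]
    # (0 + 1) / 2 from pred to ref, plus 0 from ref to pred.
    print(chamfer_distance(pred, ref))  # 0.5
    print(math.isclose(chamfer_distance(pred, ref), chamfer_distance(ref, pred)))  # True
```

Because each direction is averaged separately, a prediction that covers only part of the reference surface is penalised by the reference-to-prediction term even if every predicted point lies exactly on the reference.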
Meta 3D AssetGen can generate assets with varying material properties, which allows faithful modelling of how the object surface interacts with light as the environment lighting changes. Here, we show assets generated with the prompt "A cat made of MATERIAL".
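Relighting works because albedo, metalness and roughness are stored separately from lighting, so the same surface point can be re-shaded for any light direction. The sketch below illustrates this with a standard metallic-roughness Cook-Torrance BRDF (GGX distribution, Schlick Fresnel, Schlick-GGX geometry term with the common direct-lighting remapping `k = (roughness + 1)^2 / 8`) under a single directional light. This is an assumption chosen for illustration: the exact BRDF used by AssetGen is not specified here, and `shade` is a hypothetical helper, not part of any released code.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def _dot(a: Vec3, b: Vec3) -> float:
    return sum(x * y for x, y in zip(a, b))


def _normalize(v: Vec3) -> Vec3:
    length = math.sqrt(_dot(v, v))
    return tuple(x / length for x in v)


def shade(albedo: Vec3, metalness: float, roughness: float,
          normal: Vec3, view_dir: Vec3, light_dir: Vec3,
          light_rgb: Vec3 = (1.0, 1.0, 1.0)) -> Vec3:
    """Outgoing RGB radiance at one surface point (hypothetical helper).

    view_dir and light_dir point away from the surface. Assumes a single
    directional light and a metallic-roughness Cook-Torrance BRDF.
    """
    n, v, l = _normalize(normal), _normalize(view_dir), _normalize(light_dir)
    n_dot_l, n_dot_v = _dot(n, l), _dot(n, v)
    if n_dot_l <= 0.0 or n_dot_v <= 0.0:
        return (0.0, 0.0, 0.0)  # light or viewer is below the surface
    h = _normalize(tuple(a + b for a, b in zip(v, l)))  # half vector
    n_dot_h, v_dot_h = max(_dot(n, h), 0.0), max(_dot(v, h), 0.0)

    # GGX normal distribution; perceptual roughness is squared to get alpha.
    alpha = max(roughness, 1e-3) ** 2
    d = alpha ** 2 / (math.pi * (n_dot_h ** 2 * (alpha ** 2 - 1.0) + 1.0) ** 2)
    # Smith shadowing-masking with the Schlick-GGX approximation.
    k = (roughness + 1.0) ** 2 / 8.0
    g = (n_dot_v / (n_dot_v * (1.0 - k) + k)) * (n_dot_l / (n_dot_l * (1.0 - k) + k))

    out = []
    for c, li in zip(albedo, light_rgb):
        # Dielectrics reflect ~4% at normal incidence; metals tint by albedo.
        f0 = 0.04 * (1.0 - metalness) + c * metalness
        f = f0 + (1.0 - f0) * (1.0 - v_dot_h) ** 5  # Schlick Fresnel
        specular = d * g * f / (4.0 * n_dot_l * n_dot_v)
        diffuse = (1.0 - f) * (1.0 - metalness) * c / math.pi  # metals have no diffuse
        out.append((diffuse + specular) * li * n_dot_l)
    return tuple(out)


if __name__ == "__main__":
    up = (0.0, 0.0, 1.0)
    for light in [(0.0, 0.0, 1.0), (1.0, 0.0, 0.2)]:  # overhead, then grazing
        print(light, shade((0.8, 0.8, 0.8), 0.0, 0.8, up, up, light))
```

Changing only `light_dir` re-shades the same material, which is what makes the generated assets relightable; a shaded-only texture would instead keep whatever lighting was baked in at generation time.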
For more work on similar tasks, please check out:
Gaussian Reconstruction Model presents a fast, transformer-based model for efficient 3D reconstruction and generation using pixel-aligned Gaussians.
InstantMesh introduces a fast, efficient framework for generating high-quality 3D meshes from a single image using a multi-view diffusion model and a sparse-view large reconstruction model.
MeshLRM introduces a fast, efficient model for generating high-quality 3D meshes from just four input images using differentiable mesh extraction.
Instant3D is the original feed-forward method for generating high-quality and diverse 3D assets from text prompts. It uses a two-stage approach: it first generates four-view images with a fine-tuned diffusion model and then reconstructs a NeRF with a transformer-based sparse-view reconstructor.
LightplaneLRM adds highly scalable splatting and rendering kernels to Instant3D's large reconstruction model, improving its performance.
LumaAI Genie and Meshy 3 are commercial software products for creating relightable assets from text prompts.
@article{siddiqui2024assetgen,
author = {Yawar Siddiqui and Tom Monnier and Filippos Kokkinos and Mahendra Kariya and Yanir Kleiman and Emilien Garreau and Oran Gafni and Natalia Neverova and Andrea Vedaldi and Roman Shapovalov and David Novotny},
title = {Meta 3D AssetGen: Text-to-Mesh Generation with High-Quality Geometry, Texture, and PBR Materials},
journal = {arXiv},
year = {2024},
}