NVIDIA ‘3D MoMa’ research method lets content creators improvise with 3D objects

NVIDIA paid homage to jazz music with AI research that could one day allow graphic designers to improvise with 3D objects created in the time it takes to hold a jam session. The method, NVIDIA 3D MoMa, could allow architects, designers, concept artists and game developers to quickly import an object into a graphics engine and start working with it: changing its scale, swapping its material or experimenting with different lighting effects.

According to a report on NVIDIA’s website, researchers showcased the technology in a video celebrating jazz and its birthplace, New Orleans, where the paper behind 3D MoMa will be presented at this week’s Conference on Computer Vision and Pattern Recognition (CVPR).

NVIDIA Vice President of Graphics Research David Luebke said inverse rendering, a technique for reconstructing a 3D model of an object or scene from a series of still photos, “has long been a Holy Grail unifying computer vision and computer graphics”.

“By formulating each element of the inverse rendering problem as a GPU-accelerated differentiable component, the NVIDIA 3D MoMa rendering pipeline uses modern AI machinery and the raw computing power of NVIDIA GPUs to rapidly produce 3D objects that creators can import, edit and extend without limitation in existing tools,” he said.
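
To make that idea concrete, here is a minimal sketch of gradient-based inverse rendering, using a toy Lambertian renderer written in PyTorch rather than NVIDIA’s actual pipeline: because every step of image formation is differentiable, an optimizer can push unknown scene parameters (here, just a per-pixel material color) toward values that reproduce a target photo. All names and values are illustrative.

```python
import torch

def toy_render(albedo, light_dir, normals):
    # Lambertian shading: per-pixel albedo scaled by the cosine of the light angle.
    cos = (normals * light_dir).sum(-1).clamp(min=0.0)
    return albedo * cos.unsqueeze(-1)

H, W = 64, 64
normals = torch.zeros(H, W, 3)
normals[..., 2] = 1.0                      # flat surface facing the camera
light = torch.tensor([0.0, 0.0, 1.0])      # head-on light

# Ground-truth "photo" rendered with a known material.
target = toy_render(torch.tensor([0.8, 0.3, 0.1]).expand(H, W, 3), light, normals)

# Unknown material, recovered by optimizing through the renderer.
albedo = torch.full((H, W, 3), 0.5, requires_grad=True)
opt = torch.optim.Adam([albedo], lr=0.05)
for _ in range(200):
    opt.zero_grad()
    loss = (toy_render(albedo, light, normals) - target).pow(2).mean()
    loss.backward()                        # gradients flow through the renderer
    opt.step()
```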

To be most useful to an artist or engineer, a 3D object should be in a form that can be dropped into widely used tools such as game engines, 3D modelers and movie renderers. That form is a triangular mesh with textured materials, the common language spoken by these 3D tools.
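
As a rough illustration of that common language (not NVIDIA’s exporter), the snippet below writes a single textured quad in the Wavefront OBJ/MTL format that most game engines, modelers and renderers can import; every value is invented.

```python
# Hypothetical example asset: one textured quad as vertex positions (v),
# texture coordinates (vt), triangle faces (f) and a material reference.
obj = """mtllib demo.mtl
v -1 -1 0
v  1 -1 0
v  1  1 0
v -1  1 0
vt 0 0
vt 1 0
vt 1 1
vt 0 1
usemtl brass
f 1/1 2/2 3/3
f 1/1 3/3 4/4
"""
mtl = """newmtl brass
Kd 0.78 0.57 0.11
Ks 0.99 0.94 0.81
Ns 200
"""
with open("demo.obj", "w") as fh:
    fh.write(obj)
with open("demo.mtl", "w") as fh:
    fh.write(mtl)
```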

Game studios and other creators have traditionally built 3D objects like these with complex photogrammetry techniques that require significant time and manual effort. Recent work on neural radiance fields can quickly generate a 3D representation of an object or scene, but not in an easily editable triangular mesh format.

NVIDIA 3D MoMa generates triangular mesh models in an hour on a single NVIDIA Tensor Core GPU. The pipeline’s output is directly compatible with 3D graphics engines and modeling tools creators already use.

The pipeline’s reconstruction consists of three elements: a 3D mesh model, materials and lighting. The mesh is like a papier-mâché model of a 3D shape built from triangles; using it, developers can modify an object to fit their creative vision. Materials are 2D textures layered over the 3D mesh like a skin. And NVIDIA 3D MoMa’s estimate of how the scene is lit allows creators to modify the object’s lighting later.
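
The sketch below shows, under loose assumptions, how those three pieces might sit side by side in code; the field names and shapes are illustrative, not the paper’s data model. The point is that keeping them separate is what makes each independently editable.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class ReconstructedAsset:
    vertices: np.ndarray    # (V, 3) triangle-mesh vertex positions
    faces: np.ndarray       # (F, 3) vertex indices, three per triangle
    uvs: np.ndarray         # (V, 2) coordinates mapping the 2D "skin" onto the mesh
    albedo: np.ndarray      # (H, W, 3) material texture layered over the mesh
    env_light: np.ndarray   # (Hl, Wl, 3) environment-map estimate of scene lighting

asset = ReconstructedAsset(
    vertices=np.zeros((3, 3)), faces=np.array([[0, 1, 2]]),
    uvs=np.zeros((3, 2)), albedo=np.zeros((8, 8, 3)),
    env_light=np.ones((4, 8, 3)),
)
asset.vertices *= 2.0       # rescale the shape; materials and lighting untouched
asset.env_light *= 0.5      # dim the lighting; geometry and materials untouched
```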

To showcase the capabilities of NVIDIA 3D MoMa, NVIDIA’s research and design teams began by collecting approximately 100 images of each of five jazz band instruments – a trumpet, trombone, saxophone, drums and clarinet – from different angles. NVIDIA 3D MoMa turned these 2D images into 3D reconstructions of each instrument, represented as meshes. The NVIDIA team then took the instruments out of their original scenes and imported them into the NVIDIA Omniverse 3D simulation platform for editing.

In any traditional graphics engine, creators can easily swap out the material of an NVIDIA 3D MoMa-generated shape, as if dressing the mesh in different outfits. The team did this with the trumpet model, for example, instantly converting its original plastic to gold, marble, wood or cork.
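
In spirit, the swap amounts to replacing one parameter set with another while the geometry stays put. The sketch below uses invented PBR-style presets rather than NVIDIA’s actual material definitions:

```python
# Invented material presets for illustration only.
MATERIALS = {
    "gold":   {"base_color": (1.00, 0.77, 0.34), "metallic": 1.0, "roughness": 0.2},
    "marble": {"base_color": (0.93, 0.92, 0.88), "metallic": 0.0, "roughness": 0.4},
    "wood":   {"base_color": (0.52, 0.37, 0.26), "metallic": 0.0, "roughness": 0.7},
    "cork":   {"base_color": (0.72, 0.58, 0.44), "metallic": 0.0, "roughness": 0.9},
}

def reskin(material: dict, preset: str) -> dict:
    # Swap the material while leaving the mesh geometry untouched.
    material.update(MATERIALS[preset])
    return material

trumpet_material = {"base_color": (0.20, 0.20, 0.25), "metallic": 0.0, "roughness": 0.5}
trumpet_material = reskin(trumpet_material, "gold")   # plastic becomes gold
```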

Creators can then place the newly modified objects into any virtual scene. The NVIDIA team dropped the instruments into a Cornell box, a classic graphics test for rendering quality. They demonstrated that virtual instruments react to light just as they would in the physical world, with shiny brass reflecting brightly and dull drumheads absorbing light.
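
The contrast can be illustrated with a toy Blinn-Phong highlight calculation (not the shading model NVIDIA uses): with identical geometry and lighting, brass-like values produce a tight, bright highlight at the mirror direction, while drumhead-like values stay dim everywhere. All material numbers below are made up.

```python
import numpy as np

def specular(n, l, v, k_s, shininess):
    # Blinn-Phong: specular weight times (N . H)^shininess, H the half vector.
    h = l + v
    h = h / np.linalg.norm(h)
    return k_s * max(float(n @ h), 0.0) ** shininess

n = np.array([0.0, 0.0, 1.0])                          # surface normal
l = np.array([0.2, 0.0, 1.0]); l /= np.linalg.norm(l)  # light direction
mirror = np.array([-0.2, 0.0, 1.0]); mirror /= np.linalg.norm(mirror)
off = l.copy()                                         # viewer off the mirror direction

for name, k_s, shin in [("brass", 0.9, 200), ("drumhead", 0.05, 4)]:
    print(name,
          "mirror:", round(specular(n, l, mirror, k_s, shin), 3),
          "off-mirror:", round(specular(n, l, off, k_s, shin), 3))
```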

These new objects, generated by inverse rendering, can be used as building blocks for a complex animated scene – presented in the video’s finale as a virtual jazz band.