Alibaba Brings AI Characters to Life with Make-A-Character

2023-12-28

Alibaba's Make-A-Character (Mach) turns text descriptions into personalized 3D avatars, giving users a convenient way to create virtual characters that match what they have in mind.

To lower the barrier to creating 3D digital characters, Alibaba researchers have introduced a text-to-3D tool called Make-A-Character, or Mach. It uses large language and vision foundation models to generate detailed, realistic 3D characters from simple natural-language descriptions.

According to the researchers, the current version focuses on generating visually appealing 3D characters with Asian features, because the Stable Diffusion (SD) models they selected were trained primarily on Asian facial images. They expect to support more ethnicities and styles in the coming months.

The researchers also noted that their de-lighting dataset contains only clean facial textures, so unnatural facial patterns such as graffiti or stickers may be weakened or lost in the generated results. "Currently, our clothing and body parts are pre-made and matched based on text similarity. However, we are actively developing text-driven clothing, expression, and action generation techniques," the researchers said.
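The article does not say how Mach measures text similarity when matching pre-made clothing. As a rough illustration only, the sketch below assumes a simple bag-of-words cosine similarity; the asset names, their descriptions, and the functions `bag_of_words`, `cosine`, and `match_asset` are all hypothetical, and a real system would more likely compare learned text embeddings.

```python
import math
from collections import Counter

# Hypothetical pre-made asset library: each asset has a short text description.
ASSETS = {
    "hoodie_gray": "gray casual hoodie sweatshirt",
    "suit_black": "black formal business suit with tie",
    "dress_red": "red evening dress",
}


def bag_of_words(text: str) -> Counter:
    """Lower-case the text, split on whitespace, and count each word."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def match_asset(description: str, assets: dict[str, str] = ASSETS) -> str:
    """Return the name of the asset whose description best matches the query."""
    query = bag_of_words(description)
    return max(assets, key=lambda name: cosine(query, bag_of_words(assets[name])))


print(match_asset("a young man in a black suit"))  # prints: suit_black
```

Swapping `bag_of_words` and `cosine` for an embedding model would keep the same "pick the closest pre-made asset" structure.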

How does it work?

Mach converts a text description into an animatable 3D avatar in several stages. It first uses a large language model to extract semantic attributes (hints) describing the face from the text. It then maps these attributes to corresponding visual cues, which guide the generation of a reference portrait image using Stable Diffusion and ControlNet.
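The attribute-to-cue step can be pictured as a lookup from structured attributes to prompt fragments. The sketch below is a hypothetical simplification: it assumes the language model has already returned a dictionary of attributes, and the attribute names, the `VISUAL_CUES` vocabulary, `BASE_PROMPT`, and `attributes_to_prompt` are invented for illustration. The resulting string would serve as the text prompt for Stable Diffusion with ControlNet, which is not shown.

```python
# Hypothetical vocabulary mapping (attribute, value) pairs to visual cues.
VISUAL_CUES = {
    ("hair_color", "black"): "black hair",
    ("hair_style", "short"): "short hair",
    ("glasses", "round"): "round glasses",
    ("face_shape", "oval"): "oval face",
}

# Hypothetical base prompt asking for a frontal, neutral reference portrait.
BASE_PROMPT = "frontal portrait photo of a person, neutral expression"


def attributes_to_prompt(attributes: dict[str, str]) -> str:
    """Map extracted attributes to visual cues and join them into one prompt.

    Attributes with no entry in VISUAL_CUES are silently skipped.
    """
    cues = [
        VISUAL_CUES[(key, value)]
        for key, value in attributes.items()
        if (key, value) in VISUAL_CUES
    ]
    return ", ".join([BASE_PROMPT, *cues])


print(attributes_to_prompt({"hair_color": "black", "glasses": "round", "eye_color": "violet"}))
# prints: frontal portrait photo of a person, neutral expression, black hair, round glasses
```

Keeping the vocabulary explicit makes unsupported attributes (here `eye_color`) easy to detect and drop instead of passing free-form text to the image model.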

Next, a series of 2D face-parsing and 3D generation modules produces the target face's mesh and texture, which are assembled together with matching accessories. Because the result uses a parameterized representation, the generated 3D character can be easily animated.
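The article does not specify which parameterized representation Mach uses. One common choice for animatable faces is a linear blendshape model, where each pose is the neutral mesh plus a weighted sum of per-vertex offsets, V(w) = V0 + sum_i w_i * dV_i. The sketch below assumes that model; the shape names (`jaw_open`, `smile`), the toy two-vertex mesh, and the `blend` function are hypothetical.

```python
# A vertex is an (x, y, z) position.
Vertex = tuple[float, float, float]


def blend(
    neutral: list[Vertex],
    deltas: dict[str, list[Vertex]],
    weights: dict[str, float],
) -> list[Vertex]:
    """Linear blendshapes: neutral + sum of weight * per-vertex offset."""
    result = [list(v) for v in neutral]
    for name, w in weights.items():
        for i, d in enumerate(deltas[name]):
            for axis in range(3):
                result[i][axis] += w * d[axis]
    return [tuple(v) for v in result]


# Toy two-vertex "face" with two hypothetical expression shapes.
neutral = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
deltas = {
    "jaw_open": [(0.0, -1.0, 0.0), (0.0, 0.0, 0.0)],
    "smile": [(0.0, 0.0, 0.0), (0.0, 0.5, 0.0)],
}
print(blend(neutral, deltas, {"jaw_open": 0.5, "smile": 1.0}))
# prints: [(0.0, -0.5, 0.0), (1.0, 0.5, 0.0)]
```

Animating such a character only requires changing the weight values over time; the mesh topology and texture stay fixed.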

Other AI models

Just a few days earlier, Alibaba also tackled the challenge of lifting 2D generation to 3D with RichDreamer, which is built on a normal-depth diffusion model. It also introduced "Animate Anyone," a character animation technique that uses diffusion models to turn static images into dynamic character videos.

Building on this momentum, Alibaba recently launched Qwen-72B, a larger language model with more parameters and stronger customization capabilities, following its earlier release of Qwen-7B. It has also released a smaller model, Qwen-1.8B, to the research community; it can run inference on 2K-token text with as little as 3GB of GPU memory.