Tencent Hunyuan DiT Upgrade: 6GB-VRAM Version Launched, with Kohya Training Support

2024-07-05

Tencent recently announced that its Hunyuan text-to-image large model (Hunyuan DiT) now ships in a low-VRAM version that runs in as little as 6 GB of VRAM, significantly lowering the hardware bar and making local deployment and development far more practical for users on personal computers. This version has been adapted to the Diffusers library along with its LoRA and ControlNet plugins, and support for the Kohya graphical interface has been added, further lowering the barrier for developers who want to train personalized LoRA models.
At the same time, the Hunyuan DiT model has been upgraded to version 1.2, with improved image quality and composition, giving users a better text-to-image experience. In addition, Tencent has officially open-sourced the Hunyuan Captioner labeling model. Built specifically for text-to-image scenarios, it supports both Chinese and English, understands and expresses Chinese semantics more accurately, and generates more structured, complete, and accurate image descriptions; it is especially good at recognizing well-known figures and landmarks.
Open-sourcing Hunyuan Captioner means developers can quickly produce high-quality text-to-image datasets. By feeding raw image sets, or images paired with draft descriptions, into the model, developers obtain structured, high-quality annotations that raise the overall quality of the dataset. The model also lets developers import supplementary, personalized background knowledge to meet specific needs.

Notably, the low-VRAM version of Hunyuan DiT is the result of a collaboration with Hugging Face. The two parties adapted this version, together with the LoRA and ControlNet plugins, into the Diffusers library, simplifying invocation so that developers can get results with just a few lines of code. Hunyuan DiT has also been integrated with the Kohya platform, so developers can run full-parameter fine-tuning and LoRA training of the model through a graphical interface, further lowering the technical barrier.
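As a rough sketch of what "a few lines of code" looks like, the snippet below loads the model through the Diffusers library's `HunyuanDiTPipeline` and enables CPU offloading, which is the mechanism that keeps peak VRAM usage low. The model repository ID and the generation settings are assumptions; check the official Hunyuan DiT model card on Hugging Face for the exact names.

```python
# Hypothetical repository name for the Diffusers-format weights; verify
# against the official model card before use.
MODEL_ID = "Tencent-Hunyuan/HunyuanDiT-v1.2-Diffusers"


def load_pipeline(model_id: str = MODEL_ID):
    """Load Hunyuan DiT via Diffusers in a low-VRAM configuration."""
    # Imported lazily so this module can be inspected without the
    # heavy torch/diffusers dependencies installed.
    import torch
    from diffusers import HunyuanDiTPipeline

    pipe = HunyuanDiTPipeline.from_pretrained(
        model_id, torch_dtype=torch.float16
    )
    # Model CPU offloading moves idle submodules out of GPU memory,
    # which is what brings the VRAM footprint down to roughly 6 GB.
    pipe.enable_model_cpu_offload()
    return pipe


if __name__ == "__main__":
    pipe = load_pipeline()
    # Hunyuan DiT accepts Chinese or English prompts.
    image = pipe(prompt="a cute cat sitting in a garden").images[0]
    image.save("hunyuan_sample.png")
```

The heavy work is guarded behind `__main__` so the file can be imported cheaply; on a 6 GB card the CPU-offload call is what makes generation feasible at all, at the cost of some throughput.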

The Hunyuan Captioner model is optimized for text-to-image scenarios: it builds a structured image-description schema and injects background knowledge from multiple sources to improve the completeness and accuracy of its captions. The model is expected to address long-standing industry problems with caption generation, where descriptions are overly simple, cluttered, or missing background knowledge; for Chinese users in particular, its accurate Chinese descriptions are a highlight.

Since it was fully open-sourced, Hunyuan DiT has attracted attention and support from many developers for its ease of use and strong performance. In just two months it has passed 2.6k stars on GitHub, making it one of the most popular open-source DiT models from China. Tencent says it will continue building out the Hunyuan DiT ecosystem and provide developers with more convenient and efficient tools and services.