Tencent Hunyuan DiT Upgrade: 6GB-VRAM Version Launched, with Kohya Training Support

2024-07-05

Tencent recently announced that its Hunyuan text-to-image large model (Hunyuan DiT) now ships in a low-VRAM version that runs in as little as 6 GB of VRAM, significantly lowering the hardware bar and making local deployment and development far more practical for users on personal computers. This version has been adapted to the Diffusers library along with its LoRA and ControlNet plugins, and support for the Kohya graphical interface has been added, further lowering the barrier for developers who want to train personalized LoRA models.
At the same time, the Hunyuan DiT model has been upgraded to version 1.2, with improved image quality and composition, giving users a better text-to-image experience. In addition, Tencent has officially open-sourced the Hunyuan Captioner labeling model. Built specifically for text-to-image scenarios, it supports both Chinese and English, understands and expresses Chinese semantics more accurately, and generates more structured, complete, and accurate image descriptions; it is especially good at recognizing well-known figures and landmarks.
Open-sourcing Hunyuan Captioner means developers can quickly produce high-quality text-to-image datasets. By feeding raw image sets, or images paired with draft descriptions, into the model, developers obtain structured, high-quality annotations that raise the overall quality of the dataset. The model also lets developers import supplementary, personalized background knowledge to meet specific needs.

Notably, the low-VRAM version of Hunyuan DiT is the result of a collaboration with Hugging Face. The two parties adapted this version, together with the LoRA and ControlNet plugins, into the Diffusers library, simplifying invocation so that developers can get results with just a few lines of code. Hunyuan DiT has also been integrated with the Kohya platform, so developers can run full-parameter fine-tuning and LoRA training of the model through a graphical interface, further lowering the technical barrier.
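As a rough sketch of what "a few lines of code" looks like, the snippet below loads the model through the Diffusers library's `HunyuanDiTPipeline` and enables CPU offloading, which is the mechanism that keeps peak VRAM usage low. The model repository ID and the generation settings are assumptions; check the official Hunyuan DiT model card on Hugging Face for the exact names.

```python
# Hypothetical repository name for the Diffusers-format weights; verify
# against the official model card before use.
MODEL_ID = "Tencent-Hunyuan/HunyuanDiT-v1.2-Diffusers"


def load_pipeline(model_id: str = MODEL_ID):
    """Load Hunyuan DiT via Diffusers in a low-VRAM configuration."""
    # Imported lazily so this module can be inspected without the
    # heavy torch/diffusers dependencies installed.
    import torch
    from diffusers import HunyuanDiTPipeline

    pipe = HunyuanDiTPipeline.from_pretrained(
        model_id, torch_dtype=torch.float16
    )
    # Model CPU offloading moves idle submodules out of GPU memory,
    # which is what brings the VRAM footprint down to roughly 6 GB.
    pipe.enable_model_cpu_offload()
    return pipe


if __name__ == "__main__":
    pipe = load_pipeline()
    # Hunyuan DiT accepts Chinese or English prompts.
    image = pipe(prompt="a cute cat sitting in a garden").images[0]
    image.save("hunyuan_sample.png")
```

The heavy work is guarded behind `__main__` so the file can be imported cheaply; on a 6 GB card the CPU-offload call is what makes generation feasible at all, at the cost of some throughput.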

The Hunyuan Captioner model is optimized for text-to-image scenarios: it builds a structured image-description schema and injects background knowledge from multiple sources to improve the completeness and accuracy of its captions. The model is expected to address long-standing industry problems with caption generation, where descriptions are overly simple, cluttered, or missing background knowledge; for Chinese users in particular, its accurate Chinese descriptions are a highlight.

Since it was fully open-sourced, Hunyuan DiT has attracted attention and support from many developers for its ease of use and strong performance. In just two months it has passed 2.6k stars on GitHub, making it one of the most popular open-source DiT models from China. Tencent says it will continue building out the Hunyuan DiT ecosystem and provide developers with more convenient and efficient tools and services.