Stability AI releases open-source Stable Diffusion 3 Medium text-to-image model

2024-06-13

"Bigger" doesn't always mean "better," especially when it comes to running generative AI models on commodity hardware.

Stability AI appears to have taken this lesson to heart with its release of a medium-sized version of Stable Diffusion 3. Stable Diffusion is Stability AI's flagship text-to-image model. Stable Diffusion 3 was first previewed on February 22 and made publicly available through an API on April 17.

The new medium-sized model is designed to be smaller yet still capable, and to run smoothly on consumer-grade GPUs. This makes Stable Diffusion 3 a more practical option for users and organizations that have limited resources but still want high-performance image generation.

SD3 Medium is now available to try through the API and through the Stable Artisan service on Discord. The model weights will also be available on Hugging Face for non-commercial use.

With this release, the original version of Stable Diffusion 3 is now referred to as Stable Diffusion 3 (SD3) Large. Christian Laforte, co-CEO of Stability AI, told VentureBeat that SD3 Large has 8 billion parameters. SD3 Medium, by contrast, has only 2 billion.

"Unlike SD3 Large, SD3 Medium is smaller in size and can run efficiently on consumer-grade hardware," Laforte said.

SD3 Medium runs on as little as 5GB of GPU VRAM

Many generative AI workloads, including Stable Diffusion, have long relied on powerful Nvidia GPUs. Stability AI's new model lowers that bar considerably.

The minimum requirement to run SD3 Medium is just 5GB of GPU VRAM, which means the model can run on a wide range of consumer PCs and high-end laptops. That figure is only a baseline, however. Stability AI recommends 16GB of GPU VRAM; while that may be out of reach for most laptops, it is not an unreasonable configuration.

SD3 Medium may be small, but its feature set remains intact

Stability AI claims that, despite having fewer parameters, SD3 Medium is functionally equivalent to SD3 Large and maintains a high level of quality.

According to Laforte, SD3 Medium offers the same set of features as SD3 Large: photorealism, prompt adherence, typography, resource efficiency, and fine-tuning, all delivered in a smaller model.

"SD3 Medium performs exceptionally well on all of these features and is on par with the current version of the SD3 Large API that you already know and use," Laforte said.

Laforte points out that users can expect highly realistic images from SD3. He explains that, thanks to its 16-channel VAE (variational autoencoder), SD3 Medium delivers more detail per megapixel than any previous model.

On prompt adherence, he says SD3 shows an impressive understanding of natural-language prompts, including spatial relationships such as where elements are positioned in the image.

According to Laforte, the smaller model also excels at fine-tuning. He notes that it is highly adaptable and efficiently captures details from fine-tuning datasets.

One major highlight of SD3 as a whole is its typography, that is, its ability to render legible text within images, and this capability carries over to SD3 Medium.

SD3 Medium's biggest selling point, however, is its resource efficiency.

"The 2 billion parameter model is relatively small and modular, allowing for reduced computational requirements without sacrificing performance," Laforte said. "This makes SD3 Medium an ideal choice for environments where resource management and efficiency are crucial."