Midjourney has rolled out one of its users' most requested features: keeping characters consistent across new images.
This has always been a major challenge for AI image generators, and the reason lies in how they work. Most are built on "diffusion models", tools similar to or inspired by Stable Diffusion, the open-source image generation algorithm developed by Stability AI. Roughly speaking, these tools assemble an image pixel by pixel to match a description, drawing on similar images and text labels learned from a training dataset of millions of human-created images.
Yet character consistency is both important and hard to achieve, because generative AI applications are by nature inconsistent: each prompt produces new content, which in Midjourney's case means a new image. So what do you do if you are storyboarding visual media such as a film, a novel, a comic, or a manga, and you want the same character or characters to move through different scenes, settings, facial expressions, and props?
Scenes like these, which call for narrative continuity, have long been beyond generative AI's reach. Until now, that is: Midjourney is attempting to address the problem with a new tag, "--cref" (short for "character reference"), which users can append to text prompts in Midjourney's Discord server. Midjourney then tries to match the facial features, body type, and even the clothing of the character in the image at the URL the user pastes after the tag.
As the feature improves, it could take Midjourney from an interesting creative tool to a more professional application.
How to use Midjourney's new character consistency feature
The tag works best with images previously generated by Midjourney. The workflow, then, starts with generating a character image or retrieving the URL of one generated earlier.
Let's start from scratch and generate a new character with this prompt: "A muscular bald man wearing a beaded eye patch."
We'll upscale our favorite of the results, then click it in Midjourney's Discord server to find the "Copy Link" option.
Next, we enter a new prompt, "standing in a villa wearing a white tuxedo --cref [URL]", pasting in the URL of the image we just generated. Midjourney will then attempt to place our earlier character into this new scene.
As you'll see, the result is not identical to the original character (or even to our original prompt), but it is certainly encouraging.
Users can also control how faithfully the new image reproduces the original character by adding the "--cw" tag (for "character weight") after the "--cref [URL]" string, followed by a number from 0 to 100 (e.g., "--cref [URL] --cw 100"). The lower the "--cw" value, the more the generated image varies; the higher the value, the more closely the new image follows the reference.
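To make the tag grammar concrete, here is a minimal Python sketch that assembles such a prompt string. The build_prompt helper, its default weight, and the example URL are our own illustration, not part of Midjourney, which only ever sees the final text you paste into Discord.

```python
# Minimal sketch: assembling a Midjourney prompt with --cref and --cw.
# The helper name, default weight, and URL below are illustrative only.

def build_prompt(text: str, cref_url: str, cw: int = 100) -> str:
    """Append a character reference and character weight to a base prompt."""
    if not 0 <= cw <= 100:
        raise ValueError("--cw expects a value from 0 to 100")
    return f"{text} --cref {cref_url} --cw {cw}"

# A loose match: a low --cw lets the scene and outfit change freely.
print(build_prompt("standing in a villa wearing a white tuxedo",
                   "https://example.com/character.png", cw=8))
# standing in a villa wearing a white tuxedo --cref https://example.com/character.png --cw 8
```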
In our example, a very low "--cw 8" actually returns the result we wanted, the white tuxedo, though it also strips away our character's signature eye patch.
Well, there's nothing the "Vary (Region)" feature can't fix, right?
OK, so now the eye patch is on the wrong eye... but we're getting closer!
You can also blend multiple characters into one by placing two "--cref" tags side by side, each followed by its own URL.
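Following the same pattern, a two-character blend as described above might be assembled like this; both URLs are placeholders of our own:

```python
# Sketch of the two-tag blend described above; both URLs are placeholders.
character_a = "https://example.com/character-a.png"
character_b = "https://example.com/character-b.png"

prompt = f"two friends having coffee --cref {character_a} --cref {character_b}"
print(prompt)
```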
The feature has only just launched, but artists and creators are already putting it through its paces. If you have Midjourney, give it a try yourself.
Note that Midjourney V6 is still in alpha, so this and other features may change abruptly; the official V6 beta is expected soon.