Stable Diffusion Explained

Have you ever imagined what a futuristic SUV might look like?

Or perhaps a painting of Baby Yoda dressed in hip-hop clothes?

Baby Yoda Wearing Hip Hop Clothes - MidJourney

Probably not, but a new technology called Stable Diffusion is making it possible to turn anything you can imagine into a piece of AI-created art. Over the last year we have seen rapid advances in artificial intelligence. OpenAI has made amazing advances with its GPT-3 technology and, more recently, with DALL-E 2. But there’s one thing that really sets Stable Diffusion apart: it’s open source!

What is it?

Stable Diffusion is a technology that turns text into images. You enter a text prompt and it produces an image. It can create people, paintings, landscapes, and buildings. It’s been trained on millions of images, giving it a broad visual knowledge of humanity. The technology is breathtaking, but what sets Stable Diffusion apart from previous solutions is that it’s entirely open source. This means that anyone with a computer and a really good graphics card can try it themselves. It also means that thousands of hobbyists, engineers, and entrepreneurs are building new products on Stable Diffusion and advancing the technology at an incredibly rapid pace.

How does it work?

I won’t use too much technical jargon, but Stable Diffusion works in a few steps.

First, a model is trained on millions of images and their captions from across the Internet. It learns which words tend to correspond to which kinds of images. This model is called CLIP.
Second, the AI is trained to get rid of randomness. This is done by taking a clean image along with its description and adding noise to the image. The AI then learns to remove that noise, given the text as input. This is repeated over and over until the AI gets really good at removing the noise. You could say it gets stable at de-fusion. The word-to-image associations from CLIP help guide the process.
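To make the noising part of that second step concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions: the function name `add_noise` and the `keep` parameter are made up for this example, the "image" is a short list of numbers, and real systems work on compressed versions of images with a carefully chosen noise schedule.

```python
import math
import random

def add_noise(pixels, keep, rng):
    """Blend a clean image with Gaussian noise.

    pixels: flat list of floats (a stand-in for a real image)
    keep:   how much of the original signal survives, from 0.0 to 1.0
    Returns (noisy_pixels, noise); the noise is what a model would
    be trained to predict and remove.
    """
    noise = [rng.gauss(0.0, 1.0) for _ in pixels]
    signal_scale = math.sqrt(keep)
    noise_scale = math.sqrt(1.0 - keep)
    noisy = [signal_scale * p + noise_scale * n for p, n in zip(pixels, noise)]
    return noisy, noise

rng = random.Random(0)          # fixed seed so the example is repeatable
clean = [0.2, 0.8, -0.5, 0.0]   # hypothetical 4-pixel "image"

slightly_noisy, _ = add_noise(clean, keep=0.99, rng=rng)  # still close to the original
mostly_noise, target = add_noise(clean, keep=0.01, rng=rng)  # almost pure noise
```

During training, the AI would be shown something like `mostly_noise` plus the caption and asked to guess `target`, the noise that was added.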

Now you give it a new text prompt that it’s never seen before, let it remove the noise step by step, and BAM: it produces a unique image out of random noise.
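The generation loop can be sketched in a few lines of Python too. This is a toy, not how Stable Diffusion is actually implemented: `fake_noise_predictor` is a hypothetical stand-in that cheats by already knowing the picture the prompt should lead to, whereas the real system uses a trained neural network guided by the text. What it does show is the shape of the process: start from random noise and remove a bit of predicted noise at every step.

```python
import random

def fake_noise_predictor(noisy, prompt_target):
    """Stand-in for the trained network. It 'predicts' the noise by cheating:
    it compares the noisy pixels with the image the prompt should produce."""
    return [x - t for x, t in zip(noisy, prompt_target)]

def generate(prompt_target, steps=50, strength=0.2, seed=0):
    """Start from pure noise and strip away predicted noise step by step."""
    rng = random.Random(seed)
    image = [rng.gauss(0.0, 1.0) for _ in prompt_target]  # pure random noise
    for _ in range(steps):
        predicted_noise = fake_noise_predictor(image, prompt_target)
        # Remove only a fraction of the predicted noise on each step.
        image = [x - strength * n for x, n in zip(image, predicted_noise)]
    return image

# Hypothetical 4-pixel "picture" that the prompt is supposed to describe.
target = [0.2, 0.8, -0.5, 0.0]
result = generate(target)
```

With these settings, `result` ends up very close to `target` even though the loop started from random numbers.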

How to get started?

There are three ways to get started quickly. The first is to use an online service. There are a number of sites where you can upload some images and get avatars back, and Discord communities like MidJourney where you can just join and enter your prompt. Many of these are free to try, with paid memberships.
The second way is to use a cloud provider. Google Colab is one of my favorites and the easiest place to run one of the Stable Diffusion scripts. It’ll download the files and let you run Stable Diffusion in the cloud. There are lots of variations, and you can customize them if you’re technical.
The third way is to install Stable Diffusion on your own computer. There are a number of scripts and tools that can do this. It seems to be best supported on PC, though Mac versions do exist. I’ve been using Stable Diffusion WebUI, though I’ve played with others.
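If you’re technical and want to script the second or third option yourself, one possibility is the Hugging Face `diffusers` library. The sketch below is not the WebUI mentioned above, and it rests on several assumptions of mine: that `torch` and `diffusers` are installed, that you have an NVIDIA graphics card, and that the `CompVis/stable-diffusion-v1-4` weights are available to download. The function name `generate_image` and its defaults are made up for this example.

```python
def generate_image(prompt, out_path="output.png",
                   model_id="CompVis/stable-diffusion-v1-4", steps=30):
    """Generate one image from a text prompt and save it to out_path."""
    # Imported here so this file still loads without the heavy libraries.
    import torch
    from diffusers import StableDiffusionPipeline

    # Downloads the model weights the first time it runs (several GB).
    pipe = StableDiffusionPipeline.from_pretrained(
        model_id, torch_dtype=torch.float16
    )
    pipe = pipe.to("cuda")  # assumption: an NVIDIA GPU is available

    # More steps usually means a cleaner image, at the cost of time.
    image = pipe(prompt, num_inference_steps=steps).images[0]
    image.save(out_path)
    return out_path
```

Calling `generate_image("a painting of a futuristic SUV")` would then write `output.png` to the current folder, assuming everything above is in place.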

Stable Diffusion is incredibly disruptive because of both the quality of the images it’s able to produce and the open nature of the technology. Thousands of variations have emerged in the last few months, and new startups are bringing advances faster than ever. I believe the core concept will be applied to new areas including 3D, audio, video, animation, architecture, interior design, and more.

Pros and Cons

The technology does have its limits. The current iterations are poor at producing certain elements of images, especially human hands and anything that includes text. Over time this should improve.
Stable Diffusion also requires people to understand how to construct prompts. Because we’re taking a low-resolution medium such as text and producing something as information-dense as an image, you need to get very specific in your prompt to get exceptional results.
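As a rough illustration of what "getting specific" can look like, here is a small Python helper that assembles a prompt from separate pieces. The function `build_prompt`, its parameters, and the example wording are my own for this sketch; there is no official prompt format, and different models respond differently to the same words.

```python
def build_prompt(subject, style=None, details=(), lighting=None):
    """Join the pieces of a prompt into one comma-separated string."""
    parts = [subject]
    if style:
        parts.append(style)
    parts.extend(details)
    if lighting:
        parts.append(lighting)
    return ", ".join(parts)

# A vague prompt leaves almost everything up to the model.
vague = build_prompt("a futuristic SUV")

# A specific prompt spells out style, setting, framing, and lighting.
specific = build_prompt(
    "a futuristic SUV",
    style="concept art",
    details=["parked on a rainy city street", "wide-angle view"],
    lighting="neon lighting",
)
```

Keeping the pieces separate like this makes it easy to change one thing at a time, say the style or the lighting, and compare the results.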

Resources

Online Stable Diffusion sites to get started –

Avatar Creators

Google Colab Notebooks

Library for Installing on your PC

