Text-to-Image with Stable Diffusion
How to easily generate images from text using Stable Diffusion on any Python
environment
Luís Fernando Torres
Image generated using Stable Diffusion
Introduction
Stable Diffusion is a text-to-image model trained on 512x512 images from a subset
of the LAION-5B dataset.
The goal of this notebook is to demonstrate how easily you can implement text-to-
image generation using the 🤗 Diffusers library, which is the go-to library for
state-of-the-art pretrained diffusion models for generating images, audio, and 3D
structures.
Before jumping into the code, however, we need to understand what exactly Stable Diffusion is.
What is Stable Diffusion?
Architecture of Stable Diffusion
Stable Diffusion is based on a type of diffusion model called Latent Diffusion, whose details can be seen in the paper High-Resolution Image Synthesis with Latent Diffusion Models.
Diffusion models are generative models that are trained to denoise an object, such as an image, to obtain a sample of interest. The model is trained to slightly denoise the image at each step until a clean sample is obtained. This process can be seen below:
Image denoising process
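To make the idea concrete, here is a toy numerical sketch of the process (using an assumed linear beta schedule, not Stable Diffusion's actual configuration): the forward process mixes Gaussian noise into an image, and a trained model learns to predict that noise so each denoising step can move the sample back toward a clean image.

```python
import numpy as np

# Toy diffusion sketch (assumed linear beta schedule, not Stable
# Diffusion's actual configuration).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 4))      # a tiny stand-in for an "image"
eps = rng.standard_normal(x0.shape)   # Gaussian noise

# Forward process: noise the image at step t in one shot.
t = 500
xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps

# A trained model predicts eps from xt; with the true eps we can
# recover x0 exactly, which is what each denoising step approximates.
x0_hat = (xt - np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alpha_bars[t])
assert np.allclose(x0_hat, x0)
```

In a real diffusion model, the noise is of course unknown at sampling time; a neural network is trained to estimate it at every step.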
These diffusion models have gained popularity in recent years, especially for their ability to achieve state-of-the-art results in generating image data. However, diffusion models can consume a lot of memory and be computationally expensive to work with.
Latent Diffusion, on the other hand, reduces complexity and memory usage by applying the diffusion process over a lower-dimensional latent space. In latent diffusion, the model is trained to generate compressed representations of images.
There are three main components in latent diffusion.
1. Autoencoder (VAE).
2. U-Net.
3. A text encoder.
Autoencoder (VAE)
The Variational Autoencoder architecture
The Variational Autoencoder (VAE) is a model that consists of both an encoder and a decoder. While the encoder converts the image into a low-dimensional latent representation to serve as input to the U-Net model, the decoder transforms the latent representation back into an image.
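To get a sense of how much the VAE compresses an image, consider Stable Diffusion v1, where the VAE downsamples by a factor of 8 in each spatial dimension and uses 4 latent channels:

```python
# Stable Diffusion v1's VAE maps a 512x512 RGB image to a 64x64 latent
# with 4 channels, so the U-Net works on a much smaller tensor.
image_shape = (3, 512, 512)                                   # channels, height, width
latent_shape = (4, image_shape[1] // 8, image_shape[2] // 8)  # (4, 64, 64)

pixels = image_shape[0] * image_shape[1] * image_shape[2]
latents = latent_shape[0] * latent_shape[1] * latent_shape[2]
print(latent_shape)        # (4, 64, 64)
print(pixels / latents)    # 48.0
```

This roughly 48x reduction in the number of values is what makes running the diffusion process in latent space so much cheaper than in pixel space.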
U-Net
The U-Net architecture
The U-Net is a convolutional neural network that is widely used in image segmentation tasks. It also has an encoder and a decoder, both comprised of ResNet blocks. The encoder compresses an image into a lower-resolution representation, while the decoder upsamples this representation back to the original, higher resolution, which is expected to be less noisy.
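The encoder-decoder shape of the U-Net can be sketched with plain array operations (a toy illustration only, with none of the real network's learned layers):

```python
import numpy as np

# Toy sketch of the U-Net idea (not a real network): downsample an
# "image", then upsample back and combine with a skip connection.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))

# Encoder: 2x2 average pooling halves the resolution.
down = x.reshape(4, 2, 4, 2).mean(axis=(1, 3))   # shape (4, 4)

# Decoder: nearest-neighbour upsampling restores the resolution.
up = down.repeat(2, axis=0).repeat(2, axis=1)    # shape (8, 8)

# Skip connection: the decoder reuses the encoder's features, which
# is what lets a U-Net recover fine detail after compression.
out = up + x
assert out.shape == x.shape
```

In the real model, each downsampling and upsampling stage is a stack of learned ResNet blocks rather than simple pooling.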
Text Encoder
How does a text encoder work?
The text-encoder is responsible for transforming the text input prompt into an
embedding space that can be understood by the U-Net. It is usually a simple
transformer-based encoder that maps a sequence of input tokens to a sequence of
latent text-embeddings.
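A heavily simplified sketch of that mapping (a toy lookup table, not the real CLIP encoder used by Stable Diffusion): the prompt is tokenized into integer ids, and each id is mapped to an embedding vector the U-Net can attend to.

```python
import numpy as np

# Toy text encoder sketch: token ids -> a sequence of embedding vectors.
# The vocabulary and embedding table here are made up for illustration.
vocab = {"<start>": 0, "sunset": 1, "on": 2, "a": 3, "beach": 4, "<end>": 5}
embed_dim = 8
rng = np.random.default_rng(0)
embedding_table = rng.standard_normal((len(vocab), embed_dim))

tokens = [vocab[w] for w in ["<start>", "sunset", "on", "a", "beach", "<end>"]]
text_embeddings = embedding_table[tokens]
print(text_embeddings.shape)   # (6, 8) -> one vector per token
```

The real encoder additionally runs these embeddings through transformer layers so each vector reflects the whole prompt, not just its own token.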
Stable Diffusion Pipeline with 🤗 Diffusers
The StableDiffusionPipeline is a pipeline created by the 🤗 Diffusers library that
allows us to generate images from text with just a few lines of code in Python. It
has many versions and checkpoints available, which you can take a look at by
visiting the Text-to-Image Generation page of the library documentation.
For this notebook, we are going to use Stable Diffusion version 1.4 (CompVis/stable-diffusion-v1-4). We are also using torch_dtype = torch.float16 to load the fp16 weights, which helps reduce memory usage.
Let's start our project by installing the diffusers library.
# Installing diffusers (transformers is needed by the Stable Diffusion pipeline)
!pip install diffusers transformers
We may now import all relevant libraries.
# Library imports
# Importing PyTorch library, for building and training neural networks
import torch
# Importing StableDiffusionPipeline to use pre-trained Stable Diffusion models
from diffusers import StableDiffusionPipeline
# Image is a class from the PIL module, used to visualize images in a Python notebook
from PIL import Image
Let’s create an instance of the pipeline.
The .from_pretrained("CompVis/stable-diffusion-v1-4") call will initialize the diffusion model with pretrained weights and settings, as well as a pretrained VAE, U-Net, and text encoder to generate images from text.
The torch_dtype = torch.float16 sets the datatype of the model to float16, a lower-
precision floating-point format, to help speed up inference.
# Creating pipeline
pipeline = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16
)
Now, we may define a function that is going to create and display a grid of images
generated with Stable Diffusion.
# Defining function for the creation of a grid of images
def image_grid(imgs, rows, cols):
    assert len(imgs) == rows * cols
    w, h = imgs[0].size
    grid = Image.new('RGB', size=(cols * w, rows * h))
    for i, img in enumerate(imgs):
        grid.paste(img, box=(i % cols * w, i // cols * h))
    return grid
Next, we use PyTorch’s to method to move our pipeline to the GPU, which speeds up inference.
# Moving pipeline to GPU
pipeline = pipeline.to('cuda')
Now, we can finally use Stable Diffusion to generate images from text!
In the code below, n_images defines how many images will be generated, while prompt is the text that will be used to generate them.
n_images = 6 # Let's generate 6 images based on the prompt below
prompt = ['Sunset on a beach'] * n_images
images = pipeline(prompt).images
grid = image_grid(images, rows=2, cols = 3)
grid
Sunset on a beach
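The images list returned by the pipeline contains standard PIL images, so each output can also be saved to disk with PIL's save method. In this self-contained sketch, blank placeholder images stand in for real pipeline outputs, and the filenames are just an example:

```python
from PIL import Image

# In a real run, `images` would come from pipeline(prompt).images;
# blank placeholders are used here so the snippet runs on its own.
images = [Image.new('RGB', (512, 512)) for _ in range(2)]

# Saving each image with a numbered filename
for i, img in enumerate(images):
    img.save(f"generated_{i}.png")
```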
n_images = 6
prompt = ['Portrait of Napoleon Bonaparte'] * n_images
images = pipeline(prompt).images
grid = image_grid(images, rows=2, cols = 3)
grid
Portrait of Napoleon Bonaparte
n_images = 6
prompt = ['Skyline of a cyberpunk megalopolis'] * n_images
images = pipeline(prompt).images
grid = image_grid(images, rows=2, cols = 3)
grid
Skyline of a cyberpunk megalopolis
n_images = 6
prompt = ['Painting of a woman in the style of Van Gogh'] * n_images
images = pipeline(prompt).images
grid = image_grid(images, rows=2, cols = 3)
grid
Painting of a woman in the style of Van Gogh
n_images = 6
prompt = ['Picture of an astronaut in space'] * n_images
images = pipeline(prompt).images
grid = image_grid(images, rows=2, cols = 3)
grid
Picture of an astronaut in space
n_images = 6
prompt = ['Renaissance marble bust sculpture'] * n_images
images = pipeline(prompt).images
grid = image_grid(images, rows=2, cols = 3)
grid
Renaissance marble bust sculpture
Thank you for reading,
Luís Fernando Torres
LinkedIn
Kaggle