Text-to-Image Generation with Python

Text-to-Image with Stable Diffusion


How to easily generate images from text using Stable Diffusion on any Python
environment

Luís Fernando Torres


6 min read · 18 hours ago


Image generated using Stable Diffusion

Introduction
Stable Diffusion is a text-to-image model trained on 512x512 images from a subset
of the LAION-5B dataset.

The goal of this notebook is to demonstrate how easily you can implement text-to-
image generation using the 🤗 Diffusers library, which is the go-to library for
state-of-the-art pretrained diffusion models for generating images, audio, and 3D
structures.

Before jumping into the code, however, we need to understand what exactly Stable
Diffusion is.

What is Stable Diffusion?

Architecture of Stable Diffusion

Stable Diffusion is based on a type of diffusion model called Latent Diffusion,
the details of which can be found in the paper High-Resolution Image Synthesis
with Latent Diffusion Models.

Diffusion models are a class of generative models trained to denoise an object,
such as an image, to obtain a sample of interest. At each step, the model removes
a small amount of noise, until a clean sample is obtained. This process can be
seen below:

Image denoising process
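The forward (noising) half of this process is simple enough to sketch in plain Python; the learned model's job is to reverse it, one small step at a time. Everything below (the function name, the linear noise schedule, the 1-D "image") is illustrative, not taken from the Stable Diffusion codebase:

```python
import math
import random

random.seed(0)

T = 1000
# Linear noise schedule: the variance added per step grows with t
betas = [1e-4 + (0.02 - 1e-4) * t / (T - 1) for t in range(T)]

def forward_diffusion(x0, timestep):
    """Closed-form DDPM-style sample of the noised x_t given the clean x_0."""
    alpha_bar = 1.0
    for t in range(timestep + 1):
        alpha_bar *= 1.0 - betas[t]
    return [
        math.sqrt(alpha_bar) * x + math.sqrt(1.0 - alpha_bar) * random.gauss(0, 1)
        for x in x0
    ]

x0 = [1.0] * 64                      # a trivially simple "image"
x_early = forward_diffusion(x0, 10)  # barely corrupted
x_late = forward_diffusion(x0, 999)  # almost pure noise

def dev(x):
    # Mean absolute deviation from the clean image
    return sum(abs(a - b) for a, b in zip(x, x0)) / len(x0)

print(dev(x_early) < dev(x_late))  # True: corruption grows with the timestep
```

A trained diffusion model learns to predict (and subtract) the noise at each step, walking this corruption backwards from pure noise to a sample.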


These diffusion models have gained popularity in recent years, especially for their
ability to achieve state-of-the-art results in generating image data. However,
diffusion models can consume a lot of memory and be computationally expensive
to work with.

Latent Diffusion, on the other hand, reduces complexity and memory usage by
applying the diffusion process over a lower-dimensional latent space. In latent
diffusion, the model is trained to generate compressed representations of images.

There are three main components in latent diffusion.

1. Autoencoder (VAE).

2. U-Net.

3. A text-encoder.

Autoencoder (VAE)

The Variational Autoencoder architecture

The Variational Autoencoder (VAE) is a model that consists of both an encoder and
a decoder. The encoder converts the image into a low-dimensional latent
representation, which serves as input to the U-Net model, while the decoder
transforms the latent representation back into an image.

U-Net
The U-Net architecture

The U-Net is a convolutional neural network widely used in image
segmentation tasks. It also has an encoder and a decoder, both comprised of
ResNet blocks. The encoder compresses an image into a lower-resolution
representation, while the decoder decodes this lower-resolution representation
back to the original, higher resolution, which is expected to be less noisy.

Text Encoder

How does a text encoder work?

The text encoder is responsible for transforming the input text prompt into an
embedding space that the U-Net can understand. It is usually a simple
transformer-based encoder that maps a sequence of input tokens to a sequence of
latent text embeddings.
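As a toy sketch of the encoder's front end: the prompt is split into tokens, mapped to integer ids, and padded to a fixed length before the transformer turns the ids into embeddings. The vocabulary and function below are made up purely for illustration (the real model uses a CLIP tokenizer with a much larger vocabulary):

```python
# Made-up vocabulary for illustration only
vocab = {"<pad>": 0, "sunset": 1, "on": 2, "a": 3, "beach": 4}

def tokenize(prompt, max_length=8):
    # Map each word to its id, then pad the sequence to a fixed length
    ids = [vocab[word] for word in prompt.lower().split()]
    ids += [vocab["<pad>"]] * (max_length - len(ids))
    return ids

print(tokenize("Sunset on a beach"))  # [1, 2, 3, 4, 0, 0, 0, 0]
```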

Stable Diffusion Pipeline with 🤗 Diffusers


The StableDiffusionPipeline is a pipeline from the 🤗 Diffusers library that
allows us to generate images from text with just a few lines of Python code. It
has many versions and checkpoints available, which you can take a look at by
visiting the Text-to-Image Generation page of the library documentation.

For this notebook, we are going to use Stable Diffusion version 1.4
(CompVis/stable-diffusion-v1-4). We also pass torch_dtype = torch.float16 to
load the fp16 weights, which helps reduce memory usage.
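As a back-of-the-envelope estimate of what fp16 saves (the ~860M U-Net parameter count used here is the commonly cited figure for Stable Diffusion v1, assumed purely for illustration):

```python
# Each fp16 parameter takes 2 bytes instead of fp32's 4 bytes,
# so half-precision weights halve the memory footprint.
unet_params = 860_000_000
fp32_gib = unet_params * 4 / 1024**3
fp16_gib = unet_params * 2 / 1024**3
print(round(fp32_gib, 1), round(fp16_gib, 1))  # roughly 3.2 and 1.6
```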

Let's start our project by installing the 🤗 Diffusers library, along with 🤗
Transformers, which the pipeline relies on for its text encoder.

# Installing diffusers and transformers

!pip install diffusers transformers

We may now import all relevant libraries.

# Library imports

# Importing PyTorch library, for building and training neural networks


import torch

# Importing StableDiffusionPipeline to use pre-trained Stable Diffusion models


from diffusers import StableDiffusionPipeline

# Image is a class from the PIL module, used to visualize images in a Python notebook
from PIL import Image

Let’s create an instance of the pipeline.

The .from_pretrained("CompVis/stable-diffusion-v1-4") call initializes the
diffusion model with pretrained weights and settings, as well as a pretrained
VAE, U-Net, and text encoder, to generate images from text.
The torch_dtype = torch.float16 argument sets the model's datatype to float16, a
lower-precision floating-point format, to help speed up inference.

# Creating pipeline
pipeline = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4",
                                                   torch_dtype=torch.float16)

Now, we may define a function that is going to create and display a grid of images
generated with Stable Diffusion.

# Defining function for the creation of a grid of images


def image_grid(imgs, rows, cols):
    assert len(imgs) == rows * cols

    w, h = imgs[0].size
    grid = Image.new('RGB', size=(cols * w, rows * h))

    for i, img in enumerate(imgs):
        grid.paste(img, box=(i % cols * w, i // cols * h))
    return grid
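The paste offsets used above can be checked in isolation with a pure-Python sketch (grid_offsets is a hypothetical helper written just for this check, not part of the notebook):

```python
def grid_offsets(n, cols, w, h):
    # Image i in a rows x cols grid of w x h tiles lands at
    # (i % cols * w, i // cols * h), filling the grid row by row.
    return [(i % cols * w, i // cols * h) for i in range(n)]

print(grid_offsets(6, 3, 512, 512))
# [(0, 0), (512, 0), (1024, 0), (0, 512), (512, 512), (1024, 512)]
```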

Next, we use PyTorch’s to method to move our pipeline to the GPU, which speeds
up inference.

# Moving pipeline to GPU


pipeline = pipeline.to('cuda')

Now, we can finally use Stable Diffusion to generate images from text!

In the code below, n_images defines how many images will be generated, while
prompt is the text prompt from which the images will be generated.
n_images = 6 # Let's generate 6 images based on the prompt below
prompt = ['Sunset on a beach'] * n_images

images = pipeline(prompt).images

grid = image_grid(images, rows=2, cols=3)
grid

Sunset on a beach

n_images = 6
prompt = ['Portrait of Napoleon Bonaparte'] * n_images

images = pipeline(prompt).images

grid = image_grid(images, rows=2, cols=3)
grid
Portrait of Napoleon Bonaparte

n_images = 6
prompt = ['Skyline of a cyberpunk megalopolis'] * n_images

images = pipeline(prompt).images

grid = image_grid(images, rows=2, cols=3)
grid
Skyline of a cyberpunk megalopolis

n_images = 6
prompt = ['Painting of a woman in the style of Van Gogh'] * n_images

images = pipeline(prompt).images

grid = image_grid(images, rows=2, cols=3)
grid
Painting of a woman in the style of Van Gogh

n_images = 6
prompt = ['Picture of an astronaut in space'] * n_images

images = pipeline(prompt).images

grid = image_grid(images, rows=2, cols=3)
grid
Picture of an astronaut in space

n_images = 6
prompt = ['Renaissance marble bust sculpture'] * n_images

images = pipeline(prompt).images

grid = image_grid(images, rows=2, cols=3)
grid
Renaissance marble bust sculpture

Thank you for reading,

Luís Fernando Torres

LinkedIn

Kaggle
