Deep Generative Models - 1
Variational Autoencoders (VAEs)
The machine learning framework
y = f(x)
y: output (prediction)   f: prediction function   x: input (image features)
• Training: given a training set of labeled examples {(x1,y1), …, (xN,yN)}, estimate the prediction function f by minimizing the prediction error on the training set
• Testing: apply f to a never-before-seen test example x and output the predicted value y = f(x)
Slide credit: L. Lazebnik
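The training/testing recipe above can be sketched with a toy model. The 1-D linear model, the synthetic data, and the least-squares fit below are illustrative assumptions, not from the slides:

```python
# Minimal sketch of the supervised-learning recipe: estimate f from
# labeled pairs by minimizing prediction error, then apply f to new x.
import numpy as np

# Training set {(x_i, y_i)}: here y = 2x + 1 plus a little noise.
rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 1.0, 20)
y_train = 2.0 * x_train + 1.0 + 0.01 * rng.standard_normal(20)

# "Estimate f by minimizing the prediction error": least squares.
A = np.stack([x_train, np.ones_like(x_train)], axis=1)
(w, b), *_ = np.linalg.lstsq(A, y_train, rcond=None)

def f(x):
    return w * x + b

# "Testing": apply f to a never-before-seen test example.
y_pred = f(0.5)
```

The fitted parameters should land near the true (w, b) = (2, 1) that generated the data.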
Structured Learning
Machine learning is about finding a function f
Regression: output a scalar
Classification: output a “class” (one-hot vector)
Class 1: (1, 0, 0)   Class 2: (0, 1, 0)   Class 3: (0, 0, 1)
Structured Learning/Prediction: output a
sequence, a matrix, a graph, a tree ……
Output is composed of components with dependency
Output Sequence
Machine Translation
“機器學習及其深層與結構化” (sentence of language 1) → “Machine learning and having it deep and structured” (sentence of language 2)
Speech Recognition
(speech) → “感謝大家來上課” (“Thank you all for coming to class”) (transcription)
Chat-bot
“How are you?” (what a user says) → “I’m fine.” (response of machine)
Output Matrix
Image to Image: Colorization
Ref: [Link]
Text to Image
“this white and yellow flower has thin white petals and a round yellow stamen”
ref: [Link]
What does Deep Learning (DL) offer?
Deep learning is the machine-learning subfield concerned with learning representations of data.
Deep learning algorithms attempt to learn multiple levels of representation by using a hierarchy of layers.
Given large amounts of data, such a system begins to capture useful structure and respond in useful ways.
[Link]
Deep Learning in One Slide (Review)
Many kinds of network structures:
• Fully connected feedforward network
• Convolutional neural network (CNN)
• Recurrent neural network (RNN)
Different networks can take different kinds of input/output: vectors, matrices, or sequences (speech, video, sentences).
How to find the function? Given the example inputs/outputs as training data: {(x1,y1), (x2,y2), …, (x1000,y1000)}
Two deep models:
• Predictive: predict the category/class of an object by learning an inference model. Example: CNN
• Generative: generate a new object that closely matches those in a limited training set, by closely approximating the underlying (but unknown) probability distribution of the set of objects. Example: VAE, GAN
So what are these Deep Nets in a nutshell?.....
News Feature: What are the limits of deep learning?
M. Mitchell Waldrop, Proceedings of the National Academy of Sciences, Jan 2019, 116 (4): 1074-1077; DOI: 10.1073/pnas.1821594116
The 2012 Breakthrough in Predictive Learning….
• A Krizhevsky, I Sutskever, and GE Hinton, “Imagenet classification with
deep convolutional neural networks”, Advances in neural information
processing systems (NIPS 2012), 1097-1105
Has 42,000+ citations by now!!
Reduced the top-5 error rate on ImageNet in the ILSVRC competition to about 16%.
The architecture became known as AlexNet.
What happens inside a Convolutional Net….?
Liu et al., A survey of deep neural network architectures and their applications. Neurocomputing 234: 11-26 (2017)
The standard autoencoder: An autoencoder is actually a pair of connected networks, an encoder and a decoder. The encoder network takes in an input and converts it into a smaller, dense representation, which the decoder network can use to convert it back to the original input.
The convolutional layers of any CNN take in a large image (e.g. a rank-3 tensor of size 299x299x3) and convert it to a much more compact, dense representation (e.g. a rank-1 tensor of size 1000). This dense representation is then used by the fully connected classifier network to classify the image.
The encoder is similar: it is simply a network that takes in an input and produces a much smaller representation (the encoding) that contains enough information for the next part of the network to process it into the desired output format. Typically, the encoder is trained together with the other parts of the network, optimized via back-propagation, to produce encodings specifically useful for the task at hand. In CNNs, the 1000-dimensional encodings produced are specifically useful for classification.
Autoencoders take this idea, and slightly flip it on its head, by making the encoder generate encodings
specifically useful for reconstructing its own input.
A simple neural network based autoencoder architecture:
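Concretely, the data flow through such an architecture can be sketched as below. The layer sizes (784 → 32 → 784), the tanh nonlinearity, and the random (untrained) weights are illustrative assumptions; a real model would be trained with back-propagation to minimize the reconstruction error:

```python
# Minimal numpy sketch of an encoder/decoder pair: a wide input is
# squeezed into a small dense code, then expanded back to input size.
import numpy as np

rng = np.random.default_rng(0)
input_dim, code_dim = 784, 32

# Untrained weights, just to show the shapes and the data flow.
W_enc = 0.01 * rng.standard_normal((input_dim, code_dim))
W_dec = 0.01 * rng.standard_normal((code_dim, input_dim))

def encoder(x):
    return np.tanh(x @ W_enc)        # compact, dense representation

def decoder(z):
    return z @ W_dec                 # reconstruction of the input

x = rng.standard_normal(input_dim)   # stand-in for a flattened image
z = encoder(x)                       # shape (32,)
x_hat = decoder(z)                   # shape (784,)

# Training would minimize this quantity over the dataset.
reconstruction_error = float(np.mean((x - x_hat) ** 2))
```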
Because neural networks are capable of learning nonlinear relationships, this can be thought of as a more
powerful (nonlinear) generalization of PCA. Whereas PCA attempts to discover a lower dimensional hyperplane
which describes the original data, autoencoders are capable of learning nonlinear manifolds (a manifold is
defined in simple terms as a continuous, non-intersecting surface). The difference between these two approaches
is visualized below.
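The linear PCA baseline mentioned above can be sketched numerically: project centered data onto its top principal components and reconstruct. The made-up 3-D data below (points lying near a line) and the choice of one component are illustrative:

```python
# PCA via SVD: find the lower-dimensional hyperplane (here a line)
# that best describes the data, then reconstruct from the projection.
import numpy as np

rng = np.random.default_rng(1)
t = rng.standard_normal(100)
# Data that lies almost on a line in 3-D, plus small noise.
X = np.stack([t, 2 * t, -t], axis=1) + 0.01 * rng.standard_normal((100, 3))

Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

k = 1                                   # dimension of the hyperplane
Z = Xc @ Vt[:k].T                       # linear "encoding"
X_hat = Z @ Vt[:k] + X.mean(axis=0)     # linear "decoding"

pca_error = float(np.mean((X - X_hat) ** 2))
```

Because the data really is almost one-dimensional, a single component reconstructs it nearly perfectly; a nonlinear autoencoder plays the same role for curved manifolds.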
The problem with standard autoencoders:
Standard autoencoders learn to generate compact representations and reconstruct their inputs well, but aside from a few applications like denoising autoencoders, they are fairly limited.
The fundamental problem with autoencoders, for generation, is that the latent space they convert their inputs to, where their encoded vectors lie, may not be continuous or allow easy interpolation.
For example, training an autoencoder on the
MNIST dataset, and visualizing the encodings
from a 2D latent space reveals the formation of
distinct clusters. This makes sense, as distinct
encodings for each image type makes it far
easier for the decoder to decode them. This is
fine if you’re just replicating the same images.
But when you’re building a generative model, you don’t want to merely replicate the image you put in. You want to randomly sample from the latent space, or generate variations on an input image, from a continuous latent space.
If the space has discontinuities (e.g. gaps
between clusters) and you sample/generate a
variation from there, the decoder will simply
generate an unrealistic output, because the
decoder has no idea how to deal with that
region of the latent space. During training,
it never saw encoded vectors coming from that
region of latent space.
Let’s suppose we've trained an autoencoder model on a large dataset of faces with an encoding dimension of 6.
An ideal autoencoder will learn descriptive attributes of faces such as skin color, whether or not the person is
wearing glasses, etc. in an attempt to describe an observation in some compressed representation.
Slides made from Jeremy Jordan’s blog
In the last example, we've described the input image in terms of its latent attributes using a single value to
describe each attribute. However, we may prefer to represent each latent attribute as a range of possible
values. For instance, what single value would you assign for the smile attribute if you feed in a photo of the
Mona Lisa? Using a variational autoencoder, we can describe latent attributes in probabilistic terms.
With this approach, we'll now represent each latent attribute for a given input as a probability distribution.
When decoding from the latent state, we'll randomly sample from each latent state distribution to generate a
vector as input for our decoder model.
By constructing our encoder model to output a range of possible values (a statistical distribution) from which
we'll randomly sample to feed into our decoder model, we're essentially enforcing a continuous, smooth
latent space representation. For any sampling of the latent distributions, we're expecting our decoder model
to be able to accurately reconstruct the input. Thus, values which are nearby to one another in latent space should
correspond with very similar reconstructions.
Variational Autoencoders (VAEs) have one fundamentally
unique property that separates them from vanilla
autoencoders, and it is this property that makes them so
useful for generative modeling: their latent spaces are, by
design, continuous, allowing easy random sampling and
interpolation.
It achieves this by doing something that seems rather surprising at first: its encoder does not output a single encoding vector of size n; instead, it outputs two vectors of size n: a vector of means, μ, and a vector of standard deviations, σ.
[Link]
They form the parameters of a vector of random variables of length n, with the i-th elements of μ and σ being the mean and standard deviation of the i-th random variable, X_i, from which we sample to obtain the sampled encoding that we pass onward to the decoder:
This stochastic generation means that, even for the same input, while the mean and standard deviation remain the same, the actual encoding will vary somewhat on every single pass, simply due to sampling.
• Intuitively, the mean vector controls where the encoding of an input should be centered, while the standard deviation controls the “area”: how much the encoding can vary from the mean.
• As encodings are generated at random from anywhere inside the “circle” (the distribution), the decoder learns that not only does a single point in latent space refer to a sample of that class, but all nearby points do as well.
• This allows the decoder to not just decode single, specific encodings in the latent space (leaving the
decodable latent space discontinuous), but ones that slightly vary too, as the decoder is exposed to a range
of variations of the encoding of the same input during training.
• The model is now exposed to a certain degree of local variation by varying the encoding of one
sample, resulting in smooth latent spaces on a local scale, that is, for similar samples.
• Ideally, we want overlap between samples that are not very similar too, in order to
interpolate between classes.
The regularity that is expected from the latent space in order to make the generative process possible can be expressed through two main properties: continuity (two close points in the latent space should not give two completely different contents once decoded) and completeness (for a chosen distribution, a point sampled from the latent space should give “meaningful” content once decoded).
The fact that VAEs encode inputs as distributions instead of simple points is not, on its own, sufficient to ensure continuity and completeness. Without a well-defined regularization term, the model can learn, in order to minimize its reconstruction error, to “ignore” the fact that distributions are returned and behave almost like a classic autoencoder (leading to overfitting). To do so, the encoder can either return distributions with tiny variances (which tend toward point-like distributions) or return distributions with very different means (which would then be very far apart from each other in the latent space). In both cases, the distributions are used the wrong way (cancelling the expected benefit), and continuity and/or completeness are not satisfied.
So, in order to avoid these effects, we have to regularize both the covariance matrix and the mean of the distributions returned by the encoder. In practice, this regularization is done by enforcing the distributions to be close to a standard normal distribution (zero mean and unit variance). This way, we require the covariance matrices to be close to the identity, preventing point-like distributions, and the means to be close to 0, preventing the encoded distributions from being too far apart from each other.
Regularization tends to create a “gradient” over the
information encoded in the latent space.
However, since there are no limits on what values vectors μ and σ can take on, the encoder can learn to
generate very different μ for different classes, clustering them apart, and minimize σ, making sure the
encodings themselves don’t vary much for the same sample (that is, less uncertainty for the decoder). This
allows the decoder to efficiently reconstruct the training data.
What we ideally want are encodings, all of which are as close as
possible to each other while still being distinct, allowing smooth
interpolation, and enabling the construction of new samples.
In order to force this, we introduce the Kullback–Leibler
divergence (KL divergence) into the loss function. The KL
divergence between two probability distributions simply measures
how much they diverge from each other. Minimizing the KL
divergence here means optimizing the probability distribution
parameters (μ and σ) to closely resemble that of the target
distribution.
For the discrete case:  D_KL(P ‖ Q) = Σ_x P(x) log( P(x) / Q(x) )
For the continuous case:  D_KL(p ‖ q) = ∫ p(x) log( p(x) / q(x) ) dx
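A numerical sketch of the discrete KL divergence, together with the closed form used in VAEs for KL between a diagonal Gaussian N(μ, σ²) and the standard normal N(0, I), which is 0.5 · Σ(σ² + μ² − 1 − log σ²). The example distributions below are made up:

```python
# KL divergence: how much one distribution diverges from another.
import numpy as np

def kl_discrete(p, q):
    # Assumes p and q are strictly positive and sum to 1.
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(p * np.log(p / q)))

def kl_gaussian_vs_standard_normal(mu, sigma):
    # Closed form for KL( N(mu, diag(sigma^2)) || N(0, I) ).
    mu, sigma = np.asarray(mu, float), np.asarray(sigma, float)
    return float(0.5 * np.sum(sigma**2 + mu**2 - 1.0 - np.log(sigma**2)))

# KL of a distribution with itself is zero...
same = kl_discrete([0.5, 0.5], [0.5, 0.5])
# ...and positive when the distributions differ.
different = kl_discrete([0.9, 0.1], [0.5, 0.5])

# Matching the standard normal exactly also gives zero KL.
zero_kl = kl_gaussian_vs_standard_normal(mu=[0.0], sigma=[1.0])
```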
Homework:
Intuitively, this loss encourages the
encoder to distribute all encodings
(for all types of inputs, eg. all MNIST
numbers), evenly around the center
of the latent space. If it tries to
“cheat” by clustering them apart
into specific regions, away from
the origin, it will be penalized.
Now, using purely the KL loss results in a latent space with encodings placed densely and randomly near its center, with little regard for similarity among nearby encodings. The decoder finds it impossible to decode anything meaningful from this space, simply because there really isn’t any meaning.
Optimizing the two
together, however, results
in the generation of a
latent space which
maintains the similarity of
nearby encodings on
the local scale via
clustering, yet globally, is
very densely packed near
the latent space origin
(compare the axes with
the original).
Intuitively, this is the equilibrium reached
by the cluster-forming nature of the
reconstruction loss, and the dense
packing nature of the KL loss, forming
distinct clusters the decoder can decode.
This is great, as it means when randomly
generating, if you sample a vector from the
same prior distribution of the encoded
vectors, N(0, I), the decoder will
successfully decode it. And if you’re
interpolating, there are no sudden gaps
between clusters, but a smooth mix of
features a decoder can understand.
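The equilibrium described above comes from optimizing both terms at once. A toy numerical sketch of the combined objective follows; using MSE as the reconstruction loss is an illustrative choice (the original VAE uses a likelihood-based term), and all values are made up:

```python
# VAE objective: reconstruction loss + KL regularizer toward N(0, I).
import numpy as np

def vae_loss(x, x_hat, mu, sigma, beta=1.0):
    recon = np.mean((x - x_hat) ** 2)                          # reconstruction
    kl = 0.5 * np.sum(sigma**2 + mu**2 - 1.0 - np.log(sigma**2))  # KL term
    return float(recon + beta * kl)

x = np.array([0.2, 0.8, 0.5])            # input (hypothetical)
x_hat = np.array([0.25, 0.75, 0.5])      # decoder output (hypothetical)
mu = np.array([0.1, -0.2])               # encoder means (hypothetical)
sigma = np.array([0.9, 1.1])             # encoder std devs (hypothetical)

loss = vae_loss(x, x_hat, mu, sigma)
```

A perfect reconstruction with encodings matching N(0, I) exactly would drive both terms, and hence the total loss, to zero.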
For latent space
visualizations, we can
train a VAE with 2-D latent
variables (though this
space is generally too
small for the intrinsic
dimensionality of
real-world data). Picturing
this compressed latent
space lets us see how the
model has disentangled
complex raw data into
abstract higher-order
features.
This is how the
encoder/inference network
learns to map the training
set from the input data
space to the latent
space…
[Link]
…and this is how the decoder/generative network learns to map latent coordinates into reconstructions of the
original data space:
Here we are sampling
evenly-spaced percentiles along
the latent manifold and plotting
their corresponding output from the
decoder, with the same axis labels
as above.
❑ This tableau highlights the overall smoothness of the latent manifold—and how any “unrealistic”
outputs from the generative decoder correspond to apparent discontinuities in the variational
posterior of the encoder (e.g. between the “7-space” and the “1-space”). These gaps could
probably be improved by experimenting with model hyperparameters.
❑ Whereas the original data dotted a sparse landscape in 784 dimensions, where “realistic” images
were few and far between, this 2-dimensional latent manifold is densely populated with such
samples. Beyond its inherent visual coolness, latent space smoothness shows the model’s ability to
leverage its “understanding” of the underlying data-generating process to generalize beyond the
training set.
❑ Smooth interpolation within and between digits—in contrast to the spotty latent space characteristic
of many autoencoders—is a direct result of the variational regularization intrinsic to VAEs.
When using generative models, you might simply want to generate a random, new output that looks similar to the training data, and you can certainly do that with VAEs. But more often, you’d like to alter or explore variations on data you already have, and not just in a random way, but in a desired, specific direction. This is where VAEs work particularly well compared with other generative methods.
And that’s
it…..