GANs: Concepts and Training Methods

Generative adversarial networks (GANs) are a class of machine learning frameworks in which two neural networks compete against each other in a game. One network generates new data instances, while the other evaluates them for authenticity. They are trained through an adversarial process: the generating network incorporates feedback from the discriminating network to improve the quality of new instances, while the discriminating network becomes better at distinguishing fake instances from real ones. GANs have been used to generate highly realistic images, videos, text, and more.

ONE WEEK ATAL FDP

Convolutional Neural Networks with Generative Adversarial Networks

Conceptual View of Generative Adversarial Networks

13th December 2023 (Wednesday)

Organized by
Department of Information Technology
Sir C R Reddy College of Engineering,
Eluru, Andhra Pradesh, India.

[Link]/view/rajkumars1987
Dr. Rajkumar S
School of Computer Science and Engineering
Vellore Institute of Technology (Vellore)
Tamil Nadu, India – 632 014.
rajkumarsrajkumar@[Link]
Presentation Outline

Introduction to GANs

Some Challenges with GANs

Applications of GANs

Advanced GAN Extensions

Demo
Introduction to GAN

"This (GANs), and the variations that are now being proposed, is the most interesting idea in the last 10 years in ML, in my opinion."

– Yann LeCun
Introduction to GAN

• GANs were first introduced by Ian Goodfellow et al. in 2014.
• They have been used to generate images, videos, poems, and some simple conversation.
• Note: image processing is easy (all animals can do it); NLP is hard (only humans can do it).

Ian Goodfellow: [Link]
Radford (voices are also generated here): [Link]
Tips for training GANs: [Link]
WHAT ARE GANS?

Generative Adversarial Networks

Generative Models
We try to learn the underlying distribution from which our dataset comes.
E.g., Variational AutoEncoders (VAE)

Neural Networks
Both the Generator and the Discriminator are neural networks.

Adversarial Training
GANs are made up of two competing networks (adversaries) that are trying to beat each other.
WHAT ARE GANS?

  P(z) --> Generator --> Generated Data --+
                                          +--> Discriminator --> Real/Fake?
                              Real Data --+
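The data flow above can be sketched in a few lines (a toy numpy illustration, not any specific architecture: the linear generator and logistic-regression discriminator below are hypothetical stand-ins for real networks):

```python
import numpy as np

rng = np.random.default_rng(0)

def generator(z, W_g):
    # Map latent noise z to "generated data" with a single linear layer (toy stand-in).
    return z @ W_g

def discriminator(x, w_d):
    # Output a probability that x is real (logistic-regression stand-in).
    return 1.0 / (1.0 + np.exp(-(x @ w_d)))

latent_dim, data_dim = 4, 2
W_g = rng.normal(size=(latent_dim, data_dim))
w_d = rng.normal(size=data_dim)

z = rng.normal(size=(8, latent_dim))   # P(z): random noise
fake = generator(z, W_g)               # Generator --> generated data
p_real = discriminator(fake, w_d)      # Discriminator --> Real/Fake score
```

Real data would be fed through the same `discriminator` call, so the two branches of the diagram meet at a single binary classifier.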
HOW TO TRAIN A GAN?

At t = 0:

  Latent Vector --> Generator --> Generated Image (fake data) --+
                                                                +--> Discriminator --> Real/Fake?
  Training Data (real data) ------------------------------------+

  (The Discriminator is a binary classifier.)
HOW TO TRAIN A GAN?

Which network should I train first? The Discriminator!

But with what training data?

• The Discriminator is a binary classifier.
• The Discriminator has two classes: Real and Fake.
• The data for the Real class is already given: THE TRAINING DATA.
• The data for the Fake class? Generate it from the Generator.
HOW TO TRAIN A GAN?

What's next? Train the Generator.

But how? What's our training objective?

Generate images from the Generator such that they are classified incorrectly by the Discriminator!
HOW TO TRAIN A GAN?

Step 1: Train the Discriminator using the current ability of the Generator.
Step 2: Train the Generator to beat the Discriminator.

Generate images from the Generator such that they are classified incorrectly by the Discriminator!
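The two-step loop above can be sketched on a toy 1-D problem (a minimal numpy illustration under assumed settings: real data drawn from N(4, 1), a two-parameter linear generator, and a logistic-regression discriminator; all names are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(42)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Toy 1-D setup: real data ~ N(4, 1); generator g(z) = theta_g[0]*z + theta_g[1].
theta_g = np.array([1.0, 0.0])   # generator parameters (scale, shift)
theta_d = np.array([0.0, 0.0])   # discriminator parameters (weight, bias)
lr = 0.05

def d_out(x):
    return sigmoid(theta_d[0] * x + theta_d[1])

for step in range(200):
    z = rng.normal(size=32)
    real = rng.normal(loc=4.0, size=32)
    fake = theta_g[0] * z + theta_g[1]

    # Step 1: update the Discriminator to tell real (label 1) from fake (label 0).
    dr, df = d_out(real), d_out(fake)
    grad_w = -np.mean((1 - dr) * real) + np.mean(df * fake)
    grad_b = -np.mean(1 - dr) + np.mean(df)
    theta_d -= lr * np.array([grad_w, grad_b])

    # Step 2: update the Generator so its samples get classified as real
    # (non-saturating loss: minimize -log D(G(z))).
    df = d_out(fake)
    dloss_dfake = -(1 - df) * theta_d[0]
    theta_g -= lr * np.array([np.mean(dloss_dfake * z), np.mean(dloss_dfake)])
```

After training, the generator's shift parameter has been pushed toward the real-data mean, since generated samples near 4 are the ones the discriminator mistakes for real.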
Why Generative Models?

• We’ve only seen discriminative models so far


• Given an image X, predict a label Y
• Estimates P(Y|X)

• Discriminative models have several key limitations


• Can’t model P(X), i.e. the probability of seeing a certain image
• Thus, can’t sample from P(X), i.e. can’t generate new images

• Generative models (in general) address all of the above


• Can model P(X)
• Can generate new images
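The contrast can be made concrete with the simplest possible generative model (a hypothetical toy: a single Gaussian fitted to 1-D data) — unlike a discriminative model, it can both evaluate P(X) and sample new points:

```python
import numpy as np

rng = np.random.default_rng(0)

# "Training data": values drawn from an unknown distribution.
X = rng.normal(loc=170.0, scale=8.0, size=1000)

# A generative model here is just a fitted Gaussian: it models P(X)...
mu, sigma = X.mean(), X.std()

def p_x(x):
    # Density of the fitted model at x -- something a discriminative model cannot give.
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# ...and it can generate new data points.
new_samples = rng.normal(loc=mu, scale=sigma, size=5)
```

GANs play the same role for far richer distributions (images, video), where no closed-form density is available.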
Magic of GANs…

Lotter, William, Gabriel Kreiman, and David Cox. "Unsupervised learning of visual structure using predictive generative networks." arXiv preprint arXiv:1511.06380 (2015).
Magic of GANs…

Which one is Computer generated?

Ledig, Christian, et al. "Photo-realistic single image super-resolution using a generative adversarial network." arXiv preprint arXiv:1609.04802 (2016).
Magic of GANs…

[Link]
Adversarial Training
• We saw:
• We can generate adversarial samples to fool a discriminative model
• We can use those adversarial samples to make models robust
• We then require more effort to generate adversarial samples
• Repeating this gives a better discriminative model

• GANs extend that idea to generative models:

• Generator: generates fake samples and tries to fool the Discriminator
• Discriminator: tries to distinguish between real and fake samples
• Train them against each other
• Repeating this gives a better Generator and a better Discriminator
GAN's Architecture

  z --> Generator G --> G(z) --> Discriminator D --> D(G(z))
  x ---------------------------> Discriminator D --> D(x)

• z is some random noise (Gaussian/Uniform).
• z can be thought of as the latent representation of the image.

[Link]
Training Discriminator

[Link]
Training Generator

[Link]
GAN's formulation

min_G max_D V(D, G)

• It is formulated as a minimax game, where:

• The Discriminator is trying to maximize its reward V(D, G)
• The Generator is trying to minimize the Discriminator's reward (or maximize its loss)

V(D, G) = E_{x∼p(x)}[log D(x)] + E_{z∼q(z)}[log(1 − D(G(z)))]

• The Nash equilibrium of this particular game is achieved at:

• P_data(x) = P_gen(x) ∀x
• D(x) = 1/2 ∀x
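The value function can be estimated by Monte Carlo, and the equilibrium value checked directly (a sketch under the assumption of a constant discriminator D(x) = 1/2 standing in for the equilibrium D; all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def V(D, real_samples, fake_samples):
    # Monte-Carlo estimate of V(D, G) = E_x[log D(x)] + E_z[log(1 - D(G(z)))].
    return np.mean(np.log(D(real_samples))) + np.mean(np.log(1.0 - D(fake_samples)))

real = rng.normal(size=10_000)
fake = rng.normal(size=10_000)           # stands in for G(z)

# At the Nash equilibrium D(x) = 1/2 everywhere, so V = 2 log(1/2) = -log 4.
D_eq = lambda x: np.full_like(x, 0.5)
v_eq = V(D_eq, real, fake)
```

The value −log 4 is exactly the minimum of the outer game, reached when P_data = P_gen.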
(Figure: alternating Discriminator updates and Generator updates)
Vanishing gradient strikes back again…

min_G max_D V(D, G)
V(D, G) = E_{x∼p(x)}[log D(x)] + E_{z∼q(z)}[log(1 − D(G(z)))]

∇_{θ_G} V(D, G) = ∇_{θ_G} E_{z∼q(z)}[log(1 − D(G(z)))]

• ∇_a log(1 − σ(a)) = −σ′(a) / (1 − σ(a)) = −σ(a)(1 − σ(a)) / (1 − σ(a)) = −σ(a) = −D(G(z))

• The gradient goes to 0 if D is confident, i.e. D(G(z)) → 0

• Minimize −E_{z∼q(z)}[log D(G(z))] for the Generator instead (keep the Discriminator as it is)
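The saturation is easy to verify numerically: when the discriminator is confident a sample is fake (very negative pre-activation a), the gradient of log(1 − σ(a)) vanishes, while the gradient of the non-saturating objective log σ(a) stays large (a small sketch; `a = -10.0` is an arbitrary illustrative value):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Pre-activation of a confident discriminator on a fake sample:
# D(G(z)) -> 0 corresponds to a very negative a.
a = -10.0

grad_saturating = -sigmoid(a)           # d/da of log(1 - sigmoid(a)): ~0, learning stalls
grad_nonsaturating = 1.0 - sigmoid(a)   # d/da of log(sigmoid(a)): ~1, strong signal
```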


Faces

Goodfellow, Ian, et al. "Generative adversarial nets." Advances in Neural Information Processing Systems. 2014.


DCGAN: Bedroom
images

Radford, Alec, Luke Metz, and Soumith Chintala. "Unsupervised representation learning with deep convolutional generative adversarial networks." arXiv:1511.06434 (2015).
Deep Convolutional GANs (DCGANs)

Key ideas:
• Replace FC hidden layers with convolutions
• Generator: fractional-strided convolutions
• Use Batch Normalization after each layer
• Inside the Generator:
• Use ReLU for hidden layers
• Use Tanh for the output layer

(Figure: Generator Architecture)

Radford, Alec, Luke Metz, and Soumith Chintala. "Unsupervised representation learning with deep convolutional generative adversarial networks." arXiv:1511.06434 (2015).
Latent vectors capture interesting patterns…

Radford, Alec, Luke Metz, and Soumith Chintala. "Unsupervised representation learning with deep convolutional generative adversarial networks." arXiv:1511.06434 (2015).
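One standard way to explore such patterns is to interpolate between two latent vectors and decode each intermediate point with the trained generator, which typically produces a smooth visual transition (a sketch of the latent-space walk only; `latent_dim = 100` follows the DCGAN convention, and the generator itself is omitted):

```python
import numpy as np

rng = np.random.default_rng(0)

latent_dim = 100
z_a, z_b = rng.normal(size=latent_dim), rng.normal(size=latent_dim)

# Linear interpolation in latent space; feeding each row of `path` to a trained
# generator would yield a sequence of images morphing from G(z_a) to G(z_b).
alphas = np.linspace(0.0, 1.0, 9)
path = np.stack([(1 - a) * z_a + a * z_b for a in alphas])
```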
Advantages of GANs
• Plenty of existing work on Deep Generative Models
• Boltzmann Machine
• Deep Belief Nets
• Variational AutoEncoders (VAE)

• Why GANs?
• Sampling (or generation) is straightforward.
• Training doesn't involve Maximum Likelihood estimation.
• Robust to Overfitting since Generator never sees the training data.
• Empirically, GANs are good at capturing the modes of the distribution.

Goodfellow, Ian. "NIPS 2016 Tutorial: Generative Adversarial Networks." arXiv preprint arXiv:1701.00160 (2016).
Problems with GANs
• Probability Distribution is Implicit
• Not straightforward to compute P(X).
• Thus Vanilla GANs are only good for Sampling/Generation.

• Training is Hard
• Non-Convergence
• Mode-Collapse

Goodfellow, Ian. "NIPS 2016 Tutorial: Generative Adversarial Networks." arXiv preprint arXiv:1701.00160 (2016).
Training Problems

• Non-Convergence
• Mode-Collapse
• Deep Learning models (in general) involve a single player
• The player tries to maximize its reward (minimize its loss).
• Use SGD (with Backpropagation) to find the optimal parameters.
• SGD has convergence guarantees (under certain conditions).
• Problem: With non-convexity, we might converge to local optima.

min_G L_G

• GANs instead involve two (or more) players


• Discriminator is trying to maximize its reward.
• Generator is trying to minimize Discriminator’s reward.
min_G max_D V(D, G)

• SGD was not designed to find the Nash equilibrium of a game.


• Problem: We might not converge to the Nash equilibrium at all.

Salimans, Tim, et al. "Improved techniques for training GANs." Advances in Neural Information Processing Systems. 2016.
Non-Convergence

min_x max_y V(x, y)
Let V(x, y) = xy

• State 1: x > 0, y > 0, V > 0  →  Increase y, Decrease x
• State 2: x < 0, y > 0, V < 0  →  Decrease y, Decrease x
• State 3: x < 0, y < 0, V > 0  →  Decrease y, Increase x
• State 4: x > 0, y < 0, V < 0  →  Increase y, Increase x
• State 5: x > 0, y > 0, V > 0  ==  State 1  →  Increase y, Decrease x
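Simulating simultaneous gradient steps on V(x, y) = xy makes the cycling visible: instead of settling at the equilibrium (0, 0), the iterates spiral outward (a small numpy sketch; the learning rate and starting point are arbitrary):

```python
import numpy as np

# Simultaneous gradient steps on V(x, y) = x*y:
#   x (minimizer): x <- x - lr * dV/dx = x - lr * y
#   y (maximizer): y <- y + lr * dV/dy = y + lr * x
lr = 0.1
x, y = 1.0, 1.0
radii = []
for _ in range(100):
    x, y = x - lr * y, y + lr * x   # both updates use the old (x, y)
    radii.append(np.hypot(x, y))    # distance from the equilibrium (0, 0)
```

Each step multiplies the distance from the origin by sqrt(1 + lr²), so SGD-style play never converges to the Nash equilibrium here.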


Mode-Collapse
• The Generator fails to output diverse samples

(Figure: target distribution vs. expected output vs. actual collapsed output)

Metz, Luke, et al. "Unrolled Generative Adversarial Networks." arXiv preprint arXiv:1611.02163 (2016).
Some Solutions
• Mini-Batch GANs
• Supervision with labels
How to reward sample diversity?

• At Mode Collapse,
• The Generator produces good samples, but very few of them.
• Thus, the Discriminator can't tag them as fake.

• To address this problem,

• Let the Discriminator know about this edge case.

• More formally,
• Let the Discriminator look at the entire batch instead of single examples
• If there is a lack of diversity, it will mark the examples as fake

• Thus,
• The Generator will be forced to produce diverse samples.
Salimans, Tim, et al. "Improved techniques for training GANs." Advances in Neural Information Processing Systems. 2016.
Mini-Batch GANs
• Extract features that capture diversity in the mini-batch
• E.g., the L2 norm of the difference between all pairs from the batch

• Feed those features to the Discriminator along with the image

• Feature values will differ between diverse and non-diverse batches

• Thus, the Discriminator will rely on those features for classification

• This, in turn,
• Will force the Generator to match those feature values with the real data
• Will generate diverse batches

Salimans, Tim, et al. "Improved techniques for training GANs." Advances in Neural Information Processing Systems. 2016.
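The pairwise-distance feature mentioned above can be sketched directly (a simplified illustration of the idea, not the exact minibatch-discrimination layer from the paper; the function name is hypothetical):

```python
import numpy as np

def diversity_feature(batch):
    # One simple per-sample diversity feature: mean L2 distance to every other
    # sample in the mini-batch. Small values signal a collapsed batch.
    diffs = batch[:, None, :] - batch[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    return dists.sum(axis=1) / (len(batch) - 1)

rng = np.random.default_rng(0)
diverse = rng.normal(size=(16, 8))
collapsed = np.tile(rng.normal(size=(1, 8)), (16, 1)) + 1e-3 * rng.normal(size=(16, 8))
```

Appending such a feature to each sample gives the discriminator a signal that a collapsed generator cannot hide.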
Basic (Heuristic) Solutions
• Mini-Batch GANs
• Supervision with labels
Supervision with Labels
• Label information of the real data might help

  D: {Real, Fake}   →   D: {Car, Dog, Human, Fake}

• Empirically generates much better samples

Salimans, Tim, et al. "Improved techniques for training GANs." Advances in Neural Information Processing Systems. 2016.
Alternate view of GANs

min_G max_D V(D, G)
V(D, G) = E_{x∼p(x)}[log D(x)] + E_{z∼q(z)}[log(1 − D(G(z)))]

D* = argmax_D V(D, G)    G* = argmin_G V(D, G)

• In this formulation, the Discriminator's strategy was D(x) → 1, D(G(z)) → 0

• Alternatively, we can flip the binary classification labels, i.e. Fake = 1, Real = 0

V(D, G) = E_{x∼p(x)}[log(1 − D(x))] + E_{z∼q(z)}[log D(G(z))]

• In this new formulation, the Discriminator's strategy will be D(x) → 0, D(G(z)) → 1

Zhao, Junbo, Michael Mathieu, and Yann LeCun. "Energy-based generative adversarial network." arXiv preprint arXiv:1609.03126 (2016)
Alternate view of GANs (Contd.)
• If all we want to encode is D(x) → 0, D(G(z)) → 1:

D* = argmax_D E_{x∼p(x)}[log(1 − D(x))] + E_{z∼q(z)}[log D(G(z))]

  We can use this instead:
D* = argmin_D E_{x∼p(x)}[log D(x)] + E_{z∼q(z)}[log(1 − D(G(z)))]

• Now, we can replace cross-entropy with any loss function (e.g. Hinge Loss):

D* = argmin_D E_{x∼p(x)}[D(x)] + E_{z∼q(z)}[max(0, m − D(G(z)))]

• And thus, instead of outputting probabilities, the Discriminator just has to output:
• High values for fake samples
• Low values for real samples

Zhao, Junbo, Michael Mathieu, and Yann LeCun. "Energy-based generative adversarial network." arXiv preprint arXiv:1609.03126 (2016)
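The hinge objective above is straightforward to compute on discriminator scores (a minimal sketch; the function name, margin m = 1, and the example score values are illustrative):

```python
import numpy as np

def hinge_d_loss(d_real, d_fake, m=1.0):
    # D* = argmin_D E_x[D(x)] + E_z[max(0, m - D(G(z)))]:
    # push scores on real data down, and scores on fakes up to at least margin m.
    return np.mean(d_real) + np.mean(np.maximum(0.0, m - d_fake))

# A discriminator following the intended strategy (low on real, high on fake)
# gets a lower loss than one doing the opposite.
good = hinge_d_loss(d_real=np.array([-2.0, -1.5]), d_fake=np.array([2.0, 3.0]))
bad = hinge_d_loss(d_real=np.array([2.0, 1.5]), d_fake=np.array([-2.0, -3.0]))
```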
Energy-Based GANs

• Modified game plan:

• The Generator will try to generate samples with low values
• The Discriminator will try to assign high scores to fake values

• Use an AutoEncoder inside the Discriminator

• Use the mean-squared reconstruction error as D(x):

D(x) = ||Dec(Enc(x)) − x||_MSE

• High reconstruction error for fake samples
• Low reconstruction error for real samples

Zhao, Junbo, Michael Mathieu, and Yann LeCun. "Energy-based generative adversarial network." arXiv preprint arXiv:1609.03126 (2016)
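The reconstruction-error energy can be illustrated with a toy linear autoencoder (a hypothetical 2-D example: "real" data lives on the first coordinate axis and the encoder/decoder pair is simply a projection onto that axis, standing in for a trained autoencoder):

```python
import numpy as np

rng = np.random.default_rng(0)

# Enc followed by Dec == projection onto the real-data subspace (first axis).
P = np.array([[1.0, 0.0],
              [0.0, 0.0]])

def energy(x):
    # D(x) = ||Dec(Enc(x)) - x||^2_MSE per sample.
    recon = x @ P
    return np.mean((recon - x) ** 2, axis=-1)

real = np.stack([rng.normal(size=8), np.zeros(8)], axis=1)  # on the axis
fake = rng.normal(size=(8, 2))                              # generally off it
```

Real samples reconstruct perfectly (zero energy), while off-manifold "fake" samples incur a positive reconstruction error, which is exactly the score the EBGAN discriminator outputs.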
More Bedrooms…

Zhao, Junbo, Michael Mathieu, and Yann LeCun. "Energy-based generative adversarial network." arXiv preprint arXiv:1609.03126 (2016)

More Celebs…
GAN Applications

› Image-to-Image Translation
› Text-to-Image Synthesis
› Face Aging
Image-to-Image Translation

Figure 1 in the original paper.

Link to an interactive demo of this paper

Isola, P., Zhu, J. Y., Zhou, T., & Efros, A. A. "Image-to-image translation with conditional adversarial networks". arXiv preprint arXiv:1611.07004. (2016).
Image-to-Image Translation
• Architecture: DCGAN-based architecture

• Training is conditioned on the images from the source domain.

• Conditional GANs provide an effective way to handle many complex domains without worrying about designing structured loss functions explicitly.

Figure 2 in the original paper.

Isola, P., Zhu, J. Y., Zhou, T., & Efros, A. A. "Image-to-image translation with conditional adversarial networks". arXiv preprint arXiv:1611.07004. (2016).
Text-to-Image Synthesis

Motivation
• Given a text description, generate images closely associated with it.

• Uses a conditional GAN, with both the generator and the discriminator conditioned on a "dense" text embedding.

Figure 1 in the original paper.

Reed, S., Akata, Z., Yan, X., Logeswaran, L., Schiele, B., & Lee, H. "Generative adversarial text to image synthesis". ICML (2016).
Text-to-Image Synthesis

Figure 2 in the original paper.

Positive Example:           Negative Examples:
Real Image, Right Text      Real Image, Wrong Text
                            Fake Image, Right Text

Reed, S., Akata, Z., Yan, X., Logeswaran, L., Schiele, B., & Lee, H. "Generative adversarial text to image synthesis". ICML (2016).
Face Aging with Conditional GANs
• Differentiating Feature: Uses an Identity Preservation Optimization, using an auxiliary network to get a better approximation of the latent code (z*) for an input image.
• The latent code is then conditioned on a discrete (one-hot) embedding of age categories.

Figure 1 in the original paper.

Antipov, G., Baccouche, M., & Dugelay, J. L. (2017). "Face Aging With Conditional Generative Adversarial Networks". arXiv preprint arXiv:1702.01983.
Face Aging with Conditional GANs

Figure 3 in the original paper.

Antipov, G., Baccouche, M., & Dugelay, J. L. (2017). "Face Aging With Conditional Generative Adversarial Networks". arXiv preprint arXiv:1702.01983.

You might also like