ONE WEEK ATAL FDP
Convolutional Neural Networks with Generative Adversarial Networks

Conceptual View of Generative Adversarial Networks

13th December 2023 (Wednesday)

Organized by
Department of Information Technology
Sir C R Reddy College of Engineering,
Eluru, Andhra Pradesh, India.

Dr. Rajkumar S
School of Computer Science and Engineering
Vellore Institute of Technology (Vellore)
Tamil Nadu, India – 632 014.
rajkumarsrajkumar@[Link]
[Link]/view/rajkumars1987
Presentation Outline
Introduction to GANs
Some Challenges with GANs
Applications of GAN
Advanced GAN Extensions
Demo
Introduction to GAN

"This (GANs), and the variations that are now being proposed, is the most interesting idea in the last 10 years in ML, in my opinion."

– Yann LeCun
Introduction to GAN

GANs were first introduced by Ian Goodfellow et al. in 2014.

They have been used to generate images, videos, poems, and some simple conversations.

Note: image processing is comparatively easy (all animals can do it), while NLP is hard (only humans can do it).

Ian Goodfellow: [Link]
Radford (voice generation examples): [Link]
Tips for training GANs: [Link]
WHAT ARE GANS?

› Generative Adversarial Networks

Generative Models
We try to learn the underlying distribution from which our dataset comes.
Eg: Variational AutoEncoders (VAE)

Adversarial Training
GANs are made up of two competing networks (adversaries) that are trying to beat each other.

Neural Networks
Both the Generator and the Discriminator are neural networks.
WHAT ARE GANS?

[Architecture diagram: a noise sample from P(z) is fed to the Generator, which produces Generated Data; the Discriminator receives both the Generated Data and Real Data and outputs Real/Fake.]
HOW TO TRAIN A GAN?

At t = 0:
[Diagram: a latent vector is fed to the Generator, which produces a generated image (fake data); the Discriminator, a binary classifier, receives both the generated data and the given training data (real data) and outputs Real/Fake.]
HOW TO TRAIN A GAN?

› Which network should I train first?
› The Discriminator!
› But with what training data?
› The Discriminator is a binary classifier with two classes: Real and Fake.
› The data for the Real class is already given: THE TRAINING DATA.
› The data for the Fake class? -> Generate it from the Generator.
HOW TO TRAIN A GAN?

› What's next? -> Train the Generator.
› But how? What's our training objective?
› Generate images from the Generator such that they are classified incorrectly by the Discriminator!
HOW TO TRAIN A GAN?

Step 1: Train the Discriminator using the current ability of the Generator.
Step 2: Train the Generator to beat the Discriminator, i.e. to generate images that the Discriminator classifies incorrectly.
Alternate between the two steps until training converges.
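The alternating procedure above can be sketched end to end on a one-dimensional toy problem. This is an illustrative sketch only: the "Generator" is a scalar affine map, the "Discriminator" is logistic regression, the gradients are derived by hand, and the real data distribution N(3, 1) is an assumption of the example.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda t: 1.0 / (1.0 + np.exp(-np.clip(t, -60, 60)))

# Toy setup: real data ~ N(3, 1); Generator G(z) = a*z + b with z ~ N(0, 1);
# Discriminator D(x) = sigmoid(w*x + c). All parameters are scalars.
a, b = 1.0, 0.0            # Generator parameters
w, c = 0.1, 0.0            # Discriminator parameters
lr, batch = 0.05, 64

for step in range(3000):
    # Step 1: train the Discriminator on real vs. fake samples.
    x_real = rng.normal(3.0, 1.0, batch)
    x_fake = a * rng.normal(0.0, 1.0, batch) + b
    d_real = sigmoid(w * x_real + c)
    d_fake = sigmoid(w * x_fake + c)
    # Gradient of -[mean log D(x) + mean log(1 - D(G(z)))]:
    gw = -np.mean((1 - d_real) * x_real) + np.mean(d_fake * x_fake)
    gc = -np.mean(1 - d_real) + np.mean(d_fake)
    w, c = w - lr * gw, c - lr * gc
    # Step 2: train the Generator to fool the (frozen) Discriminator.
    z = rng.normal(0.0, 1.0, batch)
    d_fake = sigmoid(w * (a * z + b) + c)
    # Non-saturating objective: minimize -mean log D(G(z)).
    ga = -np.mean((1 - d_fake) * w * z)
    gb = -np.mean((1 - d_fake) * w)
    a, b = a - lr * ga, b - lr * gb

samples = a * rng.normal(0.0, 1.0, 10000) + b
print(round(float(samples.mean()), 2))  # drifts toward the real mean of 3
```

The generated mean starts at 0 and drifts toward the real mean, which is exactly what Step 2 is supposed to achieve.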
Why Generative Models?
• We’ve only seen discriminative models so far
• Given an image X, predict a label Y
• Estimates P(Y|X)
• Discriminative models have several key limitations
• Can’t model P(X), i.e. the probability of seeing a certain image
• Thus, can’t sample from P(X), i.e. can’t generate new images
• Generative models (in general) cope with all of the above
• Can model P(X)
• Can generate new images
Magic of GANs…
Lotter, William, Gabriel Kreiman, and David Cox. "Unsupervised learning of visual structure using predictive generative networks." arXiv preprint arXiv:1511.06380 (2015).
Magic of GANs…
Which one is Computer generated?
Ledig, Christian, et al. "Photo-realistic single image super-resolution using a generative adversarial network." arXiv preprint arXiv:1609.04802 (2016).
Magic of GANs…
[Link]
Adversarial Training
• We saw:
• We can generate adversarial samples to fool a discriminative model
• We can use those adversarial samples to make models robust
• We then require more effort to generate adversarial samples
• Repeating this gives us a better discriminative model
• GANs extend that idea to generative models:
• Generator: generates fake samples, tries to fool the Discriminator
• Discriminator: tries to distinguish between real and fake samples
• Train them against each other
• Repeating this gives us a better Generator and Discriminator
GAN Architecture

[Architecture diagram: noise z is fed to the Generator G to produce G(z); the Discriminator D outputs D(x) on real data x and D(G(z)) on generated data.]

• z is some random noise (Gaussian/Uniform).
• z can be thought of as the latent representation of the image.

[Link]
Training Discriminator
[Link]
Training Generator
[Link]
GAN Formulation

min_G max_D V(D, G)

• It is formulated as a minimax game, where:
• The Discriminator is trying to maximize its reward V(D, G)
• The Generator is trying to minimize the Discriminator's reward (or maximize its loss)

V(D, G) = E_{x∼p(x)}[log D(x)] + E_{z∼q(z)}[log(1 − D(G(z)))]

• The Nash equilibrium of this particular game is achieved at:
• P_data(x) = P_gen(x) ∀x
• D(x) = 1/2 ∀x
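The value function can be evaluated numerically by replacing the expectations with batch means over Discriminator outputs. The helper below is a sketch; the function name and the clipping (to avoid log(0)) are our own additions.

```python
import numpy as np

# Approximate V(D, G) from batches of Discriminator outputs.
def gan_value(d_real, d_fake, eps=1e-12):
    d_real = np.clip(d_real, eps, 1 - eps)   # D(x) on real samples
    d_fake = np.clip(d_fake, eps, 1 - eps)   # D(G(z)) on generated samples
    return np.mean(np.log(d_real)) + np.mean(np.log(1 - d_fake))

# At the Nash equilibrium D(x) = 1/2 everywhere, so V = 2 * log(1/2):
v_eq = gan_value(np.full(4, 0.5), np.full(4, 0.5))
print(v_eq)  # -2 log 2 ≈ -1.386
```

A Discriminator that confidently separates real from fake (e.g. D(x) = 0.9, D(G(z)) = 0.1) yields a larger V, consistent with D being the maximizing player.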
[Figures: Discriminator updates; Generator updates.]
Vanishing gradient strikes back again…

min_G max_D V(D, G)

V(D, G) = E_{x∼p(x)}[log D(x)] + E_{z∼q(z)}[log(1 − D(G(z)))]

∇_{θ_G} V(D, G) = ∇_{θ_G} E_{z∼q(z)}[log(1 − D(G(z)))]

• With a the Discriminator's logit, ∇_a log(1 − σ(a)) = −∇_a σ(a) / (1 − σ(a)) = −σ(a)(1 − σ(a)) / (1 − σ(a)) = −σ(a) = −D(G(z))
• The gradient goes to 0 if D is confident, i.e. D(G(z)) → 0
• Minimize −E_{z∼q(z)}[log D(G(z))] for the Generator instead (keep the Discriminator as it is)
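The two Generator objectives can be compared directly as functions of the Discriminator's logit a for a generated sample (so D(G(z)) = sigmoid(a)); the derivative formulas below are the standard ones from the derivation above.

```python
import math

sigmoid = lambda a: 1.0 / (1.0 + math.exp(-a))

def grad_saturating(a):
    # d/da log(1 - sigmoid(a)) = -sigmoid(a): vanishes as D(G(z)) -> 0
    return -sigmoid(a)

def grad_non_saturating(a):
    # d/da log(sigmoid(a)) = 1 - sigmoid(a): stays near 1 when D(G(z)) -> 0
    return 1.0 - sigmoid(a)

a = -10.0                          # a confident Discriminator: D(G(z)) ≈ 4.5e-5
print(grad_saturating(a))          # ≈ -4.5e-5: almost no learning signal
print(grad_non_saturating(a))      # ≈ 1.0: strong learning signal
```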
Faces
Goodfellow, Ian, et al. "Generative adversarial nets." Advances in Neural Information Processing Systems. 2014.
DCGAN: Bedroom Images

Radford, Alec, Luke Metz, and Soumith Chintala. "Unsupervised representation learning with deep convolutional generative adversarial networks." arXiv:1511.06434 (2015).
Deep Convolutional GANs (DCGANs)

Key ideas:
• Replace FC hidden layers with convolutions
• Generator: fractional-strided convolutions
• Use Batch Normalization after each layer
• Inside the Generator:
• Use ReLU for hidden layers
• Use Tanh for the output layer

[Figure: Generator architecture.]

Radford, Alec, Luke Metz, and Soumith Chintala. "Unsupervised representation learning with deep convolutional generative adversarial networks." arXiv:1511.06434 (2015).
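As a quick sanity check of the fractional-strided upsampling path, the snippet below walks the standard transposed-convolution output-size formula through the paper's generator (kernel 4, stride 2, padding 1): each layer doubles the spatial size, from a 4×4×1024 tensor up to a 64×64×3 image. The helper name is ours.

```python
# Standard output-size rule for a transposed convolution.
def conv_transpose_out(size, kernel=4, stride=2, pad=1):
    return (size - 1) * stride - 2 * pad + kernel

size, channels = 4, [1024, 512, 256, 128, 3]
shapes = [(channels[0], size, size)]
for ch in channels[1:]:
    size = conv_transpose_out(size)   # 4 -> 8 -> 16 -> 32 -> 64
    shapes.append((ch, size, size))
for s in shapes:
    print(s)  # (1024, 4, 4) ... (3, 64, 64)
```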
Latent vectors capture
interesting patterns…
Radford, Alec, Luke Metz, and Soumith Chintala. "Unsupervised representation learning with deep convolutional generative adversarial networks." arXiv:1511.06434 (2015).
Advantages of GANs
• Plenty of existing work on Deep Generative Models
• Boltzmann Machine
• Deep Belief Nets
• Variational AutoEncoders (VAE)
• Why GANs?
• Sampling (or generation) is straightforward.
• Training doesn't involve Maximum Likelihood estimation.
• Robust to overfitting, since the Generator never sees the training data.
• Empirically, GANs are good at capturing the modes of the distribution.
Goodfellow, Ian. "NIPS 2016 Tutorial: Generative Adversarial Networks." arXiv preprint arXiv:1701.00160 (2016).
Problems with GANs
• Probability Distribution is Implicit
• Not straightforward to compute P(X).
• Thus Vanilla GANs are only good for Sampling/Generation.
• Training is Hard
• Non-Convergence
• Mode-Collapse
Goodfellow, Ian. "NIPS 2016 Tutorial: Generative Adversarial Networks." arXiv preprint arXiv:1701.00160 (2016).
Training Problems
• Non-Convergence
• Mode-Collapse
• Deep Learning models (in general) involve a single player
• The player tries to maximize its reward (minimize its loss).
• Use SGD (with Backpropagation) to find the optimal parameters.
• SGD has convergence guarantees (under certain conditions).
• Problem: With non-convexity, we might converge to local optima.
min_G L_G
• GANs instead involve two (or more) players
• Discriminator is trying to maximize its reward.
• Generator is trying to minimize Discriminator’s reward.
min_G max_D V(D, G)
• SGD was not designed to find the Nash equilibrium of a game.
• Problem: We might not converge to the Nash equilibrium at all.
Salimans, Tim, et al. "Improved techniques for training GANs." Advances in Neural Information Processing Systems. 2016.
Non-Convergence

min_x max_y V(x, y), with V(x, y) = xy

• State 1: x > 0, y > 0, V > 0 → increase y, decrease x
• State 2: x < 0, y > 0, V < 0 → decrease y, decrease x
• State 3: x < 0, y < 0, V > 0 → decrease y, increase x
• State 4: x > 0, y < 0, V < 0 → increase y, increase x
• State 5: x > 0, y > 0, V > 0 == State 1 → increase y, decrease x
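The cycle through States 1–5 can be reproduced with a few lines of simultaneous gradient updates: the minimizer takes x ← x − lr·y, the maximizer takes y ← y + lr·x. The unique equilibrium is (0, 0), yet with any finite step size the iterates spiral outward instead of converging.

```python
# Simultaneous gradient play on V(x, y) = x*y.
def play(x, y, lr=0.1, steps=200):
    radii = []
    for _ in range(steps):
        x, y = x - lr * y, y + lr * x   # one simultaneous gradient step
        radii.append((x * x + y * y) ** 0.5)
    return radii

radii = play(1.0, 1.0)
print(radii[0], radii[-1])  # distance from the equilibrium (0, 0) grows
```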
Mode-Collapse
• Generator fails to output diverse samples
[Figure: target distribution, expected output, and actual generator output, illustrating mode collapse.]
Metz, Luke, et al. "Unrolled Generative Adversarial Networks." arXiv preprint arXiv:1611.02163 (2016).
Some Solutions
• Mini-Batch GANs
• Supervision with labels
Basic (Heuristic) Solutions
• Mini-Batch GANs
• Supervision with labels
How to reward sample diversity?
• At Mode Collapse,
• The Generator produces good samples, but only a few of them.
• Thus, the Discriminator can't tag them as fake.
• To address this problem,
• Let the Discriminator know about this edge case.
• More formally,
• Let the Discriminator look at the entire batch instead of single examples.
• If there is a lack of diversity, it will mark the examples as fake.
• Thus,
• The Generator will be forced to produce diverse samples.
Salimans, Tim, et al. "Improved techniques for training GANs." Advances in Neural Information Processing Systems. 2016.
Mini-Batch GANs
• Extract features that capture diversity in the mini-batch
• e.g. the L2 norm of the difference between all pairs from the batch
• Feed those features to the Discriminator along with the image
• Feature values will differ between diverse and non-diverse batches
• Thus, the Discriminator will rely on those features for classification
• This, in turn,
• Will force the Generator to match those feature values with the real data
• Will generate diverse batches
Salimans, Tim, et al. "Improved techniques for training GANs." Advances in Neural Information Processing Systems. 2016.
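A hypothetical sketch of such a diversity feature: for each sample, the mean L2 distance to the other samples in its batch. A mode-collapsed batch (near-identical samples) yields near-zero features that a Discriminator can learn to flag as fake.

```python
import numpy as np

def diversity_features(batch):
    diffs = batch[:, None, :] - batch[None, :, :]   # all pairwise differences
    dists = np.linalg.norm(diffs, axis=-1)          # (n, n) L2 distances
    return dists.sum(axis=1) / (len(batch) - 1)     # mean distance to others

rng = np.random.default_rng(0)
diverse = rng.normal(size=(8, 16))                       # varied samples
collapsed = np.tile(rng.normal(size=(1, 16)), (8, 1))    # collapsed batch
print(diversity_features(diverse).mean())    # clearly positive
print(diversity_features(collapsed).mean())  # ~0
```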
Basic (Heuristic) Solutions
• Mini-Batch GANs
• Supervision with labels
Supervision with Labels

• Label information of the real data might help.

[Figure: a standard Discriminator D outputs Real/Fake; a label-supervised Discriminator D outputs Car/Dog/Human/…/Fake.]

• Empirically, this generates much better samples.

Salimans, Tim, et al. "Improved techniques for training GANs." Advances in Neural Information Processing Systems. 2016.
Alternate View of GANs

min_G max_D V(D, G)

V(D, G) = E_{x∼p(x)}[log D(x)] + E_{z∼q(z)}[log(1 − D(G(z)))]

D* = argmax_D V(D, G)        G* = argmin_G V(D, G)

• In this formulation, the Discriminator's strategy was D(x) → 1, D(G(z)) → 0.
• Alternatively, we can flip the binary classification labels, i.e. Fake = 1, Real = 0:

V(D, G) = E_{x∼p(x)}[log(1 − D(x))] + E_{z∼q(z)}[log D(G(z))]

• In this new formulation, the Discriminator's strategy will be D(x) → 0, D(G(z)) → 1.

Zhao, Junbo, Michael Mathieu, and Yann LeCun. "Energy-based generative adversarial network." arXiv preprint arXiv:1609.03126 (2016).
Alternate View of GANs (Contd.)

• If all we want to encode is D(x) → 0, D(G(z)) → 1:

D* = argmax_D E_{x∼p(x)}[log(1 − D(x))] + E_{z∼q(z)}[log D(G(z))]

We can use this:

D* = argmin_D E_{x∼p(x)}[log D(x)] + E_{z∼q(z)}[log(1 − D(G(z)))]

• Now we can replace the cross-entropy with any loss function (e.g. the Hinge Loss):

D* = argmin_D E_{x∼p(x)}[D(x)] + E_{z∼q(z)}[max(0, m − D(G(z)))]

• And thus, instead of outputting probabilities, the Discriminator just has to output:
• High values for fake samples
• Low values for real samples

Zhao, Junbo, Michael Mathieu, and Yann LeCun. "Energy-based generative adversarial network." arXiv preprint arXiv:1609.03126 (2016).
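A sketch of the hinge-style Discriminator objective above: D outputs an unbounded score that should be low on real samples and at least the margin m on fakes. The function name is ours.

```python
def hinge_d_loss(d_real_scores, d_fake_scores, m=1.0):
    real_term = sum(d_real_scores) / len(d_real_scores)
    fake_term = sum(max(0.0, m - s) for s in d_fake_scores) / len(d_fake_scores)
    return real_term + fake_term

# A Discriminator that scores real low and fake high pays no hinge penalty:
good = hinge_d_loss([-2.0, -3.0], [2.0, 3.0])
# One that scores them the other way around pays a large penalty:
bad = hinge_d_loss([2.0, 3.0], [-2.0, -3.0])
print(good, bad)  # -2.5 6.0
```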
Energy-Based GANs

• Modified game plan:
• The Generator will try to generate samples with low values D(x).
• The Discriminator will try to assign high scores to fake samples.
• Use an AutoEncoder inside the Discriminator:

D(x) = ||Dec(Enc(x)) − x||_MSE

• Use the Mean-Squared Reconstruction Error as D(x):
• High reconstruction error for fake samples
• Low reconstruction error for real samples

Zhao, Junbo, Michael Mathieu, and Yann LeCun. "Energy-based generative adversarial network." arXiv preprint arXiv:1609.03126 (2016).
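A toy sketch of the EBGAN energy: D(x) is an autoencoder's reconstruction error. Here the "autoencoder" is a PCA-style linear projection fitted to the real data (an assumption of the example; the paper uses a trained network), so on-manifold real samples reconstruct well (low energy) while off-manifold fakes reconstruct badly (high energy).

```python
import numpy as np

rng = np.random.default_rng(0)
basis = rng.normal(size=(8, 2))
real = rng.normal(size=(256, 2)) @ basis.T      # real data on a 2-D plane in R^8
U, _, _ = np.linalg.svd(real.T @ real)
enc = U[:, :2]                                  # encoder: top-2 principal axes

def energy(x):
    recon = (x @ enc) @ enc.T                   # Dec(Enc(x))
    return np.mean((recon - x) ** 2, axis=-1)   # per-sample MSE, i.e. D(x)

fake = rng.normal(size=(256, 8))                # off-manifold "generated" data
print(energy(real).mean(), energy(fake).mean()) # low for real, high for fake
```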
More Bedrooms…
Zhao, Junbo, Michael Mathieu, and Yann LeCun. "Energy-basedgenerative adversarial network." arXiv preprint arXiv:1609.03126 (2016)
More Celebs…
GAN Applications

› Image-to-Image Translation
› Text-to-Image Synthesis
› Face Aging
Image-to-Image Translation
Figure 1 in the original paper.
Link to an interactive demo of this paper
Isola, P., Zhu, J. Y., Zhou, T., & Efros, A. A. "Image-to-image translation with conditional adversarial networks". arXiv preprint arXiv:1611.07004 (2016).
Image-to-Image Translation

• Architecture: DCGAN-based architecture.
• Training is conditioned on the images from the source domain.
• Conditional GANs provide an effective way to handle many complex domains without worrying about designing structured loss functions explicitly.

Figure 2 in the original paper.

Isola, P., Zhu, J. Y., Zhou, T., & Efros, A. A. "Image-to-image translation with conditional adversarial networks". arXiv preprint arXiv:1611.07004 (2016).
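A sketch of how the conditioning works in this pix2pix-style setup: the source-domain image is concatenated channel-wise with the generated (or real) target-domain image before being fed to the Discriminator, so Real/Fake is judged for the pair. The shapes below are arbitrary illustrative choices.

```python
import numpy as np

src = np.zeros((1, 3, 256, 256))   # source-domain image (e.g. an edge map)
gen = np.zeros((1, 3, 256, 256))   # Generator output in the target domain
disc_input = np.concatenate([src, gen], axis=1)  # channel-wise conditioning
print(disc_input.shape)            # (1, 6, 256, 256)
```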
Text-to-Image Synthesis

Motivation:
Given a text description, generate closely associated images.

Uses a conditional GAN, with the generator and discriminator being conditioned on a "dense" text embedding.

Figure 1 in the original paper.

Reed, S., Akata, Z., Yan, X., Logeswaran, L., Schiele, B., & Lee, H. "Generative adversarial text to image synthesis". ICML (2016).
Text-to-Image Synthesis
Figure 2 in the original paper.
Positive Example:
• Real Image, Right Text
Negative Examples:
• Real Image, Wrong Text
• Fake Image, Right Text
Reed, S., Akata, Z., Yan, X., Logeswaran, L., Schiele, B., & Lee, H. "Generative adversarial text to image synthesis". ICML (2016).
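The three training pairs above can be written out as Discriminator targets: only a real image paired with its matching text counts as a positive example. The helper name is ours.

```python
def discriminator_target(image_is_real, text_matches):
    return 1.0 if (image_is_real and text_matches) else 0.0

pairs = [
    (True, True),    # real image, right text -> positive
    (True, False),   # real image, wrong text -> negative
    (False, True),   # fake image, right text -> negative
]
print([discriminator_target(i, t) for i, t in pairs])  # [1.0, 0.0, 0.0]
```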
Face Aging with Conditional GANs

• Differentiating feature: uses an Identity Preservation Optimization with an auxiliary network to get a better approximation of the latent code (z*) for an input image.
• The latent code is then conditioned on a discrete (one-hot) embedding of age categories.

Figure 1 in the original paper.

Antipov, G., Baccouche, M., & Dugelay, J. L. (2017). "Face Aging With Conditional Generative Adversarial Networks". arXiv preprint arXiv:1702.01983.
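A toy sketch of the identity-preservation idea: find the latent code z* whose generated output matches the input image under an identity-feature distance. Here G and the feature extractor F are stand-in linear maps (the paper uses trained networks and gradient-based search); with linear maps, z* reduces to a least-squares solve.

```python
import numpy as np

rng = np.random.default_rng(0)
G = rng.normal(size=(16, 4))      # stand-in generator: image = G @ z
F = rng.normal(size=(8, 16))      # stand-in identity-feature extractor
x = G @ rng.normal(size=4)        # input image, known to lie in G's range

# z* = argmin_z || F(G z) - F(x) ||^2, solved in closed form here.
z_star, *_ = np.linalg.lstsq(F @ G, F @ x, rcond=None)
err = np.linalg.norm(F @ (G @ z_star) - F @ x)
print(err)                        # ~0: identity features are preserved
```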
Face Aging with Conditional GANs

Figure 3 in the original paper.

Antipov, G., Baccouche, M., & Dugelay, J. L. (2017). "Face Aging With Conditional Generative Adversarial Networks". arXiv preprint arXiv:1702.01983.