Autoencoders and Generative Models Explained

The document discusses deep learning techniques focusing on autoencoders and generative models. It outlines the structure and types of autoencoders, including undercomplete, regularized, and denoising autoencoders, as well as deep generative models like Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs). Each model's objectives, architectures, and training mechanisms are detailed, emphasizing their applications and advantages in data representation and generation.

Uploaded by

Mayank
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
25 views4 pages

Autoencoders and Generative Models Explained

The document discusses deep learning techniques focusing on autoencoders and generative models. It outlines the structure and types of autoencoders, including undercomplete, regularized, and denoising autoencoders, as well as deep generative models like Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs). Each model's objectives, architectures, and training mechanisms are detailed, emphasizing their applications and advantages in data representation and generation.

Uploaded by

Mayank
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

Deep Learning: Autoencoders and Generative Models

1. Autoencoders (AEs)

• Definition: A specific type of feedforward neural network where the input is the same as the output ($\text{input} = \text{output}$).

• Objective: To compress the input into a lower-dimensional code (latent-space representation) and then reconstruct the output from this representation. The goal is an output identical to the input.

• Components/Architecture: An AE has three main components: encoder, code, and decoder.

o Encoder: Compresses the input to produce the code.

o Code (Bottleneck / Latent-space representation): A compact summary of the input.

o Decoder: Reconstructs the input from the code.

o The dimensionality of the input and output needs to be the same. The decoder architecture is typically the mirror image of the encoder, though this is not a requirement.

• Key Properties:

o Data-specific: Can only meaningfully compress data similar to what they were trained on.

o Lossy: The output will not be exactly the same as the input; it will be a close but degraded representation.

o Unsupervised / Self-supervised: Considered an unsupervised learning technique, but more precisely self-supervised, because they generate their own labels from the training data.

• Hyperparameters:

o Code size: Number of nodes in the middle layer; a smaller size results in more compression.

o Number of layers and number of nodes per layer.

o Loss function: Typically Mean Squared Error (MSE) or Binary Crossentropy (used if input values are in the range [0, 1]).

• Training: Trained via backpropagation, the same way as ANNs.
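The components above can be sketched with a minimal linear autoencoder trained by plain gradient descent. This is a hand-rolled NumPy illustration, not production code: the data, code size, learning rate, and number of steps are all arbitrary choices, and the gradients are written out explicitly (up to a constant factor) to mirror what backpropagation computes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 samples of 8-dimensional inputs lying near a 3-D subspace.
X = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 8))

# Encoder and decoder are single linear layers; code size = 3 (the bottleneck).
W_enc = rng.normal(scale=0.1, size=(8, 3))
W_dec = rng.normal(scale=0.1, size=(3, 8))

def mse(A, B):
    return np.mean((A - B) ** 2)

lr = 0.01
initial_loss = mse(X, X @ W_enc @ W_dec)
for _ in range(500):
    code = X @ W_enc        # encoder: compress input to the code
    X_hat = code @ W_dec    # decoder: reconstruct input from the code
    err = X_hat - X         # reconstruction error
    # Gradients of the reconstruction loss w.r.t. both weight matrices.
    grad_dec = code.T @ err / len(X)
    grad_enc = X.T @ (err @ W_dec.T) / len(X)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

final_loss = mse(X, X @ W_enc @ W_dec)
```

After training, the reconstruction loss has dropped well below its initial value, showing the "output ≈ input" objective being optimized through the bottleneck.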


2. Types of Autoencoders

2.1. Undercomplete Autoencoders

• Architecture: The hidden layer has a smaller dimension than the input layer.

• Objective: To learn the most salient and important features of the data distribution.

• Loss Function: Minimizes $L(x, g(f(x)))$, where $L$ is a loss function (e.g., mean squared error or mean absolute error) penalizing $g(f(x))$ for diverging from the original input $x$.

• Comparison to PCA: When the decoder is linear and MSE is used, it generates a reduced feature space similar to PCA. Non-linear $f$ (encoder) and $g$ (decoder) functions yield a powerful nonlinear generalization of PCA.
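The PCA connection can be made concrete: a linear autoencoder with MSE loss converges to the same subspace that PCA finds. The sketch below (illustrative NumPy, with arbitrary toy data) reconstructs centered data from its top-$k$ principal directions obtained via SVD, which is exactly what the converged linear encoder/decoder pair would do.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 6))
Xc = X - X.mean(axis=0)          # center the data, as PCA does

# PCA via SVD: the top-k right singular vectors are the code directions.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

def pca_reconstruct(k):
    V_k = Vt[:k].T               # encoder and decoder share the same k directions
    return (Xc @ V_k) @ V_k.T    # project to a k-dim code, then reconstruct

# Reconstruction error shrinks as the code size k grows.
errors = [np.mean((Xc - pca_reconstruct(k)) ** 2) for k in (1, 3, 6)]
```

With the code size equal to the full input dimension ($k = 6$ here), the reconstruction is exact, mirroring an overcomplete linear AE that has learned the identity.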

2.2. Regularized Autoencoders

• Objective: To encourage the model to have properties (like sparsity or robustness to noise) other than just copying the input to the output. This allows the use of non-linear, overcomplete architectures without learning a trivial identity function.

• Mechanism: Uses a loss function with a regularization term.

• Types of Regularization (Sparse Autoencoders):

o L1 Regularization: Adds the absolute value of the magnitude of the activations as a penalty term. This regularization tends to shrink coefficients to zero, resulting in a sparse representation. The objective function includes the term:

$$Obj = L(x,\hat{x})+\lambda\sum_{i}|a_{i}^{(h)}|$$

where the second term penalizes the absolute value of the vector of activations $a$ in hidden layer $h$, weighted by $\lambda$.

o Other methods include KL-divergence.

• Common Types: Sparse autoencoder and denoising autoencoder.
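The sparse objective above can be evaluated directly. The sketch below is a small illustration with made-up numbers: `lam` plays the role of $\lambda$, and `activations` stands in for the hidden-layer vector $a^{(h)}$.

```python
import numpy as np

def sparse_objective(x, x_hat, activations, lam=0.01):
    """Reconstruction loss plus an L1 penalty on the hidden activations."""
    reconstruction = np.mean((x - x_hat) ** 2)      # L(x, x_hat), here MSE
    l1_penalty = lam * np.sum(np.abs(activations))  # lambda * sum_i |a_i^(h)|
    return reconstruction + l1_penalty

x = np.array([1.0, 0.0, 1.0])
x_hat = np.array([0.9, 0.1, 0.8])
a = np.array([0.5, 0.0, 0.0, -0.2])  # mostly-zero (sparse) hidden activations
obj = sparse_objective(x, x_hat, a, lam=0.1)
```

Because the penalty grows with every nonzero activation, gradient descent on this objective pushes activations toward exactly zero, which is what produces the sparse code.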

2.3. Denoising Autoencoders (DAEs)

• Objective: To reconstruct the original version of the input signal from a stochastically corrupted (noisy) version.

• Mechanism: The DAE is presented with clean input examples and their corresponding noisy versions during training. It minimizes a reconstruction loss that evaluates the disparity between the clean input and the reconstructed output.

• Applications: Image Denoising, Fraud Detection, Data Imputation, Data Compression, Anomaly Detection.
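The DAE training setup can be sketched in a few lines: stochastically corrupt each clean example, feed the noisy version in, and score the reconstruction against the clean original. The noise model and its scale below are illustrative assumptions (additive Gaussian noise is one common choice; masking noise is another).

```python
import numpy as np

rng = np.random.default_rng(2)

def corrupt(x, noise_std=0.3):
    """Stochastically corrupt the clean input with additive Gaussian noise."""
    return x + rng.normal(scale=noise_std, size=x.shape)

# Training pairs for a DAE: noisy version as input, clean version as target.
x_clean = rng.uniform(size=(5, 4))
x_noisy = corrupt(x_clean)

# The DAE minimizes the disparity between its reconstruction of x_noisy and
# x_clean; a model that merely copies its input would be left with this loss:
copy_loss = np.mean((x_noisy - x_clean) ** 2)
```

The key point is that the loss is computed against `x_clean`, not `x_noisy`, so simply copying the input through an overcomplete network no longer minimizes the objective.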

3. Deep Generative Models

Generative models aim to reproduce the training items and use the decoder to generate new items of a similar "style". They achieve this by choosing latent variables $z$ from a standard Normal distribution and feeding them to the decoder.

3.1. Variational Autoencoders (VAEs)

• Generative Model Type: Explicit generative model.

• Objective: To capture the underlying probability distribution of a given dataset and generate novel samples.

• Architecture: Comprises an encoder-decoder structure.

o Encoder (Stochastic): Transforms input data into a latent code. It outputs two vectors, $\mu$ (mean) and $\sigma$ (standard deviation), which are the parameters of a Gaussian distribution. This is a stochastic encoder, generalizing the encoding function $f(x)$ to an encoding distribution $p_{encoder}(h|x)$.

o Sampling Layer: The actual latent vector is obtained by sampling from the Gaussian distribution defined by $\mu$ and $\sigma$. Sampling $Z \sim N(\mu,\sigma^2)$ is the same as computing $\mu + \sigma X$, where $X \sim N(0,1)$ is a standard normal sample (the reparameterization trick).

o Decoder: Reconstructs the original data from the sampled latent code. The decoder defines a conditional probability distribution $p_{decoder}(x|z)$ of output $x$ given $z$.

• Key Advantage: The latent space is continuous, allowing the decoder to generate new data points that seamlessly interpolate among training data points.
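The sampling layer's identity $Z \sim N(\mu,\sigma^2) \Leftrightarrow Z = \mu + \sigma X$ with $X \sim N(0,1)$ is easy to verify empirically. The $\mu$ and $\sigma$ values below are assumed encoder outputs for illustration, not anything a trained VAE produced.

```python
import numpy as np

rng = np.random.default_rng(3)

def sample_latent(mu, sigma, n_samples):
    """Reparameterization: z = mu + sigma * x with x ~ N(0, 1),
    which is distributed as N(mu, sigma^2)."""
    x = rng.standard_normal((n_samples, mu.shape[0]))
    return mu + sigma * x

# Assumed encoder outputs: per-dimension mean and standard deviation.
mu = np.array([1.0, -2.0])
sigma = np.array([0.5, 0.1])

z = sample_latent(mu, sigma, 100_000)
# The empirical mean and std of z match the encoder's parameters.
```

Writing the sample as a deterministic function of $(\mu, \sigma)$ plus independent noise is what lets gradients flow through the sampling step back into the encoder during training.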

3.2. Generative Adversarial Networks (GANs)

• Generative Model Type: Implicit generative model.

• Objective: Two models compete with each other to discover, learn, and replicate the patterns within a dataset, generating new, plausible examples.
• Components:

o Generator ($G$): A neural network that takes a fixed-length random vector (noise) as input and creates a fake data sample. Its main aim is to make the Discriminator classify its output as real.

o Discriminator ($D$): A neural network that distinguishes real data (positive samples) from the fake data (negative samples) created by the Generator.

• Training (Adversarial Game): Both $G$ and $D$ play an adversarial game, working simultaneously.

o $D$ is trained to classify both real data and fake data, and the Discriminator Loss penalizes misclassification.

o $G$ is trained to increase $D$'s probability of making mistakes. The Generator Loss penalizes $G$ for failing to fool $D$.

• Mathematical Equation: The training is represented as a minimax game:

$$\min_{G}\max_{D}V(D,G)$$

where the value function is:

$$V(D,G)=\mathbb{E}_{x\sim p_{data}(x)}[\log D(x)]+\mathbb{E}_{z\sim p_{z}(z)}[\log(1-D(G(z)))]$$

o $D$ tries to maximize $V(D,G)$.

o $G$ tries to minimize $V(D,G)$.
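The value function $V(D,G)$ can be computed directly from discriminator outputs. The probabilities below are hypothetical numbers chosen to illustrate the two extremes, not outputs of any trained network; the expectations are replaced by sample means.

```python
import numpy as np

def gan_value(d_real, d_fake):
    """V(D, G): mean of log D(x) over real samples plus
    mean of log(1 - D(G(z))) over generated samples."""
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))

# Hypothetical discriminator outputs (probabilities of "real").
d_real = np.array([0.9, 0.8, 0.95])  # D on real samples: close to 1
d_fake = np.array([0.1, 0.2, 0.05])  # D on generated samples: close to 0
v_strong_D = gan_value(d_real, d_fake)

# A discriminator that is completely fooled outputs 0.5 everywhere,
# giving V = log(0.5) + log(0.5) = -2 log 2.
v_fooled = gan_value(np.full(3, 0.5), np.full(3, 0.5))
```

A confident discriminator yields a larger $V$ than a fooled one, which is exactly why $D$ maximizes this quantity while $G$ minimizes it.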
