Chapter 3

AUTOENCODERS: UNSUPERVISED LEARNING
• Autoencoders are a variant of feed-forward neural networks trained to reconstruct their own input: the error between the reconstruction and the original input is the training signal. After training, the encoder can be used like a normal feed-forward network to produce activations. This is an unsupervised form of feature extraction because the network learns from the original input alone; the weights are still learned with backpropagation, but the target is the input itself rather than an external label.
• Deep networks can use either RBMs or autoencoders as building blocks for larger networks (a single network rarely uses both). The aim of an autoencoder is to learn a lower-dimensional representation (encoding) of higher-dimensional data, typically for dimensionality reduction, by training the network to capture the most important parts of the input.

• An autoencoder is a type of neural network in which the output layer has the same dimensionality as the input layer: the number of output units equals the number of input units. An autoencoder replicates the data from the input to the output in an unsupervised manner and is therefore sometimes referred to as a replicator neural network.
• The autoencoder reconstructs each dimension of the input by passing it through the network. It may seem trivial to use a neural network to replicate its input, but during replication the input is reduced to a smaller representation: the middle layers of the network have fewer units than the input and output layers, so they hold the reduced representation of the input, and the output is reconstructed from this reduced representation.
• Autoencoders have various applications, such as in data compression, image
denoising, and anomaly detection. They can also be used for generating new data
by sampling from the latent space.
• The main difference between autoencoders and Principal Component Analysis (PCA) is that while PCA finds the directions along which the data can be projected with maximum variance, autoencoders learn (possibly non-linear) encodings and reconstruct the original input from just a compressed version of it.
• Anyone who needs the original data can approximately reconstruct it from the compressed code using the decoder.
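
To make the comparison concrete, here is a minimal NumPy sketch (not from the slides) of PCA used autoencoder-style: project onto the top-k variance directions ("encode"), then map back ("decode"). A linear autoencoder trained with MSE learns the same subspace PCA finds; a non-linear autoencoder can do better on curved data manifolds.

import numpy as np

# Toy data: 100 samples with 10 features (illustrative values only)
X = np.random.rand(100, 10).astype('float32')
Xc = X - X.mean(axis=0)

# PCA via SVD: rows of Vt are the directions of maximum variance
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)

k = 3
codes = Xc @ Vt[:k].T                     # "encoder": project onto top-k directions
X_rec = codes @ Vt[:k] + X.mean(axis=0)   # "decoder": linear reconstruction
print("reconstruction MSE:", float(np.mean((X - X_rec) ** 2)))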
Architecture of autoencoders

• Encoder: The encoder is a feedforward, fully connected neural network that compresses the input into a latent-space representation, encoding the input image in a reduced dimension. The compressed representation is a distorted (lossy) version of the original image.

• Bottleneck/ Code: A module that contains the compressed knowledge


representations and is therefore the most important part of the network. This
part of the network contains the reduced representation of the input that is fed
into the decoder.

• Decoder: The decoder is also a feedforward network, with a structure similar to the encoder's. It "decompresses" the knowledge representation, reconstructing the data back to its original dimensions from the code. The output is then compared with the ground truth (the original input).
1. Encoder

• It compresses the input data into a smaller, more manageable form by reducing its dimensionality while preserving important information. It has three parts:
• Input Layer: This is where the original data enters the network. It can be images, text features, or any other structured data.
• Hidden Layers: These layers perform a series of transformations on the input data. Each hidden layer applies weights and activation functions to capture important patterns, progressively reducing the data's size and complexity.
• Output (Latent Space): The encoder outputs a compressed vector known as the latent representation or encoding. This vector captures the important features of the input data in a condensed form, which helps filter out noise and redundancy.
2. Bottleneck (Latent Space)

• It is the smallest layer of the network and represents the most compressed version of the input data. It serves as an information bottleneck that forces the network to prioritize the most significant features. This compact representation helps the model learn the underlying structure and key patterns of the input, enabling better generalization and more efficient data encoding.
• Since the bottleneck is designed so that the maximum information in an image is captured in it, we can say that the bottleneck forms a knowledge representation of the input.
• As a compressed representation of the input, the bottleneck also prevents the neural network from memorizing the input and overfitting the data.
• As a rule of thumb: the smaller the bottleneck, the lower the risk of overfitting.
• However, a very small bottleneck restricts how much information can be stored, which increases the chance that important information is lost as the data passes through the encoder's layers.
3. Decoder

• It is responsible for taking the compressed representation from the


latent space and reconstructing it back into the original data form.
• Since the input to the decoder is a compressed knowledge
representation provided as an output by the bottleneck layer, the
decoder serves as a “decompressor” and builds back the image from its
latent attributes.

• Hidden Layers: These layers progressively expand the latent vector back into a higher-dimensional space. Through successive transformations, the decoder attempts to restore the original data's shape and details.
• Output Layer: The final layer produces the reconstructed output, which aims to closely resemble the original input. The quality of reconstruction depends on how well the encoder-decoder pair can minimize the difference between input and output during training.
First, the input passes through the encoder, where it is compressed into the layer called the code; the decoder then reconstructs the input from the code. The main objective of the autoencoder is to get an output identical to the input.
Note that the decoder architecture is typically the mirror image of the encoder. This is not a requirement, but it is usually the case; the only hard requirement is that the dimensionality of the input and output be the same.
Loss Function in Autoencoder Training
• During training, an autoencoder's goal is to minimize the reconstruction loss, which measures how different the reconstructed output is from the original input. The choice of loss function depends on the type of data being processed (a minimal sketch of both losses follows below):
• Mean Squared Error (MSE): Commonly used for continuous data. It measures the average squared difference between the input and the reconstructed data.
• Binary Cross-Entropy: Used for binary data (0/1 values, and in practice data scaled to [0, 1]). It compares the original values with the reconstructed probabilities.
• During training, the network updates its weights using backpropagation to minimize this reconstruction loss. In doing so it learns to extract and retain the most important features of the input data, which are encoded in the latent space.
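
As a concrete illustration, a small hedged sketch (not from the slides) computing both losses on a toy reconstruction with tf.keras:

import tensorflow as tf

x = tf.constant([[0.0, 0.5, 1.0]])        # original input (toy values)
x_hat = tf.constant([[0.1, 0.4, 0.9]])    # reconstruction

# MSE: mean of squared differences, suited to continuous data
mse = tf.reduce_mean(tf.square(x - x_hat))

# Binary cross-entropy: treats each value as a probability in [0, 1]
bce = tf.reduce_mean(tf.keras.losses.binary_crossentropy(x, x_hat))

print(float(mse), float(bce))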
• Efficient Representations in Autoencoders
• Constraining an autoencoder helps it learn meaningful and compact features from the input data, which leads to more efficient representations. After training, only the encoder part is used to encode similar data for future tasks. Several techniques are used to achieve this:
• Keep Small Hidden Layers: Limiting the size of each hidden layer forces the network to focus on the most important features. Smaller layers reduce redundancy and allow efficient encoding.
• Regularization: Techniques like L1 or L2 regularization add penalty terms to the loss function. This prevents overfitting by discouraging excessively large weights, helping ensure the model learns general and useful representations.
• Denoising: In denoising autoencoders, random noise is added to the input during training. The network learns to remove this noise during reconstruction, which makes it focus on core, noise-free features and improves robustness.
• Tuning the Activation Functions: Adjusting activation functions can promote sparsity by activating only a few neurons at a time. This sparsity reduces model complexity and forces the network to capture only the most relevant features.
How to train autoencoders?
• You need to set four hyperparameters before training an autoencoder (a sketch showing where each one appears follows below):

• Code size: The code size, i.e. the size of the bottleneck, is the most important hyperparameter for tuning the autoencoder. The bottleneck size decides how much the data is compressed, and it can also act as a regularization term.

• Number of layers: As with all neural networks, an important hyperparameter is the depth of the encoder and the decoder. Higher depth increases model capacity, while lower depth is faster to train and evaluate.

• Number of nodes per layer: The number of nodes per layer determines the weights used in each layer. Typically, the number of nodes decreases with each subsequent encoder layer, as the representation becomes smaller from layer to layer.

• Reconstruction loss: The loss function used to train the autoencoder depends strongly on the kind of input and output we want the autoencoder to adapt to. For image data, the most popular reconstruction losses are MSE loss and L1 loss. If the inputs and outputs are within the range [0, 1], as with normalized pixel values, binary cross-entropy is also a common choice.
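
A minimal hedged sketch (not from the slides; the specific sizes are illustrative) showing where each of the four hyperparameters enters a Keras autoencoder:

import tensorflow as tf
from tensorflow.keras import layers

code_size = 32            # 1. code size (bottleneck width)
hidden_sizes = [128, 64]  # 2./3. number of layers and nodes per layer (encoder side)

inp = tf.keras.Input(shape=(784,))
x = inp
for units in hidden_sizes:              # encoder: node counts shrink layer by layer
    x = layers.Dense(units, activation='relu')(x)
code = layers.Dense(code_size, activation='relu')(x)

x = code
for units in reversed(hidden_sizes):    # decoder mirrors the encoder
    x = layers.Dense(units, activation='relu')(x)
out = layers.Dense(784, activation='sigmoid')(x)

deep_autoencoder = tf.keras.Model(inp, out)
# 4. reconstruction loss: BCE, since inputs are assumed scaled to [0, 1]
deep_autoencoder.compile(optimizer='adam', loss='binary_crossentropy')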
Types of Autoencoders
Undercomplete Autoencoder
• The objective of an undercomplete autoencoder is to capture the most important features present in the data. Undercomplete autoencoders have a smaller dimension for the hidden layer compared to the input layer, which helps extract important features from the data. Training minimizes a loss that penalizes g(f(x)) for differing from the input x.
• Advantages-
• Undercomplete autoencoders often need no extra regularization, because the small bottleneck itself prevents the network from simply copying the input to the output.
• Drawbacks-
• An overparameterized model trained on insufficient data can still overfit and memorize the training inputs.
Vanilla Autoencoder
• The most basic type of autoencoder, which consists of a decoder network that
reconstructs the input from the compressed representation after the input has
been compressed by an encoder network.
• Purpose: The Vanilla Autoencoder is a straightforward neural network design
with the aim of learning to compress input data into a low-
dimensional representation and then recover the original input from this
representation. This particular autoencoder's objective is to preserve the most
crucial aspects of the input data while minimizing the amount of data that is lost
during compression.
• Architecture: The Vanilla Autoencoder's architecture consists of an encoder network and a decoder network. The encoder network maps the input data to a lower-dimensional representation, and the decoder network maps that representation back to the original input space.
• Working: The Vanilla Autoencoder minimizes the reconstruction error between the original input and the decoder network's output. Usually a loss function like mean squared error (MSE) or binary cross-entropy (BCE) is used for this.
• Application: Image and video compression, anomaly detection, and
dimensionality reduction are just a few of the many uses for vanilla
autoencoders.
Applications of Vanilla Autoencoders

• Some key applications include:


• Data Compression: They learn a compact version of the input data, making storage and transmission more efficient.
• Feature Learning: They extract important patterns from data, which is useful in image processing, natural language processing, and sensor analysis.
• Anomaly Detection: If the reconstructed output differs noticeably from the original input, this can indicate an anomaly or outlier, which makes autoencoders useful for fraud detection and system monitoring.
Practical implementation
• Here we use the NumPy, Matplotlib, and TensorFlow libraries, together with the built-in Fashion-MNIST dataset. The key lines:
• (x_train, _), (x_test, _) = fashion_mnist.load_data(): Loads the Fashion-MNIST dataset into training and testing sets, ignoring the labels.
• encoded = tf.keras.layers.Dense(encoding_dim, activation='relu')(input_img): Encodes the input into a 32-dimensional vector with ReLU activation.
• decoded = tf.keras.layers.Dense(784, activation='sigmoid')(encoded): Decodes the compressed vector back to 784 dimensions with sigmoid activation.
• autoencoder = tf.keras.Model(input_img, decoded): Creates the autoencoder model connecting input to output.
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras.datasets import fashion_mnist

(x_train, _), (x_test, _) = fashion_mnist.load_data()

# Scale pixels to [0, 1] and flatten each 28x28 image to a 784-vector
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
x_train_flat = x_train.reshape(len(x_train), 784)
x_test_flat = x_test.reshape(len(x_test), 784)

n = 10
encoding_dim = 32
input_img = tf.keras.Input(shape=(784,))
encoded = tf.keras.layers.Dense(encoding_dim, activation='relu')(input_img)
decoded = tf.keras.layers.Dense(784, activation='sigmoid')(encoded)

autoencoder = tf.keras.Model(input_img, decoded)

# The input is also the target: unsupervised reconstruction training
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
autoencoder.fit(x_train_flat, x_train_flat,
                epochs=50,
                batch_size=256,
                shuffle=True,
                validation_data=(x_test_flat, x_test_flat))

decoded_imgs = autoencoder.predict(x_test_flat)

# Top row: originals; bottom row: reconstructions
plt.figure(figsize=(20, 4))
for i in range(n):
    ax = plt.subplot(2, n, i + 1)
    plt.imshow(x_test_flat[i].reshape(28, 28), cmap='gray')
    plt.axis('off')

    ax = plt.subplot(2, n, i + 1 + n)
    plt.imshow(decoded_imgs[i].reshape(28, 28), cmap='gray')
    plt.axis('off')
plt.show()
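
Building on the model just trained, a short hedged sketch (not from the slides; the 95th-percentile cutoff is an illustrative choice) of the anomaly-detection use mentioned earlier, flagging test images with unusually large reconstruction error:

# Per-sample reconstruction error; inputs the autoencoder reconstructs
# poorly are candidate anomalies or outliers.
errors = np.mean(np.square(x_test_flat - decoded_imgs), axis=1)

threshold = np.percentile(errors, 95)    # flag the worst-reconstructed 5%
anomalies = np.where(errors > threshold)[0]
print(f"{len(anomalies)} potential anomalies out of {len(errors)} test images")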
Denoising Autoencoder

• Using corrupted data as training input and clean data as the objective, a
denoising autoencoder can be trained to remove noise from input data.
• Purpose: The Denoising Autoencoder is a subclass of autoencoder that
learns to filter out noise from input data. A robust representation of the
input data that is less sensitive to noise is what this kind of autoencoder
aims to learn.
• Architecture: The Denoising Autoencoder shares the same design as the
Vanilla Autoencoder, with the exception that it trains on corrupted input
data rather than the original data. Training involves feeding intentionally
corrupted inputs and minimizing the reconstruction error against the clean
version.
• This approach forces the model to capture robust features that are
invariant to noise.
• Working: Given corrupted data as input, the Denoising Autoencoder minimizes the reconstruction error between the original (clean) input data and the output of the decoder network.
Applications of Denoising Autoencoders

• Image Denoising: Removes noise from images to increase quality and improve downstream processing.
• Signal Cleaning: Filters noise from audio and sensor signals, boosting detection accuracy.
• Data Preprocessing: Cleans corrupted data before it is fed to other models, increasing robustness and performance.
Practical implementation

import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras.datasets import fashion_mnist

(x_train, _), (x_test, _) = fashion_mnist.load_data()

x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
x_train_flat = x_train.reshape(len(x_train), 784)
x_test_flat = x_test.reshape(len(x_test), 784)

n = 10
encoding_dim = 32
input_img = tf.keras.Input(shape=(784,))
encoded = tf.keras.layers.Dense(encoding_dim, activation='relu')(input_img)
decoded = tf.keras.layers.Dense(784, activation='sigmoid')(encoded)

denoising_autoencoder = tf.keras.Model(input_img, decoded)
denoising_autoencoder.compile(optimizer='adam', loss='binary_crossentropy')

# Corrupt the inputs with Gaussian noise, then clip back into [0, 1]
noise_factor = 0.5
x_train_noisy = x_train_flat + noise_factor * np.random.normal(
    loc=0.0, scale=1.0, size=x_train_flat.shape)
x_test_noisy = x_test_flat + noise_factor * np.random.normal(
    loc=0.0, scale=1.0, size=x_test_flat.shape)
x_train_noisy = np.clip(x_train_noisy, 0., 1.)
x_test_noisy = np.clip(x_test_noisy, 0., 1.)

# Noisy inputs, clean targets
denoising_autoencoder.fit(x_train_noisy, x_train_flat,
                          epochs=50,
                          batch_size=256,
                          shuffle=True,
                          validation_data=(x_test_noisy, x_test_flat))

decoded_imgs = denoising_autoencoder.predict(x_test_noisy)

# Three rows: original, noisy input, denoised reconstruction
plt.figure(figsize=(20, 6))
for i in range(n):
    ax = plt.subplot(3, n, i + 1)
    plt.imshow(x_test_flat[i].reshape(28, 28), cmap='gray')
    plt.axis('off')
    ax = plt.subplot(3, n, i + 1 + n)
    plt.imshow(x_test_noisy[i].reshape(28, 28), cmap='gray')
    plt.axis('off')
    ax = plt.subplot(3, n, i + 1 + 2 * n)
    plt.imshow(decoded_imgs[i].reshape(28, 28), cmap='gray')
    plt.axis('off')
plt.show()
Convolutional Autoencoder

• An autoencoder type created specifically for image data, using convolutional layers in the encoder and decoder networks.
• Purpose: The Convolutional Autoencoder is a kind of autoencoder designed especially for image data. Its objective is to learn a compressed representation of an input image while preserving spatial information.
• Architecture: Convolutional layers are used in both the encoder and decoder
networks of the Convolutional Autoencoder, which allows it to extract spatial
information from the input image.
• Working: Convolutional autoencoding works by reducing the reconstruction
error between the input image's original version and the decoder network's
output.
• Applications: Image and video compression, image denoising, and image
synthesis are just a few applications where convolutional autoencoders have
been put to use.
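
No implementation is given on the slides, so here is a minimal hedged Keras sketch (filter counts and depths are illustrative choices) of a convolutional autoencoder for 28x28 grayscale images:

import tensorflow as tf
from tensorflow.keras import layers

# Encoder: convolution + downsampling keeps spatial structure intact
inp = tf.keras.Input(shape=(28, 28, 1))
x = layers.Conv2D(16, 3, activation='relu', padding='same')(inp)
x = layers.MaxPooling2D(2, padding='same')(x)           # 28x28 -> 14x14
x = layers.Conv2D(8, 3, activation='relu', padding='same')(x)
encoded = layers.MaxPooling2D(2, padding='same')(x)     # 14x14 -> 7x7 bottleneck

# Decoder: convolution + upsampling mirrors the encoder
x = layers.Conv2D(8, 3, activation='relu', padding='same')(encoded)
x = layers.UpSampling2D(2)(x)                           # 7x7 -> 14x14
x = layers.Conv2D(16, 3, activation='relu', padding='same')(x)
x = layers.UpSampling2D(2)(x)                           # 14x14 -> 28x28
decoded = layers.Conv2D(1, 3, activation='sigmoid', padding='same')(x)

conv_autoencoder = tf.keras.Model(inp, decoded)
conv_autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
# Images stay in their 2-D shape, e.g. with the Fashion-MNIST arrays above:
# conv_autoencoder.fit(x_train[..., None], x_train[..., None], epochs=50, batch_size=256)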
Variational Autoencoder
• A sort of generative model that can produce fresh samples from the learned
distribution after learning a latent representation of the input data.
• Purpose: The Variational Autoencoder is a form of autoencoder that may produce new
samples from the learned distribution of the input data. This particular autoencoder's
objective is to learn a compressed representation of the input data that may be
applied to the creation of new samples.
• Architecture: The Variational Autoencoder's architecture consists of a decoder
network and an encoder network that transfer the input data to a probability
distribution in latent space. The VAE, on the other hand, learns a probability
distribution over the latent space and has a probabilistic interpretation, unlike other
kinds of autoencoders.
• Working: The VAE samples a latent vector from a multivariate Gaussian
distribution by encoding the input data into a mean and variance vector. After
receiving this latent vector as input, the decoder network creates a new sample using
the discovered distribution. The objective is to reduce the divergence between the
learned distribution and the prior distribution over the latent space, as well as the
reconstruction error between the original input data and the decoder network's
output.
• Applications: Variational Autoencoders have been employed in image synthesis, data compression, and anomaly detection.
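
The slides give no VAE code, so here is a minimal hedged Keras sketch of the encode-sample-decode loop described above (layer sizes, latent_dim, and the loss weighting are illustrative choices; in practice the reconstruction term is often scaled by the input dimension):

import tensorflow as tf
from tensorflow.keras import layers

latent_dim = 2

class Sampling(layers.Layer):
    """Reparameterization trick: z = mean + exp(log_var / 2) * eps, plus KL loss."""
    def call(self, inputs):
        z_mean, z_log_var = inputs
        # KL divergence between N(mean, var) and the standard-normal prior
        kl = -0.5 * tf.reduce_mean(
            tf.reduce_sum(1 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var), axis=1))
        self.add_loss(kl)
        eps = tf.random.normal(tf.shape(z_mean))
        return z_mean + tf.exp(0.5 * z_log_var) * eps

# Encoder outputs a mean and a log-variance vector
enc_in = tf.keras.Input(shape=(784,))
h = layers.Dense(256, activation='relu')(enc_in)
z_mean = layers.Dense(latent_dim)(h)
z_log_var = layers.Dense(latent_dim)(h)
z = Sampling()([z_mean, z_log_var])

# Decoder maps a sampled latent vector back to a 784-dim reconstruction;
# kept as its own model so it can generate new samples from random z later
dec_in = tf.keras.Input(shape=(latent_dim,))
dec_out = layers.Dense(784, activation='sigmoid')(layers.Dense(256, activation='relu')(dec_in))
decoder = tf.keras.Model(dec_in, dec_out)

vae = tf.keras.Model(enc_in, decoder(z))
vae.compile(optimizer='adam', loss='binary_crossentropy')  # reconstruction + KL (added above)
# vae.fit(x_train_flat, x_train_flat, epochs=50, batch_size=256)
# Generating new data: decoder.predict(tf.random.normal((10, latent_dim)))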
Sparse Autoencoders

• A sparse autoencoder learns a sparse representation of the input data by penalizing the activation of hidden units.
• Purpose: The Sparse Autoencoder is a form of autoencoder that enforces sparsity in the learned representation while learning a compressed version of the input data. This kind of autoencoder learns a condensed representation of the input data that captures its key characteristics.
• Inventor: A. Ng et al. made the initial suggestion for the sparse autoencoder in 2011.
• Architecture: The design of the Sparse Autoencoder is identical to that of the Vanilla Autoencoder, but it includes an additional regularization term that promotes sparse activations of the hidden units.
• Working: The Sparse Autoencoder minimizes the reconstruction error between the original input data and the output produced by the decoder network, while simultaneously penalizing activations of hidden units that do not aid the reconstruction.
• Applications: Sparse Autoencoders have been applied to tasks such as feature selection, anomaly detection, and image denoising.
Sparse Autoencoder
• Sparse autoencoders can have more hidden nodes than input nodes and still discover important features in the data, because a sparsity constraint is imposed on the hidden layer. In a typical visualization of a sparse autoencoder, the shading of a node corresponds to its level of activation. The sparsity constraint prevents the output layer from simply copying the input data. Sparsity may be obtained through additional terms in the loss function during training, either by comparing the probability distribution of the hidden-unit activations with some low desired value, or by manually zeroing all but the strongest hidden-unit activations. Some of the most powerful AIs of the 2010s involved sparse autoencoders stacked inside deep neural networks.
• Advantages-
• Sparse autoencoders apply a sparsity penalty (driving activations close to zero, but not exactly zero) on the hidden layer in addition to the reconstruction error. This helps prevent overfitting.
• They keep the highest activation values in the hidden layer and zero out the rest, preventing the autoencoder from using all hidden nodes at once and forcing it to use only a reduced number of them.
• Drawbacks-
• For this to work, it is essential that the individual nodes of a trained model that activate are data-dependent, so that different inputs result in activations of different nodes.
Sparse Autoencoder
• Sparse autoencoders add sparsity constraints that encourage only a small subset of neurons in the hidden layer to activate at once, creating a more efficient and focused representation.
• Unlike vanilla models, they include regularization methods such as an L1 penalty or dropout to enforce sparsity.
• KL divergence can be used to maintain the sparsity level by matching the latent activation distribution to a predefined sparse target.
• This selective activation helps with feature selection and learning meaningful patterns while ignoring irrelevant noise.
• Applications of Sparse Autoencoders
• Feature Selection: Highlights the most relevant features by encouraging sparse activation, improving interpretability.
• Dimensionality Reduction: Creates efficient, low-dimensional representations by limiting the number of active neurons.
• Noise Reduction: Reduces irrelevant information and noise by activating only key neurons, improving model generalization.
• Now let's see the practical implementation.
• encoded = tf.keras.layers.Dense(encoding_dim, activation='relu', activity_regularizer=tf.keras.regularizers.l1(1e-5))(input_img): Creates the encoding layer with ReLU activation and adds L1 activity regularization to encourage sparsity.
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras.datasets import fashion_mnist

(x_train, _), (x_test, _) = fashion_mnist.load_data()
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
x_train_flat = x_train.reshape(len(x_train), 784)
x_test_flat = x_test.reshape(len(x_test), 784)

n = 10
encoding_dim = 32
input_img = tf.keras.Input(shape=(784,))
# L1 activity regularization pushes most hidden activations toward zero
encoded = tf.keras.layers.Dense(encoding_dim, activation='relu',
                                activity_regularizer=tf.keras.regularizers.l1(1e-5))(input_img)
decoded = tf.keras.layers.Dense(784, activation='sigmoid')(encoded)

sparse_autoencoder = tf.keras.Model(input_img, decoded)

sparse_autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
sparse_autoencoder.fit(x_train_flat, x_train_flat,
                       epochs=50,
                       batch_size=256,
                       shuffle=True,
                       validation_data=(x_test_flat, x_test_flat))

decoded_imgs = sparse_autoencoder.predict(x_test_flat)

plt.figure(figsize=(20, 4))
for i in range(n):
    ax = plt.subplot(2, n, i + 1)
    plt.imshow(x_test_flat[i].reshape(28, 28), cmap='gray')
    plt.axis('off')
    ax = plt.subplot(2, n, i + 1 + n)
    plt.imshow(decoded_imgs[i].reshape(28, 28), cmap='gray')
    plt.axis('off')
plt.show()
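
The bullets above also mention KL divergence as a way to hold the average activation at a sparse target; the code uses an L1 penalty instead, so here is a hedged sketch (not from the slides; rho and the weight are illustrative values) of a KL-based activity regularizer. A sigmoid hidden layer is used so activations lie in (0, 1) and can be compared with the target rate:

import tensorflow as tf

class KLSparsity(tf.keras.regularizers.Regularizer):
    """KL(rho || rho_hat) summed over hidden units, where rho_hat is the
    mean activation of each unit over the batch."""
    def __init__(self, rho=0.05, weight=1e-3):
        self.rho = rho
        self.weight = weight

    def __call__(self, h):
        # Clip to keep the logarithms finite
        rho_hat = tf.clip_by_value(tf.reduce_mean(h, axis=0), 1e-7, 1 - 1e-7)
        kl = (self.rho * tf.math.log(self.rho / rho_hat)
              + (1 - self.rho) * tf.math.log((1 - self.rho) / (1 - rho_hat)))
        return self.weight * tf.reduce_sum(kl)

input_img = tf.keras.Input(shape=(784,))
encoded = tf.keras.layers.Dense(32, activation='sigmoid',
                                activity_regularizer=KLSparsity(rho=0.05))(input_img)
decoded = tf.keras.layers.Dense(784, activation='sigmoid')(encoded)
kl_sparse_autoencoder = tf.keras.Model(input_img, decoded)
kl_sparse_autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
# Trained exactly like the L1 version: fit(x_train_flat, x_train_flat, ...)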
Contractive Autoencoder

• An autoencoder type that learns a reliable representation of the input data by penalizing the model's sensitivity to small input perturbations.
• Purpose: The Contractive Autoencoder is a form of autoencoder that enforces robustness to minor input perturbations while learning a compressed representation of the input data. It aims to learn a compact representation that captures the most crucial aspects of the input data while being insensitive to small changes.
• Inventor: S. Rifai et al. first put forth the Contractive Autoencoder in 2011.
• Architecture: The Contractive Autoencoder shares the same design as the Vanilla Autoencoder, but adds a regularization term to reduce the hidden units' sensitivity to slight input changes.
• Working: The Contractive Autoencoder minimizes the reconstruction error between the original input data and the decoder network's output, while also penalizing the hidden units' sensitivity to small input changes.
• Application: Contractive Autoencoders have been applied to feature extraction, anomaly detection, and image denoising.
Contractive Autoencoder
• The objective of a contractive autoencoder is to learn a robust representation that is less sensitive to small variations in the data. Robustness is achieved by adding a penalty term to the loss function. Contractive autoencoding is another regularization technique, like sparse and denoising autoencoders; however, this regularizer corresponds to the Frobenius norm of the Jacobian matrix of the encoder activations with respect to the input. The Frobenius norm of the hidden layer's Jacobian, taken with respect to the input, is simply the sum of the squares of all its elements.
• Advantages-
• A contractive autoencoder is often a better choice than a denoising autoencoder for learning useful features.
• This model learns an encoding in which similar inputs have similar encodings. Hence, we force the model to contract a neighborhood of inputs into a smaller neighborhood of outputs.
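
Written out (following Rifai et al., 2011, and using the g(f(x)) notation from earlier), the contractive objective is the reconstruction loss plus the squared Frobenius norm of the encoder Jacobian:

\mathcal{L}_{\text{CAE}}(x) = L\big(x,\, g(f(x))\big) + \lambda \,\lVert J_f(x) \rVert_F^2,
\qquad
\lVert J_f(x) \rVert_F^2 = \sum_{i,j} \left( \frac{\partial h_i(x)}{\partial x_j} \right)^{\!2}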
Contractive Autoencoder

• Contractive Autoencoders introduce an additional penalty during


training to make the learned representations robust to small changes in
input data.
• They minimize both reconstruction error and a regularization term that
penalizes sensitivity to input perturbations.
• This results in stable, invariant features useful in noisy or fluctuating
environments.
• Applications of Contractive Autoencoders
• Stable Representation: Learns features that remain consistent despite
small input variations.
• Transfer Learning: Provides robust feature vectors for tasks with limited
labeled data.
• Data Augmentation: Generates stable variants of input data to increase
training diversity.
• Now let's see the practical implementation.
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras.datasets import fashion_mnist

(x_train, _), (x_test, _) = fashion_mnist.load_data()
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
x_train_flat = x_train.reshape(len(x_train), 784)
x_test_flat = x_test.reshape(len(x_test), 784)

n = 10
encoding_dim = 32

class ContractivePenalty(tf.keras.layers.Layer):
    """Adds lam * ||J_f(x)||_F^2 for a Dense+ReLU encoder as an auxiliary loss.

    For h = relu(x W + b), dh_i/dx_j = 1[h_i > 0] * W_ji, so the squared
    Frobenius norm is sum_i 1[h_i > 0] * sum_j W_ji^2. This analytic form
    avoids tf.gradients, which cannot be called inside a TF2 Keras loss.
    """
    def __init__(self, dense_layer, lam=1e-4, **kwargs):
        super().__init__(**kwargs)
        self.dense_layer = dense_layer
        self.lam = lam

    def call(self, h):
        W = self.dense_layer.kernel              # shape (784, encoding_dim)
        dh = tf.cast(h > 0, h.dtype)             # ReLU derivative: 0 or 1
        frob = tf.reduce_sum(dh * tf.reduce_sum(tf.square(W), axis=0), axis=1)
        self.add_loss(self.lam * tf.reduce_mean(frob))
        return h

input_img = tf.keras.Input(shape=(784,))
encoder_layer = tf.keras.layers.Dense(encoding_dim, activation='relu')
encoded = ContractivePenalty(encoder_layer)(encoder_layer(input_img))
decoded = tf.keras.layers.Dense(784, activation='sigmoid')(encoded)
contractive_autoencoder = tf.keras.Model(input_img, decoded)

# The contractive penalty added above is combined with the reconstruction loss
contractive_autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
contractive_autoencoder.fit(x_train_flat, x_train_flat,
                            epochs=50,
                            batch_size=256,
                            shuffle=True,
                            validation_data=(x_test_flat, x_test_flat))

decoded_imgs = contractive_autoencoder.predict(x_test_flat)

plt.figure(figsize=(20, 4))
for i in range(n):
    ax = plt.subplot(2, n, i + 1)
    plt.imshow(x_test_flat[i].reshape(28, 28), cmap='gray')
    plt.axis('off')
    ax = plt.subplot(2, n, i + 1 + n)
    plt.imshow(decoded_imgs[i].reshape(28, 28), cmap='gray')
    plt.axis('off')
plt.show()
Undercomplete Autoencoders

• An undercomplete autoencoder is an unsupervised neural network used to generate a compressed version of the input data.
• This is done by taking in an image and trying to predict the same image as output, thus reconstructing the image from its compressed bottleneck region.
• The primary use for autoencoders like these is generating the latent space or bottleneck, which forms a compressed substitute of the input data and can easily be decompressed back with the help of the network when needed.
Undercomplete Autoencoder

• Undercomplete autoencoders intentionally restrict the size of the hidden layer to be smaller than the input layer.
• This bottleneck forces the model to compress the data, learning only the most significant features and discarding redundant information.
• The model is trained by minimizing the reconstruction error while keeping the latent space compact.
• Applications of Undercomplete Autoencoders
• Anomaly Detection: Detects unusual data points by capturing deviations in the compressed features.
• Feature Extraction: Focuses on key data characteristics to improve classification and analysis.
• Data Compression: Encodes input data efficiently to save storage and speed up transmission.
• Now let's see the practical implementation.
• encoded = tf.keras.layers.Dense(encoding_dim, activation='relu')(input_img): Builds the encoder layer with ReLU activation; encoding_dim is set to 16 here, a tighter bottleneck than the 32 used earlier.
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras.datasets import fashion_mnist

(x_train, _), (x_test, _) = fashion_mnist.load_data()

x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
x_train_flat = x_train.reshape(len(x_train), 784)
x_test_flat = x_test.reshape(len(x_test), 784)

n = 10
encoding_dim = 16   # tighter bottleneck than the 32 used earlier
input_img = tf.keras.Input(shape=(784,))
encoded = tf.keras.layers.Dense(encoding_dim, activation='relu')(input_img)
decoded = tf.keras.layers.Dense(784, activation='sigmoid')(encoded)

undercomplete_autoencoder = tf.keras.Model(input_img, decoded)

undercomplete_autoencoder.compile(optimizer='adam', loss='binary_crossentropy')

undercomplete_autoencoder.fit(x_train_flat, x_train_flat,
                              epochs=50,
                              batch_size=256,
                              shuffle=True,
                              validation_data=(x_test_flat, x_test_flat))

decoded_imgs = undercomplete_autoencoder.predict(x_test_flat)

plt.figure(figsize=(20, 4))
for i in range(n):
    ax = plt.subplot(2, n, i + 1)
    plt.imshow(x_test_flat[i].reshape(28, 28), cmap='gray')
    plt.axis('off')

    ax = plt.subplot(2, n, i + 1 + n)
    plt.imshow(decoded_imgs[i].reshape(28, 28), cmap='gray')
    plt.axis('off')
plt.show()
Overcomplete Autoencoders

• Overcomplete Autoencoders are a type of autoencoder that uses more hidden units than input units; the encoder and decoder layers have more units than the input layer. The idea behind using more hidden units is to learn a more complex, non-linear function that can better capture the structure of the data.
• Advantages of Overcomplete Autoencoders include their ability to learn more complex representations of the data, which can be useful for tasks such as feature extraction and denoising. Overcomplete Autoencoders can also be more robust to noise and handle missing data better than traditional autoencoders.
• However, there are also disadvantages. The main one is that they can be harder to train than traditional autoencoders: the extra hidden units make it easy to learn a trivial identity mapping and overfit, which leads to poor performance on new data, so some constraint (such as a sparsity penalty) is usually needed.
• Implementing Overcomplete Autoencoders with PyTorch
• To implement Overcomplete Autoencoders with PyTorch, we follow several steps (a sketch follows below):
• Dataset preparation: To train the model, we must first prepare the dataset. This includes loading the data, preprocessing it, and splitting it into training and test sets.
• Constructing the model architecture: Using PyTorch, we define the Overcomplete Autoencoder's architecture. This entails specifying the encoder and decoder layers and the loss function used to train the model.
• Model training: Using the prepared dataset, we train the Overcomplete Autoencoder. The optimizer and hyperparameters must be configured, and the model weights are updated while iterating over the training data.
• Evaluating the model's performance: Once the model has been trained, we assess how well it performs on the test data. This involves calculating metrics such as the reconstruction error and inspecting the model's output.
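
A minimal hedged PyTorch sketch of these steps (layer sizes, the L1 activity penalty used to discourage the identity mapping, and the random stand-in batch are illustrative choices, not prescribed by the slides):

import torch
import torch.nn as nn

class OvercompleteAE(nn.Module):
    def __init__(self, input_dim=784, hidden_dim=1024):
        super().__init__()
        # More hidden units than inputs: 1024 > 784, hence "overcomplete"
        self.encoder = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU())
        self.decoder = nn.Sequential(nn.Linear(hidden_dim, input_dim), nn.Sigmoid())

    def forward(self, x):
        h = self.encoder(x)
        return self.decoder(h), h

model = OvercompleteAE()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()

x = torch.rand(256, 784)   # stand-in batch; real code would use a DataLoader
for epoch in range(10):
    recon, h = model(x)
    # L1 penalty on the activations discourages a trivial identity mapping
    loss = criterion(recon, x) + 1e-5 * h.abs().mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
print("final reconstruction + penalty loss:", loss.item())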
Fully Connected Autoencoder
• Purpose: Fully connected autoencoders (FCAEs) are neural networks used for unsupervised learning. They train a network to encode input data into a lower-dimensional space, learn a compressed representation of the data, and then decode the data back into the original space with the least possible information loss. The compressed representation can then be used for downstream tasks such as anomaly detection, data visualization, and feature extraction.
• Inventor: The concept of autoencoders was first proposed in the 1980s, and the design has undergone numerous changes over time. Rumelhart, Hinton, and Williams described an early fully connected autoencoder in a 1986 publication.
• Architecture: A fully connected autoencoder is made up of an encoder network, a bottleneck layer in the middle, and a decoder network. The encoder network receives the input data and maps it to a compressed form in the bottleneck layer; the decoder network then maps the compressed form back to the original data space. The encoder and decoder are normally symmetrical, and the bottleneck layer has fewer neurons than the other layers.
• Working: The fully connected autoencoder is trained to reduce the reconstruction error between the input data and the decoder network's output.
• This is typically done by minimizing the binary cross-entropy loss (for binary input data) or the mean squared error (MSE) loss between the input and output.
• Backpropagation and gradient descent are used to update the encoder and decoder weights during training: the gradients are computed with respect to the loss function and the weights are updated to minimize the loss.
• Applications: After training, the bottleneck layer's compressed representation can be applied to a variety of downstream tasks. Fully connected autoencoders have been employed in a wide range of applications, including dimensionality reduction, anomaly detection, data denoising, and image compression. They have also been used in machine translation and language generation tasks.
Regularization

• Regularization helps with the effects of out-of-control parameters by


using different methods to minimize parameter size over time.
• In mathematical notation, we see regularization represented by the
coefficient lambda, controlling the trade-off between finding a good fit
and keeping the value of certain feature weights low as the exponents on
features increase.
• L1 and L2 regularization help fight overfitting by making certain weights smaller. Smaller-valued weights lead to simpler hypotheses, which are the most generalizable. Unregularized weights on feature sets with several higher-order polynomial terms tend to overfit the training set.
• As the input training set size grows, the effect of regularization decreases,
and the parameters tend to increase in magnitude. This is appropriate
because an excess of features relative to training set examples leads to
overfitting in the first place. Bigger data is the ultimate regularizer.
Regularized autoencoders
• Autoencoders are neural networks that aim to learn a lower-dimensional representation of the input
data. Regularization techniques are often used to prevent overfitting, which occurs when the model
learns the noise in the data instead of the underlying pattern. Regularization aims to impose
constraints on the model parameters to reduce the model’s flexibility and improve generalization
performance.
• In the context of autoencoders, regularization techniques can be used to prevent overfitting and
improve the quality of the learned representation. There are various types of regularization
techniques that can be applied to autoencoders, including:
• L1 and L2 regularization: These techniques add a penalty term to the loss function that encourages
the model to have smaller weights. This can help to prevent overfitting and improve the sparsity of
the learned representation.
• Dropout: Dropout is a technique that randomly drops out neurons during training to prevent
overfitting. This can be applied to the input layer, the hidden layers, or both.
• Batch normalization: Batch normalization is a technique that normalizes the input to each layer to
have zero mean and unit variance. This can help to improve the stability of the training process and
prevent overfitting.
• Data augmentation: Data augmentation involves applying transformations to the input data, such as
cropping or flipping, to increase the size of the training set and improve generalization performance.
• Overall, regularization techniques can be an effective way to improve the performance of
autoencoders and prevent overfitting. The choice of regularization technique will depend on the
specific problem and data set being considered.
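
A small hedged sketch (not from the slides) applying two of the listed techniques, dropout on the input and an L2 weight penalty on the encoder, to the same 784-32-784 autoencoder used throughout:

import tensorflow as tf

input_img = tf.keras.Input(shape=(784,))
x = tf.keras.layers.Dropout(0.2)(input_img)     # dropout regularization on the input
encoded = tf.keras.layers.Dense(
    32, activation='relu',
    kernel_regularizer=tf.keras.regularizers.l2(1e-5))(x)   # L2 weight penalty
decoded = tf.keras.layers.Dense(784, activation='sigmoid')(encoded)

regularized_autoencoder = tf.keras.Model(input_img, decoded)
regularized_autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
# Trained like the vanilla model: fit(x_train_flat, x_train_flat, ...)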
• In the figure referenced on this slide, the first row contains the original images, and the second row shows the same images with random (Gaussian) noise added. The autoencoder receives the noisy images as input but is trained in such a way that it removes the noise and regenerates the original images.

• Regularized autoencoders
• There are other ways to constrain the reconstruction of an autoencoder than to impose a hidden
layer of smaller dimensions than the input. The regularized autoencoders use a loss function that
helps the model to have other properties besides copying input to the output. We can generally
find two types of regularized autoencoder: the denoising autoencoder and the sparse autoencoder.
• Denoising autoencoder
• One way to make the autoencoder learn useful features is to change the inputs: we add random noise to the input and recover the original form by removing that noise. This prevents the autoencoder from simply copying the data from input to output, because the input contains random noise; we ask it to subtract the noise and produce the meaningful underlying data. This is called a denoising autoencoder.
• Sparse autoencoders
• Another way of regularizing the autoencoder is a sparsity constraint: only a fraction of the nodes, those with non-zero values, called active nodes, participate in a given forward and backward pass.
• To do so, we add a penalty term to the loss function that keeps only a fraction of the nodes active. This forces the autoencoder to represent each input as a combination of a small number of nodes and demands that it discover interesting structure in the data. This method works even if the code size is large, because only a small subset of the nodes will be active at any time.
Denoising autoencoders
• Denoising autoencoders are a type of autoencoder that is designed to remove noise from
data. They work by learning a compressed representation of the noisy input data and then
using the decoder part of the network to generate a denoised output.
• The denoising autoencoder is trained on pairs of noisy input data and clean output data.
During training, the autoencoder learns to map the noisy input data to the clean output
data, while also learning to ignore the noise in the input.
• The main advantage of denoising autoencoders is that they can remove different types of
noise from data, including Gaussian noise, salt-and-pepper noise, and random dropout
noise. They can also be used for various types of data, such as images, audio, and text.
• One common approach to using denoising autoencoders is to add noise to the input data
during training, and then use the output of the autoencoder as the denoised output. The
noise can be added in various ways, such as randomly setting some pixels to zero or adding
Gaussian noise with a specific variance.
• Denoising autoencoders have many practical applications, such as in medical imaging, where
images are often degraded by noise, or in natural language processing, where text data can
be noisy due to spelling errors or typos.
Limitations of Autoencoders

• Autoencoders are useful but also have some limitations:


• Memorizing Instead of Learning Patterns: Autoencoders can sometimes memorize the training data rather than learning meaningful patterns, which reduces their ability to generalize to new data.
• Reconstructed Data Might Not Be Perfect: Output may be blurry or distorted when inputs are noisy or when the model architecture lacks sufficient capacity to capture all details.
• Requires a Large Dataset and Good Parameter Tuning: Autoencoders require large amounts of data and careful hyperparameter tuning (latent dimension size, learning rate, etc.) to perform well. Insufficient data or poor tuning can result in weak feature representations.
• Mastering autoencoders is important for applications in image processing,
anomaly detection and feature extraction where efficient data
representation is important.
Similarities of autoencoders to multilayer perceptron

• Autoencoders are structurally similar to multilayer perceptron networks: like multilayer perceptrons, autoencoders have an input layer, some hidden layers, and an output layer. The key difference is that the output layer of an autoencoder has the same number of neurons as the input layer, because its target is the input itself.
