Understanding Autoencoders in Neural Networks
AUTOENCODERS: UNSUPERVISED LEARNING
• Autoencoders are a variant of feed-forward neural networks trained to reconstruct their own input; the training signal is the error between the reconstruction and the original input. After training, an autoencoder can be used as a normal feed-forward network to produce activations. This is an unsupervised form of feature extraction because the network learns its weights from the original input alone: backpropagation is still used, but the reconstruction targets are the inputs themselves rather than external labels.
• Deep networks can use either RBMs or autoencoders as building blocks for larger networks (a single network rarely uses both). The aim of an autoencoder is to learn a lower-dimensional representation (encoding) of higher-dimensional data, typically for dimensionality reduction, by training the network to capture the most important parts of the input.
• An autoencoder is a type of neural network where the output layer has the same dimensionality as the input layer. In simpler words, the number of output units in the output layer is equal to the number of input units in the input layer. An autoencoder replicates the data from the input to the output in an unsupervised manner and is therefore sometimes referred to as a replicator neural network.
• The autoencoder reconstructs each dimension of the input by passing it through the network. It may seem trivial to use a neural network to replicate its input, but during replication the input is compressed into a smaller representation: the middle layers of the network have fewer units than the input and output layers, so they hold a reduced representation of the input, and the output is reconstructed from this reduced representation.
• Autoencoders have various applications, such as in data compression, image
denoising, and anomaly detection. They can also be used for generating new data
by sampling from the latent space.
• The main difference between autoencoders and Principal Component Analysis (PCA) is that while PCA finds the directions along which you can project the data with maximum variance, autoencoders reconstruct the original input from just a compressed version of it, and can do so with non-linear mappings.
• Anyone who needs the original data can approximately reconstruct it from the compressed code using the autoencoder's decoder.
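To make the PCA comparison concrete, here is a minimal NumPy sketch of PCA's own compress-and-reconstruct cycle (the data shape and component count are illustrative, not from the source); a linear autoencoder trained with MSE learns the same subspace:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.random((100, 5))        # illustrative dataset: 100 samples, 5 features
Xc = X - X.mean(axis=0)         # PCA works on centered data

# Project onto the top-2 directions of maximum variance (the "encoder")
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:2].T               # compressed 2-D representation

# Reconstruct the original from the compressed version (the "decoder")
X_rec = Z @ Vt[:2] + X.mean(axis=0)
recon_err = np.mean((X - X_rec) ** 2)
```

An autoencoder replaces the fixed linear projection with learned, possibly non-linear, encoder and decoder networks.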
Architecture of autoencoders
1. Encoder
• The encoder is a feed-forward network that compresses the input step by step into the code, the low-dimensional representation stored in the middle of the network.
2. Code (Bottleneck)
• It is the smallest layer of the network and represents the most compressed version of the input data. It serves as the information bottleneck which forces the network to prioritize the most significant features. This compact representation helps the model learn the underlying structure and key patterns of the input, enabling better generalization and efficient data encoding.
• Since the bottleneck is designed so that the maximum information possessed by an image is captured in it, we can say that the bottleneck helps us form a knowledge-representation of the input.
• A bottleneck, as a compressed representation of the input, further prevents the neural network from memorising the input and overfitting on the data.
• As a rule of thumb, remember this: the smaller the bottleneck, the lower the risk of overfitting.
• However:
• Very small bottlenecks restrict the amount of information that can be stored, which increases the chances of important information slipping out through the pooling layers of the encoder.
3. Decoder
• The decoder is also a feed-forward network, with a structure similar to the encoder's. It is responsible for reconstructing the input back to its original dimensions from the code: a module that "decompresses" the knowledge representation and reconstructs the data from its encoded form. The output is then compared with the ground truth.
• Hidden Layers: These layers progressively expand the latent vector back into a higher-dimensional space. Through successive transformations, the decoder attempts to restore the original data's shape and details.
• Output Layer: The final layer produces the reconstructed output which
aims to closely resemble the original input. The quality of reconstruction
depends on how well the encoder-decoder pair can minimize the
difference between the input and output during training.
First, the input goes through the encoder where it is compressed and stored in the
layer called Code, then the decoder decompresses the original input from the code.
The main objective of the autoencoder is to get an output identical to the input.
Note that the decoder architecture is the mirror image of the encoder.
This is not a requirement, but it is typically the case. The only requirement is that the input and output have the same dimensionality.
Loss Function in Autoencoder Training
• During training an autoencoder’s goal is to minimize the reconstruction loss which measures how different the
reconstructed output is from the original input. The choice of loss function depends on the type of data being
processed:
• Mean Squared Error (MSE): This is commonly used for continuous data. It measures the average squared
differences between the input and the reconstructed data.
• Binary Cross-Entropy: Used for binary data (0 or 1 values). It calculates the difference in probability between the
original and reconstructed output.
• During training the network updates its weights using backpropagation to minimize this reconstruction loss. By
doing this it learns to extract and retain the most important features of the input data which are encoded in the
latent space.
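The two losses can be sketched directly in NumPy; the input and reconstruction vectors below are made-up values for illustration:

```python
import numpy as np

x = np.array([0.0, 0.5, 1.0, 0.25])      # hypothetical original input
x_hat = np.array([0.1, 0.4, 0.9, 0.35])  # hypothetical reconstruction

# Mean Squared Error: average squared difference (continuous data)
mse = np.mean((x - x_hat) ** 2)           # 0.01 for these values

# Binary Cross-Entropy: for data scaled to [0, 1]
eps = 1e-7                                # avoid log(0)
x_hat_c = np.clip(x_hat, eps, 1 - eps)
bce = -np.mean(x * np.log(x_hat_c) + (1 - x) * np.log(1 - x_hat_c))
```

Frameworks such as Keras compute the same quantities when you pass loss='mse' or loss='binary_crossentropy'.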
• Efficient Representations in Autoencoders
• Constraining an autoencoder helps it learn meaningful and compact features from the input data, which leads to more efficient representations. After training, only the encoder part is used to encode similar data for future tasks. Techniques used to achieve this include the following:
• Keep Small Hidden Layers: Limiting the size of each hidden layer forces the network to focus on the most important features. Smaller layers reduce redundancy and allow efficient encoding.
• Regularization: Techniques like L1 or L2 regularization add penalty terms to the loss function. This prevents overfitting by discouraging excessively large weights, helping to ensure the model learns general and useful representations.
• Denoising: In denoising autoencoders, random noise is added to the input during training. The network learns to remove this noise during reconstruction, which makes it focus on the core, noise-free features and improves robustness.
• Tuning the Activation Functions: Adjusting activation functions can promote sparsity by activating only a few neurons at a time. This sparsity reduces model complexity and forces the network to capture only the most relevant features.
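As a sketch of the regularization idea above, the L1 and L2 penalties are just extra terms added to the reconstruction loss (the weight vector, loss value, and coefficient here are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal(5)       # hypothetical weight vector
reconstruction_loss = 0.04       # assumed MSE value, for illustration
lam = 1e-3                       # regularization strength

l1_penalty = lam * np.sum(np.abs(w))   # L1: pushes weights toward exactly zero
l2_penalty = lam * np.sum(w ** 2)      # L2: shrinks large weights

total_loss_l1 = reconstruction_loss + l1_penalty
total_loss_l2 = reconstruction_loss + l2_penalty
```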
How to train autoencoders?
• You need to set 4 hyperparameters before training an autoencoder:
• Code size: The code size, or the size of the bottleneck, is the most important hyperparameter used to tune the autoencoder. The bottleneck size decides how much the data has to be compressed. It can also act as a regularisation term.
• Number of layers: The depth of the encoder and decoder. Deeper architectures can model more complex mappings but are harder to train.
• Number of nodes per layer: The number of nodes per layer defines the weights we use per layer. Typically, the number of nodes decreases with each subsequent layer of the encoder as the input to each of these layers becomes smaller across the layers.
• Reconstruction Loss: The loss function we use to train the autoencoder is highly dependent on the type of input and output we want the autoencoder to adapt to. If we are working with image data, the most popular loss functions for reconstruction are MSE Loss and L1 Loss. If the inputs and outputs are within the range [0, 1], we can also use binary cross-entropy as the reconstruction loss.
Types of Autoencoders
Undercomplete Autoencoder
• The objective of an undercomplete autoencoder is to capture the most important features present in the data. Undercomplete autoencoders have a hidden layer of smaller dimension than the input layer, which helps to obtain important features from the data. Training minimizes a loss function L(x, g(f(x))) that penalizes the reconstruction g(f(x)) for being different from the input x.
• Advantages-
• Undercomplete autoencoders do not need an explicit regularization term: the narrow bottleneck itself constrains the model, forcing it to learn the data's structure rather than simply copying the input to the output.
• Drawbacks-
• Using an overparameterized model due to lack of sufficient
training data can create overfitting.
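The objective above, minimizing the difference between x and g(f(x)) through a narrower hidden layer, can be sketched with two linear maps (the dimensions and random weights are illustrative, not a trained model):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(8)                       # 8-dimensional input
W_enc = rng.standard_normal((3, 8))     # f: R^8 -> R^3 (undercomplete bottleneck)
W_dec = rng.standard_normal((8, 3))     # g: R^3 -> R^8

h = W_enc @ x                           # code h = f(x), smaller than x
x_hat = W_dec @ h                       # reconstruction g(f(x))
loss = np.mean((x - x_hat) ** 2)        # penalizes g(f(x)) for differing from x
```

Training adjusts W_enc and W_dec (and, in practice, biases and non-linearities) to drive this loss down.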
Vanilla Autoencoder
• The most basic type of autoencoder, consisting of an encoder network that compresses the input and a decoder network that reconstructs the input from that compressed representation.
• Purpose: The Vanilla Autoencoder is a straightforward neural network design
with the aim of learning to compress input data into a low-
dimensional representation and then recover the original input from this
representation. This particular autoencoder's objective is to preserve the most
crucial aspects of the input data while minimizing the amount of data that is lost
during compression.
• Architecture: The Vanilla Autoencoder's architecture is made up of two parts, an encoder network and a decoder network. The encoder converts the input data to a lower-dimensional representation, and the decoder converts that representation back to the original input data.
• Working: The Vanilla Autoencoder is trained to minimize the reconstruction error between the original input and the decoder network's output. Usually, a loss function such as mean squared error (MSE) or binary cross-entropy (BCE) is used for this.
• Application: Image and video compression, anomaly detection, and
dimensionality reduction are just a few of the many uses for vanilla
autoencoders.
Vanilla Autoencoder: Example Implementation
(The code below reconstructs the slide's snippet as a standard Keras implementation; assumes TensorFlow 2.x.)

import numpy as np
import matplotlib.pyplot as plt
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.datasets import fashion_mnist

# Load and flatten the data to 784-dimensional vectors in [0, 1]
(x_train, _), (x_test, _) = fashion_mnist.load_data()
x_train_flat = (x_train.astype('float32') / 255.).reshape(len(x_train), 784)
x_test_flat = (x_test.astype('float32') / 255.).reshape(len(x_test), 784)

n = 10
encoding_dim = 32
input_img = keras.Input(shape=(784,))
encoded = layers.Dense(encoding_dim, activation='relu')(input_img)
decoded = layers.Dense(784, activation='sigmoid')(encoded)
autoencoder = keras.Model(input_img, decoded)
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
autoencoder.fit(x_train_flat, x_train_flat, epochs=50, batch_size=256,
                shuffle=True, validation_data=(x_test_flat, x_test_flat))

decoded_imgs = autoencoder.predict(x_test_flat)
plt.figure(figsize=(20, 4))
for i in range(n):
    ax = plt.subplot(2, n, i + 1)           # top row: originals
    plt.imshow(x_test_flat[i].reshape(28, 28), cmap='gray')
    plt.axis('off')
    ax = plt.subplot(2, n, i + 1 + n)       # bottom row: reconstructions
    plt.imshow(decoded_imgs[i].reshape(28, 28), cmap='gray')
    plt.axis('off')
plt.show()
Denoising Autoencoder
• Using corrupted data as training input and clean data as the objective, a
denoising autoencoder can be trained to remove noise from input data.
• Purpose: The Denoising Autoencoder is a subclass of autoencoder that
learns to filter out noise from input data. A robust representation of the
input data that is less sensitive to noise is what this kind of autoencoder
aims to learn.
• Architecture: The Denoising Autoencoder shares the same design as the
Vanilla Autoencoder, with the exception that it trains on corrupted input
data rather than the original data. Training involves feeding intentionally
corrupted inputs and minimizing the reconstruction error against the clean
version.
• This approach forces the model to capture robust features that are
invariant to noise.
• Working: The Denoising Autoencoder minimizes the reconstruction error
between the original input data and the output of the decoder network,
given the input of corrupted input data.
Denoising Autoencoder: Example Implementation
(The snippet is reconstructed as a standard Keras denoising setup; assumes TensorFlow 2.x.)

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.datasets import fashion_mnist

(x_train, _), (x_test, _) = fashion_mnist.load_data()
x_train_flat = (x_train.astype('float32') / 255.).reshape(len(x_train), 784)
x_test_flat = (x_test.astype('float32') / 255.).reshape(len(x_test), 784)

# Corrupt the inputs; the training targets remain the clean images
noise_factor = 0.5
x_train_noisy = np.clip(x_train_flat + noise_factor * np.random.normal(size=x_train_flat.shape), 0., 1.)

encoding_dim = 32
input_img = keras.Input(shape=(784,))
encoded = layers.Dense(encoding_dim, activation='relu')(input_img)
decoded = layers.Dense(784, activation='sigmoid')(encoded)
denoising_autoencoder = keras.Model(input_img, decoded)
denoising_autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
denoising_autoencoder.fit(x_train_noisy, x_train_flat, epochs=50, batch_size=256, shuffle=True)
Sparse Autoencoder
• An autoencoder that learns a sparse representation of the input data by penalizing the activation of hidden units is known as a sparse autoencoder.
• Purpose: The Sparse Autoencoder is a form of autoencoder that enforces sparsity in
the learned representation while learning a compressed version of the input data. This
kind of autoencoder learns a condensed representation of the input data
that captures its key characteristics.
• Inventor: A. Ng et al. made the initial suggestion for the sparse autoencoder in 2011.
• Architecture: The design of the Sparse Autoencoder is identical to that of the Vanilla
Autoencoder, but it includes an additional regularization term that promotes sparse
activations of the hidden units.
• Working: The Sparse Autoencoder minimizes the reconstruction error between the original input data and the output produced by the decoder network, while simultaneously penalizing activations of hidden units that do not aid the reconstruction.
• Applications: Sparse Autoencoders have been applied to tasks such as feature selection, anomaly detection, and image denoising.
Sparse Autoencoder
• Sparse autoencoders can have more hidden nodes than input nodes, yet still discover important features in the data. In a generic sparse autoencoder visualization, the obscurity of a node corresponds to its level of activation. A sparsity constraint is introduced on the hidden layer to prevent the output layer from simply copying the input data. Sparsity may be obtained by adding terms to the loss function during training, either by comparing the probability distribution of the hidden unit activations with some low desired value, or by manually zeroing all but the strongest hidden unit activations. Some of the most powerful AIs of the 2010s involved sparse autoencoders stacked inside deep neural networks.
• Advantages-
• Sparse autoencoders have a sparsity penalty, a value close to zero but not exactly zero.
Sparsity penalty is applied on the hidden layer in addition to the reconstruction error. This
prevents overfitting.
• They take the highest activation values in the hidden layer and zero out the rest of the hidden nodes. This prevents the autoencoder from using all of the hidden nodes at once and forces only a reduced number of hidden nodes to be used.
• Drawbacks-
• For this to work, it is essential that the individual nodes of a trained model which activate are data-dependent, and that different inputs result in the activation of different nodes through the network.
Sparse Autoencoder
• Sparse Autoencoders add sparsity constraints that encourage only a small subset of neurons in the hidden layer to activate at once, which helps create a more efficient and focused representation.
• Unlike vanilla models, they include regularization methods like L1 penalty and dropout to
enforce sparsity.
• KL Divergence is used to maintain the sparsity level by matching the latent distribution to
a predefined sparse target.
• This selective activation helps in feature selection and learning meaningful patterns
while ignoring irrelevant noise.
• Applications of Sparse Autoencoders
• Feature Selection: Highlights the most relevant features by encouraging sparse activations, improving interpretability.
• Dimensionality Reduction: Creates efficient, low-dimensional representations by limiting the number of active neurons.
• Noise Reduction: Filters out irrelevant information and noise by activating only key neurons, improving model generalization.
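The KL-divergence sparsity penalty mentioned above compares each hidden unit's average activation with a small target value; here is a sketch with made-up activation statistics:

```python
import numpy as np

rho = 0.05                                    # desired average activation
rho_hat = np.array([0.05, 0.2, 0.01, 0.05])   # hypothetical per-unit averages

# KL divergence between Bernoulli(rho) and Bernoulli(rho_hat) for each unit
kl = rho * np.log(rho / rho_hat) + (1 - rho) * np.log((1 - rho) / (1 - rho_hat))
penalty = np.sum(kl)   # added (scaled by a coefficient) to the reconstruction loss
```

Units whose average activation matches the target contribute nothing; the over-active unit (0.2) dominates the penalty.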
• Now let's see the practical implementation.
• encoded = layers.Dense(encoding_dim, activation='relu', activity_regularizer=regularizers.l1(1e-5))(input_img): creates the encoded layer with ReLU activation and adds L1 activity regularization to encourage sparsity.
import numpy as np
import matplotlib.pyplot as plt
from tensorflow import keras
from tensorflow.keras import layers, regularizers
from tensorflow.keras.datasets import fashion_mnist

(x_train, _), (x_test, _) = fashion_mnist.load_data()
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
x_train_flat = x_train.reshape(len(x_train), 784)
x_test_flat = x_test.reshape(len(x_test), 784)

n = 10
encoding_dim = 32
input_img = keras.Input(shape=(784,))
encoded = layers.Dense(encoding_dim, activation='relu',
                       activity_regularizer=regularizers.l1(1e-5))(input_img)
decoded = layers.Dense(784, activation='sigmoid')(encoded)
sparse_autoencoder = keras.Model(input_img, decoded)
sparse_autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
Undercomplete Autoencoder: Example Implementation
(Uses x_train_flat / x_test_flat and the imports from the previous listing.)

n = 10
encoding_dim = 16
input_img = keras.Input(shape=(784,))
encoded = layers.Dense(encoding_dim, activation='relu')(input_img)
decoded = layers.Dense(784, activation='sigmoid')(encoded)
undercomplete_autoencoder = keras.Model(input_img, decoded)
undercomplete_autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
undercomplete_autoencoder.fit(x_train_flat, x_train_flat,
                              epochs=50,
                              batch_size=256,
                              shuffle=True,
                              validation_data=(x_test_flat, x_test_flat))

decoded_imgs = undercomplete_autoencoder.predict(x_test_flat)
plt.figure(figsize=(20, 4))
for i in range(n):
    ax = plt.subplot(2, n, i + 1)           # top row: originals
    plt.imshow(x_test_flat[i].reshape(28, 28), cmap='gray')
    plt.axis('off')
    ax = plt.subplot(2, n, i + 1 + n)       # bottom row: reconstructions
    plt.imshow(decoded_imgs[i].reshape(28, 28), cmap='gray')
    plt.axis('off')
plt.show()
Overcomplete Autoencoders
• Overcomplete Autoencoders are a type of autoencoders that use more hidden units than the number of input
units. This means that the encoder and decoder layers have more units than the input layer. The idea behind
using more hidden units is to learn a more complex, non-linear function that can better capture the structure of
the data.
• Advantages of using Overcomplete Autoencoders include their ability to learn more complex representations of the data, which can be useful for tasks such as feature extraction and denoising. Overcomplete Autoencoders are also more robust to noise and can handle missing data better than traditional autoencoders.
• However, there are also some disadvantages to using Overcomplete Autoencoders. One of the main
disadvantages is that they can be more difficult to train than traditional autoencoders. This is because the extra
hidden units can cause overfitting, which can lead to poor performance on new data.
• Implementing Overcomplete Autoencoders with PyTorch
• To implement Overcomplete Autoencoders with PyTorch, we need to follow several steps:
• Dataset preparation: In order to train the model, we must first prepare the dataset. This includes loading the data, preprocessing it, and splitting it into training and test sets.
• Constructing the model architecture: Using PyTorch, we must define the Overcomplete Autoencoder's architecture. This entails specifying the encoder and decoder layers and the loss function that will be used to train the model.
• Model training: Using the provided dataset, we must train the Overcomplete Autoencoder. The optimization
process must be specified, the hyperparameters must be configured, and the model weights must be updated
after iterating through the training set of data.
• Evaluation of the model's performance: When the model has been trained, we must assess how well it performs
using the test data. Calculating metrics like the reconstruction error and viewing the model's output are required
for this.
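Framework aside (the steps above assume PyTorch), the defining property of an overcomplete autoencoder is simply that the hidden layer is wider than the input; a minimal NumPy forward pass with illustrative, untrained weights:

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden = 32, 64          # overcomplete: hidden layer wider than input

W_enc = rng.standard_normal((n_hidden, n_in)) * 0.1
W_dec = rng.standard_normal((n_in, n_hidden)) * 0.1

x = rng.random(n_in)
h = np.maximum(0.0, W_enc @ x)   # ReLU code, larger than the input itself
x_hat = W_dec @ h                # reconstruction back to the input size
```

Without a constraint such as sparsity or denoising, a network shaped like this can learn a near-identity mapping, which is why overcomplete models are prone to overfitting.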
Fully Connected Autoencoder
• Purpose: Fully connected autoencoders (FCAEs) are neural networks used for unsupervised learning. They are trained to encode input data into a lower-dimensional space, learning a compressed representation of the data, and then to decode it back into the original space with the least possible information loss. The compressed representation can then be used for downstream tasks such as anomaly detection, data visualization, and feature extraction.
• Inventor: The concept of autoencoders was first proposed in the 1980s, and the design has undergone
numerous changes over time. In a 1986 publication, Rumelhart, Hinton, and Williams suggested the first
fully connected autoencoder.
• Architecture: A fully connected autoencoder is made up of an encoder network, a bottleneck layer in the middle, and a decoder network. The encoder receives the input data and maps it to a compressed form in the bottleneck layer; the decoder maps the compressed form back to the original data space. The encoder and decoder are normally symmetrical, and the bottleneck layer has fewer neurons than the other layers.
• Working: The fully connected autoencoder is trained to minimize the reconstruction error between the input data and the decoder network's output.
• This is typically done by minimizing the mean squared error (MSE) loss for continuous data or the binary cross-entropy loss for binary input data.
• Backpropagation and gradient descent are used to update the encoder and decoder networks' weights during training: the gradients are computed with respect to the loss function, and the weights are updated to minimize the loss.
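The training procedure just described can be sketched end-to-end with a linear autoencoder and hand-derived MSE gradients (the data, layer sizes, and learning rate are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((200, 8))                   # toy dataset: 200 samples, 8 features
W_enc = rng.standard_normal((8, 3)) * 0.1  # encoder weights (8 -> 3 bottleneck)
W_dec = rng.standard_normal((3, 8)) * 0.1  # decoder weights (3 -> 8)
lr = 0.1

losses = []
for step in range(200):
    H = X @ W_enc                          # encode
    X_hat = H @ W_dec                      # decode
    err = X_hat - X
    losses.append(np.mean(err ** 2))       # MSE reconstruction loss
    # Gradients of the loss with respect to each weight matrix (backpropagation)
    g = 2.0 / err.size
    grad_dec = H.T @ err * g
    grad_enc = X.T @ (err @ W_dec.T) * g
    W_dec -= lr * grad_dec                 # gradient-descent updates
    W_enc -= lr * grad_enc
```

In practice a framework's autodiff computes these gradients, but the update rule is the same.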
• Applications: After training, the bottleneck layer's compressed representation can be applied to a variety of downstream tasks. Fully connected autoencoders have been employed in a wide range of applications, including dimensionality reduction, anomaly detection, data denoising, and image compression. They have also been used in machine translation and language generation tasks.
Regularization
• Regularized autoencoders
• There are other ways to constrain the reconstruction of an autoencoder than to impose a hidden
layer of smaller dimensions than the input. The regularized autoencoders use a loss function that
helps the model to have other properties besides copying input to the output. We can generally
find two types of regularized autoencoder: the denoising autoencoder and the sparse autoencoder.
• Denoising autoencoder
• One way to make the autoencoder learn useful features is to change the inputs: we add random noise to the input and train the network to recover the original form by removing the noise. This prevents the autoencoder from simply copying the data from input to output, because the input contains random noise; we ask it to subtract the noise and produce the meaningful underlying data. This is called a denoising autoencoder.
• Sparse autoencoders
• Another way of regularizing the autoencoder is by using a sparsity constraint. Under this form of regularization, only a fraction of the nodes are allowed to participate in forward and backward propagation; these nodes have non-zero values and are called active nodes.
• To do so, we add a penalty term to the loss function that keeps only a fraction of the nodes active. This forces the autoencoder to represent each input as a combination of a small number of nodes and demands that it discover interesting structure in the data. This method is effective even if the code size is large, because only a small subset of the nodes will be active at any time.
Denoising autoencoders
• Denoising autoencoders are a type of autoencoder that is designed to remove noise from
data. They work by learning a compressed representation of the noisy input data and then
using the decoder part of the network to generate a denoised output.
• The denoising autoencoder is trained on pairs of noisy input data and clean output data.
During training, the autoencoder learns to map the noisy input data to the clean output
data, while also learning to ignore the noise in the input.
• The main advantage of denoising autoencoders is that they can remove different types of
noise from data, including Gaussian noise, salt-and-pepper noise, and random dropout
noise. They can also be used for various types of data, such as images, audio, and text.
• One common approach to using denoising autoencoders is to add noise to the input data
during training, and then use the output of the autoencoder as the denoised output. The
noise can be added in various ways, such as randomly setting some pixels to zero or adding
Gaussian noise with a specific variance.
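The two corruption styles mentioned above can each be produced in a couple of lines (the batch shape, noise standard deviation, and drop probability are illustrative):

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.random((4, 784))    # hypothetical batch of flattened images in [0, 1]

# Gaussian noise with a chosen standard deviation, clipped back to [0, 1]
x_gauss = np.clip(x + rng.normal(0.0, 0.3, size=x.shape), 0.0, 1.0)

# Dropout-style corruption: randomly set ~25% of the pixels to zero
mask = rng.random(x.shape) < 0.25
x_dropped = np.where(mask, 0.0, x)
```

During training the corrupted batch is fed as input while the clean batch x serves as the reconstruction target.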
• Denoising autoencoders have many practical applications, such as in medical imaging, where
images are often degraded by noise, or in natural language processing, where text data can
be noisy due to spelling errors or typos.
Limitations of Autoencoders