
Artificial Neural Networks Overview

The document outlines the syllabus and key concepts related to Artificial Neural Networks (ANNs) in a Machine Learning course, including the perceptron algorithm, multilayer perceptrons, and various types of neural networks. It explains the structure and function of neurons, activation functions, and the importance of non-linearity in neural networks. Additionally, it covers training methods such as backpropagation and the application of ANNs in classification and regression tasks.


CSF 344
Machine Learning
Faculty & Course Coordinator: Dr. Anushikha Singh
06/01/2025
SYLLABUS: Unit 5

 Artificial Neural Networks: the perceptron algorithm, multilayer perceptron, backpropagation
 Introduction to Deep Neural Networks
 Recurrent Neural Networks
 Convolutional Neural Networks
Artificial Neural Networks
• Neural networks derive their name from the human nervous system, which consists of a massive parallel interconnection of a large number of neurons that accomplish different tasks in a small amount of time.
• The neural network is part of the nervous system, which contains a large number of nerve cells called neurons, with complex, network-like interconnections.
• Some of the earliest AI work aimed to create artificial neural networks by building mathematical models of brain activity. Generally, the term neural network means artificial neural network.
• A neural network is just a collection of units connected together; the properties of the network are determined by its topology and the properties of the "neurons".
• A simple mathematical model of the neuron was devised by McCulloch and Pitts (1943). It "fires" when a linear combination of its inputs exceeds some (hard or soft) threshold.
• Neuron: the primary functional unit of the nervous system.
• A neuron only fires if its input signal exceeds a certain amount (the threshold) in a short time period.
• A neuron has a cell body, a branching input structure (the dendrites) and a branching output structure (the axon). Axons connect to dendrites via synapses.
• Synapses vary in strength:
– Good connections allow a large signal.
– Slight connections allow only a weak signal.
– Synapses can be either excitatory or inhibitory.
– (Note: an excitatory transmitter generates a signal called an action potential in the receiving neuron; an inhibitory transmitter prevents it.)
A neural network consists of a set of nodes (neurons/units) connected by links. Each link has a numeric weight. Each unit has:
• A set of input links from other units.
• A set of output links to other units.
• A current activation level.
• An activation function to compute the activation level in the next time step.
Q2. Compute the net output given inputs 0.3, 0.5, 0.6 and weights 0.2, 0.1, -0.3.

net = (0.3)(0.2) + (0.5)(0.1) + (0.6)(-0.3)
    = 0.06 + 0.05 - 0.18
    = -0.07
Q3. Obtain the output of the neuron Y for the network shown in the figure, using the binary sigmoid activation function.

Q4. Calculate the output after applying the binary sigmoid function, given inputs 0.7, 0.2, 0.1, weights 0.1, 0.3, -0.2, and bias 0.85.
Ans:
net = 0.07 + 0.06 - 0.02 + 0.85 = 0.96
y = 1 / (1 + e^(-0.96)) ≈ 0.723
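The computation in Q4 can be checked with a short Python sketch (the helper name is ours, not from the slides):

```python
import math

def binary_sigmoid_neuron(inputs, weights, bias):
    """Weighted sum of inputs plus bias, squashed by the logistic (binary sigmoid)."""
    net = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-net))

y = binary_sigmoid_neuron([0.7, 0.2, 0.1], [0.1, 0.3, -0.2], 0.85)
print(round(y, 3))  # net = 0.96, output ≈ 0.723
```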
Activation Function
• An activation function in the context of neural networks is a mathematical function applied to the output of a neuron. Its purpose is to introduce non-linearity into the model, allowing the network to learn and represent complex patterns in the data. Without non-linearity, a neural network would essentially behave like a linear regression model, regardless of the number of layers it has.
• The activation function decides whether a neuron should be activated or not by calculating the weighted sum of its inputs and adding a bias to it.
Elements of a Neural Network
• Input Layer: accepts the input features and provides information from the outside world to the network. No computation is performed at this layer; its nodes simply pass the information (features) on to the hidden layer.
• Hidden Layer: its nodes are not exposed to the outer world; they are part of the abstraction provided by the neural network. The hidden layer performs computation on the features entered through the input layer and transfers the result to the output layer.
• Output Layer: brings the information learned by the network to the outer world.
Why do we need a non-linear activation function?
• A neural network without an activation function is essentially just a linear regression model. The activation function performs a non-linear transformation of the input, making the network capable of learning and performing more complex tasks.

Variants of Activation Function
Tanh Function
The tanh function squashes its input to the range (-1, 1) using tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x)). Unlike the sigmoid, its output is zero-centered, which often helps optimization in hidden layers.
Softmax Function

The softmax function is also a type of sigmoid function, but it is handy when we are trying to handle multi-class classification problems.
Nature: non-linear.
Uses: usually applied when handling multiple classes; the softmax function is commonly found in the output layer of image classification networks. It squeezes the output for each class to between 0 and 1 and divides by the sum of the outputs.
Output: the softmax function is ideally used in the output layer of the classifier, where we actually obtain the probabilities that define the class of each input.
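A minimal sketch of the softmax computation described above (the function name and the example scores are illustrative):

```python
import math

def softmax(scores):
    """Exponentiate each score (shifted by the max for numerical stability),
    then divide by the sum so the outputs form a probability distribution."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])  # ≈ [0.659, 0.242, 0.099]; sums to 1
```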
• The basic rule of thumb: if you don't know which activation function to use, simply use ReLU, as it is a general-purpose activation function for hidden layers and is used in most cases these days.
• If your output is for binary classification, the sigmoid function is a very natural choice for the output layer.
• If your output is for multi-class classification, softmax is very useful for predicting the probability of each class.
Question: If a neural network has an input layer with 3 neurons, one hidden layer with 4 neurons, and an output layer with 2 neurons, calculate the following:
• a) The number of parameters (weights and biases) if using sigmoid activation in the hidden layer.
• b) The number of parameters if using ReLU activation in the hidden layer (assuming biases are included in both cases).
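One way to check the answer: the parameter count is independent of the activation function, so (a) and (b) are the same, namely (3×4 + 4) + (4×2 + 2) = 26. A small helper (our own, not from the slides) computes it:

```python
def mlp_param_count(layer_sizes):
    """Weights between consecutive layers plus one bias per non-input neuron.
    The activation function (sigmoid, ReLU, ...) adds no trainable parameters."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out  # weights + biases
    return total

print(mlp_param_count([3, 4, 2]))  # 26 for both (a) and (b)
```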
Types of Artificial Neural Networks
1. Feed-Forward Artificial Neural Network – the information flow is unidirectional. A unit sends information to other units from which it does not receive any information. There are no feedback loops. These networks are used in pattern generation, recognition and classification, and they have fixed inputs and outputs.
2. Feedback Artificial Neural Network – the information flow is bidirectional; feedback loops are allowed.
The perceptron algorithm
• The perceptron, a single-layer feedforward neural network, was introduced in the late 1950s by Frank Rosenblatt.
• The perceptron is one of the first and most straightforward models of artificial neural networks. Despite being a simple model, it has proven successful at solving specific classification problems.
• It consists of a single layer of input nodes fully connected to a layer of output nodes, and it can learn linearly separable patterns. It uses a slightly different type of artificial neuron known as the threshold logic unit (TLU), first introduced by McCulloch and Walter Pitts in the 1940s.
Types of Perceptron

• Single-Layer Perceptron: limited to learning linearly separable patterns; effective for tasks where the data can be divided into distinct categories by a straight line.
• Multilayer Perceptron: multilayer perceptrons possess enhanced processing capabilities, as they consist of two or more layers, adept at handling more complex patterns and relationships within the data.
Basic Components of Perceptron
A perceptron, the basic unit of a neural network, comprises essential components that collaborate in information processing.
• Input Features: the perceptron takes multiple input features; each input feature represents a characteristic or attribute of the input data.
• Weights: each input feature is associated with a weight, determining the significance of that input feature in influencing the perceptron's output. During training, these weights are adjusted to learn the optimal values.
• Summation Function: the perceptron calculates the weighted sum of its inputs, combining the inputs with their respective weights to produce a weighted sum.
• Activation Function: the weighted sum is then passed through an activation function. The perceptron uses the Heaviside step function, which takes the summed value as input, compares it with the threshold, and provides an output of 0 or 1.
• Output: the final output of the perceptron is determined by the activation function's result. For example, in binary classification problems, the output might represent a predicted class (0 or 1).
• Bias: a bias term is often included in the perceptron model. The bias allows the model to make adjustments that are independent of the input. It is an additional parameter that is learned during training.
• Learning Algorithm (Weight Update Rule): during training, the perceptron learns by adjusting its weights and bias based on a learning algorithm. A common approach is the perceptron learning algorithm, which updates weights based on the difference between the predicted output and the true output.
How does the Perceptron work?
• A weight is assigned to each input node of a perceptron, indicating the significance of that input to the output. The perceptron's output is a weighted sum of the inputs, run through an activation function to decide whether or not the perceptron will fire. It computes the weighted sum of its inputs as:
z = w1x1 + w2x2 + ... + wnxn = XᵀW
• The activation function that perceptrons use most frequently is the step function, which compares this weighted sum to a threshold: it outputs 1 if the input is larger than the threshold value and 0 otherwise. The most common step function used in the perceptron is the Heaviside step function.
• When all the neurons in a layer are connected to every neuron of the previous layer, it is known as a fully connected layer or dense layer.
• The output of the fully connected layer is:
h(XW + b)
• where X is the input, W holds the weights for the input neurons, b is the bias, and h is the step function.
• During training, the perceptron's weights are adjusted to minimize the difference between the predicted output and the actual output. Usually, supervised learning algorithms like the delta rule or the perceptron learning rule are used for this:

w(i,j) ← w(i,j) + η (yj − ŷj) xi

• Here w(i,j) is the weight between the i-th input and the j-th output neuron, xi is the i-th input value, yj and ŷj are the actual and predicted values of the j-th output neuron, and η is the learning rate.
Algorithm: Build the Single-Layer Perceptron Model
• Initialize the weights and the learning rate. Here we consider (number of inputs + 1) weight values, i.e. +1 for the bias.
• Define the first linear layer.
• Define the activation function. Here we use the Heaviside step function.
• Define the prediction.
• Define the loss function.
• Define training, in which the weights and bias are updated accordingly.
• Fit the model.
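The steps above can be sketched as a minimal NumPy perceptron. The AND dataset, learning rate, and epoch count are illustrative assumptions, not from the slides:

```python
import numpy as np

def heaviside(z):
    """Heaviside step activation: 1 if z >= 0, else 0."""
    return (z >= 0).astype(int)

def train_perceptron(X, y, lr=0.1, epochs=20):
    """Perceptron learning rule: w <- w + lr * (y - y_hat) * x.
    The bias is folded in as an extra weight on a constant input of 1."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append bias input
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(Xb, y):
            y_hat = heaviside(xi @ w)
            w += lr * (yi - y_hat) * xi            # update only on mistakes
    return w

# Hypothetical linearly separable data: logical AND
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
w = train_perceptron(X, y)
preds = heaviside(np.hstack([X, np.ones((4, 1))]) @ w)
print(preds)  # [0 0 0 1] — the perceptron has learned AND
```

Because AND is linearly separable, the perceptron convergence theorem guarantees this loop finds a separating weight vector.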
Multilayer Perceptron
• The multilayer perceptron is also known as the MLP. It is made of fully connected dense layers, which transform any input dimension to the desired dimension. A multilayer perceptron is a neural network that has multiple layers. To create a neural network, we combine neurons together so that the outputs of some neurons are inputs of other neurons.
• A multilayer perceptron has one input layer with one neuron (or node) for each input, one output layer with a single node for each output, and it can have any number of hidden layers, each with any number of nodes. A schematic diagram of a multilayer perceptron (MLP) is depicted below.
• In the multilayer perceptron diagram, there are three inputs and thus three input nodes, and the hidden layer has three nodes. The output layer gives two outputs, so there are two output nodes. The nodes in the input layer take the input and forward it for further processing: each input node forwards its output to each of the three nodes in the hidden layer, and in the same way the hidden layer processes the information and passes it to the output layer.
• In this example, every node in the multilayer perceptron uses a sigmoid activation function. The sigmoid activation function takes real values as input and converts them to numbers between 0 and 1 using the sigmoid formula σ(x) = 1 / (1 + e^(-x)).
Structure of MLP:

• Input Layer: receives the input features.
• Hidden Layer(s): one or more layers that process inputs with learned weights and biases, followed by activation functions. Each hidden layer is fully connected to the previous layer.
• Output Layer: provides the output predictions. The number of neurons in this layer depends on the task (e.g., one neuron for binary classification, multiple neurons for multi-class classification).
Key Components:

• Weights: parameters associated with connections between neurons, adjusted during training.
• Biases: additional parameters added to the weighted sum of inputs before passing through the activation function.
• Activation Function: introduces non-linearity into the model, enabling it to capture complex patterns. Common activation functions include:
  • ReLU (Rectified Linear Unit)
  • Sigmoid (for binary classification)
  • Softmax (for multi-class classification)
• Forward Pass: the data is fed forward through the network from the input layer, through the hidden layers, to the output layer. Each layer applies a weighted sum, adds biases, and then passes the result through an activation function.

• Backpropagation: the MLP uses backpropagation for training. The difference between the predicted and actual values (the error) is propagated backward through the network, adjusting weights and biases using gradient descent.
Use cases
• Classification: the MLP can be used for tasks like image recognition, speech recognition, and spam detection.
• Regression: it can predict continuous values by adjusting the output layer and loss function.
Back Propagation
• Backpropagation is a key algorithm used to train multilayer neural networks, such as a multilayer perceptron (MLP). It calculates the gradient of the loss function with respect to each weight in the network by using the chain rule of calculus. This gradient is then used to update the weights via gradient descent to minimize the error. The process of backpropagation involves two main phases: a forward pass and a backward pass.

Steps in Backpropagation:
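As an illustration of the two phases (a sketch of ours, not the course's worked numerical example), consider a tiny hypothetical 2-2-1 sigmoid network trained on one sample with squared-error loss:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1 / (1 + np.exp(-z))

# Hypothetical tiny network: 2 inputs -> 2 hidden (sigmoid) -> 1 output (sigmoid)
X = np.array([[0.05, 0.10]])
y = np.array([[0.01]])
W1, b1 = rng.normal(size=(2, 2)), np.zeros(2)
W2, b2 = rng.normal(size=(2, 1)), np.zeros(1)
lr = 0.5

for step in range(1000):
    # Forward pass: weighted sums and activations, layer by layer
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    loss = 0.5 * np.sum((out - y) ** 2)

    # Backward pass: chain rule, propagating the error back through the layers
    d_out = (out - y) * out * (1 - out)   # dL/d(net) at the output layer
    d_h = (d_out @ W2.T) * h * (1 - h)    # dL/d(net) at the hidden layer

    # Gradient-descent updates
    W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;    b1 -= lr * d_h.sum(axis=0)

print(round(float(loss), 6))  # final loss, close to 0 after training
```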
Example: Back Propagation

Numerical 2

Introduction to Deep Neural Networks
• A Deep Neural Network (DNN) is an artificial neural network (ANN) with multiple layers between the input and output layers. While a simple neural network may consist of just an input, a hidden, and an output layer, a DNN contains multiple hidden layers, allowing the network to learn increasingly complex patterns from the data.
• DNNs have gained significant attention and success in various fields due to their ability to model complex, non-linear relationships in data. These networks are the backbone of deep learning and are widely used in tasks such as image recognition, natural language processing, speech recognition, and many more.
Key Concepts of Deep Neural Networks
Why Use Deep Neural Networks?
• Feature Learning: DNNs can automatically extract features from raw data without the need for hand-crafted features.
• Non-Linearity: through multiple layers and activation functions, DNNs can model highly non-linear relationships.
• Complex Problem Solving: DNNs are capable of solving complex tasks like image recognition, speech processing, and language translation, tasks where shallow models (i.e., single-layer neural networks) fail.
Types of Deep Neural Networks
1. Feedforward Neural Networks (FNNs):
• Data moves in one direction (forward) through the network from input to output. There are no loops or cycles in the architecture.
• Example: MLP (multilayer perceptron).
2. Convolutional Neural Networks (CNNs):
• Used primarily for tasks like image recognition and computer vision. CNNs apply convolutional layers that preserve spatial relationships in data, such as pixels in an image.
• Example: image classification tasks (e.g., recognizing handwritten digits).
3. Recurrent Neural Networks (RNNs):
• Designed for sequential data, such as time series or natural language. RNNs maintain a "memory" of previous inputs by looping information back into the network.
• Example: language modeling and text generation.

4. Generative Adversarial Networks (GANs):
• GANs consist of two networks, a generator and a discriminator, that work against each other. GANs are used to generate new data samples, such as creating realistic images.
Applications of Deep Neural Networks
• Computer Vision: object detection, image classification, facial recognition.
• Natural Language Processing (NLP): text classification, sentiment analysis, language translation, chatbots.
• Speech Recognition: voice-to-text applications, virtual assistants.
• Healthcare: disease prediction, medical image analysis, personalized medicine.
• Autonomous Vehicles: DNNs play a crucial role in the perception systems of self-driving cars.
Recurrent Neural Networks

• Recurrent Neural Networks (RNNs) are a class of neural networks designed to handle sequential data, such as time series, speech, or natural language. Unlike traditional feedforward neural networks, which assume inputs and outputs are independent of each other, RNNs can maintain a memory of previous inputs by introducing loops in their architecture. This memory allows RNNs to capture patterns in data that unfold over time, making them particularly useful for tasks where the order of inputs matters.
How does an RNN differ from a feedforward neural network?
• Artificial neural networks that do not have looping nodes are called feedforward neural networks. Because all information is only passed forward, this kind of neural network is also referred to as a multi-layer neural network.
• Information moves from the input layer to the output layer – through any hidden layers present – unidirectionally in a feedforward network. These networks are appropriate for tasks such as image classification, where input and output are independent. Nevertheless, their inability to retain previous inputs automatically renders them less useful for sequential data analysis.
Architecture of a Simple RNN
• In a simple RNN, the data flows through a series of steps, each one taking the output of the previous step as part of its input. Consider the following architecture for a basic RNN:
• Input Layer: the input at each time step is passed into the network.
• Hidden Layer: the hidden state is updated based on the current input and the previous hidden state.
• Output Layer: the output at each time step is computed based on the current hidden state.
2. Hidden State: the hidden state serves as the "memory" of the network. It carries information from previous time steps and is updated at each step based on the current input and the previous hidden state.

3. Backpropagation Through Time (BPTT): RNNs are trained using a version of backpropagation called Backpropagation Through Time (BPTT). Since RNNs deal with sequences, the loss must be propagated through each time step. This means that errors from future time steps can influence the updates to weights at earlier steps.
• BPTT unfolds the RNN over the sequence length, and gradients are computed for each time step.
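The recurrence over the hidden state can be sketched as follows; the tanh cell, the weight shapes, and the 5-step random sequence are illustrative assumptions:

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One recurrence step: the new hidden state mixes the current input
    with the previous hidden state through a tanh non-linearity."""
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

# Hypothetical sizes: 3-dim inputs, 4-dim hidden state
rng = np.random.default_rng(1)
W_xh, W_hh, b_h = rng.normal(size=(3, 4)), rng.normal(size=(4, 4)), np.zeros(4)

h = np.zeros(4)                     # initial hidden state ("empty memory")
sequence = rng.normal(size=(5, 3))  # 5 time steps of 3-dim inputs
for x_t in sequence:                # unrolling over time, as BPTT does
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)
print(h.shape)  # (4,) — one hidden state summarizing the whole sequence
```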
Variants of Recurrent Neural Networks
• Long Short-Term Memory (LSTM)
• LSTMs are a type of RNN specifically designed to capture long-term dependencies by introducing gates that control the flow of information through the network. LSTMs have three main gates:
• Forget Gate: decides how much of the previous hidden state to forget.
• Input Gate: decides how much of the new input to store.
• Output Gate: determines the final output from the hidden state.
• The architecture of LSTMs helps them avoid the vanishing gradient problem and learn long-range dependencies better than standard RNNs.
• Gated Recurrent Units (GRU): GRUs are a simplified version of LSTMs that also use gating mechanisms but have fewer parameters, making them faster to train. GRUs have two gates:
• Reset Gate: controls how much of the previous hidden state to forget.
• Update Gate: controls how much of the hidden state to retain.
• GRUs are computationally efficient and can achieve performance similar to LSTMs in many tasks.

• Bidirectional RNNs: in a bidirectional RNN, two RNNs are run in parallel, one in the forward direction and one in the backward direction. This allows the network to take into account both past and future context when making predictions.
Issues of Standard RNNs:
1. Vanishing Gradient
2. Exploding Gradient

• Vanishing Gradient: the vanishing gradient problem is a phenomenon that occurs during the training of deep neural networks, where the gradients used to update the network become extremely small, or "vanish", as they are backpropagated from the output layers to the earlier layers.
The vanishing gradient problem can cause slow convergence, the network getting stuck in poor local minima, impaired learning of deep representations, and the training process stalling completely.
Some techniques that can help alleviate the vanishing gradient problem include:
1. ReLU activation function: can be used to replace the sigmoid activation function.
2. Batch normalization: normalizes the inputs to each layer within a mini-batch.
3. Proper weight initialization: using Xavier or He initialization helps ensure that the gradients do not vanish or explode.
4. Residual networks (ResNets): use skip connections to allow the gradients to bypass some layers and flow directly to deeper layers.
Exploding Gradient
The exploding gradient problem is a common issue in deep neural networks that occurs when the gradients grow very large during training. This can make the network unstable and unable to learn from the training data.

• Causes of Exploding Gradients: the root cause of exploding gradients can often be traced back to the network architecture and the choice of activation functions. In deep networks, when multiple layers have weights greater than 1, the gradients can grow exponentially as they propagate back through the network during training. This can be exacerbated when using activation functions whose outputs are not bounded, such as ReLU.
• Another contributing factor is the initialization of the network's weights. If the initial weights are too large, even a small gradient can be amplified through the layers, leading to very large updates during training.
Several strategies can be employed to mitigate the exploding gradient problem:
1. Gradient Clipping: this technique involves setting a threshold value; if the gradient exceeds this threshold, it is scaled down to keep it within a manageable range. This prevents any single update from being too large.
2. Weight Initialization: using a proper weight initialization strategy, such as Xavier or He initialization, can help prevent gradients from becoming too large at the start of training.
3. Batch Normalization: batch normalization can help maintain the output of each layer within a certain range, reducing the risk of exploding gradients.
4. Change of Network Architecture: simplifying the network architecture or using architectures that are less prone to exploding gradients, such as those with skip connections like ResNet, can be effective.
5. Proper Activation Functions: using activation functions that are less likely to produce large gradients, such as ReLU and its variants, can help control the gradient's magnitude.
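Gradient clipping by global norm, for instance, can be sketched as follows (the helper is illustrative, not a specific library's API):

```python
import numpy as np

def clip_by_norm(grads, max_norm):
    """If the overall gradient norm exceeds max_norm, scale every
    gradient down so the total norm equals max_norm; otherwise leave unchanged."""
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]
    return grads

grads = [np.array([3.0, 4.0])]           # norm = 5, above the threshold
clipped = clip_by_norm(grads, max_norm=1.0)
print(clipped[0])  # [0.6 0.8] — direction preserved, norm scaled down to 1
```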
Convolutional Neural Networks
• The Convolutional Neural Network (CNN) is an extended version of the artificial neural network (ANN) that is predominantly used to extract features from grid-like matrix datasets, for example visual datasets like images or videos, where spatial data patterns play an extensive role.
• An ANN takes vectors as input, so images must be flattened into vectors; spatial information is lost in this process.
• A CNN instead has sparse connections between input and output neurons: there are far fewer weights than in a fully connected ANN.
How does a Convolutional Neural Network (CNN) work?
• A convolutional neural network, or ConvNet, is just a neural network that uses convolution.
• Convolution is a mathematical operation that allows the merging of two sets of information. In the case of a CNN, convolution is applied to the input data to filter the information and produce a feature map.
• The filter is also called a kernel, or feature detector, and its dimensions can be, for example, 3x3. To perform convolution, the kernel slides over the input image, doing element-wise multiplication followed by a sum. The result for each receptive field (the area where convolution takes place) is written into the feature map.
Convolution: Example

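As an illustrative sketch (the image and kernel values are our own), "valid" convolution, implemented as cross-correlation the way most CNN libraries do, looks like:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid convolution (cross-correlation): slide the kernel over the
    image, multiply element-wise, and sum each receptive field."""
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kH, j:j + kW]   # receptive field
            out[i, j] = np.sum(patch * kernel)  # element-wise product, summed
    return out

image = np.arange(16.0).reshape(4, 4)
kernel = np.array([[1.0, 0.0], [0.0, -1.0]])    # hypothetical 2x2 filter
print(conv2d(image, kernel))  # 3x3 feature map; every entry is -5 here
```

A 2x2 kernel over a 4x4 image yields a 3x3 feature map, which is exactly why padding (below) is needed to preserve size.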
CNN Architecture
• A Convolutional Neural Network consists of multiple layers: the input layer, convolutional layers, pooling layers, and fully connected layers.
• The convolutional layer applies filters to the input image to extract features, the pooling layer downsamples the image to reduce computation, and the fully connected layer makes the final prediction. The network learns the optimal filters through backpropagation and gradient descent.
What is stride?
• The number of pixels by which we slide the kernel over the input image is called the stride.
What is padding?

• At the corners we cannot centre the kernel, so the output of the convolution operation is smaller than the input image.
• What if we want the output to be the same size as the input?
• What if the filter does not fit on the input image?
• In these cases, we add artificial padding of 0's around the image, sized according to the required output size and the kernel size, as shown in the image below; this is also called zero-padding.
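The combined effect of padding and stride on the output size follows the standard formula O = (W − F + 2P) / S + 1, where W is the input size, F the filter size, P the padding, and S the stride. A quick sketch:

```python
def conv_output_size(w, f, p, s):
    """Spatial output size of a convolution: (W - F + 2P) / S + 1."""
    return (w - f + 2 * p) // s + 1

print(conv_output_size(w=32, f=3, p=0, s=1))  # 30: no padding shrinks the map
print(conv_output_size(w=32, f=3, p=1, s=1))  # 32: zero-padding keeps the size
```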
Pooling
• A pooling layer is a component of a convolutional neural network (CNN) that reduces the spatial dimensions of feature maps, decreasing the amount of data and the number of parameters. It is also known as a downsampling layer.
• There are 2 common types: max pooling and average pooling.
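Max pooling can be sketched as follows (the 4x4 feature map is an illustrative assumption):

```python
import numpy as np

def max_pool(feature_map, size=2, stride=2):
    """Take the maximum of each size x size window, moving by `stride`;
    with size = stride = 2, this halves the height and width."""
    H, W = feature_map.shape
    out_h, out_w = (H - size) // stride + 1, (W - size) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = feature_map[i*stride:i*stride+size, j*stride:j*stride+size]
            out[i, j] = window.max()
    return out

fmap = np.array([[1., 3., 2., 4.],
                 [5., 6., 7., 8.],
                 [3., 2., 1., 0.],
                 [1., 2., 3., 4.]])
print(max_pool(fmap))  # [[6. 8.] [3. 4.]] — each 2x2 window reduced to its max
```

Average pooling is identical except that `window.max()` becomes `window.mean()`.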
Layers Used to Build ConvNets
• A complete convolutional neural network architecture is also known as a covnet. A covnet is a sequence of layers, and every layer transforms one volume to another through a differentiable function.
Types of layers: input layer, convolutional layer, pooling layer, activation layer, flattening layer and fully connected layer.
Example: let's run a covnet on an image of dimension 32 x 32 x 3.

• Input Layer: the layer in which we give input to our model. In a CNN the input will generally be an image or a sequence of images. This layer holds the raw input image with width 32, height 32, and depth 3.
• Convolutional Layers: the layer used to extract features from the input dataset. It applies a set of learnable filters, known as kernels, to the input images. The filters/kernels are small matrices, usually of 2x2, 3x3, or 5x5 shape. Each kernel slides over the input image data and computes the dot product between the kernel weights and the corresponding input image patch. The output of this layer is referred to as feature maps. Suppose we use a total of 12 filters for this layer (with padding that preserves the spatial size); we get an output volume of dimension 32 x 32 x 12.
• Activation Layer: by adding an activation function to the output of the preceding layer, activation layers add non-linearity to the network. It applies an element-wise activation function to the output of the convolutional layer. Some common activation functions are ReLU: max(0, x), tanh, Leaky ReLU, etc. The volume remains unchanged, hence the output volume has dimensions 32 x 32 x 12.

• Pooling Layer: this layer is periodically inserted in the covnet; its main function is to reduce the size of the volume, which makes computation fast, reduces memory, and also prevents overfitting. Two common types of pooling layers are max pooling and average pooling. If we use max pooling with 2 x 2 filters and stride 2, the resultant volume is of dimension 16 x 16 x 12.
• Flattening: after the convolution and pooling layers, the resulting feature maps are flattened into a one-dimensional vector so they can be passed into a fully connected layer for classification or regression.

• Fully Connected Layers: take the input from the previous layer and compute the final classification or regression task.

• Output Layer: the output from the fully connected layers is then fed into a logistic function for classification tasks, such as sigmoid or softmax, which converts the output for each class into a probability score.
Example CNN Architecture: LeNet-5
This is the architecture of LeNet-5, created by Yann LeCun in 1998 and widely used for handwritten digit recognition (MNIST).
Thank You
