Understanding Neural Networks Basics

The document provides an overview of artificial neural networks (ANNs), detailing their structure, including input, hidden, and output layers, as well as the learning process involving input computation, output generation, and iterative refinement. It explains key components such as neurons, connections, weights, biases, and activation functions, along with various types of activation functions and their importance in achieving non-linearity. Additionally, it discusses the McCulloch-Pitts neuron model, perceptron learning, and multi-layer perceptron algorithms, highlighting their applications and advantages, as well as drawbacks.


Neural Networks

Introduction to Neural networks


• An artificial neural network (ANN) or neural network may be defined as an
information-processing model that is inspired by the way biological nervous systems (brain)
process information.

• An ANN is composed of a large number of highly interconnected processing units (neurons) working in unison to solve specific problems.

• Neural networks are capable of learning and identifying patterns directly from data without
pre-defined rules.

• An ANN is configured for a specific application, such as spam classification, face recognition, pattern recognition or decision-making.
Layers in Neural Network Architecture
• Input Layer: This is where the network receives its input data. Each input neuron in the layer
corresponds to a feature in the input data.
• Hidden Layers: These layers perform most of the computational heavy lifting. A neural network
can have one or multiple hidden layers. Each layer consists of processing units (neurons) that
transform the inputs into something that the output layer can use.
• Output Layer: The final layer produces the output of the model. The format of these outputs
varies depending on the specific task (e.g., classification, regression).
Three-Stage Process of Neural Networks
Learning in neural networks follows a three-stage process:
1. Input Computation: The input is combined with the given weights and biases, and the processed input is fed into the network.
2. Output Generation: Based on the current parameters, the network generates an output.
3. Iterative Refinement: The network refines its output by adjusting weights and biases,
gradually improving its performance on diverse tasks.
Key Components of Neural Networks
Neural networks are built from several key components:
• Neurons: The basic units that receive inputs; each neuron is governed by a threshold and an activation function.
• Connections: Links between neurons that carry information, regulated by weights and biases.
• Weights and Biases: These parameters determine the strength and influence of connections.
• Propagation Functions/ Activation Function: Mechanisms that help process and transfer
data across layers of neurons.
• Learning Rule: The method that adjusts weights and biases over time to improve accuracy.
Working of Neural Networks
1. Forward Propagation
The input data is passed layer by layer through the network; each neuron computes a weighted sum of its inputs plus a bias and applies an activation function to produce its output.
Working of Neural Networks...
2. Backpropagation
After forward propagation, the network evaluates its performance (error) using a loss function.
The goal of training is to minimize this loss/error. Backpropagation has three steps:
a. Loss Calculation: The network calculates the loss, which provides a measure of error in
the predictions.
b. Gradient Calculation: The network computes the gradients of the loss function w.r.t.
weight and bias of the network.
c. Weight Update: Once the gradients are calculated, the weights and biases are updated
using an optimization algorithm (like Batch/Mini-Batch/Stochastic gradient descent).
The weights are adjusted in the opposite direction of the gradient to minimize the loss.
The size of the step taken in each update is determined by the learning rate.
Iteration
This process of forward propagation, loss calculation, backpropagation, and weight update is
repeated for many iterations (epochs) over the dataset. Over time, this iterative process
reduces the loss, and the network's predictions become more accurate.
Activation Function
• The Activation Function is applied over the net input to calculate
the output of an ANN.
• The activation function is a mathematical "gate" in between the
input feeding in the current node and its output to the next layer.

Why do we need Activation Functions?


• We use activation functions to achieve non-linearity.
• Nonlinear functions are needed to realize the advantage of a multilayer network: when an input
passes through a multilayer network with only linear activation functions, the output remains
the same as what could be obtained with a single-layer network.
• If we do not apply an activation function, the output would be a linear function of the input,
i.e., a simple linear regression model.
Threshold Value
• Threshold is a set of values, based on which the final output of
the network will be calculated.
• A comparison is made between the calculated net input and the
threshold value to obtain the final output.
• The activation function using the threshold value can be defined as:
f(net) = 1 if net ≥ Ө, else f(net) = 0
Types of Activation Function
1. Identity Function
2. Binary Step Function
3. Bipolar Step Function
4. Sigmoidal Function
5. Hyperbolic Tangent Activation Function - Tanh
6. ReLU (Rectified Linear Unit) Activation Function
Types of Activation Function...
1. Identity Function
f(x) = x for all x
• The output remains the same as the input.
• It is a linear activation function.
2. Binary Step Function
f(x) = 1 if x ≥ Ө, else 0
• Ө is the threshold value. This activation function converts the input to a binary output (0 or 1).
3. Bipolar Step Function
f(x) = 1 if x ≥ Ө, else -1
• Ө is the threshold value. This activation function converts the input to a bipolar output (+1 or -1).
Types of Activation Function...
4. Sigmoidal Function
• Sigmoidal functions are widely used in back-propagation.
• There are two types:
Binary sigmoidal function: f(x) = 1 / (1 + e^(-x))
The range is 0 to 1.
Bipolar sigmoidal function: f(x) = (1 - e^(-x)) / (1 + e^(-x))
The range is -1 to +1.
Types of Activation Function...
5. Hyperbolic Tangent Activation Function - Tanh
f(x) = tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))
• This is closely related to the bipolar sigmoidal function.
6. ReLU (Rectified Linear Unit) Activation Function
f(x) = max(0, x)
• ReLU is the most commonly used activation function in neural networks, especially in CNNs.
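The activation functions listed above can be sketched in Python (a minimal illustration; `theta` stands for the threshold Ө, and the function names are ours):

```python
import math

def identity(x):
    return x                                   # output equals input

def binary_step(x, theta=0.0):
    return 1 if x >= theta else 0              # binary output (0 or 1)

def bipolar_step(x, theta=0.0):
    return 1 if x >= theta else -1             # bipolar output (+1 or -1)

def binary_sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))          # range (0, 1)

def bipolar_sigmoid(x):
    # range (-1, +1); note bipolar_sigmoid(2x) equals tanh(x)
    return (1.0 - math.exp(-x)) / (1.0 + math.exp(-x))

def tanh_fn(x):
    return math.tanh(x)                        # range (-1, +1)

def relu(x):
    return max(0.0, x)                         # zero for negative inputs
```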
McCulloch Pitts Neuron (M-P Neuron)
• The earliest neural network model (1943).
• Neurons are internally connected by directed weighted paths.
• The activation function is binary (the neuron may fire or may not fire).
• Firing depends on the threshold value.
• Has both positive weights w (w > 0) and negative weights -p (p > 0).
• Mostly used to implement logic functions.
Activation Function for M-P Neuron
The threshold Ө with the activation function should satisfy the condition:
Ө > n·w − p
Here, n: number of inputs, w: positive (excitatory) weight, p: magnitude of the negative (inhibitory) weight.
Training Algorithm of McCulloch Pitts Neuron
• There is no particular training algorithm. It is generally used for implementing logic functions.
• The output depends on the threshold value and the activation function.
• There are two kinds of input units: excitatory and inhibitory. In the figure, the excitatory inputs
are shown as X1, X2 & X3 and the inhibitory inputs as X4 & X5. The excitatory
inputs are connected to the output unit through positively weighted links. The inhibitory
inputs have negative weights on their connecting paths to the output unit.
• All excitatory weights have the same positive magnitude w, and all inhibitory weights have the
same negative magnitude -p.
• The activation y_out is binary, i.e., either 1 (the neuron fires) or 0 (the neuron does not fire).
McCulloch Pitts Neuron Problem
Q: Obtain the output of the neuron Y for the network shown in the figure using the activation
function as: (i) binary sigmoidal and (ii) bipolar sigmoidal.
Solution: The given network has three input neurons, a bias and an output neuron.
These form a single-layer network.
The inputs are given as [X1, X2, X3] = [0.8, 0.6, 0.4]
and the weights are [W1, W2, W3] = [0.1, 0.3, -0.2]
with bias b = 0.35 (its input is always 1).

The net input to the output neuron is:
Yin = b + X1 × W1 + X2 × W2 + X3 × W3
= 0.35 + 0.8 × 0.1 + 0.6 × 0.3 + 0.4 × (-0.2)
= 0.35 + 0.08 + 0.18 - 0.08
= 0.53
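As a check on the result above, the two sigmoid outputs for Yin = 0.53 can be computed with a short Python sketch:

```python
import math

# Net input from the worked example above
y_in = 0.35 + 0.8 * 0.1 + 0.6 * 0.3 + 0.4 * (-0.2)             # = 0.53

# (i) binary sigmoidal output, range (0, 1)
binary = 1.0 / (1.0 + math.exp(-y_in))                          # ~0.6295

# (ii) bipolar sigmoidal output, range (-1, +1)
bipolar = (1.0 - math.exp(-y_in)) / (1.0 + math.exp(-y_in))     # ~0.2590
```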
McCulloch Pitts Neuron Problem...
Implementation of Logic Gates with McCulloch Pitts Neuron
• The M-P neuron network is generally used for implementing logic gates.
• The correct network is found by manually adjusting the weight values and the threshold value
for the activation function, in a trial-and-error method.
• Implementation of AND Gate
• Implementation of OR Gate
• Implementation of XOR Gate
• Implementation of ANDNOT Gate
Implementation of AND Logic Gate M P Neuron ...
• The M-P neuron has no particular training algorithm.
• Through input and output analysis it tries to construct the
network by adjusting the weight and threshold values.
• The AND function is represented by the simple logic function:
Y = X1 . X2
• To implement the network there are two inputs, X1 and X2, with weights assumed to be 1.
To get the desired AND truth table, we need to adjust the threshold value.
Implementation of AND Logic Gate McCulloch Pitts Neuron
• For the different weight values, first the network is formed and then the threshold value is adjusted.
• Assumption 1: W1 = 1 and W2 = 1
For (0, 0) => Yin = X1W1 + X2W2 = 0*1 + 0*1 = 0
For (0, 1) => Yin = X1W1 + X2W2 = 0*1 + 1*1 = 1
For (1, 0) => Yin = X1W1 + X2W2 = 1*1 + 0*1 = 1
For (1, 1) => Yin = X1W1 + X2W2 = 1*1 + 1*1 = 2
We need a Ө value for which the neuron fires only for the input (1, 1):
Assumption: Ө = 0 [Not possible]
Assumption: Ө = 1 [Not possible]
Assumption: Ө = 2 [Possible]
Hence for Ө = 2, W1 = 1 and W2 = 1 the neuron satisfies the AND logic function.
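The AND derivation above can be verified with a minimal M-P neuron sketch in Python (the function name is ours):

```python
def mp_neuron(inputs, weights, theta):
    # McCulloch-Pitts neuron: fires (outputs 1) iff the net input reaches theta
    y_in = sum(x * w for x, w in zip(inputs, weights))
    return 1 if y_in >= theta else 0

# AND gate with W1 = W2 = 1 and theta = 2, as derived above
truth_table = {(x1, x2): mp_neuron([x1, x2], [1, 1], theta=2)
               for x1 in (0, 1) for x2 in (0, 1)}
```

Only the input (1, 1) produces a net input of 2, so only that row fires.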
Implementation of XOR Logic Gate M P Neuron
A single-layer network is not sufficient to represent this function. Here we will use one
intermediate (hidden) layer.
Implementation of XOR Logic Gate M P Neuron
Truth Table of Z1
Assumption 1: Ө = 0 [Not possible]
Assumption 2: Ө = 1 [Possible]
Implementation of XOR Logic Gate M P Neuron
Truth Table of Z2
Assumption 1: Ө = 0 [Not possible]
Assumption 2: Ө = 1 [Possible]
Implementation of XOR Logic Gate M P Neuron
For satisfaction of the second function Y = Z1 + Z2 (Truth Table of Y):
Assumption 1: V1 = 1 and V2 = 1
For (0, 0) => Yin = Z1V1 + Z2V2 = 0*1 + 0*1 = 0
For (0, 1) => Yin = Z1V1 + Z2V2 = 0*1 + 1*1 = 1
For (1, 0) => Yin = Z1V1 + Z2V2 = 1*1 + 0*1 = 1
We need a Ө value for which the neuron fires for the inputs (1, 0) and (0, 1):
Assumption 1: Ө = 0 [Not possible]
Assumption 2: Ө = 1 [Possible]
Implementation of XOR Logic Gate McCulloch Pitts Neuron
For weights W11 = 1, W21 = -1, W12 = -1, W22 = 1, V1 = 1 and V2 = 1, and thresholds
Ө1 = 1, Ө2 = 1, Ө3 = 1 (as derived above), the network satisfies the XOR logic function.
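The two-layer XOR construction above can be sketched as follows (Z1 = X1 AND NOT X2, Z2 = NOT X1 AND X2, Y = Z1 OR Z2, all thresholds equal to 1; function names are ours):

```python
def fires(net, theta):
    # M-P activation: 1 if the net input reaches the threshold, else 0
    return 1 if net >= theta else 0

def xor_mp(x1, x2):
    # Hidden units: Z1 = X1 AND NOT X2 (W11=1, W21=-1), Z2 = NOT X1 AND X2 (W12=-1, W22=1)
    z1 = fires(1 * x1 + (-1) * x2, theta=1)
    z2 = fires((-1) * x1 + 1 * x2, theta=1)
    # Output unit: Y = Z1 OR Z2 with V1 = V2 = 1 and theta = 1
    return fires(1 * z1 + 1 * z2, theta=1)
```

For (1, 1) both hidden units stay below their thresholds, so the output unit does not fire, giving the XOR truth table.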
Perceptron Learning - Introduction
• The first neural network model, the 'Perceptron', was designed in 1958.
• The perceptron is a linear binary classifier used for supervised learning.
• The perceptron learning model is a combination of two concepts: the McCulloch-Pitts model of
an artificial neuron and the Hebbian learning rule for adjusting weights.
The perceptron model consists of 4 parts:
1. Inputs from other neurons
2. Weights and bias
3. Net sum
4. Activation function
Perceptron Learning - ALGORITHM
The weights are updated as:
wi = wi + ŋ · error · xi
Where,
ŋ: learning rate
xi: ith input
error: (target output − predicted output)
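A minimal sketch of the perceptron learning rule, assuming error = target − output with a step activation; the bias is folded in as w[0], and the function name and hyperparameters are ours. Trained here on the linearly separable AND function:

```python
def train_perceptron(data, eta=0.1, epochs=20):
    # data: list of (inputs, target) pairs; w[0] is the bias weight
    n = len(data[0][0])
    w = [0.0] * (n + 1)
    for _ in range(epochs):
        for x, target in data:
            net = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
            y = 1 if net >= 0 else 0           # step activation
            error = target - y                 # error = target - predicted
            w[0] += eta * error                # bias update
            for i in range(n):
                w[i + 1] += eta * error * x[i]  # w_i = w_i + eta * error * x_i
    return w

# AND is linearly separable, so the perceptron converges on it
and_data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w = train_perceptron(and_data)
```

XOR, by contrast, is not linearly separable, which is why the multi-layer network of the previous section is needed.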
Multi-Layer Perceptron Learning Algorithm

• A multi-layer perceptron is a type of Feed Forward Neural Network with multiple neurons
arranged in layers.
• The network has at least three layers with an input layer, one or more hidden layers and an
output layer.
• All the neurons in a layer are fully connected to the neurons in the next layer.
• The input layer is the visible layer. It just passes the input to the next layer.
• The layers following the input layer are the hidden layers. The hidden layers neither
directly receive inputs nor send outputs to the external environment.
• The final layer is the output layer which outputs a single value or a vector of values.
Multi-Layer Perceptron Learning Algorithm...
• The activation functions used in the layers can be linear or non-linear depending on the type
of the problem modelled. Typically, a sigmoid activation function is used if the problem is a
binary classification problem and a softmax activation function is used in a multi-class
classification problem.
Multi-Layer Perceptron Learning Algorithm
Step 1: Forward Propagation
Calculate the net input and the output at each node:
net_j = Σi Oi · wij + Өj,  Oj = 1 / (1 + e^(-net_j))
where, Oi is the output from Node i
wij is the weight in the link from Node i to Node j
Өj is the bias value of Node j
Multi-Layer Perceptron Learning Algorithm
Step 2: Backward Propagation
2.1. Calculate the error at each node:
For each unit k in the Output Layer:
Errork = Ok (1 - Ok) (Oactual - Ok)
Where, Ok is the calculated output value at Node k in the Output Layer,
Oactual is the actual (target) output value of the node in the Output Layer.
For each unit j in the Hidden Layer:
Errorj = Oj (1 - Oj) Σk Errork wjk
Where, Oj is the output value at Node j in the Hidden Layer,
Errork is the error at Node k in the Output Layer,
wjk is the weight in the link from Node j to Node k.
Multi-Layer Perceptron Learning Algorithm
2.2. Update each weight:
Δwij = ŋ * Errorj * Oi
wij = wij + Δwij
where Oi is the output of Node i, the node at the source end of the link.
Advantages of Backpropagation
• Relatively simple implementation.
• The mathematical formula used in the algorithm can be applied to any network.
• Computing time can be reduced if the weights chosen at the beginning are small.
• Well suited for continuous-valued inputs and outputs.
• Highly tolerant of noisy data.
• Ability to classify patterns on which the network has not been trained.
Drawbacks of Backpropagation
• Slow and inefficient. Can get stuck in local minima, resulting in a sub-optimal solution.
• In case of large inputs, it is difficult to relate the output to the inputs.
• Requires a long training time.
• Poor interpretability: it is difficult to interpret the parameter values.
Applications
• Successful on real-world data: handwritten character recognition, pathology and
laboratory medicine, training a computer to pronounce text.
Multi-Layer Perceptron or Backpropagation Algorithm
Q: Assume that the neurons have a sigmoid activation function. Perform a forward
pass and a backward pass on the network to update the weights. Assume the target output is 1
and the learning rate is 0.9.
X1 X2 X3 | W14 W15 W24 W25 W34 W35 W46 W56 | Ө4 Ө5 Ө6
1  0  1  | 0.2 -0.3 0.4 0.1 -0.5 0.2 -0.3 -0.2 | -0.4 0.2 0.1
Multi-Layer Perceptron or Backpropagation Problem...
Step 1 (Forward Pass):
Y4 = sigmoid(X1·W14 + X2·W24 + X3·W34 + Ө4) = sigmoid(0.2 + 0 - 0.5 - 0.4) = sigmoid(-0.7) = 0.332
Y5 = sigmoid(X1·W15 + X2·W25 + X3·W35 + Ө5) = sigmoid(-0.3 + 0 + 0.2 + 0.2) = sigmoid(0.1) = 0.525
Y6 = sigmoid(Y4·W46 + Y5·W56 + Ө6) = sigmoid(-0.0996 - 0.105 + 0.1) = sigmoid(-0.105) = 0.474
Backpropagation Problem...
Step 2 (Backpropagate): Update the weight of each connector by evaluating the corresponding error.
Updated Weight Wij = Wij + ΔWij, where ΔWij = ŋ * Errj * Yi
• Update weights W46 & W56 using Err6
• Update weights W14, W24 & W34 using Err4
• Update weights W15, W25 & W35 using Err5
Backpropagation Problem...
Step 2 (Backpropagate): Compute the error.
Output Layer Error:
Err6 = Y6 * (1-Y6) * (T6-Y6)
= 0.474 * (1-0.474) * (1-0.474) = 0.1311
Hidden Layer Errors:
Err4 = Y4 * (1-Y4) * W46 * Err6
= 0.332 * (1-0.332) * (-0.3) * 0.1311
= -0.0087
Err5 = Y5 * (1-Y5) * W56 * Err6
= 0.525 * (1-0.525) * (-0.2) * 0.1311
= -0.0065
Backpropagation Problem...
Step 2 (Backpropagate): Update the weights of Unit 6.
Updated Weight Wij = Wij + ΔWij, where ΔWij = ŋ * Errj * Yi
Update W46 = W46 + ŋ * Err6 * Y4
= -0.3 + 0.9 * 0.1311 * 0.332
= -0.261
Update W56 = W56 + ŋ * Err6 * Y5
= -0.2 + 0.9 * 0.1311 * 0.525
= -0.138
Backpropagation Problem...
Step 2 (Backpropagate): Update the weights of Unit 4.
Updated Weight Wij = Wij + ΔWij, where ΔWij = ŋ * Errj * Yi
Update W14 = W14 + ŋ * Err4 * Y1
= 0.2 + 0.9 * (-0.0087) * 1
= 0.192
Update W24 = W24 + ŋ * Err4 * Y2
= 0.4 + 0.9 * (-0.0087) * 0
= 0.4
Update W34 = W34 + ŋ * Err4 * Y3
= -0.5 + 0.9 * (-0.0087) * 1
= -0.508
Backpropagation Problem...
Step 2 (Backpropagate): Update the weights of Unit 5.
Updated Weight Wij = Wij + ΔWij, where ΔWij = ŋ * Errj * Yi
Update W15 = W15 + ŋ * Err5 * Y1
= -0.3 + 0.9 * (-0.0065) * 1
= -0.306
Update W25 = W25 + ŋ * Err5 * Y2
= 0.1 + 0.9 * (-0.0065) * 0
= 0.1
Update W35 = W35 + ŋ * Err5 * Y3
= 0.2 + 0.9 * (-0.0065) * 1
= 0.194
Backpropagation Problem...
Step 2 (Backpropagate): Update the biases, Өj = Өj + ŋ * Errj:
Ө4 = -0.4 + 0.9 * (-0.0087) = -0.408
Ө5 = 0.2 + 0.9 * (-0.0065) = 0.194
Ө6 = 0.1 + 0.9 * 0.1311 = 0.218
Backpropagation Problem...
Parameter (Weight) | OLD  | NEW
W14                | 0.2  | 0.192
W15                | -0.3 | -0.306
W24                | 0.4  | 0.4
W25                | 0.1  | 0.1
W34                | -0.5 | -0.508
W35                | 0.2  | 0.194
W46                | -0.3 | -0.261
W56                | -0.2 | -0.138
Parameter (Bias)   | OLD  | NEW
Ө4                 | -0.4 | -0.408
Ө5                 | 0.2  | 0.194
Ө6                 | 0.1  | 0.218

Iterate the same process of computing the error and updating the parameters until the error
value is acceptable.
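The whole worked example can be reproduced with a short Python sketch (node numbering and initial values as in the table above):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Initial values from the worked example
x = {1: 1, 2: 0, 3: 1}
w = {(1, 4): 0.2, (1, 5): -0.3, (2, 4): 0.4, (2, 5): 0.1,
     (3, 4): -0.5, (3, 5): 0.2, (4, 6): -0.3, (5, 6): -0.2}
theta = {4: -0.4, 5: 0.2, 6: 0.1}
eta, target = 0.9, 1

# Step 1: forward pass
y4 = sigmoid(sum(x[i] * w[(i, 4)] for i in (1, 2, 3)) + theta[4])  # ~0.332
y5 = sigmoid(sum(x[i] * w[(i, 5)] for i in (1, 2, 3)) + theta[5])  # ~0.525
y6 = sigmoid(y4 * w[(4, 6)] + y5 * w[(5, 6)] + theta[6])           # ~0.474

# Step 2: errors (computed before any weight changes)
err6 = y6 * (1 - y6) * (target - y6)     # ~0.1311
err4 = y4 * (1 - y4) * w[(4, 6)] * err6  # ~-0.0087
err5 = y5 * (1 - y5) * w[(5, 6)] * err6  # ~-0.0065

# Step 2 (cont.): weight and bias updates, w_ij += eta * err_j * y_i
w[(4, 6)] += eta * err6 * y4  # ~-0.261
w[(5, 6)] += eta * err6 * y5  # ~-0.138
for i in (1, 2, 3):
    w[(i, 4)] += eta * err4 * x[i]
    w[(i, 5)] += eta * err5 * x[i]
for j, err in ((4, err4), (5, err5), (6, err6)):
    theta[j] += eta * err
```

Note that the errors are computed with the old weights before any update is applied, matching the order of the slides.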
DERIVATION OF THE GRADIENT DESCENT RULE
The loss function is a measure of the error between the predicted output and the actual output:

loss = [(actual output) - (predicted output)]²

The graphical representation of the loss/error w.r.t. the weight is as follows.

This method of finding the minimum of a function by following the slope of its graph is called Gradient Descent.


DERIVATION OF THE GRADIENT DESCENT RULE
A random point on this graph is chosen and the slope is calculated.

A positive slope indicates the weight should be decreased; a negative slope indicates the weight should be increased.

A zero slope indicates an appropriate weight.

The aim is to reach a point where the slope is zero, i.e., where the loss/error is minimum.
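The rule above can be sketched in one dimension, assuming a hypothetical loss (w − 3)² whose slope is 2(w − 3); the function names and values are ours:

```python
def gradient_descent(grad, w0, eta=0.1, steps=100):
    # Repeatedly step opposite to the slope: w = w - eta * dL/dw
    w = w0
    for _ in range(steps):
        w -= eta * grad(w)
    return w

# loss(w) = (w - 3)**2 has its minimum (zero slope) at w = 3
w_star = gradient_descent(lambda w: 2 * (w - 3), w0=0.0)
```

Because the step is taken opposite to the slope, a positive slope moves the weight down and a negative slope moves it up, exactly as described above.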
DERIVATION of the GRADIENT DESCENT RULE

Gradient (error term) at the jth node in the Output Layer:
δj = Oj (1 - Oj) (Oactual - Oj)

Gradient (error term) at the jth node in a Hidden Layer:
δj = Oj (1 - Oj) Σk δk wjk
THE VANISHING AND EXPLODING
GRADIENT PROBLEMS
BACK PROPAGATION
THE PROBLEM : CAUSE & EFFECT
VANISHING GRADIENT
EXPLODING GRADIENT
SYMPTOMS OF THESE PROBLEMS
DEALING WITH THESE
WEIGHT INITIALIZATION
DEALING WITH THESE...
ACTIVATION FUNCTION
DEALING WITH THESE
DEALING WITH THESE
CONVOLUTIONAL NEURAL NETWORK
LINEAR TIME INVARIANT SYSTEM
REPRESENTATION OF DIGITAL IMAGE
UNDERSTANDING CONVOLUTION
CONVOLUTION AS A PROCESS
SOME FILTER TERMINOLOGY
ROLE OF KERNEL
KERNEL WITH ACTIVATION
POOLING LAYER
Responsible for Reducing the Feature Size Further
EXAMPLE OF POOLING
FLATTEN LAYER
FULLY CONNECTED LAYER
CNN ARCHITECTURE
