Understanding Neural Network Activation Functions

The document discusses activation functions in neural networks, explaining their importance in introducing non-linearities for complex tasks. It categorizes activation functions into binary step, linear, and non-linear types, detailing specific functions like sigmoid, tanh, ReLU, and their variants, along with their advantages and disadvantages. Additionally, it covers the backpropagation algorithm for training multilayer perceptrons (MLPs) and their application in classification tasks.


Module 4

Perceptron
Multi-Layer Perceptron
Activation Functions
What is an Activation Function?
• Activation functions decide whether a neuron should be activated or not, i.e., whether the information/input the neuron is receiving is relevant for the given prediction or should be ignored.
• The input to the activation function is the weighted sum of the neuron's inputs plus a bias.
• The activation function is the non-linear transformation that we apply to the input signals of the hidden neurons.
• This transformed output is then sent to the next layer of neurons as input.

• A neural network without an activation function is essentially just a linear regression model.
• The activation function performs the non-linear transformation on the input, making the network capable of learning and performing more complex tasks.
• It is applied to the hidden neurons.

Need for Activation Functions

The purpose of activation functions is to introduce non-linearities into the network.

Types of Activation Functions in Neural Networks

Activation functions can be divided into three types:
1. Binary Step Function
2. Linear Activation Function
3. Non-linear Activation Functions

1. Binary Step Function
• A binary step function is a threshold-based activation function.
• It uses a threshold to decide whether a neuron should be activated or not.
• If the input to the activation function (Y) is above (or below) a certain threshold, the neuron is activated and sends exactly the same signal to the next layer.
• Otherwise, the neuron is not activated, i.e., the signal is not passed to the next layer.

Activation function: f(x) = "activated" if Y > threshold, else not activated.
Alternatively: f(x) = 1 if Y > threshold, 0 otherwise.
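The step rule above can be sketched directly in code (a minimal illustration; the threshold of 0 is an assumed default, not fixed by the text):

```python
import numpy as np

def binary_step(y, threshold=0.0):
    """Binary step activation: 1 if the input exceeds the threshold, else 0."""
    return np.where(y > threshold, 1.0, 0.0)

print(binary_step(np.array([-2.0, 0.0, 0.5, 3.0])))  # [0. 0. 1. 1.]
```

Note that the output jumps between exactly two values, which is why the function cannot express graded or multi-class outputs.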
Disadvantages of Binary Step Functions:

1. They don't provide multi-valued outputs, so they are not suitable for multi-class classification.
2. The gradient of the step function is zero, which causes problems in the backpropagation process.
2. Linear Activation Function
• Also known as the identity function.
• In a linear activation function, the dependent variable has a direct, proportional relationship with the independent variable.
• The output is proportional to the input.

Equation: f(x) = x
Range: (-infinity, infinity)

• It doesn't help with the complexity or the various parameters of the usual data that is fed to neural networks.
• The output of the function is not confined to any range.
Disadvantages of Linear Activation Function

• The gradient of the function doesn't involve the input (x).
• Hence, during backpropagation it is difficult to identify the neurons whose weights have to be adjusted.
• The neuron passes the signal as-is to the next layer.
• No matter how many layers there are, the last layer will be a linear function of the first layer.
• The linear activation function is generally used by the neurons in the input layer of a NN.
Non-linear Activation Functions

• Non-linear activation functions are the most widely used activation functions.
• They make it easy for the model to generalize or adapt to a variety of data and to differentiate between the outputs.
• Non-linear activation functions are mainly divided on the basis of their range or curves.
Advantages of Non-Linear Activation Functions

• The gradient of the function involves the input x.
• Hence it is easy to identify which weights of the input neurons have to be adjusted during backpropagation to give a better prediction.
1. Sigmoid or Logistic Activation Function

Input: a real number
Output: a number between 0 and 1

The main reason why we use the sigmoid function is that its output lies between (0, 1). Therefore, it is especially used for models where we have to predict a probability as the output. Since the probability of anything exists only in the range of 0 to 1, sigmoid is the right choice.

• Adds non-linearity.
• The smaller the input (more negative), the closer the output is to 0.
• The greater the input (more positive), the closer the output is to 1.
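The squashing behavior and the shape of the gradient can be seen in a short sketch (the input values are illustrative):

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid: squashes any real input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    """Derivative s*(1-s); largest at x = 0, near zero for |x| > ~3."""
    s = sigmoid(x)
    return s * (1.0 - s)

print(sigmoid(np.array([-10.0, 0.0, 10.0])))   # ~0, 0.5, ~1
print(sigmoid_grad(np.array([0.0, 5.0])))      # 0.25 at 0, tiny at 5
```

The second print shows why the gradient vanishes away from zero, which is the disadvantage discussed next.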
Disadvantages of Sigmoid Activation Function

• The gradient of the function has a significant value only for inputs between –3 and 3.
• For inputs outside this range, the gradient is small and eventually becomes zero.
• The network stops learning and suffers from the vanishing gradient problem.
2. Tanh or Hyperbolic Tangent Activation Function

• The output range of the tanh function is (-1, 1). tanh is also sigmoidal (S-shaped).
• Tanh is zero-centered.
• Negative inputs are mapped strongly negative.
• Positive inputs are mapped strongly positive.
• Zero inputs are mapped near zero.
• Both tanh and logistic sigmoid activation functions are used in feed-forward nets.

Fig: tanh v/s logistic sigmoid
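The zero-centered mapping, and tanh's close relationship to the logistic sigmoid, can be checked numerically (illustrative inputs):

```python
import numpy as np

# tanh is zero-centered: outputs lie in (-1, 1) with tanh(0) = 0.
x = np.array([-3.0, 0.0, 3.0])
print(np.tanh(x))   # strongly negative, zero, strongly positive

# Relation to the logistic sigmoid: tanh(x) = 2*sigmoid(2x) - 1
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
print(np.allclose(np.tanh(x), 2 * sigmoid(2 * x) - 1))  # True
```

The identity in the second print shows that tanh is a rescaled, shifted sigmoid, which is why both share the same S-shape and the same saturation behavior.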
Disadvantages of Tanh Activation Function

• The gradient is very steep near zero, but eventually becomes zero for large inputs.
• The network stops learning and suffers from the vanishing gradient problem.
• However, tanh is zero-centered and the gradients move in all directions.
• Hence the tanh non-linearity is preferred over sigmoid.
Comparison of Sigmoid and Tanh Activation Functions

• For inputs between –6 and +6, the tanh output is centered around zero, meaning the mean of the data is zero.
• Training of a neural network converges faster if the inputs to the neurons in each layer have a mean of zero and a variance of 1, and are decorrelated.
• Since the input to each layer comes from the previous layer, it is important that the outputs of the previous layers (the inputs to the next layers) are centered around zero.
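The zero-mean, unit-variance condition above can be illustrated with a quick standardization sketch (toy data with assumed mean and scale):

```python
import numpy as np

# Toy inputs with mean ~5 and std ~3 (illustrative values).
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=3.0, size=(100, 4))

# Standardize each feature to zero mean and unit variance.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
print(X_std.mean(axis=0).round(6))  # ~0 per feature
print(X_std.std(axis=0).round(6))   # ~1 per feature
```

Zero-centered activations like tanh keep layer outputs closer to this condition automatically, which is part of why they help convergence.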
3. ReLU (Rectified Linear Unit) Activation Function

• ReLU is the most used activation function, since it is used in almost all convolutional neural networks and deep learning models.
• ReLU is half-rectified (from the bottom): R(z) is zero when z is less than zero, and R(z) is equal to z when z is greater than or equal to zero.
• Range: [0, infinity)
• Any negative input given to the ReLU activation function turns into zero immediately, which affects the resulting graph by not mapping the negative values appropriately.
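The half-rectified rule R(z) = max(0, z) is a one-liner (illustrative inputs):

```python
import numpy as np

def relu(z):
    """ReLU: R(z) = max(0, z) -- zero for z < 0, identity for z >= 0."""
    return np.maximum(0.0, z)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5])))  # [0.  0.  0.  1.5]
```

Every negative value collapses to zero, which is exactly what creates the "dying ReLU" issue discussed next.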
Disadvantages of ReLU:

• For negative inputs, the gradient is zero.
• Hence during backpropagation, the weights and biases of some neurons are not updated.
• This creates dead neurons, which never get activated.
• This is known as the "dying ReLU problem".
4. Leaky ReLU / Parametric ReLU

• It is an attempt to solve the dying ReLU problem.

Fig: ReLU v/s Leaky ReLU

• The gradient has a small non-zero slope for negative inputs.
• The leak helps to increase the range of the ReLU function.
• Usually, the value of α is 0.01 (Leaky ReLU).
• When α is not fixed at 0.01 but is instead randomized or learned as a parameter, the function is called Randomized/Parametric ReLU.

f(x) = max(αx, x)

• Therefore the range of the Leaky ReLU is (-infinity, infinity).
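The leak in f(x) = max(αx, x) is easy to see in code (α = 0.01 is the usual Leaky ReLU default; the inputs are illustrative):

```python
import numpy as np

def leaky_relu(z, alpha=0.01):
    """Leaky ReLU: f(z) = max(alpha*z, z) -- a small slope for z < 0."""
    return np.where(z > 0, z, alpha * z)

print(leaky_relu(np.array([-10.0, -1.0, 2.0])))  # [-0.1  -0.01  2.  ]
```

Negative inputs now produce small negative outputs instead of exact zeros, so their gradients stay non-zero during backpropagation.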
Advantages and Disadvantages of Leaky ReLU:

• For negative inputs, the gradient is a non-zero value.
• Hence during backpropagation, the weights and biases of all neurons are updated; there are no dead neurons.
• However, the predictions made for negative inputs may not be consistent.
• Since the gradient is a very small value for negative inputs, learning of the model parameters is time-consuming.
• Sigmoid functions and their combinations generally work better in the case of classifiers. However, sigmoid and tanh functions are sometimes avoided due to the vanishing gradient problem.

• ReLU is a general-purpose activation function and is used in most cases these days. ReLU is less computationally expensive than tanh and sigmoid because it involves simpler mathematical operations and activates only a few neurons at a time.

• If we encounter a case of dead neurons in our network, the Leaky ReLU function is the best choice.

• Always keep in mind that the ReLU function should only be used in the hidden layers. At the current time, ReLU works most of the time as a general approximator.

• Variants of ReLU: Leaky ReLU, Parametric ReLU, Exponential Linear Unit (ELU).
SoftMax Activation Function

• Softmax is an activation function that scales numbers/logits into probabilities.
• The output of a softmax is a vector (say v) with the probabilities of each possible outcome. The probabilities in vector v sum to one over all possible outcomes or classes.
• It is used at the end of the network in multi-class classification.
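A minimal softmax sketch on an illustrative logit vector (the max-subtraction is a standard numerical-stability trick, not mentioned in the text):

```python
import numpy as np

def softmax(logits):
    """Scale logits into probabilities that sum to 1 over all classes."""
    shifted = logits - np.max(logits)   # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / exps.sum()

probs = softmax(np.array([2.0, 1.0, 0.1]))
print(probs.round(3), round(float(probs.sum()), 6))
```

The largest logit gets the largest probability, and the whole vector sums to 1, which is what lets the output be read as a distribution over classes.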


Back-Propagation
Backpropagation:

• Backpropagation is a widely used algorithm for training multilayer neural networks.
• It works by computing the gradient of the loss function with respect to each weight in the network, and then using this gradient to update the weights with the gradient descent algorithm.
• The basic idea behind backpropagation is to propagate the error backwards through the network, from the output layer to the input layer.
• At each layer, the error is multiplied by the derivative of the activation function with respect to its input, which gives the gradient of the error with respect to the weights and biases at that layer.
• Once the gradients have been computed, they can be used to update the weights and biases using a learning rate and the gradient descent update rule.
• This process is repeated for each input in the training set until the network converges to a minimum of the loss function.
• Backpropagation is a powerful algorithm for training neural networks, but it can suffer from overfitting and slow convergence.
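The steps above can be sketched end to end in a toy example (an illustrative setup, not the slides' exact notation: one hidden layer, sigmoid activations, squared-error loss, learning the XOR function with full-batch gradient descent):

```python
import numpy as np

rng = np.random.default_rng(42)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)    # input -> hidden
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)    # hidden -> output
lr = 1.0                                         # learning rate

def forward():
    h = sigmoid(X @ W1 + b1)
    return h, sigmoid(h @ W2 + b2)

initial_loss = float(((forward()[1] - y) ** 2).mean())

for _ in range(5000):
    # forward pass
    h, out = forward()
    # backward pass: propagate the error from the output layer back,
    # multiplying by the sigmoid derivative s * (1 - s) at each layer
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # gradient descent update with learning rate lr
    W2 -= lr * (h.T @ d_out); b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * (X.T @ d_h);   b1 -= lr * d_h.sum(axis=0)

final_loss = float(((forward()[1] - y) ** 2).mean())
print(initial_loss, "->", final_loss)   # the loss decreases as training proceeds
```

Each loop iteration is one full cycle of the algorithm described above: forward pass, error propagation through the activation derivatives, then a gradient-descent weight update.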
Classification MLP
MLPs can also be used for classification tasks. For a binary classification problem, you just need a single output neuron using the logistic activation function: the output will be a number between 0 and 1, which you can interpret as the estimated probability of the positive class. The estimated probability of the negative class is equal to one minus that number.
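The single-output-neuron setup is tiny in code (the weights, bias, and input here are illustrative values, not from the text):

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
w, b = np.array([0.8, -0.4]), 0.1        # illustrative weights and bias
x = np.array([2.0, 1.0])                 # illustrative input

p_positive = float(sigmoid(x @ w + b))   # estimated P(positive class)
p_negative = 1.0 - p_positive            # P(negative class) = 1 - p
print(p_positive, p_negative)
```

Only one probability is computed by the network; the other follows from it, so one output neuron suffices.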
Multilabel Binary Classification with MLP
For example, you could have an email classification system that predicts whether each incoming email is ham or spam, and simultaneously predicts whether it is urgent or non-urgent. In this case, you would need two output neurons, both using the logistic activation function: the first would output the probability that the email is spam, and the second would output the probability that it is urgent.
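The spam/urgent example reduces to two independent logistic outputs (the logit values are illustrative):

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
logits = np.array([2.0, -1.0])        # [spam logit, urgent logit], illustrative
p_spam, p_urgent = sigmoid(logits)

# Unlike softmax, the two probabilities are independent and need not sum to 1:
# an email can be both spam and urgent, or neither.
print(float(p_spam), float(p_urgent))
```

This independence is what distinguishes multilabel classification from multi-class classification, where softmax forces the class probabilities to sum to one.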
A modern MLP (including ReLU and softmax) for classification
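Such a forward pass can be sketched as follows (the layer sizes and the random weights are illustrative, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)

relu = lambda z: np.maximum(0.0, z)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))   # stabilized exponentials
    return e / e.sum(axis=-1, keepdims=True)

W1, b1 = rng.normal(size=(4, 16)), np.zeros(16)     # input(4) -> hidden(16)
W2, b2 = rng.normal(size=(16, 3)), np.zeros(3)      # hidden(16) -> 3 classes

x = rng.normal(size=(2, 4))                         # a batch of 2 samples
probs = softmax(relu(x @ W1 + b1) @ W2 + b2)        # ReLU hidden, softmax out
print(probs.sum(axis=1))                            # each row sums to 1
```

ReLU in the hidden layer and softmax at the output is the standard pairing for modern classification MLPs, as the earlier sections recommend.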
