Understanding Neural Network Activation Functions

The document discusses activation functions in neural networks, explaining their importance in introducing non-linearities for complex tasks. It categorizes activation functions into binary step, linear, and non-linear types, detailing specific functions like sigmoid, tanh, ReLU, and their variants, along with their advantages and disadvantages. Additionally, it covers the backpropagation algorithm for training multilayer perceptrons (MLPs) and their application in classification tasks.


Module 4

Perceptron
Multi-Layer Perceptron
Activation Functions
What is an Activation Function?
• Activation functions decide whether a neuron should be activated or not, i.e., whether the information/input the neuron is receiving is relevant for the given prediction or should be ignored.
• The input to the activation function is the weighted sum of the neuron's inputs plus a bias.
• The activation function is the non-linear transformation that we apply to the input signals of the hidden neurons.
• This transformed output is then sent to the next layer of neurons as input.

• A neural network without an activation function is essentially just a linear regression model.
• The activation function performs the non-linear transformation on the input, making the network capable of learning and performing more complex tasks.
• It is applied to the hidden neurons.

Need for Activation Functions

The purpose of activation functions is to introduce non-linearities into the network.

Types of Activation Functions in Neural Networks

Activation functions can be divided into three types:
1. Binary Step Function
2. Linear Activation Function
3. Non-linear Activation Functions

1. Binary Step Function
• A binary step function is a threshold-based activation function.
• It uses a threshold to decide whether a neuron should be activated or not.
• If the input to the activation function (Y) is above (or below) a certain threshold, the neuron is activated and sends exactly the same signal to the next layer.
• Otherwise, the neuron is not activated, i.e., the signal is not passed to the next layer.

Activation function: f(x) = "activated" if Y > threshold, else not activated.
Alternatively: f(x) = 1 if Y > threshold, 0 otherwise.
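The step rule above can be sketched directly in code (a minimal illustration; the threshold of 0 is an assumed default, not fixed by the text):

```python
import numpy as np

def binary_step(y, threshold=0.0):
    """Binary step activation: 1 if the input exceeds the threshold, else 0."""
    return np.where(y > threshold, 1.0, 0.0)

print(binary_step(np.array([-2.0, 0.0, 0.5, 3.0])))  # [0. 0. 1. 1.]
```

Note that the output jumps between exactly two values, which is why the function cannot express graded or multi-class outputs.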
Disadvantages of Binary Step Functions:

1. They don't provide multi-valued outputs, so they are not suitable for multi-class classification.
2. The gradient of the step function is zero, which causes problems in the backpropagation process.
2. Linear Activation Function
• Also known as the identity function.
• In a linear activation function, the dependent variable has a direct, proportional relationship with the independent variable.
• The output is proportional to the input.

Equation: f(x) = x
Range: (-infinity, infinity)

• It doesn't help with the complexity or the various parameters of the usual data that is fed to neural networks.
• The output of the function is not confined to any range.
Disadvantages of Linear Activation Function

• The gradient of the function doesn't involve the input (x).
• Hence, during backpropagation it is difficult to identify the neurons whose weights have to be adjusted.
• The neuron passes the signal as-is to the next layer.
• No matter how many layers there are, the last layer will be a linear function of the first layer.
• The linear activation function is generally used by the neurons in the input layer of a NN.
Non-linear Activation Functions

• Non-linear activation functions are the most widely used activation functions.
• They make it easy for the model to generalize or adapt to a variety of data and to differentiate between the outputs.
• Non-linear activation functions are mainly divided on the basis of their range or curves.
Advantages of Non-Linear Activation Functions

• The gradient of the function involves the input x.
• Hence it is easy to identify which weights of the input neurons have to be adjusted during backpropagation to give a better prediction.
1. Sigmoid or Logistic Activation Function

Input: a real number
Output: a number between 0 and 1

The main reason why we use the sigmoid function is that its output lies between (0, 1). Therefore, it is especially used for models where we have to predict a probability as the output. Since the probability of anything exists only in the range of 0 to 1, sigmoid is the right choice.

• Adds non-linearity.
• The smaller the input (more negative), the closer the output is to 0.
• The greater the input (more positive), the closer the output is to 1.
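The squashing behavior and the shape of the gradient can be seen in a short sketch (the input values are illustrative):

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid: squashes any real input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    """Derivative s*(1-s); largest at x = 0, near zero for |x| > ~3."""
    s = sigmoid(x)
    return s * (1.0 - s)

print(sigmoid(np.array([-10.0, 0.0, 10.0])))   # ~0, 0.5, ~1
print(sigmoid_grad(np.array([0.0, 5.0])))      # 0.25 at 0, tiny at 5
```

The second print shows why the gradient vanishes away from zero, which is the disadvantage discussed next.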
Disadvantages of Sigmoid Activation Function

• The gradient of the function has a significant value only for inputs between –3 and 3.
• For inputs outside this range, the gradient is small and eventually becomes zero.
• The network stops learning and suffers from the vanishing gradient problem.
2. Tanh or Hyperbolic Tangent Activation Function

• The output range of the tanh function is (-1, 1). tanh is also sigmoidal (S-shaped).
• Tanh is zero-centered.
• Negative inputs are mapped strongly negative.
• Positive inputs are mapped strongly positive.
• Zero inputs are mapped near zero.
• Both tanh and logistic sigmoid activation functions are used in feed-forward nets.

Fig: tanh v/s logistic sigmoid
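The zero-centered mapping, and tanh's close relationship to the logistic sigmoid, can be checked numerically (illustrative inputs):

```python
import numpy as np

# tanh is zero-centered: outputs lie in (-1, 1) with tanh(0) = 0.
x = np.array([-3.0, 0.0, 3.0])
print(np.tanh(x))   # strongly negative, zero, strongly positive

# Relation to the logistic sigmoid: tanh(x) = 2*sigmoid(2x) - 1
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
print(np.allclose(np.tanh(x), 2 * sigmoid(2 * x) - 1))  # True
```

The identity in the second print shows that tanh is a rescaled, shifted sigmoid, which is why both share the same S-shape and the same saturation behavior.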
Disadvantages of Tanh Activation Function

• The gradient is very steep near zero, but eventually becomes zero for large inputs.
• The network stops learning and suffers from the vanishing gradient problem.
• However, tanh is zero-centered and the gradients move in all directions.
• Hence the tanh non-linearity is preferred over sigmoid.
Comparison of Sigmoid and Tanh Activation Functions

• For inputs between –6 and +6, the tanh output is centered around zero, meaning the mean of the data is zero.
• Training of a neural network converges faster if the inputs to the neurons in each layer have a mean of zero and a variance of 1, and are decorrelated.
• Since the input to each layer comes from the previous layer, it is important that the outputs of the previous layers (the inputs to the next layers) are centered around zero.
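The zero-mean, unit-variance condition above can be illustrated with a quick standardization sketch (toy data with assumed mean and scale):

```python
import numpy as np

# Toy inputs with mean ~5 and std ~3 (illustrative values).
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=3.0, size=(100, 4))

# Standardize each feature to zero mean and unit variance.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
print(X_std.mean(axis=0).round(6))  # ~0 per feature
print(X_std.std(axis=0).round(6))   # ~1 per feature
```

Zero-centered activations like tanh keep layer outputs closer to this condition automatically, which is part of why they help convergence.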
3. ReLU (Rectified Linear Unit) Activation Function

• ReLU is the most used activation function, since it is used in almost all convolutional neural networks and deep learning models.
• ReLU is half-rectified (from the bottom): R(z) is zero when z is less than zero, and R(z) is equal to z when z is greater than or equal to zero.
• Range: [0, infinity)
• Any negative input given to the ReLU activation function turns into zero immediately, which affects the resulting graph by not mapping the negative values appropriately.
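The half-rectified rule R(z) = max(0, z) is a one-liner (illustrative inputs):

```python
import numpy as np

def relu(z):
    """ReLU: R(z) = max(0, z) -- zero for z < 0, identity for z >= 0."""
    return np.maximum(0.0, z)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5])))  # [0.  0.  0.  1.5]
```

Every negative value collapses to zero, which is exactly what creates the "dying ReLU" issue discussed next.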
Disadvantages of ReLU:

• For negative inputs, the gradient is zero.
• Hence during backpropagation, the weights and biases of some neurons are not updated.
• This creates dead neurons, which never get activated.
• This is known as the "dying ReLU problem".
4. Leaky ReLU / Parametric ReLU

• It is an attempt to solve the dying ReLU problem.

Fig: ReLU v/s Leaky ReLU

• The gradient has a small non-zero slope for negative inputs.
• The leak helps to increase the range of the ReLU function.
• Usually, the value of α is 0.01 (Leaky ReLU).
• When α is not fixed at 0.01 but is instead randomized or learned as a parameter, the function is called Randomized/Parametric ReLU.

f(x) = max(αx, x)

• Therefore the range of the Leaky ReLU is (-infinity, infinity).
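The leak in f(x) = max(αx, x) is easy to see in code (α = 0.01 is the usual Leaky ReLU default; the inputs are illustrative):

```python
import numpy as np

def leaky_relu(z, alpha=0.01):
    """Leaky ReLU: f(z) = max(alpha*z, z) -- a small slope for z < 0."""
    return np.where(z > 0, z, alpha * z)

print(leaky_relu(np.array([-10.0, -1.0, 2.0])))  # [-0.1  -0.01  2.  ]
```

Negative inputs now produce small negative outputs instead of exact zeros, so their gradients stay non-zero during backpropagation.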
Advantages and Disadvantages of Leaky ReLU:

• For negative inputs, the gradient is a non-zero value.
• Hence during backpropagation, the weights and biases of all neurons are updated; there are no dead neurons.
• However, the predictions made for negative inputs may not be consistent.
• Since the gradient is a very small value for negative inputs, learning of the model parameters is time-consuming.
• Sigmoid functions and their combinations generally work better in the case of classifiers. However, sigmoid and tanh functions are sometimes avoided due to the vanishing gradient problem.

• ReLU is a general-purpose activation function and is used in most cases these days. ReLU is less computationally expensive than tanh and sigmoid because it involves simpler mathematical operations and activates only a few neurons at a time.

• If we encounter a case of dead neurons in our network, the Leaky ReLU function is the best choice.

• Always keep in mind that the ReLU function should only be used in the hidden layers. At the current time, ReLU works most of the time as a general approximator.

• Variants of ReLU: Leaky ReLU, Parametric ReLU, Exponential Linear Unit (ELU).
SoftMax Activation Function

• Softmax is an activation function that scales numbers/logits into probabilities.
• The output of a softmax is a vector (say v) with the probabilities of each possible outcome. The probabilities in vector v sum to one over all possible outcomes or classes.
• It is used at the end of the network in multi-class classification.
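A minimal softmax sketch on an illustrative logit vector (the max-subtraction is a standard numerical-stability trick, not mentioned in the text):

```python
import numpy as np

def softmax(logits):
    """Scale logits into probabilities that sum to 1 over all classes."""
    shifted = logits - np.max(logits)   # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / exps.sum()

probs = softmax(np.array([2.0, 1.0, 0.1]))
print(probs.round(3), round(float(probs.sum()), 6))
```

The largest logit gets the largest probability, and the whole vector sums to 1, which is what lets the output be read as a distribution over classes.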


Back-Propagation
Backpropagation:

• Backpropagation is a widely used algorithm for training multilayer neural networks.
• It works by computing the gradient of the loss function with respect to each weight in the network, and then using this gradient to update the weights with the gradient descent algorithm.
• The basic idea behind backpropagation is to propagate the error backwards through the network, from the output layer to the input layer.
• At each layer, the error is multiplied by the derivative of the activation function with respect to its input, which gives the gradient of the error with respect to the weights and biases at that layer.
• Once the gradients have been computed, they can be used to update the weights and biases using a learning rate and the gradient descent update rule.
• This process is repeated for each input in the training set until the network converges to a minimum of the loss function.
• Backpropagation is a powerful algorithm for training neural networks, but it can suffer from overfitting and slow convergence.
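The steps above can be sketched end to end in a toy example (an illustrative setup, not the slides' exact notation: one hidden layer, sigmoid activations, squared-error loss, learning the XOR function with full-batch gradient descent):

```python
import numpy as np

rng = np.random.default_rng(42)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)    # input -> hidden
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)    # hidden -> output
lr = 1.0                                         # learning rate

def forward():
    h = sigmoid(X @ W1 + b1)
    return h, sigmoid(h @ W2 + b2)

initial_loss = float(((forward()[1] - y) ** 2).mean())

for _ in range(5000):
    # forward pass
    h, out = forward()
    # backward pass: propagate the error from the output layer back,
    # multiplying by the sigmoid derivative s * (1 - s) at each layer
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # gradient descent update with learning rate lr
    W2 -= lr * (h.T @ d_out); b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * (X.T @ d_h);   b1 -= lr * d_h.sum(axis=0)

final_loss = float(((forward()[1] - y) ** 2).mean())
print(initial_loss, "->", final_loss)   # the loss decreases as training proceeds
```

Each loop iteration is one full cycle of the algorithm described above: forward pass, error propagation through the activation derivatives, then a gradient-descent weight update.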
Classification MLP
MLPs can also be used for classification tasks. For a binary classification problem, you just need a single output neuron using the logistic activation function: the output will be a number between 0 and 1, which you can interpret as the estimated probability of the positive class. The estimated probability of the negative class is equal to one minus that number.
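The single-output-neuron setup is tiny in code (the weights, bias, and input here are illustrative values, not from the text):

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
w, b = np.array([0.8, -0.4]), 0.1        # illustrative weights and bias
x = np.array([2.0, 1.0])                 # illustrative input

p_positive = float(sigmoid(x @ w + b))   # estimated P(positive class)
p_negative = 1.0 - p_positive            # P(negative class) = 1 - p
print(p_positive, p_negative)
```

Only one probability is computed by the network; the other follows from it, so one output neuron suffices.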
Multilabel Binary Classification with MLP
For example, you could have an email classification system that predicts whether each incoming email is ham or spam, and simultaneously predicts whether it is urgent or non-urgent. In this case, you would need two output neurons, both using the logistic activation function: the first would output the probability that the email is spam, and the second would output the probability that it is urgent.
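The spam/urgent example reduces to two independent logistic outputs (the logit values are illustrative):

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
logits = np.array([2.0, -1.0])        # [spam logit, urgent logit], illustrative
p_spam, p_urgent = sigmoid(logits)

# Unlike softmax, the two probabilities are independent and need not sum to 1:
# an email can be both spam and urgent, or neither.
print(float(p_spam), float(p_urgent))
```

This independence is what distinguishes multilabel classification from multi-class classification, where softmax forces the class probabilities to sum to one.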
A modern MLP (including ReLU and softmax) for classification
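Such a forward pass can be sketched as follows (the layer sizes and the random weights are illustrative, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)

relu = lambda z: np.maximum(0.0, z)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))   # stabilized exponentials
    return e / e.sum(axis=-1, keepdims=True)

W1, b1 = rng.normal(size=(4, 16)), np.zeros(16)     # input(4) -> hidden(16)
W2, b2 = rng.normal(size=(16, 3)), np.zeros(3)      # hidden(16) -> 3 classes

x = rng.normal(size=(2, 4))                         # a batch of 2 samples
probs = softmax(relu(x @ W1 + b1) @ W2 + b2)        # ReLU hidden, softmax out
print(probs.sum(axis=1))                            # each row sums to 1
```

ReLU in the hidden layer and softmax at the output is the standard pairing for modern classification MLPs, as the earlier sections recommend.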
