Deep Learning Activation Functions

The document discusses various activation functions used in deep learning, highlighting the limitations of sigmoid and softmax functions, and introducing ReLU and Leaky ReLU as alternatives that mitigate issues like vanishing gradients. It also covers neural network architecture, including the structure of layers, the importance of learning rates and momentum in training, and techniques for layer initialization and transfer learning. Practical examples using PyTorch are provided throughout to illustrate these concepts.

Uploaded by

hunglaikcad1247

Discovering activation functions
INTRODUCTION TO DEEP LEARNING WITH PYTORCH

Maham Faisal Khan

Senior Data Scientist
Limitations of the sigmoid and softmax functions

Sigmoid function:

Bounded between 0 and 1

Can be used anywhere in the network

Gradients:

Approach zero for low and high values of x

The function saturates in these regions

Sigmoid function saturation can lead to vanishing gradients during backpropagation.

This is also a problem for softmax.
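
The saturation described for the sigmoid can be checked numerically. The following is a minimal sketch, not part of the original slides: the input values are chosen only for illustration, and the names x, y and sigmoid_grads are assumptions. It prints the derivative of the sigmoid at a few points, which should be largest at x = 0 and close to zero in both tails.

```python
import torch

# Illustrative inputs covering both saturated tails and the centre.
x = torch.tensor([-10.0, -2.0, 0.0, 2.0, 10.0], requires_grad=True)

y = torch.sigmoid(x)
# Summing to a scalar lets backward() fill x.grad with d(sigmoid)/dx per element.
y.sum().backward()

sigmoid_grads = x.grad
for xi, gi in zip(x.tolist(), sigmoid_grads.tolist()):
    print(f"x = {xi:6.1f}   d(sigmoid)/dx = {gi:.6f}")
```

When many such near-zero factors are multiplied together during backpropagation, the gradient reaching the early layers can become very small.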

Introducing ReLU

Rectified Linear Unit (ReLU):

f(x) = max(x, 0)

For positive inputs, the output is equal to the input

For strictly negative inputs, the output is equal to zero

Mitigates the vanishing gradients problem, because the gradient is 1 for every positive input

In PyTorch:

import torch.nn as nn
relu = nn.ReLU()
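
As a small illustration of the ReLU definition, the sketch below applies nn.ReLU to a few hand-picked values and inspects the gradient. The input values and the names relu_out and relu_grad are assumptions made for this example only.

```python
import torch
import torch.nn as nn

relu = nn.ReLU()

# Illustrative inputs: two negative values and two positive values.
x = torch.tensor([-3.0, -0.5, 2.0, 10.0], requires_grad=True)
y = relu(x)
y.sum().backward()

relu_out = y.detach()   # negative inputs map to 0, positive inputs pass through
relu_grad = x.grad      # gradient is 0 for negative inputs and 1 for positive inputs
print(relu_out)
print(relu_grad)
```

Unlike the sigmoid, the gradient does not shrink as the positive input grows, but it is exactly zero for negative inputs.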

Introducing Leaky ReLU

Leaky ReLU:

For positive inputs, it behaves like ReLU

For negative inputs, it multiplies the input by a small coefficient (0.01 by default)

The gradients for negative inputs are never zero

In PyTorch:

import torch.nn as nn
leaky_relu = nn.LeakyReLU(negative_slope=0.05)
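
The effect of negative_slope can be seen on a few sample values. This is a hedged sketch with illustrative inputs; the names leaky_out and leaky_grad are not from the slides.

```python
import torch
import torch.nn as nn

leaky_relu = nn.LeakyReLU(negative_slope=0.05)

# Illustrative inputs: two negative values and one positive value.
x = torch.tensor([-4.0, -1.0, 3.0], requires_grad=True)
y = leaky_relu(x)
y.sum().backward()

leaky_out = y.detach()  # negative inputs are scaled by 0.05, positive inputs pass through
leaky_grad = x.grad     # gradient is 0.05 for negative inputs and 1 for positive inputs
print(leaky_out)
print(leaky_grad)
```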

Let's practice!

A deeper dive into neural network architecture

Layers are made of neurons

Linear layers are fully connected: each neuron of a layer is connected to each neuron of the previous layer

A neuron of a linear layer:

computes a linear operation using all neurons of the previous layer

contains N+1 learnable parameters (N weights and 1 bias), where N = dimension of the previous layer's outputs
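
To make the N+1 count concrete, a single neuron can be modelled as a linear layer with one output. In the sketch below, N = 5 is an arbitrary choice and the names neuron, manual, auto and n_params are assumptions for illustration.

```python
import torch
import torch.nn as nn

N = 5                     # hypothetical size of the previous layer's output
neuron = nn.Linear(N, 1)  # a linear layer containing a single neuron

x = torch.randn(N)
# The neuron computes a weighted sum of all N inputs plus one bias term.
manual = (neuron.weight[0] * x).sum() + neuron.bias[0]
auto = neuron(x)[0]

# N weights + 1 bias = N + 1 learnable parameters.
n_params = sum(p.numel() for p in neuron.parameters())
print(n_params)
```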

[Figure: Layer naming convention]

Tweaking the number of hidden layers

Input and output layer dimensions are fixed:

the input layer depends on the number of features, n_features

the output layer depends on the number of categories, n_classes

model = nn.Sequential(nn.Linear(n_features, 8),
                      nn.Linear(8, 4),
                      nn.Linear(4, n_classes))

We can use as many hidden layers as we want

Increasing the number of hidden layers = increasing the number of parameters = increasing the model capacity
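
One way to experiment with the number of hidden layers is a small helper that builds the stack from a list of sizes. The helper build_mlp, the sizes, and the variable names below are hypothetical and not part of the slides. Like the slide's model, the sketch stacks linear layers only, without activations.

```python
import torch
import torch.nn as nn

def build_mlp(n_features, hidden_sizes, n_classes):
    """Stack linear layers: n_features -> hidden_sizes... -> n_classes."""
    sizes = [n_features, *hidden_sizes, n_classes]
    layers = [nn.Linear(d_in, d_out) for d_in, d_out in zip(sizes[:-1], sizes[1:])]
    return nn.Sequential(*layers)

def count_parameters(model):
    return sum(p.numel() for p in model.parameters())

n_features, n_classes = 10, 3                       # illustrative dimensions
shallow = build_mlp(n_features, [8], n_classes)     # one hidden layer
deeper = build_mlp(n_features, [8, 4], n_classes)   # two hidden layers

batch = torch.randn(2, n_features)
out_shape = tuple(deeper(batch).shape)  # the output size stays tied to n_classes

print(count_parameters(shallow), count_parameters(deeper), out_shape)
```

Only the hidden sizes change between the two models; the first and last dimensions stay tied to the data.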

Counting the number of parameters

Given the following model:

model = nn.Sequential(nn.Linear(8, 4),
                      nn.Linear(4, 2))

Manually calculating the number of parameters:

first layer has 4 neurons, each neuron has 8+1 parameters = 36 parameters

second layer has 2 neurons, each neuron has 4+1 parameters = 10 parameters

total = 46 learnable parameters

Using PyTorch:

.numel(): returns the number of elements in the tensor

total = 0
for parameter in model.parameters():
    total += parameter.numel()
print(total)

46
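
The same count can be broken down per tensor, which shows where the 36 and 10 come from. This sketch uses named_parameters(); the dictionary name breakdown is an assumption for illustration.

```python
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 4),
                      nn.Linear(4, 2))

# Each linear layer contributes a weight matrix (out x in) and a bias vector (out).
breakdown = {name: p.numel() for name, p in model.named_parameters()}
total = sum(breakdown.values())

for name, count in breakdown.items():
    print(f"{name}: {count}")
print("total:", total)
```

The first layer holds 4 x 8 = 32 weights plus 4 biases, and the second holds 2 x 4 = 8 weights plus 2 biases.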

Let's practice!

Learning rate and momentum

Updating weights with SGD

Training a neural network = solving an optimization problem.

Stochastic Gradient Descent (SGD) optimizer:

import torch.optim as optim
sgd = optim.SGD(model.parameters(), lr=0.01, momentum=0.95)

Two parameters:

learning rate: controls the step size

momentum: controls the inertia of the optimizer

Bad values can lead to:

long training times

bad overall performance (poor accuracy)
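
For context, here is a minimal sketch of a single SGD update on a toy model. The model architecture, the random data, and the MSE loss are assumptions chosen only to make the example self-contained; only the optimizer call mirrors the slide.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Hypothetical toy model and random data, for illustration only.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
features = torch.randn(16, 4)
targets = torch.randn(16, 1)

sgd = optim.SGD(model.parameters(), lr=0.01, momentum=0.95)
criterion = nn.MSELoss()

before = [p.detach().clone() for p in model.parameters()]

sgd.zero_grad()                              # clear gradients from any previous step
loss = criterion(model(features), targets)   # forward pass
loss.backward()                              # compute gradients
sgd.step()                                   # update the weights using lr and momentum

changed = any(not torch.equal(b, p.detach())
              for b, p in zip(before, model.parameters()))
print(loss.item(), changed)
```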

[Figure: Impact of the learning rate: optimal learning rate]

[Figure: Impact of the learning rate: small learning rate]

[Figure: Impact of the learning rate: high learning rate]
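
The three regimes can be reproduced on a toy problem. The sketch below minimizes f(x) = x^2 with plain SGD; the function, the starting point, the step count, and the three learning rates are assumptions picked for illustration and are not the setup used for the original figures.

```python
import torch
import torch.optim as optim

def minimize(lr, steps=50, start=5.0):
    """Run plain SGD on f(x) = x^2 and return the final x."""
    x = torch.tensor([start], requires_grad=True)
    optimizer = optim.SGD([x], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        loss = (x ** 2).sum()
        loss.backward()
        optimizer.step()
    return x.item()

# Too small, reasonable, and too high for this particular function.
lr_results = {lr: minimize(lr) for lr in (0.001, 0.1, 1.1)}
for lr, x_final in lr_results.items():
    print(f"lr = {lr}: x after 50 steps = {x_final:.4f}")
```

With the smallest rate x barely moves from its starting point, with the middle rate it lands close to the minimum at 0, and with the largest rate each step overshoots and x moves further away.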

Without momentum

lr = 0.01, momentum = 0: after 100 steps, minimum found at x = -1.23 and y = -0.14

With momentum

lr = 0.01, momentum = 0.9: after 100 steps, minimum found at x = 0.92 and y = -2.04
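
The function behind the two results above is not given in the slides, so the sketch below swaps in a simple convex function, f(x) = x^2, to compare the same two settings (lr = 0.01, 100 steps, momentum 0 versus 0.9). It illustrates how inertia speeds up progress; it does not reproduce the escape from a local minimum. The helper run_sgd and the starting point are assumptions.

```python
import torch
import torch.optim as optim

def run_sgd(momentum, lr=0.01, steps=100, start=5.0):
    """Minimize f(x) = x^2 with SGD and return the final x."""
    x = torch.tensor([start], requires_grad=True)
    optimizer = optim.SGD([x], lr=lr, momentum=momentum)
    for _ in range(steps):
        optimizer.zero_grad()
        (x ** 2).sum().backward()
        optimizer.step()
    return x.item()

x_plain = run_sgd(momentum=0.0)
x_momentum = run_sgd(momentum=0.9)
print(f"momentum = 0.0: x = {x_plain:.4f}")
print(f"momentum = 0.9: x = {x_momentum:.4f}")
```

On this function, the run with momentum ends much closer to the minimum at 0 after the same number of steps.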

Summary

Learning rate:

Controls the step size

Too small leads to long training times

Too high leads to poor performance

Typical values between 10^-2 and 10^-4

Momentum:

Controls the inertia

Null momentum can lead to the optimizer being stuck in a local minimum

Non-null momentum can help find the function minimum

Typical values between 0.85 and 0.99

Let's practice!

Layer initialization, transfer learning, and fine-tuning

Layer initialization (1)

import torch.nn as nn

layer = nn.Linear(64, 128)
print(layer.weight.min(), layer.weight.max())

(tensor(-0.1250, grad_fn=<MinBackward1>), tensor(0.1250, grad_fn=<MaxBackward1>))

A layer's weights are initialized to small values

The outputs of a layer can explode if the inputs and the weights are not normalized.

The weights can be initialized using different methods (for example, using a uniform distribution)
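
The ±0.1250 range printed above matches 1/sqrt(64). The sketch below checks that relationship, under the assumption that nn.Linear's default initialization draws weights uniformly within ±1/sqrt(in_features), which holds for the PyTorch releases I am aware of but is an implementation detail that could change. The names bound, w_min and w_max are illustrative.

```python
import math
import torch.nn as nn

in_features, out_features = 64, 128
layer = nn.Linear(in_features, out_features)

# Assumed default bound: 1 / sqrt(in_features) = 1 / 8 = 0.125 for this layer.
bound = 1 / math.sqrt(in_features)

w_min = layer.weight.min().item()
w_max = layer.weight.max().item()
print(f"bound = {bound:.4f}, min = {w_min:.4f}, max = {w_max:.4f}")
```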

Layer initialization (2)

import torch.nn as nn

layer = nn.Linear(64, 128)
nn.init.uniform_(layer.weight)

print(layer.weight.min(), layer.weight.max())

(tensor(0.0002, grad_fn=<MinBackward1>), tensor(1.0000, grad_fn=<MaxBackward1>))
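
By default nn.init.uniform_ samples between 0 and 1, which explains the range printed above. The bounds can be set explicitly through its a and b arguments. The values ±0.05 below are an arbitrary illustration, not a recommendation from the slides, and zeroing the bias is likewise an optional choice.

```python
import torch.nn as nn

layer = nn.Linear(64, 128)

# Sample the weights uniformly from [-0.05, 0.05] instead of the default [0, 1].
nn.init.uniform_(layer.weight, a=-0.05, b=0.05)
# Optionally start with all biases at zero.
nn.init.zeros_(layer.bias)

print(layer.weight.min(), layer.weight.max())
```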

Transfer learning and fine-tuning (1)

Transfer learning: reusing a model trained on a first task for a second, similar task to accelerate the training process.

For example, we trained a first model on a large dataset of data scientist salaries across the US and we want to train a new model on a smaller dataset of salaries in Europe.

import torch
import torch.nn as nn

layer = nn.Linear(64, 128)
torch.save(layer, 'layer.pth')  # file name is illustrative

new_layer = torch.load('layer.pth')
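
An alternative to pickling the whole layer is to save and restore only its state_dict. This is a swapped-in technique, not the one shown on the slide. It may be the safer route on recent PyTorch releases, where, as far as I know, torch.load defaults to weights_only=True and can refuse to unpickle a full module. The temporary directory and the file name layer_state.pth are hypothetical.

```python
import os
import tempfile

import torch
import torch.nn as nn

layer = nn.Linear(64, 128)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "layer_state.pth")  # hypothetical file name
    # Save only the tensors (weight and bias), not the Python object itself.
    torch.save(layer.state_dict(), path)

    # Rebuild a layer with the same shape, then load the saved tensors into it.
    new_layer = nn.Linear(64, 128)
    new_layer.load_state_dict(torch.load(path))

same_weights = torch.equal(layer.weight, new_layer.weight)
same_bias = torch.equal(layer.bias, new_layer.bias)
print(same_weights, same_bias)
```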

Transfer learning and fine-tuning (2)

Fine-tuning = a type of transfer learning

Smaller learning rate

Not every layer is trained (we freeze some of them)

Rule of thumb: freeze early layers of the network and fine-tune layers closer to the output layer

import torch.nn as nn

model = nn.Sequential(nn.Linear(64, 128),
                      nn.Linear(128, 256))

for name, param in model.named_parameters():
    if name == '0.weight':
        param.requires_grad = False
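
The freezing rule of thumb can be combined with a small learning rate in one fine-tuning step. In this sketch the whole first layer is frozen (weight and bias), and only the remaining parameters are passed to the optimizer. The random input, the squared-output loss, and lr=0.001 are assumptions chosen only to make the example runnable.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(64, 128),
                      nn.Linear(128, 256))

# Freeze every parameter of the first layer (names start with "0.").
for name, param in model.named_parameters():
    if name.startswith("0."):
        param.requires_grad = False

# Give the optimizer only the trainable parameters, with a small learning rate.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = optim.SGD(trainable, lr=0.001)

frozen_before = model[0].weight.detach().clone()
head_before = model[1].weight.detach().clone()

# Hypothetical loss on random data, just to trigger one update.
loss = model(torch.randn(8, 64)).pow(2).mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()

frozen_unchanged = torch.equal(frozen_before, model[0].weight)
head_changed = not torch.equal(head_before, model[1].weight)
print(frozen_unchanged, head_changed)
```

After the step, the frozen layer keeps its original weights while the layer closer to the output has been updated.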

Let's practice!