PyTorch Neural Network Training Guide

Uploaded by Da HUANG

Machine Learning

PyTorch Tutorial
TA: 曾元 (Yuan Tseng)
2022.02.18
Outline
● Background: Prerequisites & What is PyTorch?
● Training & Testing Neural Networks in PyTorch
● Dataset & Dataloader
● Tensors
● torch.nn: Models, Loss Functions
● torch.optim: Optimization
● Save/load models
Prerequisites
● We assume you are already familiar with…
1. Python3
■ if-else, loop, function, file IO, class, ...
■ refs: link1, link2, link3
2. Deep Learning Basics
■ Prof. Lee’s 1st & 2nd lecture videos from last year
■ ref: link1, link2

Some knowledge of NumPy will also be useful!


What is PyTorch?
● A machine learning framework in Python.
● Two main features:
○ N-dimensional Tensor computation (like NumPy) on GPUs
○ Automatic differentiation for training deep neural networks
Training Neural Networks

Define Neural Network → Loss Function → Optimization Algorithm → Training

More info about the training process in last year's lecture video.
Training & Testing Neural Networks

Training → Validation → Testing

Guide for training/validation/testing can be found here.

Training & Testing Neural Networks – in PyTorch
Step 1. Load Data (Dataset & DataLoader)

Training → Validation → Testing

Dataset & Dataloader
● Dataset: stores data samples and expected values
● Dataloader: groups data into batches, enables multiprocessing

dataset = MyDataset(file)
dataloader = DataLoader(dataset, batch_size, shuffle=True)

shuffle=True for training, shuffle=False for testing.

More info about batches and shuffling here.


Dataset & Dataloader
from torch.utils.data import Dataset, DataLoader

class MyDataset(Dataset):
    def __init__(self, file):
        self.data = ...              # read data & preprocess

    def __getitem__(self, index):
        return self.data[index]      # returns one sample at a time

    def __len__(self):
        return len(self.data)        # returns the size of the dataset
Dataset & Dataloader
dataset = MyDataset(file)
dataloader = DataLoader(dataset, batch_size=5, shuffle=False)

With batch_size=5, the DataLoader calls __getitem__(0) through __getitem__(4)
on the Dataset and groups the five returned samples into one mini-batch.
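The pattern above can be sketched end to end. ToyDataset and its in-memory list of ten scalars are hypothetical stand-ins for reading a real file:

```python
import torch
from torch.utils.data import Dataset, DataLoader

class ToyDataset(Dataset):
    """A hypothetical in-memory dataset of ten scalar samples."""
    def __init__(self):
        self.data = [float(i) for i in range(10)]

    def __getitem__(self, index):
        # Return one sample at a time.
        return torch.tensor(self.data[index])

    def __len__(self):
        # Return the size of the dataset.
        return len(self.data)

dataset = ToyDataset()
dataloader = DataLoader(dataset, batch_size=5, shuffle=False)

batches = list(dataloader)
print(len(batches))      # 10 samples / batch_size 5 = 2 mini-batches
print(batches[0].shape)  # default collation stacks samples: torch.Size([5])
```

With shuffle=False the samples appear in dataset order, which is what you usually want for testing; training typically uses shuffle=True.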
Tensors
● High-dimensional matrices (arrays)

1-D tensor: e.g. audio
2-D tensor: e.g. black & white images
3-D tensor: e.g. RGB images
Tensors – Shape of Tensors
● Check with .shape

(5,)       1-D tensor with 5 elements (dim 0)
(3, 5)     2-D tensor, 3 × 5 (dim 0, dim 1)
(4, 5, 3)  3-D tensor, 4 × 5 × 3 (dim 0, dim 1, dim 2)

Note: dim in PyTorch == axis in NumPy


Tensors – Creating Tensors
● Directly from data (list or np.ndarray)

x = torch.tensor([[1., -1.], [-1., 1.]])
x = torch.from_numpy(np.array([[1., -1.], [-1., 1.]]))
# tensor([[ 1., -1.],
#         [-1.,  1.]])

● Tensor of constant zeros & ones

x = torch.zeros([2, 2])
# tensor([[0., 0.],
#         [0., 0.]])

x = torch.ones([1, 2, 5])    # argument is the shape
# tensor([[[1., 1., 1., 1., 1.],
#          [1., 1., 1., 1., 1.]]])
Tensors – Common Operations
Common arithmetic functions are supported, such as:

● Addition:     z = x + y
● Subtraction:  z = x - y
● Power:        y = x.pow(2)
● Summation:    y = x.sum()
● Mean:         y = x.mean()
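The operations above can be tried directly; the values here are illustrative:

```python
import torch

x = torch.tensor([[1., 2.], [3., 4.]])
y = torch.ones(2, 2)

z1 = x + y     # elementwise addition
z2 = x - y     # elementwise subtraction
s = x.sum()    # sum of all elements -> tensor(10.)
m = x.mean()   # mean of all elements -> tensor(2.5)
p = x.pow(2)   # elementwise square
```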
Tensors – Common Operations
● Transpose: swap two specified dimensions

>>> x = torch.zeros([2, 3])
>>> x.shape
torch.Size([2, 3])
>>> x = x.transpose(0, 1)
>>> x.shape
torch.Size([3, 2])
Tensors – Common Operations
● Squeeze: remove the specified dimension with length = 1

>>> x = torch.zeros([1, 2, 3])
>>> x.shape
torch.Size([1, 2, 3])
>>> x = x.squeeze(0)        # dim = 0
>>> x.shape
torch.Size([2, 3])
Tensors – Common Operations
● Unsqueeze: expand a new dimension

>>> x = torch.zeros([2, 3])
>>> x.shape
torch.Size([2, 3])
>>> x = x.unsqueeze(1)      # dim = 1
>>> x.shape
torch.Size([2, 1, 3])
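Transpose, squeeze, and unsqueeze can be chained; a quick sketch that checks each resulting shape:

```python
import torch

x = torch.zeros([1, 2, 3])
x = x.squeeze(0)       # remove length-1 dim 0: (1, 2, 3) -> (2, 3)
assert x.shape == torch.Size([2, 3])

x = x.transpose(0, 1)  # swap dims 0 and 1: (2, 3) -> (3, 2)
assert x.shape == torch.Size([3, 2])

x = x.unsqueeze(1)     # insert a new dim at position 1: (3, 2) -> (3, 1, 2)
assert x.shape == torch.Size([3, 1, 2])
```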
Tensors – Common Operations
● Cat: concatenate multiple tensors along a dimension

>>> x = torch.zeros([2, 1, 3])
>>> y = torch.zeros([2, 3, 3])
>>> z = torch.zeros([2, 2, 3])
>>> w = torch.cat([x, y, z], dim=1)
>>> w.shape
torch.Size([2, 6, 3])

more operators: [Link]
Tensors – Data Type
● Using different data types for the model and the data will cause errors.

Data type               dtype          tensor
32-bit floating point   torch.float    torch.FloatTensor
64-bit integer (signed) torch.long     torch.LongTensor

see official documentation for more information on data types.
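A minimal sketch of the kind of error a dtype mismatch produces, and the usual fix of casting the data; the layer sizes are arbitrary:

```python
import torch
import torch.nn as nn

layer = nn.Linear(4, 2)                      # parameters are torch.float
x_long = torch.ones(3, 4, dtype=torch.long)  # 64-bit integer input

try:
    layer(x_long)                            # dtype mismatch -> RuntimeError
except RuntimeError as e:
    print("dtype error:", e)

y = layer(x_long.float())                    # cast the data to torch.float first
print(y.dtype)                               # torch.float32
```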


Tensors – PyTorch v.s. NumPy
● Similar attributes

PyTorch     NumPy
x.shape     x.shape
x.dtype     x.dtype

ref: [Link]
Tensors – PyTorch v.s. NumPy
● Many functions have the same names as well

PyTorch               NumPy
x.reshape / x.view    x.reshape
x.squeeze()           x.squeeze()
x.unsqueeze(1)        np.expand_dims(x, 1)

ref: [Link]
Tensors – Device
● Tensors & modules are computed on the CPU by default

Use .to() to move tensors to the appropriate device.

● CPU
x = x.to('cpu')
● GPU
x = x.to('cuda')
Tensors – Device (GPU)
● Check if your computer has an NVIDIA GPU

torch.cuda.is_available()

● Multiple GPUs: specify 'cuda:0', 'cuda:1', 'cuda:2', ...

● Why use GPUs?
○ Parallel computing with more cores for arithmetic calculations
○ See What is a GPU and do you need one in deep learning?
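A common pattern is to pick the device once and move everything with .to(); this sketch falls back to the CPU when no GPU is present:

```python
import torch

# Use the GPU when one is available, otherwise fall back to the CPU.
device = 'cuda' if torch.cuda.is_available() else 'cpu'

x = torch.ones(2, 3)
x = x.to(device)   # move the tensor to the chosen device
print(x.device)
```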
Tensors – Gradient Calculation
>>> x = torch.tensor([[1., 0.], [-1., 1.]], requires_grad=True)
>>> z = x.pow(2).sum()
>>> z.backward()
>>> x.grad
tensor([[ 2.,  0.],
        [-2.,  2.]])

See here to learn about gradient calculation.


Training & Testing Neural Networks – in PyTorch
Step 2. Define Neural Network (torch.nn)

Load Data → Define Neural Network → Loss Function → Optimization Algorithm

Training → Validation → Testing
torch.nn – Network Layers
● Linear Layer (Fully-connected Layer)

nn.Linear(in_features, out_features)

nn.Linear(32, 64): input tensor (*, 32) → output tensor (*, 64)

* can be any shape, but the last dimension must be 32,
e.g. (10, 32), (10, 5, 32), (1, 1, 3, 32), ...
torch.nn – Network Layers
● Linear Layer (Fully-connected Layer)

ref: last year's lecture video


torch.nn – Neural Network Layers
● Linear Layer (Fully-connected Layer)

Maps inputs x = (x1, ..., x32) to outputs y = (y1, ..., y64): W x + b = y, with W of shape (64 × 32).
torch.nn – Network Parameters
● Linear Layer (Fully-connected Layer): W x + b = y, W of shape (64 × 32)

>>> layer = torch.nn.Linear(32, 64)
>>> layer.weight.shape
torch.Size([64, 32])
>>> layer.bias.shape
torch.Size([64])
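The weight/bias shapes and the "any leading shape" rule can be verified directly; the example shapes below are the ones from the earlier slide:

```python
import torch
import torch.nn as nn

layer = nn.Linear(32, 64)
assert layer.weight.shape == torch.Size([64, 32])  # W
assert layer.bias.shape == torch.Size([64])        # b

# Any leading shape works as long as the last dimension is in_features (32).
for shape in [(10, 32), (10, 5, 32), (1, 1, 3, 32)]:
    y = layer(torch.zeros(shape))
    assert tuple(y.shape) == (*shape[:-1], 64)     # last dim becomes 64
```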
torch.nn – Non-Linear Activation Functions
● Sigmoid Activation

nn.Sigmoid()

● ReLU Activation

nn.ReLU()

See here to learn about why we need activation functions.


torch.nn – Build your own neural network
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self):
        # Initialize your model & define layers
        super(MyModel, self).__init__()
        self.net = nn.Sequential(
            nn.Linear(10, 32),
            nn.Sigmoid(),
            nn.Linear(32, 1)
        )

    def forward(self, x):
        # Compute output of your NN
        return self.net(x)
torch.nn – Build your own neural network
The nn.Sequential version above is equivalent to defining each layer separately:

import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.layer1 = nn.Linear(10, 32)
        self.layer2 = nn.Sigmoid()
        self.layer3 = nn.Linear(32, 1)

    def forward(self, x):
        out = self.layer1(x)
        out = self.layer2(out)
        out = self.layer3(out)
        return out
Training & Testing Neural Networks – in PyTorch
Step 3. Loss Function (torch.nn.MSELoss, torch.nn.CrossEntropyLoss, etc.)

Load Data → Define Neural Network → Loss Function → Optimization Algorithm

Training → Validation → Testing
torch.nn – Loss Functions
● Mean Squared Error (for regression tasks)

criterion = nn.MSELoss()

● Cross Entropy (for classification tasks)

criterion = nn.CrossEntropyLoss()

● loss = criterion(model_output, expected_value)
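A small sketch of both loss functions on hand-made values (the numbers are illustrative):

```python
import torch
import torch.nn as nn

# Regression: mean squared error.
criterion = nn.MSELoss()
pred = torch.tensor([1., 2., 3.])
target = torch.tensor([1., 2., 5.])
loss = criterion(pred, target)            # mean of (0^2, 0^2, 2^2) = 4/3

# Classification: cross entropy takes raw logits and integer class labels.
criterion_cls = nn.CrossEntropyLoss()
logits = torch.tensor([[2.0, 0.5, 0.1]])  # one sample, three classes
label = torch.tensor([0])                 # index of the correct class
loss_cls = criterion_cls(logits, label)
```

Note that nn.CrossEntropyLoss applies softmax internally, so the model output should be raw scores, not probabilities.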


Training & Testing Neural Networks – in PyTorch
Step 4. Optimization Algorithm (torch.optim)

Load Data → Define Neural Network → Loss Function → Optimization Algorithm

Training → Validation → Testing
torch.optim
● Gradient-based optimization algorithms that adjust network parameters to reduce error. (See Adaptive Learning Rate lecture video.)

● E.g. Stochastic Gradient Descent (SGD)

torch.optim.SGD(model.parameters(), lr, momentum=0)

torch.optim
optimizer = torch.optim.SGD(model.parameters(), lr, momentum=0)

● For every batch of data:
1. Call optimizer.zero_grad() to reset gradients of model parameters.
2. Call loss.backward() to backpropagate gradients of prediction loss.
3. Call optimizer.step() to adjust model parameters.

See official documentation for more optimization algorithms.


Training & Testing Neural Networks – in PyTorch
Step 5. Entire Procedure

Load Data → Define Neural Network → Loss Function → Optimization Algorithm

Training → Validation → Testing
Neural Network Training Setup

dataset = MyDataset(file)                              # read data via MyDataset
tr_set = DataLoader(dataset, 16, shuffle=True)         # put dataset into Dataloader
model = MyModel().to(device)                           # construct model and move to device (cpu/cuda)
criterion = nn.MSELoss()                               # set loss function
optimizer = torch.optim.SGD(model.parameters(), 0.1)   # set optimizer

Neural Network Training Loop

for epoch in range(n_epochs):              # iterate n_epochs
    model.train()                          # set model to train mode
    for x, y in tr_set:                    # iterate through the dataloader
        optimizer.zero_grad()              # set gradients to zero
        x, y = x.to(device), y.to(device)  # move data to device (cpu/cuda)
        pred = model(x)                    # forward pass (compute output)
        loss = criterion(pred, y)          # compute loss
        loss.backward()                    # compute gradients (backpropagation)
        optimizer.step()                   # update model with optimizer

Neural Network Validation Loop

model.eval()                                   # set model to evaluation mode
total_loss = 0
for x, y in dv_set:                            # iterate through the dataloader
    x, y = x.to(device), y.to(device)          # move data to device (cpu/cuda)
    with torch.no_grad():                      # disable gradient calculation
        pred = model(x)                        # forward pass (compute output)
        loss = criterion(pred, y)              # compute loss
    total_loss += loss.cpu().item() * len(x)   # accumulate loss
avg_loss = total_loss / len(dv_set.dataset)    # compute averaged loss

Neural Network Testing Loop

model.eval()                       # set model to evaluation mode
preds = []
for x in tt_set:                   # iterate through the dataloader
    x = x.to(device)               # move data to device (cpu/cuda)
    with torch.no_grad():          # disable gradient calculation
        pred = model(x)            # forward pass (compute output)
        preds.append(pred.cpu())   # collect predictions
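The setup, training, and validation steps above can be combined into one self-contained sketch; the synthetic linear-regression data, layer sizes, and hyperparameters are illustrative assumptions, not values from the tutorial:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic regression data (hypothetical): y = sum of the 10 input features.
x_all = torch.randn(100, 10)
y_all = x_all.sum(dim=1, keepdim=True)
tr_set = DataLoader(TensorDataset(x_all[:80], y_all[:80]), batch_size=16, shuffle=True)
dv_set = DataLoader(TensorDataset(x_all[80:], y_all[80:]), batch_size=16)

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = nn.Linear(10, 1).to(device)              # a one-layer stand-in for MyModel
criterion = nn.MSELoss()                         # loss function
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for epoch in range(20):                          # iterate n_epochs
    model.train()                                # set model to train mode
    for x, y in tr_set:
        optimizer.zero_grad()                    # reset gradients
        x, y = x.to(device), y.to(device)
        pred = model(x)                          # forward pass
        loss = criterion(pred, y)
        loss.backward()                          # backpropagation
        optimizer.step()                         # update parameters

model.eval()                                     # set model to evaluation mode
total_loss = 0
for x, y in dv_set:
    x, y = x.to(device), y.to(device)
    with torch.no_grad():                        # disable gradient calculation
        pred = model(x)
        loss = criterion(pred, y)
    total_loss += loss.cpu().item() * len(x)     # accumulate loss
avg_loss = total_loss / len(dv_set.dataset)      # averaged validation loss
print(avg_loss)
```

Because the target is an exact linear function of the inputs, the validation loss should shrink to near zero after a few epochs.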


Notice – model.eval(), torch.no_grad()
● model.eval()

Changes the behaviour of some model layers, such as dropout and batch normalization.

● with torch.no_grad()

Prevents calculations from being added to the gradient computation graph. Usually used to prevent accidental training on validation/testing data.
Save/Load Trained Models
● Save

torch.save(model.state_dict(), path)

● Load

ckpt = torch.load(path)
model.load_state_dict(ckpt)
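A quick round-trip sketch; the temporary checkpoint path is hypothetical:

```python
import os
import tempfile
import torch
import torch.nn as nn

model = nn.Linear(4, 2)
path = os.path.join(tempfile.mkdtemp(), 'model.ckpt')  # hypothetical path

torch.save(model.state_dict(), path)  # save only the parameters, not the class

model2 = nn.Linear(4, 2)              # re-create the same architecture
ckpt = torch.load(path)
model2.load_state_dict(ckpt)          # restore the parameters

assert torch.equal(model.weight, model2.weight)  # identical weights after loading
```

Saving the state_dict rather than the whole model keeps the checkpoint independent of the class definition's file location.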
More About PyTorch
● torchaudio
○ speech/audio processing
● torchtext
○ natural language processing
● torchvision
○ computer vision
● skorch
○ scikit-learn + PyTorch
More About PyTorch
● Useful github repositories using PyTorch
○ Huggingface Transformers (transformer models: BERT, GPT, ...)
○ Fairseq (sequence modeling for NLP & speech)
○ ESPnet (speech recognition, translation, synthesis, ...)
○ Most implementations of recent deep learning papers
○ ...
References
● Machine Learning 2021 Spring Pytorch Tutorial
● Official Pytorch Tutorials
● [Link]
Any questions?
