Build Your First Deep Neural Network

The lecture covers the construction of a Deep Neural Network (DNN) using PyTorch, emphasizing the importance of depth and non-linearity in learning complex relationships. Key components include activation functions, particularly ReLU, and the architecture of a DNN, which consists of input, hidden, and output layers. The session also outlines the process of building, training, and evaluating a DNN with a practical example using the Fashion-MNIST dataset.

CSE 5111: Deep Learning

Lecture 4: Building Our First Deep Neural Network [Class 9,10]


Master of Science in Computer Science & Engineering
Department of Computer Science and Engineering
Comilla University
Instructor: Mahmudul Hasan, PhD
Reference Text: Deep Learning (GBC) - Chapter 6.1, 6.3
Slide 2: Recap: The Complete Puzzle So Far

What We've Learned

We now have all the mathematical pieces:
1. Gradient Descent: The optimization algorithm that minimizes loss.
2. Backpropagation: The efficient algorithm for calculating gradients.
3. PyTorch Autograd: Automates backpropagation for us.
Today's Mission: Assemble these pieces to build our first Deep Neural Network (DNN) and understand why depth is powerful.
Slide 3: The Limitation of Linear Models
Why Go Deep? The Need for Non-Linearity
Problem: Single-layer networks (like linear regression) can only learn linear relationships.
Real-World Example: The XOR Problem
• Can you separate True/False with a single straight line?
• Input: (0,0) → Output: 0
• Input: (0,1) → Output: 1
• Input: (1,0) → Output: 1
• Input: (1,1) → Output: 0
Answer: No! This is a fundamental limitation of linear models.
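The limitation can be demonstrated in a few lines of PyTorch. The sketch below is illustrative and not from the lecture (the architecture and hyperparameters are our own choices): a network with one hidden ReLU layer can drive the XOR loss down, which no single linear layer can do.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # for reproducibility

# The four XOR cases and their labels
X = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = torch.tensor([[0.], [1.], [1.], [0.]])

# One hidden layer + ReLU introduces the non-linearity XOR needs
model = nn.Sequential(nn.Linear(2, 4), nn.ReLU(), nn.Linear(4, 1))
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)

initial_loss = criterion(model(X), y).item()
for _ in range(500):
    optimizer.zero_grad()
    loss = criterion(model(X), y)
    loss.backward()
    optimizer.step()
final_loss = loss.item()
print(final_loss < initial_loss)  # the loss drops as the network separates XOR
```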
Slide 4: The Solution: Adding Layers & Non-Linearity
Building Complexity Step by Step
Think of it like this:
• Layer 1: Creates simple decision boundaries (straight lines)
• Layer 2: Combines these lines to create more complex shapes
• Layer 3: Combines those shapes to create even more complex regions
Analogy: Building with LEGO
• Single layer = Basic bricks
• Multiple layers = Complex structures from simple bricks
• Activation functions = The connectors that hold everything together
Slide 5: Activation Functions: The "Spark" of Neural Networks
What Are Activation Functions?
Activation functions determine whether a neuron should "fire" or not. They introduce non-linearity!
Without activation functions:
• Deep network = Just multiple linear transformations
• Multiple layers = Equivalent to a single layer
With activation functions:
• Deep network = Can learn complex, non-linear relationships
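The "multiple layers = a single layer" claim can be verified numerically. This sketch (not from the slides) composes two bias-free linear layers and shows the result equals one linear map whose weight is the matrix product:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(5, 3)

# Two stacked linear layers with no activation in between...
f1 = nn.Linear(3, 4, bias=False)
f2 = nn.Linear(4, 2, bias=False)
stacked = f2(f1(x))

# ...collapse to a single linear layer with weight W = W2 @ W1
W = f2.weight @ f1.weight
single = x @ W.T
print(torch.allclose(stacked, single, atol=1e-5))  # True
```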
Slide 6: Popular Activation Functions
Meet the Activation Function Family
1. Sigmoid
• σ(x) = 1/(1 + e⁻ˣ)

• Range: (0, 1)

• Problem: Vanishing gradients, not zero-centered

2. Tanh
• tanh(x) = (eˣ - e⁻ˣ)/(eˣ + e⁻ˣ)

• Range: (-1, 1)

• Better than sigmoid (zero-centered)

3. ReLU (Rectified Linear Unit)


• ReLU(x) = max(0, x)

• Range: [0, ∞)

• Default choice for most networks
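A quick way to compare the three (a sketch, not part of the slides) is to evaluate them on a few points and check the ranges stated above:

```python
import torch

x = torch.tensor([-2.0, 0.0, 2.0])
print(torch.sigmoid(x))  # values squashed into (0, 1); sigmoid(0) = 0.5
print(torch.tanh(x))     # values squashed into (-1, 1); tanh(0) = 0
print(torch.relu(x))     # negatives clamped: tensor([0., 0., 2.])
```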


Slide 7: Why ReLU is the Default Choice
The ReLU Revolution
Advantages:
• Computationally simple: Just max(0, x)
• Avoids vanishing gradient: Gradient is either 0 or 1
• Sparsity: About 50% of neurons can be inactive
Disadvantage:
• Dying ReLU: If inputs are always negative, the neuron never activates
Solution variants:
• Leaky ReLU: max(0.01x, x) - small slope for negative values

• Parametric ReLU (PReLU): Learn the slope
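The variants differ only in how they treat negative inputs, which this small sketch (not from the slides) makes concrete with torch.nn.functional:

```python
import torch
import torch.nn.functional as F

x = torch.tensor([-10.0, -1.0, 0.0, 1.0])
print(F.relu(x))                             # negatives become 0 (can "die")
print(F.leaky_relu(x, negative_slope=0.01))  # negatives keep a small slope
```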


Slide 8: Architecture of a Deep Neural Network
Anatomy of a DNN
Input Layer → Hidden Layer 1 → Hidden Layer 2 → ... → Output Layer
(Raw Data)   (Simple Features)  (More Complex Features)  (Final Prediction)
Key Components:
• Input Layer: Size = number of features
• Hidden Layers: Where the magic happens
• Output Layer: Size = number of classes (classification) or 1 (regression)
• Connections: Fully connected = each neuron connects to all neurons in next layer
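"Fully connected" has a concrete parameter cost: a layer with n inputs and m outputs stores n×m weights plus m biases. A quick sketch (not from the slides) confirms this for the 784→128 layer used later:

```python
import torch.nn as nn

layer = nn.Linear(784, 128)  # each of 128 neurons connects to all 784 inputs
n_params = sum(p.numel() for p in layer.parameters())
print(n_params)  # 784 * 128 weights + 128 biases = 100480
```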
Slide 9: Real-World Example: Image Classification
Hands-On Example: Fashion-MNIST
The Dataset:
• 70,000 grayscale images
• 10 categories (T-shirt, trousers, pullover, dress, etc.)
• 28×28 pixels = 784 features per image
Our Goal: Build a DNN that can classify clothing items!
Why this is perfect for learning:
• Simple enough to train quickly
• Complex enough to need a real neural network
Slide 10: Building a DNN in PyTorch: Step 1 - Imports
Setting Up Our Toolkit
python
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt

# Check if GPU is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")
Key Imports:
• torch.nn: Neural network modules
• torch.optim: Optimization algorithms
• torchvision: Computer vision datasets

Slide 11: Building a DNN in PyTorch: Step 2 - Define the Model
Creating Our Neural Network Class
class FashionDNN(nn.Module):
    def __init__(self):
        super(FashionDNN, self).__init__()
        self.layers = nn.Sequential(
            # Input: 784 features (28x28 pixels)
            nn.Linear(784, 128),  # First hidden layer
            nn.ReLU(),            # Activation function
            nn.Linear(128, 64),   # Second hidden layer
            nn.ReLU(),            # Activation function
            nn.Linear(64, 10)     # Output: 10 classes
        )

    def forward(self, x):
        # Flatten the image from 28x28 to 784
        x = x.view(x.size(0), -1)
        return self.layers(x)

# Create model and move to GPU
model = FashionDNN().to(device)
print(model)
Slide 12: Understanding nn.Sequential
What is nn.Sequential?
nn.Sequential is a container that chains layers together:
Input → Linear(784,128) → ReLU() → Linear(128,64) → ReLU() → Linear(64,10) → Output
It's like a pipeline:
• Data flows through each layer in order
• Output of one layer becomes input to the next
• Makes code clean and readable
Alternative: You can also define each layer separately and connect them manually in
the forward method.
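That alternative, defining layers separately and wiring them in the forward method, might look like this sketch (the class name FashionDNNManual and attribute names fc1-fc3 are hypothetical, not from the lecture):

```python
import torch
import torch.nn as nn

class FashionDNNManual(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 128)
        self.fc2 = nn.Linear(128, 64)
        self.fc3 = nn.Linear(64, 10)

    def forward(self, x):
        x = x.view(x.size(0), -1)    # flatten 28x28 images to 784
        x = torch.relu(self.fc1(x))  # activations applied manually
        x = torch.relu(self.fc2(x))
        return self.fc3(x)

model = FashionDNNManual()
out = model(torch.randn(2, 1, 28, 28))
print(out.shape)  # torch.Size([2, 10])
```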
Slide 13: Building a DNN: Step 3 - Prepare the Data
Getting Our Data Ready
python
# Transform: convert images to tensors and normalize
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])

# Download and load training data
trainset = torchvision.datasets.FashionMNIST(
    root='./data', train=True, download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(
    trainset, batch_size=64, shuffle=True)

# Download and load test data
testset = torchvision.datasets.FashionMNIST(
    root='./data', train=False, download=True, transform=transform)
testloader = torch.utils.data.DataLoader(
    testset, batch_size=64, shuffle=False)
Why batch_size=64?
• Training with mini-batches is more efficient
• Provides more stable gradient estimates
• Common sizes: 32, 64, 128, 256

Why Batch Sizes Are Often Powers of 2 in Deep Learning


In machine learning, particularly deep learning, the batch size refers to the number of training
samples processed together in one iteration before updating the model's parameters.

• GPU Parallelism: Aligns with GPU core counts (e.g., 32) for efficient processing.
• Memory Alignment: Matches memory page sizes to reduce padding and waste.
• Matrix Optimization: Speeds up cuDNN matrix operations for multiples of 8.
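Mini-batching can be inspected without downloading anything; this sketch (not from the slides) uses random tensors standing in for Fashion-MNIST's shapes:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# 256 fake grayscale 28x28 images with labels in 0..9
data = torch.randn(256, 1, 28, 28)
labels = torch.randint(0, 10, (256,))
loader = DataLoader(TensorDataset(data, labels), batch_size=64, shuffle=True)

images, targets = next(iter(loader))
print(images.shape)   # torch.Size([64, 1, 28, 28]) - one mini-batch
print(targets.shape)  # torch.Size([64])
```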
Slide 14: Building a DNN: Step 4 - Loss Function & Optimizer
Choosing the Right Tools
# Loss function for classification
criterion = nn.CrossEntropyLoss()

# Optimizer - Adam is usually a good default
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Learning rate scheduler (optional but helpful)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

Why CrossEntropyLoss?
• Designed for multi-class classification

• Combines LogSoftmax and negative log-likelihood loss in one step

• Expects raw logits, so the model needs no final softmax layer


Why Adam?
a) Adaptive Learning Rates: Adam adjusts learning rates for each parameter using estimates
of first and second moments, leading to faster convergence.
b) Efficiency with Large Datasets: It performs well with large-scale data and parameters,
requiring less memory than other optimizers.
c) Robust to Noisy Gradients: Adam handles noisy or sparse gradients effectively, making it
suitable for complex, non-convex problems.
d) Combines Momentum and RMSProp: By integrating momentum and RMSProp, Adam
balances speed and stability in optimization.
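The claim that CrossEntropyLoss combines softmax and negative log likelihood can be checked numerically (a sketch, not part of the slides):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)       # raw scores for 4 samples, 10 classes
labels = torch.tensor([3, 1, 0, 7])

ce = F.cross_entropy(logits, labels)
nll = F.nll_loss(F.log_softmax(logits, dim=1), labels)
print(torch.allclose(ce, nll))  # True: cross-entropy = log-softmax + NLL
```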
Slide 15: The Complete Training Loop
Putting It All Together: Training
python
def train_model(model, trainloader, criterion, optimizer, epochs=10):
    model.train()  # Set model to training mode
    train_losses = []

    for epoch in range(epochs):
        running_loss = 0.0
        for images, labels in trainloader:
            # Move data to GPU
            images, labels = images.to(device), labels.to(device)

            # Zero the gradients
            optimizer.zero_grad()

            # Forward pass
            outputs = model(images)
            loss = criterion(outputs, labels)

            # Backward pass and optimize
            loss.backward()
            optimizer.step()

            running_loss += loss.item()

        epoch_loss = running_loss / len(trainloader)
        train_losses.append(epoch_loss)
        print(f'Epoch [{epoch+1}/{epochs}], Loss: {epoch_loss:.4f}')

    return train_losses

# Train the model!
loss_history = train_model(model, trainloader, criterion, optimizer)
Slide 16: Understanding Model Training vs Evaluation Modes
model.train() vs model.eval()
Training Mode (model.train()):
• Enables dropout and batch normalization updates

• Used during training

Evaluation Mode (model.eval()):
• Disables dropout and uses the full network

• Uses running statistics for batch norm

• Used during testing/validation

Note: Neither mode changes gradient tracking; wrap inference in torch.no_grad() to skip gradients and save memory.

python
# For testing:
model.eval()
with torch.no_grad():  # No gradients needed for testing
    test_outputs = model(test_images)

# Back to training:
model.train()
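The mode switch is easiest to see with a dropout layer (an illustrative sketch, not from the slides):

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(1, 10)

drop.train()          # training mode: ~half the values are randomly zeroed
train_out = drop(x)

drop.eval()           # eval mode: dropout does nothing
eval_out = drop(x)
print(torch.equal(eval_out, x))  # True
```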
Slide 17: Evaluating Our Model
How Good is Our Model?
python
def evaluate_model(model, testloader):
    model.eval()
    correct = 0
    total = 0

    with torch.no_grad():  # No gradients needed = faster!
        for images, labels in testloader:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    accuracy = 100 * correct / total
    print(f'Test Accuracy: {accuracy:.2f}%')
    return accuracy

# Test our trained model
accuracy = evaluate_model(model, testloader)
What's happening here:
• torch.max(outputs, 1): Get the predicted class (highest score)
• Compare predictions with true labels
• Calculate the percentage of correct predictions
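What torch.max(outputs, 1) returns is easiest to see on a toy batch of logits (a sketch, not from the slides):

```python
import torch

outputs = torch.tensor([[0.1, 2.5, 0.3],
                        [1.2, 0.0, 3.4]])
values, predicted = torch.max(outputs, 1)  # max over dim 1 (the classes)
print(predicted)  # tensor([1, 2]): index of the largest logit per row
```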
Slide 18: Visualizing Training Progress
Learning Curves: Our Training Report Card
python
plt.plot(loss_history)
plt.title('Training Loss Over Time')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.grid(True)
plt.show()
What to look for:
• Good: Smooth, steady decrease in loss
• Bad: Loss oscillating wildly (learning rate too high)
• Bad: Loss not decreasing (learning rate too low or model too simple)
• Bad: Loss suddenly becomes NaN (exploding gradients)
Typical results: You should see loss drop from ~2.0 to ~0.3 in 10 epochs!
Slide 19: Making Predictions
Using Our Trained Model
# Get a batch of test images
dataiter = iter(testloader)
images, labels = next(dataiter)
images, labels = images.to(device), labels.to(device)

# Make predictions
model.eval()
with torch.no_grad():
    outputs = model(images)
    _, predictions = torch.max(outputs, 1)

# Display results
class_names = ['T-shirt', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

for i in range(5):  # Show first 5 predictions
    print(f'Predicted: {class_names[predictions[i]]}, '
          f'Actual: {class_names[labels[i]]}')
Slide 20: Lab 2 Preview: Your Turn!
Lab 2: Implement and Train a DNN
Your Tasks:
1. Implement the DNN architecture we just built
2. Experiment with different architectures:
o Try different numbers of hidden layers

o Try different numbers of neurons per layer

o Try different activation functions

3. Tune hyperparameters:
o Learning rate (try 0.1, 0.01, 0.001)

o Batch size (try 32, 64, 128)

4. Achieve at least 85% test accuracy


Due Date: [Insert your due date here]
Slide 21: Key Takeaways
What We Learned Today
1. Why Depth Matters: Deep networks can learn complex, non-linear relationships
2. Activation Functions: ReLU is the default choice for hidden layers
3. DNN Architecture: Input → Hidden Layers → Output
4. PyTorch Workflow:
o Define the model as an nn.Module subclass
o Use nn.Sequential for simple architectures

o Choose an appropriate loss function and optimizer

o Implement the training loop

o Evaluate on test data

You now have everything needed to build and train real neural networks!
Slide 22: What's Next?
Preview of Lecture 5

• Problem: Our DNN treats images as flat vectors - it ignores spatial structure!
• Solution: Convolutional Neural Networks (CNNs)
• Topics:
o Convolution operation
o Pooling layers
o Building CNNs in PyTorch
o Transfer learning
• Reading: GBC, Chapter 9
Slide 23: References & Questions
References & Resources
1. Primary: GBC, Chapter 6.1, 6.3
2. PyTorch Tutorials: "Deep Learning with PyTorch: A 60 Minute Blitz"
3. Dataset: Fashion-MNIST documentation
4. Visualization: TensorBoard for tracking experiments
