Introduction to Deep Learning
Introduction
• Machine Learning enables systems (shallow or deep) to learn from
data and improve with experience, similar to humans. Raw data is
processed to extract useful information for decision-making.
• Types of Learning Approaches
1) Supervised
2) Unsupervised
3) Semi-Supervised Learning
4) Reinforcement Learning
Shallow Learning
• Shallow learning refers to machine learning models that use a
relatively simple architecture, usually consisting of only one or two
layers of feature transformation before making predictions.
• Input -> Feature Engineering -> Model -> Output.
• Deep Learning -> multi-layered models + automatic feature learning -> good for
complex, unstructured data.
• Difference: Shallow vs. Deep Learning
• Shallow Learning:
• Uses only 1–2 layers.
• Works for simple tasks.
• Example: Linear Regression, Decision Trees, SVMs.
• Deep Learning:
• Uses tens or hundreds of layers.
• Learns complex features automatically.
• Examples: image classification, speech recognition, large language models such as ChatGPT.
Deep Learning
• Deep Learning is a subfield of Machine Learning that uses
neural networks with multiple hidden layers
• Learns hierarchical representations of data
• Replaces manual feature engineering with
automatic learning
• Layer 1 (low-level): detects edges, gradients, textures.
• Layer 2 (mid-level): combines edges into motifs like corners, contours.
• Layer 3+ (high-level): captures object parts, semantics (face, car,
digit).
• Similar to human brain’s visual cortex organization.
Applications
• Computer Vision: face recognition, object detection, medical imaging.
• Natural Language Processing: translation, sentiment analysis,
chatbots.
• Speech Processing: voice recognition, speaker identification.
• Autonomous Systems: self-driving cars, robotics control.
Why Use Deep Learning?
•Automatic feature extraction → Learns useful representations directly from raw data.
•High representational power → Captures complex nonlinear relationships.
•Scales effectively → Performs better with large datasets and powerful compute resources.
•Transfer learning advantage → Reuse pre-trained models across tasks, reducing effort.
•Domain adaptability → Works across multiple applications with minimal customization.
•Superior performance → Achieves state-of-the-art results in vision, speech, and NLP.
•Outperforms traditional ML → Higher accuracy, better generalization, less manual engineering.
1.5 How Deep Learning Works
• Deep networks map input to target via a sequence of layered transformations, and these
layered transformations are learned by exposure to the training examples.
• Transformations implemented by a layer are parameterized by its weights.
• Learning can be defined as the process of finding the values of the weights of all layers in the
network in such a manner that input examples can be correctly mapped to their associated
targets.
• A deep learning network contains thousands of parameters, and finding the right values of these
parameters is not an easy task, particularly when the value of one parameter has an impact on
the value of another parameter.
• In order to train a deep network, one needs to measure how far the calculated output of the
network is from the desired value. This measure is obtained using a loss function, also called
an objective function.
• The objective of the training is to find the values for the weights that minimize the chosen error
function.
• The difference obtained is then used as a feedback signal to adjust the weights of the
network in a way that lowers the loss score for the current example. This adjustment
is done by the optimizer via the backpropagation algorithm, the central algorithm in deep
learning.
• The backpropagation algorithm involves assigning random values to the weight vectors
initially, so that the network just implements a series of random transformations.
Initially, the output obtained from the network can be far from what it should be, and
accordingly the loss score may be very high.
• With every example that is fed to the network, the weights are adjusted in a
direction that makes the loss score decrease.
• This process is repeated a number of times, until the weight values that minimize the
loss function are obtained. A network is said to have learned when the output values
obtained from the network are as close as they can be to the target values.
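The training loop described above can be sketched in a few lines of Python. This is a minimal illustration with a single-weight linear model and squared-error loss (the function names and data are invented for the example), not a full deep network:

```python
# Minimal illustration of the training loop described above:
# forward pass -> loss score -> gradient (feedback signal) -> weight update.
# One weight and a squared-error loss; a real deep net does this per layer.

def train(examples, lr=0.1, epochs=50):
    w = 0.5  # initial weight: the network starts as a random transformation
    for _ in range(epochs):
        for x, target in examples:
            pred = w * x                    # forward pass
            grad = 2 * (pred - target) * x  # gradient of (pred - target)**2 w.r.t. w
            w -= lr * grad                  # adjust the weight to lower the loss
    return w

# The data follows target = 3 * x, so the learned weight should approach 3.
data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]
w = train(data)
print(round(w, 3))  # prints 3.0
```

Each pass lowers the loss on the current example; repeating over many examples and epochs drives the weight toward the value that minimizes the loss, which is the sense in which the network "learns".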
1.6 Challenges in Deep Learning
• Vanishing/exploding gradients → solved with ReLU, residuals, batch
norm.
• Requires large datasets and high compute (GPUs/TPUs).
• Overfitting risks due to high parameter count.
• Optimization is difficult (complex loss surfaces).
• Interpretability: Deep models are often black-boxes.
• Deployment concerns: latency, memory, robustness.
Optimization for Training Deep Models
• 1. Optimization in Deep Learning
• Optimization means finding the best values for parameters of a model (like
weights of a neural network) so that the model performs well on a given task.
• Example: In PCA (Principal Component Analysis), optimization is used to find
directions (principal components) that maximize variance.
• In neural networks, optimization is about minimizing a cost function (also
called loss function) like cross-entropy or mean squared error.
• But compared to classical methods (like PCA), optimization in deep learning is
much harder and requires special techniques.
• This chapter focuses on one particular case of optimization: finding the
parameters θ of a neural network that significantly reduce a cost function
J(θ), which typically includes a performance measure evaluated on the entire
training set as well as additional regularization terms.
• Why Optimization for Deep Models is Difficult
• Training deep neural networks is not just a math exercise; it is computationally
very expensive and comes with challenges:
• High cost: Training can take days or months on clusters of GPUs/TPUs.
• Non-convex cost function: Unlike simple convex problems (like linear regression),
neural networks have many local minima and saddle points.
• Large parameter space: Modern networks have millions or billions of parameters.
• Vanishing/Exploding gradients: Gradients can become too small or too large,
making training unstable.
• Overfitting: Optimization must balance between fitting training data and
generalizing (hence regularization).
How Learning Differs from Pure Optimization
•Pure Optimization:
•Goal = minimize a function J(θ) directly, with no other concerns.
•Example: Finding the minimum of a quadratic function.
•Machine Learning Optimization:
•Goal = improve performance P (e.g., test accuracy, F1 score).
•But we can’t optimize P directly because it’s usually:
•Defined on the test set (which we don’t use during training).
•Sometimes intractable (too complex to compute exactly).
•So instead, we optimize a proxy cost function J(θ) (the training loss)
with the hope that reducing J(θ) also improves P.
• The Cost Function in Machine Learning
• The standard cost function used in training is written as:
• J(θ) = E_(x,y)∼p̂_data [L(f(x; θ), y)]
• where:
• θ= model parameters (weights, biases).
• f(x;θ) = model’s prediction for input x.
• y = true label (in supervised learning).
• L(⋅) = loss function (per-example error, e.g., cross-entropy, MSE).
• p̂_data = empirical distribution (the training dataset).
• So, training minimizes average loss over the training set.
• Ideal Case: True Data Distribution
• In reality, what we would like to minimize is:
• J*(θ) = E_(x,y)∼p_data [L(f(x; θ), y)]
• where:
• p_data = true data distribution (all possible examples, not just the training set).
• But we only have access to the finite training set, so we use the
approximation J(θ).
• This is why generalization matters: minimizing training loss does not
guarantee test performance.
Empirical Risk Minimization
• The Goal of Machine Learning
• The ultimate goal is to minimize the expected generalization risk, i.e.:
• R(θ) = E_(x,y)∼p_data [L(f(x; θ), y)], where:
• f(x; θ) = model’s prediction.
• L(⋅) = loss function (error per sample).
• p_data = true underlying distribution of the data.
• Problem: we don’t know p_data. We only have a finite dataset.
• From True Risk to Empirical Risk
• Since we can’t compute the expectation over the unknown p_data, we
approximate it with the empirical distribution p̂_data, based on the training
samples.
• This gives the empirical risk:
• R̂(θ) = (1/m) Σᵢ L(f(x^(i); θ), y^(i)), where:
• m = number of training examples.
• (x^(i), y^(i)) = training samples.
• This is simply the average training loss.
• So ERM = minimize training error, hoping it also reduces test error.
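The empirical risk defined above is just the average per-example loss over the training set. A tiny sketch (the toy model and data are invented for illustration):

```python
# Empirical risk R̂(θ) = (1/m) Σ L(f(x^(i); θ), y^(i)): the average
# per-example loss over the training set. Squared error plays the role of L.

def empirical_risk(predict, loss, data):
    return sum(loss(predict(x), y) for x, y in data) / len(data)

model = lambda x: 2 * x                    # f(x; θ) with θ fixed at 2
sq_loss = lambda pred, y: (pred - y) ** 2  # L

data = [(1, 2), (2, 5), (3, 6)]            # the middle point is "noisy"
r = empirical_risk(model, sq_loss, data)
print(r)  # (0 + 1 + 0) / 3
```

Minimizing this quantity over θ is exactly empirical risk minimization; the hope is that the minimizer also does well under the true distribution p_data.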
• Problems with ERM
• Even though ERM looks simple, in deep learning it’s not ideal:
• (a) Overfitting
• High-capacity models (deep nets) can just memorize the training set.
• Minimizing empirical risk too aggressively → poor generalization.
• (b) Non-differentiable loss functions
• Many useful loss functions (e.g., 0-1 loss: counts misclassifications) are not differentiable.
• Gradient descent requires smooth loss functions.
• 0-1 loss → derivative = 0 or undefined → not usable.
• (c) Gap between training and true objective
• What we really want to minimize = true risk over p_data.
• What we minimize = empirical risk over finite training samples.
• These can diverge, especially with limited data.
Surrogate Loss Functions and Early Stopping
• Definition
• A surrogate loss function is an alternative loss function that is easier to optimize
than the one we actually care about.
• It serves as a proxy (replacement) for the true loss.
• It is chosen because it is differentiable, smooth, and works well with gradient-
based methods.
• Example:
• The true goal in classification: minimize 0–1 loss (just count how many
predictions are wrong).
• Problem: 0–1 loss is non-differentiable and computationally intractable.
• Solution: Use cross-entropy (negative log-likelihood) as a surrogate.
• Why do we use surrogates?
• 0–1 loss difficulty:
1. 0–1 loss is a step function → gradient is 0 or undefined → gradient descent
can’t work.
• Surrogate advantages:
1. Smooth, differentiable → works well with gradient-based optimizers.
2. Often provides more useful information (like confidence of prediction).
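The contrast between the 0–1 loss and its cross-entropy surrogate can be seen numerically; a toy sketch for a binary prediction p = P(y = 1):

```python
import math

# 0–1 loss vs its surrogate (cross-entropy) for a binary prediction
# p = P(y = 1). The 0–1 loss is flat almost everywhere, so it gives
# gradient descent no signal; cross-entropy is smooth and still
# distinguishes barely-correct from confidently-correct answers.

def zero_one_loss(p, y):
    return 0.0 if (p >= 0.5) == (y == 1) else 1.0

def cross_entropy(p, y):
    return -math.log(p) if y == 1 else -math.log(1 - p)

for p in (0.6, 0.9, 0.99):
    print(zero_one_loss(p, 1), round(cross_entropy(p, 1), 3))
# The 0–1 loss is 0.0 for all three predictions, while cross-entropy
# keeps decreasing as the correct prediction becomes more confident.
```

This extra information (the "confidence" gradient) is exactly why the surrogate works with gradient-based optimizers while the 0–1 loss does not.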
• Early Stopping
We stop training before the loss reaches its minimum (typically by monitoring a validation set) in order to avoid overfitting.
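A minimal sketch of an early-stopping rule; the patience-based criterion and loss values here are illustrative assumptions, not prescribed by the text:

```python
# Early-stopping sketch: stop when the validation loss has not improved
# for `patience` consecutive epochs, instead of driving the training
# loss all the way down. The loss values below are invented for the demo.

def early_stop(val_losses, patience=2):
    best, best_epoch, waited = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch  # epoch whose weights we would keep

# Validation loss falls, then rises again as overfitting sets in.
print(early_stop([1.0, 0.7, 0.5, 0.4, 0.45, 0.5, 0.6]))  # prints 3
```

The returned epoch is where validation performance peaked; weights from that point generalize better than the fully-trained ones.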
• We want to compare pure optimization and learning optimization.
1) Pure Optimization Case:
Minimize a function J(θ) directly: find the value of θ that minimizes J(θ).
• 2) Learning Optimization Case:
Suppose the true relationship between input and output is y = f(x),
but we only observe noisy data: y = f(x) + ε.
Batch and Minibatch Algorithms
• One aspect of machine learning algorithms that separates them from
general optimization algorithms is that the objective function usually
decomposes as a sum over the training examples.
• In machine learning, the objective function (like likelihood or loss)
usually splits into a sum over all training examples.
• Maximum Likelihood Estimation(MLE) :
• In most machine learning problems, the objective function (cost or
likelihood) is the sum of contributions from each training example.
• Formula: θ_ML = argmax_θ Σᵢ log p_model(x^(i), y^(i); θ)
• Expectation Form of the Objective:
• Instead of summing explicitly, we view it as an expectation under the
empirical data distribution
• J(θ) = E_(x,y)∼p̂_data [log p_model(x, y; θ)]
• Gradient of the Objective:
• Optimization needs the gradient of the objective
• ∇_θ J(θ) = E_(x,y)∼p̂_data [∇_θ log p_model(x, y; θ)]
• Statistical Error of Sampling:
• The standard error (uncertainty) of estimating the gradient from n samples is: SE = σ/√n
• σ = true standard deviation.
• n = number of samples.
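The SE = σ/√n relationship can be checked empirically; a small simulation (parameters chosen only for the demo) showing why larger batches give diminishing returns in gradient accuracy:

```python
import random
import statistics

# SE = σ/√n in action: the spread of a sample-mean estimate shrinks
# only like 1/√n, so multiplying the sample size by 100 divides the
# error by just 10 (the diminishing returns of larger batches).

random.seed(0)
SIGMA = 1.0  # true standard deviation of the underlying distribution

def mean_estimate(n):
    return statistics.fmean(random.gauss(0.0, SIGMA) for _ in range(n))

spreads = {}
for n in (100, 10_000):
    # empirical spread of the estimator over 200 repeated draws
    spreads[n] = statistics.stdev(mean_estimate(n) for _ in range(200))
    print(n, round(spreads[n], 4))  # roughly SIGMA / sqrt(n)
```

With 100 samples the spread is about 0.1; with 10,000 samples it is about 0.01: a 100x increase in compute for a 10x gain in accuracy.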
• Redundancy in Dataset:
• If dataset has repeated or highly similar examples, computing gradient on all
is wasteful.
• Sampling avoids redundancy.
Types of Optimization Algorithms
1) Batch Gradient Descent (BGD):
Uses entire dataset for every update.
•Pros: Accurate gradient.
•Cons: Very slow for large datasets.
2) Stochastic Gradient Descent (SGD):
Uses one sample per update.
•Pros: Very fast.
•Cons: Gradient very noisy.
3) Minibatch Gradient Descent:
Uses small batch (e.g., 32–256).
•Pros: Balance between speed & stability.
•Cons: Still approximate, but works best in practice.
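The minibatch variant above can be sketched as follows; a toy one-weight model with invented data, not a production implementation:

```python
import random

# Minibatch gradient descent on a one-weight linear model: each update
# averages the gradient over a small random batch, trading off the
# accuracy of full-batch GD against the noise of one-sample SGD.

def minibatch_gd(data, batch_size=4, lr=0.05, steps=300, seed=0):
    rng = random.Random(seed)
    w = 0.0
    for _ in range(steps):
        batch = rng.sample(data, batch_size)
        # average gradient of the squared error over the minibatch
        grad = sum(2 * (w * x - y) * x for x, y in batch) / batch_size
        w -= lr * grad
    return w

# Data drawn from y = 3x plus small noise; w should end up near 3.
rng = random.Random(1)
data = [(x, 3 * x + rng.gauss(0, 0.1)) for x in (0.5, 1.0, 1.5, 2.0, 2.5, 3.0)]
w_hat = minibatch_gd(data)
print(round(w_hat, 2))
```

Setting `batch_size=len(data)` recovers batch GD and `batch_size=1` recovers SGD, which makes the trade-off between the three methods a single tunable knob.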
Factors Affecting Minibatch Size
• Gradient accuracy: Larger batches reduce noise but with diminishing
returns.
• Hardware: Very small batches underutilize CPU/GPU.
• Memory: Batch size limited by GPU/TPU memory.
• GPU efficiency: Powers of 2 (32, 64, 128, 256) often best.
• Generalization effect:
• Small batches add noise → acts as regularization.
• Sometimes improves test accuracy.
• Too small (batch size = 1) requires very small learning rate → inefficient.
| Method | Batch Size | Pros | Cons | Use Case |
|---|---|---|---|---|
| Batch GD | All data | Accurate gradient, stable | Very slow, memory-heavy | Small datasets |
| SGD | 1 | Fast updates, strong regularization | Noisy gradients, unstable | Online learning, streaming data |
| Minibatch GD | 32–256 | Efficient on GPUs, balance of speed & accuracy | Gradient still approximate | Standard choice in deep learning |
Sensitivity of Algorithms
• First-order methods (e.g., SGD): only need the gradient g.
• Robust, work fine with small batches (~100).
• Second-order methods (e.g., Newton’s method): use the Hessian H.
• Update rule: θ ← θ − H⁻¹g.
• Require huge batches (~10,000), otherwise errors in the gradient
estimate are amplified by multiplication with H⁻¹.
Importance of Random Sampling
• If minibatches are not random, gradients may be biased.
• Example: If dataset grouped by patient → one minibatch may contain only one
patient’s data.
• Solution: Shuffle dataset before training.
Large datasets: Even shuffling once is enough; true randomness not always needed.
Parallelization
• Different minibatches can be computed in parallel (asynchronous/distributed
training).
• Crucial for very large-scale learning.
Minibatch SGD and Generalization Error
• In theory, SGD with minibatches follows the gradient of true generalization error
(how well model performs on unseen data).
• This is true only if no examples are repeated (like in online learning).
• But in practice, we reuse the dataset over multiple epochs → slight bias, but the reduction
in training error outweighs it.
Online Learning
• Special case: Data arrives as a stream (never repeats).
• Learner updates model in real-time (like humans learning continuously).
• 👉 Use case: Real-time applications (stock prices, user activity streams).
• Extremely Large Datasets
• When datasets are huge (billions of examples):
• Sometimes only one pass (or less) through data is possible.
• Overfitting not a problem → main issue is computation efficiency &
underfitting.
• From cost function → to expectation → to gradient → to computation
challenge → to sampling idea → to batch types → to minibatch
considerations → to randomness → to parallelization & generalization → to
very large datasets.
Challenges in Neural Network Optimization
1) Ill-Conditioning
• What it means:
In optimization, the Hessian matrix (matrix of 2nd derivatives) can be ill-conditioned –
meaning some directions have very steep curvature and some are very flat.
Think of a long, narrow valley: moving straight down is easy, but moving across is very slow.
• Effect on training:
• SGD may get “stuck”: even small steps make the cost increase.
• The learning rate must be very small, slowing down learning.
• Gradient norms (‖g‖²) don’t shrink much, while curvature term (gᵀHg) grows → unstable training.
• Key idea:
Even with strong gradients, training becomes slow because the curvature forces small steps.
• Why Newton’s method fails here:
Newton’s method works well in convex problems, but in neural nets (non-convex, huge
parameter space) it needs major modification before being useful.
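The "long, narrow valley" picture above can be made concrete with gradient descent on an ill-conditioned quadratic; the function and learning rates are illustrative choices:

```python
# Ill-conditioning demo: gradient descent on f(x, y) = 0.5*(x**2 + 100*y**2).
# The y-direction is 100x steeper, so the largest stable learning rate is
# capped by it (lr < 2/100), and the flat x-direction then crawls along:
# exactly the long, narrow valley described above.

def gd_steps_to_converge(lr, tol=1e-3, max_steps=100_000):
    x, y = 1.0, 1.0
    for step in range(max_steps):
        if abs(x) < tol and abs(y) < tol:
            return step
        x -= lr * x        # gradient of f w.r.t. x is x
        y -= lr * 100 * y  # gradient of f w.r.t. y is 100*y
    return max_steps  # never converged (diverged or too slow)

print(gd_steps_to_converge(0.01))  # hundreds of steps for the flat direction
print(gd_steps_to_converge(0.03))  # diverges: returns max_steps
```

The condition number of the Hessian (here 100) directly controls this slowdown, which is why preconditioning and adaptive optimizers help.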
• 8.2.2 Local Minima
• Convex case:
Any local minimum is also the global minimum → no problem.
• Neural networks (non-convex):
• Many local minima exist because of weight space symmetry:
• E.g., swapping two hidden units with their input/output weights doesn’t change the function → many equivalent
minima.
• Scaling symmetries in ReLU/Maxout also create infinite equivalent minima.
• Are local minima dangerous?
• Early belief: Yes, they trap optimization.
• Now: Not really – most local minima in large networks have low cost and are “good enough.”
• What matters is reaching a point with low training loss, not the true global minimum.
• Test:
If gradient norm doesn’t shrink to ~0, the problem is not due to local minima.
• 8.2.3 Plateaus, Saddle Points, and Flat Regions
• Saddle point = a point where gradient = 0, but:
• In some directions cost increases (like a hill).
• In other directions cost decreases (like a valley).
• Example: a horse saddle – up in one direction, down in another.
• High-dimensional fact:
• In low dimensions → local minima common.
• In high dimensions → saddle points much more common than local minima.
• The ratio grows exponentially with dimension.
• Implication:
• Optimization often slows near saddle points because gradients vanish.
• SGD can usually escape, but second-order Newton’s method may get stuck (since it looks for zero gradient points).
• Visuals:
• Training often shows “flat valleys” (plateaus) instead of deep pits.
• Much time is wasted crossing wide flat areas.
• Cliffs and Exploding Gradients
• What happens:
In deep/recurrent nets, multiplying many large weights creates very steep
surfaces (“cliffs”).
• Small movement in parameters → huge jump in cost.
• Gradient step can overshoot → undoing previous learning.
• Solution:
• Gradient clipping: Limit the maximum step size.
• Keeps learning stable near cliffs.
• Most common in:
Recurrent Neural Networks (RNNs) due to repeated multiplications across
many time steps.
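Gradient clipping by norm, as mentioned above, can be sketched in a few lines (the threshold value is an illustrative choice):

```python
import math

# Gradient clipping by norm: if ||g|| exceeds max_norm, rescale g so its
# norm equals max_norm. The direction of the step is preserved; only its
# size is capped, which prevents the overshoot near cliffs described above.

def clip_by_norm(grad, max_norm=1.0):
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm:
        return list(grad)
    scale = max_norm / norm
    return [g * scale for g in grad]

print([round(g, 3) for g in clip_by_norm([3.0, 4.0])])  # [0.6, 0.8]
print(clip_by_norm([0.1, 0.2]))                         # unchanged
```

Because only the magnitude is rescaled, the update still moves downhill in the same direction; it just cannot jump off the cliff.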
• 8.2.5 Long-Term Dependencies
• Problem:
In very deep nets or RNNs, gradients pass through long computational graphs.
• Equivalent to multiplying by Wᵗ (matrix to the power t).
• Eigenvalues > 1 → exploding gradients.
• Eigenvalues < 1 → vanishing gradients.
• Effect:
• Vanishing → network forgets long-term info (can’t learn dependencies across long sequences).
• Exploding → unstable updates, cost jumps.
• Examples:
• RNN trying to learn dependencies across 100 time steps.
• Only recent steps affect learning; distant steps vanish.
• Solutions (not in your section but practical):
• LSTM/GRU (special RNN architectures).
• Proper initialization.
• Gradient clipping.
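The Wᵗ effect above can be seen with a scalar stand-in for the eigenvalues of W; a toy demonstration, not an actual RNN:

```python
# Scalar analogue of multiplying by W^t in an unrolled RNN: repeatedly
# multiplying by the same weight makes the signal explode if the
# eigenvalue is above 1 and vanish if it is below 1.

def after_t_steps(w, t=100, x=1.0):
    for _ in range(t):
        x *= w
    return x

print(after_t_steps(1.1))  # ~1.4e4: exploding gradient
print(after_t_steps(0.9))  # ~2.7e-5: vanishing gradient
```

Even eigenvalues close to 1 diverge from each other exponentially in t, which is why gradients across 100 time steps either swamp or lose the contribution of distant inputs.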
• 8.2.6 Inexact Gradients
• Ideal theory: Optimization assumes exact gradient/Hessian.
• Reality in deep learning:
• Gradients are estimated using mini-batches → noisy, approximate.
• Some models (e.g., Boltzmann machines) have intractable gradients, so
approximations (like Contrastive Divergence) are used.
• Effect:
• Learning becomes noisy, not perfectly downhill.
• Still works if estimates are unbiased on average.
• Sometimes surrogate loss functions are used that are easier to optimize.
• 8.2.7 Poor Correspondence Between Local and Global Structure
• Main idea:
Even if you can move locally in the best direction, it may not lead toward the global solution.
• Example:
• Imagine being on the wrong side of a mountain.
• Gradient descent keeps pushing you downhill locally, but you can’t cross the mountain to reach the
global valley.
• You must go around, taking a very long trajectory.
• Observation:
• Neural networks often don’t converge to a critical point at all (gradient never shrinks fully).
• Training paths are often long arcs around obstacles.
• Research direction:
• Instead of new algorithms, much focus is on finding good initializations so that local descent naturally
leads to good solutions.
• 8.2.8 Theoretical Limits of Optimization
• What theory says:
• Some results prove neural net training is NP-hard (Judd, 1989; Blum & Rivest, 1992).
• “No free lunch theorem”: No optimization algorithm is best for all problems (Wolpert &
Macready, 1997).
• But in practice:
• These worst-case results don’t stop us from training useful networks.
• Why?
• We don’t need the exact minimum, only a low-enough value for good generalization.
• Larger networks often make finding a good solution easier (more parameter settings give acceptable results).
• Most practical problems are not worst-case.
• Key takeaway:
Theory gives pessimistic limits, but in real life, neural nets are trainable and effective.
| Challenge | Cause | Effect on Training | Typical Solution |
|---|---|---|---|
| Ill-conditioning | Hessian has very different curvatures | Slow learning, tiny steps | Preconditioning, adaptive optimizers |
| Local minima | Symmetries & non-convexity | Many minima exist, but usually low-cost | Large nets, good init, test gradient norm |
| Saddle points & plateaus | Zero-gradient regions in high-dim | Slowdown, stuck | SGD noise helps escape, avoid Newton |
| Cliffs & exploding grads | Multiplying large weights | Overshooting, unstable | Gradient clipping |
| Long-term dependencies | Repeated multiplication (RNNs) | Vanishing/exploding gradients | LSTM/GRU, clipping |
| Inexact gradients | Mini-batch estimates or intractable loss | Noisy updates | Larger batches, surrogate loss |
| Poor global vs local structure | Local steps don’t lead globally | Long, inefficient trajectories | Good initialization |
| Theoretical limits | NP-hardness, no free lunch | Exact minimization impossible | Approximate “good enough” solutions |