Deep Learning: Linear Regression & Optimization

Uploaded by Nalain Abbas
Topics covered

  • Sigmoid Function,
  • Contour Plot,
  • Data Visualization,
  • Predictive Modeling,
  • Weight Adjustment,
  • Statistical Models,
  • Linear Regression,
  • Training Data,
  • Training Epochs,
  • Function Parameters

Deep Learning

Lecture-2
Dr. Abdul Jaleel
Associate Professor
Machine learning: A new Programming Paradigm
Linear Regression: y = mx + c

But how do we determine the exact values of m and c?

Linear Regression: y = mx + c

There are too many possible lines. Which one is best suited?
Mean Squared Error

Residuals, error, or loss feed into a cost function, which may be used to compare different hypothetical lines.

There are lots of regression lines, each having some cost. Which one is the best? The one with the minimum MSE cost.
Loss Functions

 Squared Loss: Loss = (y_i − ŷ_i)²

 Mean Squared Error (MSE): MSE = (1/n) Σ (y_i − ŷ_i)²

 Absolute Loss: Loss = |y_i − ŷ_i|

 Mean Absolute Error (MAE): MAE = (1/n) Σ |y_i − ŷ_i|
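These four losses can be sketched in plain Python (a minimal sketch; `y_true` and `y_pred` are assumed to be equal-length lists, and the function names are illustrative):

```python
def squared_loss(y_true, y_pred):
    # Per-sample squared residuals (y_i - yhat_i)^2
    return [(yt - yp) ** 2 for yt, yp in zip(y_true, y_pred)]

def mse(y_true, y_pred):
    # Mean Squared Error: average of the squared residuals
    return sum(squared_loss(y_true, y_pred)) / len(y_true)

def absolute_loss(y_true, y_pred):
    # Per-sample absolute residuals |y_i - yhat_i|
    return [abs(yt - yp) for yt, yp in zip(y_true, y_pred)]

def mae(y_true, y_pred):
    # Mean Absolute Error: average of the absolute residuals
    return sum(absolute_loss(y_true, y_pred)) / len(y_true)
```

MSE squares each residual before averaging, so large errors dominate; MAE treats all residuals linearly.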
Convex Optimization and Gradient Descent Approach

A real-valued function defined on an n-dimensional interval is called convex if the line segment between any two points on its graph lies above or on the graph.
Convex Optimization for a Set of 21 Data Points

Regression line: y_p = wx + 0
Convex Optimization

Loss function plot of the 21 data points for

W_j = {−1, −0.5, 0, 0.5, 1, 1.5, 2, 2.5, …, 5}

Loss = (1/n) Σ (y_i − w_j x_i)²
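That weight sweep can be reproduced as a sketch, assuming synthetic data generated from y = 2x in place of the lecture's actual 21 points:

```python
# Evaluate MSE for each candidate weight w_j on synthetic data from
# y = 2x (illustrative; the lecture's 21 real points are not reproduced).
xs = [i * 0.1 for i in range(21)]        # 21 data points
ys = [2 * x for x in xs]                 # true relationship y = 2x

def mse_for_weight(w):
    # Loss for the hypothesis y_p = w * x (no bias term yet)
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys)) / len(xs)

weights = [-1 + 0.5 * j for j in range(13)]   # {-1, -0.5, ..., 5}
losses = [mse_for_weight(w) for w in weights]
best = weights[losses.index(min(losses))]
print(best)   # → 2.0, the weight with minimum loss
```

Plotting `losses` against `weights` gives the convex bowl shown on the slide.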
Convex Optimization

From the range of weight values plotted in the left-side graph, let us estimate the loss for weight value zero.
Convex Optimization

y_p = 0x + 0

The Mean Squared Error loss for weight value zero is calculated from the differences (distances) between the red and green lines plotted on the right side.
Convex Optimization

y_p = 0.5x + 0
Convex Optimization

y_p = 1x + 0

Next, let us estimate the loss for weight value one.
Convex Optimization

y_p = 1.5x + 0

Weight value 1.5 decreases the MSE.
Convex Optimization

y_p = 2x + 0

For weight value 2, the predicted line best fits the data points.
Convex Optimization with Bias

y_p = wx + b
Convex Optimization with Bias

Loss = (1/n) Σ (y_i − (w x_i + b))²
Convex Optimization with Bias

Loss = (1/n) Σ (y_i − (w x_i + b))²

The loss function's surface plot is converted into a contour plot.
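The surface behind that contour plot can be sketched numerically. Assuming illustrative data drawn from y = 2x + 0 (not the slide's actual points), a coarse grid over (w, b) already locates the minimum:

```python
# Sketch of the loss surface over (w, b); the contour plot on the slide
# is the level sets of this same surface (data here is illustrative).
xs = [i * 0.1 for i in range(21)]
ys = [2 * x + 0 for x in xs]

def loss(w, b):
    # MSE for the hypothesis y_p = w*x + b
    return sum((y - (w * x + b)) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Evaluate a coarse (w, b) grid and pick the lowest point on the surface
grid = [(w * 0.5, b * 0.5) for w in range(-2, 11) for b in range(-4, 5)]
best_w, best_b = min(grid, key=lambda p: loss(*p))
print(best_w, best_b)   # → 2.0 0.0
```

Passing the same `loss` values for a dense grid to a plotting library would reproduce the surface and contour views.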
Convex Optimization with Bias

y_p = wx + b
Convex Optimization with Bias

Trial lines of the form y_p = wx + b:

y_p = 0x − 1
y_p = 1x − 1
y_p = 2x − 1
y_p = 2x + 0
y_p = 2x + 1
Gradient Descent Approach

Slope and Derivative

Result: the derivative of x² is 2x.

Derivative and Partial Derivative
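Putting the partial derivatives to work: a minimal gradient-descent sketch for y_p = wx + b, assuming a toy dataset drawn from y = 2x + 1 and an illustrative learning rate:

```python
# Gradient descent on MSE for y_p = w*x + b (illustrative data and
# hyperparameters; not the lecture's dataset).
xs = [i * 0.1 for i in range(21)]
ys = [2 * x + 1 for x in xs]   # target line y = 2x + 1

w, b, lr = 0.0, 0.0, 0.1
n = len(xs)
for epoch in range(5000):
    # Partial derivatives of MSE with respect to w and b
    dw = -(2 / n) * sum(x * (y - (w * x + b)) for x, y in zip(xs, ys))
    db = -(2 / n) * sum((y - (w * x + b)) for x, y in zip(xs, ys))
    # Step opposite the gradient
    w -= lr * dw
    b -= lr * db

print(round(w, 2), round(b, 2))   # → 2.0 1.0
```

Each epoch moves (w, b) a small step opposite the gradient, so the loss shrinks until the parameters converge on the true line.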
Deep Learning
Lecture-3
Dr. Abdul Jaleel
Associate Professor
H(x) = Pred_y

Gradient Descent

Let us apply gradient descent in coefficient learning to find the values of a function's parameters that minimize the cost function as far as possible.

We have almost reached the best-fit line.
- In neural networks, we apply logistic regression on top of the gradient-descent-learned parameters of the linear regression best-fit line.

- The Sigmoid function works as an activation function for the neuron to classify the outcome.
Why we need a Sigmoid / Logit function instead of a Step Function for Neuron Activation
The Linear Equation

Non-linear Activation Function
How it works for Row 1

Predicted and actual outcome for Row 1: error calculated with the LogLoss function instead of MSE.

Predicted and actual outcome for Row 2: error calculated with the LogLoss function.

Predicted and actual outcome for Row 13: error calculated with the LogLoss function.
Loss is high for W1 = 1, W2 = 1; we need to apply gradient descent.
Implementation of activation functions in Python
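A minimal sketch of the two activations contrasted above, the hard step and the smooth sigmoid:

```python
import math

def step(z):
    # Step activation: abrupt 0/1 output, zero gradient almost everywhere
    return 1.0 if z >= 0 else 0.0

def sigmoid(z):
    # Sigmoid activation: smooth output in (0, 1), differentiable everywhere
    return 1.0 / (1.0 + math.exp(-z))

print(step(-2), step(2), sigmoid(0.0))   # → 0.0 1.0 0.5
```

The sigmoid's smooth, well-defined derivative is what makes it usable with gradient descent, unlike the step function.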
Implementation of loss functions in Python
Now we start implementing gradient descent in plain Python. Again, the goal is to come up with the same w1, w2, and bias that the Keras model calculated. We want to show how Keras/TensorFlow would have computed these values internally using gradient descent.

First, write a couple of helper routines, such as sigmoid and log_loss.
Now it is time to implement our own custom neural network class.
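A minimal sketch of such a class: one sigmoid neuron with two input features, trained by batch gradient descent on log loss (the data and hyperparameters here are illustrative, not the lecture's Keras example):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class MyNN:
    """A single sigmoid neuron: y_p = sigmoid(w1*x1 + w2*x2 + bias)."""

    def __init__(self):
        self.w1, self.w2, self.bias = 1.0, 1.0, 0.0

    def fit(self, X1, X2, y, epochs=10000, rate=1.0):
        n = len(y)
        for _ in range(epochs):
            preds = [sigmoid(self.w1 * a + self.w2 * b + self.bias)
                     for a, b in zip(X1, X2)]
            # Gradients of log loss w.r.t. w1, w2, and bias
            dw1 = sum(a * (p - t) for a, p, t in zip(X1, preds, y)) / n
            dw2 = sum(b * (p - t) for b, p, t in zip(X2, preds, y)) / n
            db = sum(p - t for p, t in zip(preds, y)) / n
            # Step opposite the gradient
            self.w1 -= rate * dw1
            self.w2 -= rate * dw2
            self.bias -= rate * db

    def predict(self, x1, x2):
        return sigmoid(self.w1 * x1 + self.w2 * x2 + self.bias)

# Illustrative training run on a tiny OR-style dataset
model = MyNN()
model.fit([0, 0, 1, 1], [0, 1, 0, 1], [0, 1, 1, 1])
```

After training, `model.predict(0, 0)` should be near 0 and `model.predict(1, 1)` near 1; a Keras single-neuron model fitted on the same data should learn similar w1, w2, and bias values.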
This shows that, in the end, we were able to come up with the same values of w1, w2, and bias using a plain-Python implementation of gradient descent.

You can compare predictions from our own custom model and the TensorFlow model. You will notice that the predictions are almost the same.

LINKS

 [Link]ear-regression-with-mathematical-insights/

 [Link]

 [Link]

 https://[Link]/why-not-mse-as-a-loss-function-for-logistic-regression-589816b5e03c

 [Link]c-regression-logarithmic-expr#:~:text=Mean%20Squared%20Error%2C%20commonly%20used,function%20is%20however%20always%20convex

Common questions

Gradient descent is a fundamental optimization algorithm used to find the optimal parameter values of a function by iteratively moving towards the minimum of a cost function. This is achieved by computing the gradient, or derivative, of the cost function with respect to the parameters and updating the parameters in the opposite direction to the gradient. In neural networks, this approach is crucial for training the network as it systematically reduces the error by fine-tuning the weights and biases based on the computed gradients. By continuously applying these updates, the network's parameters converge towards values that minimize the loss, thus ensuring an optimal model fit. This iterative process allows deep learning models to adjust during training, ultimately improving their predictive accuracy.

The Sigmoid function is preferred over the Step function in neural networks for neuron activation primarily due to its non-linear property. While the Step function changes output abruptly between 0 and 1, the Sigmoid function provides a smooth gradient which helps in better gradient-based learning via backpropagation. The continuous nature of the Sigmoid function allows the network to adjust weights gently and update them incrementally, aiding convergence and learning. Also, the derivative of the Sigmoid function is well-defined everywhere, offering a useful gradient for optimization methods like gradient descent, which a Step function lacks due to its discontinuous nature.

Using the Step function instead of the Sigmoid function for neuron activation in neural networks presents several challenges. The primary issue arises from the Step function's non-differentiable and abrupt output transition between classes, making it unsuitable for gradient-based optimization methods such as backpropagation. This discontinuity prevents the effective computation and propagation of gradients through the network, stymieing learning and convergence. Moreover, the lack of output sensitivity to input variations essentially nullifies the potential for fine-tuning during training, as neuron outputs either stay the same or change instantaneously without intermediate values. Consequently, replacing the Sigmoid function with a Step function can severely limit the network's ability to learn complex patterns and adjust to errors incrementally during training.

The gradient descent approach assists in reaching the best-fit line in linear regression by iteratively updating the model's parameters to minimize the cost function, typically the Mean Squared Error (MSE). This is achieved by calculating the gradient of the cost function with respect to the parameters, directing how adjustments should be made to decrease error. The effectiveness of this method in linear regression stems from its ability to handle the optimization of continuous cost landscapes effectively, ensuring convergence by following the steepest descent path. Similarly, gradient descent is applied in other machine learning models where it optimizes complex parameters subject to a cost function. This technique is especially indispensable in deep learning for training multi-layered networks by allowing backpropagation to adjust weights and biases systematically across layers, ultimately enhancing the model's predictive performance.

A neural network utilizes gradient descent internally by iteratively tuning its weights and biases to minimize a defined cost function, achieving a configuration that parallels implementations like those in Keras or TensorFlow. When training a neural network, the algorithm calculates the gradient of the cost function in relation to each weight, indicating the direction in which the weights should be adjusted to reduce error. By applying these adjustments step-by-step, the network's parameters get fine-tuned towards minimizing the loss. In implementations such as Keras or TensorFlow, this process is streamlined through automatic differentiation, which efficiently computes gradients during model training. By coding gradient descent in Python, similar outcomes can be achieved, highlighting that the choice of framework, while offering ease and optimizations, primarily facilitates the gradient computations that a custom neural network can replicate through manual gradient descent steps.

Convex optimization plays a crucial role in determining the best-fit line in linear regression by minimizing the cost function, which is often represented as Mean Squared Error (MSE). In this context, a real-valued cost function defined over an n-dimensional interval is convex if the line segment between any two points on the graph of the function lies above or on the graph. This property is utilized to find the optimal values of the weights by comparing costs for different hypothetical regression lines. The lowest cost, or MSE, indicates the best-fit line for the data points. An iterative approach, such as adjusting the weights and calculating the corresponding losses, helps identify the optimal regression line. For example, varying the weight value, as shown by a loss function plot for different weight values like 0, 0.5, 1.5, 2, etc., helps evaluate how well the line fits the data, ultimately selecting the one with minimal MSE, effectively optimizing the model's parameters.

A contour plot helps visualize the optimization of a loss function during linear regression model training by representing the cost function's surface and illustrating how adjustments to parameters move toward a minimization point. Each contour line shows points where the function takes on a constant value; hence, adjacent contours differ by changes in the cost function value. As gradient descent progresses, moving along a path on the contour plot, the plotted path highlights how weight and bias adjustments affect the loss reduction. This visualization offers an intuitive understanding of how the optimization process approaches the optimal solution via systematically descending to regions of lower cost, aiding in tracking the convergence and diagnosing potential issues during training.
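That descent path across contour lines can be sketched as follows, assuming a small illustrative dataset drawn from y = 2x + 1:

```python
# Track (w, b) across gradient-descent steps and confirm the loss only
# decreases, i.e. each step lands on a lower contour than the last.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]          # data on the line y = 2x + 1

def loss(w, b):
    # MSE for the hypothesis y_p = w*x + b
    return sum((y - (w * x + b)) ** 2 for x, y in zip(xs, ys)) / len(xs)

w, b, lr = 0.0, 0.0, 0.05
path = [(w, b, loss(w, b))]
for _ in range(200):
    n = len(xs)
    dw = -(2 / n) * sum(x * (y - (w * x + b)) for x, y in zip(xs, ys))
    db = -(2 / n) * sum(y - (w * x + b) for x, y in zip(xs, ys))
    w, b = w - lr * dw, b - lr * db
    path.append((w, b, loss(w, b)))

# The recorded losses are non-increasing: the path descends the contours
assert all(q[2] <= p[2] for p, q in zip(path, path[1:]))
```

Plotting the (w, b) pairs in `path` on top of the loss function's contour plot would show the trajectory crossing successively lower contour lines toward the minimum.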

Log Loss is crucial in logistic regression as it aligns with the nature of classification problems, providing a measure of how far the predicted probabilities deviate from the actual labels. Unlike MSE, which is suited for regression tasks and assumes a continuous output, Log Loss evaluates the accuracy of probability predictions in binary classification. It measures the uncertainty of predictions, penalizing incorrect ones far more heavily than correct predictions. This allows for more nuanced updates during model training, effectively adjusting weights based on prediction confidence. The Log Loss approach ensures that the cost function used is convex, facilitating convergence during optimization and making it a preferred choice over MSE, which may lead to non-convex problems in classification scenarios.

Mean Squared Error (MSE) and Mean Absolute Error (MAE) are both used as loss functions in linear regression but have distinct characteristics and impacts on the model. MSE squares the residuals, which emphasizes larger errors more than smaller ones, making it sensitive to outliers. It provides a solution that minimizes the variance of the error distribution, often preferred when outliers are less of a concern, aiming to reduce large prediction errors more aggressively. Conversely, MAE computes the absolute differences between actual and predicted values, offering a linear penalization of errors and more robust performance in the presence of outliers due to its lower sensitivity. The choice between the two depends on the specific model goals and data characteristics; MSE is useful when large errors are especially problematic or when the model requires a differentiable function for optimization via gradient methods, while MAE is preferred for more uniform treatment of all errors.
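The outlier sensitivity described above can be illustrated with a small sketch (the numbers are made up):

```python
# One corrupted data point moves MSE far more than MAE.
def mse(y_true, y_pred):
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)

clean = [1.0, 2.0, 3.0, 4.0]
preds = [1.1, 2.1, 2.9, 4.1]
dirty = [1.0, 2.0, 3.0, 14.0]    # last point is an outlier

print(mse(clean, preds), mae(clean, preds))
print(mse(dirty, preds), mae(dirty, preds))   # MSE grows quadratically, MAE linearly
```

The outlier inflates MSE by a factor of thousands here while MAE only grows by a modest factor, which is exactly the robustness trade-off described above.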

The convexity of a cost function is pivotal in the effectiveness of optimization methods like gradient descent because it ensures the function has a single global minimum and no local minima. This property guarantees that the iterative updates carried out by gradient descent will reliably converge to the optimal solution if the learning rate is appropriately set. A convex cost function allows for more straightforward optimization because the surface defined by the function is smooth and predictable, enabling consistent and efficient parameter updates. This contrasts with non-convex functions that may lead to suboptimal local minima, thereby complicating the training process, making convergence more difficult, and heightening the risk of getting stuck in less ideal parameter spaces.
