Linear Regression Equation Explained
Linear regression is a general form of predictive analysis and one of the most widely used statistical techniques. It measures the relationship between one or more predictor variables and one outcome variable; that is, regression analysis examines the relationship between a dependent variable and independent variables.
UQ. What do you mean by linear regression? Which applications are best modeled by linear regression?
(SPPU - Q. 5(a), March 19, Q. 5(b), March 19, Q. 1(b), Nov./Dec. 17, 4 Marks)
• A linear regression model illustrates the relationship between two variables or factors; regression analysis is generally used to show the correlation between two variables.
• The predicted variable in the linear regression equation is known as the dependent variable; let us call it Y. The variables that predict the dependent variable are known as independent variables; let us call them X.
• Y is called the dependent variable because the prediction of Y depends on the other variables (X).
• In simple linear regression analysis, each observation has two variables: the independent variable and the dependent variable. Multiple regression analysis uses two or more independent variables and examines how they relate to the dependent variable. The equation that defines how Y is related to X is called the regression model.
• The coefficient indicates that a one-unit change in the independent variable does not necessarily produce an equal change in Y.
• Now let us look at how the line is fit: we want to put a line through our data that fits the data best. A regression line shows a positive linear relationship (the line slopes up), a negative linear relationship (the line slopes down), or a total absence of relationship (a flat line).
• The point where the line crosses the vertical axis is called the constant (intercept).
• The steeper the slope, the greater the increase in salary per year of experience.
• For example, for 1 more year of experience, salary (Y) might be expected to increase by $10,000; with a steeper slope, it may instead increase by, say, $15,000.
• When we look at the graph, vertical lines can be drawn from the regression line to the actual observations. The actual observations are shown as dots, while the line displays the model's observations (the predictions).
Fig. 3.1.5
Fig. 3.1.6
• The vertical lines show the difference between what an employee actually earns and what the model predicts the employee earns. To find the best line, we look for the minimum sum of squares: all the squared differences are summed, and the line that minimizes this sum is chosen. This is known as the Ordinary Least Squares (OLS) method.
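The OLS idea can be sketched numerically: compute the sum of squared vertical distances for any candidate line, and prefer the line with the smallest sum. A minimal sketch (the data and the fitted line 0.425X + 0.785 are taken from Table 3.1.1 and Fig. 3.1.8 later in this section; the alternative line is an arbitrary comparison):

```python
# Sum of squared errors (SSE) of a candidate line y = b*x + a over the data.
def sse(xs, ys, b, a):
    return sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.00, 2.00, 1.30, 3.75, 2.25]

# The least-squares line has an SSE no larger than any other line's.
print(sse(xs, ys, 0.425, 0.785) <= sse(xs, ys, 0.5, 0.5))  # True
```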
• Regression is a parametric technique, which means it makes assumptions. Let's have a glance at the assumptions it makes:
1. A linear and additive relationship is present between the dependent variable (DV) and the independent variables (IV). Linear means that the change in DV for a 1-unit change in IV is constant. Additive means that the effect of X on Y is independent of the other variables.
2. There must be no correlation between the independent variables. Correlation among independent variables causes multicollinearity; that is, it becomes difficult for the model to determine the actual effect of each IV on the DV.
3. The error terms should have constant variance. When this is violated, heteroskedasticity arises.
4. The error term ∈t should not determine the error term ∈t+1; i.e., the error terms must be uncorrelated.
5. Correlation in error terms is called autocorrelation. Its presence strongly affects the regression coefficients and standard-error values, as these are based on the assumption of uncorrelated error terms.
6. The dependent variable and the error terms should be normally distributed.
• These assumptions make regression relatively restrictive. By restrictive it is meant that the performance of a regression model depends on the fulfilment of these assumptions.
• Simple linear regression has a single input, and statistics can be used to estimate the coefficients.
• Statistical properties of the data need to be calculated: means, standard deviations, correlations, and covariance. All the data must be available to traverse and compute these statistics. The hypothesis function is given by
y = β0 + β1 x
• In simple linear regression, the topic of this section, the predictions of Y when plotted as a
function of X form a straight line.
• The example data in Table 3.1.1 is plotted in Fig. 3.1.7. There is a positive relationship between X and Y: if Y is predicted from X, then the higher the value of X, the higher the prediction of Y.
X     Y
1.00  1.00
2.00  2.00
3.00  1.30
4.00  3.75
5.00  2.25
• Linear regression finds the best-fitting straight line through the points; that line is known as the regression line. In Fig. 3.1.8, the black diagonal line is the regression line, which consists of the predicted score on Y for each possible value of X. Errors of prediction are represented by the vertical lines from each point to the regression line. As shown in Fig. 3.1.8, the red point is near the regression line, so it has a small error of prediction, whereas the yellow point is much higher above the line, so it has a larger error of prediction.
• The black line comprises the predictions, the points depict the actual data, and the vertical lines between the points and the black line denote the prediction errors.
• The error of prediction for a point is the value of the point minus the predicted value (the value on the line).
• Table 3.1.2 shows the predicted values (Y′) and the errors of prediction (Y − Y′). For example, the first point has Y = 1.00 and a predicted Y (called Y′) of 1.21, so its error of prediction is −0.21.
X   Y   Y′   Y − Y′   (Y − Y′)²
• We have not yet defined the term "best-fitting line." The most widely used criterion is the line that minimizes the sum of the squared errors of prediction; this is the criterion used to find the line in Fig. 3.1.8.
• The squared errors of prediction are given in the last column of Table 3.1.2. Compared to any other line, the regression line has the lowest sum of squared errors of prediction.
The regression line can be written as
Y′ = bX + A
where Y′ is the predicted value, b is the slope of the line, and A is the Y intercept.
The equation for the line in Fig. 3.1.8 is
Y′ = 0.425X + 0.785
For X = 1, Y′ = 0.425(1) + 0.785 = 1.21
For X = 2, Y′ = 0.425(2) + 0.785 = 1.635
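The predicted values and errors of prediction in Table 3.1.2 can be reproduced directly from this equation:

```python
# Y' = 0.425*X + 0.785 applied to the Table 3.1.1 data; the error of
# prediction is Y - Y' (e.g. 1.00 - 1.21 = -0.21 for the first point).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.00, 2.00, 1.30, 3.75, 2.25]

for x, y in zip(xs, ys):
    y_pred = 0.425 * x + 0.785
    print(x, y, round(y_pred, 3), round(y - y_pred, 3))
```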
• Multiple linear regression models the relationship between two or more explanatory variables and a response variable by fitting a linear equation to observed data.
• Each value of the independent variables x is associated with a value of the dependent variable y. The population regression line for p explanatory variables x1, x2, ..., xp is defined to be
μy = β0 + β1 x1 + β2 x2 + ... + βp xp
• This line describes how the mean response μy changes with the explanatory variables. The observed values of y vary about their means μy and are assumed to have the same standard deviation σ. The parameters β0, β1, ..., βp are estimated by the fitted values b0, b1, ..., bp of the sample regression line.
• The multiple regression model includes a term for this variation, since the observed values of y vary about their means μy:
DATA = FIT + RESIDUAL, where the "FIT" term represents the expression β0 + β1 x1 + β2 x2 + ... + βp xp.
• The "RESIDUAL" term signifies the deviations of the observed values y from their means μy; these deviations are assumed to be normally distributed with mean 0 and variance σ². The notation for the model deviations is ε.
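The DATA = FIT + RESIDUAL decomposition can be made concrete for two predictors; the coefficients and data below are illustrative, not fitted values:

```python
# FIT part of a two-predictor model, b0 + b1*x1 + b2*x2
# (b0, b1, b2 are illustrative values, not estimates).
def fit_part(x1, x2, b0=1.0, b1=2.0, b2=-0.5):
    return b0 + b1 * x1 + b2 * x2

rows = [(1.0, 2.0, 2.1), (2.0, 1.0, 4.4), (3.0, 3.0, 5.6)]  # (x1, x2, y)
for x1, x2, y in rows:
    residual = y - fit_part(x1, x2)       # RESIDUAL = DATA - FIT
    print(fit_part(x1, x2), residual)     # y always equals FIT + RESIDUAL
```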
• So far we have seen the concept of simple linear regression, where a single predictor variable X is used to model the response variable Y. In many applications, more than one factor influences the response.
• Multiple regression models describe how a single response variable Y depends linearly on a number of predictor variables.
Examples
• The selling price of a house can depend on various factors like the popularity of the
location, the number of bedrooms, the number of bathrooms, the year the house was built,
the square footage of the plot etc.
• A child's height can depend on the height of the parents, the nutrition the child gets, and other environmental factors.
• The “least squares” method is a type of mathematical regression analysis that determines
the best fit line for a collection of data, displaying the relationship between the points
visually.
UQ. What do you mean by least square method? Explain least square method in the context of linear regression. (SPPU - Q. 2(b), Dec. 19, 5 Marks, Q. 1(a), May/June 2016, 5 Marks)
UQ. What do you mean by coefficient of regression? Explain SST, SSE, SSR, MSE in the context of regression. (SPPU - Q. 4(b), Dec. 19, 5 Marks)
• Regression analysis is used to identify the linear relationship between a single dependent variable and an independent variable:
Y = β0 + β1 X + ε
where,
Y = value of the dependent variable
X = independent variable
ε = random error
b = Y-axis intercept = β0
m = slope of the line = β1
Fig. 3.3.1
and the predicted value of y is
ŷ = β0 + β1 X
• Even if X = 0, i.e., the value of the independent variable is zero, the expected value of Y is β0 (the intercept).
• The regression line must pass through the centroid of the sample data, where the centroid is (X̄, Ȳ) and
X̄ = (Σ xi)/n , Ȳ = (Σ yi)/n
• It does not need to have the same number of sample points above and below it.
• The coefficients are given by
β0 = Ȳ − β1 X̄
β1 = Σ (xi − X̄)(yi − Ȳ) / Σ (xi − X̄)²
with summations over i = 1, ..., n.
• The least-squares method is discussed with respect to simple univariate linear regression:
Y = β0 + β1 X + ε
• Here, the target is to find the values of β0 and β1 that best fit the given sample data. Each yi is predicted as ŷi = β0 + β1 xi.
• For each point, we can calculate the difference between the actual value yi and the value ŷi predicted by the regression line. The sum of squared errors, over i = 1, ..., n, is
SSE = Σ (yi − β0 − β1 xi)² ...(1)
• To get the minimum value of SSE, the partial derivatives of SSE with respect to β0 and β1 must equal 0 (all summations below run over i = 1, ..., n):
∂SSE/∂β0 = 0 ; ∂SSE/∂β1 = 0
Taking the partial derivative with respect to β0:
∂SSE/∂β0 = ∂/∂β0 Σ (yi − β0 − β1 xi)² = 0
Σ −2 (yi − β0 − β1 xi) = 0
−2 Σ (yi − β0 − β1 xi) = 0
Σ (yi − β0 − β1 xi) = 0 ...(2)
Taking the partial derivative with respect to β1:
∂SSE/∂β1 = ∂/∂β1 Σ (yi − β0 − β1 xi)² = 0
Σ −2 xi (yi − β0 − β1 xi) = 0
−2 Σ xi (yi − β0 − β1 xi) = 0
Σ xi (yi − β0 − β1 xi) = 0 ...(3)
Expanding equation (2):
Σ yi − Σ β0 − β1 Σ xi = 0
Σ yi − n β0 − β1 Σ xi = 0
n β0 = Σ yi − β1 Σ xi
β0 = (Σ yi)/n − β1 (Σ xi)/n
But X̄ = (Σ xi)/n and Ȳ = (Σ yi)/n, so
β0 = Ȳ − β1 X̄ ...(4)
Substituting (4) into equation (3):
Σ xi (yi − Ȳ + β1 X̄ − β1 xi) = 0
Σ xi (yi − Ȳ) − β1 Σ xi (xi − X̄) = 0
Σ xi (yi − Ȳ) = β1 Σ xi (xi − X̄)
β1 = Σ xi (yi − Ȳ) / Σ xi (xi − X̄)
Since Σ (yi − Ȳ) = 0 and Σ (xi − X̄) = 0, subtracting X̄ Σ (yi − Ȳ) from the numerator and X̄ Σ (xi − X̄) from the denominator changes nothing, giving
β1 = Σ (xi − X̄)(yi − Ȳ) / Σ (xi − X̄)² ...(5)
Writing Sxy = Σ (xi − X̄)(yi − Ȳ) and Sxx = Σ (xi − X̄)², this is
β1 = Sxy / Sxx
In summary, the fitted line is
ŷ = β0 + β1 X
• Even if x = 0, i.e., the value of the independent variable is zero, the expected value of Y is β0.
With
X̄ = (Σ xi)/n and Ȳ = (Σ yi)/n,
the coefficients are
β0 = Ȳ − β1 X̄ ; β1 = Σ (xi − X̄)(yi − Ȳ) / Σ (xi − X̄)²
(summations over i = 1, ..., n).
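The closed-form estimates from equations (4) and (5) translate directly into code; a minimal sketch:

```python
# Least-squares estimates:
#   b1 = sum((xi - xbar)*(yi - ybar)) / sum((xi - xbar)^2)   ... eq. (5)
#   b0 = ybar - b1 * xbar                                    ... eq. (4)
def least_squares(xs, ys):
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1

b0, b1 = least_squares([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
print(b0, b1)  # 0.0 2.0 for perfectly linear data y = 2x
```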
UQ. What do you mean by coefficient of regression? Explain SST, SSE, SSR, MSE in the context of regression. (SPPU - Q. 4(b), Dec. 19, 5 Marks)
UQ. What are the ingredients of machine learning? (SPPU - Q. 4(b), May/June 2016, 5 Marks)
UQ. Enlist ingredients of ML. Explain each ingredient in two or three sentences. (SPPU - Q. 2(b), Nov./Dec. 16, 5 Marks)
UQ. How is the performance of a regression function measured? (SPPU - Q. 2(b), Nov./Dec. 17, 4 Marks)
UQ. Define and explain Squared Error (SE) and Mean Squared Error (MSE) w.r.t. regression. (SPPU - Q. 1(b), May/June 2018, 5 Marks)
UQ. How is the performance of regression assessed? Write the various performance metrics used for it. (SPPU - Q. 4(b), May/June 2019, 4 Marks)
UQ. Suppose you have been given a set of training examples {(x1, y1), (x2, y2), ..., (xn, yn)}. Find the equation of the line that best fits the data, i.e., that minimizes the squared error. (SPPU - Q. 6(b), Oct. 19, 5 Marks)
• A cost function is a measure of how wrong the model is in terms of its ability to estimate the relationship between X and y. It is typically expressed as a difference or distance between the predicted value and the actual value.
• A cost function used in a regression problem is called a "regression cost function".
Mean Error (ME)
• Calculating the mean of the errors is the simplest and most intuitive approach.
• However, the errors can be both negative and positive, so they can cancel each other out during summation, giving a zero mean error for the model.
• Thus this is not a recommended cost function, but it does lay the foundation for the other cost functions of regression models.
Mean Squared Error (MSE)
• This improves on the drawback we encountered in mean error above. Here the square of the difference between the actual and predicted value is calculated, avoiding any possibility of a negative error.
• MSE is measured as the average of the sum of squared differences between predictions and actual observations. It is also known as L2 loss.
• In MSE, since each error is squared, even small deviations in prediction are penalized more than in MAE. But if the dataset has outliers that produce large prediction errors, squaring magnifies those errors many times over and leads to a much higher MSE.
Mean Absolute Error (MAE)
• In this cost function, MAE is measured as the average of the sum of absolute differences between predictions and actual observations.
• It is robust to outliers and thus gives better results even when the dataset has noise or outliers.
R-Squared
• R-squared (R²) is a statistical measure that represents the proportion of the variance of a dependent variable that is explained by the independent variable(s) in a regression model. It is also known as the coefficient of determination.
• Consider the model
yi = β0 + β1 xi + εi
• The values of β0 and β1 are estimated by various methods, e.g. the least-squares method or the maximum-likelihood method.
• The estimated β0 and β1 are used to predict the values yi as ŷi, where
ŷi = β0 + β1 xi
The common performance metrics (summations over i = 1, ..., n) are:
MSE = (1/n) SSE = (1/n) Σ (yi − ŷi)²
RMSE = √MSE = √[(1/n) Σ (yi − ŷi)²]
R-squared = 1 − SSE/SST, where SST = Σ (yi − Ȳ)² is the total variation of y about its mean
MAE = (1/n) Σ |yi − ŷi|
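These metrics can be sketched in a few lines (ŷ written as `yhat`; R² computed against the total sum of squares):

```python
import math

def mse(y, yhat):
    # Mean of the squared differences between actual and predicted values.
    return sum((a - p) ** 2 for a, p in zip(y, yhat)) / len(y)

def rmse(y, yhat):
    # Square root of MSE, in the same units as y.
    return math.sqrt(mse(y, yhat))

def mae(y, yhat):
    # Mean of the absolute differences; more robust to outliers than MSE.
    return sum(abs(a - p) for a, p in zip(y, yhat)) / len(y)

def r_squared(y, yhat):
    # 1 - SSE/SST: the proportion of the variation in y explained by the model.
    ybar = sum(y) / len(y)
    sse = sum((a - p) ** 2 for a, p in zip(y, yhat))
    sst = sum((a - ybar) ** 2 for a in y)
    return 1 - sse / sst

y, yhat = [1.0, 2.0, 3.0], [1.1, 1.9, 3.2]
print(mse(y, yhat), rmse(y, yhat), mae(y, yhat), r_squared(y, yhat))
```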
Solved Examples
Ex. : The scores of five students in Xth std and XIIth std are given below. Fit a regression line predicting the XIIth-std score from the Xth-std score.
Student   Score in Xth std (xi)   Score in XIIth std (yi)
1         95                      85
2         85                      95
3         80                      70
4         70                      65
5         60                      70
Soln. :
Given xi = score of the ith student in Xth std and yi = score in XIIth std, the regression line is
Y = β0 + β1 X
where the values of β0 and β1 given by the least-squares method are as follows.
Fig. 3.4.1 : Linear Regression for Students example
–– ––
0 = Y – 1 X
n
i=1
(x – ––X ) × (y – ––Y )
i i
1 = n
i=1
(x – ––X )
i
2
In this example, n = 5
xi    yi    (xi − X̄)   (xi − X̄)²   (yi − Ȳ)   (xi − X̄)(yi − Ȳ)
95    85    17          289          +8          136
85    95    7           49           +18         126
80    70    2           4            −7          −14
70    65    −8          64           −12         96
60    70    −18         324          −7          126
Σ xi = 390 ; X̄ = (Σ xi)/n = 390/5 = 78
Σ yi = 385 ; Ȳ = (Σ yi)/n = 385/5 = 77
Σ (xi − X̄)² = 289 + 49 + 4 + 64 + 324 = 730
and Σ (xi − X̄)(yi − Ȳ) = 136 + 126 − 14 + 96 + 126 = 470
β1 = Σ (xi − X̄)(yi − Ȳ) / Σ (xi − X̄)² = 470/730 = 0.644
and β0 = Ȳ − β1 X̄ = 77 − (0.644)(78) = 26.768
The fitted regression line is therefore
y = 26.768 + 0.644 x
Interpretation 1
For a 1-mark increase in a student's score in Xth std, the expected increase in the same student's score in XIIth std is 0.644.
Interpretation 2
Even if X = 0 (practically it is not possible to apply this interpretation in the real world), i.e., a student's score in Xth std is 0, the expected score of the student in XIIth std is still 26.768.
Interpretation 3
If a student's score in Xth std is 90, then the student's expected score in XIIth std is calculated as:
y = 26.768 + 0.644 (90) = 84.73
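The worked example can be verified in a few lines; note that keeping full precision gives β1 = 470/730 ≈ 0.6438, so β0 and the prediction differ slightly from the hand calculation, which rounded β1 to 0.644:

```python
xs = [95, 85, 80, 70, 60]   # Xth-std scores
ys = [85, 95, 70, 65, 70]   # XIIth-std scores

n = len(xs)
xbar = sum(xs) / n                                           # 78.0
ybar = sum(ys) / n                                           # 77.0
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))   # 470.0
sxx = sum((x - xbar) ** 2 for x in xs)                       # 730.0
b1 = sxy / sxx                                               # ~0.644
b0 = ybar - b1 * xbar                                        # ~26.78
print(round(b1, 3), round(b0, 2), round(b0 + b1 * 90, 2))
```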
UQ. For a given data having 100 examples, if squared errors SE1, SE2, and SE3 are 13.33, 3.33 and 4.00 respectively, calculate Mean Squared Error (MSE). State the formula for MSE. (SPPU - Q. 1(b), May/June 16, 5 Marks)
UQ. Consider the following data points. Calculate the cost function for θ0 = 0.5 and θ1 = 1 using linear regression. (SPPU - Q. 4(b), Nov./Dec. 17, 6 Marks)
X Y
1 1.5
2 2.75
3 4
4 4.5
5 5.5
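A sketch of the calculation asked in the second question above, using the common convention J(θ) = (1/2n)·Σ(h(x) − y)²; if the plain MSE convention is intended instead, drop the factor of 2:

```python
# h(x) = theta0 + theta1*x with theta0 = 0.5, theta1 = 1.
data = [(1, 1.5), (2, 2.75), (3, 4.0), (4, 4.5), (5, 5.5)]
theta0, theta1 = 0.5, 1.0

squared_errors = [((theta0 + theta1 * x) - y) ** 2 for x, y in data]
cost = sum(squared_errors) / (2 * len(data))
print(cost)  # 0.03125
```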
• Linear regression is a statistical model that captures the linear relationship between two (simple linear regression) or more (multiple linear regression) variables: a dependent variable and independent variable(s). A linear relationship mainly means that the dependent variable increases (or decreases) when one or more independent variables increase (or decrease).
• As can be seen, a linear relationship may be positive (independent variable goes up, dependent variable goes up) or negative (independent variable goes up, dependent variable goes down).
• Multiple linear regression attempts to model the relationship between two or more features and a response by fitting a linear equation to observed data. The steps needed to perform multiple linear regression are similar to those of simple linear regression.
• The difference is in the evaluation: it can be used to find out which factor has the highest impact on the predicted output and how the different variables relate to each other.
• There are two important disadvantages of linear regression. Let's assume that the underlying model is actually close to (or exactly) linear, i.e.,
yi = r(xi) + εi, i = 1, ..., n,
for some underlying regression function r(x) that is approximately (or exactly) linear in x.
• In a high-dimensional regression setting, where the number of predictors p rivals or even exceeds the number of observations n, these limitations become major problems. In fact, when p > n, the linear regression estimate is not even well defined.
How can we do better?
• For a linear model, linear regression has expected test error σ² + p σ²/n. The first term is the irreducible error; the second term comes entirely from the variance of the linear regression estimate (averaged over the input points). Its bias is exactly zero.
• What can be understood from this? If another predictor variable is added into the mix, the same amount of variance, σ²/n, is added, irrespective of whether its true coefficient is large or small (or even zero).
• Hence, in the last example, variance was "spent" trying to fit truly small coefficients; there were 20 of them out of 30.
• One may find that we can do better by shrinking small coefficients towards zero, which possibly introduces some bias but also reduces the variance. In other words, "small details" are ignored in order to get a more stable "big picture". If properly done, this approach can actually work.
UQ. Write short notes on : Linearly and non-linearly separable data. (SPPU - Q. 6(a), March 19, 5 Marks)
UQ. What is a polynomial regression? How can it be represented in the form of a matrix? (SPPU - Q. 2(b), May/June 18, 5 Marks)
UQ. What do you mean by zero-centered and un-correlated features? What is their use in the solution of multivariate linear regression? (SPPU - Q. 4(b), May/June 18, 6 Marks)
• Polynomial regression fits the non-linear relationship between the value of x and the corresponding conditional mean of y, denoted E(y | x).
• Though polynomial regression fits a nonlinear model to the data, as a statistical estimation problem it is linear, in the sense that the regression function E(y | x) is linear in the unknown parameters estimated from the data.
• Because of this, polynomial regression is considered a special case of multiple linear regression.
• In this case, the model remains linear externally, but it can capture internal non-linearity. Consider Fig. 3.9.1, which shows how scikit-learn implements this technique: this is obviously a non-linear dataset, and any linear regression based only on the original two-dimensional points cannot capture the dynamics.
• If a linear model is applied to a linear dataset, it gives good results, as seen in simple linear regression; but if the same model is applied without any alteration to a non-linear dataset, the results can be drastic: the loss function increases, the error rate becomes high, and the accuracy ultimately decreases.
• Thus in such cases, where data points are arranged in a non-linear fashion, a polynomial regression model is needed. This can be understood better using the comparison of the linear dataset and non-linear dataset shown in Fig. 3.9.2 and Fig. 3.9.3.
• Hence, if a dataset is organized in a non-linear way, the polynomial regression model is used instead of simple linear regression.
y = b0 + b1x + b2x² + b3x³ + ... + bnxⁿ
• Polynomial regression is a crucial extension of linear regression. Its main idea is how to select the features. Consider multivariate regression with 2 variables, x1 and x2. Linear regression will look like this:
y = a1·x1 + a2·x2
• To get a polynomial regression (say, a 2-degree polynomial), a few additional features are created: x1·x2, x1², and x2². So we get our "linear regression":
y = a1·x1 + a2·x2 + a3·x1·x2 + a4·x1² + a5·x2²
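The feature construction above can be sketched directly; a "linear" model over the expanded features is a degree-2 polynomial in x1 and x2 (the coefficient values are illustrative):

```python
# Degree-2 polynomial feature expansion for two inputs:
# [x1, x2] -> [x1, x2, x1*x2, x1**2, x2**2]
def poly2_features(x1, x2):
    return [x1, x2, x1 * x2, x1 ** 2, x2 ** 2]

coeffs = [1.0, 2.0, 0.5, -1.0, 0.25]  # a1..a5, illustrative values

def predict(x1, x2):
    # Linear in the expanded features, polynomial in the original inputs.
    return sum(a * f for a, f in zip(coeffs, poly2_features(x1, x2)))

print(poly2_features(2.0, 3.0))  # [2.0, 3.0, 6.0, 4.0, 9.0]
```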
• A polynomial term turns a linear model into a curve via quadratic (squared) or cubic (cubed) terms. But since it is the data X that is squared or cubed, and not the beta coefficients, it is still a linear model. This lets us model curves easily without explicitly building a complicated nonlinear model.
• One frequent pattern in machine learning is to use linear models trained on nonlinear functions of the data. This approach keeps the fast performance of linear methods while letting them fit a much wider range of data.
Advantages of polynomial regression:
1. A polynomial provides the best approximation of the relationship between the dependent and independent variable.
2. A wide range of functions can be fit with it.
3. Polynomials can fit a wide range of curvature.
Fig. 3.12.1
• A machine learning model can be called good if it generalizes appropriately to new input data from the problem domain. This allows it to make predictions on future data that the model has never seen.
• To check how well a machine learning model learns and generalizes to new data, we use the concepts of overfitting and underfitting. They are responsible for the poor performance of machine learning algorithms.
3.12.1 Underfitting
• Underfitting means the model is not able to fit the data well. This usually happens when limited data is available to build an accurate model, or when a linear model is used to fit non-linear data.
• In these situations the machine learning model applies overly simple rules to minimal data, with the result that the model makes wrong predictions. To avoid underfitting, more data and richer features are required.
3.12.2 Overfitting
• A model is said to be overfitted when it is trained with a lot of data. When this happens, it starts learning from the noise and the inaccurate data entries in the data set.
• The model then cannot categorize the data properly, due to too much detail and noise. Non-parametric and non-linear methods are common causes of overfitting, as these types of machine learning algorithms have more freedom in building the model from the dataset and hence can build unrealistic models.
• Overfitting is more probable with nonparametric and nonlinear models that have more flexibility when learning a target function.
• For instance, decision trees are a nonparametric machine learning algorithm; because they are very flexible, the problem of overfitting arises.
• This problem can be overcome by pruning the tree after learning, so that some of the detail it has picked up is removed.
3.13.1 Bias
• Consider two values: one predicted by our model and the other the actual value of the data (target value).
• Bias refers to the gap between these two values (the value predicted by our model and the actual value of the data).
• Some bias helps us generalize better and makes our model less sensitive to any single data point.
• A model with high bias pays very little attention to the training data and oversimplifies the model.
High Bias
The estimated data value is a long way from the actual data value, resulting in a large gap between the two.
Low Bias
The estimated data value is close to the actual data value, i.e., there is a smaller gap between the expected and actual data value.
3.13.2 Variance
• A high-variance model pays close attention to the training data and does not generalise to data it hasn't seen before.
• On training data, such models work well, but on test data they have a high error rate.
• Variance comes from highly complex models with a large number of features.
Low Bias, Low Variance
The difference between actual and predicted values is small, and the predictions are grouped together (refer Fig. 3.13.1).
Low Bias, High Variance
Data is scattered due to high variance, but by the rule of low bias it is not far from the actual data (target value), as seen in Fig. 3.13.1.
High Bias, Low Variance
By the rule of high bias there is a huge gap, and by the rule of low variance the predictions are grouped together (refer Fig. 3.13.1).
Fig. 3.13.1 : Bias and Variance graphical visualization
High Bias, High Variance
By the rule of high bias there is a huge gap, and by the rule of high variance the data is scattered (refer Fig. 3.13.1).
• The ideal case is when the predicted values are almost identical to the actual data values. So the ideal option is Low Bias and Low Variance.
Underfitting