Understanding Machine Learning Types

Machine Learning

Machine Learning enables machines to learn from data and experience without being explicitly programmed. Instead of hand-writing rules, you provide data to an algorithm, which builds its logic from that data.

How it works: A training dataset is used to train an ML algorithm to create a model. New input data is
then processed through this model to make predictions. If the predictions meet acceptable accuracy, the
model is deployed. Otherwise, it is retrained with enhanced data until the accuracy improves.
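The train–evaluate–retrain loop described above can be sketched in plain Python. This is only an illustration: `train`, `evaluate`, and `enhance` are hypothetical stand-ins (a toy mean-predictor, a tolerance-based accuracy, and a placeholder for data enhancement), not a real ML API, and the accuracy threshold is an assumption.

```python
# Minimal sketch of the train -> evaluate -> retrain loop described above.
# `train`, `evaluate`, and `enhance` are hypothetical stand-ins, not a real API.

def train(dataset):
    # Toy "model": always predicts the mean of the training targets.
    targets = [y for _, y in dataset]
    mean = sum(targets) / len(targets)
    return lambda x: mean

def evaluate(model, dataset):
    # Toy accuracy: fraction of predictions within a fixed tolerance of the truth.
    hits = sum(1 for x, y in dataset if abs(model(x) - y) <= 1.0)
    return hits / len(dataset)

def enhance(dataset):
    # Placeholder: in practice, add more or higher-quality training data here.
    return dataset

def build_model(train_data, holdout, threshold=0.8, max_rounds=5):
    model = train(train_data)
    for _ in range(max_rounds):
        if evaluate(model, holdout) >= threshold:
            break                            # accuracy acceptable: deploy
        train_data = enhance(train_data)     # otherwise retrain with enhanced data
        model = train(train_data)
    return model
```

In a real pipeline the holdout set would be disjoint from the training set; it is reused here only to keep the sketch short.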

Types of Machine Learning:

1. Supervised Learning
Learning is guided by labeled data (a "teacher"). The model is trained on this dataset and then makes predictions or decisions when new data is introduced.

2. Unsupervised Learning
The model learns by observing and
finding patterns in data without
labels. It organizes data into clusters
based on relationships, though it
doesn't assign labels to these
clusters. For example, it can group
apples, bananas, and mangoes into
clusters without naming them.

3. Reinforcement Learning
An agent interacts with its
environment and learns through rewards and penalties. It refines its decisions over time by
maximizing positive rewards and minimizing mistakes. Once trained, it can make predictions
based on new data.
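The unsupervised fruit-grouping example can be sketched with a plain k-means clustering routine. This is a minimal sketch: the (weight, length) feature values below are made-up, and the number of clusters k is assumed known in advance.

```python
import random

def dist2(a, b):
    # Squared Euclidean distance between two feature vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    # Component-wise mean of a list of feature vectors.
    n = len(points)
    return tuple(sum(xs) / n for xs in zip(*points))

def kmeans(points, k, iters=20, seed=0):
    # Plain k-means: groups points into k clusters using no labels at all.
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist2(p, centroids[i]))
            clusters[nearest].append(p)
        centroids = [centroid(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return clusters

# Made-up (weight in g, length in cm) features for apples, bananas, mangoes.
fruit = [(150, 7), (155, 7.5), (148, 6.8),    # apple-like
         (120, 20), (118, 19), (125, 21),     # banana-like
         (300, 10), (310, 11), (295, 9.5)]    # mango-like
groups = kmeans(fruit, k=3)
```

Note that, exactly as the text says, the algorithm returns clusters but never names them: deciding that one cluster is "apples" is up to a human.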
(Figure: classification of machine learning)

Unit 2
Regression Models:

Regression predicts continuous response values, such as house prices, stock values, or cricket scores.
Common models include:

1. Simple Linear Regression – Predicts using one independent variable.

2. Multiple Linear Regression – Predicts using multiple independent variables.

Key Concepts:

• Cost Function & Gradient Descent: Methods for optimizing the model by minimizing error.

• Performance Metrics:

o Mean Absolute Error (MAE)

o Mean Squared Error (MSE)

o R-Squared & Adjusted R-Squared (indicate model fit).
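The cost-function and gradient-descent idea can be sketched for simple linear regression. This is an illustrative sketch only: the data points, learning rate, and epoch count are assumptions chosen so the fit converges.

```python
def gradient_descent(xs, ys, lr=0.05, epochs=2000):
    # Fit y = a*x + b by minimizing the MSE cost
    #   J(a, b) = (1/n) * sum((a*x + b - y)^2)
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Partial derivatives of J with respect to a and b.
        grad_a = (2 / n) * sum((a * x + b - y) * x for x, y in zip(xs, ys))
        grad_b = (2 / n) * sum((a * x + b - y) for x, y in zip(xs, ys))
        a -= lr * grad_a      # step opposite the gradient
        b -= lr * grad_b
    return a, b

# Points generated from y = 2x + 1, so the fit should recover a ~ 2, b ~ 1.
a, b = gradient_descent([0, 1, 2, 3, 4], [1, 3, 5, 7, 9])
```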

Types of Regression:
1. Linear Regression
2. Logistic Regression
3. Polynomial Regression
4. Support Vector Regression
5. Decision Tree Regression
6. Random Forest Regression
7. Ridge Regression
8. Lasso Regression

Linear Regression:

Linear regression is a simple statistical method for predictive analysis that models the relationship
between continuous variables. It addresses regression problems by showing a linear relationship
between the independent variable (X) and the dependent variable (Y).

Types:

1. Simple Linear Regression – One input variable.

2. Multiple Linear Regression – Multiple input variables.

Equation:
Y = aX + b

• Y: Dependent variable (target)

• X: Independent variable

• a: Slope of the line; b: Intercept

Example: Predicting an employee's salary based on years of experience.

Some popular applications of linear regression are:

• Analyzing trends and sales estimates
• Salary forecasting
• Real estate prediction
• Estimating ETAs in traffic
LINEAR REGRESSION
Linear regression is a statistical approach for modeling the relationship between a dependent variable and a given set of independent variables.

Simple Linear Regression


Simple linear regression is an approach for predicting a response using a single feature.

It is assumed that the two variables are linearly related, so we try to find a linear function that predicts the response value (y) as accurately as possible as a function of the feature or independent variable (x). Consider a dataset where we have a value of the response y for every feature x.
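Given such a dataset, the least-squares line can be computed in closed form. A minimal sketch, reusing the earlier salary-vs-experience example; the salary figures below are made-up for illustration.

```python
def fit_line(xs, ys):
    # Ordinary least squares for y = b0 + b1*x:
    #   b1 = SS_xy / SS_xx,  b0 = mean(y) - b1 * mean(x)
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    ss_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_xx = sum((x - mx) ** 2 for x in xs)
    b1 = ss_xy / ss_xx
    b0 = my - b1 * mx
    return b0, b1

# Made-up data: years of experience vs. salary (in thousands).
years = [1, 2, 3, 4, 5]
salary = [30, 35, 40, 45, 50]
b0, b1 = fit_line(years, salary)   # data lie exactly on salary = 25 + 5*years
```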
LOGISTIC REGRESSION

Consider an example dataset that maps the number of hours of study to the result of an exam. The result can take only two values: passed (1) or failed (0).

That is, y is a categorical target variable that can take only two possible values, "0" or "1". In order to generalize our model, we assume that the probability of passing is the sigmoid of a linear function of x:

P(y = 1 | x) = 1 / (1 + e^-(b0 + b1*x))
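A minimal from-scratch sketch of this setup: gradient descent on the log-loss for a single-feature logistic model. The hours/pass data, learning rate, and epoch count below are illustrative assumptions.

```python
import math

def sigmoid(z):
    # Squashes any real number into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=10000):
    # Gradient descent on the log-loss for h(x) = sigmoid(w*x + b).
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum((sigmoid(w * x + b) - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Made-up data: hours studied vs. pass (1) / fail (0).
hours = [1, 2, 3, 4, 5, 6]
passed = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(hours, passed)
```

After training, `sigmoid(w * x + b)` gives the predicted pass probability for x hours of study; a threshold of 0.5 converts it into a 0/1 classification.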
Differences between Linear Regression and Logistic Regression:

Linear Regression:
1. A supervised regression model.
2. Predicts a continuous numeric value.
3. No activation function is used.

Logistic Regression:
1. A supervised classification model.
2. Predicts a categorical value: 1 or 0.
3. An activation function (the sigmoid) is used to convert the linear output into a probability.
Performance Metrics

1. Accuracy: the sum of the values on the main diagonal of the confusion matrix (the correct predictions) divided by the total number of samples.

2. Precision: the number of correct positive results divided by the number of positive results predicted by the classifier.

3. Recall: the number of correct positive results divided by the number of all relevant samples (actual positives).
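These three metrics can be computed directly from a binary confusion matrix. A minimal sketch; the example counts in the test are made-up.

```python
def metrics(confusion):
    # confusion[i][j] = count of samples with true class i predicted as class j.
    # For the binary case, class 1 is treated as "positive".
    tn, fp = confusion[0]
    fn, tp = confusion[1]
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total     # main-diagonal counts / all samples
    precision = tp / (tp + fp)       # correct positives / predicted positives
    recall = tp / (tp + fn)          # correct positives / actual positives
    return accuracy, precision, recall
```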
Residuals and Residual Plots

Residuals: Residuals measure the vertical distance between observed data points and the regression line, representing the error between predicted and actual values.

Residual Plots:

• Residuals (Y-axis) are plotted against the independent variable (X-axis).

• Key assumption: residuals should be independent and normally distributed.

Residual Plot Analysis

A key assumption of linear regression is that residuals (errors) are independent and normally distributed. Since predictions are never 100% accurate, some randomness is inherent. The regression model aims to capture all predictive information in the deterministic part, leaving residuals as completely random and unpredictable (stochastic). Ideally, residuals follow a normal distribution, validating this assumption.

Characteristics of a Good Residual Plot:

1. High density of points near the origin and low density away from it.

2. Symmetry about the origin.

3. No patterns as residuals are distributed evenly along the X-axis.

4. Projected residuals on the Y-axis form a normal distribution.

A good residual plot shows random, patternless scatter, while a bad one shows systematic patterns or deviations from normality. This validates the assumption that residual errors are stochastic and independent.

A good residual plot satisfies key assumptions:

1. Residuals projected onto the Y-axis form a normal distribution, confirming normality.

2. Residuals are evenly distributed across the X-axis with no visible patterns, ensuring
independence.
(Figure: good residual plots, with residuals projected onto the Y-axis)

In contrast, a bad residual plot shows:

• High density far from the origin and low density near it.

• A non-normal distribution when projected onto the Y-axis, violating these assumptions.
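Two of these diagnostics can be checked numerically: for an ordinary least-squares fit, the residuals always have mean zero and no remaining linear trend in x. A minimal sketch with illustrative data (the y values are made-up noisy points near a line):

```python
def fit_line(xs, ys):
    # Ordinary least squares for y = b0 + b1*x (closed form).
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b1 = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
          / sum((x - mx) ** 2 for x in xs))
    return my - b1 * mx, b1

def residual_checks(xs, ys):
    # Residual = observed y minus the value predicted by the fitted line.
    b0, b1 = fit_line(xs, ys)
    res = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    n = len(res)
    mean_res = sum(res) / n
    # Covariance between residuals and x; ~0 means no linear trend is left.
    mx = sum(xs) / n
    cov_rx = sum((x - mx) * r for x, r in zip(xs, res)) / n
    return mean_res, cov_rx
```

These two checks follow from the normal equations, so they hold for any least-squares fit; the normality of residuals, by contrast, must be assessed visually or with a statistical test.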

Polynomial Regression

Polynomial regression models the relationship between the independent variable x and the dependent variable y as an n-degree polynomial. It fits a nonlinear relationship using the least-squares method.

Types of Polynomial Regression:

• Linear: Degree = 1

• Quadratic: Degree = 2

• Cubic: Degree = 3
• Higher degrees follow similarly.

Assumptions of Polynomial Regression

For effective polynomial regression:

1. The relationship between the dependent variable and independent variables should be linear or
curved and additive.

2. Independent variables must not correlate with each other.

3. Errors should be independent, normally distributed with a mean of zero, and have constant
variance.

Polynomial regression alters the structure from a linear equation to a quadratic or higher-degree
equation, which can be visualized through its curve.

Linear Regression vs. Polynomial Regression

Linear regression models straight-line relationships but struggles when data points follow a curve. When
linear regression underfits the data, polynomial regression captures the nonlinear patterns by fitting a
curved line.
Key Difference:

• Linear regression assumes a linear relationship between variables.

• Polynomial regression handles nonlinear relationships effectively by increasing model complexity (e.g., quadratic curves) while keeping the feature weights linear.

Polynomial regression overcomes underfitting by transforming the model structure without changing the
linear nature of the weights.
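A quadratic (degree-2) fit can be sketched from scratch: build the polynomial features [1, x, x²] and solve the normal equations, so the model stays linear in its weights exactly as described above. The data points below are made-up values on y = x² + 1.

```python
def solve3(A, b):
    # Gaussian elimination with partial pivoting for a 3x3 linear system.
    n = 3
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def polyfit2(xs, ys):
    # Least-squares quadratic y = c0 + c1*x + c2*x^2 via the normal equations
    # (X^T X) c = X^T y, with feature vectors [1, x, x^2].
    feats = [(1.0, x, x * x) for x in xs]
    A = [[sum(f[i] * f[j] for f in feats) for j in range(3)] for i in range(3)]
    b = [sum(f[i] * y for f, y in zip(feats, ys)) for i in range(3)]
    return solve3(A, b)

# Made-up points on y = x^2 + 1; the fit recovers c ~ [1, 0, 1].
coeffs = polyfit2([-2, -1, 0, 1, 2], [5, 2, 1, 2, 5])
```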

MEASURES FOR IN-SAMPLE EVALUATION

In-sample evaluation numerically determines how well the model fits the dataset. Two important measures of fit:

• Mean Squared Error (MSE)

• R-squared (R²)

Mean Squared Error (MSE)

Mean Squared Error (MSE) quantifies how close a regression line is to data points by calculating the
average of squared errors.

• A smaller MSE indicates data points closely dispersed around the regression line, with fewer errors and a better model.

• A larger MSE suggests data points widely scattered around the regression line.

Goal: Minimize MSE for improved model accuracy.
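Both measures can be computed directly from true and predicted values. A minimal sketch; the values in the test are made-up.

```python
def mse(y_true, y_pred):
    # Average of the squared prediction errors.
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    # R^2 = 1 - (residual sum of squares / total sum of squares):
    # the fraction of variance in y explained by the model.
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot
```

A perfect fit gives MSE = 0 and R² = 1; larger errors raise MSE and pull R² toward (or below) zero.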
