0% found this document useful (0 votes)
25 views38 pages

Machine Learning Basics and Types

The document provides a comprehensive overview of machine learning, covering its definition, types, and applications, including supervised, unsupervised, semi-supervised, and reinforcement learning. It explains how machine learning algorithms create mathematical models to make predictions based on historical data and discusses the advantages and disadvantages of each learning type. Additionally, the document touches on artificial neural networks and their relationship to biological neural networks.

Uploaded by

nyatharimadhav
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
25 views38 pages

Machine Learning Basics and Types

The document provides a comprehensive overview of machine learning, covering its definition, types, and applications, including supervised, unsupervised, semi-supervised, and reinforcement learning. It explains how machine learning algorithms create mathematical models to make predictions based on historical data and discusses the advantages and disadvantages of each learning type. Additionally, the document touches on artificial neural networks and their relationship to biological neural networks.

Uploaded by

nyatharimadhav
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd

MACHINE LEARNING

UNIT - I
Learning –
The Machine Learning Tutorial covers both the fundamentals and more complex ideas of
machine learning. Students and professionals in the workforce can benefit from our machine
learning tutorial.

A rapidly developing field of technology, machine learning allows computers to


automatically learn from previous data. For building mathematical models and making
predictions based on historical data or information, machine learning employs a variety of
algorithms. It is currently being used for a variety of tasks, including speech recognition,
email filtering, auto-tagging on Facebook, a recommended system, and image recognition.

You will learn about the many different methods of machine learning, including
reinforcement learning, supervised learning, and unsupervised learning, in this machine
learning tutorial. Regression and classification models, clustering techniques, hidden Markov
models, and various sequential models will all be covered.

What is Machine Learning

In the real world, we are surrounded by humans who can learn everything from their
experiences with their learning capability, and we have computers or machines which work
on our instructions. But can a machine also learn from experiences or past data like a human
does? So here comes the role of Machine Learning.

Introduction to Machine Learning


A subset of artificial intelligence known as machine learning focuses primarily on the creation of
algorithms that enable a computer to independently learn from data and previous experiences. Arthur
Samuel first used the term "machine learning" in 1959. It could be summarized as follows:

Without being explicitly programmed, machine learning enables a machine to automatically


learn from data, improve performance from experiences, and predict things.

Machine learning algorithms create a mathematical model that, without being explicitly
programmed, aids in making predictions or decisions with the assistance of sample historical
data, or training data. For the purpose of developing predictive models, machine learning
brings together statistics and computer science. Algorithms that learn from historical data are
either constructed or utilized in machine learning. The performance will rise in proportion to
the quantity of information we provide.
Types of Machine Learning

Machine learning is a subset of AI, which enables the machine to automatically learn
from data, improve performance from past experiences, and make predictions. Machine
learning contains a set of algorithms that work on a huge amount of data. Data is fed to these
algorithms to train them, and on the basis of training, they build the model & perform a
specific task.

These ML algorithms help to solve different business problems like Regression,


Classification, Forecasting, Clustering, and Associations, etc.

Based on the methods and way of learning, machine learning is divided into mainly four
types, which are:

1. Supervised Machine Learning


2. Unsupervised Machine Learning
3. Semi-Supervised Machine Learning
4. Reinforcement Learning
Supervised Machine Learning

As its name suggests, Supervised machine learning is based on supervision. It means in the
supervised learning technique, we train the machines using the "labelled" dataset, and based
on the training, the machine predicts the output. Here, the labelled data specifies that some of
the inputs are already mapped to the output. More preciously, we can say; first, we train the
machine with the input and corresponding output, and then we ask the machine to predict the
output using the test dataset.

Let's understand supervised learning with an example. Suppose we have an input dataset of
cats and dog images. So, first, we will provide the training to the machine to understand the
images, such as the shape & size of the tail of cat and dog, Shape of eyes, colour, height
(dogs are taller, cats are smaller), etc. After completion of training, we input the picture of
a cat and ask the machine to identify the object and predict the output. Now, the machine is
well trained, so it will check all the features of the object, such as height, shape, colour, eyes,
ears, tail, etc., and find that it's a cat. So, it will put it in the Cat category. This is the process
of how the machine identifies the objects in Supervised Learning.

The main goal of the supervised learning technique is to map the input variable(x) with
the output variable(y). Some real-world applications of supervised learning are Risk
Assessment, Fraud Detection, Spam filtering, etc.

o Classification
o Regression

a) Classification

Classification algorithms are used to solve the classification problems in which the output
variable is categorical, such as "Yes" or No, Male or Female, Red or Blue, etc. The
classification algorithms predict the categories present in the dataset. Some real- world
examples of classification algorithms are Spam Detection, Email filtering, etc.

Some popular classification algorithms are given below:

o Random Forest Algorithm


o Decision Tree Algorithm
o Logistic Regression Algorithm
o Support Vector Machine Algorithm
b) Regression

Regression algorithms are used to solve regression problems in which there is a linear
relationship between input and output variables. These are used to predict continuous output
variables, such as market trends, weather prediction, etc.

Some popular Regression algorithms are given below:

o Simple Linear Regression Algorithm


o Multivariate Regression Algorithm
o Decision Tree Algorithm
o Lasso Regression

Advantages and Disadvantages of Supervised Learning

Advantages:

o Since supervised learning work with the labelled dataset so we can have an exact idea
about the classes of objects.
o These algorithms are helpful in predicting the output on the basis of prior experience.

Disadvantages:

o These algorithms are not able to solve complex tasks.


o It may predict the wrong output if the test data is different from the training data.
o It requires lots of computational time to train the algorithm.

Unsupervised Machine Learning

Unsupervised learning is different from the Supervised learning technique; as its name suggests,
there is no need for supervision. It means, in unsupervised machine learning, the machine is
trained using the unlabeled dataset, and the machine predicts the output without any
supervision.

In unsupervised learning, the models are trained with the data that is neither classified nor
labelled, and the model acts on that data without any supervision.

The main aim of the unsupervised learning algorithm is to group or categories the
unsorted dataset according to the similarities, patterns, and differences. Machines
are instructed to find the hidden patterns from the input dataset.
Let's take an example to understand it more preciously; suppose there is a basket of fruit images,
and we input it into the machine learning model. The images are totally unknown to the
model, and the task of the machine is to find the patterns and categories of the objects.

So, now the machine will discover its patterns and differences, such as colour difference, shape
difference, and predict the output when it is tested with the test dataset.
Categories of Unsupervised Machine Learning

Unsupervised Learning can be further classified into two types, which are given below:

o Clustering
o Association

1) Clustering

The clustering technique is used when we want to find the inherent groups from the data. It is a
way to group the objects into a cluster such that the objects with the most similarities remain
in one group and have fewer or no similarities with the objects of other groups. An example
of the clustering algorithm is grouping the customers by their purchasing behaviour.

Some of the popular clustering algorithms are given below:

o K-Means Clustering algorithm


o Mean-shift algorithm
o DBSCAN Algorithm
o Principal Component Analysis
o Independent Component Analysis

2) Association

Association rule learning is an unsupervised learning technique, which finds interesting relations
among variables within a large dataset. The main aim of this learning algorithm is to find the
dependency of one data item on another data item and map those variables accordingly so
that it can generate maximum profit. This algorithm is mainly applied in Market Basket
analysis, Web usage mining, continuous production, etc.

Some popular algorithms of Association rule learning are Apriori Algorithm, Eclat, FP-
growth algorithm.

Advantages and Disadvantages of Unsupervised Learning Algorithm

Advantages:

o These algorithms can be used for complicated tasks compared to the supervised ones
because these algorithms work on the unlabeled dataset.
o Unsupervised algorithms are preferable for various tasks as getting the unlabeled
dataset is easier as compared to the labelled dataset.

Disadvantages:

o The output of an unsupervised algorithm can be less accurate as the dataset is not
labelled, and algorithms are not trained with the exact output in prior.
o Working with Unsupervised learning is more difficult as it works with the unlabelled
dataset that does not map with the output.
Applications of Unsupervised Learning
o Network Analysis: Unsupervised learning is used for identifying plagiarism and
copyright in document network analysis of text data for scholarly articles.
o Recommendation Systems: Recommendation systems widely use unsupervised
learning techniques for building recommendation applications for different web
applications and e-commerce websites.
o Anomaly Detection: Anomaly detection is a popular application of unsupervised
learning, which can identify unusual data points within the dataset. It is used to
discover fraudulent transactions.
o Singular Value Decomposition: Singular Value Decomposition or SVD is used to
extract particular information from the database. For example, extracting information
of each user located at a particular location.

Semi-Supervised Learning

Semi-Supervised learning is a type of Machine Learning algorithm that lies between


Supervised and Unsupervised machine learning. It represents the intermediate ground
between Supervised (With Labelled training data) and Unsupervised learning (with no
labelled training data) algorithms and uses the combination of labelled and unlabeled datasets
during the training period.

Although Semi-supervised learning is the middle ground between supervised and unsupervised
learning and operates on the data that consists of a few labels, it mostly consists of unlabeled
data. As labels are costly, but for corporate purposes, they may have few labels. It is
completely different from supervised and unsupervised learning as they are based on the
presence & absence of labels.

To overcome the drawbacks of supervised learning and unsupervised learning


algorithms, the concept of Semi-supervised learning is introduced. The main aim of semi-
supervised learning is to effectively use all the available data, rather than only labelled data
like in supervised learning. Initially, similar data is clustered along with an unsupervised
learning algorithm, and further, it helps to label the unlabeled data into labelled data. It is
because labelled data is a comparatively more expensive acquisition than unlabeled data.

We can imagine these algorithms with an example. Supervised learning is where a student is
under the supervision of an instructor at home and college. Further, if that student is self-
analysing the same concept without any help from the instructor, it comes under unsupervised
learning. Under semi-supervised learning, the student has to revise himself after analyzing the
same concept under the guidance of an instructor at college.

Advantages and disadvantages of Semi-supervised Learning

Advantages:

o It is simple and easy to understand the algorithm.


o It is highly efficient.
o It is used to solve drawbacks of Supervised and Unsupervised Learning algorithms.
Disadvantages:

o Iterations results may not be stable.


o We cannot apply these algorithms to network-level data.
o Accuracy is low.

4. Reinforcement Learning
Reinforcement learning works on a feedback-based process, in which an AI agent (A
software component) automatically explore its surrounding by hitting & trail, taking action,
learning from experiences, and improving its performance. Agent gets rewarded for each
good action and get punished for each bad action; hence the goal of reinforcement learning
agent is to maximize the rewards.

In reinforcement learning, there is no labelled data like supervised learning, and agents learn
from their experiences only.

The reinforcement learning process is similar to a human being; for example, a child learns
various things by experiences in his day-to-day life. An example of reinforcement learning is
to play a game, where the Game is the environment, moves of an agent at each step define
states, and the goal of the agent is to get a high score. Agent receives feedback in terms of
punishment and rewards.

Due to its way of working, reinforcement learning is employed in different fields such as
Game theory, Operation Research, Information theory, multi-agent systems.

A reinforcement learning problem can be formalized using Markov Decision Process(MDP). In


MDP, the agent constantly interacts with the environment and performs actions; at each
action, the environment responds and generates a new state.

Categories of Reinforcement Learning

Reinforcement learning is categorized mainly into two types of methods/algorithms:

o Positive Reinforcement Learning: Positive reinforcement learning specifies


increasing the tendency that the required behaviour would occur again by adding
something. It enhances the strength of the behaviour of the agent and positively
impacts it.
o Negative Reinforcement Learning: Negative reinforcement learning works exactly
opposite to the positive RL. It increases the tendency that the specific behaviour
would occur again by avoiding the negative condition.

Real-world Use cases of Reinforcement Learning


o Video Games:
RL algorithms are much popular in gaming applications. It is used to gain super-
human performance. Some popular games that use RL algorithms are AlphaGO and
AlphaGO Zero.
o Resource Management:
The "Resource Management with Deep Reinforcement Learning" paper showed that
how to use RL in computer to automatically learn and schedule resources to wait for
different jobs in order to minimize average job slowdown.
o Robotics:
RL is widely being used in Robotics applications. Robots are used in the industrial
and manufacturing area, and these robots are made more powerful with reinforcement
learning. There are different industries that have their vision of building intelligent
robots using AI and Machine learning technology.
o Text Mining
Text-mining, one of the great applications of NLP, is now being implemented with
the help of Reinforcement Learning by Salesforce company.

Advantages and Disadvantages of Reinforcement Learning

Advantages

o It helps in solving complex real-world problems which are difficult to be solved


by general techniques.
o The learning model of RL is similar to the learning of human beings; hence most
accurate results can be found.
o Helps in achieving long term results.

Disadvantage

o RL algorithms are not preferred for simple problems.


o RL algorithms require huge data and computations.
o Too much reinforcement learning can lead to an overload of states which can weaken
the results.

The Brain and the Neuron


THE BRAIN AND THE NEURON In animals, learning occurs within the brain. If we can
understand how the brain works, then there might be things in there for us to copy and use for
our machine learning systems. While the brain is an impressively powerful and complicated
system, the basic building blocks that it is made up of are fairly simple and easy to
understand. We’ll look at them shortly, but it’s worth noting that in computational terms the
brain does exactly what we want. It deals with noisy and even inconsistent data, and produces
answers that are usually correct from very high dimensional data (such as images) very
quickly. All amazing for something that weighs about 1.5 kg and is losing parts of itself all
the time (neurons die as you age at impressive/depressing rates), but Artificial Neural
Network Tutorial provides basic and advanced concepts of ANNs. Our Artificial Neural
Network tutorial is developed for beginners as well as professions.
The term "Artificial neural network" refers to a biologically inspired sub-field of artificial
intelligence modeled after the brain. An Artificial neural network is usually a computational
network based on biological neural networks that construct the structure of the human
brain. Similar to a human brain has neurons interconnected to each other, artificial neural
networks also have neurons that are linked to each other in various layers of the networks.
These neurons are known as nodes.

Artificial neural network tutorial covers all the aspects related to the artificial neural network.
In this tutorial, we will discuss ANNs, Adaptive resonance theory, Kohonen self-organizing
map, Building blocks, unsupervised learning, Genetic algorithm, etc.

What is Artificial Neural Network?

The term "Artificial Neural Network" is derived from Biological neural networks that
develop the structure of a human brain. Similar to the human brain that has neurons
interconnected to one another, artificial neural networks also have neurons that are
interconnected to one another in various layers of the networks. These neurons are known as
nodes.
The given figure illustrates the typical diagram of Biological Neural Network.

The typical Artificial Neural Network looks something like the given figure.

Dendrites from Biological Neural Network represent inputs in Artificial Neural Networks,
cell nucleus represents Nodes, synapse represents Weights, and Axon represents Output.

Relationship between Biological neural network and artificial neural network:


Biological Neural Artificial Neural Network
Network

Dendrites Inputs

Cell nucleus Nodes

Synapse Weights

Axon Output

An Artificial Neural Network in the field of Artificial intelligence where it attempts to


mimic the network of neurons makes up a human brain so that computers will have an option
to understand things and make decisions in a human-like manner. The artificial neural
network is designed by programming computers to behave simply like interconnected brain
cells.

There are around 1000 billion neurons in the human brain. Each neuron has an association
point somewhere in the range of 1,000 and 100,000. In the human brain, data is stored in such
a manner as to be distributed, and we can extract more than one piece of this data when
necessary from our memory parallelly. We can say that the human brain is made up of
incredibly amazing parallel processors.

We can understand the artificial neural network with an example, consider an example of a
digital logic gate that takes an input and gives an output. "OR" gate, which takes two inputs.
If one or both the inputs are "On," then we get "On" in output. If both the inputs are "Off,"
then we get "Off" in output. Here the output depends upon input. Our brain does not perform
the same task. The outputs to inputs relationship keep changing because of the neurons in our
brain, which are "learning."
The architecture of an artificial neural network:

To understand the concept of the architecture of an artificial neural network, we have to


understand what a neural network consists of. In order to define a neural network that
consists of a large number of artificial neurons, which are termed units arranged in a
sequence of layers. Lets us look at various types of layers available in an artificial neural
network.

Artificial Neural Network primarily consists of three layers:

Input Layer:

As the name suggests, it accepts inputs in several different formats provided by the programmer.

Hidden Layer:

The hidden layer presents in-between input and output layers. It performs all the calculations
to find hidden features and patterns.

Output Layer:

The input goes through a series of transformations using the hidden layer, which finally
results in output that is conveyed using this layer.

The artificial neural network takes input and computes the weighted sum of the inputs and
includes a bias. This computation is represented in the form of a transfer function.

It determines weighted total is passed as an


input to an
activation function to produce the output. Activation
functions choose whether a node should fire or not. Only those who are fired make it to the
output layer. There are distinctive activation functions available that can be applied upon the
sort of task we are performing.

Advantages of Artificial Neural Network (ANN)

Parallel processing capability:

Artificial neural networks have a numerical value that can perform more than one task
simultaneously.
Storing data on the entire network:

Data that is used in traditional programming is stored on the whole network, not on a
database. The disappearance of a couple of pieces of data in one place doesn't prevent the
network from working.

Capability to work with incomplete knowledge:


After ANN training, the information may produce output even with inadequate data. The loss
of performance here relies upon the significance of missing data.

Having a memory distribution:

For ANN is to be able to adapt, it is important to determine the examples and to encourage
the network according to the desired output by demonstrating these examples to the network.
The succession of the network is directly proportional to the chosen instances, and if the
event can't appear to the network in all its aspects, it can produce false output.

Having fault tolerance:

Extortion of one or more cells of ANN does not prohibit it from generating output, and this
feature makes the network fault-tolerance.

Disadvantages of Artificial Neural Network:

Assurance of proper network structure:

There is no particular guideline for determining the structure of artificial neural networks.
The appropriate network structure is accomplished through experience, trial, and error.

Unrecognized behavior of the network:

It is the most significant issue of ANN. When ANN produces a testing solution, it does not
provide insight concerning why and how. It decreases trust in the network.

Hardware dependence:

Artificial neural networks need processors with parallel processing power, as per their
structure. Therefore, the realization of the equipment is dependent.

Difficulty of showing the issue to the network:

ANNs can work with numerical data. Problems must be converted into numerical values
before being introduced to ANN. The presentation mechanism to be resolved here will
directly impact the performance of the network. It relies on the user's abilities.

The duration of the network is unknown:

The network is reduced to a specific value of the error, and this value does not give us
optimum results.
Science artificial neural networks that have steeped into the world in the mid-20 th century are
exponentially developing. In the present time, we have investigated the pros of artificial neural networks
and the issues encountered in the course of their utilization. It should not be overlooked that the cons of
ANN networks, which are a flourishing science branch, are eliminated individually, and their pros are
increasing day by day. It means that artificial neural networks will turn into an irreplaceable part of our
lives progressively important.

How do artificial neural networks work?

Artificial Neural Network can be best represented as a weighted directed graph, where the
artificial neurons form the nodes. The association between the neurons outputs and neuron
inputs can be viewed as the directed edges with weights. The Artificial Neural Network receives
the input signal from the external source in the form of a pattern and image in the form of a
vector. These inputs are then mathematically assigned by the notations x(n) for every n number
of inputs.

Afterward, each of the input is multiplied by its corresponding weights ( these weights are
the details utilized by the artificial neural networks to solve a specific problem ). In
general terms, these weights normally represent the strength of the interconnection
between neurons inside the artificial neural network. All the weighted
inputs are summarized inside the computing unit.

If the weighted sum is equal to zero, then bias is added to make the output non-zero or
something else to scale up to the system's response. Bias has the same input, and weight
equals to 1. Here the total of weighted inputs can be in the range of 0 to positive infinity.
Here, to keep the response in the limits of the desired value, a certain maximum value is
benchmarked, and the total of weighted inputs is passed through the activation function.
Choosing the Training Experience:
The first design choice we face is to choose the type of training experience from which our
system will learn. The type of training experience available can have a significant impact on
success or failure of the learner. One key attribute is whether the training experience provides
direct or indirect feedback regarding the choices made by the performance system. For
example, in learning to play checkers, the system might learn from direct training examples
consisting of individual checkers board states and the correct move for each.
In order to complete the design of the learning system, we must now choose
1. The exact type of knowledge to be learned
2. A representation for this target knowledge
3. A learning mechanism

Choosing the Target Function:


The next design choice is to determine exactly what type of knowledge will be learned and
how this will be used by the performance program. Let us begin with a checkers-playing
program that can generate the legal moves from any board state. The program needs only to
learn how to choose the best move from among these legal moves.
Choosing a Representation for the Target Function:

X1: the number of black pieces on the board


x2: the number of red pieces on the board
x3: the number of black kings on the board
x4: the number of red kings on the board
x5: the number of black pieces threatened by red (i.e., which can be captured on red's next
turn)
X6: the number of red pieces threatened by black
Thus, our learning program will represent V(b) as a linear function of the form
V(b)=w0+w1x1+w2x2+w3x3+w4x4+w5x5+w6x6

where wo through w6 are numerical coefficients, or weights, to be chosen by the


learning algorithm.
Partial design of a checkers learning program:
Task T: playing checkers
Performance measure P: percent of games won in the world tournament
Training experience E: games played against itself
Target function: V:BoardR
Target function representation
V(b)=w0+w1x1+w2x2+w3x3+w4x4+w5x5+w6x6
Choosing a Function Approximation Algorithm
In order to learn the target function f we require a set of training examples, each describing a
specific board state b and the training value Vtrain(b) for b.
ESTIMATING TRAINING VALUES
Rule for estimating training values.

Vtrain (b) V(Successor(b))

ADJUSTING THE WEIGHTS

The Final Design:


The Performance System is the module that must solve the given per- formance task, in this
case playing checkers, by using the learned target function(s). It takes an instance of a new
problem (new game) as input and produces a trace of its solution (game history) as output.
The Critic takes as input the history or trace of the game and produces as output a set of
training examples of the target function. As shown in the diagram, each training example
in this case corresponds to some game state in the trace, along with an estimate Vtrain, of
the target function value for this example.
The Generalizer takes as input the training examples and produces an output hypothesis
that is its estimate of the target function. It generalizes from the specific training examples,
hypothesizing a general function that covers these examples and other cases beyond the
training examples.
The Experiment Generator takes as input the current hypothesis (currently
learned function) and outputs a new problem (i.e., initial board state) for the Performance
System to explore. Its role is to pick new practice problems that will maximize the learning
rate of the overall system.

Fig: Final Design of Checkers Learning Problem


Fig: Summary of choices in designing checkers learning problem.

PERSPECTIVES AND ISSUES IN MACHINE LEARNING:

One useful perspective on machine learning is that it involves searching a very large space
of possible hypotheses to determine one that best fits the observed data and any prior
knowledge held by the learner. For example, consider the space of hypotheses that could in
principle be output by the above checkers learner. This hypothesis space consists of all
evaluation functions that can be represented by some choice of values for the weights wo
through w6.
The learner's task is thus to search through this vast space to locate the hypothesis that is
most consistent with the available training examples. The LMS algorithm for fitting
weights achieves this goal by iteratively tuning the weights, adding a correction to each
weight each time the hypothesized evaluation function predicts a value that differs from
the training value.

This algorithm works well when the hypothesis representation considered by the learner
defines a continuously parameterized space of potential hypotheses.
Issues in Machine Learning

 What algorithms exist for learning general target functions from specific training
examples? In what settings will particular algorithms converge to the desired
function, given sufficient training data? Which algorithms perform best for
which types of problems and representations?
 How much training data is sufficient? What general bounds can be found to
relate the confidence in learned hypotheses to the amount of training
experience and the character of the learner's hypothesis space?
 When and how can prior knowledge held by the learner guide the process of
generalizing from examples? Can prior knowledge be helpful even when it is
only approximately correct?
 What is the best strategy for choosing a useful next training experience, and how
does the choice of this strategy alter the complexity of the learning problem?
 What is the best way to reduce the learning task to one or more function
approximation problems? Put another way, what specific functions should the
system attempt to learn? Can this process itself be automated?
 How can the learner automatically alter its representation to improve its ability
to represent and learn the target function?

Concept Learning:

Concept learning: Inferring a boolean-valued function from training examples of its


input and output.

A CONCEPT LEARNING TASK:

What hypothesis representation shall we provide to the learner in this case? Let us begin
by considering a simple representation in which each hypothesis consists of a conjunction
of constraints on the instance attributes. In particular, let each hypothesis be a vector of
six constraints, specifying the values of the six attributes Sky, AirTemp, Humidity, Wind,
Water, and Forecast. For each attribute, the hypothesis will either indicate by a "?' that
any value is acceptable for this attribute, specify a single required value (e.g., Warm) for
the attribute, or indicate by a "θ" that no value is acceptable.
The inductive learning hypothesis. Any hypothesis found to approximate the
target function well over a sufficiently large set of training examples will also
approximate the target function well over other unobserved examples.

CONCEPT LEARNING AS SEARCH Concept learning can be viewed as the


task of
searching through a large space of hypotheses implicitly defined by the hypothesis
representation. The goal of this search is to find the hypothesis that best fits the training
examples. It is important to note that by selecting a hypothesis representation, the
designer of the learning algorithm implicitly defines the space of all hypotheses that the
program can ever represent and therefore can ever learn.

General-to-Specific Ordering of Hypotheses Many algorithms for concept


learning
organize the search through the hypothesis space by relying on a very useful structure
that exists for any concept learning problem: a general-to-specific ordering of
hypotheses. By taking advantage of this naturally occurring structure over the
hypothesis space, we can design learning algorithms that exhaustively search even
infinite hypothesis spaces without explicitly enumerating every hypothesis.

To illustrate the general-to-specific ordering, consider the two hypotheses

h1 = (Sunny, ?, ?, Strong, ?, ?)

h2 = (Sunny, ?, ?, ?, ?, ?)

Now consider the sets of instances that are classified positive by hl and by h2. Because
h2 imposes fewer constraints on the instance, it classifies more instances as positive. In
fact, any instance classified positive by hl will also be classified positive by h2.
Therefore, we say that h2 is more general than hl.

Definition: Let hj and hk be boolean-valued functions defined over X. Then hj is


more general-than-or-equal-to hk (written hj >= hk) if and only if
FIND-S: FINDING A MAXIMALLY SPECIFIC HYPOTHESIS:

Fig: Find-S Algorithm

VERSION SPACES AND THE CANDIDATE-ELIMINATION ALGORITHM:

The CANDIDATE-ELIMINATION algorithm finds all describable hypotheses that are


consistent with the observed training examples. In order to define this algorithm precisely,
we begin with a few basic definitions. First, let us say that a hypothesis is consistent with
the training examples if it correctly classifies these examples.
CANDIDATE-ELIMINATION Learning Algorithm:
REMARKS ON VERSION SPACES AND CANDIDATE-ELIMINATION
ALGORITHM:

Will the CANDIDATE-ELIMINATION Algorithm Converge to the Correct


Hypothesis?

The version space learned by the CANDIDATE-ELIMINATION algorithm will con-


verge toward the hypothesis that correctly describes the target concept, provided (1)
there are no errors in the training examples, and (2) there is some hypothesis in H that
correctly describes the target concept.

What Training Example Should the Learner Request Next? Up to this point we
have assumed that training examples are provided to the learner by some external
teacher. Suppose instead that the learner is allowed to conduct experiments in
which it chooses the next instance, then obtains the correct classification for this
instance from an external oracle (e.g., nature or a teacher).

How Can Partially Learned Concepts Be Used? Suppose that no addit onal
training
examples are available beyond the four in our example above, but that the learner is
now required to classify new instances that it has not yet observed. Even though the
version space still contains multiple hypotheses, indicating that the target concept has
not yet been fully learned, it is possible to classify certain examples with the same
degree of confidence as if the target concept had been uniquely identified.

INDUCTIVE BIAS:

Inductive bias of CANDIDATE-ELIMINATION algorithm. The concept c is


target contained in the given hypothesis space H.
Linear Discriminant Analysis (LDA) in Machine Learning

Linear Discriminant Analysis (LDA) is one of the commonly used dimensionality reduction
techniques in machine learning to solve more than two-class classification problems. It is
also known as Normal Discriminant Analysis (NDA) or Discriminant Function Analysis
(DFA).

This can be used to project the features of higher dimensional space into lower- dimensional
space in order to reduce resources and dimensional costs. In this topic, "Linear Discriminant
Analysis (LDA) in machine learning”, we will discuss the LDA algorithm for classification
predictive modeling problems, limitation of logistic regression, representation of linear
Discriminant analysis model, how to make a prediction using LDA, how to prepare data for
LDA, extensions to LDA and much more. So, let's start with a quick introduction to Linear
Discriminant Analysis (LDA) in machine learning.

What is Linear Discriminant Analysis (LDA)?

Although the logistic regression algorithm is limited to only two-class, linear Discriminant
analysis is applicable for more than two classes of classification problems.

Linear Discriminant analysis is one of the most popular dimensionality reduction


techniques used for supervised classification problems in machine learning. It is also
considered a pre-processing step for modeling differences in ML and applications of pattern
classification.

Whenever there is a requirement to separate two or more classes having multiple features
efficiently, the Linear Discriminant Analysis model is considered the most common
technique to solve such classification problems. For e.g., if we have two classes with multiple
features and need to separate them efficiently. When we classify them using a single feature,
then it may show overlapping.

To overcome the overlapping issue in the classification process, we must increase the number
of features regularly.
Example:

Let's assume we have to classify two different classes having two sets of data points in a 2-
dimensional plane as shown below image:

However, it is impossible to draw a straight line in a 2-d plane that can separate these data
points efficiently but using linear Discriminant analysis; we can dimensionally reduce the 2-
D plane into the 1-D plane. Using this technique, we can also maximize the separability
between multiple classes.

How Linear Discriminant Analysis (LDA) works?

Linear Discriminant analysis is used as a dimensionality reduction technique in machine


learning, using which we can easily transform a 2-D and 3-D graph into a 1- dimensional
plane.

Let's consider an example where we have two classes in a 2-D plane having an X-Y axis, and
we need to classify them efficiently. As we have already seen in the above example that LDA
enables us to draw a straight line that can completely separate the two classes of the data
points. Here, LDA uses an X-Y axis to create a new axis by separating them using a straight
line and projecting data onto a new axis.
Hence, we can maximize the separation between these classes and reduce the 2-D plane into
1-D.

To create a new axis, Linear Discriminant Analysis uses the following criteria:

o It maximizes the distance between means of two classes.


o It minimizes the variance within the individual class.
Using the above two conditions, LDA generates a new axis in such a way that it can
maximize the distance between the means of the two classes and minimizes the variation
within each class.

In other words, we can say that the new axis will increase the separation between the data
points of the two classes and plot them onto the new axis.

Why LDA?
o Logistic Regression is one of the most popular classification algorithms that perform
well for binary classification but falls short in the case of multiple classification
problems with well-separated classes. At the same time, LDA handles these quite
efficiently.
o LDA can also be used in data pre-processing to reduce the number of features, just as
PCA, which reduces the computing cost significantly.
o LDA is also used in face detection algorithms. In Fisherfaces, LDA is used to extract
useful data from different faces. Coupled with eigenfaces, it produces effective
results.

Drawbacks of Linear Discriminant Analysis (LDA)

Although, LDA is specifically used to solve supervised classification problems for two or
more classes which are not possible using logistic regression in machine learning.
But LDA also fails in some cases where the Mean of the distributions is shared. In this case,
LDA fails to create a new axis that makes both the classes linearly separable.

To overcome such problems, we use non-linear Discriminant analysis in machine learning.

Extension to Linear Discriminant Analysis (LDA)

Linear Discriminant analysis is one of the most simple and effective methods to solve
classification problems in machine learning. It has so many extensions and variations as
follows:

1. Quadratic Discriminant Analysis (QDA): For multiple input variables, each class
deploys its own estimate of variance.
2. Flexible Discriminant Analysis (FDA): it is used when there are non-linear groups
of inputs are used, such as splines.
3. Flexible Discriminant Analysis (FDA): This uses regularization in the estimate of
the variance (actually covariance) and hence moderates the influence of different
variables on LDA.

Real-world Applications of LDA

Some of the common real-world applications of Linear discriminant Analysis are given
below:

o Face Recognition
Face recognition is the popular application of computer vision, where each face is
represented as the combination of a number of pixel values. In this case, LDA is used
to minimize the number of features to a manageable number before going through the
classification process. It generates a new template in which each dimension consists
of a linear combination of pixel values. If a linear combination is generated using
Fisher's linear discriminant, then it is called Fisher's face.
o Medical
In the medical field, LDA has a great application in classifying the patient disease on
the basis of various parameters of patient health and the medical treatment which is
going on. On such parameters, it classifies disease as mild, moderate, or severe. This
classification helps the doctors in either increasing or decreasing the pace of the
treatment.

o Customer Identification
In customer identification, LDA is currently being applied. It means with the help of
LDA; we can easily identify and select the features that can specify the group of
customers who are likely to purchase a specific product in a shopping mall. This can
be helpful when we want to identify a group of customers who mostly purchase a
product in a shopping mall.
o For Predictions
LDA can also be used for making predictions and so in decision making. For example,
"will you buy this product” will give a predicted result of either one or two possible
classes as a buying or not.
o In Learning
Nowadays, robots are being trained for learning and talking to simulate human work,
and it can also be considered a classification problem. In this case, LDA builds similar
groups on the basis of different parameters, including pitches, frequencies, sound,
tunes, etc.

Perceptron in Machine Learning

In Machine Learning and Artificial Intelligence, Perceptron is the most commonly used term
for all folks. It is the primary step to learn Machine Learning and Deep Learning
technologies, which consists of a set of weights, input values or scores, and a threshold.
Perceptron is a building block of an Artificial Neural Network. Initially, in the mid of 19 th
century, Mr. Frank Rosenblatt invented the Perceptron for performing certain calculations
to detect input data capabilities or business intelligence. Perceptron is a linear Machine
Learning algorithm used for supervised learning for various binary classifiers. This algorithm
enables neurons to learn elements and processes them one by one during preparation. In this
tutorial, "Perceptron in Machine Learning," we will discuss in-depth knowledge of
Perceptron and its basic functions in brief. Let's start with the basic introduction of
Perceptron.
What is the Perceptron model in Machine Learning?

Perceptron is Machine Learning algorithm for supervised learning of various binary


classification tasks. Further, Perceptron is also understood as an Artificial Neuron or
neural network unit that helps to detect certain input data computations in business
intelligence.

Perceptron model is also treated as one of the best and simplest types of Artificial Neural
networks. However, it is a supervised learning algorithm of binary classifiers. Hence, we can
consider it as a single-layer neural network with four main parameters, i.e., input values,
weights and Bias, net sum, and an activation function.

What is Binary classifier in Machine Learning?

In Machine Learning, binary classifiers are defined as the function that helps in deciding
whether input data can be represented as vectors of numbers and belongs to some specific
class.

Backward Skip 10sPlay VideoForward Skip 10s

Binary classifiers can be considered as linear classifiers. In simple words, we can understand
it as a classification algorithm that can predict linear predictor function in terms of weight
and feature vectors.

Basic Components of Perceptron

Mr. Frank Rosenblatt invented the perceptron model as a binary classifier which contains
three main components. These are as follows:
o Input Nodes or Input Layer:

This is the primary component of Perceptron which accepts the initial data into
the system for further processing. Each input node contains a real numerical
value.

o Wight and Bias:

Weight parameter represents the strength of the connection between units. This
is another most important parameter of Perceptron components. Weight is
directly proportional to the strength of the associated input neuron in deciding
the output. Further, Bias can be considered as the line of intercept in a linear
equation.

o Activation Function:

These are the final and important components that help to determine whether the
neuron will fire or not. Activation Function can be considered primarily as a step
function.

Types of Activation functions:

o Sign function
o Step function, and
o Sigmoid function

The data scientist uses the activation function to take a subjective decision based
on various problem statements and forms the desired outputs. Activation
function may differ (e.g., Sign, Step, and Sigmoid) in perceptron models by
checking whether the learning process is slow or has vanishing or exploding
gradients.

How does Perceptron work?

In Machine Learning, Perceptron is considered as a single-layer neural network


that consists of four main parameters named input values (Input nodes), weights
and Bias, net sum, and an activation function. The perceptron model begins with
the multiplication of all input values and their weights, then adds these values
together to create the weighted sum. Then this weighted sum is applied to the
activation function 'f' to obtain the desired output. This activation function is also
known as the step function and is represented by 'f'.

This step function or Activation function plays a vital role in ensuring that output
is mapped between required values (0,1) or (-1,1). It is important to note that the
weight of input is indicative of the strength of a node. Similarly, an input's bias
value gives the ability to shift the activation function curve up or down.
Perceptron model works in two important steps as follows:

Step-1

In the first step first, multiply all input values with corresponding weight values
and then add them to determine the weighted sum. Mathematically, we can
calculate the weighted sum as follows:

∑wi*xi = x1*w1 + x2*w2 +…wn*xn

Add a special term called bias 'b' to this weighted sum to improve the model's
performance.

∑wi*xi + b

Step-2

In the second step, an activation function is applied with the above-mentioned


weighted sum, which gives us output either in binary form or a continuous value
as follows:
Y = f(∑wi*xi + b)

Linear separability

Linear separability is an important concept in machine learning, particularly in


the field of supervised learning. It refers to the ability of a set of data points to be
separated into distinct categories using a linear decision boundary. In other
words, if there exists a straight line that can cleanly divide the data into two
classes, then the data is said to be linearly separable.

Linear separability is a concept in machine learning that refers to the ability to


separate data points in binary classification problems using a linear decision
boundary. If the data points can be separated using a line, linear function, or flat
hyperplane, they are considered linearly separable. Linear separability is an
important concept in neural networks, and it is introduced in the context of linear
algebra and optimization theory.

In the context of machine learning, linear separability is an important property


because it makes classification problems easier to solve. If the data is linearly
separable, we can use a linear classifier, such as logistic regression or support
vector machines (SVMs), to accurately classify new instances of data.

Linearly separable data points can be separated using a line, linear function, or
flat hyperplane. In practice, there are several methods to determine whether data
is linearly separable[3]. One method is linear programming, which defines an
objective function subjected to constraints that satisfy linear separability.
Another method is to train and test on the same data - if there is a line that
separates the data points, then the accuracy or AUC should be close to 100%. If
there is no such line, then training and testing on the same data will result in at
least some error. Multi-layer neural networks can learn hidden features and
patterns in data that linear classifiers cannot

To understand the concept of linear separability, it is helpful to first consider a


simple two-dimensional example. Imagine we have a set of data points in a two-
dimensional space, where each point is labeled either "red" or "blue". If these
data points can be separated by a straight line, such that all the red points are on
one side of the line and all the blue points are on the other side, then the data is
linearly separable.

Python provides several methods to determine whether data is linearly separable.


One method is linear programming, which defines an objective function
subjected to constraints that satisfy linear separability. Another method is
clustering, where if two clusters with cluster purity of 100% can be found using
some clustering methods such as k-means, then the data is linearly separable.

However, not all data sets are linearly separable. In some cases, it may be
impossible to draw a straight line that can separate the data into distinct
categories. For example, imagine a set of data points that are arranged in a
circular pattern, with red and blue points interspersed throughout. In this case, it
is impossible to draw a straight line that separates the data into two classes.

When faced with data that is not linearly separable, machine learning algorithms
must use more complex decision boundaries to accurately classify the data. For
example, a decision tree or a neural network may be able to accurately classify
data that is not linearly separable.
Linear separability is not only important in the context of machine learning, but it
also has applications in other fields such as physics, biology, and economics. For
example, in physics, linear separability can be used to analyze the relationship
between two physical quantities. In biology, it can be used to study the behavior of
animals or to analyze genetic data. In economics, it can be used to analyze the
relationship between two economic variables.

Example:
One way to test for linear separability is to use linear programming. Linear
programming defines an objective function subject to constraints that satisfy linear
separability. The [Link]() function in Python can be used to solve
linear programming problems. Here's an example of using [Link]()
to test for linear separability:

1. import numpy as np
2. from [Link] import linprog
3.
4. # Define the data points
5. X = [Link]([[1, 2], [2, 3], [3, 1], [4, 3]])
6. y = [Link]([1, 1, -1, -1])
7.
8. # Define the objective function and constraints
9. c = [Link]([Link][1] + 1)
10. c[-1] = 1
11. A = [Link](([Link], [Link][1] + 1))
12. A[:, :-1] = -y[:, [Link]] * X
13. A[:, -1] = -y
14. b = -[Link]([Link])
15.
16. # Solve the linear programming problem
17. res = linprog(c, A_ub=A, b_ub=b)
18.
19. if [Link]:
20. print("The data is linearly separable.")
21. else:
22. print("The data is not linearly separable.")
In this example, we define a set of data points X and their corresponding labels y. We then
define the objective function and constraints for the linear programming problem. The objective
function is simply a vector of zeros with a 1 in the last position, which corresponds to the bias
term in the linear decision boundary. The constraints are defined such that the dot product of
each data point with the weight vector and bias term is greater than or equal to -1 for negative
examples and less than or equal to 1 for positive examples. We then solve the linear
programming problem using [Link]() and check if the solution is successful. If
the solution is successful, the data is linearly separable.

Another way to test for linear separability is to use a linear classifier or support vector machine
(SVM) with a linear kernel. For linearly separable datasets, a linear classifier or SVM with a
linear kernel can achieve 100% accuracy to classify data[4]. Here's an example of using a linear
SVM to test for linear separability:
1. from sklearn import svm
2.
3. # Define the data points
4. X = [Link]([[1, 2], [2, 3], [3, 1], [4, 3]])
5. y = [Link]([1, 1, -1, -1])
6.
7. # Train a linear SVM
8. clf = [Link](kernel='linear')
9. [Link](X, y)
10. if [Link](X, y) == 1.0:
11. print("The data is linearly separable.")
12. else:
13. print("The data is not linearly separable.")

In this example, we define a set of data points X and their corresponding labels y. We then train
a linear SVM using the [Link]() function from scikit-learn with a linear kernel. We then
check if the accuracy of the SVM on the training data is 100%.

Moreover, it is important to note that linear separability is not the only criterion for the
effectiveness of a classification algorithm. In some cases, even if the data is linearly separable, a
linear classifier may not be the best choice. For example, if the data is high-dimensional or
contains complex nonlinear relationships, a nonlinear classifier may be more effective. In such
cases, machine learning algorithms such as decision trees, random forests, or neural networks
may be more appropriate

In practice, determining whether a set of data points is linearly separable can be challenging.
There are several approaches that can be used to determine linear separability, including visual
inspection and mathematical analysis. One common approach is to use a scatter plot to visualize
the data and see if a straight line can be drawn that separates the data into two classes.

Another approach is to use mathematical tools to determine if the data is linearly separable. For
example, the Perceptron algorithm is a simple algorithm that can be used to determine linear
separability. The algorithm works by iterating through the data points and updating a set of
weights that define the decision boundary. If the algorithm is able to converge on a set of
weights that separates the data, then the data is linearly separable.

Another important aspect to consider is that linear separability is not an inherent property of the
data itself, but rather a property of the representation of the data. In other words, the way in
which the data is transformed or encoded can affect whether or not it is linearly separable. This
means that preprocessing techniques such as feature selection,

In conclusion, linear separability is an important concept in machine learning that refers to the
ability of a set of data points to be separated into distinct categories using a linear decision
boundary. Linear separability makes classification problems easier to solve, but not all data sets
are linearly separable. linear separability is an important concept in machine learning that allows
us to determine if a binary classification problem can be separated using a linear decision
boundary. In practice, determining whether a set of data points is linearly separable can be
challenging, and there are several approaches that can be used to determine linear separability,
including visual inspection and mathematical analysis.

Linear Regression in Machine Learning

Linear regression is one of the easiest and most popular Machine Learning algorithms. It is a
statistical method that is used for predictive analysis. Linear regression makes predictions for
continuous/real or numeric variables such as sales, salary, age, product price, etc.

Linear regression algorithm shows a linear relationship between a dependent (y) and one or
more independent (y) variables, hence called as linear regression. Since linear regression
shows the linear relationship, which means it finds how the value of the dependent variable is
changing according to the value of the independent variable.

The linear regression model provides a sloped straight line representing the relationship
between the variables. Consider the below image:

Mathematically, we can represent a linear regression as:

y= a0+a1x+ ε
Here,

Y= Dependent Variable (Target Variable)

X= IndependentVariable (predictor Variable)

a0= intercept of the line (Gives an additional degree of freedom)

a1 = Linear regression coefficient (scale factor to each input value).

ε = random error

The values for x and y variables are training datasets for Linear Regression model
representation.

Types of Linear Regression

Linear regression can be further divided into two types of the algorithm:

o Simple Linear Regression: If a single independent variable is used to predict the


value of a numerical dependent variable, then such a Linear Regression algorithm is
called Simple Linear Regression.

o Multiple Linear regression: If more than one independent variable is used to predict
the value of a numerical dependent variable, then such a Linear Regression algorithm
is called Multiple Linear Regression.

Linear Regression Line

A linear line showing the relationship between the dependent and independent variables is
called a regression line. A regression line can show two types of relationship:

o Positive Linear Relationship: If the dependent variable increases on the Y-axis and
independent variable increases on X-axis, then such a relationship is termed as a
Positive linear relationship.

o Negative Linear Relationship: If the dependent variable decreases on the Y-axis and
independent variable increases on the X-axis, then such a relationship is called a
negative linear relationship.

Finding the best fit line:

When working with linear regression, our main goal is to find the best fit line that means the
error between predicted values and actual values should be minimized. The best fit line will
have the least error.

The different values for weights or the coefficient of lines (a 0, a1) gives a different line of
regression, so we need to calculate the best values for a0 and a1 to find the best fit line, so to
calculate this we use cost function.

Cost function-
o The different values for weights or coefficient of lines (a0, a1) gives the different line
of regression, and the cost function is used to estimate the values of the coefficient for
the best fit line.
o Cost function optimizes the regression coefficients or weights. It measures how a
linear regression model is performing.
o We can use the cost function to find the accuracy of the mapping function, which
maps the input variable to the output variable. This mapping function is also known as
Hypothesis function.

For Linear Regression, we use the Mean Squared Error (MSE) cost function, which is the
average of squared error occurred between the predicted values and actual values. It can be
written as:

For the above linear equation, MSE can be calculated as:

Where,

N=Total number of observation


Yi = Actual value
(a1xi+a0)= Predicted value.

Residuals: The distance between the actual value and predicted values is called residual. If
the observed points are far from the regression line, then the residual will be high, and so cost
function will high. If the scatter points are close to the regression line, then the residual will
be small and hence the cost function.
Gradient Descent:
o Gradient descent is used to minimize the MSE by calculating the gradient of the cost function.
o A regression model uses gradient descent to update the coefficients of the line by reducing the
cost function.
o It is done by a random selection of values of coefficient and then iteratively update the values to
reach the minimum cost function.

Model Performance:

The Goodness of fit determines how the line of regression fits the set of observations. The process of
finding the best model out of various models is called optimization. It can be achieved by below method:

1. R-squared method:

o R-squared is a statistical method that determines the goodness of fit.


o It measures the strength of the relationship between the dependent and independent variables on a
scale of 0-100%.
o The high value of R-square determines the less difference between the predicted values and actual
values and hence represents a good model.
o It is also called a coefficient of determination, or coefficient of multiple determination
for multiple regression.
o It can be calculated from the below formula:

Assumptions of Linear Regression

Below are some important assumptions of Linear Regression. These are some formal checks while
building a Linear Regression model, which ensures to get the best possible result from the given dataset.

o Linear relationship between the features and target: Linear regression assumes the linear
relationship between the dependent and independent variables.
o Small or no multicollinearity between the features:
Multicollinearity means high-correlation between the independent variables. Due to
multicollinearity, it may difficult to find the true relationship between the predictors and target
variables. Or we can say, it is difficult to determine which predictor variable is affecting the target
variable and which is not. So, the model assumes either little or no multicollinearity between the
features or independent variables.
o Homoscedasticity Assumption:
Homoscedasticity is a situation when the error term is the same for all the values of independent
variables. With homoscedasticity, there should be no clear pattern distribution of data in the scatter
plot.
o Normal distribution of error terms: Linear regression assumes that the error term should follow
the normal distribution pattern. If error terms are not normally distributed, then confidence
intervals will become either too wide or too narrow, which may cause difficulties in finding
coefficients.
It can be checked using the q-q plot. If the plot shows a straight line without any deviation, which
means the error is normally distributed.
o No autocorrelations:
The linear regression model assumes no autocorrelation in error terms. If there will be any
correlation in the error term, then it will drastically reduce the accuracy of the model.
Autocorrelation usually occurs if there is a dependency between residual errors.

You might also like