WELL-POSED LEARNING PROBLEMS

A well-posed learning problem clearly specifies what is to be learned, how learning success is measured, and from what experience the learning occurs. This precise formulation helps in designing, analyzing, and comparing machine learning algorithms.

Definition: A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.

In general, a well-posed learning problem must identify three features:
1. The class of tasks (T)
2. The measure of performance to be improved (P)
3. The source of experience (E)

Examples

1. Checkers game: A computer program that learns to play checkers might improve its performance, as measured by its ability to win at the class of tasks involving playing checkers games, through experience obtained by playing games against itself.

A checkers learning problem:
- Task T: playing checkers
- Performance measure P: percent of games won against opponents
- Training experience E: playing practice games against itself

2. A handwriting recognition learning problem:
- Task T: recognizing and classifying handwritten words within images
- Performance measure P: percent of words correctly classified
- Training experience E: a database of handwritten words with given classifications
Designing a Learning System

Designing a learning system involves a sequence of well-defined decisions that determine what is learned, how it is learned, and how learning improves performance. The design process can be explained as follows:

1. Define the performance task (T): The first step is to clearly specify the task the system must perform. Example: playing the game of checkers.

2. Choose the performance measure (P): A quantitative measure is selected to evaluate how well the system performs the task. Example: percentage of games won in the world checkers tournament.

3. Choose the training experience (E): Decide how the system will gain experience. The experience may provide direct or indirect feedback. Example: learning by playing games against itself, without a human teacher.

4. Consider the degree of control over the training data: The learner may depend on a teacher, query an expert, or autonomously generate its own training examples. In self-play, the learner controls both the problems and the outcomes.

5. Ensure the representativeness of the training experience: Training examples should closely match the distribution of situations in which final performance is measured. A mismatch between the training and testing distributions can reduce performance.

6. Choose the target function to be learned: The learning task is reduced to learning a specific function that improves performance. Example: learning an evaluation function V that assigns a numerical value to each board state.

7. Define an ideal (non-operational) target function: The ideal target function specifies correct behavior but may be computationally infeasible. Learning aims to approximate this ideal function.

8. Choose a representation for the target function: Select how the function will be represented, balancing expressiveness and learnability. Example: representing the evaluation function as a linear combination of board features.

9. Choose a learning (function approximation) algorithm: A suitable algorithm is chosen to adjust parameters based on training examples. Example: the Least Mean Squares (LMS) algorithm, which uses gradient descent to minimize squared error.

10. Assemble the complete learning architecture: The final system consists of four modules:
- Performance System – uses the learned function to perform the task
- Critic – evaluates performance and generates training examples
- Generalizer – learns a general hypothesis from the examples
- Experiment Generator – creates new experiences to improve learning
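The LMS weight-update rule mentioned above can be sketched as follows. This is a minimal illustration, not Mitchell's full checkers learner: the feature values and learning rate are assumptions made for the example.

```python
# Minimal sketch of the LMS rule for a linear evaluation function
# V_hat(b) = w0 + w1*x1 + ... + wn*xn.
# The board features and learning rate below are illustrative assumptions.

def v_hat(weights, features):
    """Linear evaluation: w0 + sum of w_i * x_i."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], features))

def lms_update(weights, features, v_train, lr=0.01):
    """One LMS step: w_i <- w_i + lr * (V_train - V_hat(b)) * x_i."""
    error = v_train - v_hat(weights, features)
    new_w = [weights[0] + lr * error]          # x0 is implicitly 1
    new_w += [w + lr * error * x for w, x in zip(weights[1:], features)]
    return new_w

weights = [0.0, 0.0, 0.0]          # w0 plus two feature weights
features = [3.0, 1.0]              # e.g. piece counts (assumed values)
for _ in range(200):               # repeated updates on one training value
    weights = lms_update(weights, features, v_train=1.0)
print(round(v_hat(weights, features), 3))   # converges toward 1.0
```

Each update nudges the weights in the direction that reduces the squared error (V_train − V_hat)², which is exactly the gradient-descent behavior the text describes.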
A Concept Learning Task

Meaning of concept learning: Concept learning is the task of learning a boolean-valued target concept that classifies instances as positive or negative based on their attributes.

Example problem (EnjoySport): The learning task is to determine the days on which Aldo enjoys his favorite water sport, based on observable weather-related attributes.

Instances (X): The set of instances consists of all possible days, each described by six attributes: Sky, AirTemp, Humidity, Wind, Water, and Forecast.

Target concept (c): The target concept is EnjoySport, defined as c(x) = 1 if Aldo enjoys the sport on day x, and c(x) = 0 otherwise.

Training examples (D): Each training example is an ordered pair (x, c(x)), where x is a day and c(x) indicates whether it is a positive (Yes) or negative (No) example.

Hypothesis representation (H): Each hypothesis is represented as a conjunction of constraints on the six attributes, where each constraint can be:
- "?" → any value is acceptable
- a specific value (e.g., Warm)
- "0" → no value is acceptable

Hypothesis evaluation: A hypothesis h classifies an instance x as positive, h(x) = 1, if all attribute constraints are satisfied.

Most general and most specific hypotheses:
- Most general hypothesis: ⟨?, ?, ?, ?, ?, ?⟩ → every instance is positive
- Most specific hypothesis: ⟨0, 0, 0, 0, 0, 0⟩ → no instance is positive

Goal of concept learning: The learner's goal is to find a hypothesis h ∈ H such that h(x) = c(x) for all instances x ∈ X, based only on the training data.

Inductive Learning Hypothesis: Since the learner observes only a subset of instances, concept learning relies on the assumption that a hypothesis that fits a sufficiently large set of training examples well will also perform well on unseen examples.
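The hypothesis representation and evaluation rule above can be sketched in code. The sample day and hypotheses are illustrative assumptions chosen to match the EnjoySport attribute order.

```python
# Sketch of conjunctive-hypothesis evaluation for EnjoySport.
# A hypothesis is a tuple of six constraints: "?" (any value acceptable),
# a specific value (e.g. "Warm"), or "0" (no value acceptable).

def matches(hypothesis, instance):
    """h(x) = 1 iff every attribute constraint is satisfied."""
    return all(c == "?" or c == v for c, v in zip(hypothesis, instance))

day = ("Sunny", "Warm", "Normal", "Strong", "Warm", "Same")
h = ("Sunny", "Warm", "?", "?", "?", "?")

print(matches(h, day))          # True: all constraints satisfied
print(matches(("?",) * 6, day)) # most general hypothesis: True
print(matches(("0",) * 6, day)) # most specific hypothesis: False
```

Note that "0" never equals any attribute value, so ⟨0, 0, 0, 0, 0, 0⟩ rejects every instance, matching the definition of the most specific hypothesis.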
FIND-S: Finding a Maximally Specific Hypothesis

Purpose of FIND-S: FIND-S is a concept learning algorithm that finds the most specific hypothesis in the hypothesis space H that is consistent with the observed training examples.

Search strategy: The algorithm organizes the hypothesis search using the more-general-than partial ordering, moving from specific hypotheses to more general ones only when required.

Initialization: FIND-S begins with the most specific hypothesis in H, typically represented by constraints that reject all instances (e.g., ⟨0, 0, 0, 0, 0, 0⟩).

Processing positive examples: When a positive training example is encountered that is not covered by the current hypothesis, the hypothesis is minimally generalized to include that example.

Minimal generalization rule: Each attribute constraint that conflicts with the positive example is replaced by the example's attribute value, or by "?" if needed to remain consistent with earlier examples.

Handling negative examples: FIND-S ignores all negative training examples. If the current hypothesis already classifies a negative example correctly, no change is required.

Justification for ignoring negatives: Assuming the true target concept lies in H and the data is noise-free, the most specific hypothesis consistent with all positive examples will never incorrectly classify a negative example.

Search behavior: The algorithm follows a single chain in the hypothesis space, moving upward from the most specific hypothesis to increasingly general hypotheses.

Final output: The result of FIND-S is the maximally specific hypothesis that is consistent with all observed positive training examples and, indirectly, with the negative examples as well.

Limitations of FIND-S:
- Cannot determine whether the learned hypothesis is the unique correct target concept
- Always prefers the most specific hypothesis, without justification
- Cannot handle noise or inconsistent training data
- Cannot manage cases with multiple maximally specific hypotheses
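The FIND-S procedure described above can be sketched as follows. The training rows are the standard illustrative EnjoySport examples, assumed here for demonstration rather than taken from this document.

```python
# Sketch of the FIND-S algorithm for conjunctive hypotheses.
# Constraints: "0" = no value acceptable, "?" = any value, else a literal.

def find_s(examples):
    """Return the maximally specific hypothesis covering all positives."""
    n = len(examples[0][0])
    h = ["0"] * n                        # start with the most specific h
    for x, label in examples:
        if label != "Yes":               # negative examples are ignored
            continue
        for i, value in enumerate(x):
            if h[i] == "0":              # first positive: copy its values
                h[i] = value
            elif h[i] != value:          # conflict: generalize to "?"
                h[i] = "?"
    return h

# Illustrative EnjoySport training set (assumed for this sketch).
data = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"),   "Yes"),
    (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"),   "Yes"),
    (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), "No"),
    (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), "Yes"),
]
print(find_s(data))   # ['Sunny', 'Warm', '?', 'Strong', '?', '?']
```

The single pass over the positives is the "single chain" behavior noted above: the hypothesis only ever moves upward, from ⟨0,...,0⟩ toward more general hypotheses.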
CANDIDATE-ELIMINATION Learning Algorithm

Purpose of the algorithm: The CANDIDATE-ELIMINATION algorithm computes the version space, which is the set of all hypotheses in the hypothesis space H that are consistent with the observed training examples.

Version space representation: The version space is represented implicitly using two boundary sets:
- S: the set of maximally specific hypotheses
- G: the set of maximally general hypotheses

Initialization:
- G₀ is initialized to the most general hypothesis in H: ⟨?, ?, ?, ?, ?, ?⟩
- S₀ is initialized to the most specific hypothesis in H: ⟨0, 0, 0, 0, 0, 0⟩

Processing positive examples: When a positive training example is encountered:
- Remove from G all hypotheses that do not cover the example
- Minimally generalize hypotheses in S so that they cover the example

Minimal generalization of S: Each hypothesis in S that is inconsistent with the positive example is replaced by its minimal generalizations, provided they are still more specific than some hypothesis in G.

Processing negative examples: When a negative training example is encountered:
- Remove from S all hypotheses that incorrectly cover the negative example
- Minimally specialize hypotheses in G so that they exclude the negative example

Minimal specialization of G: Each hypothesis in G that is inconsistent with a negative example is replaced by its minimal specializations, provided they are still more general than some hypothesis in S.

Boundary maintenance:
- Remove from S any hypothesis that is more general than another hypothesis in S
- Remove from G any hypothesis that is less general than another hypothesis in G

Monotonic convergence: As more training examples are processed, the S boundary becomes more general and the G boundary becomes more specific, steadily shrinking the version space.

Final outcome and properties: After all training examples are processed, the version space contains all and only those hypotheses consistent with the data. The final version space is independent of the order of the training examples, assuming the data is noise-free and the target concept lies in H.
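The boundary updates above can be sketched compactly for the conjunctive EnjoySport space. The attribute domains and training rows are illustrative assumptions, and the sketch presumes a positive example is seen before the first negative (a full implementation would handle the empty-S edge cases).

```python
# Compact sketch of CANDIDATE-ELIMINATION for conjunctive hypotheses.
# Attribute domains below are illustrative assumptions for EnjoySport.

DOMAINS = [("Sunny", "Cloudy", "Rainy"), ("Warm", "Cold"),
           ("Normal", "High"), ("Strong", "Weak"),
           ("Warm", "Cool"), ("Same", "Change")]

def covers(h, x):
    return all(c == "?" or c == v for c, v in zip(h, x))

def more_general_or_equal(h1, h2):
    return all(a == "?" or a == b for a, b in zip(h1, h2))

def generalize(s, x):
    """Minimal generalization of s so that it covers positive x."""
    return tuple(v if c == "0" else ("?" if c != v else c)
                 for c, v in zip(s, x))

def specialize(g, x):
    """Minimal specializations of g that exclude negative x."""
    out = []
    for i, c in enumerate(g):
        if c == "?":
            for v in DOMAINS[i]:
                if v != x[i]:
                    out.append(g[:i] + (v,) + g[i + 1:])
    return out

def candidate_elimination(examples):
    S = {("0",) * 6}                     # most specific boundary
    G = {("?",) * 6}                     # most general boundary
    for x, label in examples:
        if label == "Yes":
            G = {g for g in G if covers(g, x)}
            S = {generalize(s, x) for s in S}
            S = {s for s in S if any(more_general_or_equal(g, s) for g in G)}
        else:
            S = {s for s in S if not covers(s, x)}
            G = ({h for g in G if covers(g, x) for h in specialize(g, x)
                  if any(more_general_or_equal(h, s) for s in S)}
                 | {g for g in G if not covers(g, x)})
            # drop G members less general than another member of G
            G = {g for g in G
                 if not any(h != g and more_general_or_equal(h, g) for h in G)}
    return S, G

data = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"),   "Yes"),
    (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"),   "Yes"),
    (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), "No"),
    (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), "Yes"),
]
S, G = candidate_elimination(data)
print(sorted(S))  # [('Sunny', 'Warm', '?', 'Strong', '?', '?')]
print(sorted(G))  # [('?', 'Warm', ...), ('Sunny', '?', ...)]
```

On this data the final boundaries are S = {⟨Sunny, Warm, ?, Strong, ?, ?⟩} and G = {⟨Sunny, ?, ?, ?, ?, ?⟩, ⟨?, Warm, ?, ?, ?, ?⟩}: every hypothesis between them is consistent with all four examples.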
Version space concept: The version space (VS) is the set of all hypotheses in a hypothesis space H that are consistent with the given training data D.

Need for compact representation: Explicitly listing all hypotheses in the version space is often infeasible due to its large size. Hence, a compact representation is required.

Boundary-based representation: The version space can be represented compactly using only its most general and most specific members.

Partial ordering of hypotheses: Hypotheses in H are organized using the more-general-than (≥_g) partial ordering, which allows comparison of hypotheses by their generality.

General boundary (G): The general boundary G is the set of maximally general hypotheses in H that are consistent with the training data D. Formally,

G = { g ∈ H | Consistent(g, D) ∧ ∄ g′ ∈ H [ (g′ >_g g) ∧ Consistent(g′, D) ] }

Specific boundary (S): The specific boundary S is the set of minimally general (maximally specific) hypotheses in H that are consistent with the training data D. Formally,

S = { s ∈ H | Consistent(s, D) ∧ ∄ s′ ∈ H [ (s >_g s′) ∧ Consistent(s′, D) ] }

Version Space Representation Theorem: The version space consists of exactly those hypotheses that lie between S and G in the general-to-specific ordering. Mathematically,

VS_{H,D} = { h ∈ H | ∃ s ∈ S, ∃ g ∈ G : g ≥_g h ≥_g s }
INDUCTIVE BIAS

1. Inductive Bias: Inductive bias refers to the set of assumptions a learning algorithm uses to generalize beyond the observed training data. Machine learning requires such assumptions because, without inductive bias, it is impossible to uniquely determine the target concept from finite data.

2. Biased Hypothesis Space: A hypothesis space is biased when it restricts the type of hypotheses the learner can represent. The conjunctive hypothesis space used above contains only conjunctions of attribute values, so it cannot represent disjunctive concepts such as "Sky = Sunny OR Sky = Cloudy". Example:

  Example  Sky     AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
  1        Sunny   Warm     Normal    Strong  Warm   Same      Yes
  2        Cloudy  Warm     Normal    Strong  Cool   Change    Yes
  3        Rainy   Cold     High      Weak    Cool   Change    No

Like the hypothesis space of a decision tree learner, this representation is constrained.

Unbiased learner: To ensure the target concept is always representable, choose H as the power set of X, containing every possible concept over the instances. This eliminates the representation problem above. The S boundary then becomes the disjunction of the observed positive examples, and the G boundary becomes the negated disjunction of the observed negative examples.

3. Futility of bias-free learning: A learner without inductive bias has no rational basis for classifying unseen instances. Therefore, without bias the learner cannot generalize and cannot classify anything beyond the observed examples.

4. Inductive Bias of CANDIDATE-ELIMINATION: The inductive bias of CANDIDATE-ELIMINATION is the assumption that the target concept is contained in the hypothesis space H. If this assumption is correct, the algorithm generalizes correctly; if it is incorrect, the algorithm misclassifies some instances, because the true concept lies outside H.
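The limitation of a purely conjunctive hypothesis space can be seen directly: minimally generalizing the two positive rows from the table forces Sky to "?", so the learner cannot express "Sky = Sunny OR Sky = Cloudy". The helper below is a sketch of that single generalization step, using the two positive rows as input.

```python
# Sketch: why a conjunctive hypothesis space cannot express
# "Sky = Sunny OR Sky = Cloudy". Generalizing the two positive rows
# forces Sky to "?", which accepts every Sky value, including Rainy.

def minimal_conjunctive_generalization(x1, x2):
    """Keep attributes where the examples agree; use '?' elsewhere."""
    return tuple(a if a == b else "?" for a, b in zip(x1, x2))

pos1 = ("Sunny",  "Warm", "Normal", "Strong", "Warm", "Same")
pos2 = ("Cloudy", "Warm", "Normal", "Strong", "Cool", "Change")

h = minimal_conjunctive_generalization(pos1, pos2)
print(h)      # ('?', 'Warm', 'Normal', 'Strong', '?', '?')
print(h[0])   # '?' — Sky is unconstrained, not "Sunny or Cloudy"
```

Because no conjunction of single-attribute constraints can accept exactly Sunny and Cloudy while rejecting Rainy, the only options are over-generalizing (Sky = "?") or excluding one of the positives, which is precisely the bias the section describes.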