Q1. Describe the various forms of learning in AI.
Machine Learning is broadly classified into three main forms based on the nature of the data
and the feedback mechanism used for learning:
| Form of Learning | Data Type | Goal | Feedback Mechanism | Example |
|---|---|---|---|---|
| Supervised Learning | Labeled data (input-output pairs are provided). | To learn a mapping function that can predict the output for new, unseen input data. | Direct feedback. The algorithm's predictions are compared against the known correct labels. | Spam Detection: training the model with emails labeled as 'spam' or 'not spam' to classify new emails. |
| Unsupervised Learning | Unlabeled data (only input data is provided). | To discover hidden patterns, structures, or relationships within the data without any pre-existing labels. | No feedback. The algorithm explores the data's intrinsic structure on its own. | Customer Segmentation: grouping customers into distinct market segments based on their purchasing behavior. |
| Reinforcement Learning | No predefined data. An agent interacts with an environment. | To learn an optimal policy (a sequence of actions) that maximizes cumulative reward over time. | Indirect feedback via rewards (for good actions) and penalties (for bad actions). | Game Playing: training an agent to play Chess by rewarding it for winning the game and penalizing it for losing. |
Q2. Distinguish between Supervised and Unsupervised Learning.
| Feature | Supervised Learning | Unsupervised Learning |
|---|---|---|
| Input Data | Requires labeled data where each data point has a known output or tag. | Works with unlabeled data where the outputs are unknown. |
| Goal | To predict an outcome or classify data. The goal is predefined. | To discover hidden patterns, groupings, or the inherent structure of the data. |
| Complexity | Generally less complex, as the learning is guided by the labels. | Can be more complex, as there is no "correct" answer to guide the learning process. |
| Key Algorithms | Linear Regression, Logistic Regression, Decision Trees, Support Vector Machines (SVM). | K-Means Clustering, Hierarchical Clustering, Principal Component Analysis (PCA). |
| Feedback | The model learns by comparing its output with the correct, labeled output and correcting for errors. | There is no feedback mechanism based on a known output. |
Q3. How do you choose and evaluate the best hypothesis? (including Ockham's Razor)
The process of choosing and evaluating the best hypothesis (model) involves training,
testing, and applying a guiding principle for model selection.
1. Hypothesis: In machine learning, a hypothesis is the model or function learned by an
algorithm that maps inputs to outputs.
2. Evaluation Method (Train/Test Split):
o The dataset is split into two parts: a Training Set (used to train the model)
and a Test Set (used to evaluate the trained model's performance on unseen
data).
o A good hypothesis performs well not just on the training set, but also on the
test set. This ability to perform well on unseen data is called generalization.
3. Choosing the Best Hypothesis (Ockham's Razor Principle):
o Principle: "Among competing hypotheses that explain the data equally well,
the simplest one should be preferred."
o Significance in ML: This principle guides us to choose the least complex
model that achieves good performance. Simpler models are less likely to be
influenced by random noise in the training data and therefore tend
to generalize better, avoiding the problem of overfitting.
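The train/test evaluation described above can be sketched in plain Python (standard library only; the helper name `train_test_split` is chosen for illustration and is not the scikit-learn function of the same name):

```python
import random

def train_test_split(data, test_ratio=0.2, seed=42):
    """Shuffle the data, then hold out a fraction of it as the test set."""
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    shuffled = data[:]                 # copy so the original order is untouched
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]   # (training set, test set)

train, test = train_test_split(list(range(100)), test_ratio=0.2)
print(len(train), len(test))   # 80 training examples, 20 test examples
```

A hypothesis that scores well on `train` but poorly on `test` is the classic sign of overfitting discussed in the next question.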
Q4. What is overfitting and underfitting, and how do they affect generalization?
Overfitting and underfitting are two of the most common problems that cause a machine
learning model to have poor generalization (i.e., perform poorly on new, unseen data).
| Feature | Overfitting | Underfitting |
|---|---|---|
| Definition | The model learns the training data too well, capturing not only the underlying patterns but also the noise and random fluctuations. | The model is too simple to capture the underlying structure and patterns present in the data. |
| Performance | Training error is very low (high accuracy on seen data). Test error is very high (low accuracy on unseen data). | Both training and test errors are high. The model is inaccurate on all data. |
| Bias / Variance | Low bias, high variance. The model is too sensitive to the training data. | High bias, low variance. The model is too rigid and makes strong assumptions about the data. |
| Effect on Generalization | Poor generalization. The model fails to predict future data accurately because it has "memorized" the training data instead of learning the general trend. | Poor generalization. The model fails to make accurate predictions because it has not learned the relevant patterns from the training data. |
Q5. Explain cross-validation in machine learning.
Cross-validation is a resampling technique used to evaluate machine learning models on a
limited data sample. It provides a more robust estimate of a model's performance and its
ability to generalize than a simple train/test split.
K-Fold Cross-Validation (Most Common Method):
1. Partition: The dataset is randomly divided into k equal-sized subsets, or "folds".
2. Iterate: The process is run k times. In each iteration:
o One fold is held out as the validation set.
o The remaining k-1 folds are used as the training set.
o The model is trained on the training set and its performance is evaluated on
the validation set.
3. Average: The performance scores from the k iterations are averaged to produce a
single, more reliable performance estimate for the model.
Purpose: It helps prevent the evaluation from being biased by a single, potentially "lucky" or
"unlucky" train/test split.
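The k-fold partitioning above can be sketched as a small index generator (a minimal stdlib-only sketch; `kfold_indices` is an illustrative name, not a library API):

```python
def kfold_indices(n, k):
    """Yield (train_indices, validation_indices) pairs for k-fold CV."""
    # Distribute any remainder so fold sizes differ by at most one.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    indices = list(range(n))
    start = 0
    for size in fold_sizes:
        val = indices[start:start + size]            # held-out fold
        train = indices[:start] + indices[start + size:]  # remaining k-1 folds
        yield train, val
        start += size

for train_idx, val_idx in kfold_indices(10, 5):
    print(len(train_idx), len(val_idx))   # each iteration: 8 train, 2 validation
```

In practice the model would be trained and scored once per iteration, and the k scores averaged as described in step 3.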
Q6. Explain the Decision Tree learning algorithm (ID3).
The ID3 (Iterative Dichotomiser 3) algorithm builds a decision tree using a top-down, greedy
approach. Its goal is to create a tree that classifies examples by sorting them down the tree
from the root to some leaf node.
Algorithm Steps:
1. Start: Begin with the root node, which contains the entire training dataset.
2. Check for Base Cases (Stopping Conditions):
o If all examples in the current node belong to the same class, create a leaf
node with that class label and stop.
o If there are no more attributes to split on, create a leaf node with the
majority class label of the examples in the current node.
3. Find the Best Attribute:
o For each attribute, calculate its Information Gain. Information Gain measures
how much the attribute reduces the uncertainty (entropy) about the final
classification.
o Select the attribute with the highest Information Gain.
4. Split the Node:
o Make the selected attribute the decision node.
o Create a new branch for each possible value of that attribute.
o Divide the examples into subsets based on their value for the selected
attribute and move them down the corresponding branches.
5. Recurse: Recursively apply steps 2-4 to each new subset (branch) until all examples
are classified.
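The steps above can be sketched as a compact recursive implementation (a minimal illustrative sketch for categorical attributes; the data format here, a list of dicts with a parallel label list, is an assumption made for this example):

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def best_attribute(rows, labels, attributes):
    """Step 3: pick the attribute with the highest information gain."""
    base = entropy(labels)
    def gain(attr):
        remainder = 0.0
        for value in set(row[attr] for row in rows):
            subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
            remainder += len(subset) / len(labels) * entropy(subset)
        return base - remainder
    return max(attributes, key=gain)

def id3(rows, labels, attributes):
    """Steps 2-5: build a decision tree as nested dicts."""
    if len(set(labels)) == 1:                 # base case: pure node
        return labels[0]
    if not attributes:                        # base case: no attributes left
        return Counter(labels).most_common(1)[0][0]   # majority label
    attr = best_attribute(rows, labels, attributes)
    tree = {attr: {}}
    for value in set(row[attr] for row in rows):      # step 4: one branch per value
        idx = [i for i, row in enumerate(rows) if row[attr] == value]
        tree[attr][value] = id3([rows[i] for i in idx],
                                [labels[i] for i in idx],
                                [a for a in attributes if a != attr])
    return tree

rows = [{'A': 1, 'B': 0}, {'A': 1, 'B': 1}, {'A': 0, 'B': 0}, {'A': 0, 'B': 1}]
labels = ['yes', 'yes', 'no', 'no']           # label depends only on A
print(id3(rows, labels, ['A', 'B']))
```

On this toy data the label is fully determined by attribute A, so A has the highest gain and the tree splits on it once.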
Q7. What is entropy and what is its significance in decision tree learning?
Entropy:
• Definition: Entropy is a metric from information theory that measures the level
of impurity, uncertainty, or disorder in a set of data.
• An entropy of 0 indicates a perfectly pure set (all examples belong to the same class).
• An entropy of 1 (for a binary classification problem) indicates maximum impurity (an
equal number of examples in each class).
Significance in Decision Tree Learning:
The entire goal of the decision tree algorithm is to reduce entropy at each step.
• The algorithm calculates the Information Gain for each attribute. Information Gain is
the expected reduction in entropy achieved by splitting the data on that attribute.
• The attribute that provides the highest Information Gain is chosen as the decision
node because it does the best job of creating purer, more homogeneous subsets. In
short, entropy guides the algorithm in making the most informative splits.
Q8. How to solve a numerical problem for finding the best attribute to split at the root of a
Decision Tree?
This requires following the ID3 algorithm's attribute selection process step-by-step.
Methodology:
1. Calculate Initial Entropy of the Entire Dataset (H(S)):
o Count the total number of positive (𝑃) and negative (𝑁) outcomes.
o Use the formula:
H(S) = −( (P/(P+N)) log₂(P/(P+N)) + (N/(P+N)) log₂(N/(P+N)) )
2. For Each Attribute (e.g., Attribute A):
o Partition the Data: Create subsets for each value of Attribute A (e.g., A=True,
A=False).
o Calculate Entropy of Each Subset: For each subset, calculate its entropy using
the same formula as in Step 1.
o Calculate Information Gain of Attribute A (IG(S, A)): Use the formula for
weighted average entropy:
IG(S, A) = H(S) − Σ_{v ∈ Values(A)} (|Sᵥ| / |S|) · H(Sᵥ)
where |Sᵥ| is the number of examples in the subset with A = v.
3. Compare and Select:
o Repeat Step 2 for all attributes.
o The attribute with the highest Information Gain is the best attribute to use
for the split at the root node.
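The methodology can be checked numerically with a small script (the counts below, 9 positive and 5 negative examples split 6+/2− vs 3+/3− by a hypothetical attribute A, are assumptions chosen for illustration):

```python
import math

def entropy(p, n):
    """Binary entropy of a set with p positive and n negative examples."""
    total = p + n
    result = 0.0
    for count in (p, n):
        if count:                         # 0 * log2(0) is taken as 0
            frac = count / total
            result -= frac * math.log2(frac)
    return result

# Step 1: entropy of the whole dataset (9 positive, 5 negative).
H_S = entropy(9, 5)                       # ≈ 0.940

# Step 2: attribute A splits the data into A=True (6+, 2-) and A=False (3+, 3-).
H_true, H_false = entropy(6, 2), entropy(3, 3)
IG = H_S - (8 / 14) * H_true - (6 / 14) * H_false   # ≈ 0.048

print(round(H_S, 3), round(IG, 3))
```

Step 3 would repeat this computation for every attribute and pick the one with the largest IG as the root split.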
Q9. Differentiate between classification and regression.
| Feature | Classification | Regression |
|---|---|---|
| Output Type | Discrete and categorical (e.g., "Spam" or "Not Spam", "Dog" or "Cat", 0 or 1). | Continuous and numerical (e.g., temperature, price, height). |
| Goal | To predict the class label or category of an input. | To predict a real-valued quantity. |
| Evaluation | Metrics like Accuracy, Precision, Recall, F1-Score. | Metrics like Mean Squared Error (MSE), Root Mean Squared Error (RMSE). |
| Scenario 1 | Email Spam Filtering: classifying an email as 'spam' or 'not spam'. | House Price Prediction: predicting the selling price of a house. |
| Scenario 2 | Medical Diagnosis: predicting whether a patient has a certain disease (Yes/No). | Stock Price Forecasting: predicting the future price of a stock. |
| Scenario 3 | Image Recognition: identifying the object in a picture (e.g., car, bicycle, tree). | Weather Prediction: predicting the amount of rainfall in millimeters. |
Q10. Explain Linear Regression (including Multivariate).
Linear Regression is a supervised learning algorithm used to model the relationship
between a dependent variable (the output) and one or more independent variables (the
inputs) by fitting a linear equation to the observed data. It is used for regression problems
(predicting continuous values).
• Univariate Linear Regression:
o Uses a single input variable (𝑥) to predict a single output variable (𝑦).
o The model is a straight line: 𝑦 = 𝑚𝑥 + 𝑐, where m is the slope and c is the
intercept.
• Multivariate Linear Regression:
o Definition: This is an extension that uses multiple input
variables (𝑥1 , 𝑥2 , … , 𝑥𝑛 ) to predict a single output variable (𝑦).
o Equation: The model is a hyperplane in n-dimensional space: y = w₀ + w₁x₁ + w₂x₂ + ⋯ + wₙxₙ, where w₀ is the bias (intercept) and w₁, …, wₙ are the weights (coefficients) for each feature.
o Goal: The algorithm's goal is to learn the optimal values for the weights (𝑤𝑖 )
that minimize the error (typically the Mean Squared Error) between the
model's predictions and the actual data points.
Q11. Explain Linear Classification with Logistic Regression.
Logistic Regression is a fundamental algorithm used for classification problems, despite
having "regression" in its name. It is a linear model that predicts the probability that an
input belongs to a particular class.
Mechanism:
1. Linear Combination: First, the algorithm computes a weighted sum of the input
features, just like in linear regression. This produces a score, often called the logit.
𝑧 = 𝑤0 + 𝑤1 𝑥1 + 𝑤2 𝑥2 + ⋯ + 𝑤𝑛 𝑥𝑛
2. Sigmoid Function: This linear score (z) is then passed through a Sigmoid (or Logistic)
function: P(y = 1) = σ(z) = 1 / (1 + e^(−z))
The Sigmoid function squashes any real-valued number into a range between 0 and
1, which is interpreted as a probability.
3. Decision Boundary: The model creates a linear decision boundary. For binary
classification:
o If the probability is ≥ 0.5, the model predicts Class 1.
o If the probability is < 0.5, the model predicts Class 0.
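The full mechanism, weighted sum, then sigmoid, then threshold, can be sketched as follows (the weight values below are hypothetical, standing in for parameters that would be learned during training):

```python
import math

def sigmoid(z):
    """Squash any real number into the (0, 1) probability range."""
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, features, threshold=0.5):
    """Logistic-regression prediction: linear score -> sigmoid -> class."""
    z = bias + sum(w * x for w, x in zip(weights, features))  # the logit
    prob = sigmoid(z)                                         # P(y = 1)
    return (1 if prob >= threshold else 0), prob

# Hypothetical learned parameters, used only to illustrate the mechanism.
label, prob = predict(weights=[1.5, -0.8], bias=-0.2, features=[2.0, 1.0])
print(label, round(prob, 3))   # z = 2.0, so prob ≈ 0.881 and the model predicts class 1
```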