Practice Questions ML
November 2024
Linear Regression
1. Suppose we have a dataset with five predictors, X1 = GPA, X2 = IQ, X3 =
Level (1 for College and 0 for High School), X4 = Interaction between GPA
and IQ, and X5 = Interaction between GPA and Level. The response is
starting salary after graduation (in thousands of dollars). Suppose we use
least squares to fit the model and get the following estimated coefficients:
β̂0 = 50, β̂1 = 20, β̂2 = 0.07, β̂3 = 35, β̂4 = 0.01, β̂5 = −10. Justify your
answers. (ISLP 3.7, Q3)
(a) Which answer is correct, and why?
i. For a fixed value of IQ and GPA, high school graduates earn
more, on average, than college graduates.
ii. For a fixed value of IQ and GPA, college graduates earn more,
on average, than high school graduates.
iii. For a fixed value of IQ and GPA, high school graduates earn
more, on average, than college graduates provided that the GPA
is high enough.
iv. For a fixed value of IQ and GPA, college graduates earn more,
on average, than high school graduates provided that the GPA
is high enough.
(b) Predict the salary of a college graduate with an IQ of 110 and a GPA
of 4.0.
(c) True or false: Since the coefficient for the GPA/IQ interaction term
is very small, there is very little evidence of an interaction effect
between GPA and IQ.
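A quick numerical sketch of part (b), plugging the fitted coefficients into the model (Level = 1 for a college graduate; the helper name `predict_salary` is just for illustration):

```python
# Worked numerical check for part (b). Each interaction term is the
# product of the corresponding predictors.
def predict_salary(gpa, iq, level):
    # salary = b0 + b1*GPA + b2*IQ + b3*Level + b4*GPA*IQ + b5*GPA*Level
    return 50 + 20*gpa + 0.07*iq + 35*level + 0.01*gpa*iq - 10*gpa*level

salary = predict_salary(gpa=4.0, iq=110, level=1)
print(salary)  # ≈ 137.1 (thousands of dollars)
```

Note that the GPA/Level interaction contributes −10 · 4.0 · 1 = −40 to this prediction.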
2. Consider a linear regression problem with N samples where the input is in
D-dimensional space, and all output values satisfy yi ∈ {−1, +1}. Which of the
following statements is correct?
(a) linear regression cannot “work” if N ≫ D
(b) linear regression cannot “work” if N ≪ D
(c) linear regression can be made to work perfectly if the data is linearly
separable
Solution: Answer (c) is correct. Answers (a) and (b) are too strong: a
least squares solution can still be computed in either regime (via the
pseudoinverse when the normal-equation matrix is singular).
3. Consider a data matrix X ∈ RD×N of N data points in D dimensions, and
target values yn for n = 1, . . . , N . We perform least squares linear regression,
without the use of any regularizer.
(a) Write down the normal equations.
(b) Give the expression to predict a new unseen point xm . Do not assume
knowledge of w; substitute the expression you computed for it.
Solution: Refer to class notes.
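As a sanity check, the normal-equation solution can be sketched numerically. The D × N data-matrix convention follows the question; the noise-free synthetic data below are an illustrative assumption:

```python
import numpy as np

# With X of shape D x N (columns are data points), the normal
# equations read (X X^T) w = X y.
rng = np.random.default_rng(0)
D, N = 3, 50
X = rng.normal(size=(D, N))
w_true = np.array([1.0, -2.0, 0.5])
y = X.T @ w_true                      # noise-free targets

w = np.linalg.solve(X @ X.T, X @ y)   # normal equations

# Prediction for a new unseen point x_m: y_m = w^T x_m
x_m = rng.normal(size=D)
y_m = w @ x_m
print(np.allclose(w, w_true))  # True (no noise, so recovery is exact)
```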
Regularization
It is well-known that ridge regression tends to give similar coefficient values to
correlated variables, whereas the lasso may give quite different coefficient values
to correlated variables. We will now explore this property in a very simple
setting. (ISLP 6.6, Q5)
Suppose that n = 2, p = 2, x11 = x12 , x21 = x22 . Furthermore, suppose
that y1 + y2 = 0 and x11 + x21 = 0 and x12 + x22 = 0, so that the estimate for
the intercept in a least squares, ridge regression, or lasso model is zero: β̂0 = 0.
(a) Write out the ridge regression optimization problem in this setting.
(b) Argue that in this setting, the ridge coefficient estimates satisfy β̂1 = β̂2 .
(c) Write out the lasso optimization problem in this setting.
(d) Argue that in this setting, the lasso coefficients β̂1 and β̂2 are not unique—in
other words, there are many possible solutions to the optimization problem
in (c). Describe these solutions.
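Part (b) can also be checked numerically with the closed-form ridge solution β̂ = (XᵀX + λI)⁻¹ Xᵀy. The specific numbers below are an illustrative choice consistent with the constraints above (no intercept, duplicated predictor columns):

```python
import numpy as np

# Ridge on perfectly correlated (duplicated) predictors: the penalty
# forces the weight to be split equally between the two columns.
x1 = np.array([1.0, -1.0])         # x11 = 1, x21 = -1 (sums to 0)
X = np.column_stack([x1, x1])      # x12 = x11, x22 = x21
y = np.array([2.0, -2.0])          # y1 + y2 = 0

lam = 0.1
beta = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)
print(beta)  # the two entries are equal, as part (b) claims
```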
Logistic Regression
Suppose we collect data for a group of students in a statistics class with variables
X1 = hours studied, X2 = undergrad GPA, and Y = receive an A. We fit a
logistic regression and produce estimated coefficients: β̂0 = −6, β̂1 = 0.05,
β̂2 = 1. (ISLP 4.8, Q6)
(a) Estimate the probability that a student who studies for 40 hours and has
an undergrad GPA of 3.5 gets an A in the class.
(b) How many hours would the student in part (a) need to study to have a
50% chance of getting an A in the class?
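Both parts follow from the logistic model P(A) = 1/(1 + e^(−z)) with logit z = β̂0 + β̂1·X1 + β̂2·X2. A quick numerical sketch (the helper name `prob_a` is just for illustration):

```python
import math

# Part (a): z = -6 + 0.05*40 + 1*3.5 = -0.5
def prob_a(hours, gpa):
    z = -6 + 0.05 * hours + 1.0 * gpa
    return 1 / (1 + math.exp(-z))

print(round(prob_a(40, 3.5), 4))  # ≈ 0.3775

# Part (b): a 50% chance means z = 0, i.e. 0.05*h = 6 - 3.5
hours_needed = (6 - 3.5) / 0.05
print(hours_needed)  # 50.0
```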
Naive Bayes
Suppose that we wish to predict whether a given stock will issue a dividend this
year (“Yes” or “No”) based on X, last year’s percent profit. We examine a large
number of companies and discover that the mean value of X for companies that
issued a dividend was X̄ = 10, while the mean for those that didn’t was X̄ = 0.
In addition, the variance of X for these two sets of companies was σ 2 = 36.
Finally, 80% of companies issued dividends. Assuming that X follows a normal
distribution, predict the probability that a company will issue a dividend this
year given that its percentage profit was X = 4 last year. (ISLP 4.8, Q7)
Hint: Recall that the density function for a normal random variable is

f (x) = (1/√(2πσ²)) e^(−(x−µ)²/(2σ²))

You will need to use Bayes’ theorem.
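The computation is a direct application of Bayes’ theorem with the two Gaussian class densities stated above (means 10 and 0, shared variance 36, prior P(dividend) = 0.8):

```python
import math

# P(yes | X=4) = P(yes) f(4; 10, 36) / [P(yes) f(4; 10, 36) + P(no) f(4; 0, 36)]
def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

prior_yes, prior_no = 0.8, 0.2
num = prior_yes * normal_pdf(4, 10, 36)
den = num + prior_no * normal_pdf(4, 0, 36)
print(round(num / den, 3))  # ≈ 0.752
```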
Support Vector Machines (SVM)
Here we explore the maximal margin classifier on a toy data set. (ISLP 9.7, Q3)
(a) We are given n = 7 observations in p = 2 dimensions. For each observa-
tion, there is an associated class label.
Obs. X1 X2 Y
1 3 4 Red
2 2 2 Red
3 4 4 Red
4 1 4 Red
5 2 1 Blue
6 4 3 Blue
7 4 1 Blue
Sketch the observations.
(b) Sketch the optimal separating hyperplane, and provide the equation for
this hyperplane (in the form β0 + β1 X1 + β2 X2 = 0).
(c) Describe the classification rule for the maximal margin classifier. It should
be something along the lines of “Classify to Red if β0 + β1 X1 + β2 X2 > 0,
and classify to Blue otherwise.” Provide the values for β0 , β1 , and β2 .
(d) On your sketch, indicate the margin for the maximal margin hyperplane.
(e) Indicate the support vectors for the maximal margin classifier.
(f) Argue that a slight movement of the seventh observation would not affect
the maximal margin hyperplane.
(g) Sketch a hyperplane that is not the optimal separating hyperplane, and
provide the equation for this hyperplane.
(h) Draw an additional observation on the plot so that the two classes are no
longer separable by a hyperplane.
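For parts (b) and (c), a natural candidate hyperplane runs midway between the closest Red/Blue pairs, giving −0.5 + X1 − X2 = 0 (β0 = −0.5, β1 = 1, β2 = −1). A quick numerical check that this candidate separates all seven observations:

```python
# Evaluate f(x) = -0.5 + x1 - x2 at each observation; Blue points
# should land on the positive side and Red points on the negative side.
points = {
    1: ((3, 4), "Red"), 2: ((2, 2), "Red"), 3: ((4, 4), "Red"),
    4: ((1, 4), "Red"), 5: ((2, 1), "Blue"), 6: ((4, 3), "Blue"),
    7: ((4, 1), "Blue"),
}
for obs, ((x1, x2), label) in points.items():
    f = -0.5 + x1 - x2
    pred = "Blue" if f > 0 else "Red"
    assert pred == label, f"observation {obs} misclassified"
print("candidate hyperplane separates all 7 observations")
```

The support vectors are the points where |f| is smallest (observations 2, 3, 5, and 6, each at distance 0.5/√2 from the hyperplane).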
Decision Trees
Draw an example (of your own invention) of a partition of two dimensional
feature space that could result from recursive binary splitting. Your example
should contain at least six regions. Draw a decision tree corresponding to this
partition. Be sure to label all aspects of your figures, including the regions
R1,R2,..., the cutpoints t1,t2,..., and so forth. (ISLP 8.4, Q1)
Regression Trees
Provide a detailed explanation of the algorithm that is used to fit a regression
tree. (ISLP 8.4, Q6)
Bagging and Boosting
Suppose we produce ten bootstrapped samples from a data set containing red
and green classes. We then apply a classification tree to each bootstrapped
sample and, for a specific value of X, produce 10 estimates of P (Class is Red|X):
0.1, 0.15, 0.2, 0.2, 0.55, 0.6, 0.6, 0.65, 0.7, 0.75.
There are two common ways to combine these results together into a single
class prediction. One is the majority vote approach. The second approach is
to classify based on the average probability. In this example, what is the final
classification under each of these two approaches? (ISLP 8.4, Q5)
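Both aggregation rules can be checked directly on the ten estimates:

```python
probs = [0.1, 0.15, 0.2, 0.2, 0.55, 0.6, 0.6, 0.65, 0.7, 0.75]

# Majority vote: each tree votes Red if its probability exceeds 0.5.
red_votes = sum(p > 0.5 for p in probs)
majority = "Red" if red_votes > len(probs) / 2 else "Green"
print(red_votes, majority)          # 6 Red

# Average probability: classify Red only if the mean exceeds 0.5.
avg = sum(probs) / len(probs)
average_rule = "Red" if avg > 0.5 else "Green"
print(round(avg, 2), average_rule)  # 0.45 Green
```

The two rules disagree here, which is the point of the exercise: six trees lean Red, but the Red-leaning probabilities are not large enough to pull the average above 0.5.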
Miscellaneous
1. Assume that you initialize all weights in a neural net to the same value
and you do the same for the bias terms. Which of the following statements
is correct?
(a) This is a good idea since it treats every edge equally.
(b) This is a bad idea
Solution: Answer b is correct since in this case every node on a
particular level will learn the same feature.
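A tiny numerical illustration of this symmetry problem (the network size, inputs, and initial values below are arbitrary):

```python
import numpy as np

# Two hidden units with identical initial weights compute the same
# activation and receive the same gradient, so they remain identical
# after every update.
x = np.array([1.0, 2.0])
W1 = np.full((2, 2), 0.3)   # both hidden units initialized alike
w2 = np.full(2, 0.5)
y = 1.0

h = np.tanh(W1 @ x)         # identical hidden activations
err = (w2 @ h) - y          # squared-loss residual
grad_W1 = np.outer(w2 * (1 - h**2) * err, x)
print(np.allclose(grad_W1[0], grad_W1[1]))  # True: both rows update identically
```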
2. What is the complexity of the back-propagation algorithm for a neural
net with L layers and K nodes per layer?
Solution: O(K²L). The dominant term is the multiplication of a
vector with a K × K matrix, and this has to be done in each of
the L layers.
3. Which of the following are typical benefits of ensemble learning in its basic
form (that is, not AdaBoost and not with randomized decision bound-
aries), with all weak learners having the same learning algorithm and an
equal vote?
(a) Ensemble learning tends to reduce the bias of your classification al-
gorithm.
(b) Ensemble learning can be used to avoid overfitting.
(c) Ensemble learning tends to reduce the variance of your classification
algorithm.
(d) Ensemble learning can be used to avoid underfitting.
Solution: In ensemble learning, increasing the number of classi-
fiers reduces the variance of our model but generally has little
effect on the bias. Therefore, basic ensembling can be used to
avoid overfitting but would generally not be used to avoid un-
derfitting. (By contrast, AdaBoost reduces bias.)
4. Newton’s method always converges, but sometimes it is slower than Gradient Descent. [T/F]
Solution: False; Newton’s method is not guaranteed to converge and
can even diverge.
5. What is the loss function and regularizer of what is typically referred to
as “SVM”?
Solution: hinge loss + ℓ2 regularization.
6. Name one scenario when you want to use the Huber loss over L2 and vice
versa.
Solution: Huber loss: if the data contain bad outliers. L2 loss: if you
want to estimate the conditional mean of the labels.
7. For each algorithm name one assumption that it makes on the data:
• Naive Bayes
• Logistic Regression
• SVM
• Regression
• Decision Trees
Solution: refer to notes
8. Let x, y ∈ Rd be two points (e.g. sample or test points). Consider the func-
tion k(x, y) = xT rev(y), where rev(y) reverses the order of the components
in y; for example, rev([1, 2, 3]) = [3, 2, 1]. Show that k cannot be a valid kernel function.
Solution: We have that k((−1, 1), (−1, 1)) = −2, but this is impos-
sible: if k is a valid kernel, then there is some function ϕ such
that k(x, x) = ϕ(x)T ϕ(x) ≥ 0.
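The counterexample can be verified directly (here rev is implemented with Python’s built-in reversed):

```python
# k(x, y) = x^T rev(y); for a valid kernel, k(x, x) must be >= 0.
def k(x, y):
    return sum(a * b for a, b in zip(x, reversed(y)))

x = (-1, 1)
print(k(x, x))  # -2, so k is not positive semidefinite and not a kernel
```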