
CST383 B

This document outlines the examination structure for the Fifth Semester B.Tech Degree (Minor) Examination in Concepts in Machine Learning at APJ Abdul Kalam Technological University. It includes details on the course code, maximum marks, duration, and a breakdown of questions in two parts: Part A with short answer questions and Part B with detailed questions from various modules. The topics covered include overfitting, learning methods, SVM classifiers, decision trees, reinforcement learning, gradient descent, clustering algorithms, and performance metrics.

Uploaded by Varada I B

M 1100CST383122102 Pages: 4

Reg No.:_______________ Name:__________________________


APJ ABDUL KALAM TECHNOLOGICAL UNIVERSITY
Fifth Semester B.Tech Degree (Minor) Examination December 2022 (2020 Admission)

Course Code: CST 383


Course Name: CONCEPTS IN MACHINE LEARNING

Max. Marks: 100 Duration: 3 Hours

PART A
(Answer all questions; each question carries 3 marks) Marks

1 Explain the concept of overfitting and underfitting with a suitable diagram. 3


2 Identify and describe the suitable learning method in each of the following 3
applications.
1. Grouping students into different academic groups
2. Classifying mails into SPAM and not-SPAM
3 Calculate the output of a 3-input neuron with X = [0, 1.0, 0.5], W = [0.9, 0.2, 0.3] 3
and b = - 0.04. Use Sigmoid activation function.
4 What is soft margin SVM classifier? 3
5 Explain Decision Trees with an example. 3
6 List and explain any three applications of Machine Learning. 3
7 Explain Bootstrapping method. 3
8 Illustrate Confusion matrix with an example. 3
9 What is a maximum-margin hyperplane? 3
10 Compare and contrast Linear Regression and Logistic Regression. 3
PART B
(Answer one full question from each module, each question carries 14 marks)

Module -1
11 a) Explain Unsupervised Learning with examples. 7
b) A football club which was recently formed won both of the two matches it has played. 7
Experts opine that it has only 30% probability of winning against a strong club.
Explain the difference between Maximum Likelihood Estimation and Maximum a
Posteriori Approximation using this example.
12 a) Explain reinforcement learning with an example. 7


b) Explain Maximum Likelihood Estimation for Bernoulli’s probability distribution 7


function.
Module -2
13 a) Marks obtained by a group of students in degree and post graduate degree courses 10
are shown in the table below. Using matrix method, calculate the regression
coefficients.
Degree Marks    Post Graduate Degree Marks
70              82
60              71
57              63
45              49
90              92
b) Predict the PG marks of a student whose Degree Mark is 80. If the actual mark is 89, 4
calculate the error.
14 a) Explain in detail the working of gradient descent algorithm. 6
b) Using ID3 algorithm, find out the 1st splitting attribute for the data given below. 8

Module -3
15 a) Explain the kernels used in SVM. 8


b) Write notes on following activation functions: 6


1. Tanh
2. ReLU
3. Sigmoid
4. Leaky ReLU
5. Softmax
16 a) Show how SVM classifies a set of data points. 7
b) Explain Back Propagation Algorithm with a suitable example. 7
Module -4
17 a) Use K-means clustering algorithm to divide the following data points into 2 clusters. 7
x y
1 1
2 1
2 3
3 2
4 3
5 5
b) Explain Divisive clustering with an example. 7
18 a) Given the dataset {a, b, c, d, e} and the following distance matrix, construct a 7
dendrogram by complete linkage hierarchical clustering using the agglomerative
method.

b) Explain the working of the DBSCAN algorithm. 7


Module -5
19 a) Suppose a computer program for recognizing dogs in photographs identifies eight 7
dogs in a picture containing 12 dogs and some cats. Of the eight dogs identified, five
actually are dogs while the rest are cats.


Compute
1. Precision
2. Recall
3. Accuracy
4. F measure
5. Error rate
6. Sensitivity
7. Specificity
b) Explain ROC curve and AUC for performance analysis. 7
20 a) Construct ROC curve for the given data. 7

b) Explain Bias-Variance trade-off with a neat diagram. 7


***


Common questions


Overfitting occurs when a machine learning model captures noise and details from the training data to the extent that it negatively impacts the model's performance on new data. This happens when the model is excessively complex, having learned the idiosyncrasies of the training examples rather than the underlying data pattern. Underfitting, conversely, occurs when a model is too simple, capturing only the overall trend and failing to grasp the underlying relationship in the data well enough. This typically results in high bias and low variance, where the model performs poorly on both the training and test sets.

Hierarchical Clustering creates a tree (dendrogram) representing the nested grouping of samples based on distance or similarity metrics, without requiring a pre-defined number of clusters. It can be divisive (top-down) or agglomerative (bottom-up). It is useful for visualizing inherent data structures, like phylogenetic trees in biological taxonomy. In contrast, K-means Clustering requires the user to specify the number (K) of clusters before clustering begins. It assigns each data point to the nearest cluster centroid, making it ideal for partitioning data into flat (non-nested) groups, as in market segmentation or image compression. K-means is computationally faster but cannot reveal any hierarchical structure present in the data.
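The agglomerative (complete-linkage) procedure asked about in question 18 can be sketched in a few lines of Python; the 5x5 distance matrix below is a made-up example, since the exam's matrix is not reproduced in this copy:

```python
# Sketch: agglomerative hierarchical clustering with complete linkage.
# The distance matrix is a hypothetical example, not the exam's.

def agglomerate_complete(dist):
    """Repeatedly merge the pair of clusters whose complete-linkage
    (maximum pairwise) distance is smallest; return the merge heights."""
    clusters = [{i} for i in range(len(dist))]
    heights = []
    while len(clusters) > 1:
        best_i, best_j, best_d = 0, 1, float("inf")
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # complete linkage: cluster distance = max pairwise distance
                d = max(dist[a][b] for a in clusters[i] for b in clusters[j])
                if d < best_d:
                    best_i, best_j, best_d = i, j, d
        merged = clusters[best_i] | clusters[best_j]
        clusters = [c for k, c in enumerate(clusters) if k not in (best_i, best_j)]
        clusters.append(merged)
        heights.append(best_d)  # height at which this merge appears in the dendrogram
    return heights

dist = [[0, 2, 6, 10, 9],
        [2, 0, 5, 9, 8],
        [6, 5, 0, 4, 5],
        [10, 9, 4, 0, 3],
        [9, 8, 5, 3, 0]]
```

For this matrix the merges happen at heights 2, 3, 5 and 10, which is exactly the information a dendrogram draws.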

The matrix method calculates regression coefficients by representing data as matrices, using formulas derived from the least squares method. Given sample data with X (input values) and y (output values), arrange them in matrix form. Calculate the weight matrix using the formula W = (X^T * X)^{-1} * X^T * y, where W contains the regression coefficients. Applying this in practice, consider students' degree marks and post-graduate marks: create matrices with marks and solve for W to obtain coefficients that define the linear relationship between degree and post-graduate marks, predicting performance in one based on the other.
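As a sketch, the normal equation W = (X^T X)^{-1} X^T y can be applied directly to the marks from the table in question 13 with NumPy:

```python
# Normal-equation (matrix method) fit of PG marks against degree marks,
# using the five (degree, PG) pairs from the exam's table.
import numpy as np

degree = np.array([70, 60, 57, 45, 90], dtype=float)
pg     = np.array([82, 71, 63, 49, 92], dtype=float)

# design matrix with a column of ones for the intercept term
X = np.column_stack([np.ones_like(degree), degree])
W = np.linalg.inv(X.T @ X) @ X.T @ pg   # W = [intercept, slope]

pred_80 = W[0] + W[1] * 80   # predicted PG mark for a degree mark of 80
error   = 89 - pred_80       # the exam states the actual mark is 89
```

On this data the slope works out to roughly 0.954 and the intercept to roughly 9.94, giving a prediction of about 86.3 for a degree mark of 80 (an error of about 2.7 against the stated actual mark of 89).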

Unsupervised learning is used when the output labels are not known or available. This type of learning is ideal for exploratory data analysis to identify patterns or groupings within datasets. Examples include grouping students into different academic groups through clustering (e.g., K-means or hierarchical clustering) based on performance metrics or behavior without predefined categories. Another example is customer segmentation in marketing, where clusters of customers are created based on purchase history, demographics, and behavior, allowing businesses to tailor strategies specific to each segment.

K-means clustering partitions data into K clusters, where each data point belongs to the cluster with the nearest mean. The algorithm involves randomly initializing K centroids, assigning each point to the nearest centroid, and recalculating the centroids based on the points assigned to each cluster. This process repeats until convergence, typically when assignments no longer change. For example, clustering the data points (x, y): (1,1), (2,1), (2,3), (3,2), (4,3), (5,5) into two clusters would result in two separate groups around two centroids, effectively dividing the data into distinct clusters based on proximity.
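A minimal sketch of this loop on the six points from question 17, fixing the initial centroids at (1,1) and (5,5) so the run is deterministic (an assumption — the exam does not specify the initialization):

```python
# Minimal k-means on the six exam points, K = 2,
# with fixed (assumed) initial centroids for reproducibility.
import numpy as np

points = np.array([[1, 1], [2, 1], [2, 3], [3, 2], [4, 3], [5, 5]], dtype=float)
centroids = np.array([[1, 1], [5, 5]], dtype=float)

for _ in range(10):  # a handful of iterations suffices on this data
    # assignment step: label each point with its nearest centroid
    d = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    labels = d.argmin(axis=1)
    # update step: recompute each centroid as the mean of its assigned points
    new = np.array([points[labels == k].mean(axis=0) for k in range(2)])
    if np.allclose(new, centroids):  # converged: assignments stable
        break
    centroids = new
```

With these initial centroids the algorithm converges to the clusters {(1,1), (2,1), (2,3), (3,2)} and {(4,3), (5,5)}.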

Decision Trees are interpretable models that split data based on feature values, resulting in a tree-like model of decisions. Unlike SVMs, which find an optimal hyperplane to separate classes, and Neural Networks, which build complex hierarchical feature representations through layers, Decision Trees offer simplicity and transparency in their decision-making process. However, they tend to overfit, particularly with deep trees or when the data is noisy. Being less computationally intensive than SVMs or Neural Networks, Decision Trees are quick to train and perform well on smaller datasets or as a basis for ensemble methods such as Random Forests.

Linear Regression and Logistic Regression are both used for predictive modeling but serve different purposes. Linear Regression is used for predicting continuous outcomes, such as estimating a person's weight from height. It assumes a linear relationship between input and output variables. Logistic Regression, on the other hand, is used for binary classification problems, predicting discrete outcomes like spam vs. non-spam emails. It uses the logistic function to model probabilities between 0 and 1, enabling it to classify data into categories. A limitation of Linear Regression is its assumption of a linear relationship, which may not hold for complex datasets. Logistic Regression's limitation is its suitability for only binary or binary-like outcomes, though extensions like multinomial logistic regression can allow for multiple categories.
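The contrast can be sketched with hypothetical one-feature models: both compute the same linear score, but logistic regression passes it through the sigmoid to obtain a probability:

```python
# Linear vs. logistic regression on a single feature
# (the weights here are hypothetical, chosen only for illustration).
import math

def linear_predict(x, w, b):
    # linear regression: output is an unbounded real value
    return w * x + b

def logistic_predict(x, w, b):
    # logistic regression: same linear score, squashed into (0, 1)
    z = w * x + b
    return 1.0 / (1.0 + math.exp(-z))

y_cont = linear_predict(2.0, 3.0, 1.0)   # continuous prediction: 7.0
p_spam = logistic_predict(0.0, 1.0, 0.0) # sigmoid(0) = 0.5, the decision boundary
```

A score of exactly zero maps to probability 0.5, which is why the logistic decision boundary is the set of points where the linear score vanishes.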

A confusion matrix is a tool for assessing the performance of a classification model by showing the count of true positive (TP), true negative (TN), false positive (FP), and false negative (FN) predictions. For instance, in a binary classification task of identifying emails as spam (positive) or not spam (negative), if out of 100 emails, 30 are spam, and 70 are not, a confusion matrix might show 25 TP (correctly identified spam), 5 FN (misclassified spam), 60 TN (correctly identified non-spam), and 10 FP (misclassified non-spam as spam). These numbers allow for calculation of metrics such as accuracy, precision, recall, and F1 score to comprehensively evaluate model performance.
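Using the spam example above (TP = 25, FN = 5, TN = 60, FP = 10), the metrics follow directly from the matrix counts:

```python
# Metrics derived from the confusion-matrix counts in the spam example.
TP, FN, TN, FP = 25, 5, 60, 10

accuracy  = (TP + TN) / (TP + TN + FP + FN)        # 85/100 = 0.85
precision = TP / (TP + FP)                          # 25/35 ~ 0.714
recall    = TP / (TP + FN)                          # 25/30 ~ 0.833 (= sensitivity)
f1        = 2 * precision * recall / (precision + recall)  # harmonic mean ~ 0.769
```

The same four counts also give specificity (TN / (TN + FP)) and error rate (1 - accuracy), which is why question 19 can ask for seven metrics from one scenario.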

Backpropagation is a supervised learning algorithm used for training neural networks, involving two main steps: the feedforward process to compute output and the backward pass for error correction. Initially, inputs are passed through the network to compute the output and loss function. During backpropagation, the error is calculated, and gradients of the loss with respect to each weight are computed using the chain rule, propagating backward from the output to the input layers. These gradients are used to minimize loss by updating weights, typically using an optimization technique like gradient descent. The iterative process updates weights in each layer to reduce prediction error and fit the model to the training data efficiently.
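A minimal sketch of the two passes on a single sigmoid neuron (the input, weight, and target here are toy values; squared-error loss and plain gradient descent are assumed):

```python
# Forward and backward pass on one sigmoid neuron, trained by gradient
# descent on a squared-error loss (toy values, for illustration only).
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

x, w, b = 1.0, 0.5, 0.0   # single input, initial weight and bias
target, lr = 1.0, 0.1     # desired output and learning rate

for _ in range(100):
    # forward pass: compute output and loss
    z = w * x + b
    y = sigmoid(z)
    loss = 0.5 * (y - target) ** 2
    # backward pass: chain rule, dL/dz = (y - t) * sigmoid'(z) = (y - t) * y * (1 - y)
    dz = (y - target) * y * (1 - y)
    # gradient-descent update: dL/dw = dz * x, dL/db = dz
    w -= lr * dz * x
    b -= lr * dz
```

Each iteration moves the weight and bias against the gradient, so the output drifts from its initial value of about 0.62 toward the target of 1 and the loss shrinks.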

A soft margin in SVM allows for some misclassification of data points, providing flexibility to the decision boundary, which is crucial for achieving a balance between maximizing the margin and minimizing classification errors in non-linearly separable and noisy data. This approach improves the generalization of the model by permitting a few data points to be on the wrong side of the margin or hyperplane. It helps to avoid overfitting by not forcing a hard decision boundary in datasets with overlap or noise in feature space, ultimately leading to better performance on test data.
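The trade-off can be sketched with the soft-margin objective, where the hinge loss tolerates margin violations and the penalty constant C weights them (the data points, weights, and C values here are hypothetical):

```python
# Soft-margin SVM objective: 1/2 ||w||^2 + C * sum of hinge losses.
# The points, weights, and C are made-up values for illustration.
import numpy as np

def soft_margin_objective(w, b, X, y, C):
    # hinge loss is zero only for points correctly classified with margin >= 1
    margins = y * (X @ w + b)
    hinge = np.maximum(0.0, 1.0 - margins)
    return 0.5 * (w @ w) + C * hinge.sum()

X = np.array([[2.0, 2.0], [-2.0, -2.0], [0.5, 0.5]])  # last point violates the margin
y = np.array([1.0, -1.0, -1.0])
w, b = np.array([0.5, 0.5]), 0.0

obj_small_C = soft_margin_objective(w, b, X, y, C=1.0)
obj_large_C = soft_margin_objective(w, b, X, y, C=10.0)
```

A larger C makes the same margin violation far more expensive, pushing the optimizer toward a harder boundary; a smaller C tolerates the violation in exchange for a wider margin.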
