Important Questions
Sr. No Questions Marks
1. Define Data Mining. 2M
2. Why is data mining required? 2M
3. Enlist the applications of Data Mining. 2M
4. What is Cluster Analysis? 2M
5. What is Outlier Analysis? 2M
6. Define Data, Information, and Knowledge. 2M
7. Define Correlation and Covariance. 2M
8. Compute the similarity between Chicken and Bird using the simple matching
coefficient (SMC) for the given data. 2M
Chicken = {0,1,1,0,1,0,0,1,1,1}
Bird = {0,1,1,0,0,0,0,1,0,1}
9. Define Time Series Data. 2M
10. Define the following: (I) Object (II) Attribute. 2M
11. List data reduction techniques in data mining. 2M
12. Define Ordinal Attribute. 2M
13. Enlist the types of Datasets. 2M
14. Define Qualitative Data and Quantitative Data. 2M
15. Define Data Redundancy. 2M
16. Define Data Scrubbing and Data Auditing. 2M
17. Which tools are used for Data Mitigation? 2M
18. Explain Ordered Data. 2M
19. Explain the knowledge discovery in databases (KDD) process with a neat diagram. 5M
20. Discuss the different data mining functionalities in detail. 5M
21. Explain the Multidimensional view of data mining. 5M
22. Explain how data mining works. 5M
23. Explain the role of Data Mining in Business Intelligence. 5M
24. Illustrate 5 applications of data mining that have been used to solve specific
problems. 5M
25. List and explain the goals of data mining. 5M
26. Discuss the confluence of multiple disciplines in Data Mining. 5M
27. Illustrate the typical view in ML and statistics with a neat diagram. 5M
28. Illustrate 5 applications of data mining that have been used to solve specific
problems. 5M
29. How to search for knowledge and interesting patterns in data? 5M
30. Discuss the major issues in data mining. 5M
31. Compare quantitative data and qualitative data. 5M
32. Explain Attribute subset selection methods with an example. 5M
33. How is correlation analysis between categorical variables performed using the
chi-square test? 5M
34. A survey of car owners conducted in 2011 determined that 60% of car
owners have only one car, 28% have two cars, and 12% have three or more.
Suppose that you have conducted your own survey and found that, out of 129
car owners, 73 had one car and 38 had two cars. Determine whether your data
supports the results of the study. Use a significance level of 0.05; the
critical chi-square value for df = 2 is 5.99. Apply the chi-square test for
nominal data. 5M
35. Suppose two stocks A and B have the following values in one week: (2, 5),
(3, 8), (5, 10), (4, 11), (6, 14). If the stocks are affected by the same industry
trends, use covariance to determine whether their prices rise or fall together. 5M
36. What is dimensionality reduction? Explain the methods used for reducing
dimensionality. 5M
37. Illustrate why data preprocessing is a major step in data mining. 5M
38. Consider the following salaries:
25, 30, 28, 55, 60, 42, 70, 75, 50, 48
Apply the binning technique to remove noisy data. 5M
39. Explain the quality measures of data preprocessing. 5M
40. Illustrate similarity, dissimilarity, and their properties. 5M
41. Define noisy data. Explain how noisy data can be handled in data mining. 5M
42. Calculate the cosine similarity between the vectors d1 and d2. 5M
d1 3 2 0 5 0 0 0 2 0 0
d2 1 0 0 0 0 0 0 1 0 2
43. Illustrate why data preprocessing is a major step in data mining. 5M
44. Describe quality measures of data preprocessing. 5M
45. List and explain the major tasks in data preprocessing. 5M
46. Normalize the following group of data: 200, 300, 400, 600, 1000 using: 5M
i. Min-Max
ii. Z-Score
iii. Decimal Scaling
47. Explain Data Cube Aggregation. 5M
48. The dataset below describes the rate of economic growth (ai) and the rate of
return on the S&P 500 (bi). Using the covariance formula, determine whether
economic growth and S&P 500 returns have a positive or negative relationship. 5M
Economic Growth % (ai)    S&P 500 Returns % (bi)
2.1                       8
2.5                       12
4.0                       14
3.6                       10
49. Explain Data Discretization in detail, including Supervised and Unsupervised
Discretization. 5M
50. Describe Binarization with an example. 5M
51. Explain the linear relationship between variables. 5M
52. Describe Similarity and Dissimilarity in detail. 5M
53. Apply entropy-based discretization to the given set S = {(16, n), (0, y), (4, y),
(12, y), (16, n), (26, n), (18, y), (24, n), (28, n)}. S is to be partitioned into 2
intervals S1 and S2, with 2 possible split points, 14 and 21. Find the best split
point. 10M
54. Calculate the Minkowski distance and the Euclidean distance between each pair
of the following points to determine their dissimilarity: 10M
Point X Y
p1 0 2
p2 2 0
p3 3 1
p4 5 1
55. Explain data reduction methods in detail. 10M
56. Calculate the entropy-based discretization for the following data set. S is to be
partitioned into 2 intervals S1 and S2, with 2 possible split points, 14 and 17.
Find the best split point. 10M
0 4 12 16 16 18 24 26 28
Y Y Y N N Y N N N
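
For checking answers to the similarity and distance questions (8, 42, and 54), a minimal Python sketch follows. The function names are illustrative only and are not part of the syllabus. Question 54 does not state the order of the Minkowski distance, so the parameter `r` is left to the caller: r = 1 gives the Manhattan distance and r = 2 gives the Euclidean distance.

```python
import math

def smc(a, b):
    """Simple matching coefficient: share of positions where a and b agree."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)

def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def minkowski(p, q, r):
    """Minkowski distance of order r between points p and q."""
    return sum(abs(x - y) ** r for x, y in zip(p, q)) ** (1 / r)

# Data copied from questions 8, 42, and 54.
chicken = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1]
bird = [0, 1, 1, 0, 0, 0, 0, 1, 0, 1]
d1 = [3, 2, 0, 5, 0, 0, 0, 2, 0, 0]
d2 = [1, 0, 0, 0, 0, 0, 0, 1, 0, 2]
points = {"p1": (0, 2), "p2": (2, 0), "p3": (3, 1), "p4": (5, 1)}

if __name__ == "__main__":
    print("SMC:", smc(chicken, bird))
    print("Cosine similarity:", round(cosine_similarity(d1, d2), 4))
    names = sorted(points)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            manhattan = minkowski(points[first], points[second], 1)
            euclidean = minkowski(points[first], points[second], 2)
            print(first, second, round(manhattan, 3), round(euclidean, 3))
```

The vectors agree in 8 of 10 positions, so the SMC is 0.8. Note that cosine similarity is a similarity measure rather than a distance, which is why the sketch reports it directly.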
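
The covariance and chi-square questions (34, 35, and 48) can be checked with a sketch along the following lines. Two assumptions are made here and are not stated in the questions: the covariance divides by n (population form) unless `sample=True` is passed, and in question 34 the remaining 129 - 73 - 38 = 18 owners are taken to have three or more cars. The function names are hypothetical.

```python
def covariance(xs, ys, sample=False):
    """Covariance of paired values; divides by n, or by n - 1 if sample=True."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return total / (n - 1 if sample else n)

def chi_square_statistic(observed, expected_proportions):
    """Goodness-of-fit statistic: sum of (O - E)^2 / E over all categories."""
    n = sum(observed)
    return sum(
        (o - p * n) ** 2 / (p * n)
        for o, p in zip(observed, expected_proportions)
    )

# Question 35: weekly values of stocks A and B.
stock_a = [2, 3, 5, 4, 6]
stock_b = [5, 8, 10, 11, 14]

# Question 48: economic growth (ai) and S&P 500 returns (bi).
growth = [2.1, 2.5, 4.0, 3.6]
returns = [8, 12, 14, 10]

# Question 34: the third count (18) is an assumption, see the text above.
observed = [73, 38, 18]
claimed = [0.60, 0.28, 0.12]
critical_value = 5.99  # given in the question for df = 2

if __name__ == "__main__":
    print("cov(A, B):", covariance(stock_a, stock_b))
    print("cov(ai, bi):", covariance(growth, returns))
    chi2 = chi_square_statistic(observed, claimed)
    print("chi-square:", round(chi2, 3), "reject:", chi2 > critical_value)
```

A positive covariance indicates that the two variables tend to move together. For the chi-square test, a statistic below the critical value means the survey data do not contradict the claimed proportions at the 0.05 level.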
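
For the two entropy-based discretization questions (question 53 and the final question, which share the same data set), the comparison of split points can be sketched as below. The sketch assumes that a value equal to the split point goes to the left interval S1, and that the best split is the one with the lowest weighted entropy of S1 and S2. The helper names are illustrative.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def weighted_entropy(samples, split):
    """Size-weighted entropy of the two intervals produced by a split point."""
    left = [label for value, label in samples if value <= split]
    right = [label for value, label in samples if value > split]
    n = len(samples)
    return len(left) / n * entropy(left) + len(right) / n * entropy(right)

def best_split(samples, candidates):
    """Candidate split point with the lowest weighted entropy."""
    return min(candidates, key=lambda s: weighted_entropy(samples, s))

# Data set S as (value, class) pairs, copied from the questions.
S = [(0, "y"), (4, "y"), (12, "y"), (16, "n"), (16, "n"),
     (18, "y"), (24, "n"), (26, "n"), (28, "n")]

if __name__ == "__main__":
    for candidates in ([14, 21], [14, 17]):
        for split in candidates:
            print(split, round(weighted_entropy(S, split), 4))
        print("best of", candidates, "is", best_split(S, candidates))
```

Minimizing the weighted entropy is equivalent to maximizing the information gain, because the entropy of the unsplit set S is the same for every candidate.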