Data Mining Midterm Exam 2021/2022

The document is a 40-question midterm exam for an undergraduate Data Mining course. It covers key concepts such as the data mining process, data set types, and data preprocessing techniques including cleaning, integration, and transformation. Questions also assess data mining tasks such as classification, clustering, and regression, and test similarity and distance measures between data objects, including the Euclidean, Manhattan, and Minkowski distances.


Suez Canal University

Faculty of Computers & Informatics

Department of Information Systems

Answer _ Midterm Exam

First Term: 2021/2022
Program: Information System | Course: Data Mining | Course Code: IS253
Level: 4 | Lecturer: Dr. Osama Farouk | Date: 25/11/2021
Total pages: 4 | Total marks: 40 | Time allowed: 1 h
Answer the following questions:
Question (Select the correct answer) (40 marks)
1. Which of the following is an essential process in which intelligent methods are
applied to extract data patterns?
A. Warehousing B. Data Mining C. Text Mining D. Data Selection
2. Data Matrix, Document Data and Transaction Data are examples of which type of
data set?
A. Graph B. Record C. Numerical D. Ordered
3. For what purpose do analysis tools pre-compute summaries of huge amounts
of data?
A. To maintain consistency B. For authentication
C. For data access D. To obtain query responses
4. What are the functions of Data Mining?
A. Association and correlation analysis, classification
B. Prediction and characterization
C. Cluster analysis and Evolution analysis
D. All of the above
5. Which of the following statements is correct about data mining?
A. It can be referred to as the procedure of mining knowledge from data
B. Data mining can be defined as the procedure of extracting information from a set of the data
C. The procedure of data mining also involves several other processes like data cleaning,
data transformation, and data integration
D. All of the above
6. Which of the following correctly describes data selection?
A. A subject-oriented integrated time-variant non-volatile collection of data in support of
management
B. The actual discovery phase of a knowledge discovery process
C. The stage of selecting the right data for a knowledge discovery (KDD) process
D. All of the above
7. Which one of the following can be considered a correct application of data
mining?
A. Fraud detection B. Corporate Analysis & Risk management
C. Management and market analysis D. All of the above
8. Which of the following is used as the first step in the knowledge discovery process?
A. Data selection B. Data cleaning C. Data transformation D. Data integration

___________________________________________________________________________
Examination committee: Prof. Prof. Prof. Prof.

9. Which of the following terms is used as a synonym for data mining?

A. knowledge discovery in databases B. data warehousing
C. regression analysis D. parallel processing in databases
10. ……….. is the output of KDD
A. Query B. Useful Information C. Data D. Information
11. Data mining is ……………………..
A. time variant non-volatile collection of data
B. The actual discovery phase of a knowledge discovery process
C. The stage of selecting the right data
D. None of these
12. Which of the following refers to the step of the knowledge discovery process in which
several data sources are combined?
A. Data selection B. Data cleaning C. Data transformation D. Data integration
13. Data objects with characteristics that are considerably different from most of the
other data objects in the data set are referred to as ……………… .
A. Noise B. Outliers C. Missing values D. Duplicate data
14. Data warehouse is……………….
A. The actual discovery phase of a knowledge discovery process
B. The stage of selecting the right data for a KDD process
C. A subject-oriented integrated time-variant non-volatile collection of data in support of
management
D. None of these
15. Spatial Data, Temporal Data, Sequential Data, and Genetic Sequence are examples of
which type of data set?
A. Graph B. Record C. Numerical D. Ordered
16. An invalid signal overlapping valid data refers to ………….
A. Noise B. Outliers C. Missing values D. Duplicate data
17. Data objects with characteristics that are considerably different from most of the
other data objects in the data set are referred to as ………… .
A. Noise B. Outliers C. Missing values D. Duplicate data
18. Which of the following is NOT one of the processes in Data Preprocessing?
A. Feature creation B. Sampling C. Discriminization D. Aggregation
19. Combining two or more attributes (or objects) into a single attribute (or object)
describes ………… .
A. Binarization B. Aggregation
C. Dimensionality reduction D. Attribute transformation
20. All of the following are the purposes of aggregation EXCEPT:
A. More “stable” data B. Change of Scale C. Remove noise D. Data Reduction
21. The key principle for effective sampling is ………………. .
A. a sample will work almost as well as using the entire set if the sample is representative.
B. the appropriate ratio acceptable as sample.
C. using the correct tools and methodology to arrive at an appropriate value for sample.
D. the efficient and effective way to represent the whole population.

22. ………. is the main technique employed for data selection.

A. Change of scale B. Preprocessing C. Sampling D. Discretization
23. There is an equal probability of selecting any particular item refers to ……………..
A. sampling with replacement B. simple random sampling
C. stratified sampling D. sampling without replacement
24. Split the data into several partitions; draw random samples from each partition
refers to ………… .
A. random sampling B. sampling with replacement
C. sampling without replacement D. stratified sampling
25. ……….. refers to the mapping or classification of a class with some predefined group or class
A. Data Discrimination B. Data Characterization
C. Data Definition D. Data Visualization
26. A college professor wishes to reach a certain level of savings before her retirement.
This is related to which data mining task?
A. Clustering B. Regression C. Association D. Classification
27. Suppose that the data for analysis includes the attribute age. The age values for the
data tuples are (in increasing order) 13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25,
30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70. The five-number summary of the distribution
(Minimum, Q1, Median, Q3, Maximum), in that order, is:
A. 13, 20, 25, 35, 70 B. 13, 35, 25, 20, 70 C. 13, 25, 20, 30, 70 D. 0, 1, 1, 1, 0
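The five-number summary in question 27 can be checked with a short sketch. Quartile conventions differ between textbooks and libraries; this version assumes the median-of-halves rule (for an odd number of values, the median itself is excluded from both halves). The function name `five_number_summary` is illustrative and not part of the exam.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]                                  # values below the median position
    upper = data[half + 1:] if n % 2 else data[half:]    # values above the median position
    return (data[0], median(lower), median(data), median(upper), data[-1])

ages = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25,
        30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]
summary = five_number_summary(ages)
print(summary)  # (13, 20, 25, 35, 70)
```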
28. Find cosine similarity between documents 1 and 2.
d1 = (5, 0, 3, 0, 2, 0, 0, 2, 0, 0)
d2 = (3, 0, 2, 0, 1, 1, 0, 1, 0, 1)
A. 0.74 B. 0.94 C. 0.84 D. 6.15
29. …………….. combines data from multiple sources into a coherent store
A. Data Discretization B. Data Cleaning C. Data Reduction D. Data integration
30. Suppose two stocks A and B have the following values in one week
(2, 15), (3, 18), (5, 10), (4, 11), (6, 14), If the stocks are affected by the same industry
trends, will their prices ……………..
A. rise together B. fall together C. stand together D. Data integration
31. Suppose S=[2,1,4,4], by using wavelet decomposition, the detail coefficients are:
A. [0, -1, -1, 0] B. [2.75 , -1.25 , 0.5 , 0] C. [ 0.5 , 0, 2.75 , -1.25] D.[-1,0,0,-1]
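As a rough illustration of the wavelet decomposition in question 31, the sketch below applies an unnormalized Haar transform. It assumes the convention in which each pair (a, b) is replaced by its average (a + b) / 2 and its detail coefficient (a - b) / 2, and that the input length is a power of two; other texts scale by the square root of 2 instead. The function name `haar_decompose` is illustrative.

```python
def haar_decompose(signal):
    """Unnormalized Haar transform: overall average followed by detail coefficients."""
    approx = list(signal)
    details = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        level_detail = [(a - b) / 2 for a, b in pairs]   # half-differences
        approx = [(a + b) / 2 for a, b in pairs]         # pairwise averages
        details = level_detail + details                 # coarser levels go first
    return approx + details

coeffs = haar_decompose([2, 1, 4, 4])
print(coeffs)  # [2.75, -1.25, 0.5, 0.0]
```

Under this convention the first value, 2.75, is the overall average, and the remaining three values are the detail coefficients from coarsest to finest.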

Given two objects represented by the tuples (22, 1, 42, 10) and (20, 0, 36, 8):
32. Compute the Euclidean distance between the two objects.
A. 6 B. 6.7082 C. 11 D. 6.1534
33. Compute the Manhattan distance between the two objects.
A. 6 B. 6.7082 C. 11 D. 6.1534
34. Compute the Minkowski distance between the two objects, using q = 3.
A. 6 B. 6.7082 C. 11 D. 6.1534
35. Compute the supremum distance between the two objects.
A. 6 B. 6.7082 C. 11 D. 6.1534
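The four distances in questions 32 to 35 are all members of the Minkowski family, which a minimal sketch can make concrete. The helper names `minkowski` and `supremum` are illustrative; Manhattan distance is the case q = 1, Euclidean distance is q = 2, and the supremum distance is the limit as q grows, which reduces to the largest per-attribute difference.

```python
def minkowski(x, y, q):
    """Minkowski distance of order q between two equal-length tuples."""
    return sum(abs(a - b) ** q for a, b in zip(x, y)) ** (1 / q)

def supremum(x, y):
    """Supremum (Chebyshev) distance: the largest per-attribute difference."""
    return max(abs(a - b) for a, b in zip(x, y))

x = (22, 1, 42, 10)
y = (20, 0, 36, 8)

manhattan = minkowski(x, y, 1)   # |2| + |1| + |6| + |2| = 11
euclidean = minkowski(x, y, 2)   # sqrt(4 + 1 + 36 + 4) = sqrt(45), about 6.7082
mink3 = minkowski(x, y, 3)       # cube root of 233, about 6.153
sup = supremum(x, y)             # max(2, 1, 6, 2) = 6

print(manhattan, round(euclidean, 4), round(mink3, 4), sup)
```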


Use these methods to normalize the following group of data: (200, 300, 400, 600, 1000)
36. Min-max normalization by setting min = 0 and max = 1
A. (0, 0.125, 0.25, 0.5, 1) B. (0.25, 0.5, 1, 0, 0.125)
C. (-1.06, -0.7, -0.35, 0.35, 1.78) D. (-0.35, 0.35, 1.78, -1.06, -0.7)
37. z-score normalization
A. (0, 0.125, 0.25, 0.5, 1) B. (0.25, 0.5, 1, 0, 0.125)
C. (-1.06, -0.7, -0.35, 0.35, 1.78) D. (-0.35, 0.35, 1.78, -1.06, -0.7)

Suppose a group of 9 sales price records is given as follows (sort the values before binning):

28, 25, 15, 21, 8, 21, 24, 4, 34
38. Partition into equal-frequency (equi-depth) bins:
Bin 1: 4, X, 15
Bin 2: 21, 21, 24
Bin 3: 25, 28, 34
X = ……
A. 4 B. 8 C. 34 D. 22
39. Smoothing by bin means:
Bin 1: 9, 9, 9
Bin 2: X, X, X
Bin 3: 29, 29, 29
X = ……
A. 4 B. 8 C. 34 D. 22
40. Smoothing by bin boundaries:
Bin 1: 4, 4, 15
Bin 2: 21, 21, 24
Bin 3: 25, 25, X
X = ……
A. 4 B. 8 C. 34 D. 22

My best wishes
Dr. Osama Farouk


Common questions

How does min-max normalization differ from z-score normalization, and in what scenarios might each be preferable?
Min-max normalization linearly rescales data to a specific range, usually 0 to 1, which suits algorithms that expect features in a bounded, uniform range. Z-score normalization instead standardizes data using the mean and standard deviation, so the result has mean 0 and standard deviation 1; this is useful when features have different units or scales. Min-max is preferable when a bounded range is required, while z-score is preferable when the true minimum and maximum are unknown or when outliers would dominate the min-max range.
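The two normalizations can be sketched in a few lines, using the data group from questions 36 and 37 of the exam. This is a minimal illustration under two assumptions: the z-score uses the population standard deviation (`statistics.pstdev`), and the input contains at least two distinct values. The function names are illustrative.

```python
from statistics import mean, pstdev

def min_max(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]

def z_score(values):
    """Standardize values using the mean and population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

data = [200, 300, 400, 600, 1000]
scaled = min_max(data)
standardized = z_score(data)

print(scaled)                                # [0.0, 0.125, 0.25, 0.5, 1.0]
print([round(z, 2) for z in standardized])   # [-1.06, -0.71, -0.35, 0.35, 1.77]
```

Using the sample standard deviation instead would give slightly smaller z-scores in absolute value, which is one reason published answers can differ in the last digit.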

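Evaluate the importance of data preprocessing steps like feature creation and discrimination in the data mining pipeline.
Preprocessing steps such as feature creation improve data quality before mining: feature creation constructs new attributes that capture underlying patterns better than the original ones. Data discrimination, however, is not a preprocessing step. It is a descriptive mining task that compares the general features of a target class with those of contrasting classes, which is why question 18 of the exam lists "Discriminization" as the option that is not a preprocessing process.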

Analyze the impact of data discretization on the effectiveness of classification algorithms in data mining.
Data discretization transforms continuous attributes into categorical ones, which can simplify classification models. It can help algorithms such as decision trees find patterns more easily and improves interpretability and computational efficiency, but it discards information if the intervals are chosen carelessly.

Why is the concept of equi-depth binning important in data mining, and how does it differ from other binning methods?
Equi-depth (equal-frequency) binning partitions sorted data into bins that each contain the same number of data points. The bins can then be smoothed, for example by bin means or bin boundaries, to reduce noise. Equal-width binning, by contrast, divides the value range into intervals of the same size, so the bins may contain very different numbers of data points.
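A short sketch of equi-depth binning and the two smoothing methods, using the nine sales prices from questions 38 to 40 of the exam. It assumes the number of values divides evenly into the bins and that, when smoothing by boundaries, a value equally close to both boundaries goes to the lower one. All function names are illustrative.

```python
def equal_frequency_bins(values, n_bins):
    """Sort the values and split them into n_bins bins of equal size."""
    data = sorted(values)
    size = len(data) // n_bins   # assumes len(values) is divisible by n_bins
    return [data[i * size:(i + 1) * size] for i in range(n_bins)]

def smooth_by_means(bins):
    """Replace every value in a bin with the bin mean."""
    return [[sum(b) / len(b)] * len(b) for b in bins]

def smooth_by_boundaries(bins):
    """Replace every value with the closer of the bin's min and max."""
    smoothed = []
    for b in bins:
        lo, hi = b[0], b[-1]
        smoothed.append([lo if v - lo <= hi - v else hi for v in b])
    return smoothed

prices = [28, 25, 15, 21, 8, 21, 24, 4, 34]
bins = equal_frequency_bins(prices, 3)

print(bins)                        # [[4, 8, 15], [21, 21, 24], [25, 28, 34]]
print(smooth_by_means(bins))       # [[9.0, 9.0, 9.0], [22.0, 22.0, 22.0], [29.0, 29.0, 29.0]]
print(smooth_by_boundaries(bins))  # [[4, 4, 15], [21, 21, 24], [25, 25, 34]]
```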

Explain how sampling improves data mining processes and discuss the key principle for effective sampling.
Sampling reduces the volume of data to a manageable level, which shortens processing time and lowers computational cost. The key principle for effective sampling is that a sample works almost as well as the entire dataset if it is representative, meaning it has approximately the same properties as the original data.
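The sampling schemes named in questions 23 and 24 of the exam can be sketched with Python's standard `random` module. The population of 100 integers, the seed, the stratum key, and the helper `stratified_sample` are all hypothetical choices made for illustration.

```python
import random
from collections import defaultdict

def stratified_sample(records, key, per_stratum, rng):
    """Partition records by key, then draw a random sample from each partition."""
    strata = defaultdict(list)
    for r in records:
        strata[key(r)].append(r)
    sample = []
    for group in strata.values():
        sample.extend(rng.sample(group, min(per_stratum, len(group))))
    return sample

rng = random.Random(42)            # fixed seed so the run is repeatable
population = list(range(100))

without_repl = rng.sample(population, 10)     # each item can be picked at most once
with_repl = rng.choices(population, k=10)     # the same item may be picked again
strat = stratified_sample(population, key=lambda v: v // 25, per_stratum=3, rng=rng)

print(without_repl, with_repl, strat, sep="\n")
```

Stratified sampling guarantees that every partition is represented, which simple random sampling does not when some groups are small.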

Which step in the knowledge discovery process is crucial for combining data from multiple sources, and why is it important?
Data integration is the step in the knowledge discovery process that combines data from multiple sources into a coherent store. It matters because it creates the unified view needed for accurate analysis, so that all relevant data is considered in subsequent data mining tasks.

What are the main functions of data mining and how do they differ from each other?
The main functions of data mining include association and correlation analysis, classification, prediction, characterization, cluster analysis, and evolution analysis. Association and correlation analysis find relationships between variables. Prediction forecasts likely outcomes from current data, while characterization summarizes the general features of a target class. Cluster analysis groups similar data points together, while evolution analysis observes data over time to detect trends.

Describe how outliers differ from noise in the context of data mining and what impact they have on data analysis.
Outliers are data objects with characteristics considerably different from most other objects in a dataset, whereas noise is an invalid signal that overlaps valid data. Outliers can indicate significant but rare phenomena and so can be valuable for analysis if identified correctly, while noise distorts the data and can lead to incorrect conclusions if not handled.

Discuss how cosine similarity is calculated between two documents and its significance in text mining applications.
Cosine similarity between two documents is the dot product of their term-frequency vectors divided by the product of the vectors' magnitudes. In text mining it measures how similar two documents are in word usage regardless of document length, which is useful for document clustering and information retrieval.
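The calculation can be sketched with the two document vectors from question 28 of the exam. The function name `cosine_similarity` is illustrative, and the sketch assumes neither vector is all zeros.

```python
import math

def cosine_similarity(x, y):
    """Dot product of x and y divided by the product of their Euclidean norms."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

d1 = (5, 0, 3, 0, 2, 0, 0, 2, 0, 0)
d2 = (3, 0, 2, 0, 1, 1, 0, 1, 0, 1)

# dot = 25, |d1| = sqrt(42), |d2| = sqrt(17)
sim = cosine_similarity(d1, d2)
print(round(sim, 2))  # 0.94
```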

In the context of data mining, how does the detection of missing values and their treatment affect the overall data analysis outcome?
Detecting and treating missing values is important in data mining because gaps in the data directly affect the analysis outcome. Improper handling can bias results, whereas appropriate imputation or removal of incomplete records leads to more accurate and reliable models.
