
Data Warehouse MCQs with Answers

The document contains a series of multiple-choice questions related to data warehousing concepts, architecture, and processes. It covers topics such as data warehouse characteristics, metadata, OLAP, data mining, and data transformation. Each question is followed by four answer options, testing knowledge on various aspects of data warehousing.

Uploaded by

Mostafa Gamal
Copyright
© All Rights Reserved

Certainly.

Here are the questions from the provided PDF with the answers moved to a
separate key at the end.

Data Warehouse Multiple Choice Questions

1. ______ is a subject-oriented, integrated, time-variant, nonvolatile collection of data in
support of management decisions.

A. Data Mining.

B. Data Warehousing.

C. Web Mining.

D. Text Mining.

2. The data Warehouse is ______.

A. read only.

B. write only.

C. read write only.

D. none.

3. Expansion for DSS in DW is ______.

A. Decision Support system.

B. Decision Single System.

C. Data Storable System.

D. Data Support System.

4. The important aspect of the data warehouse environment is that data found within the
data warehouse is ______.

A. subject-oriented.

B. time-variant.

C. integrated.
D. All of the above.

5. The time horizon in Data warehouse is usually ______.

A. 1-2 years.

B. 3-4 years.

C. 5-6 years.

D. 5-10 years.

6. The data is stored, retrieved & updated in ______.

A. OLAP.

B. OLTP.

C. SMTP.

D. FTP.

7. ______ describes the data contained in the data warehouse.

A. Relational data.

B. Operational data.

C. Metadata.

D. Informational data.

8. ______ predicts future trends & behaviors, allowing business managers to make
proactive, knowledge-driven decisions.

A. Data warehouse.

B. Data mining.

C. Datamarts.

D. Metadata.
9. ______ is the heart of the warehouse.

A. Data mining database servers.

B. Data warehouse database servers.

C. Data mart database servers.

D. Relational database servers.

10. ______ is the specialized data warehouse database.

A. Oracle.

B. DB2.

C. Informix.

D. Redbrick.

11. ______ defines the structure of the data held in operational databases and used by
operational applications.

A. User-level metadata.

B. Data warehouse metadata.

C. Operational metadata.

D. Data mining metadata.

12. ______ is held in the catalog of the warehouse database system.

A. Application level metadata.

B. Algorithmic level metadata.

C. Departmental level metadata.

D. Core warehouse metadata.


13. ______ maps the core warehouse metadata to business concepts, familiar and useful to
end users.

A. Application level metadata.

B. User level metadata.

C. End-user level metadata.

D. Core level metadata.

14. ______ consists of formal definitions, such as a COBOL layout or a database schema.

A. Classical metadata.

B. Transformation metadata.

C. Historical metadata.

D. Structural metadata.

15. ______ consists of information in the enterprise that is not in classical form.

A. Mushy metadata.

B. Differential metadata.

C. Data warehouse.

D. Data mining.

16. ______ databases are owned by particular departments or business groups.

A. Informational.

B. Operational.

C. Both informational and operational.

D. Flat.

17. The star schema is composed of ______ fact table.


A. one.

B. two.

C. three.

D. four.

18. The time horizon in operational environment is ______.

A. 30-60 days.

B. 60-90 days.

C. 90-120 days.

D. 120-150 days.

19. The key used in operational environment may not have an element of ______.

A. time.

B. cost.

C. frequency.

D. quality.

20. Data can be updated in ______ environment.

A. data warehouse.

B. data mining.

C. operational.

D. informational.

21. Record cannot be updated in ______.

A. OLTP

B. files
C. RDBMS

D. data warehouse

22. The source of all data warehouse data is the ______ environment.

A. operational environment.

B. informal environment.

C. formal environment.

D. technology environment.

23. Data warehouse contains ______ data that is never found in the operational
environment.

A. normalized.

B. informational.

C. summary.

D. denormalized.

24. The modern CASE tools belong to ______ category.

A. analysis.

B. Development

C. Coding

D. Delivery

25. Bill Inmon has estimated ______ of the time required to build a data warehouse, is
consumed in the conversion process.

A. 10 percent.

B. 20 percent.

C. 40 percent
D. 80 percent.

26. Detail data in single fact table is otherwise known as ______.

A. monoatomic data.

B. diatomic data.

C. atomic data.

D. multiatomic data.

27. ______ test is used in an online transactional processing environment.

A. MEGA.

B. MICRO.

C. MACRO.

D. ACID.

28. ______ is a good alternative to the star schema.

A. Star schema.

B. Snowflake schema.

C. Fact constellation.

D. Star-snowflake schema.

29. The biggest drawback of the level indicator in the classic star-schema is that it limits
______.

A. quantify.

B. qualify.

C. flexibility.

D. ability.
30. A data warehouse is ______.

A. updated by end users.

B. contains numerous naming conventions and formats

C. organized around important subject areas.

D. contains only current data.

31. An operational system is ______.

A. used to run the business in real time and is based on historical data.

B. used to run the business in real time and is based on current data.

C. used to support decision making and is based on current data.

D. used to support decision making and is based on historical data.

32. The generic two-level data warehouse architecture includes ______.

A. at least one data mart.

B. data that can be extracted from numerous internal and external sources.

C. near real-time updates.

D. far real-time updates.

33. The active data warehouse architecture includes ______.

A. at least one data mart.

B. data that can be extracted from numerous internal and external sources.

C. near real-time updates.

D. all of the above.

34. Reconciled data is ______.


A. data stored in the various operational systems throughout the organization.

B. current data intended to be the single source for all decision support systems.

C. data stored in one operational system in the organization.

D. data that has been selected and formatted for end-user support applications.

35. Transient data is ______.

A. data in which changes to existing records cause the previous version of the records to be
eliminated.

B. data in which changes to existing records do not cause the previous version of the
records to be eliminated.

C. data that are never altered or deleted once they have been added.

D. data that are never deleted once they have been added.

36. The extract process is ______.

A. capturing all of the data contained in various operational systems.

B. capturing a subset of the data contained in various operational systems.

C. capturing all of the data contained in various decision support systems.

D. capturing a subset of the data contained in various decision support systems.

37. Data scrubbing is ______.

A. a process to reject data from the data warehouse and to create the necessary indexes.

B. a process to load the data in the data warehouse and to create the necessary indexes.

C. a process to upgrade the quality of data after it is moved into a data warehouse.

D. a process to upgrade the quality of data before it is moved into a data warehouse.

38. The load and index is ______.

A. a process to reject data from the data warehouse and to create the necessary indexes.
B. a process to load the data in the data warehouse and to create the necessary indexes.

C. a process to upgrade the quality of data after it is moved into a data warehouse.

D. a process to upgrade the quality of data before it is moved into a data warehouse.

39. Data transformation includes ______.

A. a process to change data from a detailed level to a summary level.

B. a process to change data from a summary level to a detailed level.

C. joining data from one source into various sources of data.

D. separating data from one source into various sources of data.

40. ______ is called a multifield transformation.

A. Converting data from one field into multiple fields.

B. Converting data from fields into field.

C. Converting data from double fields into multiple fields.

D. Converting data from one field to one field.

41. The type of relationship in star schema is ______.

A. many-to-many.

B. one-to-one.

C. one-to-many.

D. many-to-one.

42. Fact tables are ______.

A. completely denormalized.

B. partially denormalized.

C. completely normalized.
D. partially normalized.

43. ______ is the goal of data mining.

A. To explain some observed event or condition.

B. To confirm that data exists.

C. To analyze data for expected relationships.

D. To create a new data warehouse.

44. Business Intelligence and data warehousing is used for ______.

A. Forecasting.

B. Data Mining.

C. Analysis of large volumes of product sales data.

D. All of the above.

45. The data administration subsystem helps you perform all of the following, except
______.

A. backups and recovery.

B. query optimization.

C. security management.

D. create, change, and delete information.

46. The most common source of change data in refreshing a data warehouse is ______.

A. queryable change data.

B. cooperative change data.

C. logged change data.

D. snapshot change data.


47. ______ are responsible for running queries and reports against data warehouse tables.

A. Hardware.

B. Software.

C. End users.

D. Middleware.

48. Query tool is meant for ______.

A. data acquisition.

B. information delivery.

C. information exchange.

D. communication.

49. Classification rules are extracted from ______.

A. root node.

B. decision tree.

C. siblings.

D. branches.

50. Dimensionality reduction reduces the data set size by removing ______.

A. relevant attributes.

B. irrelevant attributes.

C. derived attributes.

D. composite attributes.

51. ______ is a method of incremental conceptual clustering.


A. CORBA.

B. OLAP.

C. COBWEB.

D. STING.

52. The assumption that the effect of one attribute value on a given class is independent of
the values of the other attributes is called ______.

A. value independence.

B. class conditional independence.

C. conditional independence.

D. unconditional independence.

53. The main organizational justification for implementing a data warehouse is to provide
______.

A. cheaper ways of handling transportation.

B. decision support.

C. storing large volume of data.

D. access to data.

54. Multidimensional database is otherwise known as ______.

A. RDBMS

B. DBMS

C. EXTENDED RDBMS

D. EXTENDED DBMS

55. Data warehouse architecture is based on ______.

A. DBMS.
B. RDBMS.

C. Sybase.

D. SQL Server.

56. Source data for the warehouse comes from ______.

A. ODS.

B. TDS.

C. MDDB.

D. ORDBMS.

57. ______ is a data transformation process.

A. Comparison.

B. Projection.

C. Selection.

D. Filtering.

58. The technology area associated with CRM is ______.

A. specialization.

B. generalization.

C. personalization.

D. summarization.

59. SMP stands for ______.

A. Symmetric Multiprocessor.

B. Symmetric Multiprogramming.

C. Symmetric Metaprogramming.
D. Symmetric Microprogramming.

60. ______ are designed to overcome any limitations placed on the warehouse by the nature
of the relational data model.

A. Operational database.

B. Relational database.

C. Multidimensional database.

D. Data repository.

61. ______ are designed to overcome any limitations placed on the warehouse by the nature
of the relational data model.

A. Operational database.

B. Relational database.

C. Multidimensional database.

D. Data repository.

62. MDDB stands for ______.

A. multiple data doubling.

B. multidimensional databases.

C. multiple double dimension.

D. multi-dimension doubling.

63. ______ is data about data.

A. Metadata.

B. Microdata.

C. Minidata.

D. Multidata.
64. ______ is an important functional component of the metadata.

A. Digital directory.

B. Repository.

C. Information directory.

D. Data dictionary.

65. EIS stands for ______.

A. Extended interface system.

B. Executive interface system.

C. Executive information system.

D. Extendable information system.

66. ______ is data collected from natural systems.

A. MRI scan.

B. ODS data.

C. Statistical data.

D. Historical data.

67. ______ is an example of application development environments.

A. Visual Basic.

B. Oracle.

C. Sybase.

D. SQL Server.

68. The term that is not associated with data cleaning process is ______.
A. domain consistency.

B. deduplication.

C. disambiguation.

D. segmentation.

69. ______ are some popular OLAP tools.

A. Metacube, Informix.

B. Oracle Express, Essbase.

C. HOLAP.

D. MOLAP.

70. Capability of data mining is to build ______ models.

A. retrospective.

B. interrogative.

C. predictive.

D. imperative.

71. ______ is a process of determining the preference of the majority of customers.

A. Association.

B. Preferencing.

C. Segmentation.

D. Classification.

72. Strategic value of data mining is ______.

A. cost-sensitive.

B. work-sensitive.
C. time-sensitive.

D. technical-sensitive.

73. ______ proposed the approach for data integration issues.

A. Ralph Campbell.

B. Ralph Kimball.

C. John Raphlin.

D. James Gosling.

74. The terms equality and roll up are associated with ______.

A. OLAP.

B. visualization.

C. data mart.

D. decision tree.

75. Exception reporting in data warehousing is otherwise called ______.

A. exception.

B. alerts.

C. errors.

D. bugs.

76. ______ is a metadata repository.

A. Prism solution directory manager.

B. CORBA.

C. STUNT.

D. COBWEB.
77. ______ is an expensive process in building an expert system.

A. Analysis.

B. Study.

C. Design.

D. Information collection.

78. The full form of KDD is ______.

A. Knowledge database.

B. Knowledge discovery in database.

C. Knowledge data house.

D. Knowledge data definition.

79. The first International conference on KDD was held in the year ______.

A. 1996.

B. 1997.

C. 1995.

D. 1994.

80. Removing duplicate records is a process called ______.

A. recovery.

B. data cleaning.

C. data cleansing.

D. data pruning.
81. ______ contains information that gives users an easy-to-understand perspective of the
information stored in the data warehouse.

A. Business metadata.

B. Technical metadata.

C. Operational metadata.

D. Financial metadata.

82. ______ helps to integrate, maintain and view the contents of the data warehousing
system.

A. Business directory.

B. Information directory.

C. Data dictionary.

D. Database.

83. Discovery of cross-sales opportunities is called ______.

A. segmentation.

B. visualization.

C. correction.

D. association.

84. Data marts that incorporate data mining tools to extract sets of data are called ______.

A. independent data mart.

B. dependent data marts.

C. intra-entry data mart.

D. inter-entry data mart.

85. ______ can generate programs itself, enabling it to carry out new tasks.
A. Automated system.

B. Decision making system.

C. Self-learning system.

D. Productivity system.

86. The power of self-learning system lies in ______.

A. cost.

B. speed.

C. accuracy.

D. simplicity.

87. Building the informational database is done with the help of ______.

A. transformation or propagation tools.

B. transformation tools only.

C. propagation tools only.

D. extraction tools.

88. How many components are there in a data warehouse?

A. two.

B. three.

C. four.

D. five.

89. Which of the following is not a component of a data warehouse?

A. Metadata.

B. Current detail data.


C. Lightly summarized data.

D. Component Key.

90. ______ is data that is distilled from the low level of detail found at the current detailed
level.

A. Highly summarized data.

B. Lightly summarized data.

C. Metadata.

D. Older detail data.

91. Highly summarized data is ______.

A. compact and easily accessible.

B. compact and expensive.

C. compact and hardly accessible.

D. compact.

92. A directory to help the DSS analyst locate the contents of the data warehouse is seen in
______.

A. Current detail data.

B. Lightly summarized data.

C. Metadata.

D. Older detail data.

93. Metadata contains at least ______.

A. the structure of the data.

B. the algorithms used for summarization.

C. the mapping from the operational environment to the data warehouse.

D. all of the above.

94. Which of the following is not an old detail storage medium?

A. Photo optical storage.

B. RAID.

C. Microfiche.

D. Pen drive.

95. The data from the operational environment enters ______ of the data warehouse.

A. Current detail data.

B. Older detail data.

C. Lightly summarized data.

D. Highly summarized data.

96. The data in current detail level resides till ______ event occurs.

A. purge.

B. summarization.

C. archive.

D. all of the above.

97. The dimension tables describe the ______.

A. entities.

B. facts.

C. keys.

D. units of measures.

98. The granularity of the fact is the ______ of detail at which it is recorded.

A. transformation.

B. summarization.

C. level.

D. transformation and summarization.

99. Which of the following is not a primary grain in analytical modeling?

A. Transaction.

B. Periodic snapshot.

C. Accumulating snapshot.

D. All of the above.

100. Granularity is determined by ______.

A. number of parts to a key.

B. granularity of those parts.

C. both A and B.

D. none of the above.

101. __________ of data means that the attributes within a given entity are fully dependent
on the entire primary key of the entity.

A. Additivity.

B. Granularity.

C. Functional dependency.
D. Dimensionality.

102. A fact is said to be fully additive if ______.

A. it is additive over every dimension of its dimensionality.

B. additive over at least one but not all of the dimensions.

C. not additive over any dimension.

D. None of the above.

103. A fact is said to be partially additive if ______.

A. it is additive over every dimension of its dimensionality.

B. additive over at least one but not all of the dimensions.

C. not additive over any dimension.

D. None of the above.

104. A fact is said to be non-additive if ______.

A. it is additive over every dimension of its dimensionality.

B. additive over at least one but not all of the dimensions.

C. not additive over any dimension.

D. None of the above.

105. Non-additive measures can often be combined with additive measures to create new
______.

A. additive measures.

B. non-additive measures.

C. partially additive.

D. All of the above.

106. A fact representing cumulative sales units over a day at a store for a product is a
______.

A. additive fact.

B. fully additive fact.

C. partially additive fact.

D. non-additive fact.

107. __________ of data means that the attributes within a given entity are fully dependent
on the entire primary key of the entity.

A. Additivity.

B. Granularity.

C. Functional Dependency.

D. Dependency.

108. Which of the following is the other name of Data mining?

A. Exploratory data analysis.

B. Data driven discovery.

C. Deductive learning.

D. All of the above.

109. Which of the following is a predictive model?

A. Clustering.

B. Regression.

C. Summarization.

D. Association rules.

110. Which of the following is a descriptive model?

A. Classification.

B. Regression.

C. Sequence discovery.

D. Association rules.

111. A ______ model identifies patterns or relationships.

A. Descriptive.

B. Predictive.

C. Regression.

D. Time series analysis.

112. A predictive model makes use of ______.

A. current data.

B. historical data.

C. both current and historical data.

D. assumptions.

113. ______ maps data into predefined groups.

A. Regression.

B. Time series analysis.

C. Prediction.

D. Classification.

114. ______ is used to map a data item to a real valued prediction variable.

A. Regression.

B. Time series analysis.

C. Prediction.

D. Classification.

115. In ______, the value of an attribute is examined as it varies over time.

A. Regression.

B. Time series analysis.

C. Sequence discovery.

D. Prediction.

116. In ______ the groups are not predefined.

A. Association rules.

B. Summarization.

C. Clustering.

D. Prediction.

117. Link Analysis is otherwise called ______.

A. affinity analysis.

B. association rules.

C. both A & B.

D. Prediction.

118. ______ is the input to KDD.

A. Data.

B. Information.

C. Query.

D. Process.

119. The output of KDD is ______.

A. Data.

B. Information.

C. Query.

D. Useful information.

120. The KDD process consists of ______ steps.

A. three.

B. four.

C. five.

D. six.

121. Treating incorrect or missing data is called ______.

A. selection.

B. preprocessing.

C. transformation.

D. interpretation.

122. Converting data from different sources into a common format for processing is called
______.

A. selection.

B. preprocessing.

C. transformation.

D. interpretation.

123. Various visualization techniques are used in ______ step of KDD.

A. selection.

B. transformation.

C. data mining.

D. interpretation.

124. Extreme values that occur infrequently are called ______.

A. outliers.

B. rare values.

C. dimensionality reduction.

D. All of the above.

125. Box plot and scatter diagram techniques are ______.

A. Graphical.

B. Geometric.

C. Icon-based.

D. Pixel-based.

126. ______ is used to proceed from very specific knowledge to more general information.

A. Induction.

B. Compression.

C. Approximation.

D. Substitution.

127. Describing some characteristics of a set of data by a general model is viewed as
______.

A. Induction.

B. Compression.

C. Approximation.

D. Summarization.

128. ______ helps to uncover hidden information about the data.

A. Induction.

B. Compression.

C. Approximation.

D. Summarization.

129. ______ are needed to identify training data and desired results.

A. Programmers.

B. Designers.

C. Users.

D. Administrators.

130. Overfitting occurs when a model ______.

A. does fit in future states.

B. does not fit in future states.

C. does fit in current state.

D. does not fit in current state.

131. The problem of dimensionality curse involves ______.

A. the use of some attributes may interfere with the correct completion of a data mining task.

B. the use of some attributes may simply increase the overall complexity.

C. some may decrease the efficiency of the algorithm.

D. All of the above.

132. Incorrect or invalid data is known as ______.

A. changing data.

B. noisy data.

C. outliers.

D. missing data.

133. ROI is an acronym of ______.

A. Return on Investment.

B. Return on Information.

C. Repetition of Information.

D. Runtime of Instruction.

134. The ______ of data could result in the disclosure of information that is deemed to
be confidential.

A. authorized use.

B. unauthorized use.

C. authenticated use.

D. unauthenticated use.

135. ______ data are noisy and have many missing attribute values.

A. Preprocessed.

B. Cleaned.

C. Real-world.

D. Transformed.

136. The rise of DBMS occurred in early ______.

A. 1950's.

B. 1960's.

C. 1970's.

D. 1980's.

137. SQL stands for ______.

A. Standard Query Language.

B. Structured Query Language.

C. Standard Quick List.

D. Structured Query List.

138. Which of the following is not a data mining metric?

A. Space complexity.

B. Time complexity.

C. ROI.

D. All of the above.

139. Reducing the number of attributes to solve the high dimensionality problem is called
______.

A. dimensionality curse.

B. dimensionality reduction.

C. cleaning.

D. Overfitting.

Answer Key

| Question | Answer | Question | Answer | Question | Answer | Question | Answer |
| :--- | :---: | :--- | :---: | :--- | :---: | :--- | :---: |
| 1 | B | 36 | B | 71 | B | 106 | B |
| 2 | A | 37 | D | 72 | C | 107 | C |
| 3 | A | 38 | B | 73 | B | 108 | D |
| 4 | D | 39 | A | 74 | C | 109 | B |
| 5 | D | 40 | A | 75 | B | 110 | C |
| 6 | B | 41 | C | 76 | A | 111 | A |
| 7 | C | 42 | C | 77 | D | 112 | B |
| 8 | B | 43 | A | 78 | B | 113 | D |
| 9 | B | 44 | D | 79 | C | 114 | B |
| 10 | D | 45 | D | 80 | B | 115 | B |
| 11 | C | 46 | A | 81 | A | 116 | C |
| 12 | B | 47 | C | 82 | B | 117 | C |
| 13 | A | 48 | A | 83 | D | 118 | A |
| 14 | A | 49 | B | 84 | B | 119 | D |
| 15 | A | 50 | B | 85 | D | 120 | C |
| 16 | B | 51 | C | 86 | C | 121 | B |
| 17 | A | 52 | A | 87 | A | 122 | C |
| 18 | B | 53 | C | 88 | D | 123 | D |
| 19 | A | 54 | B | 89 | D | 124 | A |
| 20 | C | 55 | B | 90 | B | 125 | B |
| 21 | D | 56 | A | 91 | A | 126 | A |
| 22 | A | 57 | D | 92 | C | 127 | B |
| 23 | C | 58 | C | 93 | D | 128 | C |
| 24 | A | 59 | A | 94 | D | 129 | C |
| 25 | D | 60 | C | 95 | A | 130 | B |
| 26 | C | 61 | C | 96 | D | 131 | D |
| 27 | D | 62 | B | 97 | B | 132 | B |
| 28 | C | 63 | A | 98 | C | 133 | A |
| 29 | C | 64 | C | 99 | B | 134 | B |
| 30 | C | 65 | C | 100 | C | 135 | C |
| 31 | B | 66 | A | 101 | C | 136 | C |
| 32 | C | 67 | A | 102 | A | 137 | B |
| 33 | D | 68 | D | 103 | B | 138 | D |
| 34 | B | 69 | A | 104 | C | 139 | B |
| 35 | A | 70 | C | 105 | A | | |

check if the answer is correct


Yes, the answers in the provided key are correct. They have been directly extracted from the
source PDF file you provided (MSQ_-_DWH[1].pdf).

Verification Examples

Here are a few examples to confirm the accuracy:

Question 1: The correct answer is B. Data Warehousing

Question 28: The correct answer is C. Fact constellation

Question 65: The correct answer is C. Executive information system



Question 94: The correct answer is D. Pen drive

Question 137: The correct answer is B. Structured Query Language

Corrections (with explanations)

• Q12: B → D (Core warehouse metadata)
The warehouse DBMS catalog stores the core/technical warehouse metadata.

• Q13: A → C (End-user level metadata)
End-user (business) metadata maps technical metadata to business terms that
users understand.

• Q14: A → D (Structural metadata)
Formal definitions like COBOL layouts and schemas are structural metadata.

• Q24: A → B (Development)
Modern CASE tools primarily support development (design/build), not just analysis.

• Q28: C → B (Snowflake schema)
A common alternative to star is the snowflake (normalized dimensions), not a
constellation.

• Q32: C → A (At least one data mart)
A generic two-level DW architecture: warehouse + data marts.

• Q46: A → C (Logged change data)
The most common refresh source is DB logs/CDC (logged change data), not ad-hoc
queries.

• Q48: A → B (Information delivery)
Query tools are for delivering information (reports/analysis), not data acquisition.

• Q53: C → B (Decision support)
Main justification for a DW is decision support, not just storing lots of data.

• Q54: B → C (Extended RDBMS)
In these MCQs, MDDB is often positioned as an extended RDBMS (vs plain DBMS).

• Q57: D → C (Selection)
In ETL terminology, selection/projection/aggregation are standard transformation
ops; “filtering” is informal.

• Q64: C → D (Data dictionary)
A data dictionary is a key functional component of metadata management.

• Q69: A → B (Oracle Express, Essbase)
Classic OLAP tools include Oracle Express and Essbase.

• Q71: B → A (Association)
Finding cross-customer preferences = association/market-basket analysis;
“preferencing” isn’t a standard term.

• Q74: C → A (OLAP)
“Roll-up/drill-down/slice/dice” belong to OLAP, not data marts.

• Q80: B → C (Data cleansing)
Removing duplicates is typically called data cleansing (synonym of cleaning, but
this item expects “cleansing”).

• Q85: D → C (Self-learning system)
A self-learning system can (by definition) generate programs/rules to perform new
tasks.

• Q97: B → A (Entities)
Dimension tables describe entities (who/what/where), not facts.

• Q99: B → D (All of the above)
Transaction, periodic snapshot, and accumulating snapshot are all primary grains;
the “not” option is therefore D.

• Q106: B → C (Partially additive fact)
Cumulative daily units are not additive over time (only across other dims) →
partially additive.

• Q108: D → B (Data-driven discovery)
Data mining is inductive/data-driven discovery; “deductive learning” isn’t a
standard synonym.

• Q110: C → D (Association rules)
Descriptive models include clustering and association rules; sequence discovery
is often predictive/temporal.

• Q114: B → A (Regression)
Mapping to a real-valued target is regression (time series is a special case).

• Q125: B → A (Graphical)
Box plots and scatter diagrams are graphical techniques.

• Q127: B → D (Summarization)
Describing data with a general model is summarization (not compression).

• Q128: C → A (Induction)
Induction (generalizing from data) is what uncovers hidden structure/patterns.

• Q138: D → C (ROI)
ROI is a business metric, not an algorithmic (space/time) metric for data mining.
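
The additivity distinction behind Q102 to Q106 may be easier to see with numbers. The sketch below is only an illustration under assumed data: the `facts` rows, the `units_sold` measure, and the `stock_on_hand` measure are hypothetical and are not taken from the source PDF. It does not settle the disputed answer to Q106; it only shows what "additive over some dimensions but not others" means.

```python
# Hypothetical fact rows: (day, store, units_sold, stock_on_hand).
facts = [
    ("Mon", "S1", 5, 100),
    ("Mon", "S2", 3, 80),
    ("Tue", "S1", 4, 96),
    ("Tue", "S2", 6, 74),
]

# units_sold is fully additive: a sum over days, stores, or both is meaningful.
total_units = sum(f[2] for f in facts)

# stock_on_hand is partially additive: summing across stores for ONE day
# gives the real company-wide stock on that day.
stock_tuesday = sum(f[3] for f in facts if f[0] == "Tue")

# Summing the same measure across days for one store is not a stock level.
s1_rows = [f for f in facts if f[1] == "S1"]
misleading_sum = sum(f[3] for f in s1_rows)  # adds Monday's and Tuesday's stock
closing_stock = s1_rows[-1][3]               # last recorded value is the usual choice

print(total_units, stock_tuesday, misleading_sum, closing_stock)
```

Over the time dimension, a level-type measure is usually reported as a last value or an average rather than a sum, which is why such measures are described as partially additive.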

Common questions


The star schema supports data warehousing by organizing data into a central fact table connected to dimension tables, optimizing query performance for analytic processes. With one fact table and multiple dimension tables providing context, star schemas simplify query structures and enhance data warehouse efficiency.
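
A minimal sketch of that layout, using Python's built-in `sqlite3` module. The table names (`fact_sales`, `dim_product`, `dim_date`), the columns, and the sample rows are assumptions made for illustration; they do not come from the source document.

```python
import sqlite3

# Hypothetical star schema: one fact table joined to two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id INTEGER REFERENCES dim_date(date_id),
    units INTEGER,
    revenue REAL
);
INSERT INTO dim_product VALUES (1, 'Pen', 'Stationery'), (2, 'Lamp', 'Home');
INSERT INTO dim_date VALUES (1, 2023, 1), (2, 2023, 2);
INSERT INTO fact_sales VALUES (1, 1, 10, 15.0), (1, 2, 5, 7.5), (2, 1, 1, 30.0);
""")


def revenue_by_category(connection):
    """Join the fact table to one dimension and aggregate a measure."""
    rows = connection.execute("""
        SELECT p.category, SUM(f.revenue)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        GROUP BY p.category
        ORDER BY p.category
    """).fetchall()
    return dict(rows)


print(revenue_by_category(conn))
```

Each dimension row relates to many fact rows (the one-to-many relationship asked about in Q41), and a typical analytic query needs only one join per dimension it groups or filters by.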

The time horizon in data warehouses is typically 5-10 years, significantly longer than in operational databases. This is because data warehouses are designed to store historical data for trend analysis and decision support, which require a longer time frame for effective data analysis. In contrast, operational databases are used for current, transactional data with a shorter time horizon.

OLAP tools enhance data analysis by providing a multi-dimensional view of data, allowing users to perform complex calculations, trend analysis, and data modeling. They facilitate rapid querying and reporting, improving decision-making by enabling users to explore data efficiently through operations such as drilling down, slicing, and dicing. This is crucial for leveraging the full potential of data warehouses in business intelligence applications.
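
The operations named above can be sketched on a tiny in-memory cube. This is an illustrative toy, not how an OLAP server is implemented; the `cells` data, the `DIMS` tuple, and the helper names `roll_up` and `slice_cube` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical cube cells: (year, region, product) -> units sold.
DIMS = ("year", "region", "product")
cells = {
    (2022, "North", "Pen"): 4,
    (2022, "South", "Pen"): 6,
    (2023, "North", "Pen"): 5,
    (2023, "North", "Lamp"): 2,
}


def roll_up(cube, keep):
    """Aggregate away every dimension that is not listed in `keep`."""
    idx = [DIMS.index(d) for d in keep]
    out = defaultdict(int)
    for key, units in cube.items():
        out[tuple(key[i] for i in idx)] += units
    return dict(out)


def slice_cube(cube, dim, value):
    """Keep only the cells where one dimension is fixed to a single value."""
    i = DIMS.index(dim)
    return {k: v for k, v in cube.items() if k[i] == value}


by_year = roll_up(cells, ["year"])                   # coarse view
by_year_region = roll_up(cells, ["year", "region"])  # drill down one level
lamps_only = slice_cube(cells, "product", "Lamp")    # slice on product
print(by_year, by_year_region, lamps_only)
```

Drilling down is the reverse of rolling up: the same aggregation is run with more dimensions kept. Dicing would fix values on two or more dimensions at once.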

Metadata in a data warehouse acts as a guide that describes the data contained within the warehouse, provides the mapping from operational data to data warehouse data, and defines the structure of the operational databases. This is crucial for data retrieval, interpretation, and management as it assists users in understanding and utilizing the data effectively.

Business intelligence can be derived from multidimensional databases through their ability to process and analyze data across multiple dimensions simultaneously, enabling complex queries and facilitating insightful analysis of multidimensional data sets. This capacity aids in identifying patterns, trends, and correlations in large volumes of data, crucial for strategic decision-making.

A data warehouse is subject-oriented, integrated, time-variant, and nonvolatile, while operational databases are typically designed for daily transaction processing and are not subject-oriented or integrated in the same way. The operational environment is more focused on current transaction processing with a short time horizon, typically 60-90 days.

ETL (Extract, Transform, Load) processes are crucial for extracting data from diverse sources, transforming it into a suitable format, and loading it into the data warehouse. This ensures that the data is cleaned, integrated, and organized for analysis, which is fundamental for maintaining the quality and accessibility of the data warehouse.
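
The three stages can be sketched end to end with the standard library. Everything named here is an assumption for illustration: the two source lists, their field names, the `sales` table, and the cleaning rules are hypothetical and stand in for real operational systems.

```python
import sqlite3

# Hypothetical rows from two operational sources with inconsistent formats.
source_a = [{"cust": " Alice ", "amount": "10.50"}, {"cust": "Bob", "amount": "3"}]
source_b = [{"customer_name": "alice", "total": 4.5}, {"customer_name": "", "total": 1.0}]


def extract():
    """Capture the needed subset of fields from each source."""
    for row in source_a:
        yield row["cust"], row["amount"]
    for row in source_b:
        yield row["customer_name"], row["total"]


def transform(records):
    """Scrub and integrate: unify name format, reject rows with no customer."""
    for name, amount in records:
        name = name.strip().title()
        if not name:
            continue  # data scrubbing happens before the load
        yield name, float(amount)


def load(records):
    """Load the cleaned rows into the warehouse table and build an index."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", records)
    conn.execute("CREATE INDEX idx_sales_customer ON sales (customer)")
    return conn


warehouse = load(transform(extract()))
totals = dict(warehouse.execute(
    "SELECT customer, SUM(amount) FROM sales GROUP BY customer ORDER BY customer"
))
print(totals)
```

The ordering mirrors Q37 and Q38: quality is upgraded before the data is moved into the warehouse, and the load step is also where indexes are created.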

Dimensionality reduction reduces the dataset size by removing irrelevant or redundant attributes, thereby simplifying models, reducing computational load, and enhancing model performance by focusing on the most impactful attributes. This is crucial in data mining where high-dimensional data can lead to overfitting and computing inefficiency.
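
One very simple way to drop an irrelevant attribute is a variance filter: a column that never changes cannot help distinguish records. The sketch below assumes this heuristic and hypothetical sample rows; it is only one of many attribute-selection techniques and is not the method prescribed by the source.

```python
from statistics import pvariance

# Hypothetical records; country_code is constant and carries no information.
rows = [
    {"age": 25, "income": 30.0, "country_code": 1},
    {"age": 40, "income": 52.0, "country_code": 1},
    {"age": 31, "income": 41.0, "country_code": 1},
]


def drop_low_variance(records, threshold=0.0):
    """Remove numeric attributes whose population variance is <= threshold."""
    names = list(records[0])
    keep = [n for n in names if pvariance([r[n] for r in records]) > threshold]
    return [{n: r[n] for n in keep} for r in records]


reduced = drop_low_variance(rows)
print(reduced[0])
```

A variance filter only catches uninformative columns; redundant attributes that vary but duplicate each other need a different check, such as correlation between columns.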

Data mining predicts future trends and behaviors, allowing business managers to make proactive, knowledge-driven decisions. By analyzing patterns and relationships in large data sets, businesses can gain insights into customer behavior, market trends, and operational efficiencies, thereby improving strategic and tactical decision-making.

Time-variance in data warehousing means that data is stored with time as a key aspect, often keeping snapshots of data over time. This facilitates historical analysis and trend identification since users can compare data across different time periods, unlike operational databases which focus on current data.
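
A small sketch of the snapshot idea, assuming a hypothetical `history` list and helper names: every change is appended with a date stamp instead of overwriting the previous value, so the state at any past date can still be recovered.

```python
from datetime import date

# Hypothetical snapshot store: rows are appended, never updated in place.
history = []


def record_snapshot(customer, city, as_of):
    """Append a new dated version instead of overwriting the old one."""
    history.append({"customer": customer, "city": city, "as_of": as_of})


def city_as_of(customer, when):
    """Return the city that was valid for the customer on the given date."""
    rows = [r for r in history if r["customer"] == customer and r["as_of"] <= when]
    return max(rows, key=lambda r: r["as_of"])["city"] if rows else None


record_snapshot("Alice", "Cairo", date(2021, 1, 1))
record_snapshot("Alice", "Giza", date(2023, 6, 1))

print(city_as_of("Alice", date(2022, 1, 1)))
print(city_as_of("Alice", date(2024, 1, 1)))
```

Here the date is part of the key, which is the element of time that Q19 says an operational key may lack. An operational system would typically keep only the latest city, which matches the transient data described in Q35.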
