Data Warehouse Concepts and Schemas

A data warehouse is a centralized repository for structured data from multiple sources, designed for business intelligence and decision-making. It features characteristics such as being subject-oriented, integrated, time-variant, and non-volatile, and employs ETL processes for data management. Various schemas like star, snowflake, and fact constellation are used to organize data, each with its own advantages and disadvantages.
Data Warehouse:-

1. Definition:
A data warehouse is a centralized repository where structured data from multiple sources is stored, integrated, and analyzed for business intelligence and decision-making.

2. Characteristics:

○ Subject-Oriented: Focuses on specific business areas (e.g., sales, finance, HR).
○ Integrated: Combines data from different sources into a unified format.
○ Time-Variant: Maintains historical data for trend analysis.
○ Non-Volatile: Data remains unchanged once stored; only new data is added.

Goals of Data Warehousing

○ Support reporting as well as analysis.
○ Maintain the organization's historical information.
○ Serve as the foundation for decision-making.

Components of a Data Warehouse:

● ETL (Extract, Transform, Load): Extracts data from sources, transforms it, and loads it into the warehouse.
● Data Staging Area: Temporary storage before transformation.
● Data Warehouse Database: Centralized storage for processed data.
● Metadata: Information about data (data source, structure, etc.).
● OLAP (Online Analytical Processing): Enables multidimensional data analysis.
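The Extract → Transform → Load flow above can be sketched as a minimal pipeline. All names here (`extract_orders`, `warehouse`, the field names) are illustrative, not from any particular ETL tool:

```python
# Minimal ETL sketch: pull raw rows from a source, standardize them,
# and load them into a warehouse table (a plain list stands in for it).

def extract_orders():
    # Extract: raw records as they arrive from a source system
    # (note the inconsistent types and casing).
    return [
        {"id": 1, "amount": "19.99", "region": "north"},
        {"id": 2, "amount": "5.00", "region": "SOUTH"},
    ]

def transform(row):
    # Transform: convert strings to numbers and unify text casing,
    # so all sources share one format in the warehouse.
    return {
        "id": row["id"],
        "amount": float(row["amount"]),
        "region": row["region"].title(),
    }

warehouse = []  # Load target: stands in for the warehouse table.

def run_etl():
    for row in extract_orders():
        warehouse.append(transform(row))
    return warehouse

loaded = run_etl()
```

In a real deployment the extract step would read from databases or files, and the load step would write to warehouse storage; only the three-phase structure is the point here.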

Examples:

● Amazon Redshift: Cloud-based data warehouse for large-scale analytics.
● A Banking System: Stores customer transactions and detects fraud patterns.

Advantages:

● Improved Decision-Making: Provides historical and trend analysis.
● Data Consistency: Ensures a single version of the truth.
● Performance Optimization: Faster querying compared to traditional databases.
● Security & Access Control: Restricts access to sensitive data.
Top-Down Approach (Bill Inmon)

Working

1. Central Data Warehouse: A large, enterprise-wide data warehouse is built first.
2. ETL Process: Data is extracted, transformed, and loaded into the central warehouse.
3. Specialized Data Marts: Smaller, department-specific data marts (finance, marketing) are created from the central warehouse.

Advantages

✔ Consistent Data View – Ensures uniformity across departments.
✔ Improved Data Consistency – Standardized data reduces errors.
✔ Better Scalability – New data marts can be added easily.
✔ Enhanced Governance – Centralized control over data security and compliance.

Disadvantages

❌ High Cost & Time-Consuming – Requires a large upfront investment.
❌ Complexity – Difficult to implement and manage for large organizations.
❌ Lack of Flexibility – Hard to adapt to changing business needs.
❌ Data Latency – Delays in data availability due to batch processing.

Bottom-Up Approach (Ralph Kimball)

Working

1. Department-Specific Data Marts: Small data marts for individual teams (sales, HR) are created first.
2. Integration: These data marts are later combined into a unified data warehouse.

Advantages

✔ Faster Report Generation – Quick insights from department-level data marts.
✔ Flexibility – Easily adaptable to changing business needs.
✔ Scalability – Data marts can be added as needed.
✔ Lower Cost & Time Investment – More budget-friendly than the Top-Down approach.

Disadvantages

❌ Inconsistent Dimensional View – Data marts may not align perfectly.
❌ Data Silos – Independent data marts can lead to fragmentation.
❌ Integration Challenges – Unifying different data marts can be difficult.
❌ Risk of Inconsistency – Data definitions may vary across departments.

2. Star Schema

Definition
A star schema is a type of database schema used in data warehousing where a central fact table connects to multiple dimension tables, forming a star-like structure.

Components of Star Schema

1. Fact Table
   ○ Stores measurable business data (e.g., sales, revenue).
   ○ Contains foreign keys referencing dimension tables.
   ○ Primary key is usually a composite key.
2. Dimension Tables
   ○ Stores descriptive attributes (e.g., product details, customer info).
   ○ Supports hierarchies (e.g., date → month → year).
   ○ Primary key is referenced in the fact table.

Characteristics

✔ Denormalized structure for fast querying.
✔ Simple design that is easy to understand.
✔ Single join path between dimensions via the fact table.
✔ Optimized for OLAP (Online Analytical Processing).

Advantages

✔ Faster Query Performance – Fewer joins speed up queries.
✔ Easier Data Management – Simple structure for data updates.
✔ Better Readability – Easy to understand and navigate.
✔ Referential Integrity – Built-in integrity between fact and dimension tables.

Disadvantages

❌ Data Redundancy – Denormalization leads to storage overhead.
❌ Not Ideal for Complex Relationships – Cannot handle many-to-many relationships well.
❌ Less Flexibility – Schema changes may require table redesign.
❌ Large Fact Table Size – As data grows, the fact table becomes very large.
Example

A Sales Data Warehouse using Star Schema:

● Fact Table: Sales (Sale_ID, Date_ID, Product_ID, Customer_ID, Amount, Quantity)
● Dimension Tables:
  ○ Date (Date_ID, Year, Month, Day)
  ○ Product (Product_ID, Product_Name, Category, Price)
  ○ Customer (Customer_ID, Name, Location, Age)
  ○ Branch (Branch_ID, Branch_Name, City, State)
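A sketch of this star schema in SQLite (part of Python's standard library), showing the single-join-path query pattern. Only two of the four dimensions are built, the sample rows are invented, and `Date` is named `Date_Dim` here just to keep it visually distinct from SQL's DATE type:

```python
import sqlite3

# In-memory database: one fact table plus two dimension tables.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE Date_Dim (Date_ID INTEGER PRIMARY KEY, Year INTEGER, Month INTEGER, Day INTEGER);
CREATE TABLE Product  (Product_ID INTEGER PRIMARY KEY, Product_Name TEXT, Category TEXT, Price REAL);
CREATE TABLE Sales (
    Sale_ID INTEGER, Date_ID INTEGER, Product_ID INTEGER,
    Customer_ID INTEGER, Amount REAL, Quantity INTEGER,
    FOREIGN KEY (Date_ID)    REFERENCES Date_Dim(Date_ID),
    FOREIGN KEY (Product_ID) REFERENCES Product(Product_ID)
);
INSERT INTO Date_Dim VALUES (1, 2024, 1, 15);
INSERT INTO Product  VALUES (10, 'Laptop', 'Electronics', 999.0);
INSERT INTO Sales    VALUES (100, 1, 10, 7, 999.0, 1);
""")

# One join per dimension, all through the fact table:
# total sales amount by year and product category.
cur.execute("""
SELECT d.Year, p.Category, SUM(s.Amount)
FROM Sales s
JOIN Date_Dim d ON s.Date_ID = d.Date_ID
JOIN Product  p ON s.Product_ID = p.Product_ID
GROUP BY d.Year, p.Category
""")
result = cur.fetchall()
```

Each dimension is exactly one join away from the fact table, which is what keeps star-schema queries fast.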

If the dimension tables are normalized, the schema becomes a Snowflake Schema.

Snowflake Schema

Definition
A snowflake schema is a variation of the star schema where dimension tables are normalized into multiple related tables, forming a snowflake-like structure.

Characteristics

✔ Normalization – Dimension tables are normalized to reduce redundancy.
✔ Complex Structure – Multiple dimension tables are linked hierarchically.
✔ Better Storage Efficiency – Less data duplication compared to the star schema.
✔ Suitable for Complex Relationships – Handles many-to-one and many-to-many relationships.

Advantages

✔ Less Redundancy – Normalized tables reduce duplicate data.
✔ Better Storage Optimization – Uses less disk space.
✔ Improved Data Integrity – Ensures consistency in data updates.
✔ Scalability – Can support complex hierarchies and relationships.

Disadvantages

❌ Complex Queries – More joins lead to slower query execution.
❌ Difficult to Understand – Structure is more complicated than the star schema.
❌ Higher Maintenance – More tables require additional effort for management.
❌ Increased Join Operations – Query performance may be slower due to multiple joins.

Example

A Sales Data Warehouse using Snowflake Schema:

● Fact Table: Sales (Sale_ID, Date_ID, Product_ID, Customer_ID, Amount, Quantity)
● Dimension Tables:
  ○ Date (Date_ID, Year_ID) → Year (Year_ID, Year)
  ○ Product (Product_ID, Category_ID) → Category (Category_ID, Category_Name)
  ○ Customer (Customer_ID, Region_ID) → Region (Region_ID, Country, State, City)

Unlike a star schema, here dimension tables are split into multiple tables to normalize data.
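The extra join cost of normalization can be seen in a small SQLite sketch of just the Product → Category branch (sample rows are invented): a category-level query now needs two joins where the star schema needed one.

```python
import sqlite3

# Snowflake sketch: the Product dimension is normalized into
# Product + Category, so reaching Category_Name takes an extra hop.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE Category (Category_ID INTEGER PRIMARY KEY, Category_Name TEXT);
CREATE TABLE Product  (Product_ID INTEGER PRIMARY KEY, Category_ID INTEGER,
                       FOREIGN KEY (Category_ID) REFERENCES Category(Category_ID));
CREATE TABLE Sales    (Sale_ID INTEGER, Product_ID INTEGER, Amount REAL);
INSERT INTO Category VALUES (1, 'Electronics');
INSERT INTO Product  VALUES (10, 1);
INSERT INTO Sales    VALUES (100, 10, 999.0), (101, 10, 499.0);
""")

# Total sales per category: two joins instead of the star schema's one.
cur.execute("""
SELECT c.Category_Name, SUM(s.Amount)
FROM Sales s
JOIN Product  p ON s.Product_ID  = p.Product_ID
JOIN Category c ON p.Category_ID = c.Category_ID
GROUP BY c.Category_Name
""")
result = cur.fetchall()
```

In exchange for the extra join, the category name is stored once in `Category` instead of being repeated on every product row.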

3. Fact Constellation Schema

Definition
A fact constellation schema is a complex data warehouse schema that consists of multiple fact tables sharing common dimension tables. It is also known as a galaxy schema because it contains multiple star schemas connected together.

Characteristics

✔ Multiple Fact Tables – Supports different business processes in a single schema.
✔ Shared Dimension Tables – Dimensions are reused across multiple fact tables.
✔ Flexible Data Representation – Suitable for complex analytical queries.
✔ Supports Large-Scale Systems – Used in enterprise-level data warehouses.

Advantages

✔ Efficient Data Organization – Multiple fact tables improve data segmentation.
✔ Reduced Data Redundancy – Shared dimensions prevent duplication.
✔ Comprehensive Analysis – Supports complex queries across multiple domains.
✔ Scalable Design – Can handle large datasets effectively.

Disadvantages

❌ Complex Design – More difficult to understand and manage.
❌ Increased Query Complexity – More joins slow down query performance.
❌ Higher Maintenance – Requires more effort to update and manage tables.
❌ Storage Overhead – Large datasets need optimized indexing and storage.

Example

A Retail Business Data Warehouse using Fact Constellation Schema:

● Fact Tables:
  ○ Sales (Sale_ID, Date_ID, Product_ID, Customer_ID, Revenue, Quantity)
  ○ Shipping (Shipping_ID, Date_ID, Customer_ID, Delivery_Time, Cost)
● Shared Dimension Tables:
  ○ Date (Date_ID, Year, Month, Day)
  ○ Customer (Customer_ID, Name, Region, City)
  ○ Product (Product_ID, Category, Brand, Price)

This structure allows analyzing both sales and shipping data using shared date, customer, and product dimensions.
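A minimal SQLite sketch of the constellation: both fact tables reference the same `Date_Dim` dimension (slimmed to `Year` only, with invented sample rows), so one query can report revenue and shipping cost side by side per year.

```python
import sqlite3

# Fact constellation sketch: Sales and Shipping share the Date dimension.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE Date_Dim (Date_ID INTEGER PRIMARY KEY, Year INTEGER);
CREATE TABLE Sales    (Sale_ID INTEGER, Date_ID INTEGER, Revenue REAL);
CREATE TABLE Shipping (Shipping_ID INTEGER, Date_ID INTEGER, Cost REAL);
INSERT INTO Date_Dim VALUES (1, 2024);
INSERT INTO Sales    VALUES (100, 1, 999.0);
INSERT INTO Shipping VALUES (200, 1, 25.0);
""")

# Per-year totals from BOTH business processes, tied together only
# through the shared dimension (correlated subqueries keep the two
# fact tables from multiplying each other's rows).
cur.execute("""
SELECT d.Year,
       (SELECT SUM(Revenue) FROM Sales    s  WHERE s.Date_ID  = d.Date_ID),
       (SELECT SUM(Cost)    FROM Shipping sh WHERE sh.Date_ID = d.Date_ID)
FROM Date_Dim d
""")
result = cur.fetchall()
```

Joining two fact tables directly would produce a row for every sale × shipment pair, which is why cross-process queries go through the shared dimensions instead.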

4. Components of Data Warehouse Architecture:-

1. External Sources
   ○ Data originates from databases, XML, JSON, emails, spreadsheets, etc.
   ○ Contains structured, semi-structured, and unstructured data.
2. Staging Area
   ○ Temporary storage for raw data before loading into the warehouse.
   ○ Uses ETL (Extract, Transform, Load) for data processing.
     ■ Extract (E): Pulls data from external sources.
     ■ Transform (T): Converts data into a standard format.
     ■ Load (L): Loads processed data into the data warehouse.
3. Data Warehouse
   ○ Centralized repository for structured, processed, and cleansed data.
   ○ Stores metadata (data about data) and raw data.
   ○ Serves as a foundation for reporting, analysis, and decision-making.
4. Data Marts
   ○ Subset of a data warehouse focused on specific business areas (Sales, HR, Marketing).
   ○ Enhances quick and efficient data retrieval for departments.
   ○ Can be dependent (from warehouse) or independent (separate source).
5. Data Mining
   ○ Analyzing large datasets to uncover patterns, trends, and insights.
   ○ Helps in business intelligence, fraud detection, and predictive analytics.
   ○ Uses AI, machine learning, and statistical techniques for analysis.

Difference Between Components


Three-Tier Data Warehouse Architecture:-

1. Bottom Tier (Data Sources & Storage)

● Foundation layer where data is collected and stored.
● Uses RDBMS or multidimensional databases for structured storage.
● ETL Process: Extracts, Transforms, and Loads data into a query-friendly format.
● Common ETL Tools: IBM Infosphere, Informatica, Microsoft SSIS, SnapLogic, Confluent.

Challenges & Solutions:

● Data Quality Issues → Use robust ETL tools.
● Data Compatibility Issues → Standardize data formats.
● Scalability → Design expandable storage solutions.

2. Middle Tier (OLAP Engine)

● Processes and manages complex analytical queries.
● OLAP (Online Analytical Processing) Models:
  ○ ROLAP: Uses relational databases for large data volumes.
  ○ MOLAP: Uses multidimensional cubes for faster queries.
  ○ HOLAP: Combines ROLAP & MOLAP for flexibility.

Challenges & Solutions:

● Data Latency → Use real-time processing & incremental loading.
● Slow Query Performance → Optimize indexing & partitioning.
● Data Integration Issues → Use advanced integration tools like Talend, Informatica.

3. Top Tier (Front-End BI Tools)

● User-facing layer for reporting, visualization, and decision-making.
● Popular BI Tools: IBM Cognos, Microsoft BI, SAP BW, Crystal Reports, SAS BI, Pentaho.
● Presents data insights via dashboards, graphs, charts, and reports.

Challenges & Solutions:

● Complex UI → Provide user training and support.
● Integration Issues → Choose tools compatible with warehouse systems.

5. OLTP and OLAP

● OLTP (Online Transaction Processing): Manages real-time transactional data, ensuring fast and efficient data processing for daily business operations.
● OLAP (Online Analytical Processing): Supports complex queries and data analysis, helping in decision-making and business intelligence.

Benefits & Drawbacks of OLAP and OLTP Services

✅ Benefits of OLAP Services:

● Maintains data consistency and performs complex calculations.
● Supports planning, analysis, and budgeting in one platform.
● Handles large datasets efficiently for enterprise applications.
● Enforces security restrictions for data protection.
● Provides a multidimensional data view for flexible analysis.

❌ Drawbacks of OLAP Services:

● Requires professionals due to complex data modeling.
● Expensive to implement and maintain for large datasets.
● Data analysis happens after extraction & transformation, causing delays.
● Not real-time; updated periodically, limiting decision-making efficiency.

✅ Benefits of OLTP Services:

● Allows fast read, write, update, and delete operations.
● Supports high transaction volumes with real-time access.
● Provides strong security for data protection.
● Helps in accurate decision-making with up-to-date data.
● Ensures data integrity, consistency, and high availability.

❌ Drawbacks of OLTP Services:

● Limited analytical capabilities, not suited for complex reporting.
● High maintenance costs due to frequent updates & backups.
● Prone to disruptions during hardware failures.
● May face issues like duplicate or inconsistent data.

6. 📌 Data Integration: Overview & Key Points

✅ What is Data Integration?

● The process of combining data from multiple sources into a single, unified view.
● Ensures consistency, accuracy, and accessibility of data for analysis.
● Used in data warehousing, business intelligence, and analytics.

📌 Problems in Data Integration

1. Data Inconsistency – Different sources may have conflicting information.
2. Data Redundancy – Duplicate data can lead to unnecessary storage and processing costs.
3. Data Format Differences – Various data formats (structured, semi-structured, unstructured) make integration complex.
4. Scalability Issues – Large datasets may slow down integration processes.
5. Security Risks – Integrating data from different systems may expose sensitive information.

📌 Data Redundancy in Integration

● Definition: When the same data is stored in multiple places, leading to inefficiency.
● Effects:
  ○ Wastes storage space and increases costs.
  ○ Leads to data inconsistency across systems.
  ○ Causes performance issues in processing and querying.
● Solution: Use ETL (Extract, Transform, Load) tools and data normalization techniques to remove redundancy.

📌 Correlation Analysis in Data Integration

● Purpose: Identifies relationships between data from different sources.
● Methods:
  ○ Statistical Correlation – Measures how one data set is related to another.
  ○ Pattern Recognition – Detects similarities and trends in data.
● Example:
  ○ Sales & Marketing Integration – Analyzing customer purchase behavior and marketing campaigns to find correlations.
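Statistical correlation between two integrated sources is commonly measured with the Pearson coefficient; a minimal sketch with invented marketing-spend and sales figures (values near +1 mean the two series move together):

```python
from math import sqrt

def pearson(xs, ys):
    # Pearson correlation coefficient:
    # covariance divided by the product of the standard deviations.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Illustrative data from two "sources": campaign spend and sales revenue.
marketing_spend = [10, 20, 30, 40]
sales = [100, 200, 300, 400]

r = pearson(marketing_spend, sales)  # perfectly linear data, so r ≈ 1
```

Python 3.10+ also provides `statistics.correlation` for the same computation.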

📌 Example of Data Integration

E-commerce Business

● Problem: Customer data is stored in separate databases for orders, customer support, and marketing.
● Solution: Data integration merges all customer records into a centralized data warehouse.
● Benefit: Businesses gain a 360-degree view of customer interactions, leading to better decision-making and personalized marketing. 🚀

7. 📌 Data Reduction in Data Mining (Short Points)

✅ What is Data Reduction?

● Reduces data volume while preserving important information.
● Improves storage efficiency and processing speed in data mining.
● Ensures data integrity while reducing redundancy and complexity.

📌 Data Reduction Techniques

1️⃣ Dimensionality Reduction

● Removes irrelevant or redundant attributes while keeping key features.
● Techniques:
  ○ Wavelet Transform – Converts data into a compressed form.
  ○ Principal Component Analysis (PCA) – Reduces dimensions while retaining variability.
  ○ Attribute Subset Selection – Keeps only the most useful attributes.
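Attribute subset selection can be as simple as dropping near-constant columns, since they carry no information for analysis. A minimal variance-threshold sketch (the data, function names, and threshold are all illustrative; real selection methods also consider attribute relevance to the mining task):

```python
def variance(values):
    # Population variance of a list of numbers.
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def select_attributes(rows, threshold=0.0):
    # Keep only attributes whose values actually vary across rows;
    # a constant column (variance <= threshold) is dropped.
    attrs = rows[0].keys()
    return [a for a in attrs if variance([r[a] for r in rows]) > threshold]

# Illustrative records: country_code is the same everywhere, so it is dropped.
rows = [
    {"age": 25, "country_code": 1, "income": 40000},
    {"age": 32, "country_code": 1, "income": 52000},
    {"age": 47, "country_code": 1, "income": 61000},
]
kept = select_attributes(rows)
```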

2️⃣ Numerosity Reduction

● Represents data in a compact format to reduce storage needs.
● Types:
  ○ Parametric – Uses models like regression, log-linear analysis.
  ○ Non-Parametric – Uses histograms, clustering, sampling, data cube aggregation.

3️⃣ Data Cube Aggregation

● Aggregates data into multi-dimensional cubes for summarization.
● Example: Quarterly sales → Annual sales.
4️⃣ Data Compression

● Reduces file size by encoding or modifying data structure.
● Types:
  ○ Lossless Compression – Restores original data (e.g., Huffman Encoding).
  ○ Lossy Compression – Reduces data with minor loss (e.g., JPEG, MP3).
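A quick demonstration of the lossless property using Python's built-in `zlib` (DEFLATE rather than pure Huffman coding, but the same idea): repetitive data shrinks substantially, and decompression restores the original bytes exactly, which lossy schemes like JPEG or MP3 do not guarantee.

```python
import zlib

# Repetitive data (as warehouse records often are) compresses well.
original = b"sales,2024,north " * 100

compressed = zlib.compress(original)   # encode: much smaller
restored = zlib.decompress(compressed) # decode: byte-for-byte identical
```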

5️⃣ Discretization Operation

● Converts continuous data into small intervals for easier processing.
● Types:
  ○ Top-Down (Splitting) – Starts with large intervals and divides further.
  ○ Bottom-Up (Merging) – Starts with small intervals and combines similar ones.
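A minimal sketch of top-down discretization via equal-width binning (the age values and bin count are illustrative): the full value range is split into k equal intervals, and each continuous value is replaced by its interval index.

```python
def equal_width_bins(values, k):
    # Split [min, max] into k equal-width intervals and map each value
    # to its interval index 0..k-1. Assumes the values are not all equal.
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    # min(..., k - 1) keeps the maximum value in the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]

ages = [18, 22, 35, 41, 58, 63]
bins = equal_width_bins(ages, 3)  # ages discretized into 3 intervals
```

Equal-frequency binning (same number of values per bin) is the other common splitting variant.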

📌 Benefits of Data Reduction

✔ Saves storage space and reduces costs.
✔ Improves processing speed in data mining.
✔ Helps in energy conservation.
✔ Reduces hardware requirements in data centers.
