Understanding Aggregate Data Models

The document discusses aggregate data models and how they differ from relational data models. Some key points:
1. Aggregate data models group related data elements into complex records called aggregates, allowing for nested data structures like lists and maps.
2. This differs from relational models, which store data in normalized tables without nested structures.
3. Aggregate models match how NoSQL databases like key-value and document stores work better than relational models do, by allowing application developers to work with and manipulate data at the aggregate level.

Uploaded by

chitraalavani

Aggregate Data Models

Data Model
• A data model is a representation that we use
to perceive and manipulate our data.
• It allows us to:
– Represent the data elements under analysis, and
– Describe how these are related to each other
• This representation depends on our
perception.
Data Model: Database View
• In the database field, it describes how we
interact with the data in the database.
• This is distinct from the storage model:
– The storage model describes how the database stores and manipulates the data internally.
• In an ideal world:
– We should be ignorant of the storage model, but
– In practice we need at least some insight into it to achieve decent performance
Data Models: Example
• “Data model” can also mean the model of the specific data in an application
• A developer might point to an entity-relationship diagram and refer to it as the data model containing
– customers,
– orders and
– products
Data Model: Definition
• In this course we will use “data model” to refer to the model by which the database organizes data.
• More formally, this can be called a meta-model
The Data Model of the Last Decades
• The dominant data model of the last decades was the relational data model.
1. It can be represented as a set of tables.
2. Each table has rows, with each row
representing some entity of interest.
3. We describe entities through columns
4. A column may refer to another row in the
same or different table (relationship).
NoSQL Data Model
• It moves away from the relational data model
• Each NoSQL database has a different model
– Key-value,
– Document,
– Column-family,
– Graph, and
– Sparse (Index based)
• Of these, the first three share a common
characteristic (Aggregate Orientation).
Relational Model vs Aggregate Model
Relational Model
• The relational model takes the information that
we want to store and divides it into tuples (rows).
• However, a tuple is a limited data structure.
• It captures a set of values.
• So, we can’t nest one tuple within another to get nested records.
• Nor can we put a list of values or tuples within another.
Relational Model
• This simplicity characterizes the relational model
• It allows us to think of data manipulation as operations that:
– Take tuples as input, and
– Return tuples
• Aggregate orientation takes a different
approach.
Aggregate Model
• It recognizes that we often want to operate on data in units that have a more complex structure than a set of tuples.
• We can think in terms of a complex record that allows:
– Lists,
– Maps,
– And other data structures to be nested inside it
• Key-value, document, and column-family databases all use this kind of complex record.
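The contrast between flat tuples and a complex record can be sketched in a few lines of Python. This is only an illustration: the row types, field names, and sample values below are assumptions made for the example and do not come from any particular database.

```python
from typing import NamedTuple

# Relational style: flat tuples that refer to each other by id.
# (All names and values here are illustrative assumptions.)
class OrderRow(NamedTuple):
    order_id: int
    customer_id: int

class OrderItemRow(NamedTuple):
    order_id: int
    product_name: str

order_rows = [OrderRow(99, 1)]
item_rows = [
    OrderItemRow(99, "NoSQL Distilled"),
    OrderItemRow(99, "Refactoring"),
]

# Aggregate style: one complex record with a list and a map nested inside it.
order_record = {
    "id": 99,
    "customer_id": 1,
    "items": ["NoSQL Distilled", "Refactoring"],   # nested list
    "shipping_address": {"city": "Chicago"},       # nested map
}

def items_from_rows(order_id):
    """Relational access: match item rows to the order by id (a join)."""
    return [row.product_name for row in item_rows if row.order_id == order_id]

def items_from_record(record):
    """Aggregate access: the items already live inside the record."""
    return list(record["items"])
```

Both functions return the same item names; the difference is that the flat representation has to reassemble the order from several tuples, while the nested record carries its parts with it.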
Aggregate Model
• Aggregate is a term that comes from Domain-Driven Design [Evans03]
– An aggregate is a collection of related objects that we wish to treat as a unit. It is a unit for data manipulation and for management of consistency.
• We like to update aggregates with atomic operations
• We like to communicate with our data storage in terms of aggregates
Aggregate Models
• This definition matches well with how key-value, document, and column-family databases work.
• With aggregates it is easier to work on a cluster, since the aggregate is a natural unit for replication and sharding.
• Aggregates are also often easier for application programmers to work with, since they reduce the impedance mismatch found with relational databases.
Example of Relational Model
• Assume we are building an e-commerce website;
• We have to store information about: users, products, orders, shipping addresses, billing addresses, and payment data.
Example of Relational Model
• As good relational soldiers:
– Everything is normalized
– No data is repeated in multiple tables.
– We have referential integrity
Example of Relational Model
Example of Aggregate Model
• We have two aggregates: Customers and Orders
• We use the black diamond composition to show
how data fits into the aggregate structure

A possible aggregation
Example of Aggregate Model
• The customer contains a list of billing addresses;
• The order contains a list of: order items, a shipping address, and
payments
• The payment itself contains a billing address for that payment
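The two aggregates described above can be written down as nested types. The following is a minimal sketch using Python dataclasses; the class and field names follow the bullets, while the concrete sample values (names, ids, prices, card number) are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass(frozen=True)
class Address:
    city: str

@dataclass
class Customer:                      # first aggregate
    id: int
    name: str
    billing_addresses: List[Address] = field(default_factory=list)

@dataclass
class OrderItem:
    product_id: int                  # link towards a product aggregate
    product_name: str
    price: float

@dataclass
class Payment:
    card_number: str
    billing_address: Address         # copied into the payment, not referenced

@dataclass
class Order:                         # second aggregate
    id: int
    customer_id: int                 # relationship between aggregates, held as an id
    order_items: List[OrderItem]
    shipping_address: Address
    payments: List[Payment]

# Sample data (assumed values, for illustration only).
customer = Customer(1, "Martin", [Address("Chicago")])
order = Order(
    id=99,
    customer_id=customer.id,
    order_items=[OrderItem(27, "NoSQL Distilled", 32.45)],
    shipping_address=Address("Chicago"),
    payments=[Payment("1000-1000", Address("Chicago"))],
)
```

Note that the same address value is stored three times as separate copies, and that the only thing tying the order to the customer is the `customer_id` field.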
Example of Aggregate Model
• A single address appears 3 times, but instead of being referenced by an id it is copied each time
• This fits a domain where we don’t want the shipping, payment, and billing addresses to change
• What is the difference with respect to a relational representation?
Example of Aggregate Model
• The link between customer and the order is a
relationship between aggregates
Example of Aggregate Model
• A link from an order item would cross into a separate aggregate structure for products (not considered here)
• This is a kind of denormalization, similar to the trade-offs made in relational databases, but it is more common with aggregates because we want to minimize the number of aggregates we access.
Example of Aggregate Model
• We aggregate to minimize the number of
aggregates we access during data interaction
• The important thing to notice is that:
– We have to think about how the data will be accessed
– We make this part of our thinking when developing the application data model
• We could draw our aggregate boundaries differently; it really depends on how we access the data.
• No universal answer for how to draw aggregate boundaries
• It depends entirely on how you tend to manipulate data!
– Accesses on a single order at a time: first solution
– Accesses on customers with all orders: second solution
• Context-specific
– some applications will prefer one or the other
– even within a single system
• Focus on the unit of interaction with the data storage
• Pros:
– it helps greatly with running on a cluster: data will be manipulated
together, and thus should live on the same node!
• Cons:
– an aggregate structure may help with some data interactions but be
an obstacle for others.
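To make the two boundary choices concrete, here is a small Python sketch that converts the first solution (separate customer and order aggregates) into the second one (orders nested inside the customer). The function name `embed_orders` and the sample records are assumptions made for illustration.

```python
def embed_orders(customers, orders):
    """Fold each order into its customer's aggregate (the second solution).

    Returns new customer records; the inputs are left unchanged.
    """
    by_id = {c["id"]: {**c, "orders": []} for c in customers}
    for order in orders:
        # The customer_id link is no longer needed once the order is nested.
        embedded = {k: v for k, v in order.items() if k != "customer_id"}
        by_id[order["customer_id"]]["orders"].append(embedded)
    return list(by_id.values())

# First solution: customers and orders are separate aggregates.
customers = [{"id": 1, "name": "Martin"}]
orders = [
    {"id": 99, "customer_id": 1, "items": ["NoSQL Distilled"]},
    {"id": 100, "customer_id": 1, "items": ["Refactoring"]},
]

# Second solution: one aggregate per customer, containing all of its orders.
combined = embed_orders(customers, orders)
```

With the first layout, reading one order touches one small aggregate. With the second, reading a customer brings every order along, which suits "customer with all orders" access but makes single-order access heavier.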
Consider a Student information system consisting of 3 entities namely,
Student_info, Course_info, and Marksheet.
Following are the frequent queries in the workload:
1. List the details of students admitted to ‘[Link]’ course.
2. List the details of students staying in ‘Kothrud’ area and studying in
‘[Link]’
3. Find the maximum score value for ‘Databases’ subject
4. List the number of students failing in the subject ‘Computer networks’
(marks < 40)

Given the above workload, derive an aggregate boundary, for aggregating the
three entities. Justify your answer.
Consequences of Aggregate Models
No Distributable Storage
• Relational mapping can capture data elements and their relationships well.
• It does not need any notion of an aggregate entity, because it uses foreign-key relationships.
• But we cannot distinguish relationships that represent aggregations from those that don’t.
• As a result, we cannot use knowledge of the aggregate structure to decide how to store and distribute our data.
Marking Aggregate Tools
• Many data modeling techniques provide ways to mark aggregate structures in relational models
• However, they do not provide semantics that help distinguish aggregate relationships from other relationships
• When working with aggregate-oriented databases, we have a clear view of the semantics of the data.
• We can focus on the unit of interaction with the
data storage.
Aggregate Ignorant
• Relational databases are aggregate-ignorant, since they have no concept of an aggregate
• Graph databases are also aggregate-ignorant.
• This is not always bad.
• In domains where it is difficult to draw aggregate boundaries, aggregate-ignorant databases are useful.
Aggregate and Operations
• An order is a good aggregate when:
– A customer is making and reviewing an order, and
– The retailer is processing orders
• However, when the retailer wants to analyze its product sales over the last months, aggregates get in the way.
• We need to dig into every order aggregate to extract the sales history.
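The cost of this kind of analysis can be seen in a short Python sketch. It assumes order aggregates shaped as dictionaries with an `items` list; the field names `product`, `price`, and `quantity` are hypothetical.

```python
from collections import defaultdict

def sales_by_product(orders):
    """Total revenue per product, computed by opening every order aggregate."""
    totals = defaultdict(float)
    for order in orders:                  # every aggregate must be read
        for item in order["items"]:       # and every nested item inspected
            totals[item["product"]] += item["price"] * item["quantity"]
    return dict(totals)

# Illustrative data only.
orders = [
    {"id": 1, "items": [{"product": "A", "price": 10.0, "quantity": 2}]},
    {"id": 2, "items": [{"product": "A", "price": 10.0, "quantity": 1},
                        {"product": "B", "price": 5.0, "quantity": 3}]},
]
report = sales_by_product(orders)
```

The query is about products, but the data is grouped by order, so the whole collection of orders has to be scanned to answer it.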
Aggregate and Operations
• Aggregates may help with some operations and not with others.
• In cases where no aggregate boundary clearly fits, an aggregate-ignorant database is the best option.
• But remember the point that drove us to aggregate models (cluster distribution).
• Running databases on a cluster is needed when dealing with huge quantities of data.
Running on a Cluster
• It gives several advantages in computation power and data distribution
• However, it requires us to minimize the number of nodes we query when gathering data
• By explicitly including aggregates, we give the database an important hint about which information should be stored together
• But we still have the problem of querying historical data
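One way to picture "stored together" is to place each aggregate on a node chosen from its key. The sketch below is a deliberately simplified assumption: it uses a hash of the key modulo the node count, whereas real systems typically use more elaborate partitioning schemes. The `Cluster` class and key format are hypothetical.

```python
import hashlib

def node_for(key, node_count):
    """Map an aggregate key to a node index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count

class Cluster:
    """Toy cluster: each node is a dict holding whole aggregates."""

    def __init__(self, node_count):
        self.nodes = [{} for _ in range(node_count)]

    def put(self, key, aggregate):
        self.nodes[node_for(key, len(self.nodes))][key] = aggregate

    def get(self, key):
        # Only one node is consulted, because the aggregate lives in one place.
        return self.nodes[node_for(key, len(self.nodes))][key]

cluster = Cluster(node_count=3)
cluster.put("order:99", {"items": ["NoSQL Distilled"], "city": "Chicago"})
```

Because the order's items and address travel with the order, reading it never requires gathering pieces from several nodes.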
Aggregates and Transactions
ACID transactions
• Relational databases allow us to manipulate any combination of rows from any table in a single transaction.
• ACID transactions are:
– Atomic,
– Consistent,
– Isolated, and
– Durable
• The main point for this discussion is atomicity.
Atomicity & RDBMS
• Many rows spanning many tables are updated in a single atomic operation
• The operation either succeeds or fails entirely
• Concurrent operations are isolated from each other, so we cannot see partial updates
• However, relational databases can still fail.
Atomicity & NoSQL
• Aggregate-oriented NoSQL databases generally don’t support atomicity that spans multiple aggregates; they update a single aggregate atomically.
• This means that if we need to update multiple aggregates atomically, we have to manage that in the application code.
• Thus atomicity is one of the considerations when deciding how to divide up our data into aggregates
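What "manage that in the application code" can look like is sketched below with a toy in-memory store. Everything here is an assumption for illustration: the `AggregateStore` class, the `move_item` helper, and the use of a compensating write are one possible approach, not the behavior of any specific database. Only a single `put` is atomic in this sketch; the read-modify-write sequences are not protected against concurrent writers.

```python
import copy
import threading

class AggregateStore:
    """Toy store: each put replaces one whole aggregate atomically."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            return copy.deepcopy(self._data[key])

    def put(self, key, aggregate):
        with self._lock:
            self._data[key] = copy.deepcopy(aggregate)

def move_item(store, from_key, to_key, item, fail_midway=False):
    """Change two aggregates; the application undoes the first write on failure."""
    source = store.get(from_key)
    original = copy.deepcopy(source)
    source["items"].remove(item)
    store.put(from_key, source)              # first atomic write
    try:
        if fail_midway:
            raise RuntimeError("simulated failure between the two writes")
        target = store.get(to_key)
        target["items"].append(item)
        store.put(to_key, target)            # second atomic write
    except Exception:
        store.put(from_key, original)        # compensating write
        raise

store = AggregateStore()
store.put("order:1", {"items": ["book"]})
store.put("order:2", {"items": []})
move_item(store, "order:1", "order:2", "book")
```

If both lists lived inside one aggregate, the move would be a single `put` and no compensation logic would be needed, which is why atomicity needs influence where the boundaries go.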
Aggregate Models in NoSQL
Key-Value and Document
• Key-value and document databases are strongly aggregate-oriented.
• Both of these types of databases consist of lots of aggregates, each with a key used to get at the data.
• The two types of databases differ in that:
– In a key-value store the aggregate is opaque to the database (a blob)
– In a document database the database can see a structure in the aggregate.
Key-Value and Document
• The advantage of opacity is that we can store
whatever we like in the aggregate.
• The database may impose some size limit, but otherwise we have complete freedom
• A document store imposes limits on what we can place in it, defining the allowable structure of the data.
Key-Value and Document
• With a key-value store, we can only access an aggregate by its key
• With a document database:
– We can submit queries based on fields,
– We can retrieve part of the aggregate, and
– The database can create indexes based on the fields of the aggregate.
• But in practice they are used differently
Key-Value and Document
• In practice, the line between key-value and document databases gets a bit blurry.
• An ID field is often put in a document database to do a key-value style lookup
• With key-value databases, we mostly expect to look up aggregates using a key
• With document databases, we mostly expect to submit some form of query on the internal structure of the documents.
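The difference in what each kind of store can do with an aggregate can be sketched with two toy in-memory classes. These classes and their method names (`find`, `project`) are hypothetical and are not the API of any real product.

```python
import json

class KeyValueStore:
    """The value is an opaque blob: the store can only return it by key."""

    def __init__(self):
        self._blobs = {}

    def put(self, key, blob):
        self._blobs[key] = blob

    def get(self, key):
        return self._blobs[key]

class DocumentStore:
    """The store sees the structure, so it can query and project on fields."""

    def __init__(self):
        self._docs = {}

    def put(self, doc_id, doc):
        self._docs[doc_id] = doc

    def get(self, doc_id):
        return self._docs[doc_id]          # key-value style lookup by id

    def find(self, field, value):
        return [d for d in self._docs.values() if d.get(field) == value]

    def project(self, doc_id, fields):
        doc = self._docs[doc_id]
        return {f: doc[f] for f in fields if f in doc}

customer = {"id": 1, "name": "Martin", "city": "Chicago"}

kv = KeyValueStore()
kv.put("customer:1", json.dumps(customer).encode("utf-8"))

docs = DocumentStore()
docs.put(1, customer)
```

With the key-value store, the application must fetch the blob and decode it before it can look at any field. With the document store, a query such as `docs.find("city", "Chicago")` or a partial read such as `docs.project(1, ["name"])` can be answered by the store itself.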
Column-Family Stores
• One of the most influential NoSQL databases was Google’s BigTable [Chang et al.]
• Its name suggests a tabular structure, which it realizes with sparse columns and no schema.
• It is better to think of this structure not as a table but as a two-level map.
Column-Family Stores
• These BigTable-style data models are often referred to as column stores.
• Pre-NoSQL column stores like C-Store used SQL and the relational model.
• What makes column stores different is how they physically store data.
• Most databases have rows as the unit of storage, which helps write performance
Column-Family Stores
• However, there are many scenarios where:
– Writes are rare, but
– You need to read a few columns of many rows at once
• In these situations, it’s better to store groups of columns for all rows as the basic storage unit.
• These kinds of databases are called column stores or column-family databases
Column-Family Stores
• Column-family databases have a two-level aggregate structure.
• As with key-value stores, the first key is the row identifier.
• The difference is that retrieving a row by its key returns a map of more detailed values.
• These second-level values are referred to as columns.
• Having fixed a row, we can access all of its column families or a particular element.
Example of Column Model
Column-Family Stores
• They organize their columns into families.
• Each column is part of a family, and the column family acts as the unit of access.
• The data for a particular column family is therefore accessed together.
Column-Family Stores: How to structure data
• Row-oriented view:
– each row is an aggregate (for example, the customer with id 456),
– with column families representing useful chunks of data (profile, order history) within that aggregate
• Column-oriented view:
– each column family defines a record type (e.g. customer profiles) with rows for each of the records.
– You can think of a row as the join of records in all column families
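The two-level map and the two ways of reading it can be sketched with nested Python dictionaries. The row key 456 and the profile and order-history families come from the example above; the column names and values inside each family are illustrative assumptions.

```python
# row key -> column family -> column name -> value
rows = {
    "456": {
        "profile": {"name": "Martin", "billing_city": "Chicago"},
        "orders": {"order:99": "NoSQL Distilled", "order:100": "Refactoring"},
    },
    "789": {
        "profile": {"name": "Pramod", "billing_city": "Chicago"},
        "orders": {},
    },
}

def get_family(row_key, family):
    """Row-oriented view: one chunk of a single row aggregate."""
    return rows[row_key][family]

def get_column(row_key, family, column):
    """A particular element inside a row."""
    return rows[row_key][family][column]

def scan_family(family):
    """Column-oriented view: one record per row for the given family."""
    return {row_key: families[family] for row_key, families in rows.items()}
```

`get_family("456", "orders")` reads the order history of one customer, while `scan_family("profile")` reads the profile records of all customers without touching their orders.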
Key Points
• An aggregate is a collection of data that we interact with as
a unit.
• Aggregates form the boundaries for ACID operations with
the database
• Key-value, document, and column-family databases can all
be seen as forms of aggregate-oriented database
• Aggregates make it easier for the database to manage data
storage over clusters
• Aggregate-oriented databases work best when most data
interaction is done with the same aggregate
• Aggregate-ignorant databases are better when interactions
use data organized in many different formations
