NoSQL Data Modeling Techniques

The document discusses various distribution models for NoSQL databases, emphasizing the benefits and complexities of scaling out using clusters versus single-server setups. It covers techniques such as sharding, master-slave replication, and peer-to-peer replication, highlighting their advantages for read and write scalability. Additionally, it explains key-value stores, their features, suitable use cases, and scenarios where they may not be the best choice.

Data Modelling with NoSQL Databases
Module-2
By
Dr Shivakumar C
Distribution Models
• The primary driver of interest in NoSQL has been its ability to run databases
on a large cluster.
• As data volumes increase, it becomes more difficult and expensive to scale
up, that is, to buy a bigger server to run the database on.
• A more appealing option is to scale out: run the database on a cluster of
servers.
• Aggregate orientation fits well with scaling out because the aggregate is a
natural unit to use for distribution.
Distribution Models
• Depending on your distribution model, you can get a data store that will
give you the ability to handle larger quantities of data, the ability to process
greater read or write traffic, or more availability in the face of network
slowdowns or breakages.
• These are often important benefits, but they come at a cost.
• Running over a cluster introduces complexity, so it’s not something to do
unless the benefits are compelling.
Single Server
• The first and simplest distribution option is the one we would most often
recommend: no distribution at all.
• Run the database on a single machine that handles all the reads and
writes to the data store.
• We prefer this option because it eliminates all the complexities that the other
options introduce; it’s easy for operations people to manage and easy for
application developers to reason about.
Single Server
• We can use NoSQL with a single-server distribution model if the data model
of the NoSQL store is more suited to the application.
• Graph databases are the obvious category here; these work best in a
single-server configuration.
• If your data usage is mostly about processing aggregates, then a
single-server document or key-value store may well be worthwhile
because it’s easier on application developers.
Sharding
• Often, a busy data store is busy because different people are accessing
different parts of the dataset.
• In these circumstances we can support horizontal scalability by putting
different parts of the data onto different servers, a technique called sharding.
Sharding
• In the ideal case, different users all talk to different server nodes. Each user only has to talk to one
server, so gets rapid responses from that server.
• The load is balanced out nicely between servers.
• To get close to this ideal, we have to ensure that data that’s accessed together is clumped together
on the same node and that these clumps are arranged on the nodes to provide the best data access.
• When sharding is handled in application logic, rebalancing the shards means changing the
application code and migrating the data.
• Many NoSQL databases offer auto-sharding, where the database takes on the responsibility of
allocating data to shards and ensuring that data access goes to the right shard.
• This can make it much easier to use sharding in an application.
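The routing step that sharding relies on can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular database implements auto-sharding: the node names and the functions `shard_for` and `group_by_shard` are hypothetical, and a simple hash-modulo scheme stands in for whatever placement strategy a real product uses.

```python
import hashlib

# Hypothetical node names; a real cluster would discover its members dynamically.
NODES = ["node-a", "node-b", "node-c"]


def shard_for(key, nodes=NODES):
    """Pick the node responsible for a key using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer, then wrap to a node index.
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]


def group_by_shard(keys, nodes=NODES):
    """Show which keys would land on which node."""
    placement = {node: [] for node in nodes}
    for key in keys:
        placement[shard_for(key, nodes)].append(key)
    return placement
```

Calling `shard_for("customer:42")` always returns the same node, so every client routes a given key to the same place. With plain hash-modulo placement, changing the number of nodes moves most keys, which is one reason rebalancing is costly when the application owns this logic.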
Master-Slave Replication
• With master-slave distribution, you
replicate data across multiple nodes. One
node is designated as the master, or
primary.
• This master is the authoritative source for
the data and is usually responsible for
processing any updates to that data.
• The other nodes are slaves, or
secondaries.
• A replication process synchronizes the
slaves with the master.
Master-Slave Replication
• Master-slave replication is most helpful for scaling when you have a
read-intensive dataset.
• A second advantage of master-slave replication is read resilience: should
the master fail, the slaves can still handle read requests.
• Again, this is useful if most of your data access is reads. The failure of the
master does eliminate the ability to handle writes until either the master is
restored or a new master is appointed.
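The behaviour described above (writes go through the master, reads are served by slaves, and reads survive a master failure) can be sketched as a toy in-memory model. The classes `Node` and `MasterSlaveCluster` are hypothetical names made up for this illustration, and replication here is immediate and synchronous for simplicity, whereas real systems usually replicate asynchronously.

```python
import itertools


class Node:
    """A toy in-memory database node."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.alive = True


class MasterSlaveCluster:
    """Illustrative only: one master takes writes, slaves serve reads."""

    def __init__(self, master, slaves):
        self.master = master
        self.slaves = slaves
        self._next_slave = itertools.cycle(slaves)  # round-robin over slaves

    def write(self, key, value):
        # All updates go through the master; without it, writes are impossible.
        if not self.master.alive:
            raise RuntimeError("no master available: writes are rejected")
        self.master.data[key] = value
        self.replicate()

    def replicate(self):
        # Simplification: copy the master's full state to every live slave.
        for slave in self.slaves:
            if slave.alive:
                slave.data = dict(self.master.data)

    def read(self, key):
        # Try each slave once, skipping any that are down.
        for _ in range(len(self.slaves)):
            slave = next(self._next_slave)
            if slave.alive:
                return slave.data.get(key)
        raise RuntimeError("no slave available for reads")
```

If the master's `alive` flag is set to `False`, `read` keeps returning previously replicated values while `write` raises an error, which mirrors the read resilience and the write outage described in the bullets.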
Peer-to-Peer Replication
• Master-slave replication helps with read scalability
but doesn’t help with scalability of writes.
• It provides resilience against failure of a slave, but
not of a master.
• Essentially, the master is still a bottleneck and a
single point of failure.
• Peer-to-peer replication attacks these problems by
not having a master.
• All the replicas have equal weight: they can all
accept writes, and the loss of any of them doesn’t
prevent access to the data store.
Peer-to-Peer Replication
• With a peer-to-peer replication cluster, you can ride over node failures without
losing access to data.
• You can easily add nodes to improve performance.
• The biggest complication is, again, consistency.
• When you can write to two different places, you run the risk that two people will
attempt to update the same record at the same time (a write-write conflict).
• Inconsistencies on read lead to problems, but at least they are relatively transient.
Inconsistent writes are forever.
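A write-write conflict can be made concrete with a small sketch. Everything here is an assumption for illustration: the `Peer` class, the per-key version counter, and `find_conflicts` are hypothetical, and the check only catches the simple case where both peers applied the same number of updates to a key. Production systems typically rely on richer mechanisms such as vector clocks.

```python
class Peer:
    """A toy replica that accepts writes locally."""

    def __init__(self, name):
        self.name = name
        self.store = {}  # key -> (version, value)

    def write(self, key, value):
        # Each local write bumps the version it last saw for that key.
        version, _ = self.store.get(key, (0, None))
        self.store[key] = (version + 1, value)


def find_conflicts(a, b):
    """Return keys that both peers updated independently to different values.

    Simplification: equal version numbers with different values are treated
    as a conflict. Unequal update counts are not handled by this sketch.
    """
    conflicts = []
    for key in a.store.keys() & b.store.keys():
        version_a, value_a = a.store[key]
        version_b, value_b = b.store[key]
        if version_a == version_b and value_a != value_b:
            conflicts.append(key)
    return sorted(conflicts)
```

For example, if both peers start with the same entry for a hypothetical key `room:101` and each then records a different booking before they synchronize, `find_conflicts` reports that key, and something (a merge rule or a person) has to decide which write survives.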
Combining Sharding and Replication
• Replication and sharding are strategies
that can be combined.
• If we use both master-slave replication
and sharding, we have multiple masters,
but each data item has only a single
master.
• Depending on your configuration, you
may choose a node to be a master for
some data and a slave for other data, or
you may dedicate nodes to master or
slave duties.
Combining Sharding and Replication
• Using peer-to-peer replication and
sharding is a common strategy for
column-family databases.
• In a scenario like this you might have tens
or hundreds of nodes in a cluster with
data sharded over them.
• A good starting point for peer-to-peer
replication is a replication factor of 3,
so each shard is present on three nodes.
Should a node fail, the shards on that
node are rebuilt on the other nodes.
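To make the replication factor concrete, the following sketch assigns each shard to three nodes and lists the shards that need rebuilding when a node fails. The function names `place_shards` and `shards_to_rebuild` and the simple "next nodes in the ring" placement rule are assumptions chosen for illustration, not the scheme of any specific column-family database.

```python
def place_shards(num_shards, nodes, replication_factor=3):
    """Assign each shard to `replication_factor` distinct nodes.

    Illustrative rule: shard i goes to node i and the nodes that follow it,
    wrapping around the end of the node list.
    """
    if replication_factor > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    placement = {}
    for shard in range(num_shards):
        placement[shard] = [
            nodes[(shard + offset) % len(nodes)]
            for offset in range(replication_factor)
        ]
    return placement


def shards_to_rebuild(placement, failed_node):
    """List the shards that lost a replica when `failed_node` went down."""
    return sorted(
        shard for shard, replicas in placement.items() if failed_node in replicas
    )
```

With five nodes and a replication factor of 3, each shard still has two live copies after a single node failure, so the data stays available while the missing replicas are rebuilt elsewhere.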
Key-Value Databases
• A key-value store is a simple hash table, primarily used when all access to
the database is via primary key.
• Example: Think of a table in a traditional RDBMS with two columns, ID
and NAME, where the ID column is the key and the NAME column stores the
value. In an RDBMS, the NAME column is restricted to storing data of type
String, whereas a key-value store places no such restriction on the value.
What is a Key-Value Store
• Key-value stores are the simplest NoSQL data stores to use from an API
perspective.
• The client can either get the value for the key, put a value for a key, or
delete a key from the data store.
• Since key-value stores always use primary-key access, they generally have
great performance and can be easily scaled.
• Popular key-value databases include Riak, Redis, HamsterDB, Berkeley DB,
and Amazon DynamoDB.
• In some key-value stores, such as Redis, the aggregate being stored does
not have to be a domain object; it can be any data structure.
• Redis supports storing lists, sets, and hashes, and can do range, diff, union,
and intersection operations.
• These features allow Redis to be used in more ways than a standard
key-value store.
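The three operations a key-value store exposes (get, put, delete) can be sketched with a thin wrapper around a Python dictionary. The class name `KeyValueStore` and its method signatures are hypothetical and do not correspond to the client API of Riak, Redis, or any other product. The sketch only shows the shape of the interface.

```python
class KeyValueStore:
    """A minimal in-memory stand-in for a key-value store."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # Store the value under the key, overwriting any existing value.
        self._data[key] = value

    def get(self, key, default=None):
        # Look the value up by key; the store never inspects the value itself.
        return self._data.get(key, default)

    def delete(self, key):
        # Remove the key if present and report whether anything was removed.
        if key in self._data:
            del self._data[key]
            return True
        return False
```

A typical use would be `store.put("session:1", {"user": "ann"})` followed later by `store.get("session:1")`. The value is opaque to the store, so it can just as well be a list or a set as a domain object.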
Key-Value Store Features
• Consistency: Consistency is applicable only for operations on a single
key, since these operations are either a get, put, or delete on a single key.
• Transactions: Different key-value store products have different
specifications of transactions. Generally speaking, there are no guarantees on
the writes, and the data stores that do implement transactions do so in different ways.
• Query Features: All key-value stores can query by the key and that’s about
it. If you need to query by some attribute of the value, the database cannot
do it for you; the application has to read the value and check the attribute itself.
Key-Value Store Features
• Structure of Data: Key-value databases don’t care what is stored in the
value part of the key-value pair. The value can be a blob, text, JSON, XML,
and so on.
• Scaling: Many key-value stores scale by using sharding. With sharding, the
value of the key determines on which node the key is stored.

• Note: A BLOB (Binary Large Object) is a large binary item such as an image,
video, or audio file.
Suitable Use Cases
• Storing Session Information
• User Profiles, Preferences
• Shopping Cart Data
When Not to Use
• Relationships among Data: If you need to have relationships between
different sets of data, or correlate the data between different sets of keys,
key-value stores are not the best solution to use, even though some key-value
stores provide link-walking features.
• Multioperation Transactions: If you’re saving multiple keys and there is a
failure to save any one of them, and you want to revert or roll back the rest
of the operations, key-value stores are not the best solution to use.
When Not to Use
• Query by Data: If you need to search the keys based on something found in
the value part of the key-value pairs, then key-value stores are not going to
perform well for you.
• There is no way to inspect the value on the database side, with the exception
of some products like Riak Search or indexing engines like Lucene or Solr.
• Operations by Sets: Since operations are limited to one key at a time,
there is no way to operate upon multiple keys at the same time. If you need
to operate upon multiple keys, you have to handle this from the client side.
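The last two limitations both push work to the client, which can be sketched as follows. A plain Python dictionary stands in for the store, and the helper names `find_keys_by_value` and `multi_get` are hypothetical. The point is that the client has to fetch and inspect every value itself, which is why this approach scales poorly.

```python
def find_keys_by_value(store, predicate):
    """Client-side "query by data": scan every pair and test the value.

    The store cannot filter on the value, so the cost grows with the
    total number of keys, not with the number of matches.
    """
    return sorted(key for key, value in store.items() if predicate(value))


def multi_get(store, keys):
    """Client-side multi-key operation: one lookup per key."""
    return {key: store.get(key) for key in keys}
```

For instance, finding all user profiles with a given city means calling `find_keys_by_value(store, lambda v: v["city"] == "Pune")`, which reads every profile. A store with a search or secondary-index feature would avoid that full scan.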
