Big Data Analytics (BDA)
GTU #3170722
Unit-2
NoSQL
Outline
• What is NoSQL?
• Uses of NoSQL
• Types of NoSQL DB
• Key-value Oriented
• Graph Oriented
• Column Oriented
• Document Oriented
• Why NoSQL
• Advantages and Features of NoSQL
• Use of NoSQL in Industry
• SQL vs NoSQL
• NewSQL
What is NoSQL?
NoSQL (commonly read as "Not Only SQL") represents a completely
different database framework that enables high-performance, agile
processing of large-scale information.
In other words, it is a database infrastructure well suited to the huge
demands of big data.
NoSQL achieves this efficiency because, unlike highly structured
relational databases, NoSQL databases do not impose a rigid structure on
the data and trade strict consistency requirements for speed and agility.
NoSQL focuses on the concept of distributed databases, where
unstructured data can be stored on multiple processing nodes, and usually
on multiple servers.
This distributed architecture allows NoSQL databases to scale horizontally;
as the data continues to grow, just add more hardware to keep up without
reducing performance.
NoSQL Distributed Database Infrastructure has always been a solution for
handling some of the largest data warehouses on the planet, such as
Where is NoSQL used?
NoSQL databases are widely used in big data and other real-time web
applications.
NoSQL databases are used to store log data, which can later be retrieved
for analysis. Likewise, they are used to store social media data and other
data that cannot be stored and analyzed comfortably in an RDBMS.
Where to use NoSQL?
• Log analysis
• Social networking feeds
• Time-based data
Characteristics of NoSQL:
• Non-relational data storage systems
• No fixed table schema
• No joins
• No multi-document transactions
• Relaxes one or more ACID properties
Types of NoSQL
Traditional RDBMSs use SQL syntax to store and retrieve data from SQL
databases.
NoSQL databases, by contrast, all use a data model with a different
structure than the traditional row-and-column table model used by
relational database management systems (RDBMSs).
A NoSQL database system instead encompasses a wide range of
database technologies that can store structured, semi-structured,
unstructured and polymorphic data.
1. Key-Value Pair Oriented
2. Document Oriented
3. Column Oriented
4. Graph Oriented
Key-Value Pair Oriented
Key-value stores are the simplest type of NoSQL database.
Data is stored as key/value pairs.
The attribute name is stored in the ‘key’, and the data corresponding to
that key is held in the ‘value’.
In key-value store databases, the key is typically a string, whereas the
value can be a string, JSON, XML, a BLOB (Binary Large Object), etc.
Because every lookup goes directly to a single key, this design can handle
massive amounts of data and heavy loads.
Typical use cases for key-value stores include user preferences, user
profiles, shopping carts, etc.
Key Value
First Name Rahul
Last Name Patel
DynamoDB, Riak, and Redis are a few well-known examples of key-value
NoSQL databases.
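The key/value idea above can be sketched in a few lines of Python. This is a minimal in-memory illustration only; the class and method names (`KeyValueStore`, `put`, `get`, `delete`) and the `cart:user42` key are hypothetical and are not the API of DynamoDB, Riak or Redis. It shows that the store treats each value as opaque and finds it only through its key.

```python
from typing import Any, Dict, Optional


class KeyValueStore:
    """Toy in-memory key-value store (hypothetical API): values are opaque, found only by key."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        # Insert or overwrite; the store never looks inside the value.
        self._data[key] = value

    def get(self, key: str) -> Optional[Any]:
        # Lookup is by exact key only; returns None if the key is absent.
        return self._data.get(key)

    def delete(self, key: str) -> bool:
        # Returns True if the key existed and was removed.
        if key in self._data:
            del self._data[key]
            return True
        return False


store = KeyValueStore()
store.put("First Name", "Rahul")
store.put("Last Name", "Patel")
# The value can be any type, e.g. a small JSON-like shopping cart (hypothetical key).
store.put("cart:user42", {"items": ["pen", "notebook"]})
print(store.get("First Name"), store.get("Last Name"))
```

Note that there is no way to ask "which keys have the value Rahul?" without scanning everything; that limitation is what the document model below relaxes.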
Document Oriented
Document Databases use key-value pairs to store and retrieve data from
the documents.
A document is stored in a format such as XML or JSON (JavaScript Object
Notation).
Data is stored as a value. Its associated key is the unique identifier for
that value.
The difference is that, in a document database, the value contains
structured or semi-structured data.
This structured/semi-structured value is referred to as a document and
can be in XML, JSON or
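To contrast a document database with a plain key-value store, here is a minimal Python sketch, assuming JSON documents addressed by a unique id. `DocumentStore`, `insert`, `get` and `find` are hypothetical names, not the API of any real product, and the sample records are made up. The point it shows is that the database can look inside the value (here, matching on a field) and that two documents in the same store need not share a schema.

```python
import json
from typing import Any, Dict, List


class DocumentStore:
    """Toy document store (hypothetical API): each value is a JSON document with a unique id."""

    def __init__(self) -> None:
        self._docs: Dict[str, str] = {}  # id -> serialized JSON text

    def insert(self, doc_id: str, document: Dict[str, Any]) -> None:
        # Keep the document as JSON text, as a document database might store JSON.
        self._docs[doc_id] = json.dumps(document)

    def get(self, doc_id: str) -> Dict[str, Any]:
        return json.loads(self._docs[doc_id])

    def find(self, field: str, value: Any) -> List[Dict[str, Any]]:
        # Unlike a key-value store, the query inspects the contents of each value.
        results = []
        for text in self._docs.values():
            doc = json.loads(text)
            if doc.get(field) == value:
                results.append(doc)
        return results


store = DocumentStore()
# Sample documents; they deliberately have different fields (no fixed schema).
store.insert("u1", {"name": "Rahul", "city": "Ahmedabad"})
store.insert("u2", {"name": "Priya", "city": "Surat", "skills": ["SQL", "Python"]})
print(store.find("city", "Surat"))
```

A real document database would use indexes instead of the full scan done by `find` here.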
Column-Oriented
Column-oriented databases work on columns and are based on Google's
Bigtable paper.
Every column is treated separately. The values of a single column are
stored contiguously.
They deliver high performance on aggregation queries such as SUM, COUNT,
AVG and MIN, because the data needed is readily available in one column.
Column-based NoSQL databases are widely used for data warehouses,
business intelligence, CRM, and library card catalogs.
HBase, Cassandra, and Hypertable are examples of column-based NoSQL
databases.
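The benefit for aggregation queries can be illustrated with a short Python sketch. Here plain Python lists stand in for contiguous column storage; this is an assumption made only to show the layout idea, not how HBase or Cassandra actually lay data out on disk. The sales records are made-up sample data.

```python
from typing import Dict, List

# Row-oriented layout: each record's fields are stored together.
rows = [
    {"id": 1, "product": "pen", "amount": 20},
    {"id": 2, "product": "book", "amount": 150},
    {"id": 3, "product": "bag", "amount": 500},
]


def to_columns(records: List[Dict]) -> Dict[str, List]:
    # Column-oriented layout: all values of one column are kept together in one list.
    columns: Dict[str, List] = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)
amounts = columns["amount"]  # only this column is read for the aggregation
print(sum(amounts), len(amounts), sum(amounts) / len(amounts), min(amounts))
```

SUM, COUNT, AVG and MIN all touch only the `amount` list, whereas the row layout would have to walk every field of every record.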
Graph Oriented
Graph databases store data together with the relationships between
data items.
Each data element is stored as a node, and nodes are linked to other
nodes by edges that represent relationships.
A typical example of a graph database use case is Facebook, which holds
the relationships between each user and their further connections.
Graph databases help search the connections between data elements and
link one part to various other parts, directly or indirectly.
Graph databases can be used for social media, fraud detection, and
knowledge graphs.
Examples of graph databases are Neo4j, InfiniteGraph, OrientDB,
FlockDB, etc.
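A minimal Python sketch of the "friends and further connections" idea: users are nodes, friendships are edges in an adjacency list, and a breadth-first traversal finds everyone within a given number of hops. The node names and the function `connections_within` are hypothetical; a graph database such as Neo4j would answer this with its own query language and storage engine, so this shows only the underlying idea.

```python
from collections import deque
from typing import Dict, List, Set

# Nodes are users; each edge is an undirected "friend" relationship.
friends: Dict[str, List[str]] = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A"],
    "D": ["B", "E"],
    "E": ["D"],
}


def connections_within(graph: Dict[str, List[str]], start: str, max_hops: int) -> Set[str]:
    """Breadth-first traversal: every node reachable from start in at most max_hops edges."""
    seen = {start}
    queue = deque([(start, 0)])
    found: Set[str] = set()
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                found.add(neighbour)
                queue.append((neighbour, hops + 1))
    return found


print(connections_within(friends, "A", 1))  # direct friends
print(connections_within(friends, "A", 2))  # friends and friends-of-friends
```

Following stored edges directly is what makes "indirect" connections cheap to find, compared with repeated joins in a relational table.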
Why NoSQL?
In recent times, data can easily be captured and accessed from various
sources, such as Facebook and Google.
Users' personal information, geographic location data, user-generated
content, social graphs and machine logging data are some of the
examples where data is increasing rapidly.
Making use of this data requires processing very large volumes of it, a task
for which relational databases are not well suited. NoSQL databases
evolved to handle such large volumes of data properly.
NoSQL databases are well suited to processing massive volumes of data
with distributed processing.
NoSQL databases support failover mechanisms and ensure high
availability.
NoSQL databases provide easy replication along with horizontal
scalability.
NoSQL databases can handle structured, semi-structured, and
unstructured data.
NoSQL databases can be installed on commodity hardware and can form
clusters for distributed processing.
NoSQL databases offer a flexible schema that can be changed at runtime
without service downtime.
Features of NoSQL
A few features of NoSQL databases are as follows:
1. Many of them are open source.
2. They are non-relational.
3. They are distributed.
4. They are schema-less.
5. They are cluster friendly.
6. They were born out of 21st-century web applications.
Advantages of NoSQL
Dynamic Schemas
NoSQL databases can evolve dynamically with changing requirements.
They can handle structured, semi-structured and unstructured data.
Scalability
Scalability is the measure of a system's ability to increase or decrease in
performance and cost in response to changes in application and system processing
demands.
NoSQL databases support horizontal scaling, which makes it easy to add or
reduce capacity quickly using commodity hardware.
This eliminates the tremendous cost and complexity of the manual sharding that is
often necessary when attempting to scale an RDBMS.
Sharding
Distributes a single logical database system across a cluster of machines.
Can use range-based partitioning to distribute documents based on a specific shard key.
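Range-based partitioning on a shard key can be sketched as follows. This is a minimal Python illustration under stated assumptions: the split points `"h"` and `"p"`, the class `RangeShardedStore` and the sample names are all hypothetical. Real systems (for example MongoDB) also split and rebalance ranges automatically, which is omitted here.

```python
import bisect
from typing import Any, Dict, List


class RangeShardedStore:
    """Toy range-based sharding: each shard owns a contiguous range of shard-key values."""

    def __init__(self, split_points: List[str]) -> None:
        # split_points must be sorted; n split points create n + 1 shards.
        # Shard i holds keys k with split_points[i-1] <= k < split_points[i].
        self.split_points = split_points
        self.shards: List[Dict[str, Any]] = [{} for _ in range(len(split_points) + 1)]

    def shard_for(self, shard_key: str) -> int:
        # bisect_right counts how many split points are <= shard_key.
        return bisect.bisect_right(self.split_points, shard_key)

    def insert(self, shard_key: str, document: Any) -> None:
        self.shards[self.shard_for(shard_key)][shard_key] = document

    def get(self, shard_key: str) -> Any:
        # Only the one shard that owns the key's range is consulted.
        return self.shards[self.shard_for(shard_key)].get(shard_key)


store = RangeShardedStore(split_points=["h", "p"])  # shards: [.., "h"), ["h", "p"), ["p", ..]
for name in ["amit", "harsh", "neha", "rahul", "zara"]:
    store.insert(name, {"name": name})
print([sorted(shard) for shard in store.shards])
```

Because neighbouring keys land on the same shard, range queries on the shard key touch few machines; the trade-off is that a skewed key distribution can overload one shard.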
Advantages of NoSQL Database – Cont.
Performance
With a NoSQL database, you can increase performance by simply adding
cheaper servers, called commodity servers.
This helps organizations continue to deliver reliably fast user experiences
with a predictable return on investment for added resources, without the
overhead associated with manual sharding.
High Availability
NoSQL databases are generally designed to ensure high availability and to avoid the
complexity of a typical RDBMS architecture, which relies on primary
and secondary nodes.
Some distributed NoSQL databases use a masterless architecture that
automatically distributes data equally among multiple resources, so that the
application remains available for both read and write operations even when one
node fails.
Replication
Many NoSQL databases support automatic data replication by default.
Hence, if one database server goes down, the data can still be served from its copy
on another server in the network.
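The replication idea can be shown with a small Python sketch. The placement rule (hash the key to a starting node, then copy it to the next nodes around a ring) is an assumption loosely modelled on ring-style placement; `ReplicatedStore` and its parameters are hypothetical, and real systems add consistency levels, repair and membership protocols that are left out here.

```python
import zlib
from typing import Any, Dict, List, Optional


class ReplicatedStore:
    """Toy replicated store: each key is written to `replication_factor` nodes."""

    def __init__(self, node_count: int, replication_factor: int) -> None:
        if not 1 <= replication_factor <= node_count:
            raise ValueError("replication_factor must be between 1 and node_count")
        self.nodes: List[Dict[str, Any]] = [{} for _ in range(node_count)]
        self.alive: List[bool] = [True] * node_count
        self.replication_factor = replication_factor

    def replicas_for(self, key: str) -> List[int]:
        # A stable hash picks a starting node; copies go to the following nodes (wrapping around).
        start = zlib.crc32(key.encode("utf-8")) % len(self.nodes)
        return [(start + i) % len(self.nodes) for i in range(self.replication_factor)]

    def put(self, key: str, value: Any) -> None:
        # Write the value to every live replica of the key.
        for n in self.replicas_for(key):
            if self.alive[n]:
                self.nodes[n][key] = value

    def get(self, key: str) -> Optional[Any]:
        # Any live replica holding the key can serve the read.
        for n in self.replicas_for(key):
            if self.alive[n] and key in self.nodes[n]:
                return self.nodes[n][key]
        return None


store = ReplicatedStore(node_count=4, replication_factor=2)
store.put("user:1", {"name": "Rahul"})
first_replica = store.replicas_for("user:1")[0]
store.alive[first_replica] = False  # simulate one server going down
print(store.get("user:1"))  # still served by the other replica
```

With a replication factor of 2, losing any single node leaves one copy of every key readable.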
Use of NoSQL in industry
Session Store
Managing session data using a relational database is very difficult, especially as
applications grow very large.
In such cases the right approach is to use a global session store, which
manages session information for every user who visits the site.
NoSQL is suitable for storing such web application session information, which can be
very large.
Since session data is unstructured, it is easier to store it in
schema-less documents than in relational database records.
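A global session store like the one described above can be sketched in Python as a map from session id to a schema-less session document. The automatic expiry (a time-to-live per session) is an added assumption, common in session stores but not stated in the text; `SessionStore`, the 30-minute TTL and the sample sessions are hypothetical.

```python
import time
from typing import Any, Dict, Optional, Tuple


class SessionStore:
    """Toy global session store: session id -> schema-less session document."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl = ttl_seconds  # assumed expiry policy, not from the original text
        self._sessions: Dict[str, Tuple[float, Dict[str, Any]]] = {}

    def save(self, session_id: str, data: Dict[str, Any]) -> None:
        # Each session may carry different fields; no table schema is required.
        self._sessions[session_id] = (time.monotonic() + self.ttl, data)

    def load(self, session_id: str) -> Optional[Dict[str, Any]]:
        entry = self._sessions.get(session_id)
        if entry is None:
            return None
        expires_at, data = entry
        if time.monotonic() >= expires_at:
            # Expired sessions are removed lazily when accessed.
            del self._sessions[session_id]
            return None
        return data


sessions = SessionStore(ttl_seconds=1800)  # hypothetical 30-minute session lifetime
sessions.save("s-101", {"user": "Rahul", "cart": ["pen"], "theme": "dark"})
sessions.save("s-102", {"user": "guest", "last_page": "/home"})
print(sessions.load("s-101"))
```

The two saved sessions have different fields, which a fixed relational table would have to model with nullable columns or extra tables.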
User Profile Store
To enable online transactions, user preferences, user authentication and
more, web and mobile applications need to store user profiles.
In recent times, the number of web and mobile application users has grown very rapidly. A
relational database limited to a single server could not handle such a large,
rapidly growing volume of user profile data.
Using NoSQL, capacity can easily be increased by adding servers, which makes
Use of NoSQL in industry (Cont.)
Content and Metadata Store
Many companies, such as publishing houses, need a place to store
large amounts of data, including articles, digital content and e-books, in order
to merge various learning tools into a single platform.
In content-based applications, metadata is accessed very frequently and
needs low response times.
For building content-based applications, NoSQL provides faster access to data
and the flexibility to store different types of content.
Mobile Applications
Because the number of smartphone users is increasing very rapidly, mobile
applications face problems related to growth and volume.
With a NoSQL database, mobile application development can start small and
expand easily as the number of users increases, which is much harder with
relational databases.
Since NoSQL databases store data without a fixed schema, application
developers can update their apps without making major modifications to the
database.
Use of NoSQL in industry (Cont.)
Third-Party Data Aggregation
Businesses frequently need to access data produced by third parties. For
instance, a consumer packaged goods company may need to get sales data
from stores as well as shoppers' purchase histories.
In such scenarios, NoSQL databases are suitable, since they can manage huge
amounts of data generated at high speed from various data sources.
Internet of Things
Today, billions of devices are connected to the internet, such as smartphones, tablets,
home appliances, and systems installed in hospitals, cars and warehouses. These
devices continuously generate large volumes and varieties of data.
Relational databases struggle to store such data at this scale. NoSQL lets
organizations scale concurrent access to data from billions of connected devices
and systems, store huge amounts of data, and meet the required
performance.
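IoT readings are time-based data, and one common way to store them is to key each reading by device id plus timestamp and keep keys sorted, so that "all readings of one device in a time window" becomes a single range scan. In this Python sketch a sorted list stands in for the sorted storage that wide-column stores such as HBase keep by row key; the class `TimeSeriesStore`, the device ids and the readings are hypothetical.

```python
import bisect
from typing import List, Tuple


class TimeSeriesStore:
    """Toy time-series store: readings sorted by the composite key (device_id, timestamp)."""

    def __init__(self) -> None:
        self._keys: List[Tuple[str, int]] = []
        self._values: List[float] = []

    def append(self, device_id: str, timestamp: int, value: float) -> None:
        # Insert in key order so each device's readings stay together, oldest first.
        key = (device_id, timestamp)
        i = bisect.bisect_left(self._keys, key)
        self._keys.insert(i, key)
        self._values.insert(i, value)

    def range(self, device_id: str, start: int, end: int) -> List[Tuple[int, float]]:
        # Readings for one device with start <= timestamp < end, found by two binary searches.
        lo = bisect.bisect_left(self._keys, (device_id, start))
        hi = bisect.bisect_left(self._keys, (device_id, end))
        return [(self._keys[i][1], self._values[i]) for i in range(lo, hi)]


store = TimeSeriesStore()
store.append("sensor-1", 100, 21.5)
store.append("sensor-2", 105, 30.0)
store.append("sensor-1", 110, 21.9)
store.append("sensor-1", 120, 22.4)
print(store.range("sensor-1", 100, 120))
```

The list insert is O(n) and only suitable for a sketch; the point is the key design, which lets reads skip all other devices' data.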
Use of NoSQL in industry (Cont.)
Social Gaming
Data-intensive applications such as social games can grow to millions of
users. Such growth in the number of users and the amount of data
requires a database system that can store this data and scale to
accommodate a growing number of users; NoSQL is suitable for such applications.
NoSQL has been used by gaming companies such as
Electronic Arts, Zynga and Tencent.
Ad Targeting
Deciding which ads or offers to display on the current web page has a direct
impact on revenue. To determine which group of users to target and where on
the page to display ads, the platform gathers behavioral and demographic
characteristics of users.
A NoSQL database enables ad companies to track user details and to
place ads very quickly, which increases the probability of clicks.
AOL, Mediamind and PayPal are some of the companies that have used NoSQL for ad targeting.
SQL vs. NoSQL
| SQL | NoSQL |
|---|---|
| Relational database | Non-relational, distributed database |
| Relational model | Model-less approach |
| Pre-defined schema | Dynamic schema for unstructured data |
| Table-based databases | Document-based, graph-based, wide-column store, or key-value pair databases |
| Vertically scalable (by increasing system resources) | Horizontally scalable (by creating a cluster of commodity machines) |
| Uses SQL | Uses UnQL (Unstructured Query Language) or other system-specific query languages and APIs |
| Not preferred for large datasets | Largely preferred for large datasets |
| Not a best fit for hierarchical data | Best fit for hierarchical storage, as it stores data as key-value pairs similar to JSON (JavaScript Object Notation) |
| Excellent support from vendors | Relies heavily on community support |
| Supports complex querying and data-keeping needs | Does not have good support for complex querying |
| Can be configured for strong consistency | Some support strong consistency (e.g., MongoDB); others can be configured for eventual consistency (e.g., Cassandra) |
| Examples: Oracle, DB2, MySQL, MS SQL, PostgreSQL, etc. | Examples: MongoDB, HBase, Cassandra, Redis, Neo4j, CouchDB, Couchbase, Riak, etc. |
NewSQL
The term NewSQL was coined by 451 Research analyst Matt Aslett in 2011,
and has been adopted by a number of vendors who do not fit into the traditional
RDBMS mold, yet also diverge from the NoSQL movement.
NewSQL systems are designed to operate in distributed clusters like NoSQL
databases, but they present a relational model and support SQL query semantics.
NewSQL is a class of RDBMS designed to provide the scalability of NoSQL
systems for online transaction processing (OLTP) read-write workloads while
maintaining the ACID guarantees (Atomicity, Consistency, Isolation,
Durability) of traditional database systems.
It delivers a modern solution where businesses can handle large volumes of
data in real-time, without having to sacrifice consistency or reliability.
NewSQL is a modern relational database system that bridges the gap between
SQL and NoSQL.
NewSQL databases aim to scale and stay consistent.
NoSQL databases scale, while standard SQL databases are consistent.
NewSQL attempts to provide both features and find a middle ground.
NewSQL Functionality and Features
Full support for SQL: The ability to process complex SQL queries, a feature that's
not typically available in NoSQL databases.
Scalability: The ability to scale horizontally across multiple nodes for high-
volume traffic while maintaining high transaction rates.
Fault Tolerance: The inclusion of mechanisms to prevent data loss in case of
system failure.
Distributed Transactions: Support for ACID transactions across distributed
databases.
Built-In Machine Learning: Some NewSQL databases incorporate machine
learning functionality directly into the database.
Advantages of NewSQL:
It brings a new implementation approach to traditional relational databases.
It brings together the advantages of SQL and NoSQL.
It makes it easy to migrate between database types as the user's needs change.
Disadvantages of NewSQL:
It offers only partial access to the rich features of traditional SQL systems.
Its in-memory architecture may cause problems when data volumes exceed available memory.
Difference between NoSQL and NewSQL
| NoSQL | NewSQL |
|---|---|
| NoSQL is a schema-free database. | NewSQL is both a schema-fixed and a schema-free database. |
| It is horizontally scalable. | It is horizontally scalable. |
| It has automatic high availability. | It has built-in high availability. |
| It supports cloud, on-disk, and cache storage. | It fully supports cloud, on-disk, and cache storage. |
| Online Transactional Processing is not supported. | Online Transactional Processing is fully supported. |
| There are low security concerns. | There are moderate security concerns. |
| Use cases: Big Data, social network applications, and IoT. | Use cases: e-commerce, telecom industry, and gaming. |
| Examples: DynamoDB, MongoDB, etc. | Examples: VoltDB, CockroachDB, NuoDB, etc. |
Difference between SQL, NoSQL and NewSQL
| Feature | SQL | NoSQL | NewSQL |
|---|---|---|---|
| Schema | Relational (table) | Schema-free | Both |
| SQL | Yes | Depends on the system | Yes, with enhanced features |
| ACID | Yes | No (BASE) | Yes |
| OLTP | Partial support | Not supported | Full support |
| Scaling | Vertical | Horizontal | Horizontal |
| Distributed | No | Yes | Yes |
| High availability | Custom | Auto | Built-in |
| Queries | High-complexity queries | Low-complexity queries | Both |