19CSE357 BIG DATA ANALYTICS
L-T-P-C: 3-0-0-3
What is DATA?
• Data is raw information—facts, figures, observations—that can be
collected, stored, and analyzed.
What is Big Data?
Big Data is a collection of data that is huge in volume and growing exponentially
with time. Its size and complexity are so great that no traditional data
management tool can store or process it efficiently.
• Big Data is data that is too large, fast, or complex for traditional tools to handle.
• Analytics is the process of examining data to find patterns, trends, and insights.
Sources of Big Data
These data come from many sources, such as:
• Social networking sites: Facebook, Google, and LinkedIn all generate
huge amounts of data on a day-to-day basis, as they have billions of users
worldwide.
• E-commerce sites: Sites like Amazon, Flipkart, and Alibaba generate huge volumes
of logs from which users' buying trends can be traced.
• Weather stations: Weather stations and satellites produce very large volumes of
data, which are stored and processed to forecast the weather.
• Telecom companies: Telecom giants like Airtel and Vodafone study user trends and
publish their plans accordingly; for this they store the data of millions of users.
• Share market: Stock exchanges across the world generate huge amounts of data
through their daily transactions.
Characteristics of Big Data
Volume: Scale of Data
• The name ‘Big Data’ itself refers to an enormous size.
• Refers to the massive amount of data generated every second from sources like social
media, sensors, transactions, etc.
• Whether a particular dataset can actually be considered Big Data depends
largely on its volume.
• Ex: Social media platforms like Facebook generate petabytes of data daily through
posts, comments, and multimedia uploads.
Velocity: Speed of Data Generation
• The speed at which data is created, collected, and processed.
• In Big Data, data flows in continuously from sources like machines, networks,
social media, mobile phones, etc.
• There is a massive and continuous flow of data; velocity determines how fast
data is generated and how quickly it must be processed to meet demand.
• Ex: More than 3.5 billion searches are made on Google every day, and
Facebook's user base grows by roughly 22% year over year.
Variety: Different Types of Data
• Variety refers to the different forms and types of data that are generated.
• Data arrives from new sources both inside and outside the enterprise.
• It can be structured, semi-structured, or unstructured.
• Ex: Emails, social media posts, videos, and sensor data all represent different
forms of data collected by companies.
Veracity: The Trustworthiness of Data
• Veracity refers to the quality and reliability of the data.
• It covers the inconsistencies and uncertainty in data: available data can be
messy, and its quality and accuracy are difficult to control.
• Big Data is also variable because of the multitude of data dimensions
resulting from multiple disparate data types and sources.
• Ex: Customer feedback data may be filled with inconsistencies, biases, or
inaccuracies, requiring validation and cleansing to ensure reliability.
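As a sketch of that validation-and-cleansing step, the snippet below filters illustrative feedback records. The field names and the rules (require a customer id, ratings in 1–5, trimmed comments) are assumptions made for the example, not from any specific system:

```python
# A minimal sketch of validating and cleansing messy customer feedback
# records before analysis. Field names and rules are illustrative.

raw_feedback = [
    {"customer_id": "101", "rating": "5", "comment": "Great service"},
    {"customer_id": "102", "rating": "11", "comment": "ok"},   # out of range
    {"customer_id": None,  "rating": "4", "comment": "fine"},  # missing id
    {"customer_id": "103", "rating": "3", "comment": "  Slow  "},
]

def cleanse(records):
    """Keep only records with a customer id and a rating in 1..5."""
    clean = []
    for rec in records:
        if rec["customer_id"] is None:
            continue                      # drop records missing an identifier
        try:
            rating = int(rec["rating"])
        except ValueError:
            continue                      # drop non-numeric ratings
        if not 1 <= rating <= 5:
            continue                      # drop out-of-range ratings
        clean.append({
            "customer_id": rec["customer_id"],
            "rating": rating,
            "comment": rec["comment"].strip(),  # normalize whitespace
        })
    return clean

cleaned = cleanse(raw_feedback)
print(len(cleaned))  # 2 records survive validation
```

Only two of the four records pass; the rest illustrate the kinds of inconsistencies veracity is concerned with.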
Value: The Worth of Data
• After taking the four V's above into account, there comes one more V, which
stands for Value.
• Value refers to the usefulness or business impact that can be derived from data.
• Bulk data with no value is of no good to a company unless it is turned into
something useful.
• Data in itself is of no use or importance; it must be converted into something
valuable from which information can be extracted. Hence, Value can be considered
the most important of all the 5 V's.
The five V's, what they mean, and why they matter:
• Volume: amount of data (terabytes, petabytes, etc.). Need: helps us understand
storage needs and scalable processing strategies (e.g., Hadoop, Spark).
• Velocity: speed of data generation and processing. Need: enables real-time or
near-real-time decision making (e.g., fraud detection, social media analytics).
• Variety: different forms of data (text, images, video, logs). Need: drives the
need for flexible data models (NoSQL, document stores) and diverse processing tools.
• Veracity: accuracy and trustworthiness of data. Need: critical for making
reliable, evidence-based decisions; affects data cleaning and preprocessing stages.
• Value: usefulness and business value of the data. Need: justifies the entire
analytics process; focuses on extracting actionable insights that impact outcomes.
Non-Definitional traits of Big Data
• Volatility: deals with how long the data is valid.
• Validity: refers to the accuracy and correctness of data.
Data authenticity: information extracted from the data should be accurate and
correct.
• Variability: data flows can be highly inconsistent, with periodic peaks.
• Visualization: a method of representing intangible ideas using graphs, charts, etc.
Challenges of Big Data
• Incomplete Understanding of Big Data:
Organizations often lack a proper understanding of Big Data, and skilled people
are required to work with it.
• Exponential Data Growth:
Since data is growing exponentially, it is becoming difficult to store.
• Security of Data:
With many resources devoted to understanding, storing, and analyzing data,
organizations often forget to prioritize security.
• Data Integration:
Data can be in any form, from structured data like phone numbers to unstructured
data like videos, so integrating this data is difficult.
• Lack of data professionals
Types of Big Data
Big data can be broadly classified into three main types:
• Structured data
• Semi-structured data
• Unstructured data
Structured data
• The data that can be stored and processed in a fixed
format is called structured data.
• Typically represented in tables, rows, and columns.
• Data stored in a relational database management
system (RDBMS) is one example of ‘structured’
data.
• It is easy to process structured data as it has a fixed
schema.
• Structured Query Language (SQL) is often used to
manage such data.
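As a minimal illustration of structured data under a fixed schema, the snippet below uses Python's built-in sqlite3 module; the employee table and its columns are invented for the example:

```python
# A small sketch of structured data: a fixed schema in a relational
# database, queried with SQL. The table and its rows are illustrative.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (id INTEGER, name TEXT, dept TEXT, salary REAL)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [(1, "Asha", "Sales", 52000.0),
     (2, "Ravi", "IT", 61000.0),
     (3, "Meena", "Sales", 58000.0)],
)
# Because the schema is fixed, SQL can aggregate directly over columns.
rows = conn.execute(
    "SELECT dept, COUNT(*), AVG(salary) FROM employees "
    "GROUP BY dept ORDER BY dept"
).fetchall()
print(rows)  # [('IT', 1, 61000.0), ('Sales', 2, 55000.0)]
```

The fixed rows-and-columns layout is what makes aggregation queries like this straightforward.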
Semi-structured data
• Semi-structured data is one of the types of big data that
represents a middle ground between the structured and
unstructured data categories.
• It combines elements of organization and flexibility,
allowing for data to be partially structured while
accommodating variations in format and content.
• This type of data is often represented with tags, labels,
or hierarchies, which provide a level of organization
without strict constraints.
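The point about partial structure can be sketched with JSON, a common semi-structured format. The event records below are invented: they share some keys but vary in shape, so code reading them must tolerate missing or nested fields:

```python
# A sketch of semi-structured data: JSON records share some fields but
# vary in structure. The records are illustrative.
import json

raw = """[
  {"user": "a1", "action": "view",
   "device": {"type": "mobile", "os": "android"}},
  {"user": "a2", "action": "purchase", "amount": 499.0},
  {"user": "a3", "action": "view"}
]"""

events = json.loads(raw)
for e in events:
    # Keys give partial structure; fall back to a default where absent.
    device = e.get("device", {}).get("type", "unknown")
    print(e["user"], e["action"], device)
```

Tags and nesting provide organization, but unlike a relational table, nothing forces every record to carry the same fields.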
Unstructured data
• Data that has no predefined form and cannot be analyzed until it is
transformed into a structured format is called unstructured data.
• It lacks a consistent structure, making it more challenging to organize and
analyze.
• Text Files and multimedia contents like images, audios, videos are example of
unstructured data.
• Unstructured data is growing faster than other types; experts estimate that
80 percent of the data in an organization is unstructured.
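The "transform into a structured format" step can be sketched very simply: the snippet below turns free text (an invented sample sentence) into a word-frequency table, a structured form that can then be analyzed:

```python
# A sketch of turning unstructured text into a structured form (a word
# frequency table) so it can be analyzed. The sample text is illustrative.
from collections import Counter
import re

document = ("Big data grows fast. Unstructured data grows faster "
            "than structured data.")

# Tokenize: lowercase the text and keep only alphabetic runs.
tokens = re.findall(r"[a-z]+", document.lower())
freq = Counter(tokens)
print(freq.most_common(2))  # [('data', 3), ('grows', 2)]
```

Real pipelines apply the same idea at scale, extracting features from text, images, audio, or video before analysis.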
Traditional Business Intelligence versus Big Data

Traditional BI environment:
• Data is stored on a central server.
• Analyzes offline or historical data.
• Supports structured data only.

Big Data environment:
• Data is stored in a distributed file system.
• Analyzes both offline and real-time/streaming data.
• Supports a variety of data, i.e., structured, semi-structured, and unstructured.
Data warehouse
• A Data Warehouse is a central repository
that stores huge amounts of structured,
historical data from various sources within an
organization.
• An ordinary database can store MBs to GBs
of data, typically for a specific purpose. For
storing data at TB scale, storage shifts to a
Data Warehouse.
• To effectively perform analytics, an
organization keeps a central Data Warehouse
to closely study its business by organizing,
understanding, and using its historic data for
taking strategic decisions and analyzing
trends.
Key aspects of a data warehouse:
Structure: Data warehouses use a relational database management system (RDBMS)
to store data in a structured format.
Data is organized into tables, rows, and columns, following a predefined schema
called a dimensional model.
ETL Processes:
Extract, Transform, Load (ETL) processes are used to populate the data warehouse.
Data is extracted from different sources, transformed to adhere to the data
warehouse schema, and then loaded into the warehouse.
ETL processes often involve data cleansing, integration, and aggregation to ensure
data quality and consistency.
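The extract-transform-load sequence can be sketched as below. The CSV source, field names, and cleansing rules are all invented for illustration; the "load" target is just an in-memory list standing in for the warehouse tables:

```python
# A minimal ETL sketch: extract rows from a CSV-like source, transform
# them to match a warehouse schema (cleansing + type conversion), and
# load them into an in-memory "warehouse". Names are illustrative.
import csv
import io

source = io.StringIO(
    "order_id,amount,region\n"
    "1, 100.5 ,south\n"
    "2,,north\n"        # missing amount: will be dropped in cleansing
    "3,250,SOUTH\n"
)

# Extract: read raw rows from the source.
rows = list(csv.DictReader(source))

# Transform: drop rows with missing amounts, convert types,
# and normalize region casing so values are consistent.
def transform(row):
    if not row["amount"].strip():
        return None
    return {"order_id": int(row["order_id"]),
            "amount": float(row["amount"]),
            "region": row["region"].strip().upper()}

# Load: here just an append; in practice, a bulk insert into warehouse tables.
warehouse = [t for t in (transform(r) for r in rows) if t is not None]
print(warehouse)
```

The transform step is where cleansing, integration, and aggregation typically happen before data reaches the warehouse schema.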
Historical Data:
• Data warehouses primarily store historical data, providing a long-term view of the
organization's operations.
• Data is typically collected at regular intervals, such as daily, weekly, or monthly,
allowing for trend analysis and historical reporting.
Business Intelligence and Reporting:
• Data warehouses serve as a foundation for business intelligence activities.
• They support ad hoc querying, reporting, and analysis by providing a consolidated
view of data across different business areas.
• Data is often pre-aggregated and optimized for query performance to facilitate fast
and interactive reporting.
Hadoop Environment:
• Hadoop is an open-source framework that enables distributed storage and
processing of large datasets across clusters of commodity hardware.
• It provides a scalable and cost-effective solution for managing Big Data.
• Distributed Storage:
Hadoop uses the Hadoop Distributed File System (HDFS) to store data across
multiple nodes in a cluster.
Data is split into blocks and distributed across the cluster, ensuring high availability
and fault tolerance.
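The block-splitting and replication idea can be sketched on one machine. The block size and replication factor below are deliberately tiny toy values (real HDFS defaults are on the order of 128 MB blocks with replication 3), and the round-robin placement is a simplification of HDFS's actual placement policy:

```python
# A toy sketch of HDFS-style storage: split a file into fixed-size
# blocks and copy each block onto several nodes for fault tolerance.
# Block size, replication factor, and placement are simplified.
BLOCK_SIZE = 4        # bytes; tiny for illustration
REPLICATION = 2
nodes = ["node1", "node2", "node3"]

data = b"abcdefghij"  # a 10-byte "file"
blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]

# Round-robin placement: each block lands on REPLICATION distinct nodes,
# so losing any single node leaves every block still readable.
placement = {i: [nodes[(i + r) % len(nodes)] for r in range(REPLICATION)]
             for i in range(len(blocks))}
print(blocks)     # [b'abcd', b'efgh', b'ij']
print(placement)  # e.g. block 0 on node1 and node2
```

Because every block exists on more than one node, the failure of a single node does not make any part of the file unavailable.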
• Distributed Processing:
Hadoop leverages the MapReduce framework to process data in parallel across the
cluster.
MapReduce divides data processing tasks into smaller subtasks and distributes them
to different nodes, allowing for efficient parallel processing.
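The map-shuffle-reduce pattern can be illustrated with a single-machine word count; real Hadoop runs each phase across cluster nodes, but the data flow is the same. The input lines are invented for the example:

```python
# A toy illustration of the MapReduce pattern on one machine: map each
# line to (word, 1) pairs, shuffle by key, then reduce by summing.
# Real Hadoop distributes these phases across cluster nodes.
from collections import defaultdict

lines = ["big data big", "data analytics", "big analytics"]

# Map phase: each line independently emits (key, value) pairs.
mapped = [(word, 1) for line in lines for word in line.split()]

# Shuffle phase: group all values belonging to the same key.
groups = defaultdict(list)
for key, value in mapped:
    groups[key].append(value)

# Reduce phase: aggregate each key's values; keys are independent,
# so reducers can run in parallel on different nodes.
counts = {key: sum(values) for key, values in groups.items()}
print(counts)  # {'big': 3, 'data': 2, 'analytics': 2}
```

Because map tasks touch only their own line and reduce tasks touch only their own key, both phases parallelize naturally across a cluster.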
• Scalability:
Hadoop is designed to scale horizontally by adding more nodes to the cluster. This
enables organizations to store and process large volumes of data without relying on
expensive and specialized hardware.
• Flexibility and Variety:
Hadoop can handle various types of data, including structured, unstructured, and
semi-structured data.
It allows organizations to store and process diverse data formats, such as text, log
files, images, videos, and more.
• Data Processing Frameworks:
The Hadoop ecosystem includes several data processing frameworks and tools built
on top of Hadoop, such as Apache Spark, Apache Hive, and Apache Pig.
These frameworks provide higher-level abstractions and APIs for data manipulation,
querying, and analysis.
• Batch and Real-time Processing:
Hadoop supports both batch processing and real-time/streaming processing.
While its MapReduce framework is suitable for batch processing, tools like Apache Spark
enable real-time and near-real-time analytics on streaming data.