BIG DATA ANALYTICS
(BCS714D)
MODULE 1:
1. Introduction to Big Data
2. Big Data Analytics.
MODULE 2:
1. Introduction to Hadoop
2. Introduction to Map Reduce Programming
MODULE 3:
1. Introduction to Mongo DB
MODULE 4:
1. Introduction to Hive
2. Introduction to Pig
MODULE 5:
1. Spark and Big Data Analytics
2. Text, Web Content and Link Analytics
Text Books:
1. Seema Acharya and Subhashini Chellappan, “Big
Data and Analytics”, Wiley India Publishers, 2nd
Edition, 2019.
2. Rajkamal and Preeti Saxena, “Big Data
Analytics: Introduction to Hadoop, Spark and
Machine Learning”, McGraw Hill Publication,
2019.
MODULE 1
Syllabus
Introduction to Big Data: Classification of data, Characteristics,
Evolution and definition of Big data, What is Big data, Why Big data,
Traditional Business Intelligence vs Big Data, Typical data warehouse and
Hadoop environment.
Big Data Analytics: What is Big data Analytics, Classification of
Analytics, Importance of Big Data Analytics, Technologies used in Big
data Environments, Few Top Analytical Tools, NoSQL, Hadoop.
TB1: Ch 1: 1.1, Ch 2: 2.1-2.5, 2.7, 2.9-2.11, Ch 3: 3.2, 3.5, 3.8, 3.12, Ch 4: 4.1, 4.2
INTRODUCTION:
Today, data is very important for all kinds of businesses, big or
small. It is found both inside and outside the company, and it comes
in different forms from many sources. To make good decisions, we
need to:
Collect the data
Understand it
Use it properly to get useful information
Data is a set of values that represent a concept or concepts. It can be
raw information, such as numbers or text, or it can be more
complex, such as images, graphics, or videos.
1. CLASSIFICATION OF DATA
Digital data can be broadly classified into structured , semi-structured, and
unstructured data.
1. Unstructured data:
This type of data has no specific format or structure.
Computers can't easily understand or process it directly.
Examples: Word files, emails, images, videos, PDFs, research papers,
presentations.
Around 80–90% of data in companies is unstructured.
2. Semi-Structured Data
This data has some structure, but it is not as organized as structured data.
It is not very easy for computers to use directly.
Examples: XML files, HTML pages.
3. Structured Data
This data is well-organized in tables (like rows and columns).
Computers can easily read and use it.
Example: Data stored in databases like student records, sales data, etc.
Relationships between data are also defined (e.g., student and their marks).
In fact, a company called Gartner says that today, about 80% of the data in
companies is unstructured, and only about 10% is structured or semi-
structured.
1. Structured Data
Most structured data is stored in databases called RDBMS (Relational Database
Management Systems), such as Oracle, MySQL, and SQL Server.
Data in these databases is kept in tables. Each table has rows (each row is one record
or entry) and columns (each column stores a particular type of information, such as
employee name or number).
Each table represents a type of object or entity, for example, “Employee”.
Every column has a specific meaning and data type (like a number or text)
and can have rules, such as “cannot be empty” or “must be unique”.
Tables can be linked to each other. For example, each Employee record has a
DeptNo, which connects to the Department table. This shows which department
the employee works in. This relationship keeps data organized and avoids
duplication.
Structured data is stored in:
Databases (like Oracle, MySQL)
Spreadsheets (like Excel)
OLTP (Online Transaction Processing) systems
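As a minimal sketch (using Python's built-in sqlite3 module and a hypothetical Employee/Department schema), this shows how structured data lives in tables with typed columns, rules such as "cannot be empty", and a relationship via DeptNo:

```python
import sqlite3

# In-memory database with two related tables (hypothetical example schema).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Department (DeptNo INTEGER PRIMARY KEY, DeptName TEXT NOT NULL)")
cur.execute("""CREATE TABLE Employee (
    EmpNo INTEGER PRIMARY KEY,
    EmpName TEXT NOT NULL,
    DeptNo INTEGER REFERENCES Department(DeptNo))""")
cur.execute("INSERT INTO Department VALUES (10, 'Sales')")
cur.execute("INSERT INTO Employee VALUES (1, 'Asha', 10)")

# The DeptNo relationship lets us join the two tables, avoiding duplication.
row = cur.execute("""SELECT e.EmpName, d.DeptName
                     FROM Employee e JOIN Department d ON e.DeptNo = d.DeptNo""").fetchone()
print(row)  # ('Asha', 'Sales')
```

Because the structure is fixed and known in advance, queries like the join above are easy for the computer to run.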
1.1.1 Sources of Structured Data
Here’s where it usually comes from:
1. Databases (RDBMS such as Oracle, MySQL, SQL Server)
2. Spreadsheets. In a spreadsheet, information is put into rows and columns, just
like in a table.
3. OLTP Systems (Online Transaction Processing)
1.1.2 Ease of working with Structured data
1.2 Semi-Structured Data
Semi-structured data is between structured and unstructured. It’s more
flexible and doesn’t follow strict table rules.
Key Features:
1. No Fixed Tables: It does not follow a regular database format.
2. Uses Tags: Data is stored with tags (like in XML or JSON), making it easier to
understand.
3. Tags and Hierarchies
Tags (like <name>) are used to show the structure and hierarchy of data.
4. Schema Mixed with Data
Information about the structure (schema) is mixed along with the actual data
values.
5. Unknown or Varying Attributes
Data may have different attributes, and we might not know these in advance.
Items in the same group don't need to have the same properties.
1.2.1 Sources of Semi-structured Data
The most common formats for semi-structured data are:
1. XML (eXtensible Markup Language)
Used by web services (especially with SOAP).
Stores data with opening and closing tags.
Example (XML):
<name>John</name>
2. JSON (JavaScript Object Notation)
Used to transfer data between a web server and a browser.
Common in modern web applications (using REST).
Also used in NoSQL databases like MongoDB and Couchbase.
Example (JSON):
{ "name": "John" }
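A small Python sketch of how the two formats above are parsed; the extra "city" key is a hypothetical attribute added to illustrate that semi-structured records need not share identical fields:

```python
import json
import xml.etree.ElementTree as ET

# The same fact expressed as XML and as JSON; tags/keys carry the structure,
# so the schema travels with the data itself.
xml_doc = "<person><name>John</name></person>"
json_doc = '{ "name": "John", "city": "Pune" }'  # "city" is an extra, unplanned attribute

name_from_xml = ET.fromstring(xml_doc).find("name").text
record = json.loads(json_doc)

print(name_from_xml)   # John
print(record["name"])  # John
print(sorted(record))  # ['city', 'name']
```

No table definition was needed in advance, which is exactly the flexibility that makes these formats semi-structured.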
AN EXAMPLE OF HTML IS AS FOLLOWS:
<HTML>: Encloses the entire HTML document. This indicates the
start and end of the HTML code.
<HEAD>: Contains meta-information about the document, such as its title
and links to stylesheets or scripts.
<TITLE>Place your title here</TITLE>: Sets the page title, which appears in
the browser’s title bar or tab.
<BODY BGCOLOR="FFFFFF">: Defines the main content
area of the webpage. The BGCOLOR="FFFFFF" attribute sets the background
colour of the page to white using the hexadecimal colour code FFFFFF.
<CENTER><IMG SRC="[Link]" ALIGN="BOTTOM"></CENTER>
<HR>
<a href="[Link]">Link Name</a>
<H1>this is a Header</H1>
<H2>this is a sub Header</H2>
Send me mail at <a
href="[Link]">[Link]</a>.
<P>a new paragraph!
<P><B>a new paragraph!</B>
<BR><B><I>this is a new sentence without a paragraph break, in bold
italics.</I></B>
<HR>
</BODY>
</HTML>
SAMPLE JSON DOCUMENT
{
"_id": 9,
"BookTitle": "Fundamentals of Business Analytics",
"AuthorName": "Seema Acharya",
"Publisher": "Wiley India",
"YearofPublication": "2011"
}
1.3 Unstructured Data
Unstructured data refers to information that does not conform to a predefined
model or structure.
It’s unpredictable, free-form, and often varies widely from one instance to
another.
Examples include social media posts, emails, and logs.
Sometimes, patterns exist in unstructured data, leading to debates about whether
some of it is actually "semi-structured."
Examples:
Twitter message: "Feeling miffed. Victim of twishing."
Facebook post: "LOL. C ya. BFN"
Log file entry: [Link] - frank [10/Oct/[Link] -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 ...
Email: "Hey Joan, possible to send across the first cut on the Hadoop chapter by Friday
EOD or maybe we can meet up over a cup of coffee. Best regards, Tom"
1.3.1 Sources of Unstructured Data
Web pages: The actual content of web pages, which is often complex and not
neatly organized.
Images: Photographs, diagrams, and pictures.
Free form text: Any text that isn’t organized into records or tables, such as essays
or reports.
Audios: Recordings and voice files.
Videos: Multimedia files combining images and sound.
Body of Email: The main content area of emails—not the sender/recipient or time
fields, but what people actually write.
Text messages: SMS or instant messaging text.
Chats: Conversations from online chat applications.
Social media data: Posts, comments, reactions on platforms like Facebook,
Twitter, Instagram, etc.
Word documents
Why data with some apparent structure may still be treated as unstructured:
1. Implied Structure:
Sometimes, there's structure present (e.g., date at the start of a log entry) even if it
wasn't pre-defined.
2. Structure Not Helpful:
Data might have some internal structure, but if that structure isn't useful for a given
task, it's still treated as unstructured.
3. Unexpected/Unstated Structure:
Data may be more structured than we realize, but if that structure is not anticipated
or announced, it is called unstructured.
This image shows a seesaw with “Unstructured data” on one side and “Structured
data” on the other, tilting heavily towards unstructured.
Meaning: Unstructured data makes up the majority of enterprise
(business/organizational) information, outweighing structured data.
Techniques to Find Patterns in Unstructured Data
1. Data Mining:
Data mining is the analysis of large data sets to identify consistent patterns
or relationships between variables. It draws upon artificial intelligence,
machine learning, statistics, and database systems.
Think of it as the "analysis step" in the process called "knowledge discovery
in databases."
Popular Data Mining Algorithms:
Association Rule Mining:
Also called: Market basket analysis or affinity analysis
Purpose: Determines "What goes with what?"
For example, if someone buys bread, do they also tend to buy eggs or cheese?
Use: Helps stores recommend or place products together based on previous
purchases.
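A toy sketch of the "what goes with what?" idea in pure Python; the baskets are hypothetical, and real association rule miners (e.g. Apriori) add support thresholds and pruning on top of this counting:

```python
from itertools import combinations
from collections import Counter

# Hypothetical shopping baskets: which items are bought together?
baskets = [
    {"bread", "eggs", "cheese"},
    {"bread", "eggs"},
    {"bread", "milk"},
    {"milk", "eggs"},
]

# Count how often every pair of items appears in the same basket.
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Support of {bread, eggs} = fraction of all baskets containing both.
support = pair_counts[("bread", "eggs")] / len(baskets)
# Confidence of "bread -> eggs" = of the baskets with bread, how many also had eggs.
bread_baskets = sum(1 for b in baskets if "bread" in b)
confidence = pair_counts[("bread", "eggs")] / bread_baskets
print(support, round(confidence, 2))  # 0.5 0.67
```

High-support, high-confidence pairs are the ones a store might shelve or recommend together.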
Regression Analysis:
Purpose: Predicts the relationship between two variables.
How: One variable (dependent variable) is predicted using other variables
(independent variables).
Use: Estimate outcomes, trends, or values based on related data.
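A minimal ordinary least squares sketch in pure Python; the x and y values are hypothetical toy numbers chosen to fall on a perfect line so the fitted slope and intercept are easy to check:

```python
# Predict y (dependent variable) from x (independent variable) with a line y = a + b*x.
xs = [1, 2, 3, 4, 5]    # e.g. advertising spend (hypothetical)
ys = [2, 4, 6, 8, 10]   # e.g. sales (hypothetical, perfectly linear)

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Least-squares slope and intercept.
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
    sum((x - mean_x) ** 2 for x in xs)
a = mean_y - b * mean_x

print(a, b)       # 0.0 2.0
print(a + b * 6)  # predicted value at x = 6 -> 12.0
```

The fitted line can then estimate outcomes for x values that were never observed, which is the "predict trends or values" use mentioned above.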
Text Analytics/Text Mining:
Extracts meaningful information from unstructured text (like social media or
emails). Tasks include categorization, clustering, sentiment, or entity extraction.
Natural Language Processing (NLP):
Enables computers to understand and interpret human language.
Noisy Text Analytics:
Deals with messy data (chats, messages) that may have errors or informal
language.
Manual Tagging with Metadata:
Attaching manual tags/labels to data to add meaning/structure.
Part-of-Speech Tagging:
Tagging text with its grammatical parts (noun, verb, adjective, etc.)
Unstructured Information Management Architecture (UIMA):
A platform to process unstructured content (text, audio) in real time for extracting
relevant meaning.
2. CHARACTERISTICS OF DATA
Data has three key characteristics.
1. Composition
This refers to the structure of data: its sources, granularity, types, and
whether it is static or involves real-time streaming.
It answers questions like: What is the origin of the data? Is it organized as
batches or streams? Is it highly granular or summarized?
2. Condition
This addresses the state or quality of the data.
It focuses on whether the data is ready for analysis or if it needs to be
cleansed or improved through enrichment. Typical questions include: Is the
data suitable for immediate use? Does it need preprocessing?
3. Context
Context covers the background in which data was generated or is being
used.
It helps answer where, why, and how the data came about, its sensitivity,
and associated events. For example: Where did this data originate? Why
was it created? What events are linked to it?
3. EVOLUTION OF BIG DATA
Before 1980: Simple, structured data stored in mainframes,
limited usage.
1980s-1990s: Relational databases enable more sophisticated,
relational data storage and some data analysis.
2000s-now: Boom of the internet and new technologies generates
vast, varied data (structured, unstructured, multimedia); data is
now a strategic asset driving decisions and innovations.
4. DEFINITION OF BIG DATA
Flexible Definitions:
Beyond Human/Technical Limits: Anything exceeding current human or
technical infrastructure for storage, processing, and analysis.
Relativity: What is considered "big" today may be normal tomorrow, showing
how fast the landscape evolves.
Massive Scale: Sometimes simply defined as terabytes, petabytes, or even
zettabytes of data.
Three Vs: Most commonly, Big Data is described using the "3 Vs": Volume,
Velocity, and Variety.
Big data is high-volume, high-velocity, and high-variety information assets
that demand cost-effective, innovative forms of information processing for
enhanced insight and decision making.
— Gartner IT Glossary
Volume: Enormous amounts of data, both structured and unstructured.
Velocity: Speed at which new data is generated and must be processed.
Variety: Diversity in data types: text, images, videos, logs, streams, etc.
Gartner's definition of big data through a simple flow diagram.
High-volume, high-velocity, high-variety data
→ Need for cost-effective, innovative information processing
→ Leads to enhanced insight and better decision making
Cost-effective and Innovative Processing:
Big Data requires new technologies and approaches to ingest, store, and
analyze huge, fast-flowing, and diverse data sets.
Enhanced Insight and Decision-Making:
The ultimate goal is to derive deeper, richer, and more actionable insights—
turning data into information, then actionable intelligence, leading to better
decisions and greater business value.
This chain is summarized as:
Data → Information → Actionable intelligence → Better decisions
→ Enhanced business value
Big Data isn't just about size; it's about complexity, speed, diversity,
and the ability to draw deeper insights to achieve a competitive edge
in decision making.
Big Data helps turn raw facts into smart
insights, which lead to better decisions
and improved business results.
5. CHALLENGES WITH BIG DATA
1. Explosive Data Growth
Data is growing at an exponential rate, with most existing data generated in just the last
few years.
The challenge is: Will all this data be useful? Should we analyze all data or just a
subset? How do we distinguish valuable insights from noise?
2. Cloud Computing and Virtualization
Managing big data infrastructure often involves cloud computing, which provides cost-
efficiency and flexibility.
However, deciding whether to host data solutions inside or outside the enterprise adds
complexity due to concerns about control and security.
3. Data Retention Decisions
Determining how long to keep data is challenging. Some data may only be relevant for
a short period, while other data could have long-term value.
4. Shortage of Skilled Professionals
There is a shortage of experts in data science, which is crucial for
implementing effective big data solutions.
5. Scale Issues
Big data involves datasets too large for traditional databases.
No clear limit defines when data becomes "big", and new methods are needed
as data changes rapidly and in unpredictable ways.
Challenges include not only storage, but also capturing, preparing,
transferring, securing, and visualizing the data.
6. Need for Data Visualization
Clear and effective visualization is essential to making sense of vast datasets.
There aren’t enough specialists in data or business visualization to meet
demand.
CHALLENGES WITH BIG DATA
The diagram highlights the following core challenges in handling big data:
Capture (gathering data from multiple sources)
Storage (handling massive volumes of information)
Curation (organizing and maintaining data quality)
Search (efficiently finding relevant information)
Analysis (extracting insights)
Transfer (moving data across locations or systems)
Visualization (presenting data in understandable formats)
Privacy Violations (ensuring data security and privacy)
6. WHAT IS BIG DATA?
Big data refers to data that is extremely large in volume, moves at a high velocity, and comes in
a wide variety of forms. The concept of big data is usually captured by the "3 Vs":
6.1 Volume:
Massive amounts of data, ranging from terabytes to yottabytes.
Growth of Data (Volume)
Data grows from small units (bits, bytes) to massive scales, as shown in the
growth path:
Bits → Bytes → Kilobytes → Megabytes → Gigabytes → Terabytes →
Petabytes → Exabytes → Zettabytes → Yottabytes
Velocity: The speed at which data is generated and processed, from batch to
real-time streams.
Variety: Diversity in data sources and formats (structured, unstructured—like
text, video, databases, etc.).
Unit sizes:
Bit: 0 or 1
Byte: 8 bits
Kilobyte: 1,024 bytes
Megabyte: 1,024² bytes
Gigabyte: 1,024³ bytes
Terabyte: 1,024⁴ bytes
Petabyte: 1,024⁵ bytes
Exabyte: 1,024⁶ bytes
Zettabyte: 1,024⁷ bytes
Yottabyte: 1,024⁸ bytes
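The scale above can be walked programmatically; a small Python sketch that repeatedly divides by 1,024 until the value fits the next unit (the abbreviations are this sketch's own labels, not from the text):

```python
# Each step up the scale is 1,024 times the previous unit.
UNITS = ["bytes", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]

def human_readable(num_bytes: float) -> str:
    """Render a byte count using the largest unit that keeps the value < 1,024."""
    for unit in UNITS:
        if num_bytes < 1024 or unit == UNITS[-1]:
            return f"{num_bytes:.2f} {unit}"
        num_bytes /= 1024

print(human_readable(1024 ** 4))      # 1.00 TB
print(human_readable(5 * 1024 ** 7))  # 5.00 ZB
```

Expressing raw byte counts this way makes the jump from gigabytes to zettabytes easy to grasp.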
6.1.1 Where is Big Data Generated?
Big data can be generated from a multitude of sources, both internal and
external:
Files and Documents: XLS, DOC, PDF files (often unstructured).
Multimedia: Video (YouTube), audio, social media.
Communication: Chat messages, customer feedback forms.
Other Examples: CCTV footage, weather forecasts, mobile data.
Internal Data Sources in Organizations
Data Storage: File systems, relational databases (RDBMS like Oracle, MS SQL
Server, MySQL, PostgreSQL), NoSQL databases (MongoDB, Cassandra).
Archives: Scanned documents, customer records, patient health records, student
records, etc.
SOURCES OF BIG DATA
Big data is therefore characterized by its large volume, high velocity, and multiple
varieties, and is generated from numerous sources—ranging from structured databases to
unstructured social media, videos, and organizational records.
6.2 Velocity
Velocity describes the speed at which data is generated, collected, and needs to be
processed. In the past, data used to be processed in batches—meaning all data was
collected over a period and then analyzed together (for example, payroll
calculations). Today, data increasingly needs to be processed in real time, or near
real time, as it arrives. This evolution is summarized as:
Batch → Periodic → Near real time → Real-time processing
Modern organizations now expect their systems to process and respond to data
instantly or within seconds, rather than waiting for slow, scheduled processing.
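The batch-to-real-time shift above can be contrasted in a few lines of Python; the readings are hypothetical, and a real streaming system would consume an unbounded feed rather than a fixed list:

```python
# Batch: collect all records first, then process them in one pass.
readings = [12, 15, 11, 14, 13]            # hypothetical sensor values
batch_total = sum(readings)                # runs only after all data has arrived

# Streaming: update results record by record, as each value arrives.
running_total = 0
alerts = []
for i, reading in enumerate(readings, 1):  # imagine one reading per second
    running_total += reading
    if reading > 14:                       # react immediately to this record
        alerts.append((i, reading))

print(batch_total, running_total, alerts)  # 65 65 [(2, 15)]
```

Both paths reach the same total, but only the streaming loop could raise the alert the moment reading 2 arrived, which is the point of real-time processing.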
6.3 Variety
Variety refers to the diversity of data types and sources that organizations must handle. It
is categorized into three types:
Structured Data:
Highly organized and easily searchable (e.g., data stored in relational databases like RDBMS,
traditional transaction processing systems).
Semi-Structured Data:
Not as rigidly organized, but contains tags or markers to separate elements (e.g., HTML, XML).
Unstructured Data:
No predefined structure. Examples include text documents, emails, audios, videos, social media
posts, PDFs, and photos. Unstructured data is the most challenging but also the biggest source of
insights.
Variety means organizations must be able to manage everything from traditional database
records and spreadsheets to social media posts, sensor logs, images, and more—all of
which may require specialized processing techniques.
7. WHY BIG DATA?
Big data is important because the more data we have for analysis, the more
accurate our analytical results become. This increased accuracy boosts our
confidence in the decisions we make based on these analyses. With this greater
confidence, organizations can realize significant positive outcomes, namely:
Enhanced operational efficiency
Reduced costs
Less time spent on processes
Increased innovation in developing new products and services
Optimization of existing offerings
The process can be visualized as a sequence:
More data → More accurate analysis → Greater confidence in decision
making → Greater operational efficiencies, cost reduction, time reduction,
new product development, and optimized offerings
8. TRADITIONAL BUSINESS INTELLIGENCE VS BIG
DATA
1. Data Storage and Architecture
Traditional BI:
All enterprise data is stored on a central server (usually on a single or a few large
database servers).
Big Data:
Data is stored in a distributed file system (spread across many servers or nodes).
Distributed systems can scale “horizontally” by adding more servers (nodes),
rather than making a single server bigger (“vertical” scaling).
2. Data Analysis Mode
Traditional BI:
Data analysis usually happens in offline mode, meaning data is collected and then
analyzed at a later time (batch processing).
Big Data:
Analysis can happen both in real time and in offline (batch) mode.
3. Data Type and Processing Method
Traditional BI:
Deals mostly with structured data (data that fits neatly into tables, like
databases). The typical approach is to move data to the processing
function (“move data to code”).
Big Data:
Handles all types of data: structured, semi-structured, and unstructured (such
as logs, images, social media text, etc.). In Big Data systems, it is more common
to move the processing function to where the data is (“move code to data”).
9. TYPICAL DATA WAREHOUSE ENVIRONMENT
Step 1: Data Collection (Sources)
Data comes from different systems inside and outside the company,
such as:
ERP systems (finance, HR, inventory)
CRM systems (customer details, sales info)
Old legacy systems (still in use)
Third-party apps (external software)
This data can be in many formats:
Databases (Oracle, SQL Server, MySQL)
Excel sheets
Text/CSV files
Step 2: Data Integration (ETL Process)
Since data comes in different formats, it must be:
Extracted (taken out from sources)
Transformed (cleaned and converted into a common format)
Loaded (sent into the warehouse)
This process is called ETL.
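The three ETL steps can be sketched end to end in Python, with hypothetical sources and an in-memory SQLite table standing in for the warehouse:

```python
import csv
import io
import sqlite3

# Extract: two sources in different formats (hypothetical sample data).
csv_source = "name,amount\nAsha,1200\nRavi,950\n"
dict_source = [{"name": "Meena", "amount": "780"}]
rows = list(csv.DictReader(io.StringIO(csv_source))) + dict_source

# Transform: clean into one common format (trimmed names, integer amounts).
clean = [(r["name"].strip(), int(r["amount"])) for r in rows]

# Load: insert the cleaned rows into the warehouse table.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales (name TEXT, amount INTEGER)")
warehouse.executemany("INSERT INTO sales VALUES (?, ?)", clean)

total = warehouse.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
print(total)  # 2930
```

Once loaded, the warehouse can answer queries across all sources at once, which no single source could do on its own.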
Step 3: Loading Data
After ETL, the cleaned data goes into the Data Warehouse.
It’s stored at the enterprise level (for the whole company).
Sometimes, smaller warehouses called Data Marts are made for
specific teams (like sales, HR).
Step 4: Business Intelligence & Analytics
Once data is ready in the warehouse, companies can use tools to:
Run quick queries (questions to the database)
Create dashboards (visual summaries)
Do data mining (find patterns and trends)
Generate reports
This helps managers make smarter, faster, data-driven decisions.
A Data Warehouse is like a super-organized library of business
data.
Collect from different sources → Clean & combine (ETL) →
Store in warehouse → Analyze using BI tools.
10. TYPICAL HADOOP ENVIRONMENT
Differences Between Hadoop and Data Warehousing
1. Sources and Type of Data
Hadoop:
Collects data from a wide and diverse set of sources—web logs, images, videos,
social media content (Twitter, Facebook, etc.), documents, PDFs, and more. It is
designed to handle not just structured data, but also semi-structured and
unstructured data. This includes data both within and outside the company's
firewall.
Data Warehouse:
Traditionally focuses on structured data from well-defined business applications
like ERP, CRM, or legacy systems, typically within the organization's boundaries.
2. Storage Mechanism
Hadoop:
Uses the Hadoop Distributed File System (HDFS) to store data reliably across
many servers. Data of various types and sizes is kept in this distributed file
system, which is highly scalable and fault tolerant.
Data Warehouse:
Uses relational databases or similar systems where data is stored in tables with
fixed schemas (rows and columns).
3. Processing and Output
Hadoop:
Processing is done via MapReduce, a programming model that allows massive
scalability by dividing tasks across multiple nodes. After processing, data can be
sent to different destinations: back to operational systems, to data warehouses,
data marts, or operational data stores (ODS) for further analysis.
Data Warehouse:
Processing is done using SQL queries, and data is mostly kept in place for
analytics and reporting.
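The MapReduce model mentioned above can be sketched as a single-process word count in Python; in a real Hadoop cluster the map calls run in parallel on many nodes, and a shuffle phase groups keys between map and reduce:

```python
from collections import defaultdict

# Map: each "node" turns its chunk of text into (word, 1) pairs.
def map_phase(chunk):
    return [(word.lower(), 1) for word in chunk.split()]

# Reduce: group the pairs by key and sum each group's counts.
def reduce_phase(pairs):
    counts = defaultdict(int)
    for word, one in pairs:
        counts[word] += one
    return dict(counts)

chunks = ["big data needs big tools", "hadoop stores big data"]
mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
print(reduce_phase(mapped))  # {'big': 3, 'data': 2, ...}
```

Because each chunk is mapped independently, the work scales out horizontally: more chunks simply mean more nodes running `map_phase` at the same time.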
4. Integration and Use
In Hadoop environments, data can flow from many types of source
systems (logs, media, social platforms, documents) into Hadoop, where it is
processed and then routed to the relevant business destinations (operational
systems, warehouses, marts, ODS) for final use or reporting.
Hadoop is built for handling massive volumes and varieties of data (structured
and unstructured, internal and external), storing it in a distributed fashion with
flexible processing pipelines.
Traditional Data Warehousing excels at managing structured, business-
critical data in centralized, organized storage.