
Big Data: Nature and Challenges

This document contains questions and notes about big data and intelligent data analysis. It begins with three parts for a question bank (Parts A, B, and C) that cover topics such as the 5 V's of big data, challenges of conventional systems, intelligent data analysis stages and processes. The notes section defines big data and its sources, provides examples of big data in healthcare, and describes intelligent data analysis and the nature of data.



UNIT 1 – INTRODUCTION TO BIG DATA


QUESTION BANK
PART A (2 Marks each)
1. What do you mean by Big Data? Imp
2. What are the different sources of big data?
3. Explain different applications of Big Data. Imp
4. Write down the nature of data. Imp
5. What are the different stages in Intelligent Data Analysis?
6. Distinguish between Analysis and Reporting. Imp
7. What do you mean by statistical distributions?
8. What do you mean by re-sampling?
9. Define prediction error.

PART B (5 Marks each)


10. Explain the 5 V's of Big Data.
11. Explain the challenges of conventional systems.
12. Explain Intelligent Data Analysis. Imp
13. Explain the analytical process in detail.

PART C (15 Marks each)


14. Explain in detail the nature of data.
15. Explain Intelligent Data Analysis. Imp
16. Explain Modern Data Analytical Tools. Imp

NOTES
INTRODUCTION TO BIG DATA PLATFORM
▪ Big Data is a high-volume, high-velocity and/or high-variety information asset that
requires new forms of processing for enhanced decision making, insight discovery
and process optimization.
▪ A collection of data sets so large or complex that traditional data processing
applications are inadequate.
▪ Data of a very large size, typically to the extent that its manipulation and
management present significant logistical challenges.
▪ A single mobile phone user generates about 40 exabytes of data every month.
▪ This massive amount of data is termed Big Data.
▪ Any data can be classified as Big Data using the concept of the 5 V's:
1. Volume
2. Velocity
3. Variety

Prepared by NITHIN SEBASTIAN



4. Veracity
5. Value
▪ Consider the following example from the healthcare industry.
➢ Volume:
o High data volumes impose distinct data storage and processing
demands, as well as additional data preparation, curation, and
management processes.
o Hospitals and clinics across the world generate massive volumes of
data.
o An estimated 2,314 exabytes of data are collected annually.
➢ Velocity:
o In Big Data environments, data can arrive at fast speeds, and
enormous datasets can accumulate within very short periods of time.
o These data include patient records and test results.
o All this data is generated at a very high speed, which constitutes the
velocity of big data.
➢ Variety:
o Data variety refers to the multiple formats and types of data that need
to be supported by Big Data solutions.
o Data variety brings challenges for enterprises in terms of data
integration, transformation, processing, and storage.
o It refers to the various data types:
Structured data: Excel Records.
Semi-structured data: Log files
Un-structured data: X-ray Images.
➢ Veracity:
o Veracity refers to the quality or fidelity of data.
o Data that enters Big Data environments needs to be
assessed for quality, which can lead to data processing
activities to resolve invalid data and remove noise.
o The accuracy and trustworthiness of the generated data is
termed veracity.
o Noise is data that cannot be converted into information and thus has
no value, whereas signals have value and lead to meaningful
information.
➢ Value:
o Value is defined as the usefulness of data for an enterprise.
o Analysing all this data will benefit the medical sector through:
Faster disease detection
Better treatment
Reduced cost
o These benefits are known as the value of big data.
▪ To store and process big data, various frameworks are used.


Cassandra
Hadoop
Spark
▪ Big data is analysed for numerous applications in games like:
HALO 3
CALL OF DUTY
▪ Designers analyse users’ data to understand at which stages users pause, restart, or
quit playing.
▪ This insight can help them rework the game and improve the user experience.
▪ Big data also helped with disaster management during a hurricane in the USA in 2012,
and necessary measures were taken.

TYPES/SOURCES OF BIG DATA


▪ The following classification is suggested by IBM and the Big Data task team:
• Social networks and web data: such as Facebook, Twitter, e-mails, blogs, and
YouTube.
• Transactions data and Business Processes data: such as credit card
transactions, flight bookings, etc. and public agencies data such as medical
records, insurance business data, etc.
• Customer master data: such as data for facial recognition and for the name,
date of birth, marriage anniversary, gender, location and income category.
• Machine-generated data: such as machine-to-machine or Internet of Things
(IoT) data, and data from sensors, trackers, web logs and computer
system logs. Computer-generated data, such as the data produced by
programs processing data repositories (a database or file), is also
considered machine-generated data.
• Human-generated data: such as biometrics data, human-machine interaction
data, e-mail records with a mail server and MySql database of student grades.
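As a small illustration of machine-generated data, the following Python sketch parses a hypothetical web-server access log line; the log format, field names, and values are assumptions for illustration only:

```python
import re

# A hypothetical Apache-style access log line (machine-generated data)
log_line = '192.168.1.10 - - [12/Mar/2023:10:15:32 +0000] "GET /index.html HTTP/1.1" 200 1024'

# Regex capturing the IP, timestamp, request, status code, and response size
pattern = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]+)" (?P<status>\d{3}) (?P<size>\d+)'
)

match = pattern.match(log_line)
record = match.groupdict()
print(record["ip"], record["status"], record["size"])
```

Such logs accumulate automatically, in huge volumes, without any human action, which is what distinguishes machine-generated data from human-generated data.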

CHALLENGES OF CONVENTIONAL SYSTEMS


▪ Big data is a huge amount of data; hence conventional systems cannot store, manage
and analyse it within a reasonable time interval.
1. Data is collected in large quantities and it is not possible to process
everything.
2. Data must be meaningful and collected in real time. Meaningful data is
data without irregularities, mistakes and inconsistencies; real-time data
means the data cannot be old or outdated.
3. Data is collected from multiple sources (text, audio, video, etc.) and must be
categorized. This is an important aspect of data collection. Doing this
manually is impractical for humans: it is extremely challenging, time
consuming, and requires a lot of manpower.


4. Collect correct data by eliminating flaws, mistakes, incompleteness,
inconsistency and irregularity from the data, because wrong data will
produce problems in the future.
5. Comparison of data using multiple tools. The collected data must be
represented in the form of graphs, charts, statistics, etc.
6. The data must be accessible to the respective person.
7. Lack of knowledgeable professionals who can handle and deal with diverse
and large data.
▪ To overcome these challenges and drawbacks, intelligent data analysis is used.
▪ Valuable information can be gathered with the help of machines, thus reducing
processing time, cost and errors.

INTELLIGENT DATA ANALYSIS (IDA)


▪ Intelligent Data Analysis (IDA) discloses hidden facts that are not known previously
and provides potentially important information or facts from large quantities of data.
▪ It also helps in making a decision.
▪ IDA helps to obtain useful information, necessary data and interesting models from a
lot of data available online in order to make the right choices.
▪ The main goal of intelligent data analysis is to obtain knowledge.
▪ Data analysis is a process that combines extracting data from a data set,
analysing, classifying, organizing, reasoning and so on.
▪ Intelligent data analysis is the combination of advanced statistical techniques,
human intuition and serious computing power to address real world data intensive
problems.
▪ Analysis is a scientific process to discover the meaningful patterns and structures
hidden within the mountain of available data and transform this data into
information for better decisions.
▪ IDA is one of the major topics in Artificial Intelligence (AI) and information
science.
▪ In general, IDA includes three stages:
1. Preparation of data
2. Data mining
3. Data validation and explanation

NATURE OF DATA
▪ Data are raw facts that have not been processed to explain their meaning.
▪ Data are stored in a database, and a database management system (DBMS) manages
the data, i.e., it stores, updates and retrieves data from the database.
▪ There are 3 types of data:
1. Structured data
2. Semi-structured data


3. Unstructured data
STRUCTURED DATA
▪ Stored in tabular format.
▪ i.e., in the form of rows and columns.
▪ Structured data is clearly defined and stored in a pre-defined data model.
▪ Ex: Excel files, SQL databases.
▪ Data are stored in rows and columns that are related to each other.
▪ Hence we get a proper view and understanding of the data.
▪ Structured data are stored in relational databases.


UN- STRUCTURED DATA

▪ No predefined structure.
▪ No data model
▪ Data is irregular and ambiguous.
▪ Ex: text, numbers, images, audio, video, messages, social media posts, etc.
▪ It is difficult to extract information from such data.
▪ 80- 90% of data are unstructured data.
▪ Real-life example: content on Facebook, Instagram and YouTube is unstructured data.
▪ It is a complex task to analyse such data; hence Artificial Intelligence is used.
▪ Ex: face recognition by Google.
▪ Previously, only structured data was used extensively. But with the help of Artificial
Intelligence, unstructured data is now commonly used.
▪ So, unstructured data is the most useful kind of data, and it provides a lot of
information.
SEMI-STRUCTURED DATA
▪ It falls between structured and unstructured data.
▪ It is a combination of both.
▪ Ex: Emails, XML, WWW.
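The three kinds of data can be contrasted in a short Python sketch; the sample records below are hypothetical:

```python
import csv
import io
import json

# Structured: tabular rows and columns (CSV mimicking an Excel/SQL table)
table = io.StringIO("id,name,grade\n1,Asha,A\n2,Ravi,B\n")
rows = list(csv.DictReader(table))

# Semi-structured: JSON has structure (keys) but no rigid, fixed schema
email = json.loads('{"from": "a@example.com", "subject": "Hi", "tags": ["intro"]}')

# Unstructured: free text with no predefined data model at all
post = "Loved the trip to Delhi! Photos coming soon..."

print(rows[0]["name"], email["subject"], len(post.split()))
```

Note how the structured rows share identical columns, the JSON email carries its own field names, and the free-text post can only be processed as raw words.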

ANALYTIC PROCESS
▪ The steps involved in data analytic process are:


1. Collecting data
2. Cleaning data
3. Manipulating data
4. Analysing data
5. Visualizing data
▪ Ex: Travel industry:
Collecting Data
✓ If a person is travelling to Delhi, he uses an aviation website and provides
basic details like destination, date of travel and price. He selects a flight
within his budget and confirms the payment. These data are collected by the
travel company.
✓ Similarly, many people do the same thing, which generates a lot of data.
✓ These data are stored on the company’s web servers in tabular format, making
it easier for an analyst to analyse the data.
Cleaning data
✓ If there are missing values or unstructured entries in the tabular data,
replacing the missing values or deleting those rows is called cleaning the data.
✓ Now the data is clean and ready for analysis.
Manipulating Data
✓ The analyst manipulates the data to create required features and variables.
✓ Ex: the analyst adds new columns, like return date.
Analysing Data
✓ Data are analysed using logical methods and analytical techniques.
✓ Once it’s ready, advanced analytics processes can turn big data into big insights.
✓ Some of these big data analysis methods include:
● Data mining sorts through large datasets to identify patterns and relationships by
identifying anomalies and creating data clusters.
● Predictive analytics uses an organization’s historical data to make predictions
about the future, identifying upcoming risks and opportunities.
● Deep learning imitates human learning patterns by using artificial intelligence and
machine learning to layer algorithms and find patterns in the most complex and
abstract data.

Visualizing Data
✓ Present the data after the analysis is complete.
✓ Visualization means showing the analysed data in a visual or graphical format
for easy interpretation.
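The five steps above can be sketched end to end on the travel example; the booking records, column names, and thresholds below are hypothetical, and only the Python standard library is used:

```python
from statistics import mean

# 1. Collecting: hypothetical booking records gathered by a travel website
#    (None marks a missing price, to be handled in cleaning)
bookings = [
    {"destination": "Delhi", "price": 120, "date": "2023-03-01"},
    {"destination": "Delhi", "price": None, "date": "2023-03-02"},
    {"destination": "Goa", "price": 250, "date": "2023-03-05"},
]

# 2. Cleaning: drop rows with missing values
clean = [b for b in bookings if b["price"] is not None]

# 3. Manipulating: derive a new feature (a price band column)
for b in clean:
    b["band"] = "budget" if b["price"] < 200 else "premium"

# 4. Analysing: compute the average ticket price
avg_price = mean(b["price"] for b in clean)

# 5. Visualizing: a crude text "bar chart" of prices
for b in clean:
    print(f'{b["destination"]:6} {"#" * (b["price"] // 50)}')
print("average:", avg_price)
```

In practice each step would use dedicated tooling (databases for collection, a data-frame library for manipulation, a charting tool for visualization), but the pipeline shape is the same.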

DATA ANALYSIS TOOLS


▪ Data analysis tools provide the analysed results visually.
▪ They are:


Hadoop:
✓ It is an open-source framework that efficiently stores and processes big
datasets on clusters of commodity hardware.
✓ This framework is free and can handle large amounts of structured and
unstructured data, making it a valuable mainstay for any big data operation.
Microsoft Excel:
✓ Developed by Microsoft.
✓ It is a spreadsheet program used to create grids of numbers, text and
various formulas.
✓ Easy to use and a widely used tool.
✓ Excel works with other Office software, i.e., Excel spreadsheets can be
easily added to Word documents and PowerPoint presentations.
✓ The biggest benefit of Excel is its ability to organize large amounts of
data into orderly, logical spreadsheets and charts.
RapidMiner:
✓ It is a data science software platform which helps with data
presentation and analysis.
✓ It is an integrated environment for:
• Data preparation
• Analysis
• Machine learning
• Deep learning
✓ It is widely used in every business and commercial sector.
✓ It has data exploration features such as:
• Graphs
• Descriptive statistics
• Visualization which allows users to get valuable insights.
✓ It has more than 1500 operators for data transformation and analysis
tasks.
Talend:
✓ It is an open-source software platform which offers data integration
and management.
✓ It specializes in big data integration.
✓ It is also available in open-source and premium versions.
✓ It is one of the best tools for cloud computing and big data
integration.
KNIME:
✓ It is a free and open-source data analysis tool to create data science
applications and build machine learning models.
✓ It is an analysing, reporting and integration platform.
✓ KNIME has been used in pharmaceutical research and customer data
analysis, business intelligence, text mining & financial data analysis.
✓ It provides interactive graphical user interface to create visual
workflows.


▪ Other tools are:


SAS (Statistical Analysis System)
R and Python
Apache Spark
Power BI
Tableau

ANALYSIS VS REPORTING
REPORTING
▪ It is the process of organizing data in the form of graphs and charts.
▪ Reporting is used to provide facts, which can be used to draw conclusions, avoid
problems or create plans.
▪ Reporting presents the actual data to end-users, after collecting, sorting and
summarizing it to make it easy to understand.
▪ Reporting offers no judgment or insight.
▪ It focuses on what is happening.
▪ High-level overview of data.
ANALYSING
▪ It is the process of exploring data in order to extract a meaningful insight.
▪ Analytics offers pre-analysed conclusions that a company can use to solve problems
and improve its performance.
▪ Analytics doesn't present the data but instead draws information from the available
data and uses it to generate insights, forecasts and recommended actions.
▪ It focuses on why something is happening within an organization.
▪ Interpret data at a deeper level.

STATISTICAL CONCEPTS
▪ Statistics is a branch of applied mathematics where we collect, organize, analyse
and interpret numerical facts.
▪ Statistical methods are the concepts, models and formulas of mathematics used in
the statistical analysis of data.
▪ It is the science of collecting, exploring and presenting large amounts of data to
identify patterns and trends.
▪ It is also called quantitative analysis.
SAMPLING DISTRIBUTIONS
• In statistics, a population is the entire pool from which a statistical sample is
drawn.
• A population may refer to an entire group of people, objects, events, hospital
visits, or measurements.
• A population can thus be said to be an aggregate observation of subjects
grouped together by a common feature.


• A lot of data drawn and used by academicians, statisticians, researchers,
marketers, analysts, etc. are actually samples, not populations.
• A sample is a subset of a population.
• A sampling distribution is a probability distribution of a statistic obtained
from a larger number of samples drawn from a specific population.
• The sampling distribution of a given population is the distribution of
frequencies of a range of different outcomes that could possibly occur for a
statistic of a population.
• A sampling distribution summarizes the values of a statistic obtained from a
large number of samples drawn repeatedly from a specific group.
• In statistics, probability is used to calculate the likely occurrence of a
phenomenon.
• This is done by collecting samples from populations.
• A lot of data collected over time is used to calculate the probabilities of
an event.
• This data is collected with utmost precision.
• Sampling distribution involves more than one statistical value of a sample.
• The primary purpose of Sampling Distribution is to establish representative
results of small samples of a comparatively larger population.
• The significance of a sampling distribution:
✓ It provides accuracy.
✓ It provides consistency.
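A sampling distribution can be simulated in a few lines of Python: draw many samples from a synthetic population and look at how the sample means are distributed. The population parameters and sample sizes below are assumptions chosen for illustration:

```python
import random
from statistics import mean, stdev

random.seed(42)

# A hypothetical population of 10,000 values
population = [random.gauss(mu=50, sigma=10) for _ in range(10_000)]

# Draw many samples and record each sample mean
sample_means = [
    mean(random.sample(population, 30))  # sample size n = 30
    for _ in range(1_000)
]

# The sample means cluster around the population mean, with a spread of
# roughly sigma / sqrt(n) (the standard error)
print("population mean:", round(mean(population), 1))
print("mean of sample means:", round(mean(sample_means), 1))
print("std of sample means:", round(stdev(sample_means), 2))
```

The simulation shows why small samples can still represent a much larger population: the distribution of the sample mean is centred on the population mean and far narrower than the population itself.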
RE-SAMPLING

• The problem with the sampling process is that we only have a single estimate
of the population parameter, with little idea of the variability or uncertainty in
the estimate.
• One way to address this is by estimating the population parameter multiple
times from our data sample. This is called resampling.
• Re-sampling is the method of creating or drawing repeated samples from
the original sample.
• Resampling involves selecting randomized cases, with replacement, from
the original data sample, in such a manner that each sample drawn has the
same number of cases as the original data sample.
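A minimal bootstrap-resampling sketch in Python, assuming a small hypothetical sample; `random.choices` draws with replacement, matching the description above:

```python
import random
from statistics import mean

random.seed(0)

# A single observed sample -- our only estimate of the population
sample = [12, 15, 9, 14, 11, 16, 10, 13, 12, 14]

# Bootstrap: redraw samples of the same size, with replacement
boot_means = [
    mean(random.choices(sample, k=len(sample)))
    for _ in range(2_000)
]

# The spread of bootstrap means estimates the uncertainty in our estimate
boot_means.sort()
low, high = boot_means[50], boot_means[1950]   # rough 95% interval
print("observed mean:", mean(sample))
print("95% bootstrap interval:", round(low, 1), "-", round(high, 1))
```

This directly addresses the problem stated above: from one sample we obtain not just a single estimate of the mean but a whole distribution of estimates, and hence a measure of its variability.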

STATISTICAL INFERENCE

• Statistical inference is the process of analysing results and drawing
conclusions from data.
• It is also called inferential statistics.
• Statistical inference is a method of making decisions about the parameters of
a population, based on random sampling.
• It helps to assess the relationship between the dependent and independent
variables.

Prepared by NITHIN SEBASTIAN


10

• The different types of statistical inference are:
✓ Pearson Correlation
✓ Bi-variate regression
✓ Multi-variate regression
✓ Chi-square statistics and contingency table
✓ ANOVA or T-test
• The statistical inference has a wide range of application in different fields,
such as:
✓ Business Analysis
✓ Artificial Intelligence
✓ Financial Analysis
✓ Fraud Detection
✓ Machine Learning etc.
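As a small worked example of one technique from the list, the following Python sketch computes the Pearson correlation coefficient directly from its formula; the study-hours data are hypothetical:

```python
from math import sqrt
from statistics import mean

# Hypothetical data: hours studied vs exam score
hours = [1, 2, 3, 4, 5, 6]
score = [52, 55, 61, 70, 74, 80]

mx, my = mean(hours), mean(score)

# r = cov(x, y) / (std_x * std_y), computed from deviation sums
cov = sum((x - mx) * (y - my) for x, y in zip(hours, score))
sx = sqrt(sum((x - mx) ** 2 for x in hours))
sy = sqrt(sum((y - my) ** 2 for y in score))

r = cov / (sx * sy)
print("Pearson r:", round(r, 3))
```

A value of r close to +1 indicates a strong positive relationship between the dependent and independent variables, which is exactly the kind of assessment statistical inference supports.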
PREDICTION ERROR

• Predictive analytical processes use new and historical data to forecast activity,
behaviour, and trends.
• A prediction error is the failure of an expected event to occur.
• When a prediction fails, analysts can examine the predictions and failures
and decide on methods to overcome such errors in the future.
• Applying that kind of knowledge can inform decisions and improve the
quality of future predictions.
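Prediction error is commonly quantified with metrics such as mean absolute error (MAE) and mean squared error (MSE); a minimal sketch, assuming hypothetical forecast data:

```python
from statistics import mean

# Hypothetical forecasts vs actual outcomes
actual    = [100, 120, 130, 110, 150]
predicted = [ 98, 125, 128, 115, 140]

errors = [a - p for a, p in zip(actual, predicted)]

mae = mean(abs(e) for e in errors)   # mean absolute error
mse = mean(e ** 2 for e in errors)   # mean squared error (penalizes large misses)

print("errors:", errors)
print("MAE:", mae, "MSE:", mse)
```

Tracking such metrics over time is one concrete way to examine failures and improve the quality of future predictions.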
