Big Data Analysis of COVID-19 Mortality

The project proposal focuses on analyzing socioeconomic and healthcare factors influencing COVID-19 mortality rates using big data analytics. It aims to explore patterns, identify key variables, and build predictive models to improve public health responses. The study employs various data science techniques and utilizes a comprehensive COVID-19 dataset from Our World in Data to derive insights for better health crisis preparedness.

KULLIYYAH OF INFORMATION AND COMMUNICATION TECHNOLOGY

SEMESTER 2, 2024/2025

CSCI 4341 BIG DATA ANALYTICS

PROJECT PROPOSAL

PROJECT TITLE :
MULTIVARIATE BIG DATA ANALYSIS OF SOCIOECONOMIC AND
HEALTHCARE FACTORS INFLUENCING COVID-19 MORTALITY RATES

NO. NAME MATRIC NUMBER

1 NURAMIRATUL AISYAH BINTI RUZAIDI 2212736

2 SHARIFAH SYAZWINA BINTI SYED SYAMSULHARIS 2214326

3 ANIS NAZIRA BINTI ABD GHANI 2219732

4 WISSEBO ABDULMAJID 2218587

LECTURER’S NAME : DR. SHARYAR WANI

1. Introduction

The outbreak of COVID-19, first reported in late 2019, challenged health systems, strained
economies, and exposed previously hidden vulnerabilities in healthcare access and
socioeconomic resilience across countries. Understanding the factors that drive mortality
rates is important not only for evaluating past responses but also for preparing for
future public health crises. With the wealth of publicly available big data, especially the
comprehensive COVID-19 dataset by Our World in Data (OWID), researchers are now
equipped to explore the relationships among multiple variables at a global scale using
advanced data science techniques.

Data science distinguishes itself from conventional analytics through its emphasis on
algorithmic modelling, machine learning, and predictive inference. While data analytics
typically focuses on summarizing and contextualizing past data, data science allows for
hypothesis testing, forecasting, and the discovery of hidden patterns. According to Johns
Hopkins' Data Science Specialization and BuiltIn's classification, modern data science
encompasses multiple types of analysis beyond descriptive statistics, such as causal
analysis, inferential analysis, diagnostic analysis, predictive modeling, and mechanistic
analysis. These methods will be employed to gather insights into the impact of healthcare
infrastructure and socioeconomic indicators on COVID-19 mortality rates.

This study aims to demonstrate the multidisciplinary nature of data science by integrating
statistical learning, data mining, and predictive algorithms to analyse over 160,000 global
observations. Furthermore, the project will frame contemporary questions at the
intersection of global health and socioeconomics, employ multivariate and temporal
dimensions, and highlight disparities between countries. All sources referenced in this
project, including the dataset, academic theories, and data science methodologies, will be
cited in IEEE format.

2. Research Questions and Hypotheses

A. Two-variable Data Science Questions

1. Causal Analysis:

● Question: Does higher GDP per capita directly cause lower COVID-19
mortality per million?
● Hypothesis: Countries with higher GDP per capita tend to have significantly
lower COVID-19 death rates.

2. Inferential Analysis:

● Question: Is the difference in COVID-19 death rates statistically significant
between continents with higher hospital bed capacity vs. those with lower?
● Hypothesis: Continents with more hospital beds per thousand show
significantly reduced mortality.

3. Predictive Analysis:

● Question: Can new deaths be predicted accurately using new case numbers
alone?
● Hypothesis: A linear regression model using new case counts will produce a
high predictive power for new deaths.
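
As an illustration of how the two-variable predictive question could be checked, the sketch below fits an ordinary least squares line of new deaths on new cases and reports R² as one possible measure of "predictive power". It uses only the standard library; the function name `fit_simple_ols` and the five data points are hypothetical and are not taken from the OWID dataset.

```python
from statistics import mean


def fit_simple_ols(x, y):
    """Return (slope, intercept, r_squared) for y ~ slope * x + intercept."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # R^2 = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return slope, intercept, 1 - ss_res / ss_tot


# Synthetic, illustrative values only (assumption, not real observations).
new_cases = [100, 200, 300, 400, 500]
new_deaths = [3, 4, 7, 8, 10]

slope, intercept, r2 = fit_simple_ols(new_cases, new_deaths)
print(f"deaths ~ {slope:.3f} * cases + {intercept:.2f}, R^2 = {r2:.3f}")
```

A high R² on pooled data would support the hypothesis only loosely, since deaths typically trail cases in time and reporting practices differ between countries.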

B. Multivariable Data Science Questions

4. Diagnostic Analysis:

● Question: Which combination of variables (ICU capacity, age demographics,
poverty level) best explains spikes in mortality during pandemic peaks?
● Hypothesis: A mix of inadequate ICU capacity and high aging population
correlates with mortality surges.

5. Predictive Analysis:

● Question: Can a machine learning model using vaccination rates, testing
rates, stringency index, and health infrastructure predict daily deaths
accurately?
● Hypothesis: Ensemble models like Random Forest or XGBoost will yield
>80% accuracy in forecasting death rates.

6. Causal Analysis:

● Question: Do socioeconomic indicators (HDI, poverty rate, GDP per capita)
causally impact COVID-19 fatality rates?
● Hypothesis: Countries with lower socioeconomic scores experience higher
mortality, independent of testing or reporting bias.

7. Inferential Analysis:

● Question: Is there a statistically significant difference in death rates between
countries with high and low healthcare spending per capita?
● Hypothesis: High-spending countries exhibit significantly better survival
outcomes.

8. Time Series Forecasting:

● Question: How do vaccination trends, policy stringency, and testing rates
affect the trajectory of death rates over time?
● Hypothesis: Improvements in vaccination and stringency policies result in
downward mortality trends, with a lag of ~14 days.

9. Cluster Analysis:

● Question: Can countries be grouped into clusters based on their COVID-19
mortality profile and healthcare infrastructure?
● Hypothesis: Clustering will reveal distinct regional risk groups, particularly
separating Global North vs. Global South.

10. Mechanistic Analysis:

● Question: What mechanisms explain why high-income nations with aged
populations still suffered high mortality?
● Hypothesis: Delay in lockdowns, inconsistent mask policies, and comorbidity
prevalence explain this paradox.

11. Exploratory Correlation Mapping:

● Question: What are the strongest correlates of total COVID-19 deaths among
10+ candidate variables?
● Hypothesis: Age, GDP, ICU beds, and vaccination levels are most correlated
with total deaths.

12. Dimensionality Reduction + Predictive Modeling:

● Question: After applying PCA to reduce the dataset's dimensionality, can we
still achieve high predictive accuracy for death outcomes?
● Hypothesis: Principal Components derived from healthcare and
socioeconomic dimensions will retain >90% of predictive signal.
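
The lag stated in the time series hypothesis (question 8) could be examined with a lagged correlation scan. The following is a minimal sketch, assuming a driver series (for example a vaccination or stringency measure) and an outcome series (deaths) observed at the same time steps. The helper names and the synthetic series, built with a known delay of 3 steps, are hypothetical. A strong lagged correlation would be consistent with the hypothesis but would not by itself establish a causal effect.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def lagged_correlation(driver, outcome, lag):
    """Correlate driver[t] with outcome[t + lag] for a non-negative lag."""
    if lag == 0:
        return pearson(driver, outcome)
    return pearson(driver[:-lag], outcome[lag:])


# Synthetic series (assumption): the outcome mirrors the driver, negated,
# after a delay of 3 time steps.
driver = [1, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13]
delay = 3
outcome = [0] * delay + [-value for value in driver[:-delay]]

scores = {lag: lagged_correlation(driver, outcome, lag) for lag in range(6)}
best_lag = min(scores, key=scores.get)  # lag with the most negative correlation
print(best_lag, round(scores[best_lag], 3))
```

On daily data, the same scan over a wider range of lags would show whether the strongest negative association sits near the hypothesised 14 days.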

3. Research Objective

This study aims to analyse how socioeconomic and healthcare factors influence
COVID-19 mortality rates using big data techniques. The objectives are:

● To explore patterns and key factors related to COVID-19 deaths across different
countries.

● To identify combinations of variables, such as ICU capacity, age, and poverty level,
that are linked to high death rates.

● To compare COVID-19 death rates between countries with different income levels
and healthcare spending.

● To examine whether factors like GDP per capita or Human Development Index (HDI)
have a direct effect on mortality.

● To build predictive models that estimate death rates based on case numbers,
vaccination rates, and other indicators.

● To provide recommendations based on the findings to help improve public health
responses in future pandemics.

4. Research Significance

This study uses data science methods to better understand the impact of
socioeconomic and healthcare factors on COVID-19 mortality. By analysing big data from
multiple countries, the research provides useful insights for governments and health
organisations to make informed, data-driven decisions.

Through diagnostic and predictive analysis, this study helps identify high-risk
conditions and patterns that contribute to higher death rates. Causal and inferential
techniques offer evidence on how income, healthcare access, and vaccination affect survival
outcomes.

The findings can support better planning for future health crises by helping countries
improve healthcare systems, allocate resources more effectively, and strengthen public
health policies. Overall, this research promotes global health preparedness and smarter
decision-making using data.

5. Literature Review

Numerous studies have explored the influence of socioeconomic and healthcare
factors on COVID-19 mortality using a variety of data science approaches. These studies
reveal how variables such as age, race, income level, healthcare capacity, and comorbidities
have significantly impacted COVID-19 outcomes. Techniques ranging from traditional
regression and statistical modeling to advanced machine learning have been employed to
uncover these relationships, offering both explanatory and predictive insights.

The table below summarizes the findings from five articles that are closely aligned
with the objectives of this proposal.

| No | Year | Authors | Research Problem / Application | Techniques Used | Results | Future Works (if any) |
|----|------|---------|--------------------------------|-----------------|---------|-----------------------|
| 1 | 2020 | Li et al. | Multivariate analysis of COVID-19 case and death rates in U.S. counties | Multivariate regression analysis | Higher death rates associated with higher percentage of Black population and lower temperatures | Extend analysis to other countries and include more time-dependent variables |
| 2 | 2022 | El Jai et al. | Socio-economic modeling of short-term COVID-19 trends | Statistical modeling and data analytics | Identified strong links between socio-economic indicators and short-term case/death fluctuations | Suggested integration of mobility and policy data in future studies |
| 3 | 2020 | Albitar et al. | Identifying clinical and demographic risk factors for COVID-19 mortality | Meta-analysis & systematic review | Age, comorbidities (especially diabetes and hypertension), and male gender significantly increased risk | Recommend targeted interventions for high-risk patients |
| 4 | 2020 | Cao et al. | Influence of demographic and socioeconomic factors on case-fatality rate (CFR) globally | Spatial regression analysis | Older population and income inequality linked to higher case-fatality rates | Suggested integrating health system variables for improved models |
| 5 | 2022 | Jamshidi et al. | Predicting COVID-19 mortality using machine learning | Machine learning algorithms | Developed a model with high accuracy for predicting mortality in ICU patients | Suggested integration with broader datasets for validation |

These studies demonstrate a growing interest in combining demographic, clinical,
and socioeconomic data to understand and predict COVID-19 mortality. Our current project
builds on this body of work by applying multivariate and big data analytics to a global
dataset, incorporating a wide range of variables and techniques, including causal inference,
predictive modeling, clustering, and dimensionality reduction, to develop a more holistic
understanding of the factors that influence COVID-19 death rates.

6. Methodology

A. Datasets and Tools Description

This dataset is sourced from Kaggle and was originally developed by Our World in Data
(OWID) in collaboration with the University of Oxford, making it a reliable and widely
accessible resource on the COVID-19 pandemic. OWID maintains a comprehensive
repository of global datasets focused on major issues affecting humanity. In response to the
COVID-19 pandemic, extensive data has been collected daily from countries and territories
worldwide. This dataset serves as a critical resource for researchers, policymakers, and
the general public to make informed decisions.

The dataset provides detailed information on COVID-19 cases, testing,
hospitalizations, vaccinations, and related metrics. It incorporates demographic,
epidemiological, healthcare, and policy-related variables to enable deep analysis of the
pandemic's global impact, underlying risk factors, and the effectiveness of healthcare
responses. It contains 67 attributes and 166,326 observations, covering daily and cumulative
counts of COVID-19 cases and deaths, testing and vaccination rates, hospital and ICU
occupancy, and various governmental policy measures. It supports time-series forecasting,
policy impact studies, and comparative country-level health system performance
evaluations.

For the analysis of the COVID-19 dataset, VS Code and the Python language will be
used as the primary tools. Python will handle data preparation, cleaning, and exploratory
data analysis (EDA). Libraries such as Pandas, NumPy, and Matplotlib will support data
manipulation, computation, and visualization.

B. Research Process

1. Data Collection and Preprocessing

In the data collection and preprocessing phase, we will perform data cleaning steps
to handle missing values, remove duplicate rows, and normalize both numerical and
categorical columns. The dataset has a substantial share of missing values (44.54%
overall), which makes cleaning a crucial step before any further analysis.
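
The cleaning steps above can be sketched as follows. This is an illustrative, standard-library-only version that runs on a tiny hand-made table. In the planned Pandas workflow, `DataFrame.isna()` and `DataFrame.drop_duplicates()` would play the same roles. The helper names, the chosen columns, and the row values are assumptions for illustration, not the project's actual pipeline.

```python
def drop_duplicate_rows(rows, columns):
    """Keep the first occurrence of each row, compared on the given columns."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(row.get(col) for col in columns)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def missing_percentage(rows, columns):
    """Share of empty (None) cells across all rows and columns, in percent."""
    total_cells = len(rows) * len(columns)
    missing = sum(1 for row in rows for col in columns if row.get(col) is None)
    return 100.0 * missing / total_cells


def min_max_scale(values):
    """Scale present values to [0, 1]; missing values stay None."""
    present = [v for v in values if v is not None]
    low, high = min(present), max(present)
    span = high - low
    return [None if v is None else ((v - low) / span if span else 0.0)
            for v in values]


# Tiny illustrative table (values are made up).
columns = ["location", "new_cases", "icu_patients"]
rows = [
    {"location": "A", "new_cases": 10, "icu_patients": None},
    {"location": "A", "new_cases": 10, "icu_patients": None},  # exact duplicate
    {"location": "B", "new_cases": None, "icu_patients": 4},
    {"location": "C", "new_cases": 30, "icu_patients": None},
]

unique_rows = drop_duplicate_rows(rows, columns)
pct_missing = missing_percentage(unique_rows, columns)
scaled_cases = min_max_scale([row["new_cases"] for row in unique_rows])
print(len(unique_rows), round(pct_missing, 2), scaled_cases)
```

Removing duplicates before measuring missingness keeps repeated rows from distorting the percentage. How the remaining gaps are handled (dropping or imputing) is left open here, as in the proposal.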

2. Exploratory Data Analysis

Exploratory Data Analysis (EDA) is a key step to better understand the distribution
and relationships within the dataset before applying machine learning models or
other analyses. Through EDA, we will identify missing values, detect outliers, and
visualize the data using methods such as correlation analysis, box plots, and scatter
plots. This process will help uncover valuable insights and reveal trends that exist in
the data.
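
One common way to flag the outliers that a box plot displays is the 1.5 × IQR rule. The sketch below is a minimal standard-library illustration of that rule. The function name `iqr_outliers` and the sample values are hypothetical, and quartile conventions differ slightly between libraries, so fences computed by a plotting library may not match exactly.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Made-up values with one obviously extreme observation.
deaths_per_million = [10, 12, 11, 13, 12, 95]
outliers = iqr_outliers(deaths_per_million)
print(outliers)
```

Flagged values are candidates for inspection rather than automatic removal, since an extreme mortality figure may be a genuine observation.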

3. Machine Learning Algorithms Used

Various Machine Learning algorithms are planned to be used. Regression analysis
techniques will be employed to predict mortality rates based on multiple influencing
factors. Classification algorithms will be used to categorize countries or regions
based on high or low mortality risk. Clustering methods will be applied to identify
patterns among countries with similar socioeconomic and healthcare profiles.
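
To illustrate the clustering step, here is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. The proposal does not name a specific clustering algorithm, so k-means is an assumption made for illustration. The two features, the points, and the starting centroids are likewise made up.

```python
from math import dist


def k_means(points, centroids, iterations=10):
    """Lloyd's algorithm with caller-supplied starting centroids."""
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for each point.
        labels = [
            min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids


# Hypothetical features already scaled to [0, 1]:
# (hospital capacity, mortality) per country.
points = [(0.10, 0.90), (0.20, 0.80), (0.15, 0.85),
          (0.90, 0.20), (0.80, 0.10), (0.85, 0.15)]
labels, centroids = k_means(points, centroids=[points[0], points[3]])
print(labels, centroids)
```

Features are scaled beforehand because k-means relies on Euclidean distance, so a variable with a large numeric range would otherwise dominate the grouping.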

References

[1] BuiltIn, “Types of Data Analysis: 8 Types & How to Use Them.” [Online]. Available:
[Link]

[2] Our World in Data, “COVID-19 dataset,” 2024. [Online]. Available:
[Link]

[3] “Data Science vs Data Analytics. Medium Article on the Distinctions.” [Online]. Available:
[Link]

[4] C. Bambra, R. Riordan, J. Ford, and F. Matthews, “The COVID-19 pandemic and health
inequalities,” J. Epidemiol. Community Health, vol. 74, no. 11, pp. 964–968, 2020, doi:
10.1136/jech-2020-214401.

[5] A. Y. Li et al., “Multivariate analysis of factors affecting COVID-19 case and death rate in
U.S. counties: The significant effects of Black race and temperature,” medRxiv, 2020, doi:
10.1101/2020.04.17.20069708.

[6] M. El Jai, M. Zhar, D. Ouazar et al., “Socio-economic analysis of short-term trends of
COVID-19: Modeling and data analytics,” BMC Public Health, vol. 22, p. 1633, 2022, doi:
10.1186/s12889-022-13788-4.

[7] O. Albitar, R. Ballouze, J. P. Ooi, and S. M. Sheikh Ghadzi, “Risk factors for mortality
among COVID-19 patients,” Diabetes Res. Clin. Pract., vol. 166, p. 108293, 2020, doi:
10.1016/[Link].2020.108293.

[8] Y. Cao, A. Hiyoshi, and S. Montgomery, “COVID-19 case-fatality rate and demographic
and socioeconomic influencers: Worldwide spatial regression analysis based on
country-level data,” BMJ Open, vol. 10, no. 10, p. e043560, 2020, doi:
10.1136/bmjopen-2020-043560.

[9] E. Jamshidi, A. Asgary, N. Tavakoli, and A. Zali, “Using machine learning to predict
mortality for COVID-19 patients on day 0 in the ICU,” Front. Digit. Health, vol. 3, p. 681608,
2022, doi: 10.3389/fdgth.2021.681608.
Big Data Analysis of COVID-19 Mortality
No ratings yet
Big Data Analysis of COVID-19 Mortality
9 pages
COVID-19 Mortality Trends Analysis
No ratings yet
COVID-19 Mortality Trends Analysis
19 pages
COVID-19 Analysis Using Big Data Techniques
No ratings yet
COVID-19 Analysis Using Big Data Techniques
17 pages
Data Analytics in COVID-19 Management
No ratings yet
Data Analytics in COVID-19 Management
2 pages
COVID-19 Impact Analysis and Comparison
No ratings yet
COVID-19 Impact Analysis and Comparison
9 pages
Big Data Framework for COVID-19 Response
No ratings yet
Big Data Framework for COVID-19 Response
20 pages
COVID-19 Data Visualization and Prediction
No ratings yet
COVID-19 Data Visualization and Prediction
85 pages
Data 09 00025
No ratings yet
Data 09 00025
19 pages
(IJCST-V12I5P4) :saiakhil Chilaka
No ratings yet
(IJCST-V12I5P4) :saiakhil Chilaka
5 pages
COVID-19 Fatality Rate Fluctuations in Brazil
No ratings yet
COVID-19 Fatality Rate Fluctuations in Brazil
58 pages
COVID-19 Infection Factors Analysis
No ratings yet
COVID-19 Infection Factors Analysis
10 pages
COVID-19 Real-Time Forecasts & Risk Analysis
No ratings yet
COVID-19 Real-Time Forecasts & Risk Analysis
10 pages
Data Science for Disease Outbreak Prediction
No ratings yet
Data Science for Disease Outbreak Prediction
10 pages
Predictive Modeling in COVID-19 Response
No ratings yet
Predictive Modeling in COVID-19 Response
10 pages
MMMMM
No ratings yet
MMMMM
23 pages
COVID-19 Spread: Deep Learning Insights
No ratings yet
COVID-19 Spread: Deep Learning Insights
26 pages
1 en 42 Chapter Author
No ratings yet
1 en 42 Chapter Author
18 pages
Lessons from COVID-19 Pandemic Data
100% (1)
Lessons from COVID-19 Pandemic Data
2 pages
COVID-19 Tracking Dashboard in India
No ratings yet
COVID-19 Tracking Dashboard in India
17 pages
AI and Big Data in Epidemiology
No ratings yet
AI and Big Data in Epidemiology
11 pages
Bayesian Models for COVID-19 Forecasting
No ratings yet
Bayesian Models for COVID-19 Forecasting
13 pages
COVID-19 Data Analytics Models Survey
No ratings yet
COVID-19 Data Analytics Models Survey
14 pages
COVID-19 Prediction Models Overview
No ratings yet
COVID-19 Prediction Models Overview
4 pages
Covid-19 Data Analysis in India
No ratings yet
Covid-19 Data Analysis in India
18 pages
COVID-19 Mortality Risk Prediction Study
No ratings yet
COVID-19 Mortality Risk Prediction Study
19 pages
COVID-19 Analytics and Data Science Insights
No ratings yet
COVID-19 Analytics and Data Science Insights
6 pages
SSRN Id3635047
No ratings yet
SSRN Id3635047
93 pages
Data Science in Healthcare: COVID-19 Insights
No ratings yet
Data Science in Healthcare: COVID-19 Insights
4 pages
COVID-19 Global Impact Analysis
No ratings yet
COVID-19 Global Impact Analysis
4 pages
COVID-19 Data Analysis with R & Python
No ratings yet
COVID-19 Data Analysis with R & Python
3 pages
AI in Epidemiology: Surveillance & Insights
No ratings yet
AI in Epidemiology: Surveillance & Insights
9 pages
Analysis of The Epidemic Curve of The Waves of COVID-19 Using Integration of Functions and Neural Networks in Peru
No ratings yet
Analysis of The Epidemic Curve of The Waves of COVID-19 Using Integration of Functions and Neural Networks in Peru
17 pages
COVID-19 Risk Assessment Tool Development
No ratings yet
COVID-19 Risk Assessment Tool Development
15 pages
COVID-19 Pandemic: Impacts and Insights
No ratings yet
COVID-19 Pandemic: Impacts and Insights
31 pages
COVID-19 Stochastic Modeling Insights
No ratings yet
COVID-19 Stochastic Modeling Insights
17 pages
COVID-19 Risk Prediction Using Clustering
No ratings yet
COVID-19 Risk Prediction Using Clustering
5 pages
A Downscaling Approach To Compare COVID-19 Count Data From Databases Aggregated at Di Fferent Spatial Scales
No ratings yet
A Downscaling Approach To Compare COVID-19 Count Data From Databases Aggregated at Di Fferent Spatial Scales
23 pages
OxCOVID19 Database: COVID-19 Data Hub
No ratings yet
OxCOVID19 Database: COVID-19 Data Hub
11 pages
COVID-19 Agent-Based Model in Islands
No ratings yet
COVID-19 Agent-Based Model in Islands
25 pages
Predictive Monitoring for COVID-19 Insights
No ratings yet
Predictive Monitoring for COVID-19 Insights
13 pages
Diagnostics 12 01396 v2
No ratings yet
Diagnostics 12 01396 v2
28 pages
COVID-19 Prediction Using Tree Models
No ratings yet
COVID-19 Prediction Using Tree Models
12 pages
National Framework for COVID-19 Data
No ratings yet
National Framework for COVID-19 Data
9 pages
Age, Hospitalization, and Mortality Analysis
No ratings yet
Age, Hospitalization, and Mortality Analysis
30 pages
Key Determinants in Epidemiology
No ratings yet
Key Determinants in Epidemiology
4 pages
Real-Time Country-Level Pandemic Analysis
No ratings yet
Real-Time Country-Level Pandemic Analysis
10 pages
COVID-19 Predictive Modeling for Humanitarian Response
No ratings yet
COVID-19 Predictive Modeling for Humanitarian Response
15 pages
COVID-19 Data Analysis and Insights
No ratings yet
COVID-19 Data Analysis and Insights
6 pages
Data-Driven Nursing in COVID-19
No ratings yet
Data-Driven Nursing in COVID-19
3 pages
Statistical Modeling in Public Health
No ratings yet
Statistical Modeling in Public Health
3 pages
Descriptive Statistics in COVID-19 Analysis
No ratings yet
Descriptive Statistics in COVID-19 Analysis
10 pages
Data Science in A Pandemic: Review
No ratings yet
Data Science in A Pandemic: Review
12 pages
COVID-19 Mortality Trends Analysis
No ratings yet
COVID-19 Mortality Trends Analysis
8 pages
AI/ML Strategies for COVID-19 Response
No ratings yet
AI/ML Strategies for COVID-19 Response
24 pages
COVID-19 Data Analysis Insights Report
No ratings yet
COVID-19 Data Analysis Insights Report
4 pages
COVID-19 Nutrition Data Mapping Guide
No ratings yet
COVID-19 Nutrition Data Mapping Guide
338 pages
Omicron Impact on Ethnic Groups in NZ
No ratings yet
Omicron Impact on Ethnic Groups in NZ
30 pages
Final Year Progress Report: Computer Science
No ratings yet
Final Year Progress Report: Computer Science
22 pages
Computation and Complexity Quiz 2022/23
No ratings yet
Computation and Complexity Quiz 2022/23
1 page
Developer Perceptions of Open Source Quality
No ratings yet
Developer Perceptions of Open Source Quality
10 pages
Causes of Species Extinction Explained
No ratings yet
Causes of Species Extinction Explained
1 page
5 Reasons to Ban Single-Use Plastic
No ratings yet
5 Reasons to Ban Single-Use Plastic
1 page
Data Visualization in Public Health
100% (1)
Data Visualization in Public Health
38 pages
Cell Culture and Cytology Lab Overview
No ratings yet
Cell Culture and Cytology Lab Overview
78 pages
Introduction to UV-Visible Spectroscopy
No ratings yet
Introduction to UV-Visible Spectroscopy
10 pages
Types of Quantitative Research Designs
No ratings yet
Types of Quantitative Research Designs
11 pages
Caza Bacterias
100% (1)
Caza Bacterias
20 pages
Understanding Organizational Behaviour
No ratings yet
Understanding Organizational Behaviour
40 pages
Statistical Inference Course Overview
No ratings yet
Statistical Inference Course Overview
148 pages
Importance of Literature Surveys in Research
No ratings yet
Importance of Literature Surveys in Research
12 pages
Statistical Analysis and Forecasting Techniques
No ratings yet
Statistical Analysis and Forecasting Techniques
21 pages
Understanding Randomness and Statistics
No ratings yet
Understanding Randomness and Statistics
9 pages


2. Research Questions and Hypotheses

A. Two-variable Data Science Questions

1. Causal Analysis:

2. Inferential Analysis:

● Question: Is the difference in COVID-19 death rates statistically significant

B. Multivariable Data Science Questions

4. Diagnostic Analysis:

● Question: Which combination of variables (ICU capacity, age demographics,

5. Predictive Analysis:

● Question: Can a machine learning model using vaccination rates, testing

6. Causal Analysis:

● Question: Do socioeconomic indicators (HDI, poverty rate, GDP per capita)

7. Inferential Analysis:

● Question: Is there a statistically significant difference in death rates between

8. Time Series Forecasting:

● Question: How do vaccination trends, policy stringency, and testing rates

9. Cluster Analysis:

● Question: Can countries be grouped into clusters based on their COVID-19

● Question: What mechanisms explain why high-income nations with aged

11. Exploratory Correlation Mapping:

12. Dimensionality Reduction + Predictive Modeling:

● Question: After applying PCA to reduce the dataset's dimensionality, can we

● To provide recommendations based on the findings to help improve public health

Numerous studies have explored the influence of socioeconomic and healthcare

No Year Authors Research Techniques Results Future Works

1 2020 Li et al. Multivariate Multivariate Higher death Extend

2 2022 El Jai et al. Socio- Statistical Identified Suggested

3 2020 Albitar et al. Identifying Meta- Age, Recommend

4 2020 Cao et al. Influence of Spatial Older Suggested

5 2022 Jamshidi et Predicting Machine Developed a Suggested

These studies demonstrate a growing interest in combining demographic, clinical,

A. Datasets and Tools Description

The dataset provides detailed information on COVID-19 cases, testing,

B. Research Process

1. Data Collection and Preprocessing

2. Exploratory Data Analysis

3. Machine Learning Algorithms Used
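
The exploratory step in the research process above, together with the correlation-mapping question, can be illustrated with a small sketch. It is only an illustration under stated assumptions and not the authors' implementation: the column names are hypothetical, the values are made up rather than taken from the OWID dataset, and Pearson correlation is one possible choice of association measure.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Pairwise Pearson correlations for a dict of name -> list of values."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical column names with made-up values (NOT real country data).
toy = {
    "median_age": [20.0, 30.0, 40.0, 45.0],
    "hospital_beds_per_thousand": [1.0, 2.0, 4.0, 8.0],
    "deaths_per_million": [100.0, 300.0, 900.0, 1200.0],
}

matrix = correlation_matrix(toy)
for name, row in matrix.items():
    print(name, {other: round(r, 3) for other, r in row.items()})
```

A matrix like this only flags variable pairs that move together. It says nothing about causation, so strongly correlated pairs would still need the causal and inferential analyses listed in the research questions.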

[2] Our World in Data, “COVID-19 dataset,” 2024. [Online]. Available:

[6] M. El Jai, M. Zhar, D. Ouazar et al., “Socio-economic analysis of short-term trends of

Our World in Data. (2024). COVID-19 dataset. Retrieved from

