
Data Science Fundamentals Module

Data Science Interview Common Topics 1. Data Preprocessing: Handling missing values, outliers, and data normalization. 2. Machine Learning: Supervised and unsupervised learning, model evaluation, and selection. 3. Data Visualization: Communicating insights effectively using plots and charts. 4. Statistical Analysis: Hypothesis testing, confidence intervals, and regression analysis. 5. Domain Knowledge: Understanding the specific industry or problem domain. Types of Questions 1. Technical Questi

Uploaded by

havanoproduction

Feb 2024 - June 2024 Module Description

HTENG418/HCSCI439

Fundamentals of Data Science and Big Data
Harry Mafukidze
(MEng Electronic Engineering)
Foreword
Data science is an interdisciplinary field encompassing scientific methods, processes, and
systems to extract knowledge or insights from data in various forms, either structured
or unstructured. It draws principles from mathematics, statistics, information science,
computer science, machine learning, visualization, data mining, and predictive analytics.
However, it is fundamentally grounded in mathematics. This module explains and applies
the fundamentals of data science crucial for learners who are interested in practicing data
science. It is an example-driven subject, providing complete Python examples to complement
and clarify data science concepts and enrich the learning experience.
The module is a necessary precursor to applying and implementing machine learning algorithms,
because it introduces the learner to the foundational principles of the science of
data. In-depth knowledge of Python programming isn’t required, although familiarity with the
basic principles of any high-level programming language is.

Learning Outcomes
• Apply quantitative modeling and data analysis techniques to solve real-world business
problems
• Demonstrate proficiency with statistical analysis of data
• Develop core competencies in programming, statistics, data analytics and machine
learning
• Develop the ability to build and assess data-based models
• Effectively present results using data visualization techniques
What do data scientists do?
Regardless of whether data science is just a part of statistics, and regardless of the domain to which
we’re applying data science, the goal is the same: to turn data into actionable value. The professional
society INFORMS defines the related field of analytics as “the scientific process of transforming data
into insight for making better decisions.”

Turning data into actionable value usually involves answering questions using data. Here’s a typical
workflow for how that plays out in practice.

1. Obtain data that you hope will help answer the question.
2. Explore the data to understand it.
3. Clean and prepare the data for analysis.
4. Perform analysis, model building, testing, etc. (The analysis is the step most people think of
as data science, but it’s just one step! Notice how much more there is that surrounds it.)
5. Draw conclusions from your work.
6. Report those conclusions to the relevant stakeholders.
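
The six steps above can be sketched end to end in Python. This is only a minimal illustration, not a prescribed method: the inline dataset, the column names (`ad_spend`, `sales`) and every function name are hypothetical, and the "analysis" is deliberately trivial so that the surrounding steps stay visible.

```python
import csv
import io
import statistics

# Hypothetical raw data (an assumption for illustration): monthly ad spend
# and sales, with one missing ad_spend value.
RAW_CSV = """month,ad_spend,sales
Jan,10,25
Feb,15,32
Mar,,30
Apr,20,41
May,25,48
"""


def obtain(text):
    """Step 1: obtain the data (here, parse an in-memory CSV)."""
    return list(csv.DictReader(io.StringIO(text)))


def explore(rows):
    """Step 2: explore the data by counting rows and empty fields."""
    missing = sum(1 for r in rows for v in r.values() if v == "")
    return {"rows": len(rows), "missing": missing}


def clean(rows):
    """Step 3: drop incomplete rows and convert numeric fields to float."""
    return [
        {"month": r["month"],
         "ad_spend": float(r["ad_spend"]),
         "sales": float(r["sales"])}
        for r in rows
        if r["ad_spend"] != "" and r["sales"] != ""
    ]


def analyze(rows):
    """Step 4: a deliberately simple analysis, mean sales per unit of ad spend."""
    return statistics.mean(r["sales"] / r["ad_spend"] for r in rows)


def conclude(ratio):
    """Step 5: turn the number into a plain-language statement."""
    return (f"On average, each unit of ad spend is associated with "
            f"{ratio:.2f} units of sales.")


def report(summary, conclusion):
    """Step 6: format the findings for stakeholders."""
    return (f"Rows: {summary['rows']}, missing values: {summary['missing']}\n"
            f"{conclusion}")


rows = obtain(RAW_CSV)
summary = explore(rows)
cleaned = clean(rows)
ratio = analyze(cleaned)
print(report(summary, conclude(ratio)))
```

Only one of the six functions performs the analysis itself; the other five handle obtaining, exploring, cleaning, concluding and reporting.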

Our module focuses on all the steps except for the analysis. It is assumed that you have covered some
basic statistical analysis in one of your other modules, and we will build on that. (Later in the module
we will review simple linear regression and hypothesis testing.) If you have taken other relevant modules
in statistics, mathematical modeling, etc., and want to apply that knowledge in this module, great,
but it’s not a requirement.
Syllabus
1. Introduction:
- Big Data Overview,
- Importance of data science,
- Big data analytics in industry verticals.
2. Data Analytics Lifecycle and methodology:
- Business Understanding,
- Data Understanding,
- Data Preparation,
- Modelling and Evaluation.
3. Data exploration and pre-processing, Data Analytics:
- Theory and Methods,
- Unstructured Data Analytics,
- Data Visualization Techniques,
- Creating final deliverables
4. Mathematical Foundations
- Simple Linear Regression
- Multiple Linear Regression
- Logistic Regression
5. Visualization Techniques
- Data Visualization
- Histograms
- Scatter Plots
6. Unsupervised Learning
- K-Means clustering
- Density based clustering
7. Supervised Learning
- CNN
- ANN
- RNN
Assessment
• Assignments 15%
• Tests 25%
• Exam 60%

References
• Marz N, Warren J. Big Data: Principles and Best Practices of Scalable Realtime Data
Systems.
• Mayer-Schönberger V, Cukier K. Big Data: A Revolution That Will Transform How We Live,
Work, and Think.
• Zikopoulos P, Eaton C. Understanding Big Data: Analytics for Enterprise Class Hadoop
and Streaming Data.
