STA 249 Probability and Statistics
Lecture 1: Introduction to Statistics and Data Analysis
Assist. Prof. Dr. Yetkin Tuaç
Ankara University, Faculty of Science, Department of Statistics
ytuac@ankara.edu.tr
2025-2026 Fall
How to contact?
[Link]
Email: ytuac@[Link]
Room: Block A, Vice Deans' Office (Dekan Yardımcıları)
Office Hours: Thursday, 14:00 – 17:00
BREAKDOWN OF GRADES:
Here is the plan:
The grade for STA 249 will be composed of the grades on:
One midterm (40%),
Final exam (60%),
Attendance (min. 50%, unless you are exempt from attendance).
Reference Books
Some of these lecture notes are based on the contents of:
1. Probability & Statistics for Engineers & Scientists by Walpole, Myers, Myers and Ye
2. Statistics for Biomedical Engineers and Scientists: How to Visualize and Analyze Data by Andrew P. King and Robert J. Eckersley
3. Course Notes on [Link]
How do we analyse our data?
Throughout the course, we will practice data analysis using IBM SPSS
(Statistical Package for the Social Sciences).
Students are expected to install SPSS on their personal computers from
the university database at [Link]
COURSE OBJECTIVES:
In general, we will aim to achieve some or all of the following:
Introduce basic statistical concepts,
Understand randomness,
Model random phenomena,
Relate real-world problems to statistical theory,
Gain knowledge of some concepts of probability theory,
Learn how to perform data analysis with SPSS.
Course Content
Week 1. Introduction To Statistics And Data Analysis
Week 2. Summarizing Data: Tables And Diagrams
Week 3. Summarizing Data: Measures Of Central Tendency And Dispersion
Week 4. Probability
Week 5. Discrete Random Variables And Their Probability Distributions
Week 6. Continuous Random Variables And Their Probability Distributions
Week 7. Sampling Distributions and Central Limit Theorem
Week 8. Properties of Point Estimators and Methods of Estimation
Week 9-10. Hypothesis Testing
Week 11-12. Simple Linear Regression and Correlation
Statistical Thinking
Engineers solve problems of interest to society by the efficient application of scientific
principles.
The engineering or scientific method is the approach to formulating and solving these
problems. (Chemometrics)
The field of Probability
Used to quantify likelihood or chance
Used to represent risk or uncertainty in engineering applications
Can be interpreted as our degree of belief or relative frequency
The field of Statistics
Deals with the collection, presentation, analysis, and use of data to
Make decisions
Solve problems.
Definitions
Statistics is the science of
◦ planning experiments,
◦ obtaining data,
◦ organizing,
◦ summarizing,
◦ analyzing,
◦ interpreting, and
◦ drawing conclusions from the data.
The study of statistics has two major branches: descriptive (exploratory) statistics and inferential
statistics.
• Descriptive statistics is the branch of statistics that involves the organization, summarization,
and display of data.
• Inferential statistics is the branch of statistics that involves using a sample to draw conclusions
about a population. A basic tool in the study of inferential statistics is probability (e.g., a significance level of α = 0.05).
Population
◦ All subjects possessing a common characteristic that is being studied.
There are different types of population.
They are:
Finite Population
Infinite Population
Existent Population
Hypothetical Population
Sample
◦ A subgroup or subset of the population.
Individuals are the objects described by a set of data. Individuals may be people, but they may
also be animals or things (experimental units).
The sample size is the number of elements in the sample.
Often in statistics, we compare samples from two different populations and try to determine
whether the populations differ significantly (comparison tests).
Sampling
Sampling consists of selecting some part of a population to observe so that one may estimate
something about the whole population.
Some questions:
– How best to obtain the sample and make the observations?
– Once the sample data are in hand, how best to use them to estimate the characteristic of the
whole population?
Basically, there are two types of sampling. They are:
Probability sampling
Non-probability sampling
Probability Sampling
In probability sampling, the population units cannot be selected at the discretion of the
researcher. Instead, units are selected by following set procedures which ensure that every unit
of the population has a fixed, known probability of being included in the sample. Such a method is
also called random sampling.
Some of the techniques used for probability sampling are:
Simple Random Sampling
Stratified Sampling
Cluster Sampling
Systematic Sampling
Simple Random Sampling
Every individual or item from the frame has an equal chance of being selected.
Selection may be with replacement or without replacement.
Samples are obtained from a table of random numbers or a computer random number generator.
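The two selection modes above can be sketched with a computer random number generator. The course itself uses SPSS; Python is used here only as an illustration. The frame of 50 ID numbers, the sample size of 10, and the seed are assumptions made for the example.

```python
import random

# Hypothetical sampling frame: ID numbers 1..50 stand in for the population units.
frame = list(range(1, 51))

# Fixed seed (an arbitrary choice) so that the draw can be repeated.
rng = random.Random(249)

# Without replacement: each unit can appear at most once in the sample.
sample_without = rng.sample(frame, k=10)

# With replacement: the same unit may be drawn more than once.
sample_with = rng.choices(frame, k=10)

print("Without replacement:", sample_without)
print("With replacement:   ", sample_with)
```

Because every ID in the frame is equally likely on each draw, both lists are simple random samples; only the first is guaranteed to contain 10 distinct units.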
Stratified Sampling
Stratified sampling is a method of dividing a
population into distinct subgroups (strata)
based on shared characteristics, and then
randomly sampling from each subgroup to
ensure representation across the entire
population.
Example: Imagine you're studying dietary
habits in a city. Rather than randomly picking
people from the whole population, you first
split them into age groups (e.g., teens, adults,
seniors), then randomly select participants
from each group so all age ranges are fairly
represented.
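A minimal sketch of the age-group example, written in Python for illustration only. The population of 100 people, the group sizes, and the choice to draw the same 10% from each stratum (proportional allocation) are assumptions, not part of the example above.

```python
import random

rng = random.Random(249)  # arbitrary fixed seed for a repeatable draw

# Hypothetical population: (person_id, age_group) pairs.
population = (
    [(i, "teen") for i in range(0, 20)]
    + [(i, "adult") for i in range(20, 80)]
    + [(i, "senior") for i in range(80, 100)]
)

def stratified_sample(units, fraction, rng):
    """Split units into strata by group, then sample the same fraction from each."""
    strata = {}
    for unit_id, group in units:
        strata.setdefault(group, []).append(unit_id)
    sample = {}
    for group, members in strata.items():
        k = max(1, round(len(members) * fraction))  # at least one unit per stratum
        sample[group] = rng.sample(members, k)      # random draw inside the stratum
    return sample

sample = stratified_sample(population, fraction=0.10, rng=rng)
for group, ids in sample.items():
    print(group, sorted(ids))
```

With these sizes the sample holds 2 teens, 6 adults and 2 seniors, so every age group is represented in proportion to its share of the population.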
Cluster Sampling
Cluster sampling is a method where the population is divided into naturally occurring groups
(clusters), and then entire clusters are randomly selected for inclusion in the sample rather than
sampling individuals across the whole population.
Imagine you're studying school performance across a city. Instead of randomly selecting students
from every school, you randomly pick 5 schools (clusters) and include all students from those
schools in your sample. This saves time and resources while still capturing group-level variation.
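The school example can be sketched as follows, in Python for illustration only. The city of 20 schools with 30 students each is a made-up setup; only the choice of 5 schools comes from the example.

```python
import random

rng = random.Random(249)  # arbitrary fixed seed for a repeatable draw

# Hypothetical city: 20 schools (clusters), each with 30 students.
schools = {
    f"school_{s:02d}": [f"s{s:02d}_student_{i:02d}" for i in range(30)]
    for s in range(20)
}

# Step 1: randomly choose 5 whole clusters.
chosen_schools = rng.sample(sorted(schools), k=5)

# Step 2: include every student from each chosen cluster.
sample = [student for school in chosen_schools for student in schools[school]]

print(chosen_schools)
print(len(sample))  # 5 schools x 30 students = 150
```

Note that the random draw happens at the level of schools, not students, which is what distinguishes cluster sampling from stratified sampling.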
Systematic Sampling
Systematic sampling is a method where you select every kᵗʰ individual from a list or sequence
after choosing a random starting point within the first group of k individuals. It's
simple, efficient, and often used when the population is ordered or evenly spaced.
Imagine you have a list of N = 64 patients in a hospital database. You want to select a sample of
n = 8, so the sampling interval is k = N/n = 8. You randomly pick a starting point in the first
group, say patient #3, and then select every 8ᵗʰ patient from there: #3, #11, #19, #27, #35, #43,
#51, #59. This ensures a spread-out, evenly spaced sample.
Source: Basic Business Statistics, 8e, © 2002 Prentice-Hall, Inc.
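The patient example (N = 64, n = 8, k = 8) can be reproduced with a short Python sketch, given for illustration only. The function name `systematic_sample` and its parameters are hypothetical, and the sketch assumes N is a multiple of n.

```python
import random

def systematic_sample(units, n, start=None, rng=random):
    """Select n units by taking every k-th unit, where k = N // n."""
    N = len(units)
    k = N // n                    # sampling interval
    if start is None:
        start = rng.randrange(k)  # random 0-based position within the first group
    return [units[start + i * k] for i in range(n)]

patients = list(range(1, 65))     # patient numbers 1..64

# Patient #3 sits at 0-based position 2 in the list.
print(systematic_sample(patients, n=8, start=2))
# -> [3, 11, 19, 27, 35, 43, 51, 59]

# With no start given, the starting point is drawn at random from the first 8 patients.
print(systematic_sample(patients, n=8))
```

Only the starting point is random; once it is fixed, the remaining seven patients are determined by the interval k.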
Non-Probability Sampling
In non-probability sampling, the population units can be selected at the discretion of the
researcher. Such samples rely on human judgement to select units and provide no
theoretical basis for estimating the characteristics of the population. Some of the techniques
used for non-probability sampling are:
Quota sampling
Judgement sampling
Purposive sampling
Population and Sample Examples
All the students in a class form the population, whereas the top 10 students in the class form a
sample.
All the members of parliament form the population, and the female members present form a
sample.
Types of Variables
Variable
◦ Characteristic or attribute that can assume different values.
• Random Variable
◦ A variable whose values are determined by chance (e.g., rolling a die or flipping a coin).
A variable is any characteristic of an individual. A variable can take different values for different
individuals.
Qualitative Variables
◦ Variables which assume non-numerical (categorical) values.
◦ Nominal
◦ Ordinal
Quantitative Variables
◦ Variables which assume numerical values.
Discrete Variables
Variables which assume a finite or countably infinite number of possible values. Usually obtained by
counting.
Continuous Variables
Variables which can assume any value within an interval, so the number of possible values is uncountably infinite. Usually obtained by measurement.
Categorical variables
Categorical variables have values that describe a 'quality' or 'characteristic' of a data unit, like 'what
type' or 'which category'.
Categorical variables are further described as nominal or ordinal:
A nominal variable is a categorical variable whose values cannot be organised in a logical
sequence. Examples of nominal categorical variables include gender, business
type, eye colour, religion and brand.
An ordinal variable is a categorical variable whose values can be logically
ordered or ranked. The categories of an ordinal variable can be ranked higher or lower
than one another, but they do not necessarily establish a numeric difference between categories. Examples
of ordinal categorical variables include academic grades (e.g. A, B1, B2, C1,…), clothing size (e.g. small,
medium, large, extra large) and attitudes (e.g. strongly agree, agree, disagree, strongly disagree).
The data collected for a categorical variable are qualitative data.
Numeric variables
Numeric variables have values that describe a measurable quantity as a number, like 'how many' or
'how much'. Therefore numeric variables are quantitative variables.
Numeric variables are further described as either continuous or discrete:
A continuous variable is a numeric variable whose observations can take any value within a certain range of
real numbers. The value recorded for a continuous variable can be as fine
as the instrument of measurement allows. Examples of continuous variables include height, time,
age, and temperature.
A discrete variable is a numeric variable whose observations take values based on a count from a set of
distinct whole values. A discrete variable cannot take a fractional value between one value and
the next closest value. Examples of discrete variables include the number of registered cars, the number
of business locations, and the number of children in a family, all of which are measured in whole units (e.g. 1,
2, 3 cars).
The data collected for a numeric variable are quantitative data.
Types of Variables
Qualitative (non-numerical, categorical)
◦ Nominal (sex, color of eyes)
◦ Ordinal (stage of cancer, education levels)
Quantitative (numerical)
◦ Continuous (height, weight, age)
◦ Discrete (numbers of sisters or brothers, phones, cars in a park)
Some Definitions
Parameter
◦ Characteristic or measure obtained from a population.
Statistic (not to be confused with Statistics)
◦ Characteristic or measure obtained from a sample.
Descriptive Statistics
◦ Collection, organization, summarization, and presentation of data.
Inferential Statistics
◦ Generalizing from samples to populations using probabilities. Performing hypothesis testing,
determining relationships between variables, and making predictions.
Scale
◦ The tools and equipment used to obtain numerical data.
Example (Descriptive Statistics)
Collect data
◦ e.g. Survey
Present data
◦ e.g. Tables and graphs
Characterize data
◦ e.g. Sample mean: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
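As a small worked instance of the "characterize data" step, the sample mean can be computed by summing the observations and dividing by the sample size. The Python sketch below is for illustration only, and the survey answers (number of siblings reported by six respondents) are made-up values.

```python
# Made-up survey data: number of siblings reported by six respondents.
siblings = [2, 0, 1, 3, 1, 2]

n = len(siblings)                  # sample size
sample_mean = sum(siblings) / n    # sum of the observations divided by n

print(n, sample_mean)  # 6 1.5
```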
Example (Inferential Statistics)
Estimation
◦ e.g.: Estimate the population mean weight using the
sample mean weight
Hypothesis testing
◦ e.g.: Test the claim that the population mean weight is
120 pounds
Drawing conclusions and/or making decisions concerning a population based on
sample results.
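Both inferential tasks can be sketched on a tiny sample. The slide does not name a test procedure; the one-sample t statistic, t = (x̄ − μ₀) / (s / √n), is used here as one common choice and is an assumption of this sketch, as are the five made-up weights. Python is used for illustration only.

```python
from math import sqrt
from statistics import mean, stdev

# Made-up sample of weights in pounds (illustrative values only).
weights = [118.0, 122.5, 119.0, 121.5, 124.0]
mu_0 = 120.0                   # claimed population mean weight

n = len(weights)
x_bar = mean(weights)          # estimation: sample mean as the estimate of the population mean
s = stdev(weights)             # sample standard deviation (divides by n - 1)

# Hypothesis testing: how many standard errors the estimate lies from the claimed value.
t_stat = (x_bar - mu_0) / (s / sqrt(n))

print(f"estimated population mean: {x_bar:.1f}")
print(f"t statistic for the claim mean = {mu_0}: {t_stat:.3f}")
```

To reach a decision, the t statistic would be compared with a t distribution with n − 1 degrees of freedom at a chosen significance level such as α = 0.05.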
Example-1
Consider the following dataset with information about 10 different basketball players:
Solution-1
Qualitative Qualitative Quantitative Quantitative Quantitative
Nominal Nominal Discrete Continuous Discrete