Course Name: Principle of Statistics
Table of Contents
Ch1: Fundamental Elements of Statistics
1.1 What are Statistics?
1.2 Why Study Statistics?
1.3 Who Uses Statistics?
1.4 Origin and Growth of Statistics
1.5 Four stages of statistical process
1.6 Functions of Statistics
1.7 Types of statistics
1.8 Types of Variables
1.9 Collecting Data and Obtaining Data
Ch2: Presentation of statistical data
2.1 Some statistical terminology
2.2 Presentation of ungrouped data
2.3 Presentation of grouped data
Principle of Statistics Collected by: Eng Ali Sidow Osman Page 1
Ch3: Measures of Central Tendency
3.1 The mean Ungrouped and grouped Data
3.2 The median Ungrouped and grouped Data
3.3 The mode Ungrouped and grouped Data
3.4 The range
3.5 Mean Deviation Definition
3.6 Sample Variance
3.7 Variance and standard deviation
3.8 Standard deviation of grouped data
Ch4: Regression Analysis
4.1 Linear model assumptions
4.2 Simple linear regression
4.3 Multiple linear regression
4.4 Problems for Regression Analysis
(i) Regression equation of X on Y
(ii) Regression coefficient of Y on X
(iii) Regression equation of Y on X
Chapter 5: Correlation Analysis
5.1 Types of correlation coefficient formulas
5.2 What is Pearson Correlation?
5.3 Potential problems with Pearson correlation
Ch6: Probability
6.1 Introduction to Probability
6.2 Laws of Probability
6.3 Empirical Probability
The Addition Rules for Probability
The Multiplication Rules and Conditional Probability
Conditional Probability
Chapter One
Fundamental Elements of Statistics
What are Statistics?
Statistics is the science of collecting, organizing, presenting, analyzing,
and interpreting numerical data to assist in making more effective decisions.
Statistics is the science of data. It involves collecting, classifying,
summarizing, organizing, analyzing, and interpreting numerical information.
Statistics is a branch of mathematics that examines ways to collect,
analyze, interpret, and present data in a meaningful way.
Why Study Statistics?
1) Numerical information is everywhere.
2) Statistical techniques are used to make decisions that affect our daily lives.
3) The knowledge of statistical methods will help you understand how decisions
are made and give you a better understanding of how they affect you.
4) It develops an understanding of some basic ideas of statistical reliability and
stochastic processes (probability concepts).
5) Statistics is important in every aspect of society (government, people, or business).
6) It develops an appreciation for variability and how it affects products, processes,
and systems.
7) It helps us estimate the present and predict the future.
8) It teaches methods that can be used to solve problems and build knowledge.
9) Statistics turns data into information.
No matter what line of work you select, you will find yourself faced with
decisions where an understanding of data analysis is helpful.
Who Uses Statistics?
Statistical techniques are used extensively in marketing, accounting, and
quality control, and by consumers, professional sports people, hospital
administrators, educators, politicians, physicians, and others.
Origin and Growth of Statistics:
The words ‘Statistics’ and ‘Statistical’ are derived from the Latin word
status, meaning a political state.
The theory of statistics as a distinct branch of scientific method is of
comparatively recent growth.
Research particularly into the mathematical theory of statistics is rapidly
proceeding and fresh discoveries are being made all over the world.
Four stages of statistical process
1) Collection of Data: This is the first step and the foundation upon which the entire
analysis is built.
2) Presentation of data: The mass of data collected should be presented in a suitable,
concise form.
3) Analysis of data:
4) Interpretation of data:
Functions of Statistics
There are many functions of statistics. Let us consider the following five
important functions:
1) Condensation:
2) Comparison:
3) Forecasting:
4) Estimation:
5) Tests of Hypothesis:
1. Key Statistical Concepts
1) Experimental unit: an object upon which we collect data.
2) A population is a collection of all possible individuals, objects, or
measurements of interest.
3) Variable: variables are properties or characteristics of some event, object, or person that
can take on different values or amounts.
Constants, by contrast, do not vary.
Variables may be...
1) Independent or dependent;
2) discrete or continuous;
3) Qualitative or quantitative.
4) Sample: a sample is a portion, or part, of the population of interest.
A measurement is a number or attribute computed for each member of a population or of a
sample. The measurements of sample elements are collectively called the sample data.
A parameter is a number that summarizes some aspect of the population as a whole.
Types of statistics
There are two main branches of statistics: descriptive and inferential. Descriptive
statistics is used to describe only the set of information that has actually been collected.
Inferential statistics is used to make predictions or comparisons about a larger group (a
population) using information gathered about a small part of that population. Thus, inferential
statistics involves generalizing beyond the data, something that descriptive statistics does not do.
1) Descriptive statistics: methods of organizing, summarizing, and presenting data in
an informative way.
EXAMPLE 1: The United States government reports the population of the United States
was 179,323,000 in 1960; 203,302,000 in 1970; 226,542,000 in 1980; 248,709,000 in 1990;
and 265,000,000 in 2000.
2) Inferential Statistics: A decision, estimate, prediction, or generalization about a
population, based on a sample.
Note: In statistics, the words population and sample have a broader meaning than in everyday use.
A population or sample may consist of individuals or objects.
Types of Variables
Quantitative data are numerical measurements that are recorded on, or arise from, a naturally
occurring numerical scale. Quantitative data are further classified as either discrete or continuous.
Discrete data are numeric data that have a finite (or countable) number of possible values.
A classic example of discrete data is a finite subset of the counting numbers, such as (1, 2, 3,
4, 5, 6, 7, 8), perhaps corresponding to a rating scale from Strongly Disagree to Strongly Agree.
Continuous data have infinitely many possible values: 1.4, 1.41, 1.414, 1.4142, 1.41421, …
The real numbers are continuous, with no gaps or interruptions. Physically measurable
quantities such as length, volume, time, and mass are continuous.
Qualitative data are measurements that cannot be measured on a natural numerical
scale; they can only be classified into one of a group of categories. They have no natural
numerical scale and consist of attributes, labels, or other nonnumeric characteristics.
Qualitative data are nonnumeric.
Data analysis is a process of gathering, modeling, and transforming data with the
goal of highlighting useful information, suggesting conclusions, and supporting
decision making. Data analysis has multiple facets and approaches, encompassing
diverse techniques under a variety of names, in different business, science, and social
science domains.
Some definitions are as follows:
1) Raw data: data collected in original form.
2) Frequency: the number of times a certain value or class of values occurs.
3) Frequency Distribution: the organization of raw data in table form with
classes and frequencies.
4) Categorical frequency distribution: A frequency distribution in which the data
is only nominal or ordinal.
5) Ungrouped frequency distribution: A frequency distribution of numerical data.
The raw data is not grouped.
6) Grouped frequency distribution: A frequency distribution where several
numbers are grouped into one class.
7) Class limits: separate one class in a grouped frequency distribution from
another. The limits could actually appear in the data, and there are gaps between the
upper limit of one class and the lower limit of the next.
8) Class boundaries: separate one class in a grouped frequency distribution from
another. Boundaries have one more decimal place than the raw data and
therefore do not appear in the data. There is no gap between the upper
boundary of one class and the lower boundary of the next class. The lower
class boundary is found by subtracting 0.5 unit from the lower class limit and
the upper class boundary is found by adding 0.5 units to the upper class limit.
9) Class width: the difference between the upper and lower boundaries of any
class. The class width is also the difference between the lower limits of two consecutive
classes or the upper limits of two consecutive classes. It is not the difference
between the upper and lower limits of the same class.
10) Class mark (midpoint): the number in the middle of the class. It is found
by adding the upper and lower limits and dividing by two. It can also be found
by adding the upper and lower boundaries and dividing by two.
11) Cumulative frequency: the number of values less than the upper class
boundary for the current class. This is a running total of the frequencies.
12) Relative Frequency: the frequency divided by the total frequency. This
gives the proportion (multiplied by 100, the percent) of values falling in the class.
13) Cumulative Relative Frequency (Relative Cumulative frequency): the
running total of the relative frequencies, or equivalently the cumulative frequency divided
by the total frequency. It gives the proportion of the values that are less than the
upper class boundary.
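The definitions above fit together in one small routine. The sketch below is a minimal illustration, assuming integer-valued raw data, class limits chosen by the user, and the 0.5-unit boundary rule described above; the function name `frequency_table` and the sample data are hypothetical.

```python
def frequency_table(data, classes):
    """Build a grouped frequency table (hypothetical helper).

    data    -- raw integer observations
    classes -- list of (lower_limit, upper_limit) pairs, in ascending order
    """
    n = len(data)
    rows = []
    cumulative = 0
    for lower, upper in classes:
        # Frequency: how many raw values fall within the class limits.
        freq = sum(1 for x in data if lower <= x <= upper)
        cumulative += freq  # running total of the frequencies
        rows.append({
            "limits": (lower, upper),
            # Boundaries: 0.5 unit below the lower limit and above the upper limit.
            "boundaries": (lower - 0.5, upper + 0.5),
            "midpoint": (lower + upper) / 2,  # class mark
            "frequency": freq,
            "cumulative": cumulative,
            "relative": freq / n,
            "cumulative_relative": cumulative / n,
        })
    return rows


# Hypothetical raw data grouped into three classes of width 5.
raw = [2, 3, 5, 7, 8, 8, 11, 12, 14, 15]
for row in frequency_table(raw, [(1, 5), (6, 10), (11, 15)]):
    print(row)
```

Here the class width is 5.5 − 0.5 = 5, the difference between the boundaries, not between the limits of the same class.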
Collecting Data and Obtaining Data
Published source: book, journal, newspaper, Web site
Designed experiment: researcher exerts strict control over units
Survey: a group of people are surveyed and their responses are recorded
Observational study: units are observed in a natural setting and variables of interest are recorded
Samples
A representative sample exhibits characteristics typical of those possessed by
the population of interest.
A random sample of n experimental units is a sample selected from the
population in such a way that every different sample of size n has an equal
chance of selection.
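A simple random sample in the sense defined above can be drawn with Python's standard `random` module. This is only a sketch: the population of 100 numbered units and the function name `simple_random_sample` are hypothetical, and the seed is there only so the example repeats.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n units without replacement (hypothetical helper).

    random.Random.sample makes every subset of size n equally likely,
    which is what the definition of a random sample requires.
    """
    rng = random.Random(seed)  # seeded generator so results can be reproduced
    return rng.sample(population, n)


# Hypothetical population: 100 numbered experimental units.
units = list(range(1, 101))
print(simple_random_sample(units, 5, seed=42))
```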
Measures of Central Tendency
A measure of central tendency is a descriptive statistic that describes
the average, or typical, value of a set of scores.
There are three common measures of central tendency:
1) the mean
2) the median
3) the mode
Mean: also known as the arithmetic mean or average. It is calculated by adding the
scores and dividing by the number of scores.
$$\mu = \frac{\sum x}{N} \ \text{(for a population)} \qquad \bar{x} = \frac{\sum x}{n} \ \text{(for a sample)}$$
Calculate the mean of the following data:
1 5 4 3 2
Sum the scores (Σx):
1 + 5 + 4 + 3 + 2 = 15
Divide the sum (Σx = 15) by the number of scores (N = 5):
15 / 5 = 3
Mean = 3
Example 1: Compute the arithmetic mean of the first 6 odd natural
numbers.
Solution: The first 6 odd natural numbers are 1, 3, 5, 7, 9, and 11.
x̄ = (1+3+5+7+9+11) / 6 = 36/6 = 6.
Thus, the arithmetic mean is 6
Example 2: The data represent the number of textbooks purchased by a sample of
seven students: 10 4 7 5 7 8 9
$$\bar{x} = \frac{10 + 4 + 7 + 5 + 7 + 8 + 9}{7} = \frac{50}{7} \approx 7.14$$
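The calculation in these examples (add the scores, divide by how many there are) can be written as a short function. This is a minimal sketch; the name `arithmetic_mean` is hypothetical, and Python's built-in `statistics.mean` gives the same results.

```python
def arithmetic_mean(values):
    """Sum of the values divided by the number of values."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


print(arithmetic_mean([1, 5, 4, 3, 2]))                   # 3.0
print(arithmetic_mean([1, 3, 5, 7, 9, 11]))               # 6.0
print(round(arithmetic_mean([10, 4, 7, 5, 7, 8, 9]), 2))  # 7.14
```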
Median – the number in the middle when the data are arranged in ascending or
descending order.
The median is a measure of central tendency that is more resistant to the effects of extreme
values. The median is the value that occupies the middle position of the data when the data
are put in rank order by magnitude.
Let n be the number of cases in your data.
If n is odd, the median is the middle number of the data values sorted by magnitude.
It occupies the $\left(\frac{n+1}{2}\right)^{\text{th}}$ position.
If n is even, the median is the average of the middle two numbers of the data sorted
by magnitude. It is the average of the numbers in the $\left(\frac{n}{2}\right)^{\text{th}}$ and $\left(\frac{n+2}{2}\right)^{\text{th}}$ positions.
How to Calculate the Median
Conceptually, it is easy to calculate the median.
There are many minor problems that can occur, so it is best to let a computer do it.
Sort the data from highest to lowest (or from lowest to highest).
Find the score in the middle:
Middle = (N + 1) / 2
If N, the number of scores, is even, the median is the average of the middle two scores.
Example: Calculate the median age of the seven employees
53 32 61 57 39 44 57
To find the median, sort the data
32 39 44 53 57 57 61
The median age of the employees is 53 years.
Example (odd number of values): 1 3 4 8 10
The middle value is 4 (two values are higher and two are lower). This is the median.
Example (even number of values): 2 3 4 4 5 8 9 9
The two middle values are 4 and 5. The median is the average of these two values, or 4.5.
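The odd/even rules above translate directly into code. The sketch below assumes plain numeric lists; the function name `median` is our own, and Python's `statistics.median` follows the same rules.

```python
def median(values):
    """Median of ungrouped data, following the odd/even rules above."""
    if not values:
        raise ValueError("the median of an empty data set is undefined")
    data = sorted(values)  # rank the data by magnitude
    n = len(data)
    middle = n // 2        # 0-based index of the upper-middle value
    if n % 2 == 1:
        # Odd n: the ((n + 1) / 2)th value, which is index n // 2.
        return data[middle]
    # Even n: average of the (n / 2)th and ((n + 2) / 2)th values.
    return (data[middle - 1] + data[middle]) / 2


print(median([53, 32, 61, 57, 39, 44, 57]))  # 53
print(median([2, 3, 4, 4, 5, 8, 9, 9]))      # 4.5
```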
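Mode – the most frequent value. If two values occur the same (greatest) number of times, the set is
bimodal; if more than two values tie, the set has more than one mode. If every value occurs the same
number of times, the set is usually said to have no mode.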
Example:
Find the mode of the ages of the seven employees.
53 32 61 57 39 44 57
The mode is 57 because it occurs most often.
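Counting how often each value occurs is all that finding the mode requires. The sketch below uses `collections.Counter`; the function name `modes` is hypothetical, and it assumes the convention that a data set in which every distinct value occurs equally often has no mode (an empty list is returned).

```python
from collections import Counter


def modes(values):
    """Return the sorted list of modal values (empty if there is no mode).

    Assumed convention: if there is more than one distinct value and all of
    them occur equally often, the data have no mode.
    """
    counts = Counter(values)  # value -> number of occurrences
    if not counts:
        return []
    top = max(counts.values())
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == top)


print(modes([53, 32, 61, 57, 39, 44, 57]))  # [57]
print(modes([1, 1, 2, 2, 3]))               # [1, 2]  (bimodal)
print(modes([4, 7, 9]))                     # []      (no mode)
```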
AVERAGES (MEAN, MEDIAN, AND MODE)
A. Finding the Mean
The mean of a set of values is the sum of the values divided by the number of values. It is also
called the average.
Example: Find the mean of 19, 13, 15, 25, and 18
$$\frac{19 + 13 + 15 + 25 + 18}{5} = \frac{90}{5} = 18$$
When the mean is known and you must find a missing value, some simple rules of algebra must
be applied.
Example: Ali has received the following grades this term: 75, 87, 90, 88, and 79. If he wishes to
earn an 85 average, what must he score on his final test?
Set up the problem like this:
$$\frac{75 + 87 + 90 + 88 + 79 + s}{6} = 85$$
To solve:
1. Add the known values.
$$\frac{419 + s}{6} = 85$$
2. Next, isolate the unknown (s) on one side of the equation. To do this, use inverse
operations to eliminate the numbers on the side of the equation with the unknown (that is,
do the opposite of what is being done).
Start with the 6. Since the expression 419 + s is divided by 6, multiply it by 6.
NOTE: Whatever you do to one side of the equation, you must also do to the other side.
Therefore, multiply the 85 by 6 too:
$$6 \times \frac{419 + s}{6} = 85 \times 6$$
The 6s on the left side cancel, leaving the equation:
419 + s = 510
Now eliminate the 419 from the side of the equation with the unknown. Since 419 is added
to s, subtract it from both sides of the equation:
419 + s − 419 = 510 − 419
This leaves us with: s = 91
Answer: Ali will need to score 91 on his final test to earn an average of 85 for the
term.
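The algebra above always reduces to one line: multiply the target average by the new number of scores, then subtract the sum of the known scores. A minimal sketch (the function name `required_score` is hypothetical):

```python
def required_score(known_scores, target_average):
    """Score s needed on one more test so the average equals target_average.

    Solves (sum(known_scores) + s) / (k + 1) = target_average for s,
    where k is the number of known scores.
    """
    k = len(known_scores)
    return target_average * (k + 1) - sum(known_scores)


print(required_score([75, 87, 90, 88, 79], 85))  # 91
```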
Notation:
Σ – denotes summation of a set of values
x – is the variable usually used to represent the individual data values
n – represents the number of values in a sample
N – represents the number of values in a population
$\bar{x} = \frac{\sum x}{n}$ – is the mean of a set of sample values.
$\mu = \frac{\sum x}{N}$ – is the mean of all values in a population.
$\bar{x} = \frac{\sum (fx)}{\sum f}$ – is the mean from a frequency distribution.
How to Find the Sample Mean
Sample Question: Find the sample mean for the following set of numbers: 12, 13,
14, 16, 17, 40, 43, 55, 56, 67, 78, 78, 79, 80, 81, 90, 99, 101, 102, 304, 306, 400, 401,
403, 404, and 405
Step 1: Add up all of the numbers:
12 + 13 + 14 + 16 + 17 + 40 + 43 + 55 + 56 + 67 + 78 + 78 + 79 + 80 + 81 + 90 + 99
+ 101 + 102 + 304 + 306 + 400 + 401 + 403 + 404 + 405 = 3744.
Step 2: Count the number of items in your data set. In this particular data set
there are 26 items.
Step 3: Divide the number you found in Step 1 by the number you found in Step 2.
3744/26 = 144.
That’s it!
Tip: If you have to show your working on a test, just place the two numbers into the
formula. Step 1 gives you Σx and Step 2 gives you n:
$$\bar{x} = \frac{\sum x_i}{n} = \frac{3744}{26} = 144$$
Assumed Mean Method Formula
Let x1, x2, x3, …, xn be the midpoints (class marks) of n class intervals and f1,
f2, f3, …, fn be the respective frequencies. The formula of the assumed
mean method is:
$$\bar{x} = a + \frac{\sum f_i d_i}{\sum f_i}$$
Here,
a = assumed mean
fi = frequency of the ith class
di = xi − a = deviation of the ith class
Σfi = N = total number of observations
xi = class mark = (upper class limit + lower class limit)/2
Assumed Mean Method Questions
If xi and fi are numerically large, the assumed mean method is preferred. Below are some examples
of calculating the mean of grouped data by this method.
Example 1:
The following table gives information about the marks obtained by 110 students in an examination.
Class 0-10 10-20 20-30 30-40 40-50
Frequency 12 28 32 25 13
Find the mean marks of the students using the assumed mean method.
Solution:
Class (CI) | Frequency (fi) | Class mark (xi) | di = xi − a   | fidi
0-10       | 12             | 5               | 5 − 25 = −20  | −240
10-20      | 28             | 15              | 15 − 25 = −10 | −280
20-30      | 32             | 25 = a          | 25 − 25 = 0   | 0
30-40      | 25             | 35              | 35 − 25 = 10  | 250
40-50      | 13             | 45              | 45 − 25 = 20  | 260
Total      | Σfi = 110      |                 |               | Σfidi = −10
Assumed mean = a = 25
Mean of the data:
$$\bar{x} = a + \frac{\sum f_i d_i}{\sum f_i} = 25 + \frac{-10}{110} = 25 - \frac{1}{11} = \frac{275 - 1}{11} = \frac{274}{11} \approx 24.9$$
Hence, the mean mark of the students is 24.9.
Example 2: The table below gives the distribution of female employees in the departments
of a company with various branches: for each range of percentage of female employees, it
lists the number of departments in that range.
Percentage of female employees Number of departments
5-15 1
15-25 2
25-35 4
35-45 4
45-55 7
55-65 11
65-75 6
Find the mean percentage of female employees by the assumed mean method.
Solution:
Percentage of female employees (CI) | Number of departments (fi) | Class mark (xi) | di = xi − a | fidi
5-15  | 1  | 10     | −30 | −30
15-25 | 2  | 20     | −20 | −40
25-35 | 4  | 30     | −10 | −40
35-45 | 4  | 40 = a | 0   | 0
45-55 | 7  | 50     | 10  | 70
55-65 | 11 | 60     | 20  | 220
65-75 | 6  | 70     | 30  | 180
Total | Σfi = 35 |  |     | Σfidi = 360
Assumed mean = a = 40
$$\bar{x} = a + \frac{\sum f_i d_i}{\sum f_i} = 40 + \frac{360}{35} = 40 + \frac{72}{7} \approx 40 + 10.29 = 50.29$$
Hence, the mean percentage of female employees is approximately 50.29.
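The assumed mean method for class intervals can be sketched as follows: compute each class mark, take deviations from a, and apply x̄ = a + Σfidi / Σfi. The function names are hypothetical. The direct method Σfixi / Σfi is included only as a cross-check, since both must give the same answer.

```python
def assumed_mean_grouped(classes, freqs, a):
    """Mean of grouped data by the assumed mean method.

    classes -- list of (lower_limit, upper_limit) pairs
    freqs   -- frequency f_i of each class
    a       -- assumed mean (usually a class mark near the middle)
    """
    marks = [(lo + hi) / 2 for lo, hi in classes]  # class marks x_i
    devs = [x - a for x in marks]                  # d_i = x_i - a
    sum_fd = sum(f * d for f, d in zip(freqs, devs))
    return a + sum_fd / sum(freqs)


def direct_mean_grouped(classes, freqs):
    """Direct method, sum(f_i * x_i) / sum(f_i), for comparison."""
    marks = [(lo + hi) / 2 for lo, hi in classes]
    return sum(f * x for f, x in zip(freqs, marks)) / sum(freqs)


exam = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
print(round(assumed_mean_grouped(exam, [12, 28, 32, 25, 13], a=25), 2))  # 24.91
print(round(direct_mean_grouped(exam, [12, 28, 32, 25, 13]), 2))         # 24.91
```

The assumed mean only shrinks the numbers being multiplied; it does not change the result.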
Formula to find the arithmetic mean of grouped data using an assumed mean:
$$\bar{x} = A + \frac{\sum fd}{N}$$
Here A is the assumed mean, d = x − A, and N = Σf.
Example 1: Calculate arithmetic mean for the following data.
X F
5 4
10 5
15 7
20 4
25 3
30 2
Solution: Now we have to use the formula given above to find the arithmetic mean.
Take the assumed mean A = 15
x F d = x-A fd
5 4 -10 -40
10 5 -5 -25
15 7 0 0
20 4 5 20
25 3 10 30
30 2 15 30
Total N = 25 ∑fd = 15
Arithmetic mean = A + [∑fd / N]
= 15 + (15/25)
= 15 + (3/5)
= (75 + 3)/5
= 78/5
= 15.6
Example 2: Calculate arithmetic mean for the following data.
Marks Number of students
65 6
70 11
75 3
80 5
85 4
90 7
95 10
100 4
Solution: Now we have to use the formula given above to find the arithmetic
mean. Take the assumed mean A = 80.
x     | f      | d = x − A | fd
65    | 6      | −15       | −90
70    | 11     | −10       | −110
75    | 3      | −5        | −15
80    | 5      | 0         | 0
85    | 4      | 5         | 20
90    | 7      | 10        | 70
95    | 10     | 15        | 150
100   | 4      | 20        | 80
Total | N = 50 |           | ∑fd = 105
Arithmetic mean = A + [∑fd / N]
= 80 + (105/50)
= 80 + (21/10)
= 80 + 2.1
= 82.1
Example 3: The following data give the number of boys of a particular age
in a class of 40 students. Calculate the mean age of the students
Age (in years) Number of students
13 3
14 8
15 9
16 11
17 6
18 3
Solution: Now we have to use the formula given above to find the
arithmetic mean. Take the assumed mean A = 16.
x     | f      | d = x − A | fd
13    | 3      | −3        | −9
14    | 8      | −2        | −16
15    | 9      | −1        | −9
16    | 11     | 0         | 0
17    | 6      | 1         | 6
18    | 3      | 2         | 6
Total | N = 40 |           | ∑fd = −22
Arithmetic mean = A + [∑fd / N]
= 16 + (−22/40)
= 16 − 0.55
= 15.45
Example 4: Calculate the Arithmetic mean of the following data:
Class (X) Frequency (F)
15 12
25 20
35 15
45 14
55 16
65 11
75 7
85 8
Solution: Now we have to use the formula given above to find the arithmetic mean.
Take the assumed mean A = 45
Class (x) | Frequency (F) | d = x − A | Fd
15        | 12            | −30       | −360
25        | 20            | −20       | −400
35        | 15            | −10       | −150
45        | 14            | 0         | 0
55        | 16            | 10        | 160
65        | 11            | 20        | 220
75        | 7             | 30        | 210
85        | 8             | 40        | 320
Total     | N = 103       |           | ∑fd = 0
Arithmetic mean = A + [∑fd / N]
= 45 + (0/103) = 45
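The same formula, A + Σfd / N, applies when each x is a single value rather than a class. The sketch below (function name hypothetical) also shows that the choice of A changes only the arithmetic, never the answer. Recomputing each fd product for the marks example gives Σfd = 105 and a mean of 82.1, which is what the check below uses.

```python
def assumed_mean(xs, fs, A):
    """Arithmetic mean = A + sum(f * d) / N, with d = x - A and N = sum(f)."""
    N = sum(fs)
    sum_fd = sum(f * (x - A) for x, f in zip(xs, fs))
    return A + sum_fd / N


ages = [13, 14, 15, 16, 17, 18]
students = [3, 8, 9, 11, 6, 3]
# Any assumed mean A leads to the same arithmetic mean.
for A in (13, 16, 18):
    print(A, round(assumed_mean(ages, students, A), 2))  # 15.45 each time
```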
Find the median of the following ungrouped data:
1) 3, 8, 9, 4, 12, 34, 21, 7, 1
2) 12, 14, 10, 22, 18, 20
Solution
1) Arrange the data in ascending order: 1, 3, 4, 7, 8, 9, 12, 21, 34
n = 9 (odd), so the median is the $\left(\frac{n+1}{2}\right)^{\text{th}} = \left(\frac{9+1}{2}\right)^{\text{th}} = 5^{\text{th}}$ value. The median is 8.
2) Arrange the data in ascending order: 10, 12, 14, 18, 20, 22
n = 6 (even), so the median is the average of the $\left(\frac{n}{2}\right)^{\text{th}} = 3^{\text{rd}}$ and $4^{\text{th}}$ values:
$$\text{Median} = \frac{14 + 18}{2} = \frac{32}{2} = 16$$
Find the median of the following data:
Marks obtained     | 20 | 29 | 28 | 33 | 42 | 38 | 43 | 25
Number of students | 6  | 28 | 24 | 15 | 2  | 4  | 1  | 20
Solution Step 1: Arrange the marks in ascending order and calculate the cumulative frequency.
Marks (x) | Frequency (f) | Cumulative Frequency (CF)
20        | 6             | 6
25        | 20            | 6 + 20 = 26
28        | 24            | 26 + 24 = 50
29        | 28            | 50 + 28 = 78
33        | 15            | 78 + 15 = 93
38        | 4             | 93 + 4 = 97
42        | 2             | 97 + 2 = 99
43        | 1             | 99 + 1 = 100
Σf = 100
Step 2: Since N = Σf = 100 is even, the median is the average of the (N/2)th = 50th and
(N/2 + 1)th = 51st values. From the cumulative frequencies, the 50th value is 28 and the
51st value is 29, so
$$\text{Median} = \frac{28 + 29}{2} = \frac{57}{2} = 28.5$$
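Reading the 50th and 51st values off the cumulative frequency column can be automated. This is a sketch with hypothetical function names, using the value/frequency pairs from the cumulative frequency table above.

```python
def kth_value(pairs, k):
    """Return the k-th smallest observation (1-based) from (value, frequency) pairs."""
    cumulative = 0
    for value, freq in pairs:
        cumulative += freq  # running cumulative frequency
        if cumulative >= k:
            return value
    raise ValueError("k exceeds the total frequency")


def median_from_frequency(values, freqs):
    """Median of a discrete frequency distribution (ungrouped values)."""
    pairs = sorted(zip(values, freqs))  # arrange values in ascending order
    n = sum(freqs)
    if n % 2 == 1:
        return kth_value(pairs, (n + 1) // 2)
    # Even N: average of the (N/2)th and (N/2 + 1)th observations.
    return (kth_value(pairs, n // 2) + kth_value(pairs, n // 2 + 1)) / 2


print(median_from_frequency([20, 25, 28, 29, 33, 38, 42, 43],
                            [6, 20, 24, 28, 15, 4, 2, 1]))  # 28.5
```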
The following frequency distribution gives the monthly consumption of electricity of
68 consumers in a locality. Find the median of the data.
Monthly consumption (units) | Number of consumers | Cumulative frequency
65 – 85                     | 4                   | 0 + 4 = 4
85 – 105                    | 5                   | 4 + 5 = 9
105 – 125                   | 13                  | 9 + 13 = 22
125 – 145                   | 20                  | 22 + 20 = 42
145 – 165                   | 14                  | 42 + 14 = 56
165 – 185                   | 8                   | 56 + 8 = 64
185 – 205                   | 4                   | 64 + 4 = 68
Σf = 68
Formula:
$$\text{Median} = L + \left(\frac{\frac{N}{2} - \text{c.f.}}{F}\right) \times h$$
where
L = lower limit of the median class
h = size of the median class
F = frequency of the median class
N = sum of frequencies
c.f. = cumulative frequency of the class just preceding the median class
N/2 = 68/2 = 34, so the median class is 125 – 145 (the first class whose cumulative frequency,
42, exceeds 34). Then L = 125, c.f. = 22, F = 20, h = 20:
$$\text{Median} = 125 + \left(\frac{34 - 22}{20}\right) \times 20 = 125 + \left(\frac{12}{20}\right) \times 20 = 125 + 12 = 137$$
Median = 137
Calculate the median from the following data:
Marks below: 10 20 30 40 50 60 70 80
No. of students 15 35 60 84 96 127 198 250
Answer
Marks (x) No. of students (f) C.F.
0-10 15 15
10-20 20 35
20-30 25 60
30-40 24 84
40-50 12 96
50-60 31 127
60-70 71 198
70-80 52 250
N = 250
As N = 250, N/2 = 250/2 = 125.
Since 127 is the first cumulative frequency greater than 125, the median class is 50−60.
$$\text{Median} = l + \left(\frac{\frac{N}{2} - C}{f}\right) \times h$$
Here, l = lower limit of median class = 50
C = C.F. of the class preceding the median class = 96
h = upper limit − lower limit = 60 − 50 = 10
f = frequency of median class = 31
$$\text{Median} = 50 + \left(\frac{125 - 96}{31}\right) \times 10 = 50 + \left(\frac{29}{31}\right) \times 10 = 50 + 9.35 = 59.35$$
Median = 59.35
Find the median of the following data on marks and number of students:
Marks (x)           | 10-20 | 20-30 | 30-40 | 40-50 | 50-60 | 60-70
No. of students (f) | 7     | 9     | 10    | 12    | 5     | 7
Solution:
Marks (x) | No. of students (f) | C.F.
10-20     | 7                   | 7
20-30     | 9                   | 16
30-40     | 10                  | 26
40-50     | 12                  | 38
50-60     | 5                   | 43
60-70     | 7                   | 50
N = 50
Here, the total number of students is N = 50, so N/2 = 25.
Since 26 is the first cumulative frequency greater than 25, the median class is 30-40.
Then l = 30, C = 16, f = 10, h = 10, and
$$\text{Median} = 30 + \left(\frac{25 - 16}{10}\right) \times 10 = 30 + 9 = 39$$
Hence, the median of the given data is 39.
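The grouped-median formula used in the three examples above can be sketched as one function: locate the first class whose cumulative frequency reaches N/2, then interpolate within it. The function name `grouped_median` is hypothetical, and equal-width classes in ascending order are assumed.

```python
def grouped_median(classes, freqs):
    """Median of grouped data: L + ((N/2 - cf) / f) * h.

    classes -- list of (lower_limit, upper_limit) pairs in ascending order
    freqs   -- frequency of each class
    """
    half = sum(freqs) / 2
    cf = 0  # cumulative frequency of the classes before the current one
    for (lower, upper), f in zip(classes, freqs):
        if cf + f >= half:
            # First class whose cumulative frequency reaches N/2: the median class.
            h = upper - lower
            return lower + (half - cf) * h / f
        cf += f
    raise ValueError("frequencies must be positive")


electricity = [(65, 85), (85, 105), (105, 125), (125, 145),
               (145, 165), (165, 185), (185, 205)]
print(grouped_median(electricity, [4, 5, 13, 20, 14, 8, 4]))  # 137.0
```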
The mode is the most commonly observed value in a set of data. For the normal distribution, the mode
has the same value as the mean and median. In many cases, however, the modal value will differ from
the average value in the data.
In statistics, the mode is the value that occurs most often in a given set. We can also say that the
value or number in a data set that has the highest frequency, or appears most often, is called the
mode or modal value. It is one of the three measures of central tendency, along with the mean and median.
For example, the mode of the set {3, 7, 8, 8, 9} is 8. For a finite number of observations, we can
easily find the mode. A set of values may have one mode, more than one mode, or no mode at all.
How to Find the Mode or Modal Value
Finding the Mode ungrouped data
To find the mode, or modal value, it is best to put the numbers in order.
Then count how many of each number. A number that appears most often is
the mode.
Example: 3, 7, 5, 13, 20, 23, 39, 23, 40, 23, 14, 12, 56, 23, 29
In order these numbers are: 3, 5, 7, 12, 13, 14, 20, 23, 23, 23, 23, 29, 39, 40, 56
This makes it easy to see which numbers appear most often.
In this case the mode is 23.
Example 2: {19, 8, 29, 35, 19, 28, 15}
Arrange them in order: {8, 15, 19, 19, 28, 29, 35}
19 appears twice and all the rest appear only once, so 19 is the mode.
Mode by formula:
$$\text{Mode} = l + \left(\frac{f_1 - f_0}{2f_1 - f_0 - f_2}\right) \times h$$
How to Calculate Mode Step by Step?
Step 1. Find the maximum class frequency.
Step 2. Find the class corresponding to this frequency. It is called the modal class.
Step 3. Find the class size. (Upper limit – lower limit.)
Step 4. Calculate the mode using the formula,
where l = the lower limit of the modal class,
h = the size of the class interval (assuming classes are of equal size),
f1 = the frequency of the modal class,
f0 = the frequency of the class preceding the modal class, and
f2 = the frequency of the class succeeding the modal class.
The marks (out of 50) obtained by 40 students in a class are given in the table below.
Marks obtained     | 42 | 36 | 30 | 45 | 50
Number of students | 7  | 10 | 13 | 8  | 2
The mode is 30, since it has the highest frequency (13 students).
Mode of Grouped Data
Example: In a class of 30 students, the marks obtained in statistics (out of
50) are tabulated as below. Calculate the mode of the data given.
Solution:
The maximum class frequency is 12 and the class interval corresponding to this
frequency is 20 – 30. Thus, the modal class is 20 – 30.
Lower limit of the modal class (l) = 20
Size of the class interval (h) = 10
Frequency of the modal class (f1) = 12
Frequency of the class preceding the modal class (f0) = 5
Frequency of the class succeeding the modal class (f2) = 8
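Substituting these values in the formula, we get
$$\text{Mode} = l + \left(\frac{f_1 - f_0}{2f_1 - f_0 - f_2}\right) \times h = 20 + \left(\frac{12 - 5}{2 \times 12 - 5 - 8}\right) \times 10 = 20 + \left(\frac{7}{24 - 5 - 8}\right) \times 10$$
$$= 20 + \left(\frac{7}{11}\right) \times 10 = 20 + (0.6364) \times 10 = 20 + 6.364 \approx 26.36$$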
The following data give the observed lifetimes (in hours) of 225 electrical
components.
Lifetime (hours) | 0-20 | 20-40 | 40-60 | 60-80 | 80-100 | 100-120
Frequency (f)    | 10   | 35    | 52    | 61    | 38     | 29
Determine the modal lifetime (in hours) of the components.
$$\text{Mode} = l + \left(\frac{f_1 - f_0}{2f_1 - f_0 - f_2}\right) \times h$$
where l = the lower limit of the modal class,
h = the size of the class interval (assuming classes are of equal size),
f1 = the frequency of the modal class,
f0 = the frequency of the class preceding the modal class, and
f2 = the frequency of the class succeeding the modal class.
Modal class = 60-80, l = 60, f1 = 61, f0 = 52, f2 = 38, h = 20.
$$\text{Mode} = 60 + \left(\frac{61 - 52}{2(61) - 52 - 38}\right) \times 20 = 60 + \left(\frac{9}{32}\right) \times 20 = 60 + 5.625 = 65.625$$
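Steps 1 to 4 for the mode of grouped data can be sketched in code: find the class with the highest frequency, read off its neighbours, and apply the formula. The function names are hypothetical. We assume equal class widths and, as a labelled assumption, treat a missing neighbour (modal class at either end) as having frequency 0.

```python
def mode_formula(l, h, f1, f0, f2):
    """Mode of grouped data: l + (f1 - f0) / (2*f1 - f0 - f2) * h."""
    return l + (f1 - f0) / (2 * f1 - f0 - f2) * h


def grouped_mode(classes, freqs):
    """Locate the modal class, then apply mode_formula.

    Assumes equal class widths; a missing neighbour class counts as frequency 0.
    """
    i = max(range(len(freqs)), key=lambda k: freqs[k])  # Step 1-2: modal class index
    lower, upper = classes[i]
    f0 = freqs[i - 1] if i > 0 else 0
    f2 = freqs[i + 1] if i + 1 < len(freqs) else 0
    return mode_formula(lower, upper - lower, freqs[i], f0, f2)  # Step 3-4


lifetimes = [(0, 20), (20, 40), (40, 60), (60, 80), (80, 100), (100, 120)]
print(grouped_mode(lifetimes, [10, 35, 52, 61, 38, 29]))  # 65.625
print(round(mode_formula(20, 10, 12, 5, 8), 2))           # 26.36
```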
In statistics, the range is the spread of your data from the lowest to the highest value in the
distribution. It is a commonly used measure of variability.
Along with measures of central tendency, measures of variability give you descriptive statistics for
summarizing your data set.
The range is calculated by subtracting the lowest value from the highest value. While a large range
means high variability, a small range means low variability in a distribution.
Calculate the range
The formula to calculate the range is R = H − L, where
R = range
H = highest value
L = lowest value
The range is the easiest measure of variability to calculate. To find the range, follow these steps:
1. Order all values in your data set from low to high.
2. Subtract the lowest value from the highest value.
This process is the same regardless of whether your values are positive or negative, or whole
numbers or fractions.
Range example: your data set is the ages of 8 participants.
Participant 1 2 3 4 5 6 7 8
Age 37 19 31 29 21 26 33 36
First, order the values from low to high to identify the lowest value (L) and the highest value (H).
Age 19 21 26 29 31 33 36 37
Then subtract the lowest from the highest value.
R=H–L R = 37 – 19 = 18
The range of our data set is 18 years.
How useful is the range?
The range generally gives you a good indicator of variability when you have a distribution without
extreme values. When paired with measures of central tendency, the range can tell you about the
span of the distribution.
But the range can be misleading when you have outliers in your data set. One extreme value in the
data will give you a completely different range.
Range example with an outlier: one value in your data set is replaced with an outlier.
Age 19 21 26 29 31 33 36 61
Using the same calculation, we get a very different result this time:
R=H–L R = 61 – 19 = 42
With an outlier, our range is now 42 years.
EXAMPLE – Range: The number of cappuccinos sold at the Starbucks location in
the Orange County Airport between 4 and 7 p.m. for a sample of 5 days last year
were 20, 40, 50, 60, and 80. Determine the range for the number of
cappuccinos sold.
Range = Largest – Smallest value
= 80 – 20 = 60
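The range is simply the highest value minus the lowest value. The short sketch below (function name `data_range` is hypothetical, chosen to avoid shadowing Python's built-in `range`) also makes the outlier effect visible.

```python
def data_range(values):
    """Range R = H - L (highest value minus lowest value)."""
    if not values:
        raise ValueError("the range of an empty data set is undefined")
    return max(values) - min(values)


print(data_range([37, 19, 31, 29, 21, 26, 33, 36]))  # 18
print(data_range([19, 21, 26, 29, 31, 33, 36, 61]))  # 42 (one outlier)
print(data_range([20, 40, 50, 60, 80]))              # 60
```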
Mean Deviation Definition
The mean deviation is a statistical measure of the average absolute deviation of the values in a
data set from their mean. The mean deviation of the data values can be easily calculated using the
procedure below.
Step 1: Find the mean of the given data values.
Step 2: Subtract the mean from each data value and take the absolute value (that is, ignore
any minus sign).
Step 3: Find the mean of the values obtained in Step 2.
Mean Deviation Formula
The formula to calculate the mean deviation for the given data set is given below.
Mean Deviation = [Σ |X – µ|]/N
Here,
Σ represents the sum of the values
X represents each value in the data set
µ represents the mean of the data set
N represents the number of data values
| | denotes the absolute value, which ignores the minus sign
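EXAMPLE – Mean Deviation: The numbers of cappuccinos sold at the Starbucks location in
the Orange County Airport between 4 and 7 p.m. for a sample of 5 days last year were 20, 40, 50,
60 and 80. Determine the mean deviation for the number of cappuccinos sold.
Step 1: Find the mean number of cappuccinos sold daily: µ = (20 + 40 + 50 + 60 + 80)/5 = 250/5 = 50
Step 2: Find the absolute deviations from the mean: |20 – 50| = 30, |40 – 50| = 10, |50 – 50| = 0,
|60 – 50| = 10, |80 – 50| = 30
Step 3: Average the absolute deviations: Mean Deviation = (30 + 10 + 0 + 10 + 30)/5 = 80/5 = 16
cappuccinos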
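The three steps of the mean deviation can be written as a short Python function. The function name `mean_deviation` is a hypothetical helper, and the data are the cappuccino sales from the example. The sketch follows the formula Σ|X – µ|/N directly.

```python
def mean_deviation(values):
    """Mean deviation = sum of |X - mu| divided by N."""
    mu = sum(values) / len(values)                 # Step 1: the mean
    abs_devs = [abs(x - mu) for x in values]       # Step 2: deviations, minus sign ignored
    return sum(abs_devs) / len(abs_devs)           # Step 3: mean of the absolute deviations


cappuccinos = [20, 40, 50, 60, 80]
print(mean_deviation(cappuccinos))   # 16.0
```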
Example: Find the mean, variance and standard deviation for the following data: 7, 15, 12, 17, 20, 14, 9.
$$\bar{x} = \frac{\sum x}{n} = \frac{7+15+12+17+20+14+9}{7} = \frac{94}{7} \approx 13.4$$
$$\text{Variance} = \frac{\sum (x - \bar{x})^2}{n} = \frac{(7-13.4)^2 + (15-13.4)^2 + (12-13.4)^2 + (17-13.4)^2 + (20-13.4)^2 + (14-13.4)^2 + (9-13.4)^2}{7}$$
$$= \frac{40.96 + 2.56 + 1.96 + 12.96 + 43.56 + 0.36 + 19.36}{7} = \frac{121.72}{7} \approx 17.4$$
$$SD = \sqrt{\text{Variance}} = \sqrt{17.4} \approx 4.2$$
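The calculation above divides the sum of squared deviations by n. Here is a minimal Python sketch of that divide-by-n variance. The helper name `population_variance` is hypothetical. The code keeps the unrounded mean (94/7), so its variance (about 17.39) differs slightly in the last digits from the hand calculation, which used a mean rounded to 13.4.

```python
import math


def population_variance(values):
    """Sum of squared deviations from the mean, divided by n."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


data = [7, 15, 12, 17, 20, 14, 9]
mean = sum(data) / len(data)
var = population_variance(data)
sd = math.sqrt(var)
print(round(mean, 2), round(var, 2), round(sd, 2))  # 13.43 17.39 4.17
```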
Find the mean and standard deviation of the following values: 78.2, 90.5, 98.1,
93.7 and 94.5. Organize the intermediate steps in a table.
$$\bar{x} = \frac{\sum x}{n} = \frac{78.2 + 90.5 + 98.1 + 93.7 + 94.5}{5} = \frac{455.0}{5} = 91$$
x      x̄    x – x̄    (x – x̄)²
78.2   91   –12.8    163.84
90.5   91   –0.5     0.25
98.1   91   7.1      50.41
93.7   91   2.7      7.29
94.5   91   3.5      12.25
Σ(x – x̄)² = 234.04
Find the standard deviation:
$$SD = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} = \sqrt{\frac{234.04}{5}} = \sqrt{46.808} \approx 6.8$$
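The table of deviations and squared deviations can be generated in Python. This is a sketch, and the column formatting is just one choice. As in the hand calculation, it divides the sum of squares by n = 5.

```python
import math

values = [78.2, 90.5, 98.1, 93.7, 94.5]
mean = sum(values) / len(values)

print(f"{'x':>6} {'x - mean':>9} {'(x - mean)^2':>13}")
ss = 0.0                                  # running sum of squared deviations
for x in values:
    d = x - mean
    ss += d * d
    print(f"{x:>6.1f} {d:>9.1f} {d * d:>13.2f}")

sd = math.sqrt(ss / len(values))          # divide by n, as in the example
print(f"mean = {mean:.1f}, sum of squares = {ss:.2f}, SD = {sd:.2f}")
```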
Mean and standard deviation of ungrouped data (frequency distribution): recovery times from shoulder injuries.
Time in weeks (x)   Frequency (f)   fx   fx²
1 5 5 5
2 8 16 32
3 12 36 108
4 19 76 304
5 7 35 175
6 4 24 144
7 3 21 147
8 2 16 128
Σf = 60   Σfx = 229   Σfx² = 1043
$$\text{Mean} = \bar{x} = \frac{\text{sum of weeks}}{\text{number of patients}} = \frac{\sum xf}{\sum f} = \frac{229}{60} \approx 3.82 \text{ weeks}$$
$$SD = \sqrt{\frac{(\sum f)(\sum fx^2) - (\sum fx)^2}{(\sum f)(\sum f - 1)}} = \sqrt{\frac{(60)(1043) - (229)^2}{(60)(60 - 1)}} = \sqrt{\frac{62580 - 52441}{3540}} = \sqrt{\frac{10139}{3540}} = \sqrt{2.8641} \approx 1.69 \text{ weeks}$$
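The shortcut formula above needs only three column totals: Σf, Σfx and Σfx². Here is a minimal Python sketch. The helper name `freq_mean_sd` is hypothetical, and the divisor (Σf)(Σf – 1) follows the formula in the notes.

```python
import math


def freq_mean_sd(xs, fs):
    """Mean and SD of a frequency table using the totals Σf, Σfx and Σfx²."""
    n = sum(fs)
    sum_fx = sum(f * x for x, f in zip(xs, fs))
    sum_fx2 = sum(f * x * x for x, f in zip(xs, fs))
    mean = sum_fx / n
    sd = math.sqrt((n * sum_fx2 - sum_fx ** 2) / (n * (n - 1)))
    return n, sum_fx, sum_fx2, mean, sd


weeks = [1, 2, 3, 4, 5, 6, 7, 8]
freq = [5, 8, 12, 19, 7, 4, 3, 2]
n, sfx, sfx2, mean, sd = freq_mean_sd(weeks, freq)
print(n, sfx, sfx2, round(mean, 2), round(sd, 2))  # 60 229 1043 3.82 1.69
```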
Find the variance and standard deviation for the following data.
No. of orders   Frequency (f)
10-12   4
13-15   12
16-18   20
19-21   14
Solution
No. of orders   Frequency (f)   Midpoint (x)   fx    fx²
10-12           4               11             44    484
13-15           12              14             168   2352
16-18           20              17             340   5780
19-21           14              20             280   5600
Σf = 50   Σfx = 832   Σfx² = 14216
$$SD = \sqrt{\frac{(\sum f)(\sum fx^2) - (\sum fx)^2}{(\sum f)(\sum f - 1)}} = \sqrt{\frac{(50)(14216) - (832)^2}{(50)(50 - 1)}} = \sqrt{\frac{710800 - 692224}{2450}} = \sqrt{\frac{18576}{2450}} = \sqrt{7.5820} \approx 2.75$$
Thus the variance of the number of orders received at the office of this mail-order company during the
past 50 days is about 7.58, and the standard deviation is about 2.75.
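For grouped data, each class is represented by its midpoint before the same shortcut formula is applied. The sketch below computes the midpoints from the class limits. The helper name `grouped_variance_sd` is hypothetical.

```python
import math


def grouped_variance_sd(classes, freqs):
    """Variance and SD of grouped data, using class midpoints as the x values."""
    mids = [(lower + upper) / 2 for lower, upper in classes]
    n = sum(freqs)
    sum_fx = sum(f * m for m, f in zip(mids, freqs))
    sum_fx2 = sum(f * m * m for m, f in zip(mids, freqs))
    var = (n * sum_fx2 - sum_fx ** 2) / (n * (n - 1))
    return var, math.sqrt(var)


orders = [(10, 12), (13, 15), (16, 18), (19, 21)]
days = [4, 12, 20, 14]
var, sd = grouped_variance_sd(orders, days)
print(round(var, 2), round(sd, 2))  # 7.58 2.75
```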
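Sample Variance
The sample variance is the sum of the squared deviations from the sample mean, divided by n – 1:
$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$
EXAMPLE – Sample Variance
The hourly wages for a sample of part-time employees at Home Depot are $12, $20,
$16, $18 and $19. What is the sample variance?
Solution: X̄ = (12 + 20 + 16 + 18 + 19)/5 = 85/5 = $17. The squared deviations are (12 – 17)² = 25,
(20 – 17)² = 9, (16 – 17)² = 1, (18 – 17)² = 1 and (19 – 17)² = 4, so Σ(X – X̄)² = 40 and
s² = 40/(5 – 1) = 10 (dollars squared).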
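The sample variance divides by n – 1 rather than n. This sketch computes it by hand and then compares the result with Python's standard-library `statistics.variance`, which also uses the n – 1 divisor.

```python
import statistics

wages = [12, 20, 16, 18, 19]
mean = sum(wages) / len(wages)                   # 17.0
ss = sum((w - mean) ** 2 for w in wages)         # 40.0
s2 = ss / (len(wages) - 1)                       # divide by n - 1, not n
print(mean, ss, s2)                              # 17.0 40.0 10.0
print(statistics.variance(wages) == s2)          # True
```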
Standard Deviation Grouped Data
Score   Frequency (f)   Midpoint (x)   fx   x²   fx²
41-45 1 43 43 1849 1849
36-40 5 38 190 1444 7220
31-35 10 33 330 1089 10890
26-30 12 28 336 784 9408
21-25 10 23 230 529 5290
16-20 5 18 90 324 1620
11-15 3 13 39 169 507
6-10 3 8 24 64 192
1-5 1 3 3 9 9
Σf = 50   Σfx = 1285   Σfx² = 36985
$$SD = \sqrt{\frac{(\sum f)(\sum fx^2) - (\sum fx)^2}{(\sum f)(\sum f - 1)}} = \sqrt{\frac{(50)(36985) - (1285)^2}{(50)(49)}} = \sqrt{\frac{1849250 - 1651225}{2450}} = \sqrt{\frac{198025}{2450}} = \sqrt{80.83} \approx 8.99$$
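With a long frequency table, a copying slip in one column total is easy to make. The sketch below recomputes Σf, Σfx and Σfx² from the class limits and frequencies in the score table before applying the SD formula. The list layout `(lower, upper, frequency)` is an assumption made for illustration.

```python
import math

# (lower limit, upper limit, frequency) for each score class in the table
score_classes = [
    (41, 45, 1), (36, 40, 5), (31, 35, 10), (26, 30, 12), (21, 25, 10),
    (16, 20, 5), (11, 15, 3), (6, 10, 3), (1, 5, 1),
]

n = sum(f for _, _, f in score_classes)
sum_fx = sum(f * (lo + hi) / 2 for lo, hi, f in score_classes)
sum_fx2 = sum(f * ((lo + hi) / 2) ** 2 for lo, hi, f in score_classes)
sd = math.sqrt((n * sum_fx2 - sum_fx ** 2) / (n * (n - 1)))
print(n, sum_fx, sum_fx2, round(sd, 2))  # 50 1285.0 36985.0 8.99
```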
Standard Deviation Grouped Data
Grade    Frequency (f)   Midpoint (m)   fm       x̄      m – x̄    (m – x̄)²   f(m – x̄)²
50-59    3               54.5           163.5    79.2   –24.7    610.09     1830.27
60-69    5               64.5           322.5    79.2   –14.7    216.09     1080.45
70-79    9               74.5           670.5    79.2   –4.7     22.09      198.81
80-89    12              84.5           1014     79.2   5.3      28.09      337.08
90-100   8               95             760      79.2   15.8     249.64     1997.12
Σf = 37   Σfm = 2930.5   Σf(m – x̄)² = 5443.73
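$$\bar{x} = \frac{\sum fm}{n} = \frac{2930.5}{37} \approx 79.2$$
$$s = \sqrt{s^2} = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}; \quad \text{for grouped data, } s = \sqrt{\frac{\sum f(m - \bar{x})^2}{n - 1}}$$
$$s = \sqrt{\frac{5443.73}{37 - 1}} = \sqrt{\frac{5443.73}{36}} = \sqrt{151.21} \approx 12.3$$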
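The grade example uses the definitional method: subtract the mean from each midpoint, square, weight by frequency, sum, and divide by n – 1. Here is a minimal Python sketch of those steps using the table's midpoints and frequencies.

```python
import math

# (class midpoint m, frequency f) for each grade class
grades = [(54.5, 3), (64.5, 5), (74.5, 9), (84.5, 12), (95.0, 8)]

n = sum(f for _, f in grades)
mean = sum(f * m for m, f in grades) / n
ss = sum(f * (m - mean) ** 2 for m, f in grades)   # sum of f(m - mean)^2
s = math.sqrt(ss / (n - 1))                          # sample SD: divide by n - 1
print(round(mean, 1), round(ss, 2), round(s, 1))     # 79.2 5443.73 12.3
```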
Summary of the calculation procedures:
1) Subtract the mean from each score.
2) Square each result.
3) Sum all the squares.
4) Divide the sum of squares by N to get the variance (when the data are a whole population).
If you divide the sum of squares by n – 1 instead, you get the sample variance, which is the usual estimate of the population variance from a sample.
5) The standard deviation is the positive square root of the variance.
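Step 4 of the summary allows two divisors. Python's standard-library `statistics` module offers both: `pvariance` divides by N and `variance` divides by n – 1. The sketch below applies both to the seven-value example from earlier and checks that they differ by the factor N/(N – 1).

```python
import statistics

data = [7, 15, 12, 17, 20, 14, 9]
N = len(data)

pop_var = statistics.pvariance(data)   # sum of squares / N
samp_var = statistics.variance(data)   # sum of squares / (N - 1)

print(round(pop_var, 2), round(samp_var, 2))          # 17.39 20.29
print(abs(samp_var - pop_var * N / (N - 1)) < 1e-9)   # True
```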
Grouped data: the weights in kilograms recorded by final-year students
are as follows. Calculate the mean, mean deviation, variance and standard
deviation.
W (kg)   f   Midpoint (x)   fx   |x – x̄|   f|x – x̄|   f(x – x̄)²
54-57 5 55.5 277.5 11.44 57.20 654.368
58-61 7 59.5 416.5 7.44 52.08 387.4752
62-65 10 63.5 635 3.44 34.40 118.336
66-69 12 67.5 810 0.56 6.72 3.7632
70-73 6 71.5 429 4.56 27.36 124.7616
74-77 5 75.5 377.5 8.56 42.80 366.368
78-81 4 79.5 318 12.56 50.24 631.0144
82-85 1 83.5 83.5 16.56 16.56 274.2336
Σf = 50   Σfx = 3347   Σf|x – x̄| = 287.36   Σf(x – x̄)² = 2560.32
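Solution:
Step one:
$$\text{Mean} = \bar{x} = \frac{\sum fx}{n} = \frac{3347}{50} = 66.94 \text{ kg}$$
Step two:
$$\text{Mean Deviation} = \frac{\sum f|x - \bar{x}|}{n} = \frac{287.36}{50} = 5.7472$$
$$\text{Variance} = \frac{\sum f(x - \bar{x})^2}{n} = \frac{2560.32}{50} = 51.2064$$
$$SD = \sqrt{51.2064} \approx 7.156$$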
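All four results for the weight data can be computed from the class limits and frequencies in one pass. This sketch follows the example in dividing the variance by n rather than n – 1. The `(lower, upper, frequency)` tuple layout is an assumption for illustration.

```python
import math

# (lower limit, upper limit, frequency) for each weight class in kg
weights = [
    (54, 57, 5), (58, 61, 7), (62, 65, 10), (66, 69, 12),
    (70, 73, 6), (74, 77, 5), (78, 81, 4), (82, 85, 1),
]

n = sum(f for _, _, f in weights)
mids = [(lo + hi) / 2 for lo, hi, _ in weights]
freqs = [f for _, _, f in weights]

mean = sum(f * x for x, f in zip(mids, freqs)) / n
mean_dev = sum(f * abs(x - mean) for x, f in zip(mids, freqs)) / n
variance = sum(f * (x - mean) ** 2 for x, f in zip(mids, freqs)) / n  # divides by n, as in the example
sd = math.sqrt(variance)
print(round(mean, 2), round(mean_dev, 4), round(variance, 4), round(sd, 3))
# 66.94 5.7472 51.2064 7.156
```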
Standard Deviation of Grouped Data
Standard Deviation of Grouped Data – Example
Refer to the frequency distribution for the Whitner Autoplex data used earlier. Compute the standard
deviation of the vehicle selling prices.
Ch4: Regression Analysis
4.1 Linear model assumptions
4.2 Simple linear regression
4.3 Multiple linear regression
4.4 Problems for Regression Analysis
(i) Regression equation of X on Y
(ii) Regression coefficient of Y on X
(iii) Regression equation of Y on X
Regression Analysis
Regression analysis is a set of statistical methods used to estimate the
relationships between a dependent variable and one or more independent variables. It
can be used to assess the strength of the relationship between variables and to
model the future relationship between them.
Regression analysis includes several variations, such as linear, multiple linear, and
nonlinear. The most common models are simple linear and multiple linear. Nonlinear
regression analysis is commonly used for more complicated data sets in which the
dependent and independent variables show a nonlinear relationship.
Regression analysis has numerous applications in various disciplines,
including finance.
Regression Analysis – Linear model assumptions
Linear regression analysis is based on six fundamental assumptions:
1. The dependent and independent variables have a linear relationship (described by an intercept
and a slope).
2. The independent variable is not random.
3. The expected value (mean) of the residual (error) is zero.
4. The variance of the residual (error) is constant across all observations.
5. The residuals (errors) are not correlated with one another across observations.
6. The residual (error) values follow the normal distribution.
Regression Analysis – Simple linear regression
Simple linear regression is a model that assesses the relationship between a
dependent variable and an independent variable. The simple linear model is
expressed using the following equation:
Y = a + bX + ϵ
Where:
1) Y – Dependent variable
2) X – Independent (explanatory) variable
3) a – Intercept
4) b – Slope
5) ϵ – Residual (error)
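The notes do not show how a and b are obtained. The usual choice is ordinary least squares: b = Σ(x – x̄)(y – ȳ)/Σ(x – x̄)² and a = ȳ – b·x̄, so that the line passes through the point of means. Below is a minimal sketch under that assumption. The helper name `fit_line` and the noise-free test data are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for Y = a + bX."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx              # slope
    a = y_bar - b * x_bar        # intercept: the line passes through (x_bar, y_bar)
    return a, b


xs = [1, 2, 3, 4, 5]
ys = [2 + 3 * x for x in xs]     # hypothetical points lying exactly on Y = 2 + 3X
a, b = fit_line(xs, ys)
print(a, b)                      # 2.0 3.0
```

With real data the points do not lie exactly on a line. The differences Y – (a + bX) are then the residuals ϵ.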
Regression Analysis – Multiple linear regression
Multiple linear regression analysis is essentially similar to the simple linear model,
with the exception that multiple independent variables are used in the model. The
mathematical representation of multiple linear regression is:
Y = a + bX1 + cX2 + dX3 + ϵ
Where:
1) Y – Dependent variable
2) X1, X2, X3 – Independent (explanatory) variables
3) a – Intercept
4) b, c, d – Slopes
5) ϵ – Residual (error)
Multiple linear regression follows the same conditions as the simple linear model.
However, since there are several independent variables in multiple linear analysis,
there is another mandatory condition for the model:
Non-collinearity: Independent variables should show a minimum of correlation with
each other. If the independent variables are highly correlated with each other, it will
be difficult to assess the true relationships between the dependent and independent
variables.
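As a sketch of how the intercept a and slopes b, c could be estimated, the code below solves the least-squares normal equations (X′X)β = X′Y by Gaussian elimination in pure Python. The helper names and the noise-free data generated from Y = 1 + 2X1 + 3X2 are hypothetical. In practice a statistics library would be used. If two predictors were perfectly collinear, X′X would be singular and the elimination would break down. This is the practical face of the non-collinearity condition.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))  # largest pivot
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                      # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def fit_multiple(rows, y):
    """Least-squares [a, b, c, ...] for Y = a + b*X1 + c*X2 + ..."""
    X = [[1.0] + list(row) for row in rows]             # leading 1 for the intercept a
    k = len(X[0])
    XtX = [[sum(xi[p] * xi[q] for xi in X) for q in range(k)] for p in range(k)]
    Xty = [sum(xi[p] * yi for xi, yi in zip(X, y)) for p in range(k)]
    return solve_linear_system(XtX, Xty)                # normal equations


rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (1, 3)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]            # noise-free Y = 1 + 2*X1 + 3*X2
a, b, c = fit_multiple(rows, y)
print(round(a, 6), round(b, 6), round(c, 6))
```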
Problems for Regression Analysis
Example1: Calculate the regression coefficient and obtain the lines of regression for
the following data
Solution:
Regression coefficient of X on Y
(i) Regression equation of X on Y
(ii) Regression coefficient of Y on X
(iii) Regression equation of Y on X
Y – 11 = 0.929(X – 4)
Y = 0.929X – 3.716 + 11
Y = 0.929X + 7.284
The regression equation of Y on X is Y = 0.929X + 7.284.
Example2: Calculate the two regression equations, of X on Y and of Y on X, from the data
given below, taking deviations from the actual means of X and Y.
Estimate the likely demand when the price is 20 Somali Shillings.
Solution:
Calculation of Regression equation
(i) Regression equation of X on Y
(ii) Regression Equation of Y on X
When X is 20, Y will be
Y = –0.25(20) + 44.25
= –5 + 44.25
= 39.25
When the price is 20 Somali Shillings, the likely demand is 39.25.
Example3: Obtain the regression equation of Y on X and estimate Y when X = 55 from the
following data:
Solution:
(i) Regression coefficients of Y on X
(ii) Regression equation of Y on X
Y – 51.57 = 0.942(X – 48.29)
Y = 0.942X – 45.49 + 51.57
Y = 0.942X + 6.08
The regression equation of Y on X is Y = 0.942X + 6.08.
Estimation of Y when X = 55:
Y = 0.942(55) + 6.08 = 57.89
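Each worked answer rewrites the point-slope form Y – Ȳ = byx(X – X̄) as Y = slope·X + intercept. This sketch does that rewrite and reproduces the estimate for Example3. The helper name `line_through_means` is hypothetical.

```python
def line_through_means(x_bar, y_bar, b_yx):
    """Rewrite Y - y_bar = b_yx (X - x_bar) as Y = slope * X + intercept."""
    return b_yx, y_bar - b_yx * x_bar


slope, intercept = line_through_means(48.29, 51.57, 0.942)   # Example3
print(round(intercept, 2))                  # 6.08
print(round(slope * 55 + intercept, 2))     # 57.89
```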
Chapter 5: Correlation Analysis
5.1 Types of correlation coefficient formulas
5.2 What is Pearson Correlation?
5.3 Potential problems with Pearson correlation
Correlation Analysis
Correlation is a statistical measure that expresses the extent to which two
variables are linearly related (meaning they change together at a constant rate). It's a
common tool for describing simple relationships without making a statement about
cause and effect.
Example 1: An agricultural research organization tested whether increased use of a particular chemical
fertilizer would lead to a corresponding increase in the food supply.
X 2 1 3 2 4 5 3
Y 4 3 4 3 6 5 5
Solution:
X Y XY X2 Y2
2 4
1 3
3 4
2 3
4 6
5 5
3 5
Σ= Σ= Σ= Σ= Σ=
Example 2: The table below shows the time in hours spent studying (x) by six grade-11
students and their scores on a test (y). Find Pearson's product-moment correlation
coefficient.
X 1 2 3 4 5 6
Y 5 10 15 15 25 35
Solution
X Y XY X2 Y2
1 5
2 10
3 15
4 15
5 25
6 35
Σ=21 Σ=105 Σ= Σ= Σ=
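Use the following correlation coefficient formula.
$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}$$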
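The formula needs only the five column totals Σx, Σy, Σxy, Σx² and Σy². This sketch computes them and r for the fertilizer data of Example 1. The helper name `pearson_r` is hypothetical. For these data the totals are Σx = 20, Σy = 30, Σxy = 93, Σx² = 68 and Σy² = 136.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r from the column totals Σx, Σy, Σxy, Σx², Σy²."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Example 1 (fertilizer data)
x = [2, 1, 3, 2, 4, 5, 3]
y = [4, 3, 4, 3, 6, 5, 5]
print(round(pearson_r(x, y), 3))   # 0.811
```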
Correlation coefficients are used to measure how strong a relationship is between two variables.
There are several types of correlation coefficient, but the most popular is Pearson’s. Pearson’s
correlation (also called Pearson’s R) is a correlation coefficient commonly used in linear
regression. If you’re starting out in statistics, you’ll probably learn about Pearson’s R first. In fact,
when anyone refers to the correlation coefficient, they are usually talking about Pearson’s.
Correlation Coefficient Formula: Definition
Correlation coefficient formulas are used to find how strong a relationship is between data. The
formulas return a value between -1 and 1, where:
1) 1 indicates a perfect positive linear relationship.
2) -1 indicates a perfect negative linear relationship.
3) A result of zero indicates no linear relationship at all.
Graphs showing a correlation of -1, 0 and +1
Eng: Cali Siidow Osmaan
Meaning
1) A correlation coefficient of 1 means that for every positive increase in one variable, there is
a positive increase of a fixed proportion in the other. For example, shoe sizes go up in
(almost) perfect correlation with foot length.
2) A correlation coefficient of -1 means that for every positive increase in one variable, there is
a decrease of a fixed proportion in the other. For example, the amount of gas in a
tank decreases in (almost) perfect correlation with the distance driven.
3) Zero means that an increase in one variable is not accompanied by any consistent linear increase
or decrease in the other. The two just aren't linearly related.
The absolute value of the correlation coefficient gives the strength of the relationship: the larger the
absolute value, the stronger the relationship. For example, |-0.75| = 0.75, so r = -0.75 indicates a
stronger relationship than r = 0.65.
Types of correlation coefficient formulas
There are several types of correlation coefficient formulas.
One of the most commonly used formulas is Pearson's correlation coefficient
formula. If you're taking a basic stats class, this is the one you'll probably use:
$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}$$
Two other formulas are commonly used: the sample correlation coefficient and the
population correlation coefficient.
Sample correlation coefficient
$$r_{xy} = \frac{s_{xy}}{s_x s_y}$$
Here sx and sy are the sample standard deviations, and sxy is the sample covariance.
Population correlation coefficient
$$\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$$
The population correlation coefficient uses σx and σy as the population standard
deviations, and σxy as the population covariance.
What is Pearson Correlation?
Correlation between sets of data is a measure of how well they are related. The most
common measure of correlation in stats is the Pearson Correlation. The full name is
the Pearson Product Moment Correlation (PPMC). It shows the linear
relationship between two sets of data. In simple terms, it answers the question: can I
draw a straight line to represent the data? Two letters are used to represent the
Pearson correlation: Greek letter rho (ρ) for a population and the letter “r” for a
sample.
Potential problems with Pearson correlation
The PPMC is not able to tell the difference between dependent
variables and independent variables. For example, if you are trying to find the
correlation between a high calorie diet and diabetes, you might find a high correlation
of .8. However, you could also get the same result with the variables switched
around. In other words, you could say that diabetes causes a high calorie diet. That
obviously makes no sense. Therefore, as a researcher you have to be aware of the
data you are plugging in. In addition, the PPMC will not give you any information
about the slope of the line; it only tells you whether there is a relationship.
Real Life Example
Pearson correlation is used in thousands of real life situations. For example, scientists in
China wanted to know if there was a relationship between how weedy rice populations are
different genetically. The goal was to find out the evolutionary potential of the rice.
Pearson’s correlation between the two groups was analyzed. It showed a positive Pearson
Product Moment correlation of between 0.783 and 0.895 for weedy rice populations. This
figure is quite high, which suggested a fairly strong relationship.
How to Find Pearson’s Correlation Coefficients
By Hand
Example question: Find the value of the correlation coefficient from the following
table:
SUBJECT AGE X GLUCOSE LEVEL Y
1 43 99
2 21 65
3 25 79
4 42 75
5 57 87
6 59 81
Step 1: Make a chart. Use the given data, and add three more columns: xy, x² and
y².
SUBJECT   AGE (X)   GLUCOSE LEVEL (Y)   XY   X²   Y²
1 43 99
2 21 65
3 25 79
4 42 75
5 57 87
6 59 81
Step 2: Multiply x and y together to fill the xy column. For example, row 1 would be
43 × 99 = 4,257.
SUBJECT   AGE (X)   GLUCOSE LEVEL (Y)   XY   X²   Y²
1 43 99 4257
2 21 65 1365
3 25 79 1975
4 42 75 3150
5 57 87 4959
6 59 81 4779
Step 3: Take the square of the numbers in the x column, and put the result in the
x² column.
SUBJECT   AGE (X)   GLUCOSE LEVEL (Y)   XY   X²   Y²
1 43 99 4257 1849
2 21 65 1365 441
3 25 79 1975 625
4 42 75 3150 1764
5 57 87 4959 3249
6 59 81 4779 3481
Step 4: Take the square of the numbers in the y column, and put the result in the
y² column.
SUBJECT   AGE (X)   GLUCOSE LEVEL (Y)   XY   X²   Y²
1 43 99 4257 1849 9801
2 21 65 1365 441 4225
3 25 79 1975 625 6241
4 42 75 3150 1764 5625
5 57 87 4959 3249 7569
6 59 81 4779 3481 6561
Step 5: Add up all of the numbers in the columns and put the result at the bottom of
the column. The Greek letter sigma (Σ) is a short way of saying “sum of”
or summation.
SUBJECT   AGE (X)   GLUCOSE LEVEL (Y)   XY   X²   Y²
1 43 99 4257 1849 9801
2 21 65 1365 441 4225
3 25 79 1975 625 6241
4 42 75 3150 1764 5625
5 57 87 4959 3249 7569
6 59 81 4779 3481 6561
Σ 247 486 20485 11409 40022
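Step 6: Use the following correlation coefficient formula.
$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}$$
From our table:
Σx = 247
Σy = 486
Σxy = 20,485
Σx² = 11,409
Σy² = 40,022
n is the sample size, in our case n = 6.
The correlation coefficient is
$$r = \frac{6(20485) - (247)(486)}{\sqrt{\left[6(11409) - 247^2\right]\left[6(40022) - 486^2\right]}} = \frac{122910 - 120042}{\sqrt{(68454 - 61009)(240132 - 236196)}} = \frac{2868}{\sqrt{7445 \times 3936}} = \frac{2868}{5413.27} \approx 0.5298$$
The range of the correlation coefficient is from -1 to 1. Our result, r ≈ 0.53, means that the
variables have a moderate positive correlation.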
Example4: Find the means of X and Y variables and the coefficient of correlation
between them from the following two regression equations:
2Y–X–50 = 0
3Y–2X–10 = 0.
Given:
2y - x - 50 = 0
3y - 2x - 10 = 0
To find:
Mean of the variables X and Y
Correlation coefficient
Solution:
2y - x - 50 = 0
2y - x = 50 (i)
3y - 2x - 10 = 0
3y - 2x = 10 (ii)
Solving equations (i) and (ii) simultaneously, multiply (i) by 2:
4y - 2x = 100
3y - 2x = 10
Subtracting the second equation from the first:
y = 90
Putting the value of y in equation (i):
2y - x = 50
2(90) - x = 50
180 - x = 50
x = 180 - 50
x = 130
So the means are X̄ = 130 and Ȳ = 90.
Take equation (i) as the regression equation of Y on X:
2y - x = 50
2y = x + 50
So y = 0.5x + 25, and byx = 0.5
Take equation (ii) as the regression equation of X on Y:
3y - 2x = 10
2x = 3y - 10
So x = 1.5y - 5, and bxy = 1.5
Since both regression coefficients are positive, r = √(byx × bxy) = √(0.5 × 1.5) = √0.75
r = 0.866
So, correlation coefficient is 0.866
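The procedure in Example4 has two parts. The means are the intersection of the two regression lines, and r = ±√(byx × bxy) takes the sign of the coefficients. This sketch writes each line as a tuple (a, b, c) meaning aX + bY + c = 0, which is a representation chosen here for illustration. It solves for the intersection with Cramer's rule and checks both Example4 and Example5. The helper names are hypothetical.

```python
import math


def means_from_lines(line1, line2):
    """Each line is (a, b, c) meaning a*X + b*Y + c = 0; return the intersection (X̄, Ȳ)."""
    a1, b1, c1 = line1
    a2, b2, c2 = line2
    det = a1 * b2 - a2 * b1
    x_bar = (b1 * c2 - b2 * c1) / det   # Cramer's rule
    y_bar = (a2 * c1 - a1 * c2) / det
    return x_bar, y_bar


def r_from_lines(y_on_x, x_on_y):
    """Regression coefficients and r, given which line is Y on X and which is X on Y."""
    b_yx = -y_on_x[0] / y_on_x[1]       # slope dY/dX of the Y-on-X line
    b_xy = -x_on_y[1] / x_on_y[0]       # slope dX/dY of the X-on-Y line
    r = math.copysign(math.sqrt(b_yx * b_xy), b_yx)  # r takes the sign of the coefficients
    return b_yx, b_xy, r


# Example4:  2Y - X - 50 = 0  and  3Y - 2X - 10 = 0
ex4_1, ex4_2 = (-1, 2, -50), (-2, 3, -10)
print(means_from_lines(ex4_1, ex4_2))   # (130.0, 90.0)
print(r_from_lines(ex4_1, ex4_2))       # byx = 0.5, bxy = 1.5, r ≈ 0.866

# Example5:  4X - 5Y + 33 = 0  and  20X - 9Y - 107 = 0
ex5_1, ex5_2 = (4, -5, 33), (20, -9, -107)
print(means_from_lines(ex5_1, ex5_2))   # (13.0, 17.0)
print(r_from_lines(ex5_1, ex5_2))       # byx = 0.8, bxy = 0.45, r ≈ 0.6
```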
Example5: Find the means of X and Y variables and the coefficient of correlation
between them from the following two regression equations:
4X–5Y+33 = 0
20X–9Y–107 = 0
Solution:
To get the mean values we must solve the given lines simultaneously.
4X – 5Y = -33 … (1)
20X – 9Y = 107 … (2)
(1) × 5 ⇒ 20X – 25Y = -165
20X – 9Y = 107
Subtracting (2) from (1) × 5: -16Y = -272
Y = 272/16 = 17, i.e., Ȳ = 17
Using Y = 17 in (1),
we get
4X – 85 = -33
4X = 52
X = 13, i.e., X̄ = 13
The mean values are X̄ = 13 and Ȳ = 17.
Let the regression line of Y on X be
4X – 5Y + 33 = 0
5Y = 4X + 33
Y = (1/5)(4X + 33)
Y = (4/5)X + 33/5 = 0.8X + 6.6
∴ byx = 0.8
Let the regression line of X on Y be
20X – 9Y – 107 = 0
20X = 9Y + 107, so X = (1/20)(9Y + 107)
X = (9/20)Y + 107/20
X = 0.45Y + 5.35
∴ bxy = 0.45
The coefficient of correlation between X and Y is r = ±√(byx × bxy) = ±√(0.8 × 0.45) = ±√0.36 = ±0.6.
Since both byx and bxy are positive, we take the positive sign: r = 0.6.
Example6
The following table shows the sales and advertisement expenditure of a firm.
Coefficient of correlation r = 0.9. Estimate the likely sales for a proposed advertisement
expenditure of 10 crore Somali Shillings.
Solution:
When the advertisement expenditure is 10 crores, i.e., Y = 10, the sales are X = 6(10) + 4 = 64,
so the likely sales are 64.
Example7
There are two series of index numbers, P for the price index and S for the stock of the
commodity. The mean and standard deviation of P are 100 and 8, and those of S are 103 and
4, respectively. The correlation coefficient between the two series is 0.4. With these
data, obtain the regression lines of P on S and S on P.
Solution:
Let X denote the price P and Y the stock S. Then the mean and SD of P are
X̄ = 100 and σx = 8, and the mean and SD of S are Ȳ = 103 and σy = 4. The correlation
coefficient between the series is r(X, Y) = 0.4.
Let the regression line of X on Y be
X – X̄ = r(σx/σy)(Y – Ȳ)
X – 100 = 0.4(8/4)(Y – 103) = 0.8(Y – 103)
X = 0.8Y + 17.6, i.e., P = 0.8S + 17.6
The regression line of Y on X is
Y – Ȳ = r(σy/σx)(X – X̄)
Y – 103 = 0.4(4/8)(X – 100) = 0.2(X – 100)
Y = 0.2X + 83, i.e., S = 0.2P + 83
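When only the means, standard deviations and r are known, the regression coefficients are bxy = r·σx/σy and byx = r·σy/σx. Each line then passes through the point of means. This sketch applies those formulas to Example7. The helper name `regression_lines` is hypothetical.

```python
def regression_lines(x_bar, y_bar, sd_x, sd_y, r):
    """Return (slope, intercept) for X on Y and for Y on X from means, SDs and r."""
    b_xy = r * sd_x / sd_y                 # regression coefficient of X on Y
    b_yx = r * sd_y / sd_x                 # regression coefficient of Y on X
    x_on_y = (b_xy, x_bar - b_xy * y_bar)  # X = b_xy * Y + intercept
    y_on_x = (b_yx, y_bar - b_yx * x_bar)  # Y = b_yx * X + intercept
    return x_on_y, y_on_x


# Example7: X = price index P, Y = stock S
p_on_s, s_on_p = regression_lines(x_bar=100, y_bar=103, sd_x=8, sd_y=4, r=0.4)
print([round(v, 2) for v in p_on_s])   # [0.8, 17.6]
print([round(v, 2) for v in s_on_p])   # [0.2, 83.0]
```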
Example8
For 5 pairs of observations the following results are obtained: ∑X = 15, ∑Y = 25, ∑X²
= 55, ∑Y² = 135, ∑XY = 83. Find the equations of the lines of regression and estimate the
value of X on the first line when Y = 12 and the value of Y on the second line when X = 8.
Solution:
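X̄ = ∑X/n = 15/5 = 3, Ȳ = ∑Y/n = 25/5 = 5
bxy = (n∑XY – ∑X∑Y)/(n∑Y² – (∑Y)²) = (5(83) – 15(25))/(5(135) – 25²) = 40/50 = 0.8
byx = (n∑XY – ∑X∑Y)/(n∑X² – (∑X)²) = (5(83) – 15(25))/(5(55) – 15²) = 40/50 = 0.8
First line (X on Y): X – 3 = 0.8(Y – 5), so X = 0.8Y – 1
When Y = 12 the value of X is estimated as X = 0.8(12) – 1 = 8.6
Second line (Y on X): Y – 5 = 0.8(X – 3), so Y = 0.8X + 2.6
When X = 8 the value of Y is estimated as
Y = 0.8(8) + 2.6
= 9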
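Example8 works entirely from column totals. This sketch computes the means and both regression coefficients from n, ΣX, ΣY, ΣX², ΣY² and ΣXY, then evaluates the two estimates. The helper name `lines_from_sums` is hypothetical.

```python
def lines_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Means and both regression coefficients from the usual column totals."""
    x_bar, y_bar = sum_x / n, sum_y / n
    s_xy = n * sum_xy - sum_x * sum_y
    b_yx = s_xy / (n * sum_x2 - sum_x ** 2)   # Y on X
    b_xy = s_xy / (n * sum_y2 - sum_y ** 2)   # X on Y
    return x_bar, y_bar, b_yx, b_xy


x_bar, y_bar, b_yx, b_xy = lines_from_sums(5, 15, 25, 55, 135, 83)
print(x_bar, y_bar, b_yx, b_xy)             # 3.0 5.0 0.8 0.8
print(y_bar + b_yx * (8 - x_bar))           # Y on X at X = 8
print(x_bar + b_xy * (12 - y_bar))          # X on Y at Y = 12
```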
Example9
The two regression lines are 3X+2Y=26 and 6X+3Y=31. Find the correlation
coefficient.
Solution:
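Let the regression equation of Y on X be
3X + 2Y = 26
2Y = 26 – 3X, so Y = 13 – 1.5X and byx = –1.5
Let the regression equation of X on Y be
6X + 3Y = 31
6X = 31 – 3Y, so X = 31/6 – 0.5Y and bxy = –0.5
Both regression coefficients are negative, so r takes the negative sign:
r = –√(byx × bxy) = –√(1.5 × 0.5) = –√0.75 = –0.866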
Example10
In a laboratory experiment in a correlation research study, the equations of the two
regression lines were found to be 2X – Y + 1 = 0 and 3X – 2Y + 7 = 0. Find the means
of X and Y. Also work out the values of the regression coefficients and the correlation
between the two variables X and Y.
Solution:
Solving the two regression equations, we get the mean values of X and Y:
2X – Y = –1 ... (1)
3X – 2Y = –7 ... (2)
(1) × 2 ⇒ 4X – 2Y = –2; subtracting (2) gives X = 5, and then Y = 2(5) + 1 = 11.
So X̄ = 5 and Ȳ = 11.
Taking (1) as the line of Y on X would give byx = 2 and, from (2), bxy = 2/3, whose product 4/3
exceeds 1, which is impossible. So take (2) as the line of Y on X: Y = 1.5X + 3.5, byx = 1.5;
and (1) as the line of X on Y: X = 0.5Y – 0.5, bxy = 0.5.
r = √(byx × bxy) = √(1.5 × 0.5) = √0.75 = 0.866
Example11
For the given lines of regression 3X – 2Y = 5 and X – 4Y = 7, find
(i) Regression coefficients
(ii) Coefficient of correlation
Solution:
(i) First convert the given equations Y on X and X on Y in standard form and find their
regression coefficients respectively.
Given regression lines are
3X–2Y = 5 ... (1)
X–4Y = 7 ... (2)
Let the line of regression of X on Y be
3X – 2Y = 5
3X = 2Y + 5
X = (2/3)Y + 5/3, so bxy = 2/3 ≈ 0.667
Let the line of regression of Y on X be
X – 4Y = 7
4Y = X – 7
Y = (1/4)X – 7/4, so byx = 1/4 = 0.25
(ii) Coefficient of correlation
Since the two regression coefficients are positive, the correlation coefficient is also
positive and it is given by
r = √(bxy × byx) = √((2/3) × (1/4)) = √(1/6) ≈ 0.408
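In the examples above, the lines must be assigned as Y on X and X on Y so that byx × bxy = r² lies between 0 and 1. This sketch tries both assignments and keeps the first one that passes that check. Each line is written as a tuple (a, b, c) meaning aX + bY + c = 0, which is a representation chosen here for illustration. The helper name `assign_lines` is hypothetical. The check is a necessary condition only. If both assignments happen to pass, the sketch simply returns the first.

```python
import math


def assign_lines(line1, line2):
    """Decide which line is Y on X using the check 0 <= byx * bxy <= 1.

    Each line is (a, b, c) meaning a*X + b*Y + c = 0.
    """
    for y_on_x, x_on_y in ((line1, line2), (line2, line1)):
        b_yx = -y_on_x[0] / y_on_x[1]
        b_xy = -x_on_y[1] / x_on_y[0]
        product = b_yx * b_xy
        if 0 <= product <= 1:            # r^2 = byx * bxy must lie in [0, 1]
            r = math.copysign(math.sqrt(product), b_yx)
            return b_yx, b_xy, r
    raise ValueError("neither assignment gives a valid pair of regression lines")


print(assign_lines((3, 2, -26), (6, 3, -31)))   # Example9: r ≈ -0.866
print(assign_lines((2, -1, 1), (3, -2, 7)))     # 2X - Y + 1 = 0, 3X - 2Y + 7 = 0: r ≈ 0.866
print(assign_lines((3, -2, -5), (1, -4, -7)))   # 3X - 2Y = 5, X - 4Y = 7: r ≈ 0.408
```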
Exercise1
1. From the data given below,
find (a) the two regression equations, (b) the coefficient of correlation between
marks in Economics and Statistics, and (c) the most likely marks in Statistics when the
marks in Economics are 30.
2. The heights (in cm.) of a group of fathers and sons are given below
Find the lines of regression and estimate the height of son when the height of the father
is 164 cm.
3. The following data give the height in inches (X) and the weight in lb. (Y) of a random
sample of 10 students from a large group of students of age 17 years:
Estimate weight of the student of a height 69 inches.
4. Obtain the two regression lines from the following data: N=20, ∑X=80, ∑Y=40,
∑X²=1680, ∑Y²=320 and ∑XY=480
5. Given the following data, what will be the possible yield when the rainfall is 29″?
The coefficient of correlation between rainfall and production is 0.8.
6. The following data relate to advertisement expenditure (in lakhs of rupees) and the
corresponding sales (in crores of rupees).
Estimate the sales corresponding to an advertising expenditure of Rs. 30 lakh.
7. You are given the following data:
If the Correlation coefficient between X and Y is 0.66, then find (i) the two regression
coefficients, (ii) the most likely value of Y when X=10
8. Find the equation of the regression line of Y on X, if the observations ( Xi, Yi) are the
following (1,4) (2,8) (3,2) ( 4,12) ( 5, 10) ( 6, 14) ( 7, 16) ( 8, 6) (9, 18)
9. A survey was conducted to study the relationship between expenditure on
accommodation (X) and expenditure on Food and Entertainment (Y) and the following
results were obtained:
Write down the regression equation and estimate the expenditure on Food and
Entertainment, if the expenditure on accommodation is Rs. 200.
10. For 5 observations of pairs (X, Y) of variables X and Y the following results are
obtained: ∑X=15, ∑Y=25, ∑X²=55, ∑Y²=135, ∑XY=83. Find the equations of the lines
of regression and estimate the value of X when Y = 8 and the value of Y when X = 12.
11. The two regression lines were found to be 4X–5Y+33=0 and 20X–9Y–107=0. Find
the mean values and coefficient of correlation between X and Y.
12. The equations of two lines of regression obtained in a correlation analysis are the
following 2X=8–3Y and 2Y=5–X. Obtain the value of the regression coefficients and
correlation coefficient.
Ch6: Probability
Empirical Probability
The Addition Rules for Probability
The Multiplication Rules and Conditional Probability
Conditional Probability
Empirical probability, also known as experimental probability,
is a probability based on historical (observed) data: it estimates the
likelihood of an event from how often the event has occurred. For an
event E with frequency f in n observations, P(E) = f/n.
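The rule P(E) = f/n can be applied to an observed frequency distribution. The sketch below uses the blood-type sample from Example2 below. Fractions keep the answers exact. The helper `p` is hypothetical. It assumes the listed categories are mutually exclusive, so their frequencies can be added.

```python
from fractions import Fraction

# Observed frequencies of each blood type in the sample of 50 people
blood = {"O": 21, "A": 22, "B": 5, "AB": 2}
n = sum(blood.values())


def p(*types):
    """Empirical probability P(E) = f / n, where E is 'one of the given types'."""
    return Fraction(sum(blood[t] for t in types), n)


print(p("O"))            # 21/50
print(p("A", "B"))       # 27/50
print(p("B", "AB"))      # 7/50  (neither A nor O)
print(1 - p("AB"))       # 24/25 (= 48/50)
```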
Example1: In the travel survey just described, find the
probability that a person will travel by airplane over the
Thanksgiving holiday.
Solution:
$$P(E) = \frac{f}{n} = \frac{6}{50} = \frac{3}{25}$$
Example2: In a sample of 50 people, 21
had type O blood, 22 had type A blood, 5 had
type B blood, and 2 had type AB blood. Set up
a frequency distribution and find the following
probabilities.
a. A person has type O blood.
b. A person has type A or type B blood.
c. A person has neither type A nor type O
blood.
d. A person does not have type AB blood.
Solution
a.
$$P(O) = \frac{f}{n} = \frac{21}{50}$$
b.
$$P(A \text{ or } B) = \frac{22}{50} + \frac{5}{50} = \frac{27}{50}$$
c.
$$P(\text{neither A nor O}) = \frac{5}{50} + \frac{2}{50} = \frac{7}{50}$$
d.
$$P(\text{not AB}) = 1 - P(AB) = 1 - \frac{2}{50} = \frac{48}{50} = \frac{24}{25}$$
Example3: Hospital records indicated that knee
replacement patients stayed in the hospital for the
number of days shown in the distribution.
Find these probabilities.
a. A patient stayed exactly 5 days.
b. A patient stayed less than 6 days.
c. A patient stayed at most 4 days.
d. A patient stayed at least 5 days.
Solution
a.
$$P(5) = \frac{56}{127}$$
b.
$$P(\text{less than 6 days}) = \frac{15}{127} + \frac{32}{127} + \frac{56}{127} = \frac{103}{127}$$
c.
$$P(\text{at most 4 days}) = \frac{15}{127} + \frac{32}{127} = \frac{47}{127}$$
d.
$$P(\text{at least 5 days}) = \frac{56}{127} + \frac{19}{127} + \frac{5}{127} = \frac{80}{127}$$
The Addition Rules for Probability
Two events are mutually exclusive events if
they cannot occur at the same time (i.e., they
have no outcomes in common).
Example: Determine which events are
mutually exclusive and which are not, when a
single die is rolled.
a. Getting an odd number and getting an even number
b. Getting a 3 and getting an odd number
c. Getting an odd number and getting a
number less than 4
d. Getting a number greater than 4 and getting
a number less than 4
Solution:
a. The events are mutually
exclusive, since the first event can be 1, 3, or
5 and the second event can be 2, 4, or 6.
b. The events are not mutually exclusive,
since the first event is a 3 and the second can
be 1, 3, or 5. Hence, 3 is contained in both
events.
c. The events are not mutually
exclusive, since the first event can be
1, 3, or 5 and the second can be 1, 2,
or 3. Hence, 1 and 3 are contained in
both events.
d. The events are mutually exclusive,
since the first event can be 5 or 6 and the
second event can be 1, 2, or 3.
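Mutual exclusivity is a statement about shared outcomes, so it can be checked by representing each event as a set of die faces and testing whether the sets are disjoint. This is a sketch. The helper name `mutually_exclusive` is hypothetical, and `set.isdisjoint` is a standard Python method.

```python
die = set(range(1, 7))
odd = {x for x in die if x % 2 == 1}
even = die - odd
three = {3}
less_than_4 = {x for x in die if x < 4}
greater_than_4 = {x for x in die if x > 4}


def mutually_exclusive(event_a, event_b):
    """Two events are mutually exclusive when they share no outcomes."""
    return event_a.isdisjoint(event_b)


print(mutually_exclusive(odd, even))                    # True  (a)
print(mutually_exclusive(three, odd))                   # False (b)
print(mutually_exclusive(odd, less_than_4))             # False (c)
print(mutually_exclusive(greater_than_4, less_than_4))  # True  (d)
```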
Example: Determine which events are
mutually exclusive and which are not
when a single card is
drawn from a deck
a. Getting a 7 and getting a jack
b. Getting a club and getting a king
c. Getting a face card and getting an ace
d. Getting a face card and getting a spade
Solution
Only the events in parts a and c are
mutually exclusive.
Example: In a hospital unit there are
8 nurses and 5 physicians; 7 nurses
and 3 physicians are females.
If a staff person is selected, find the
probability that the subject is a
nurse or a male.
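Solution
Of the 8 nurses, 1 is male, and of the 5 physicians, 2 are male, so 3 of the 13 staff members are
male and 1 person (the male nurse) is both a nurse and a male. Because the events are not mutually
exclusive, subtract the overlap:
$$P(\text{nurse or male}) = P(\text{nurse}) + P(\text{male}) - P(\text{male nurse}) = \frac{8}{13} + \frac{3}{13} - \frac{1}{13} = \frac{10}{13}$$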
Example:
A single card is drawn at random
from an ordinary deck of cards. Find
the probability
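that it is either an ace or a black
card.
Solution
There are 4 aces and 26 black cards, and 2 cards (the ace of clubs and the ace of spades) are both, so
$$P(\text{ace or black card}) = \frac{4}{52} + \frac{26}{52} - \frac{2}{52} = \frac{28}{52} = \frac{7}{13}$$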
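The addition rule for events that are not mutually exclusive is P(A or B) = P(A) + P(B) – P(A and B). It can be checked by enumerating equally likely outcomes and counting. This sketch builds a 52-card deck and the 13-person hospital staff list described above. The helper names (`prob`, `is_ace`, `is_black`) are hypothetical.

```python
from fractions import Fraction
from itertools import product


def prob(event, space):
    """Classical probability: favourable outcomes / equally likely outcomes."""
    return Fraction(sum(1 for outcome in space if event(outcome)), len(space))


# Card example: ace or black card
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
colour = {"clubs": "black", "spades": "black", "hearts": "red", "diamonds": "red"}
deck = list(product(ranks, colour))          # 52 (rank, suit) pairs


def is_ace(card):
    return card[0] == "A"


def is_black(card):
    return colour[card[1]] == "black"


direct = prob(lambda c: is_ace(c) or is_black(c), deck)
by_rule = (prob(is_ace, deck) + prob(is_black, deck)
           - prob(lambda c: is_ace(c) and is_black(c), deck))
print(direct, by_rule)   # 7/13 7/13

# Hospital example: nurse or male (8 nurses, 7 female; 5 physicians, 3 female)
staff = ([("nurse", "F")] * 7 + [("nurse", "M")]
         + [("physician", "F")] * 3 + [("physician", "M")] * 2)
nurse_or_male = prob(lambda s: s[0] == "nurse" or s[1] == "M", staff)
print(nurse_or_male)     # 10/13
```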
The Multiplication Rules and
Conditional Probability
Two events A and B are
independent events if the fact
that A occurs does not affect
the probability of B occurring.
Examples: A coin is flipped
and a die is rolled. Find the
probability of getting a head
on the coin and a 4 on the die.
Solution:
$$P(\text{head and } 4) = P(\text{head}) \cdot P(4) = \frac{1}{2} \cdot \frac{1}{6} = \frac{1}{12}$$
The problem in Example can also be
solved by using the sample space
Principle of Statistics Collected by: Eng Ali Sidow Osman Page 86
Course Name: Principle of Statistics
H1 H2 H3 H4 H5 H6 T1 T2 T3 T4 T5
𝟏
T6 The solution is
𝟏𝟐
since there is only one way to get the
head-4 outcome.
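The sample-space argument can be reproduced by listing every (coin, die) outcome. The brief sketch below uses itertools.product. Representing outcomes as tuples is an illustrative assumption.

```python
from fractions import Fraction
from itertools import product

# Sample space: every (coin, die) outcome, e.g. ("H", 4) for head-4
sample_space = list(product("HT", range(1, 7)))
favorable = [outcome for outcome in sample_space if outcome == ("H", 4)]

p_by_counting = Fraction(len(favorable), len(sample_space))
p_by_rule = Fraction(1, 2) * Fraction(1, 6)  # P(head) * P(4) for independent events

print(len(sample_space), p_by_counting, p_by_rule)  # 12 1/12 1/12
```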
Example: A card is drawn from a deck and replaced; then a second card is drawn. Find the probability of getting a queen and then an ace.
Solution
$P(\text{queen and ace}) = P(\text{queen}) \cdot P(\text{ace}) = \frac{4}{52} \cdot \frac{4}{52} = \frac{16}{2704} = \frac{1}{169}$
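Because the first card is replaced, every ordered pair of draws from the full deck is equally likely. This means the product rule can be confirmed by counting. The sketch below works under that assumption and represents cards as illustrative (rank, suit) tuples.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]
deck = list(product(RANKS, ["clubs", "diamonds", "hearts", "spades"]))

# With replacement the second card comes from the full deck again,
# so all 52 * 52 = 2704 ordered (first, second) pairs are equally likely.
pairs = list(product(deck, repeat=2))
hits = sum(1 for first, second in pairs if first[0] == "Q" and second[0] == "A")

print(hits, len(pairs), Fraction(hits, len(pairs)))  # 16 2704 1/169
```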
Example: An urn contains 3 red balls, 2 blue balls, and 5 white balls. A ball is selected and its color noted. Then it is replaced. A second ball is selected and its color noted. Find the probability of each of these.
a. Selecting 2 blue balls
b. Selecting 1 blue ball and then 1 white ball
c. Selecting 1 red ball and then 1 blue ball
Solution
a. $P(\text{blue and blue}) = P(\text{blue}) \cdot P(\text{blue}) = \frac{2}{10} \cdot \frac{2}{10} = \frac{4}{100} = \frac{1}{25}$
b. $P(\text{blue and white}) = P(\text{blue}) \cdot P(\text{white}) = \frac{2}{10} \cdot \frac{5}{10} = \frac{10}{100} = \frac{1}{10}$
c. $P(\text{red and blue}) = P(\text{red}) \cdot P(\text{blue}) = \frac{3}{10} \cdot \frac{2}{10} = \frac{6}{100} = \frac{3}{50}$
When the outcome or occurrence of the first event affects the outcome or occurrence of the second event in such a way that the probability is changed, the events are said to be dependent events.
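The urn example draws with replacement, so its two draws are independent, unlike the dependent events just defined. Its three answers can be confirmed by enumerating all 100 equally likely ordered draws. This sketch treats the 10 balls as distinguishable. That is an assumption, but it does not change the probabilities.

```python
from fractions import Fraction
from itertools import product

urn = ["red"] * 3 + ["blue"] * 2 + ["white"] * 5  # 10 balls

# With replacement, all 10 * 10 = 100 ordered (first, second) draws are equally likely.
draws = list(product(urn, repeat=2))


def prob(first_color, second_color):
    """Probability that the first ball is first_color and the second is second_color."""
    hits = sum(1 for a, b in draws if a == first_color and b == second_color)
    return Fraction(hits, len(draws))


print(prob("blue", "blue"), prob("blue", "white"), prob("red", "blue"))  # 1/25 1/10 3/50
```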
Example: Three cards are drawn from an ordinary deck and not replaced. Find the probability of these events.
a. Getting 3 jacks
b. Getting an ace, a king, and a queen in order
c. Getting a club, a spade, and a heart in order
d. Getting 3 clubs
Solution
a. $P(3 \text{ jacks}) = \frac{4}{52} \cdot \frac{3}{51} \cdot \frac{2}{50} = \frac{24}{132600} = \frac{1}{5525}$
b. $P(\text{ace and king and queen}) = \frac{4}{52} \cdot \frac{4}{51} \cdot \frac{4}{50} = \frac{64}{132600} = \frac{8}{16575}$
c. $P(\text{club and spade and heart}) = \frac{13}{52} \cdot \frac{13}{51} \cdot \frac{13}{50} = \frac{2197}{132600} = \frac{169}{10200}$
d. $P(3 \text{ clubs}) = \frac{13}{52} \cdot \frac{12}{51} \cdot \frac{11}{50} = \frac{1716}{132600} = \frac{11}{850}$
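Without replacement, each ordered triple of distinct cards is equally likely. Brute-force counting over all 132,600 triples should therefore reproduce the four products. The sketch below uses itertools.permutations and represents cards as illustrative (rank, suit) tuples. It enumerates the whole sample space, so treat it as a check, not an efficient method.

```python
from fractions import Fraction
from itertools import permutations, product

RANKS = ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]
deck = list(product(RANKS, ["clubs", "diamonds", "hearts", "spades"]))

# Without replacement, every ordered triple of distinct cards is equally likely.
triples = list(permutations(deck, 3))  # 52 * 51 * 50 = 132600 triples


def prob(condition):
    """Share of ordered triples (c1, c2, c3) for which condition(c1, c2, c3) holds."""
    return Fraction(sum(1 for t in triples if condition(*t)), len(triples))


p_a = prob(lambda c1, c2, c3: c1[0] == c2[0] == c3[0] == "J")
p_b = prob(lambda c1, c2, c3: (c1[0], c2[0], c3[0]) == ("A", "K", "Q"))
p_c = prob(lambda c1, c2, c3: (c1[1], c2[1], c3[1]) == ("clubs", "spades", "hearts"))
p_d = prob(lambda c1, c2, c3: c1[1] == c2[1] == c3[1] == "clubs")

print(len(triples), p_a, p_b, p_c, p_d)  # 132600 1/5525 8/16575 169/10200 11/850
```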
Conditional Probability
Example: A box contains black chips and white chips. A person selects two chips without replacement. If the probability of selecting a black chip and a white chip is $\frac{15}{56}$ and the probability of selecting a black chip on the first draw is $\frac{3}{8}$, find the probability of selecting the white chip on the second draw, given that the first chip selected was a black chip.
Solution
Let B = selecting a black chip and W = selecting a white chip. Then
$P(W \mid B) = \frac{P(B \text{ and } W)}{P(B)} = \frac{15/56}{3/8} = \frac{15}{56} \cdot \frac{8}{3} = \frac{5}{7}$
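The conditional-probability step is a single division, shown below with exact fractions. The example does not state how many chips are in the box. The second part of the sketch hypothetically assumes 3 black and 5 white chips. That is one composition consistent with $P(B) = \frac{3}{8}$ and $P(B \text{ and } W) = \frac{15}{56}$, and enumerating it recovers the same answer.

```python
from fractions import Fraction
from itertools import permutations


def conditional(p_joint, p_given):
    """P(W | B) = P(B and W) / P(B); undefined when P(B) = 0."""
    if p_given == 0:
        raise ValueError("the conditioning event has probability 0")
    return p_joint / p_given


# Values stated in the example
print(conditional(Fraction(15, 56), Fraction(3, 8)))  # 5/7

# Hypothetical box contents (not given in the example): 3 black, 5 white chips
box = ["black"] * 3 + ["white"] * 5
draws = list(permutations(range(len(box)), 2))  # ordered pairs of distinct chips: 8 * 7 = 56
p_b = Fraction(sum(1 for i, _ in draws if box[i] == "black"), len(draws))
p_bw = Fraction(sum(1 for i, j in draws if box[i] == "black" and box[j] == "white"), len(draws))
print(p_b, p_bw, conditional(p_bw, p_b))  # 3/8 15/56 5/7
```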
Example: A recent survey asked 100 people if they thought women in the armed forces should be permitted to participate in combat. The results of the survey are shown.
Find these probabilities.
a. The respondent answered yes, given that the respondent was a female.
b. The respondent was a male, given that the respondent answered no.
Solution
a. $P(F) = \frac{50}{100}$
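FREQUENTLY USED FORMULAS
n = sample size; N = population size
Sample mean: $\bar{x} = \frac{\sum x}{n}$
Population mean: $\mu = \frac{\sum x}{N}$
Sample standard deviation: $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
Population standard deviation: $\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$
Sample mean for a frequency distribution: $\bar{x} = \frac{\sum x f}{n}$
Sample standard deviation for a frequency distribution: $s = \sqrt{\frac{\sum (x - \bar{x})^2 f}{n - 1}}$, where x = class midpoint, f = class frequency, and $n = \sum f$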
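Sample coefficient of variation: $CV = \frac{s}{\bar{x}} \cdot 100\%$
Range = Largest data value - smallest data value
Standard z value: $z = \frac{x - \mu}{\sigma}$
Original x value: $x = z\sigma + \mu$
Central limit theorem: $\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$, so $z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$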
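The descriptive formulas listed above (mean, standard deviation, coefficient of variation, z value) can be sketched directly in Python. Each docstring states the formula the function assumes. The data values are purely illustrative, and the standard-library statistics module serves as a cross-check.

```python
import math
import statistics


def mean(xs):
    """x-bar = sum(x) / n (the population mean mu uses the same arithmetic with N)."""
    return sum(xs) / len(xs)


def sample_sd(xs):
    """s = sqrt(sum((x - x-bar)^2) / (n - 1))"""
    xbar = mean(xs)
    return math.sqrt(sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1))


def population_sd(xs):
    """sigma = sqrt(sum((x - mu)^2) / N)"""
    mu = mean(xs)
    return math.sqrt(sum((x - mu) ** 2 for x in xs) / len(xs))


def grouped_mean(midpoints, freqs):
    """x-bar = sum(x * f) / n, with x = class midpoint, f = class frequency, n = sum(f)."""
    return sum(x * f for x, f in zip(midpoints, freqs)) / sum(freqs)


def coefficient_of_variation(xs):
    """CV = s / x-bar * 100 (a percentage)."""
    return sample_sd(xs) / mean(xs) * 100


def z_value(x, mu, sigma):
    """z = (x - mu) / sigma"""
    return (x - mu) / sigma


def x_value(z, mu, sigma):
    """x = z * sigma + mu (the inverse of z_value)"""
    return z * sigma + mu


data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values, not from the notes
print(mean(data), population_sd(data), sample_sd(data))
print(statistics.pstdev(data), statistics.stdev(data))  # library cross-check
print(z_value(9, 5, 2), x_value(2.0, 5, 2))  # 2.0 9.0
```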
PROBABILITY FORMULAS
Probability of an event A: $P(A) = \frac{f}{n}$, where f = frequency of occurrence of event A and n = sample size
Probability of the complement of event A: $P(\text{not } A) = 1 - P(A)$
Multiplication rule for independent events: $P(A \text{ and } B) = P(A) \cdot P(B)$
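General multiplication rules: $P(A \text{ and } B) = P(A) \cdot P(B \mid A)$ and $P(A \text{ and } B) = P(B) \cdot P(A \mid B)$
Addition rule for mutually exclusive events: $P(A \text{ or } B) = P(A) + P(B)$
General addition rule: $P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$
Permutation rule: $P_{n,r} = \frac{n!}{(n - r)!}$
Combination rule: $C_{n,r} = \frac{n!}{r!(n - r)!}$
Mean of a discrete probability distribution: $\mu = \sum x P(x)$
Standard deviation of a discrete probability distribution: $\sigma = \sqrt{\sum (x - \mu)^2 P(x)}$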
BINOMIAL DISTRIBUTION FORMULAS
In these formulas, n = number of trials, r = number of successes, p = probability of success, and q = 1 - p.
Formula for a binomial probability distribution: $P(r) = \frac{n!}{r!(n - r)!} p^r q^{n - r}$
Mean for a binomial distribution: $\mu = np$
Standard deviation for a binomial distribution: $\sigma = \sqrt{npq}$
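The permutation, combination, discrete-distribution, and binomial formulas fit together. If you build a binomial distribution with math.comb and then apply the general discrete mean and standard deviation formulas, you should get $np$ and $\sqrt{npq}$. The sketch below uses illustrative parameters n = 10 and p = 0.3. Note that math.comb and math.perm require Python 3.8 or later.

```python
import math


def binomial_pmf(r, n, p):
    """P(r) = n! / (r! (n - r)!) * p^r * q^(n - r), with q = 1 - p."""
    q = 1 - p
    return math.comb(n, r) * p ** r * q ** (n - r)


def discrete_mean_sd(dist):
    """mu = sum(x P(x)) and sigma = sqrt(sum((x - mu)^2 P(x))) for a dict {x: P(x)}."""
    mu = sum(x * px for x, px in dist.items())
    sigma = math.sqrt(sum((x - mu) ** 2 * px for x, px in dist.items()))
    return mu, sigma


# Permutation and combination rules: P(n, r) = n!/(n - r)!, C(n, r) = n!/(r!(n - r)!)
print(math.perm(5, 2), math.comb(5, 2))  # 20 10

n, p = 10, 0.3  # illustrative parameters
dist = {r: binomial_pmf(r, n, p) for r in range(n + 1)}
mu, sigma = discrete_mean_sd(dist)
print(mu, n * p)                          # both approximately 3.0
print(sigma, math.sqrt(n * p * (1 - p)))  # both approximately 1.449
```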
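CONFIDENCE INTERVALS
Confidence interval for a mean (large samples): $\bar{x} - E < \mu < \bar{x} + E$, where $E = z_c \frac{\sigma}{\sqrt{n}}$ (use s in place of $\sigma$ when $\sigma$ is unknown)
Confidence interval for a mean (small samples): $\bar{x} - E < \mu < \bar{x} + E$, where $E = t_c \frac{s}{\sqrt{n}}$ with d.f. = n - 1
Confidence interval for a proportion (where np > 5 and nq > 5): $\hat{p} - E < p < \hat{p} + E$, where $\hat{p} = \frac{r}{n}$ and $E = z_c \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$
SAMPLE SIZE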
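Sample size for estimating means: $n = \left(\frac{z_c \sigma}{E}\right)^2$
Sample size for estimating proportions: $n = p(1 - p)\left(\frac{z_c}{E}\right)^2$ with a preliminary estimate of p, or $n = \frac{1}{4}\left(\frac{z_c}{E}\right)^2$ without one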
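The interval and sample-size formulas can be sketched with the standard library, which provides the critical value $z_c$ through statistics.NormalDist. This sketch covers only the z-based formulas. The small-sample interval needs a Student's t quantile, which the standard library does not provide; scipy.stats.t.ppf is one option. The function names and numeric inputs are illustrative assumptions. Sample sizes are rounded up to the next whole number.

```python
import math
from statistics import NormalDist


def z_critical(confidence):
    """Two-sided critical value z_c, about 1.96 for 95% confidence."""
    return NormalDist().inv_cdf((1 + confidence) / 2)


def mean_interval(xbar, sigma, n, confidence=0.95):
    """Large-sample interval xbar - E < mu < xbar + E, E = z_c * sigma / sqrt(n)."""
    e = z_critical(confidence) * sigma / math.sqrt(n)
    return xbar - e, xbar + e


def proportion_interval(r, n, confidence=0.95):
    """p_hat - E < p < p_hat + E, with p_hat = r/n and E = z_c * sqrt(p_hat * q_hat / n)."""
    p_hat = r / n
    e = z_critical(confidence) * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - e, p_hat + e


def sample_size_mean(sigma, e, confidence=0.95):
    """n = (z_c * sigma / E)^2, rounded up."""
    return math.ceil((z_critical(confidence) * sigma / e) ** 2)


def sample_size_proportion(e, p=0.5, confidence=0.95):
    """n = p(1 - p)(z_c / E)^2; p = 0.5 reproduces the 1/4 form with no preliminary estimate."""
    return math.ceil(p * (1 - p) * (z_critical(confidence) / e) ** 2)


print(round(z_critical(0.95), 3))  # 1.96
print(mean_interval(50, 10, 100))
print(proportion_interval(40, 100))
print(sample_size_mean(15, 2), sample_size_proportion(0.03))  # 217 1068
```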
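REGRESSION AND CORRELATION
In all these formulas:
$SS_x = \sum x^2 - \frac{(\sum x)^2}{n}$, $SS_y = \sum y^2 - \frac{(\sum y)^2}{n}$, $SS_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n}$
Least squares line: $\hat{y} = a + bx$, where $b = \frac{SS_{xy}}{SS_x}$ and $a = \bar{y} - b\bar{x}$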
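Standard error of estimate: $S_e = \sqrt{\frac{\sum (y - \hat{y})^2}{n - 2}}$
Pearson product-moment correlation coefficient: $r = \frac{SS_{xy}}{\sqrt{SS_x SS_y}}$
Coefficient of determination: $r^2$
Confidence interval for y: $y_p - E < y < y_p + E$, where $y_p$ is the predicted y value for x and $E = t_c S_e \sqrt{1 + \frac{1}{n} + \frac{(x - \bar{x})^2}{SS_x}}$ with d.f. = n - 2
Spearman rank correlation coefficient: $r_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}$, where d is the difference between the ranks in each pair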
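The regression and correlation formulas can be sketched in plain Python using the computational sums of squares $SS_x$, $SS_y$, and $SS_{xy}$. The docstring of sums_of_squares spells out their definitions. The data are illustrative, not taken from the notes, and the Spearman function assumes ranks without ties.

```python
import math


def sums_of_squares(xs, ys):
    """SS_x = sum(x^2) - (sum x)^2/n, SS_y likewise, SS_xy = sum(xy) - (sum x)(sum y)/n."""
    n = len(xs)
    ss_x = sum(x * x for x in xs) - sum(xs) ** 2 / n
    ss_y = sum(y * y for y in ys) - sum(ys) ** 2 / n
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    return ss_x, ss_y, ss_xy


def least_squares(xs, ys):
    """y_hat = a + b x, with b = SS_xy / SS_x and a = y_bar - b * x_bar."""
    ss_x, _, ss_xy = sums_of_squares(xs, ys)
    b = ss_xy / ss_x
    a = sum(ys) / len(ys) - b * sum(xs) / len(xs)
    return a, b


def pearson_r(xs, ys):
    """r = SS_xy / sqrt(SS_x * SS_y)"""
    ss_x, ss_y, ss_xy = sums_of_squares(xs, ys)
    return ss_xy / math.sqrt(ss_x * ss_y)


def standard_error(xs, ys):
    """S_e = sqrt(sum((y - y_hat)^2) / (n - 2))"""
    a, b = least_squares(xs, ys)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sse / (len(xs) - 2))


def spearman_rs(rank_x, rank_y):
    """r_s = 1 - 6 sum(d^2) / (n (n^2 - 1)); assumes untied ranks."""
    n = len(rank_x)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * d2 / (n * (n * n - 1))


xs = [1, 2, 3, 4, 5]  # illustrative data, not from the notes
ys = [2, 4, 5, 4, 5]
a, b = least_squares(xs, ys)
r = pearson_r(xs, ys)
print(a, b, r, r ** 2, standard_error(xs, ys))
print(spearman_rs([1, 2, 3, 4, 5], [5, 4, 3, 2, 1]))  # -1.0 for fully reversed ranks
```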