Inferential Statistics
Objectives
At the end of this course, students will be able to:
Define inferential statistics
Explain statistical estimation
Understand hypothesis testing and the types of errors in decision making
Use test statistics to examine hypotheses about population parameters
Inference
Use a random sample to
learn something about a
larger population
Inferential Statistics
Inferential statistics: statistical methods used for drawing
conclusions about a population based on the information
obtained from a sample of observations drawn from that population
Inferential Statistics
Involves
– Estimation
– Hypothesis testing
Purpose
– Make decisions about population characteristics
Inferential Statistics
Inferential statistics has two branches:
– Hypothesis testing: one sample, two samples
– Estimation: point estimation, interval estimation
Inferential process
Statistical Estimation
Estimation is the process of determining a likely value
of a population parameter, based on information
collected from the sample
Estimation is the use of sample statistics to estimate the
corresponding population parameters
The objective of estimation is to determine the
approximate value of an unknown population parameter
on the basis of a sample statistic
Sample Statistics as Estimators of Population
Parameters
A sample statistic is a numerical measure of a summary characteristic
of a sample.
A population parameter is a numerical measure of a summary characteristic
of a population.
An estimator of a population parameter is a sample statistic used to
estimate or predict the population parameter
An estimate of a parameter is a particular numerical value of a
sample statistic obtained through sampling.
Estimation
Population (parameter) → random sample → statistic → estimation of the parameter
In a random sample, every member of the population has the
same chance of being selected in the sample
Estimation
Estimation is of two kinds:
– Point estimation
– Interval estimation
Point and Interval Estimates
A point estimate is a single value used as an estimate of a population
parameter
An interval estimate is a range or interval of numbers believed to include
the unknown population parameter with a certain degree of assurance
The point estimate is always within the interval estimate
[Figure: an interval estimate running from the lower confidence limit to the
upper confidence limit, with the point estimate inside it]
Estimation Process
Population: the mean, μ, is unknown
Random sample: mean X̄ = 50 (point estimate)
Interval estimate: I am 95% confident that μ is between 40 & 60.
Point estimation
A single numerical value used to estimate the
corresponding population parameter
Gives little information about how close the value is to
the unknown population parameter
Example: the sample mean X̄ = 3 is a point estimate of the
unknown population mean μ
Sample statistics & their corresponding population parameters
                      Statistic (from sample)    Parameter (from entire population)
Mean:                 X̄ estimates                μ
Variance:             s² estimates               σ²
Standard deviation:   s estimates                σ
Proportion:           p estimates                π
Properties of a good estimator
a) Unbiasedness: An estimator is said to be unbiased
if its expected value is equal to the population
parameter it estimates.
For example, since E(X̄) = μ, the sample mean is an
unbiased estimator of the population mean
The mean of any single sample will probably not
equal the population mean, but in the long run the average of the
means of repeated independent samples from a
population will equal the population mean.
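The unbiasedness of the sample mean can be checked exhaustively on a tiny population. The following is a minimal sketch, assuming a hypothetical population of five values (not from the slides) and simple random samples of size 2 drawn without replacement:

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical population (illustrative values, not from the slides).
population = [2, 4, 6, 8, 10]
n = 2  # sample size

population_mean = Fraction(sum(population), len(population))

# Mean of every possible sample of size n drawn without replacement.
sample_means = [Fraction(sum(s), n) for s in combinations(population, n)]

# Average of the sample means over all equally likely samples, i.e. E(x-bar).
expected_sample_mean = sum(sample_means) / len(sample_means)

print(population_mean, expected_sample_mean)
```

Individual sample means range from 3 to 9, yet their average over all possible samples coincides with the population mean, which is what unbiasedness asserts.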
Properties of a good estimator
b) Minimum variance: An estimator which has
a minimum standard error is a good estimator
For a symmetrical distribution such as the normal, the mean has
the minimum standard error
If the distribution is skewed, the median may have the smaller
standard error
Properties of a good estimator
c) Consistency: An estimator is said to be consistent if its
probability of being close to the parameter it estimates increases as
the sample size increases
[Figure: Consistency, sampling distributions of the estimator for n = 10 and n = 100]
Interval estimation
A single-valued estimate conveys little information
about the actual value of the population parameter
and about the accuracy of the estimate
The probability of getting a sample statistic value
that is exactly equal to the corresponding population
parameter is usually quite small
Interval estimation
It is not reasonable to assume that a sample statistic
value is exactly equal to the corresponding population
parameter
What is needed is an interval estimate, which locates the
population parameter within an interval with a stated level of
confidence
Confidence Interval or Interval Estimate
A confidence interval or interval estimate is a range or
interval of numbers believed to include an unknown
population parameter
A confidence interval provides a range of values of the
estimate likely to include the “true” population parameter
with a given probability
A confidence interval or interval estimate has two
components:
A range or interval of values
An associated level of confidence
Confidence Level
1. Probability that the unknown population
parameter falls within the interval
2. Denoted (1 – α)
• α is the probability that the parameter is not within
the interval
3. Typical values are 99%, 95%, 90%
CI for population mean:
There are different conditions to be considered when constructing confidence intervals for the
population mean.
1. Large sample size and σ is known
For a sufficiently large sample size (n > 30), the sampling
distribution of the sample mean, x̄, is approximately
normal
A 100(1−α)% C.I. for μ is:
$$\bar{x} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} = \left(\bar{x} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right)$$
α is to be chosen by the researcher; the most
common values of α are 0.05, 0.01, 0.001 and 0.1
CI for population mean:
2. Large sample size and σ is unknown
Whenever σ is not known (and the population is assumed
normal), the correct distribution to use is the t distribution
with n−1 degrees of freedom. However, for large degrees
of freedom, the t distribution is approximated well by the
Z distribution
A large-sample 100(1−α)% C.I. for μ is:
$$\bar{x} \pm z_{\alpha/2}\,\frac{s}{\sqrt{n}}$$
Note that when σ is unknown, s is a good approximation
of σ
Example
An epidemiologist studied the blood glucose level of
a random sample of 100 patients. The mean was 170,
with a SD of 10. Construct the 95% CI for the
population mean.
Solution
$$\bar{X} \pm z_{\alpha/2}\,\frac{s}{\sqrt{n}} = 170 \pm 1.96 \times \frac{10}{\sqrt{100}} = 170 \pm 1.96 = (168.04,\ 171.96)$$
We are 95% confident that the mean blood glucose level
of the population lies between 168.04 and 171.96
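The same interval can be reproduced in code. This is a minimal sketch using only the Python standard library; the function name `z_confidence_interval` is an illustrative choice, not something defined in the slides:

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(mean, sd, n, confidence=0.95):
    """Large-sample CI for a population mean: mean +/- z_{alpha/2} * sd / sqrt(n)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for 95% confidence
    margin = z * sd / sqrt(n)
    return mean - margin, mean + margin


# Blood glucose example: n = 100, sample mean 170, SD 10.
lower, upper = z_confidence_interval(mean=170, sd=10, n=100)
print(round(lower, 2), round(upper, 2))
```

Passing `confidence=0.99` or `confidence=0.90` to the same function shows how the interval widens or narrows with the confidence level.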
CI for population mean:
3. Small sample size (n < 30) and σ is unknown
If the population standard deviation is unknown and the population is
approximately normal, then the standardized sample mean,
(X̄ − μ)/(s/√n), from samples of size n follows a t-distribution
with n−1 degrees of freedom
A 100(1−α)% C.I. for μ is:
$$\bar{X} \pm t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}$$
Example: The average earnings per share (EPS)
for 10 industrial stocks randomly selected from
those listed on the Dow-Jones Industrial
Average was found to be X̄ = 1.85 with a
standard deviation of s = 0.395. Calculate a 99%
confidence interval for the average EPS of all
the industrials listed on the DJIA.
Solution:
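The worked solution from the original slides did not survive extraction. A sketch of the calculation, assuming the tabulated critical value $t_{0.005,\,9} = 3.250$ (n − 1 = 9 degrees of freedom, 0.005 in each tail):

```latex
\bar{X} \pm t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}
  = 1.85 \pm 3.250 \times \frac{0.395}{\sqrt{10}}
  \approx 1.85 \pm 0.406
  \;\Rightarrow\; (1.444,\ 2.256)
```

Under these assumptions, we would be 99% confident that the average EPS of all the industrials lies between about 1.44 and 2.26.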
Example: A random sample of 900 workers
showed an average height of 67 inches with a
standard deviation of 5 inches.
A. Find a 95% confidence interval of the mean
height of all workers
B. Find a 99% confidence interval of the mean
height of all workers
Solution:
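The worked solution from the original slides did not survive extraction. A sketch of the calculation, assuming the large-sample interval $\bar{x} \pm z_{\alpha/2}\, s/\sqrt{n}$ applies (n = 900) with $z_{0.025} = 1.96$ and $z_{0.005} = 2.576$:

```latex
\text{A. } 67 \pm 1.96 \times \frac{5}{\sqrt{900}} = 67 \pm 0.327
  \;\Rightarrow\; (66.67,\ 67.33)

\text{B. } 67 \pm 2.576 \times \frac{5}{\sqrt{900}} = 67 \pm 0.429
  \;\Rightarrow\; (66.57,\ 67.43)
```

The 99% interval is wider than the 95% interval, as expected when the confidence level increases.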
Example: Suppose we want to estimate a 95%
confidence interval for the average quarterly returns
of all fixed-income funds in Ethiopia. We draw a
sample of 100 observations and calculate the sample
mean to be 0.05 and the standard deviation 0.03. We
assume that those returns are normally distributed
with known variance.
Solution:
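The worked solution from the original slides did not survive extraction. A sketch of the calculation, assuming the stated standard deviation of 0.03 is taken as the known σ and $z_{0.025} = 1.96$:

```latex
\bar{x} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}
  = 0.05 \pm 1.96 \times \frac{0.03}{\sqrt{100}}
  = 0.05 \pm 0.00588
  \;\Rightarrow\; (0.0441,\ 0.0559)
```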
Example:
1. An economist is interested in studying the incomes of
consumers in a particular country. The population standard
deviation is known to be $1,000. A random sample of 50
individuals resulted in a mean income of $15,000. Construct
the 95% confidence interval for the population mean income.
2. An auditor, examining a total of 820 accounts receivable of a
corporation, took a random sample of 60 of them. The sample
mean was $127 and the sample standard deviation was $43.
Find a 99% confidence interval for the population mean.
CI for a population proportion: Large-sample size
For sufficiently large samples, the sampling distribution of the
sample proportion p is approximately normal
A 100(1−α)% CI for π is:
$$p \pm z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}$$
A sample is considered large enough when both np and nq are greater
than 5, where q = 1 − p.
Example
• In a sample of 400 people who were questioned
regarding their participation in sports, 160 said that
they did participate. Construct a 98% confidence
interval for π, the proportion of people in the population
who participate in sports.
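The slide's worked solution is not available in this text, so the following is only a sketch of how the large-sample formula could be applied to this example. The function name `proportion_confidence_interval` is a hypothetical helper:

```python
from math import sqrt
from statistics import NormalDist


def proportion_confidence_interval(successes, n, confidence=0.95):
    """Large-sample CI for a proportion: p +/- z_{alpha/2} * sqrt(p(1-p)/n)."""
    p = successes / n
    # Large-sample condition stated in the slides: np and nq both greater than 5.
    if n * p <= 5 or n * (1 - p) <= 5:
        raise ValueError("sample too small for the normal approximation")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p * (1 - p) / n)
    return p - margin, p + margin


# Sports participation example: 160 of 400 people, 98% confidence.
lower, upper = proportion_confidence_interval(successes=160, n=400, confidence=0.98)
print(round(lower, 3), round(upper, 3))
```

With p = 160/400 = 0.40 and a critical value of about 2.326, the interval comes out at roughly 0.343 to 0.457.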
Exercise:
In a survey of 300 automobile drivers in one city, 123
reported that they wear seat belts regularly. Estimate
the seat belt use rate of the city and construct a 95% confidence
interval for the true population proportion.
Sample size determination
Common question:
– “How many subjects should I study?”
– Too small a sample:
   - Waste of time and resources
   - Results have no practical use
– Too large a sample:
   - Waste of resources
   - Data quality compromised
   - Any small difference can be statistically significant
When deciding on sample size:
Precision is related to confidence level & CI
Margin of Error
Factors Affecting Margin of Error
Margin of error is determined by n, s and α
– As n increases, the width of the CI decreases
– As s increases, the width of the CI increases
– As the confidence level increases (α decreases), the
width of the CI increases
Reducing the Margin of Error
$$ME = z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$
The margin of error can be reduced if
– The standard deviation is lower (σ ↓)
– The sample size is increased (n ↑)
– The confidence level is decreased, (1 – α) ↓
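These three effects can be seen numerically. A minimal sketch, assuming hypothetical values (σ = 10, n = 100, 95% confidence) as the baseline; none of these numbers come from the slides:

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(sigma, n, confidence=0.95):
    """ME = z_{alpha/2} * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sigma / sqrt(n)


# Hypothetical baseline and three one-at-a-time changes.
baseline = margin_of_error(sigma=10, n=100, confidence=0.95)
smaller_sd = margin_of_error(sigma=5, n=100, confidence=0.95)
larger_n = margin_of_error(sigma=10, n=400, confidence=0.95)
lower_confidence = margin_of_error(sigma=10, n=100, confidence=0.90)

for label, me in [("baseline", baseline), ("smaller sd", smaller_sd),
                  ("larger n", larger_n), ("lower confidence", lower_confidence)]:
    print(f"{label}: {me:.3f}")
```

Because n enters through its square root, quadrupling the sample size only halves the margin of error.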
Sample size determination depends on:
Objective of the study
Design of the study
Degree of precision or accuracy – the allowed
deviation from the true population parameter (can be
within 1% to 5%)
Degree of confidence level required
Availability of resources
Estimation of single mean
$$n = \frac{(z_{\alpha/2})^2\,\sigma^2}{d^2}$$
Where:
n = sample size
σ = population standard deviation, if known
d = desired degree of precision = half of the
width of the confidence interval
z_{α/2} = the standard normal value at the level of
confidence desired, usually the 95% confidence
level
Example
Suppose that for a certain group of cancer patients, we are
interested in estimating the mean age at diagnosis. We would
like a 95% CI that is 5 years wide (so d = 2.5). If the population SD is 12
years, how large should our sample be?
$$n = \frac{(z_{\alpha/2})^2\,\sigma^2}{d^2} = \frac{(1.96)^2 (144)}{(2.5)^2} = 88.5 \approx 89$$
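The calculation can be sketched in code as follows. The function name `sample_size_for_mean` is illustrative, and the result is rounded up because a fractional subject cannot be sampled:

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_mean(sigma, d, confidence=0.95):
    """n = (z_{alpha/2})^2 * sigma^2 / d^2, rounded up to a whole subject."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sigma / d) ** 2)


# Cancer patients example: SD = 12 years, CI 5 years wide, so d = 2.5.
n_required = sample_size_for_mean(sigma=12, d=2.5)
print(n_required)
```

Halving d to 1.25 in the same call roughly quadruples the required sample size, since d is squared in the denominator.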
But the population standard deviation σ is unknown most of the time
As a result, it has to be estimated from:
A pilot or preliminary sample:
– Select a pilot sample and estimate σ with the sample
standard deviation, s
Similar studies
Estimation of single proportion
$$n = \frac{(z_{\alpha/2})^2\,pq}{d^2}$$
Where:
n = sample size
p = anticipated proportion
q = 1 − p
d = desired degree of precision
z_{α/2} = the standard normal value at the level of
confidence desired, usually the 95% confidence
level
Example
A) Suppose that you want to know the proportion of
infants who are breastfed beyond 18 months of age in a rural area.
Suppose that in a similar area, the proportion (p) of breastfed
infants was found to be 0.20. What sample size is required to
estimate the true proportion within ±3% with 95% confidence?
$$n = \frac{(z_{\alpha/2})^2\,pq}{d^2} = \frac{(1.96)^2 (0.2)(0.8)}{(0.03)^2} \approx 683$$
Example
B) If the above sample is to be taken from a relatively small
population (say N = 3000), the required minimum sample is
obtained from the above estimate by making an
adjustment (if the population is less than 10,000, a smaller
sample size may suffice).
$$n_{\text{final}} = \frac{n}{1 + \frac{n}{N}} = \frac{683}{1 + \frac{683}{3000}} \approx 557$$
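Examples A and B can be chained in code. A minimal sketch; both function names are hypothetical helpers, and results are rounded up as in the slides:

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_proportion(p, d, confidence=0.95):
    """n = (z_{alpha/2})^2 * p * q / d^2, rounded up."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z ** 2 * p * (1 - p) / d ** 2)


def adjust_for_finite_population(n, population_size):
    """Finite-population adjustment: n_final = n / (1 + n / N), rounded up."""
    return ceil(n / (1 + n / population_size))


n_initial = sample_size_for_proportion(p=0.20, d=0.03)                  # Example A
n_final = adjust_for_finite_population(n_initial, population_size=3000)  # Example B
print(n_initial, n_final)
```

When no prior estimate of p is available, calling the first function with `p=0.5` gives the most conservative (largest) sample size.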
An estimate of p is not always available
However, the formula may also be used for sample size
calculation based on various assumptions for the values of p.
Note: if no prior information about the proportion p is available,
assume p = q = 0.5
• Example 1: Calculate the sample size for a population of
100,000. Take the confidence level as 95% and the margin of error
as 5%.
Solution:
• We calculate the sample size first for a population of
infinite size and then adjust it to the required population size.
Given: Z = 1.960, P = 0.5, M = 0.05
• Using the sample size formula, compute the infinite-population sample
size and then adjust it for the population of 100,000.
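The arithmetic for this example is not shown in the surviving text. A sketch of it, assuming the same finite-population adjustment as in Example B above, with $n_0$ denoting the infinite-population sample size:

```latex
n_0 = \frac{Z^2\,P(1-P)}{M^2} = \frac{(1.96)^2 (0.5)(0.5)}{(0.05)^2} = 384.16

n = \frac{n_0}{1 + \frac{n_0}{N}} = \frac{384.16}{1 + \frac{384.16}{100000}} \approx 382.7
  \;\Rightarrow\; 383 \text{ after rounding up}
```

For a population this large the adjustment changes the required sample size only slightly.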
• Example 3: Using the sample size formula, find the
sample size for a survey where confidence level = 95%,
standard deviation = 0.5, and margin of error = ±5%.
Solution:
• Here the value 0.5 is used as the assumed proportion p, so the
sample size is (Z-score)² × p × (1 − p) / (margin of error)²
= ((1.96)² × 0.5 × 0.5) / (0.05)² = (3.8416 × 0.25) / 0.0025
= 0.9604 / 0.0025 = 384.16
• Thus, you will need 385 respondents for this survey.
Hypothesis testing
What is a Hypothesis?
A hypothesis is a claim (assumption) about
the true value of an unknown population parameter
Example: “I claim the mean GPA of this class is 3.5”
– The parameter may be a population mean, proportion,
correlation coefficient, ...
– Must be stated before analysis
Hypothesis testing
The purpose of hypothesis testing is to determine whether
enough statistical evidence exists to enable us to conclude that
a belief or hypothesis about a parameter is reasonable
Example
– Is a new drug effective in curing a certain disease? A
sample of patients is randomly selected. Half of them are
given the new drug while the other half are given the standard drug.
Then, the improvement in the patients' conditions is
measured and compared
Hypothesis Testing Process
Identify the population
Assume the population mean age is 50 (H0: μ = 50)
Take a sample: X̄ = 20
Is X̄ = 20 likely if μ = 50?
No, not likely!
REJECT H0
Steps in hypothesis testing
1) State the statistical hypotheses
There are two hypotheses:
- Null hypothesis
- Alternative hypothesis
State the null hypothesis
Null hypothesis – called the hypothesis of no
difference, no association or no effect
States that “there is no difference” between the
hypothesized value and the population parameter
value
Is always about a population parameter, not
about a sample
Null Hypothesis: H0
The null hypothesis (denoted by H0) is a statement that
the value of a population parameter (such as proportion,
mean, or standard deviation) is equal to some claimed
value.
Always contains the “=”, “≤” or “≥” sign
We test the null hypothesis directly.
Either reject H0 or fail to reject H0.
State the alternative hypothesis
The alternative to the null hypothesis
Says “there is a difference” between the
hypothesized value and the population parameter
value
It is what we are trying to prove, i.e. the reason for
the research question.
Alternative Hypothesis: H1 or HA
The alternative hypothesis (denoted by H1 or HA) is
the statement that the parameter has a value that
somehow differs from the null hypothesis.
The symbolic form of the alternative hypothesis
must use one of these symbols: ≠, < or >.
May or may not be accepted
Hypothesis
Example: Consider the population mean
H0: μ = μ0
HA: μ ≠ μ0 (two-tailed)
Example:
A. Is the mean SBP of the population different from 120
mmHg?
- H0: The mean SBP of the population is not different from
120 mmHg (H0: μ = 120).
- HA: The mean SBP of the population is different from
120 mmHg (HA: μ ≠ 120).
Errors in Making a Decision
1. Type I Error
– Rejecting a true null hypothesis
– Equivalently, accepting a false alternative hypothesis
– Probability of Type I Error is α (alpha)
• Called the level of significance
2. Type II Error
– Failing to reject a false null hypothesis
– Equivalently, rejecting a true alternative hypothesis
– Probability of Type II Error is β (beta)
Type I & II Errors Have an Inverse Relationship
If you reduce the probability of one
error, the probability of the other increases, provided that
everything else is unchanged.
Factors Affecting Type II Error
Significance level α
– β increases when α decreases
Population standard deviation σ
– β increases when σ increases
Sample size n
– β increases when n decreases
Controlling Type I and Type II Errors
For any fixed α, an increase in the sample
size n will cause a decrease in β.
For any fixed sample size n, a decrease in α
will cause an increase in β. Conversely, an
increase in α will cause a decrease in β.
To decrease both α and β, increase the
sample size.
Power of a statistical test
The power of a statistical test is the probability of
rejecting H0 when H0 is really false. Thus power =
1 − β.
Clearly, if the test maximizes power, it minimizes the
probability of a Type II error, β.
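Power can be computed directly for a z test when σ is known. The following is a minimal sketch for a right-tailed test, assuming hypothetical values (μ0 = 368, true mean 372, σ = 15) chosen only for illustration:

```python
from math import sqrt
from statistics import NormalDist


def power_right_tailed_z(mu0, mu1, sigma, n, alpha=0.05):
    """Power of the right-tailed z test of H0: mu <= mu0 when the true mean is mu1."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)       # critical value
    shift = (mu1 - mu0) / (sigma / sqrt(n))       # mean of Z when mu = mu1
    beta = std_normal.cdf(z_alpha - shift)        # P(fail to reject H0 | mu = mu1)
    return 1 - beta


# Hypothetical scenario: the same true difference, two sample sizes.
power_n25 = power_right_tailed_z(mu0=368, mu1=372, sigma=15, n=25)
power_n100 = power_right_tailed_z(mu0=368, mu1=372, sigma=15, n=100)
print(f"n = 25: power = {power_n25:.3f}; n = 100: power = {power_n100:.3f}")
```

For fixed α, the larger sample gives the higher power, that is, the smaller β, which matches the relationship between n and β described earlier.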
Summary:
Elements of a Hypothesis Test
Null Hypothesis (H0)
– A theory about the values of one or more population
parameters. The status quo.
Alternative Hypothesis (Ha)
– A theory that contradicts the null hypothesis. The theory
generally represents that which we will accept only when
sufficient evidence exists to establish its truth.
Test Statistic
– A sample statistic used to decide whether to reject the null
hypothesis. In general,
$$\text{test statistic} = \frac{\text{Estimate} - \text{Hypothesized parameter}}{\text{Standard error}}$$
Summary:
Elements of a Hypothesis Test
Critical Value
– A value to which the test statistic is compared at some
particular significance level (usually α = .01, .05, .10)
Rejection Region
– The numerical values of the test statistic for which the null
hypothesis will be rejected.
– The probability is α that the rejection region will contain the
test statistic when the null hypothesis is true, leading to a
Type I error. α is usually chosen to be small (.01, .05, .10)
and is the level of significance of the test.
Summary of One- and Two-Tail Tests
One-Tail Test (left tail)    Two-Tail Test    One-Tail Test (right tail)
H0: μ ≥ μ0                   H0: μ = μ0       H0: μ ≤ μ0
HA: μ < μ0                   HA: μ ≠ μ0       HA: μ > μ0
Summary: Rejection Regions
Form of Ha: μ ≠ μ0 (2-tail hypothesis; rejection region of area α/2 in each tail)
If |z| > z_{α/2}, then reject the null hypothesis.
Form of Ha: μ < μ0 (1-tail hypothesis; rejection region of area α in the left tail)
If z < −z_α, then reject the null hypothesis.
Form of Ha: μ > μ0 (1-tail hypothesis; rejection region of area α in the right tail)
If z > z_α, then reject the null hypothesis.
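The three rejection rules can be written as one small function. A minimal sketch; the function name `reject_h0`, the `tail` labels, and the z values in the usage lines are all illustrative assumptions:

```python
from statistics import NormalDist


def reject_h0(z, alpha=0.05, tail="two"):
    """Apply the z-test rejection rule for a two-, left- or right-tailed alternative."""
    std_normal = NormalDist()
    if tail == "two":      # Ha: mu != mu0
        return abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    if tail == "left":     # Ha: mu < mu0
        return z < -std_normal.inv_cdf(1 - alpha)
    if tail == "right":    # Ha: mu > mu0
        return z > std_normal.inv_cdf(1 - alpha)
    raise ValueError("tail must be 'two', 'left' or 'right'")


# Hypothetical test statistics, all at alpha = 0.05.
print(reject_h0(2.10, tail="two"))    # 2.10 is beyond 1.96
print(reject_h0(-1.50, tail="left"))  # -1.50 is not below -1.645
print(reject_h0(1.70, tail="right"))  # 1.70 is beyond 1.645
```

The same z value can lead to different decisions depending on the form of the alternative: z = 1.70 is rejected in a right-tailed test but not in a two-tailed test at the same α.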
Summary: Type I and Type II Errors
Example: Two-Tail Test
Q. Does an average box of cereal contain 368 grams of
cereal? A random sample of 25 boxes showed X̄ = 372.5.
The company has specified σ to be 15 grams. Test at the
α = 0.05 level.
General steps in hypothesis testing:
1. State the null and the alternative hypotheses.
2. Specify the level of significance, i.e. choose α (this
is usually given).
3. Identify the critical region(s): the region in which
the null hypothesis is rejected.
4. Compute the test statistic.
5. Make the decision.
6. State the conclusion.
Summary of Decision Rules
Example Solution: Two-Tail Test
H0: μ = 368
H1: μ ≠ 368
α = 0.05
σ = 15, n = 25
A Z-test is appropriate
Critical values: ±1.96 (area .025 in each rejection tail)
Test statistic:
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{372.5 - 368}{15/\sqrt{25}} = 1.50$$
Decision: Do not reject H0 at α = .05, since −1.96 < 1.50 < 1.96
Conclusion: There is no evidence that the true mean is not 368
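As a check on the arithmetic, the test can be recomputed from the values in the problem statement (25 boxes, X̄ = 372.5, σ = 15). This is only a sketch; the two-sided p-value is an addition that the slide does not report:

```python
from math import sqrt
from statistics import NormalDist

# Values from the problem statement of the two-tail cereal example.
x_bar, mu0, sigma, n, alpha = 372.5, 368, 15, 25, 0.05

z = (x_bar - mu0) / (sigma / sqrt(n))
critical = NormalDist().inv_cdf(1 - alpha / 2)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided p-value
reject = abs(z) > critical

print(f"z = {z:.2f}, critical = ±{critical:.2f}, p = {p_value:.4f}, reject H0: {reject}")
```

With these inputs the statistic is 1.50, inside the ±1.96 limits, so H0 is not rejected at the 0.05 level.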
Example: Two-Tailed Test
Does an average box of cereal contain 368 grams of cereal?
A random sample of 25 boxes had a mean of 372.5 and a
standard deviation of 12 grams. Test at the .05 level of
significance.
Solution
• H0: μ = 368
• HA: μ ≠ 368
• α = 0.05
• df = 25 − 1 = 24
• Critical values: ±2.064 (area .025 in each rejection tail)
Test statistic:
$$t^* = \frac{\bar{X} - \mu}{s/\sqrt{n}} = \frac{372.5 - 368}{12/\sqrt{25}} = 1.875$$
0.05 < p-value < 0.10
Decision: Do not reject H0, since p-value > α = .05 and |t*| < t-critical
Conclusion: There is not enough evidence that the
population average is not 368.
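Recomputing the statistic from the stated data (n = 25, X̄ = 372.5, s = 12) gives a quick consistency check. This is a sketch under one explicit assumption: the two-tailed critical value for 24 degrees of freedom is taken from a t table as 2.064, because the Python standard library has no t quantile function:

```python
from math import sqrt

# Values from the problem statement of the two-tailed t example.
x_bar, mu0, s, n = 372.5, 368, 12, 25

t_star = (x_bar - mu0) / (s / sqrt(n))
df = n - 1

# Assumption: t_{0.025, 24} = 2.064, read from a t table (not computed here).
t_critical = 2.064
reject = abs(t_star) > t_critical

print(f"t* = {t_star:.3f}, df = {df}, critical = ±{t_critical}, reject H0: {reject}")
```

With 24 degrees of freedom the statistic of 1.875 falls short of 2.064, so these data do not support rejecting H0 at the 0.05 level.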
Example: One-Tail Test
Q. Does an average box of cereal contain more than
368 grams of cereal? A random sample of 36
boxes showed X̄ = 372.5. The company has
specified σ to be 15 grams. Test at the α = 0.05 level.
H0: μ ≤ 368
H1: μ > 368
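Solution
H0: μ ≤ 368
H1: μ > 368
α = 0.05
n = 36
Critical value: 1.645 (area .05 in the right rejection tail)
Test statistic:
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{372.5 - 368}{15/\sqrt{36}} = 1.80$$
Decision: Reject H0 at α = .05, since 1.80 > 1.645
Conclusion: There is evidence that the true
mean is more than 368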
• The p-value is the probability of obtaining a
value of the test statistic as extreme as, or more
extreme than, the actual value obtained, when
the null hypothesis is true.
• The p-value is the smallest level of significance,
α, at which the null hypothesis may be rejected
using the obtained value of the test statistic.
• If p-value < α, reject the null hypothesis.
• If p-value ≥ α, do not reject the null hypothesis.
Example: An automatic bottling machine fills cola into two liter (2000
cc) bottles. A consumer advocate wants to test the null hypothesis that
the average amount filled by the machine into a bottle is at least 2000 cc.
A random sample of 40 bottles coming out of the machine was selected
and the exact content of the selected bottles are recorded. The sample
mean was 1999.6 cc. The population standard deviation is known from
past experience to be 1.30 cc.
Compute the p-value for this test.
• H0: μ ≥ 2000
• H1: μ < 2000
• n = 40, μ0 = 2000, x̄ = 1999.6, σ = 1.3
• The test statistic is:
$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{1999.6 - 2000}{1.3/\sqrt{40}} = -1.95$$
• p-value = P(Z ≤ −1.95) = 0.5000 − 0.4744 = 0.0256
p-Value Solution
Since (p-value = 0.0256) < (α = 0.05),
reject H0.
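The p-value for the bottling example can also be obtained without a normal table. A minimal sketch using the standard library's normal distribution:

```python
from math import sqrt
from statistics import NormalDist

# Bottling machine example: H0: mu >= 2000 against H1: mu < 2000.
x_bar, mu0, sigma, n, alpha = 1999.6, 2000, 1.3, 40, 0.05

z = (x_bar - mu0) / (sigma / sqrt(n))
p_value = NormalDist().cdf(z)  # left-tailed test: P(Z <= z)
reject = p_value < alpha

print(f"z = {z:.3f}, p-value = {p_value:.4f}, reject H0: {reject}")
```

Using the unrounded statistic (about −1.946) gives a p-value of roughly 0.0258 rather than the table-based 0.0256; the decision to reject H0 at α = 0.05 is the same either way.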
Example: One-Tailed Test
Is the average capacity of the batteries less than 140
ampere-hours? A random sample of 20 batteries had a mean
of 138.47 and a standard deviation of 2.66. Assume
a normal distribution. Test at the .05 level of significance.
Solution
• H0: μ ≥ 140
• Ha: μ < 140
• α = 0.05
• df = 20 − 1 = 19
• Critical value: −1.729 (area .05 in the left rejection tail)
Test statistic:
$$t^* = \frac{\bar{X} - \mu}{S/\sqrt{n}} = \frac{138.47 - 140}{2.66/\sqrt{20}} = -2.57$$
For t* = −2.57, p-value < .05
Decision: Reject H0, since p-value < α and t* < t-critical
Conclusion: There is evidence that the population average
is less than 140
Example: An insurance company believes that, over the last few years,
the average liability insurance per board seat in companies defined as
“small companies” has been $2000. Using α = 0.01, test this hypothesis
using Growth Resources, Inc. survey data: n = 100, x̄ = 2700, s = 947.
1. H0: μ = 2000 vs. H1: μ ≠ 2000
2. For α = 0.01, the critical values of z are ±2.576
3. The test statistic is:
$$z = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{2700 - 2000}{947/\sqrt{100}} = \frac{700}{94.7} = 7.39 \;\Rightarrow\; \text{Reject } H_0$$
4. Do not reject H0 if −2.576 ≤ z ≤ 2.576
5. Reject H0 if z < −2.576 or z > 2.576
6. Conclusion: Since the test statistic falls in
the upper rejection region, H0 is rejected, and
we may conclude that the average insurance
liability per board seat in “small companies”
is more than $2000.
Example:
1. A company that delivers packages within a large metropolitan
area claims that it takes an average of 28 minutes for a package
to be delivered from your door to the destination. Suppose that
you want to carry out a hypothesis test of this claim. Test the claim
that the mean delivery time for a package is equal to 28 minutes at the
0.05 level of significance.
2. The University uses thousands of fluorescent light bulbs each
year. The brand of bulb it currently uses has a mean life of 900
hours. A manufacturer claims that its new brand of bulbs, which
cost the same as the brand the university currently uses, has a
mean life of more than 900 hours. The university has decided to
purchase the new brand if, when tested, the test evidence
supports the manufacturer’s claim at the 0.05 significance level.
Suppose 64 bulbs were tested with the following results: x̄ = 920
hours, s = 80 hours. Will the University purchase the new brand
of fluorescent bulbs?
Measures of association
Chi-Square
Tests two categorical variables for independence
Consider an r × c contingency table:
              Variable B
Variable A    B1    B2    B3    B4    Totals
A1
A2
A3
Totals                                Grand total
where:
r = number of rows (number of categories of variable A)
c = number of columns (number of categories of variable B)
Chi-Square
Hypothesis to be tested:
H0: There is no association between the
row and column variables
HA: There is an association
or
H0: The row and column variables are
independent
HA: The two variables are dependent
Test statistic: χ² test with df = (r − 1) × (c − 1)
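Chi-Square (χ²) test
$$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c}\frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
where:
O_ij = observed frequency of the i-th row and j-th column
E_ij = expected frequency of the i-th row and j-th column:
$$E_{ij} = \frac{i\text{th row total} \times j\text{th column total}}{\text{grand total}} = \frac{R_i \times C_j}{n}$$
R_i = marginal total of the i-th row
C_j = marginal total of the j-th column
n = grand total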
An alternative method to calculate Chi-Square
for a 2×2 table
            Outcome
Exposure    Yes    No     Total
Yes         a      b      r1
No          c      d      r2
Total       c1     c2     n
$$\chi^2 = \frac{n\,(ad - bc)^2}{r_1 r_2 c_1 c_2}$$
Remember that the Chi-Square test should be applied
to counts and not percentages
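For a 2×2 table the shortcut gives the same value as the general observed-versus-expected formula. A minimal sketch comparing the two on hypothetical counts (a = 20, b = 30, c = 10, d = 40, not from the slides); both function names are illustrative:

```python
def chi_square_general(table):
    """Sum of (O - E)^2 / E over all cells, with E = row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def chi_square_2x2(a, b, c, d):
    """Shortcut for a 2x2 table: n (ad - bc)^2 / (r1 r2 c1 c2)."""
    n = a + b + c + d
    r1, r2, c1, c2 = a + b, c + d, a + c, b + d
    return n * (a * d - b * c) ** 2 / (r1 * r2 * c1 * c2)


# Hypothetical counts, used only to compare the two formulas.
a, b, c, d = 20, 30, 10, 40
general = chi_square_general([[a, b], [c, d]])
shortcut = chi_square_2x2(a, b, c, d)
print(round(general, 4), round(shortcut, 4))
```

Both functions take raw counts; feeding them percentages would distort the statistic, which is the point of the reminder on the slide.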
Characteristics of the Chi-Square Distribution
1. It is not symmetric.
2. The shape of the chi-square distribution depends upon the
degrees of freedom, just like Student’s t-distribution.
3. As the number of degrees of freedom increases, the chi-
square distribution becomes more symmetric, as
illustrated in the figure on the next slide.
4. The values are non-negative. That is, the values of χ² are
greater than or equal to 0.
The Chi-Square Distribution
Assumptions of the χ² test
For the chi-square independence test to be used, the
following must be true:
o The observed frequencies must be obtained by using
a random sample
o No expected frequency should be less than 1,
and no more than 20% of the expected
frequencies should be less than 5.
Critical values for chi-square:
Critical values are found in the chi-square table by first locating
the row corresponding to the appropriate number of
degrees of freedom (for the test of independence,
df = (r − 1)(c − 1)). Next, the
significance level is used to determine the correct
column.
[Figure: chi-square density on an axis starting at 0, with the critical value χ²(df, α) marked]
Steps
Step 1: Determine the null and alternative
hypotheses
H0: The two variables are independent
HA: The two variables are associated
Step 2: Select a level of significance α based upon
the seriousness of making a Type I error. The level of
significance is used to determine the critical value.
All Chi-Square tests for independence are right-
tailed tests, so the critical value is χ²_α with (r − 1) × (c − 1)
degrees of freedom. The region in the right tail beyond the
critical value is the critical region or rejection region.
Step 3: Calculate the expected frequencies for the
contingency table cells and verify that the requirements are
satisfied.
$$E_{r,c} = \frac{(\text{sum of row } r) \times (\text{sum of column } c)}{\text{sample size}}$$
(1) all expected frequencies are greater than or equal to 1
(all E_ij ≥ 1)
(2) no more than 20% of the expected frequencies are less than 5.
If the conditions listed above are satisfied, then…
Step 4: Compute the test statistic
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
where O represents the observed frequencies and
E represents the expected frequencies
Step 5: Make a decision to reject or fail to
reject the null hypothesis
- Compare the critical value to the test statistic
Step 6: State the conclusion
Example
A researcher wishes to determine whether there is a
relationship between the gender of an individual
and the amount of alcohol consumed. A sample of
68 people is selected, and the following data are
obtained. At α = 0.10, can the researcher conclude
that alcohol consumption is related to gender?
Example
Results of observed frequencies
                Alcohol consumption
Gender          Low    Moderate    High    Row total
Male            10     9           8       27
Female          13     16          12      41
Column total    23     25          20      68
Solution
Step 1 State the hypotheses
H0: The amount of alcohol that a person consumes
is independent of the individual’s gender
HA: The amount of alcohol that a person
consumes is dependent on the individual’s
gender
Step 2 Find the critical value: the critical value is
4.605, since the degrees of freedom are (2−1)(3−1) = 2
Step 3 Compute the test value: first, compute the
expected frequencies.
$$E_{1,1} = \frac{(27)(23)}{68} = 9.13 \qquad E_{2,1} = \frac{(41)(23)}{68} = 13.87$$
$$E_{1,2} = \frac{(27)(25)}{68} = 9.93 \qquad E_{2,2} = \frac{(41)(25)}{68} = 15.07$$
$$E_{1,3} = \frac{(27)(20)}{68} = 7.94 \qquad E_{2,3} = \frac{(41)(20)}{68} = 12.06$$
The completed table of expected frequencies:
                Alcohol consumption
Gender          Low      Moderate    High     Row total
Male            9.13     9.93        7.94     27
Female          13.87    15.07       12.06    41
Column total    23       25          20       68
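Then, the test value is
$$\chi^2 = \sum_{\text{all cells}} \frac{(O - E)^2}{E}
= \frac{(10 - 9.13)^2}{9.13} + \frac{(9 - 9.93)^2}{9.93} + \frac{(8 - 7.94)^2}{7.94}
+ \frac{(13 - 13.87)^2}{13.87} + \frac{(16 - 15.07)^2}{15.07} + \frac{(12 - 12.06)^2}{12.06}
= 0.283$$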
Step 4 Make the decision: Do not reject the null
hypothesis, since 0.283 < 4.605
Step 5 Conclusion: There is not enough evidence to
support the claim that the amount of alcohol a person
consumes is dependent on the individual’s gender
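The whole calculation for this example can be reproduced from the observed table. A minimal sketch; the critical value 4.605 is the table value quoted in Step 2 and is entered as a constant rather than computed:

```python
# Observed frequencies from the example (rows: male, female;
# columns: low, moderate, high alcohol consumption).
observed = [
    [10, 9, 8],
    [13, 16, 12],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected frequency of each cell: row total * column total / grand total.
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi_square = sum(
    (o - e) ** 2 / e
    for o_row, e_row in zip(observed, expected)
    for o, e in zip(o_row, e_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

critical_value = 4.605  # table value for alpha = 0.10, df = 2, as quoted in Step 2
reject = chi_square > critical_value

print(f"chi-square = {chi_square:.3f}, df = {df}, reject H0: {reject}")
```

With unrounded expected frequencies the statistic is about 0.281; the value 0.283 in the hand calculation comes from rounding the expected frequencies to two decimals. The decision not to reject H0 is unaffected.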
Example: A random sample of 200 men, all retired, was
classified according to educational level and
number of children, as shown below. Test at the α = 0.05
level of significance whether there is a relationship between
number of children and educational level of men.
Example: A psychologist selected 100 people from each of
three income groups and asked them if
they were “very happy.” The percent of each group who
responded yes and the
number from the survey are shown in the table. At α = 0.05,
test the claim that there is
no difference in the proportions.
HH income         >33%    34-67%    >67%    Total
Very happy        24      33        38      95
Not very happy    76      67        62      205
Total             100     100       100     300