Analysis of Variance
Learning Objectives
In this chapter, you learn:
- How to use one-way analysis of variance to test for differences among the means of several populations (also referred to as "groups")
- How to use two-way analysis of variance and interpret the interaction effect
- How to perform multiple comparisons in a one-way analysis of variance and a two-way analysis of variance
Chapter Overview
Analysis of Variance (ANOVA)
- One-Way ANOVA: F-test; Tukey-Kramer multiple comparisons; Levene's test for homogeneity of variance
- Two-Way ANOVA: interaction effects; Tukey multiple comparisons
General ANOVA Setting
- Investigator controls one or more factors of interest
- Each factor contains two or more levels
- Levels can be numerical or categorical
- Different levels produce different groups
- Observe effects on the dependent variable: are the groups the same?
Completely Randomized Design
- Experimental units (subjects) are assigned randomly to the different levels (groups)
- Subjects are assumed homogeneous
- Only one factor or independent variable, with two or more levels (groups)
- Analyzed by one-factor analysis of variance (one-way ANOVA)
One-Way Analysis of Variance
- Evaluates the differences among the means of three or more groups
- Examples: accident rates for 1st, 2nd, and 3rd shift; expected mileages for five brands of tires
Hypotheses: One-Way ANOVA
H0: All population means are equal
- i.e., no treatment effect (no variation in means among groups)
H1: Not all of the population means are the same
- At least one population mean is different from the others
- i.e., there is a treatment (group) effect
- Does not mean that all population means are different (at least one of the means differs from the others)
Hypotheses: One-Way ANOVA

$$H_0: \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_c$$

H1: Not all μj are the same

[Figure: three identical distributions — all means are the same, so the null hypothesis is true (no group effect)]
Hypotheses: One-Way ANOVA

$$H_0: \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_c$$

H1: Not all μj are the same

[Figure: two panels in which at least one mean is different, so the null hypothesis is NOT true (a treatment effect is present) — in one panel a single distribution is shifted, in the other all three means differ]
Partitioning the Variation
Total variation can be split into two parts:

SST = SSA + SSW
- SST = Total Sum of Squares (total variation)
- SSA = Sum of Squares Among Groups (among-group variation)
- SSW = Sum of Squares Within Groups (within-group variation)
Partitioning the Variation
SST = SSA + SSW
- Total variation (SST) = the aggregate dispersion of the individual data values around the overall (grand) mean of all factor levels
- Among-group variation (SSA) = dispersion between the factor sample means
- Within-group variation (SSW) = dispersion that exists among the data values within the particular factor levels
The Total Sum of Squares
SST = SSA + SSW

$$SST = \sum_{j=1}^{c} \sum_{i=1}^{n_j} (X_{ij} - \bar{\bar{X}})^2$$

Where:
- SST = total sum of squares
- c = number of groups
- nj = number of values in group j
- Xij = ith value from group j
- X̄̄ = grand mean (mean of all data values)
The Total Sum of Squares

$$SST = (X_{11} - \bar{\bar{X}})^2 + (X_{12} - \bar{\bar{X}})^2 + \cdots + (X_{cn_c} - \bar{\bar{X}})^2$$

[Figure: response values X in Groups 1-3 scattered around the grand mean]
Among-Group Variation
SST = SSA + SSW

$$SSA = \sum_{j=1}^{c} n_j (\bar{X}_j - \bar{\bar{X}})^2$$

Where:
- SSA = sum of squares among groups
- c = number of groups
- nj = sample size from group j
- X̄j = sample mean from group j
- X̄̄ = grand mean (mean of all data values)
Among-Group Variation
$$SSA = n_1 (\bar{X}_1 - \bar{\bar{X}})^2 + n_2 (\bar{X}_2 - \bar{\bar{X}})^2 + \cdots + n_c (\bar{X}_c - \bar{\bar{X}})^2$$

[Figure: group means X̄1, X̄2, X̄3 for Groups 1-3 plotted against the grand mean]
Among-Group Variation
$$SSA = \sum_{j=1}^{c} n_j (\bar{X}_j - \bar{\bar{X}})^2$$

Variation due to differences among groups
Within-Group Variation
SST = SSA + SSW

$$SSW = \sum_{j=1}^{c} \sum_{i=1}^{n_j} (X_{ij} - \bar{X}_j)^2$$

Where:
- SSW = sum of squares within groups
- c = number of groups
- nj = sample size from group j
- X̄j = sample mean from group j
- Xij = ith value in group j
Within-Group Variation
$$SSW = (X_{11} - \bar{X}_1)^2 + (X_{12} - \bar{X}_1)^2 + \cdots + (X_{cn_c} - \bar{X}_c)^2$$

[Figure: response values within Groups 1-3 scattered around their own group means X̄1, X̄2, X̄3]
Within-Group Variation
$$SSW = \sum_{j=1}^{c} \sum_{i=1}^{n_j} (X_{ij} - \bar{X}_j)^2$$

Summing the variation within each group and then adding over all groups
Obtaining the Mean Squares
The mean squares are obtained by dividing the various sums of squares by their associated degrees of freedom:

$$MSA = \frac{SSA}{c - 1}$$  (Mean Square Among, d.f. = c - 1)

$$MSW = \frac{SSW}{n - c}$$  (Mean Square Within, d.f. = n - c)

$$MST = \frac{SST}{n - 1}$$  (Mean Square Total, d.f. = n - 1)
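The partition SST = SSA + SSW and the mean squares above can be checked numerically; the following is a minimal sketch in Python with NumPy, using three small made-up groups:

```python
# Minimal sketch of the SST = SSA + SSW partition and the mean squares,
# using three small hypothetical groups (values are made up).
import numpy as np

groups = [np.array([1.0, 2.0, 3.0]),
          np.array([3.0, 4.0, 5.0]),
          np.array([5.0, 6.0, 7.0])]

all_values = np.concatenate(groups)
grand_mean = all_values.mean()          # the grand mean (X-double-bar)
n, c = len(all_values), len(groups)

# SST: every value against the grand mean
sst = ((all_values - grand_mean) ** 2).sum()
# SSA: each group mean against the grand mean, weighted by group size
ssa = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# SSW: each value against its own group mean
ssw = sum(((g - g.mean()) ** 2).sum() for g in groups)

msa = ssa / (c - 1)   # Mean Square Among, d.f. = c - 1
msw = ssw / (n - c)   # Mean Square Within, d.f. = n - c
f_stat = msa / msw
```

For these data the partition comes out as SSA = 24, SSW = 6, SST = 30, so the two pieces do add up to the total.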
One-Way ANOVA Table
Source of Variation | df  | SS              | MS (Variance)   | F-Ratio
--------------------|-----|-----------------|-----------------|------------
Among Groups        | c-1 | SSA             | MSA = SSA/(c-1) | F = MSA/MSW
Within Groups       | n-c | SSW             | MSW = SSW/(n-c) |
Total               | n-1 | SST = SSA + SSW |                 |

- c = number of groups
- n = sum of the sample sizes from all groups
- df = degrees of freedom
One-Way ANOVA Test Statistic
H0: μ1 = μ2 = … = μc
H1: At least two population means are different

Test statistic:

$$F_{STAT} = \frac{MSA}{MSW}$$

- MSA is the mean square among groups (among-group variance)
- MSW is the mean square within groups (within-group variance)
Degrees of freedom:
- df1 = c - 1 (c = number of groups)
- df2 = n - c (n = sum of all sample sizes)
One-Way ANOVA Test Statistic
- The F statistic is the ratio of the among-group variance to the within-group variance
- The ratio must always be positive
- df1 = c - 1 is typically small
- df2 = n - c is typically large
- The critical value can be found in Excel with FINV(α, c-1, n-c)

Decision rule: reject H0 if F > Fα; otherwise do not reject H0
[Figure: F distribution with the α = .05 rejection region to the right of the critical value Fα]
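The Excel FINV call mentioned above has a direct analogue in SciPy's F distribution; a short sketch, assuming SciPy is available:

```python
# Sketch: finding the F critical value, the equivalent of Excel's
# FINV(alpha, c-1, n-c), via scipy.stats (assumes SciPy is installed).
from scipy.stats import f

alpha = 0.05
c, n = 3, 15                 # e.g., 3 groups, 15 total observations
df1, df2 = c - 1, n - c

# FINV(alpha, df1, df2) is the upper-tail critical value, i.e. the
# (1 - alpha) quantile of the F distribution.
f_crit = f.ppf(1 - alpha, df1, df2)
```

With df1 = 2 and df2 = 12 this gives the tabled critical value of about 3.89 used later in the chapter.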
One-Way ANOVA Example
We want to see if three different classes yield different academic results. We randomly select five measurements from each class. At the .05 significance level, is there a difference in mean overall scores?

Class 1 | Class 2 | Class 3
--------|---------|--------
254     | 234     | 200
263     | 218     | 222
241     | 235     | 197
237     | 227     | 206
251     | 216     | 204
One-Way ANOVA Example

Class 1 | Class 2 | Class 3
--------|---------|--------
254     | 234     | 200
263     | 218     | 222
241     | 235     | 197
237     | 227     | 206
251     | 216     | 204

X̄1 = 249.2, X̄2 = 226.0, X̄3 = 205.8, grand mean X̄̄ = 227.0

[Figure: exam scores (190-270) plotted by class, marking the three class means and the grand mean]
One-Way ANOVA
Example
Class 1 Class 2 X1 = n1 = 5
Class 3
249.2
254 234 200 n2 = 5
263 218 222 X2 =
241 235 197 n3 = 5
237 227 206 226.0
251 216 204 n = 15
X3 =
c=3
205.8
SSA = 5 (249.2 – 227)2 + 5 (226 – 227)2 + 5 (205.8 – 227)2 =
4716.4 X = 2 +…+ (204 – 205.8)2 =
SSW = (254 – 249.2)2 + (263 – 249.2)
1119.6 227.0
MSA = 4716.4 / (3-1) = 2358.2
2358.2 FSTAT 25.275
MSW = 1119.6 / (15-3) = 93.3
93.3
One-Way ANOVA Example
H0: μ1 = μ2 = μ3
H1: μj not all equal
α = 0.05, df1 = 2, df2 = 12

Test statistic: F_STAT = MSA / MSW = 2358.2 / 93.3 = 25.275
Critical value: Fα = 3.89

Decision: since F_STAT = 25.275 > 3.89, reject H0 at α = 0.05
Conclusion: there is evidence that at least one μj differs from the rest
[Figure: F distribution with the rejection region beyond Fα = 3.89]
One-Way ANOVA
SUMMARY

Groups  | Count | Sum  | Average | Variance
--------|-------|------|---------|---------
Class 1 | 5     | 1246 | 249.2   | 108.2
Class 2 | 5     | 1130 | 226     | 77.5
Class 3 | 5     | 1029 | 205.8   | 94.2

ANOVA

Source of Variation | SS     | df | MS     | F      | P-value  | F crit
--------------------|--------|----|--------|--------|----------|-------
Among Groups        | 4716.4 | 2  | 2358.2 | 25.275 | 4.99E-05 | 3.89
Within Groups       | 1119.6 | 12 | 93.3   |        |          |
Total               | 5836.0 | 14 |        |        |          |
The Tukey-Kramer Procedure
- Tells which population means are significantly different (e.g., μ1 = μ2 ≠ μ3)
- Done after rejection of equal means in ANOVA
- Allows pairwise comparisons: compare absolute mean differences with a critical range
[Figure: distributions with μ1 = μ2 but μ3 shifted along the x-axis]
Tukey-Kramer Critical Range

$$\text{Critical Range} = Q_U \sqrt{\frac{MSW}{2}\left(\frac{1}{n_j} + \frac{1}{n_{j'}}\right)}$$

where:
- QU = value from the Studentized range distribution with c and n - c degrees of freedom for the desired level of α (see appendix tables D.11 and D.12)
- MSW = mean square within
- nj and nj' = sample sizes from groups j and j'
The Tukey-Kramer Procedure
1. Compute the absolute mean differences:

|X̄1 - X̄2| = |249.2 - 226.0| = 23.2
|X̄1 - X̄3| = |249.2 - 205.8| = 43.4
|X̄2 - X̄3| = |226.0 - 205.8| = 20.2

2. Find the QU value from the table in appendix D.11 with c = 3 and (n - c) = (15 - 3) = 12 degrees of freedom for the desired level of α (α = .05 used here): QU = 3.77
The Tukey-Kramer Procedure
3. Compute the critical range:

$$\text{Critical Range} = Q_U \sqrt{\frac{MSW}{2}\left(\frac{1}{n_j} + \frac{1}{n_{j'}}\right)} = 3.77 \sqrt{\frac{93.3}{2}\left(\frac{1}{5} + \frac{1}{5}\right)} = 16.285$$

4. Compare: |X̄1 - X̄2| = 23.2, |X̄1 - X̄3| = 43.4, |X̄2 - X̄3| = 20.2

5. All of the absolute mean differences are greater than the critical range. Therefore, there is a significant difference between each pair of means at the 5% level of significance. Thus, with 95% confidence, we can conclude that the mean overall score of class 1 is greater than those of classes 2 and 3, and the mean overall score of class 2 is greater than that of class 3.
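The steps above can be mirrored in code: SciPy's Studentized range distribution supplies QU in place of the appendix table. A sketch, assuming SciPy is available:

```python
# Sketch of the Tukey-Kramer procedure for the class-score example,
# using scipy.stats.studentized_range instead of appendix table D.11
# (assumes SciPy is installed).
from math import sqrt
from scipy.stats import studentized_range

means = {"class1": 249.2, "class2": 226.0, "class3": 205.8}
msw, n_j, c, n = 93.3, 5, 3, 15

# Upper 5% point of the Studentized range with c groups, n-c d.f.
q_u = studentized_range.ppf(0.95, c, n - c)      # ~3.77 for c=3, df=12
critical_range = q_u * sqrt((msw / 2) * (1 / n_j + 1 / n_j))

diffs = {
    "1 vs 2": abs(means["class1"] - means["class2"]),   # 23.2
    "1 vs 3": abs(means["class1"] - means["class3"]),   # 43.4
    "2 vs 3": abs(means["class2"] - means["class3"]),   # 20.2
}
significant = {pair: d > critical_range for pair, d in diffs.items()}
```

Every pairwise difference exceeds the critical range of about 16.3, matching the conclusion in step 5.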
ANOVA Assumptions
- Randomness and independence: select random samples from the c independent groups (or randomly assign the levels)
- Normality: the sample values from each group are from a normal population
- Homogeneity of variance: all populations sampled from have the same variance
ANOVA Assumptions
Levene Homogeneity-of-Variance Test
Tests the assumption that the variances of all groups are equal.
1. First, define the null and alternative hypotheses:
   H0: σ²1 = σ²2 = … = σ²c
   H1: Not all σ²j are equal
2. Second, compute the absolute values of the differences between each value and the median of its group.
3. Third, perform a one-way ANOVA on these absolute differences.
Levene’s Test Example
H0: σ²1 = σ²2 = σ²3
H1: Not all σ²j are equal

Calculate the group medians (data sorted within each class):

       | Class 1 | Class 2 | Class 3
-------|---------|---------|--------
       | 237     | 216     | 197
       | 241     | 218     | 200
Median | 251     | 227     | 204
       | 254     | 234     | 206
       | 263     | 235     | 222

Calculate the absolute differences from the group medians:

Class 1 | Class 2 | Class 3
--------|---------|--------
14      | 11      | 7
10      | 9       | 4
0       | 0       | 0
3       | 7       | 2
12      | 8       | 18
Levene’s Test Example
ANOVA: Single Factor

SUMMARY

Groups  | Count | Sum | Average | Variance
--------|-------|-----|---------|---------
Class 1 | 5     | 39  | 7.8     | 36.2
Class 2 | 5     | 35  | 7       | 17.5
Class 3 | 5     | 31  | 6.2     | 50.2

Source of Variation | SS    | df | MS   | F     | P-value | F crit
--------------------|-------|----|------|-------|---------|-------
Among Groups        | 6.4   | 2  | 3.2  | 0.092 | 0.912   | 3.885
Within Groups       | 415.6 | 12 | 34.6 |       |         |
Total               | 422   | 14 |      |       |         |

Since the p-value (0.912) is greater than 0.05, there is insufficient evidence of a difference in the variances.
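The median-based procedure above is what SciPy's Levene test performs when `center='median'` (the Brown-Forsythe variant); a sketch, assuming SciPy is available:

```python
# Sketch: Levene's test on the class-score data via scipy.stats.levene.
# center='median' reproduces the median-based procedure described above
# (assumes SciPy is installed).
from scipy.stats import levene

class1 = [254, 263, 241, 237, 251]
class2 = [234, 218, 235, 227, 216]
class3 = [200, 222, 197, 206, 204]

stat, p_value = levene(class1, class2, class3, center="median")
# p ≈ 0.912 > 0.05: insufficient evidence of unequal variances, so the
# homogeneity-of-variance assumption is not rejected.
equal_variances_ok = p_value > 0.05
```

The statistic matches the F value (≈0.09) in the worksheet output above.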
Two-Way ANOVA
Examines the effect of:
- Two factors of interest on the dependent variable
  - e.g., percent carbonation and line speed on a soft drink bottling process
- Interaction between the different levels of these two factors
  - e.g., does the effect of one particular carbonation level depend on which level the line speed is set to?
Two-Way ANOVA
Sources of Variation
Two factors of interest: A and B
- r = number of levels of factor A
- c = number of levels of factor B
- n' = number of replications for each cell
- n = total number of observations in all cells (n = rcn')
- Xijk = value of the kth observation of level i of factor A and level j of factor B
Two-Way ANOVA
Sources of Variation
SST = SSA + SSB + SSAB + SSE

Partition                                      | Degrees of Freedom
-----------------------------------------------|-------------------
SST (total variation)                          | n - 1
SSA (factor A variation)                       | r - 1
SSB (factor B variation)                       | c - 1
SSAB (variation due to interaction of A and B) | (r - 1)(c - 1)
SSE (random variation, error)                  | rc(n' - 1)
Two-Way ANOVA
Equations
$$MSA = \frac{SSA}{r - 1}$$  (mean square, factor A)

$$MSB = \frac{SSB}{c - 1}$$  (mean square, factor B)

$$MSAB = \frac{SSAB}{(r - 1)(c - 1)}$$  (mean square, interaction)

$$MSE = \frac{SSE}{rc(n' - 1)}$$  (mean square, error)
Two-Way ANOVA: The F Test Statistics
F test for factor A effect:
- H0: μ1.. = μ2.. = … = μr..
- H1: Not all μi.. are equal
- F = MSA / MSE; reject H0 if F > FU

F test for factor B effect:
- H0: μ.1. = μ.2. = … = μ.c.
- H1: Not all μ.j. are equal
- F = MSB / MSE; reject H0 if F > FU

F test for interaction effect:
- H0: the interaction of A and B is equal to zero
- H1: the interaction of A and B is not zero
- F = MSAB / MSE; reject H0 if F > FU
Two-Way ANOVA: Summary Table

Source of Variation | Degrees of Freedom | Sum of Squares | Mean Squares                   | F Statistic
--------------------|--------------------|----------------|--------------------------------|------------
Factor A            | r - 1              | SSA            | MSA = SSA / (r - 1)            | MSA / MSE
Factor B            | c - 1              | SSB            | MSB = SSB / (c - 1)            | MSB / MSE
AB (Interaction)    | (r - 1)(c - 1)     | SSAB           | MSAB = SSAB / [(r - 1)(c - 1)] | MSAB / MSE
Error               | rc(n' - 1)         | SSE            | MSE = SSE / [rc(n' - 1)]       |
Total               | n - 1              | SST            |                                |
Two-Way ANOVA: Features
- Degrees of freedom always add up:
  n - 1 = (r - 1) + (c - 1) + (r - 1)(c - 1) + rc(n' - 1)
  Total = factor A + factor B + interaction + error
- The sums of squares always add up:
  SST = SSA + SSB + SSAB + SSE
  Total = factor A + factor B + interaction + error
- The denominator of each F test is always the same (MSE), but the numerator differs
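The additivity of the sums of squares and degrees of freedom can be verified numerically; a minimal NumPy sketch for a small balanced design with made-up data (r = 2, c = 2, n' = 2):

```python
# Sketch: two-way ANOVA partition SST = SSA + SSB + SSAB + SSE for a
# small balanced design with hypothetical (made-up) data.
import numpy as np

data = np.array([[[10., 12.], [20., 22.]],
                 [[14., 16.], [30., 34.]]])   # shape (r, c, n')
r, c, n_rep = data.shape
n = data.size
grand = data.mean()

a_means = data.mean(axis=(1, 2))      # factor A level means
b_means = data.mean(axis=(0, 2))      # factor B level means
cell_means = data.mean(axis=2)

sst = ((data - grand) ** 2).sum()
ssa = c * n_rep * ((a_means - grand) ** 2).sum()
ssb = r * n_rep * ((b_means - grand) ** 2).sum()
ssab = n_rep * ((cell_means - a_means[:, None]
                 - b_means[None, :] + grand) ** 2).sum()
sse = ((data - cell_means[..., None]) ** 2).sum()

# Both partitions add up, as stated above.
ss_ok = np.isclose(sst, ssa + ssb + ssab + sse)
df_ok = (n - 1) == (r - 1) + (c - 1) + (r - 1) * (c - 1) + r * c * (n_rep - 1)

# Interaction F statistic: MSAB / MSE
f_interaction = (ssab / ((r - 1) * (c - 1))) / (sse / (r * c * (n_rep - 1)))
```

For these numbers SSA = 112.5, SSB = 364.5, SSAB = 24.5, and SSE = 14, which sum to SST = 515.5, and the degrees of freedom 1 + 1 + 1 + 4 sum to n - 1 = 7.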
Two-Way ANOVA: Interaction
[Figure: two interaction plots of mean response against factor A levels, one line per factor B level.
No interaction: the lines for factor B levels 1-3 are parallel.
Significant interaction present: the lines are not parallel (they cross or diverge).]
Multiple Comparisons: The Tukey Procedure
- Unless there is a significant interaction, you can determine the levels that are significantly different using the Tukey procedure
- Consider all absolute mean differences and compare them to the calculated critical range
- Example: absolute differences for factor A, assuming three levels:
  |X̄1.. - X̄2..|, |X̄1.. - X̄3..|, |X̄2.. - X̄3..|
Multiple Comparisons: The Tukey Procedure
Critical range for factor A:

$$\text{Critical Range} = Q_\alpha \sqrt{\frac{MSE}{c\,n'}}$$

(where Qα is from Table E.7 with r and rc(n' - 1) d.f.)

Critical range for factor B:

$$\text{Critical Range} = Q_\alpha \sqrt{\frac{MSE}{r\,n'}}$$

(where Qα is from Table E.7 with c and rc(n' - 1) d.f.)
Chapter Summary
In this chapter, we have:
- Described one-way analysis of variance
  - The logic of ANOVA
  - ANOVA assumptions
  - F test for differences in c means
  - The Tukey-Kramer procedure for multiple comparisons
- Described two-way analysis of variance
  - Examined effects of multiple factors
  - Examined interaction between factors
Check Your Understanding
One-way ANOVA can be applied to:
a) A regression model with several dummy variables (created for a qualitative independent variable), to test the overall usefulness of the model
b) A regression model with several quantitative independent variables, to test the overall usefulness of the model
c) Both of the above
d) None of the above