One or More Way Analysis of Variance
Wolde M.
Consider the following research questions:
Are the birth weights of children in different geographical
regions the same?
Are the responses of patients to different medications and
placebo different?
Do people in different age groups have different
proportions of body fat?
Do people of different ethnicities have the same BMI?
One-way Analysis of Variance
(One-way ANOVA)
All the above research questions have one common
characteristic: each involves two variables, one
categorical and one quantitative.
Main question: Are the averages of the quantitative variable
across the groups (categories) the same?
The name one-way ANOVA comes from the fact that there is only
one categorical independent variable (factor), which has two
or more categories (groups).
One-way ANOVA cont…
One way ANOVA is also called Completely
Randomized Design
Experimental units (subjects) are assigned randomly
to treatments/groups. Here subjects are assumed to
be homogeneous
Analysis of variance cont…
One-way ANOVA is a method for testing the hypothesis that
there is no difference between two or more population
means (usually at least three), or equivalently that there is
no difference between a number of treatments.
More formally, we can state the hypotheses as:
H0: There is no difference among the mean treatment effects
HA: At least two treatment effects differ
or
H0: $\mu_1 = \mu_2 = \mu_3 = \dots = \mu_a$ (if there are a groups)
HA: at least one group mean is different
Why Not Just Use t-tests?
Since a t-test considers only two groups at a time, it becomes
tedious when many groups are present.
Conducting multiple t-tests also inflates the Type I error rate
(false positives) severely and is not recommended.
ANOVA, in contrast, tests for differences among several
means in a single test, without increasing the Type I error rate.
The ANOVA uses data from all groups at a time to estimate
standard errors, which can increase the power of the analysis
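To see how quickly the Type I error rate inflates, the following minimal sketch computes the chance of at least one false positive when every pair of groups is compared with its own t-test. It assumes the tests are independent and each uses α = 0.05; both are simplifying assumptions, and the function name is only illustrative.

```python
from math import comb

def familywise_error(groups: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive over all pairwise tests,
    assuming (as a simplification) that the tests are independent."""
    n_tests = comb(groups, 2)          # number of pairwise comparisons
    return 1 - (1 - alpha) ** n_tests

for a in (3, 5, 10):
    print(f"{a} groups, {comb(a, 2)} tests: {familywise_error(a):.3f}")
```

With only three groups the rate is already well above the nominal 5%, which is the motivation for a single overall test.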
Assumptions of One-way ANOVA
The observations are independent, and the data in each group are
normally distributed (the samples come from normally distributed
populations).
The variance is the same in each group to be compared (equal
variance, or homoscedasticity).
Moderate departures from normality may be safely ignored,
but the effect of unequal standard deviations may be serious.
In the latter case, transforming the data may be useful.
Analysis of variance cont…
We test the equality of means among groups by comparing
variances.
Comparing the variation between groups with the variation
within groups tells us whether the means differ.
If the two are about equal, the observed differences among the
group means are likely due to chance rather than a real difference.
Note that:
Total Variability = Variability between + Variability within
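Analysis of variance
Basic model: data are deviations from the global mean, μ:
$X_{ij} = \mu + \varepsilon_{ij}$
The sum of squared vertical deviations from μ is the total sum of squares, SSt.
One-way model: data are deviations from the treatment means, $\mu + A_i$:
$X_{ij} = \mu + A_i + \varepsilon_{ij}$
The sum of squared vertical deviations from the treatment means is SSe.
Note that $\sum A_i = \sum \varepsilon_{ij} = 0$.
[Figure: observations in groups G-1 and G-2 plotted around the global mean μ (basic model) and around the treatment levels A1 and A2 (one-way model).]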
Decomposing the total variability
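$$\text{Total SS} = \sum_{i=1}^{n}\sum_{j=1}^{a}(x_{ij}-\bar{x}_{..})^2 = \sum_i\sum_j x_{ij}^2 - \frac{\left(\sum_i\sum_j x_{ij}\right)^2}{na} = SST$$
$$\text{Within SS} = \sum_{i=1}^{n}\sum_{j=1}^{a}(x_{ij}-\bar{x}_{.j})^2 = \sum_i\sum_j x_{ij}^2 - \frac{\sum_j\left(\sum_i x_{ij}\right)^2}{n} = SSW$$
$$\text{Between SS} = \sum_{i=1}^{n}\sum_{j=1}^{a}(\bar{x}_{.j}-\bar{x}_{..})^2 = \frac{\sum_j\left(\sum_i x_{ij}\right)^2}{n} - \frac{\left(\sum_i\sum_j x_{ij}\right)^2}{na} = SSB$$
Here $\bar{x}_{.j}$ is the mean of group j and $\bar{x}_{..}$ is the grand mean.
This assumes each of the a groups has equal size n.
SST = SSW + SSB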
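The decomposition can be checked numerically. The sketch below uses a small hypothetical balanced data set (a = 3 groups, n = 4 observations each; the numbers are made up for illustration) and computes the three sums of squares directly from their definitions.

```python
# Hypothetical balanced data: a = 3 groups, n = 4 observations each.
groups = [
    [4.0, 5.0, 6.0, 5.0],
    [7.0, 8.0, 6.0, 7.0],
    [9.0, 10.0, 11.0, 10.0],
]

a = len(groups)
n = len(groups[0])
grand_mean = sum(sum(g) for g in groups) / (n * a)
group_means = [sum(g) / n for g in groups]

# Total: every observation against the grand mean.
sst = sum((x - grand_mean) ** 2 for g in groups for x in g)
# Within: every observation against its own group mean.
ssw = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
# Between: each group mean against the grand mean, counted n times.
ssb = sum(n * (m - grand_mean) ** 2 for m in group_means)

print(f"SST = {sst:.3f}, SSW = {ssw:.3f}, SSB = {ssb:.3f}")
print("SST == SSW + SSB:", abs(sst - (ssw + ssb)) < 1e-9)
```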
Data of one-way ANOVA
                     Groups/variable
Participants   G-1    G-2    G-3    …    G-a
1              X11    X12    X13    …    X1a
2              X21    X22    X23    …    X2a
3              X31    X32    X33    …    X3a
.              .      .      .           .
.              .      .      .           .
n              Xn1    Xn2    Xn3    …    Xna
Totals         T.1    T.2    T.3    …    T.a
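Computational formula
$T = \sum_i\sum_j x_{ij}^2$
Correction factor: $CF = \left(\sum_i\sum_j x_{ij}\right)^2/(na) = T_{..}^2/(na)$
$A = \sum_j\left(\sum_i x_{ij}\right)^2/n = \sum_j T_{.j}^2/n$ if the group (cell) sizes are equal, or
$A = \sum_j\left(\sum_i x_{ij}\right)^2/n_j = \sum_j T_{.j}^2/n_j$ if the group sizes are unequal
(in which case na in CF is replaced by N),
where $x_{ij}$ is the ith observation in the jth group of the table,
$i = 1, 2, 3, \dots, n_j$; $j = 1, 2, 3, \dots, a$; and $\sum_j n_j = N$.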
Sum of squares and ANOVA table
Source of variation   df      SS             MS            F
Between groups        a-1     SSB = A - CF   SSB/(a-1)     MSB/MSW
Within groups         na-a    SSW = T - A    SSW/(na-a)
Total                 na-1    SST = T - CF
(With unequal group sizes, the within and total df are N-a and N-1.)
If there are real differences among the group means, the between-groups
mean square will tend to be larger than the within-groups mean square.
Example on one-way ANOVA
The following table shows the red cell folate levels (μg/l) in three groups of
cardiac bypass patients who were given three different levels of nitrous oxide
ventilation (level of nitrous oxide: group I > group II > group III).
         Group I   Group II   Group III
         (n=8)     (n=9)      (n=5)
         243       206        241
         251       210        258
         275       226        270
         291       249        293
         347       255        328
         354       273
         380       285
         392       295
                   309
Total    2533      2308       1390
Mean     316.6     256.4      278.0
SD       58.7      37.1       33.8
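The summary rows of the table can be reproduced with Python's standard `statistics` module. This is only a quick check of the totals, means and sample standard deviations; the dictionary name `folate` is an illustrative choice.

```python
from statistics import mean, stdev

# Red cell folate levels (μg/l) from the table above.
folate = {
    "I":   [243, 251, 275, 291, 347, 354, 380, 392],
    "II":  [206, 210, 226, 249, 255, 273, 285, 295, 309],
    "III": [241, 258, 270, 293, 328],
}

# (n, total, mean, sample SD) for each group, rounded as in the table.
summary = {
    g: (len(x), sum(x), round(mean(x), 1), round(stdev(x), 1))
    for g, x in folate.items()
}

for g, (n, total, m, sd) in summary.items():
    print(f"Group {g}: n={n}, total={total}, mean={m}, SD={sd}")
```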
Example Cont….
A box plot of the three groups gives a first
impression of the data.
Example cont…
H0: μ1 = μ2 = μ3
HA: Differences exist between at least two of the means
Source of variation   df   SS      Mean square   F      P
Between groups        2    15516   7758          3.71   0.044
Within groups         19   39716   2090
Total                 21   55232
Since the P-value is less than 0.05, the null hypothesis is rejected.
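The sums of squares and the F ratio in this table follow from the computational formulas for unequal group sizes. The sketch below recomputes them from the raw folate data using only the standard library; the P-value is not computed here, since it requires the F distribution with (2, 19) degrees of freedom from tables or statistical software.

```python
folate = {
    "I":   [243, 251, 275, 291, 347, 354, 380, 392],
    "II":  [206, 210, 226, 249, 255, 273, 285, 295, 309],
    "III": [241, 258, 270, 293, 328],
}

values = [x for g in folate.values() for x in g]
N, a = len(values), len(folate)

T = sum(x * x for x in values)                           # raw sum of squares
CF = sum(values) ** 2 / N                                # correction factor
A = sum(sum(g) ** 2 / len(g) for g in folate.values())   # unequal n_j

ssb, ssw, sst = A - CF, T - A, T - CF
msb, msw = ssb / (a - 1), ssw / (N - a)
F = msb / msw

print(f"Between: df={a - 1},  SS={ssb:.0f}, MS={msb:.0f}, F={F:.2f}")
print(f"Within:  df={N - a}, SS={ssw:.0f}, MS={msw:.0f}")
print(f"Total:   df={N - 1}, SS={sst:.0f}")
```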
Pairwise comparisons of group means
(post hoc tests or multiple comparisons)
The ANOVA test tells us only whether there is a statistically significant
difference among the group means;
it doesn't tell us which particular groups are significantly
different.
To identify them, we use either a priori (pre-planned) or post hoc
tests.
Pairwise comparisons of group means
(post hoc tests) cont…
Whether to use a priori or post hoc tests depends on whether the
researcher stated the hypotheses to test before seeing the data.
If you have honestly stated beforehand the comparisons between
individual pairs of means which you intend to make, then you are
entitled to use an a priori test such as a t-test.
In this case, only one or a few pairs of groups will be tested.
However, once you look at the data it may seem worth comparing
all possible pairs. In this case, a post hoc test such as the Scheffé,
Bonferroni (modified t-test), Tukey, Least Significant
Difference (LSD), Duncan, or Dunnett method is used.
Bonferroni method or modified t-test (steps)
I. Find the calculated t statistic for each pair of groups to be compared.
II. The modified t-test is based on the pooled estimate of
variance from all the groups (which is the residual variance
in the ANOVA table), not just from the pair being considered.
III. If we perform k paired comparisons, then we should
multiply the P value obtained from each test by k; that is, we
calculate P' = kP with the restriction that P' cannot exceed 1.
Here k = a(a-1)/2 when all possible pairs of the a groups are compared.
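Step III amounts to a one-line adjustment. The sketch below applies it to three hypothetical P values (made up for illustration, standing for the three pairwise tests among a = 3 groups); the function name `bonferroni` is an illustrative choice, not a library call.

```python
from math import comb

def bonferroni(p_values):
    """Multiply each P value by the number of comparisons, capped at 1."""
    k = len(p_values)
    return [min(1.0, k * p) for p in p_values]

a = 3                       # number of groups
k = comb(a, 2)              # all possible pairs: a(a-1)/2
raw_p = [0.010, 0.200, 0.500]   # hypothetical unadjusted P values
assert len(raw_p) == k

adjusted = bonferroni(raw_p)
for p, p_adj in zip(raw_p, adjusted):
    print(f"P = {p:.3f}  ->  P' = {p_adj:.3f}")
```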
Bonferroni method or modified t-test
Returning to the red cell folate data given above, the residual
standard deviation is √2090 = 45.72.
(a) Comparing groups I and II
t = (316.6 - 256.4) / (45.72 × √(1/8 + 1/9))
= 60.2/22.22
= 2.71 on 19 degrees of freedom.
The corresponding P value = 0.014 and the
corrected P value is P' = 0.014 × 3
= 0.042
Groups I and II are significantly different.
Bonferroni method or modified t-test
(b) Comparing groups I and III
t = (316.6 - 278.0) / (45.72 × √(1/8 + 1/5))
= 38.6/26.06
= 1.48 on 19 degrees of freedom.
The corresponding P value = 0.1625 and
the corrected P value is P' = 0.1625 × 3
= 0.4875
Groups I and III are not significantly different.
Bonferroni method or modified t-test
(c) Comparing groups II and III
t = (278.0 - 256.4) / (45.72 × √(1/5 + 1/9))
= 21.6/25.5
= 0.85 on 19 degrees of freedom.
The corresponding P value = 0.425 and the corrected P value
is P' = 1.00 (since 3 × 0.425 exceeds 1).
Groups II and III are not significantly different.
Therefore, the difference among the groups that was
identified in the ANOVA is mainly explained by the
difference between groups I and II.
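The three comparisons share the same recipe, so they can be written as one loop. The sketch below recomputes the t statistics from the group means, sizes and residual standard deviation quoted above, and applies the Bonferroni correction to the unadjusted P values given in the text (those P values are taken as given, not recomputed).

```python
from itertools import combinations
from math import sqrt

means = {"I": 316.6, "II": 256.4, "III": 278.0}   # group means from the table
sizes = {"I": 8, "II": 9, "III": 5}
residual_sd = 45.72                                # sqrt of within-groups MS
# Unadjusted P values as quoted in the text (not recomputed here).
raw_p = {("I", "II"): 0.014, ("I", "III"): 0.1625, ("II", "III"): 0.425}

k = len(raw_p)                                     # number of comparisons
results = {}
for g1, g2 in combinations(means, 2):
    # Standard error uses the pooled SD from ALL groups.
    se = residual_sd * sqrt(1 / sizes[g1] + 1 / sizes[g2])
    t = abs(means[g1] - means[g2]) / se
    p_adj = min(1.0, k * raw_p[(g1, g2)])
    results[(g1, g2)] = (round(t, 2), round(p_adj, 4))
    print(f"{g1} vs {g2}: t = {t:.2f}, P' = {p_adj:.4f}")
```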
One-way ANOVA's limitations
This technique is only applicable when there is a single
treatment (factor).
Note that this single treatment can have 3, 4, or more
levels.
Thus a nutrition trial on children's weight gain with 4
different feeding styles could be analyzed this way,
but a trial of BOTH nutrition and mothers' health
status could not.
Two-way ANOVA (axb Factorial Design)
Suppose we have two treatments, A and B, in an
experimental question.
The interest is in determining the combined effect of A
and B as well as the effect of A considered separately and of B
considered separately.
In a balanced design, the main-effect sums of squares are the same as
in separate one-way ANOVAs on each variable, but here the interaction
effect is also included in the model, which changes the error term
and therefore the F ratios.
Two-way ANOVA (axb Factorial Design)
Again, experimental units (subjects) are assigned at
random to each of the a×b combinations, and
the subjects to be assigned are assumed to be
homogeneous.
A and B are the sets of treatments, called factors (their
effects are the main effects); they have a and b different
treatments respectively, called treatment levels or groups.
Main effects, simple effects, interaction
Main effect is a difference in population means for a factor
collapsed (or pooled) over the levels of all other factors in the
design
Thus, a significant main effect demonstrates that an independent
variable influences the dependent variable.
However, it does not establish which of the independent variable
levels are significantly different from another
To identify those significantly different levels, we use a priori
(pre-planned) or post hoc tests
Main effects, simple effects, cont…
An interaction occurs when the effect of one independent
variable is affected by another independent variable; or
When the effect of one factor is not the same at the levels of
another
Or it is when the effect on the dependent variable at different
levels of an independent variable is influenced by the
corresponding levels of another variable.
E.g. the difference in grade scores between females and males is
greater when a tutor is present than when the tutor is absent.
Main effects, simple effects, interaction
cont…
A simple effect is the effect of one independent variable on
the dependent variable at one particular level of another
independent variable.
We can test the difference between two levels of
an independent variable at each level of another independent
variable separately.
If the difference is significant, we can confirm the existence
of a simple effect at that particular level of the second variable.
Usually we only look at simple effects if we have found a
significant interaction.
Two way ANOVA cont…
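Calculate the sums of squares (for equal cell size n):
$T = \sum_i\sum_j\sum_k x_{ijk}^2$,   $S = \sum_i\sum_j T_{ij.}^2/n$,   $CF = T_{...}^2/(abn)$
$A = \sum_i T_{i..}^2/(nb)$,   $B = \sum_j T_{.j.}^2/(na)$
Source     df           SS                      MS                  F
Factor A   a-1          SSa = A - CF            SSa/(a-1)           MSa/MSe
Factor B   b-1          SSb = B - CF            SSb/(b-1)           MSb/MSe
AxB        (a-1)(b-1)   SSab = S - A - B + CF   SSab/((a-1)(b-1))   MSab/MSe
Error      ab(n-1)      SSe = T - S             SSe/(ab(n-1))
Total      abn-1        SSt = T - CF
where a and b are the numbers of groups (levels) in treatments A and
B respectively, and n is the number of replicates per cell. Each level
total of A sums nb observations and each level total of B sums na
observations, which gives the divisors in A and B.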
The two-way Linear model
The formal model underlying two-way ANOVA, with 2
treatments A and B, is: $X_{ijk} = \mu + A_i + B_j + AB_{ij} + \varepsilon_{ijk}$
$X_{ijk}$ is the kth replicate of treatment A level i and treatment B level j.
$A_i$ is the effect of the ith level of treatment A (= difference
between μ and the mean of all data at that level of A).
$B_j$ is the effect of the jth level of treatment B (= difference
between μ and the mean of all data at that level of B).
$AB_{ij}$ is the interaction effect of level i of A combined with level j of B.
$\varepsilon_{ijk}$ is the unexplained error in observation $X_{ijk}$.
Two way ANOVA cont…
The null hypotheses in a two-way ANOVA are these:
– The population means for the dependent variable are
equal across levels of the first factor
– The population means for the dependent variable are
equal across levels of the second factor
– The effects of the first and second factors on the
dependent variable are independent of one another
Two way ANOVA cont….
Example (Steel & Torrie p. 343)
Effect of 2 treatments on blood phospholipids in lambs. The
first was a handling treatment, and the second one was the time
treatment.
                       Handling treatment (B)
Time treatment (A)     B1       B2
A1                     8.53     17.53
                       20.53    21.07
                       12.53    20.80
                       14.0     17.33
                       10.8     20.07
A2                     39.14    32.00
                       26.20    23.80
                       31.33    28.87
                       45.80    25.06
                       40.20    29.33
Two-Way ANOVA cont…
Start with a preliminary look at the data:
They are continuous and plausibly normally distributed.
There are 2 handling treatments and 2 time treatments.
Both are combined in a factorial design so that each of the 4
combinations is replicated 5 times.
The basic quantities are the following:
N = 20
$T_{...} = \sum\sum\sum x_{ijk} = 484.92$
$T = \sum\sum\sum x_{ijk}^2 = 13676.7$
$CF = T_{...}^2/N = (\sum\sum\sum x_{ijk})^2/N = (484.92)^2/20 = 11757.37$
SST = 13676.7 - 11757.37 = 1919.33
Now get the totals for treatments A and B:
        B1       B2       Ti.
A1      66.39    96.80    163.19
A2      182.67   139.06   321.73
T.j     249.06   235.86   484.92
Hence the sums of squares for A and B can be calculated:
$\mathrm{SSa} = 163.19^2/10 + 321.73^2/10 - 11757.37 = 1256.75$
$\mathrm{SSb} = 249.06^2/10 + 235.86^2/10 - 11757.37 = 8.71$
A alone
Source   df   SS        MS        F
A        1    1256.75   1256.75   34.14**
error    18   662.58    36.81
total    19   1919.33
B alone
Source   df   SS        MS        F
B        1    8.71      8.71      0.08
error    18   1910.62   106.15
total    19   1919.33
Pooled (A and B together, without interaction)
Source   df   SS        MS        F
A        1    1256.75   1256.75   32.67**
B        1    8.71      8.71      0.23
error    17   653.87    38.46
total    19   1919.33
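The three tables differ only in which sums of squares are removed from the error term. The short sketch below recomputes the error mean square and F ratios from the totals already derived (SST, SSa, SSb); the helper name `f_ratios` is illustrative, and the one-df-per-effect shortcut holds only because each factor here has 2 levels.

```python
SST, N = 1919.33, 20                  # total SS and number of observations
SS = {"A": 1256.75, "B": 8.71}        # each factor has 1 df (2 levels)

def f_ratios(effects):
    """Error MS and F ratios when only the listed effects are in the model."""
    ss_error = SST - sum(SS[e] for e in effects)
    df_error = (N - 1) - len(effects)     # 1 df per effect in this 2x2 design
    ms_error = ss_error / df_error
    return round(ms_error, 2), {e: round(SS[e] / ms_error, 2) for e in effects}

print("A alone:", f_ratios(["A"]))
print("B alone:", f_ratios(["B"]))
print("Pooled: ", f_ratios(["A", "B"]))
```

Because the SS for A and B stay the same, every change in F comes from the shrinking error term.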
Interaction terms
We now meet a unique, powerful feature of factorial ANOVA:
it can examine data for interactions between treatments,
either synergism or antagonism.
Some treatments intensify each other's effects.
The classic examples come from pharmacology.
Alcohol alone is lethal (toxic) at the 20-40 unit range.
Barbiturates are also lethal. Together they are a vastly more lethal
combination, as the 2 drugs synergize.
In ecology, SO2 + NO2 is more damaging than the additive
effects of each gas alone: a synergism.
Antagonism.
It is the opposite: two treatments nullifying each
other.
Drought antagonizes the effects of air pollution on plants,
as drought leads to closed stomata, excluding the
noxious (harmful) gas.
[Figure: three panels plotting the response against Treatment A (levels 1, 2, 3), with one line for each Treatment B category (B1 and B2). Panels: No interaction; Synergism; Antagonism.]
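Whether the lines in such a plot are parallel can be checked numerically from the cell means. As a minimal sketch, the code below takes the four cell totals of the lamb data (5 replicates per cell), computes the effect of B at each level of A (the simple effects), and takes their difference; a value far from zero points to an interaction. Variable names are illustrative.

```python
n = 5  # replicates per cell in the lamb data
cell_totals = {
    ("A1", "B1"): 66.39, ("A1", "B2"): 96.80,
    ("A2", "B1"): 182.67, ("A2", "B2"): 139.06,
}
cell_means = {cell: total / n for cell, total in cell_totals.items()}

# Simple effect of B (B2 minus B1) at each level of A.
simple_b = {
    a: cell_means[(a, "B2")] - cell_means[(a, "B1")] for a in ("A1", "A2")
}
# Difference of the simple effects: zero would mean parallel lines.
interaction_contrast = simple_b["A2"] - simple_b["A1"]

for a, effect in simple_b.items():
    print(f"Effect of B at {a}: {effect:+.3f}")
print(f"Interaction contrast: {interaction_contrast:+.3f}")
```

Here the effect of B changes sign between A1 and A2, so the two lines would cross rather than run parallel.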
Incorporating interaction into the
ANOVA model
First, work out the sum of squares caused by ALL treatments at
ALL levels.
Thus for a 2×2 design there are actually 4 treatment combinations, for a
3×3 design there are 9, etc. Call this SStrt.
Now we can partition this sum of squares:
SStrt = SSa + SSb + SSi, where SSi is the interaction sum of
squares.
We know SSa and SSb, so we get SSi by subtraction after
calculating SStrt using the computational formula
$\mathrm{SStrt} = \sum_i\sum_j T_{ij.}^2/n - CF$
For the lamb blood data:
We have 4 separate treatments: A1B1, A1B2, A2B1, A2B2.
The data within these 4 groups add to 66.39, 96.80,
182.67 and 139.06 respectively.
There are 5 replicates in each treatment.
$\mathrm{SStrt} = 66.39^2/5 + 96.80^2/5 + 182.67^2/5 + 139.06^2/5 - 11757.37 = 1539.41$
Two-Way ANOVA table with
interaction
Source      df   SS        MS        F
All treat   3    1539.41
A           1    1256.75   1256.75   52.93**
B           1    8.71      8.71      0.37
AxB         1    273.95    273.95    11.54**
Error       16   379.92    23.75
Total       19   1919.33
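The whole table can be rebuilt from the raw lamb data with the computational formulas for equal cell sizes (T, S, A, B and CF). The following is a plain-Python sketch of that calculation, not a substitute for statistical software; significance stars are not reproduced because no P values are computed.

```python
# Lamb blood phospholipid data: keys are (time level, handling level).
data = {
    ("A1", "B1"): [8.53, 20.53, 12.53, 14.0, 10.8],
    ("A1", "B2"): [17.53, 21.07, 20.80, 17.33, 20.07],
    ("A2", "B1"): [39.14, 26.20, 31.33, 45.80, 40.20],
    ("A2", "B2"): [32.00, 23.80, 28.87, 25.06, 29.33],
}
a_levels, b_levels, n = ["A1", "A2"], ["B1", "B2"], 5
a, b = len(a_levels), len(b_levels)

all_x = [x for cell in data.values() for x in cell]
T = sum(x * x for x in all_x)                              # raw sum of squares
CF = sum(all_x) ** 2 / (a * b * n)                         # correction factor
S = sum(sum(cell) ** 2 for cell in data.values()) / n      # cell totals
A = sum(sum(sum(data[i, j]) for j in b_levels) ** 2 for i in a_levels) / (n * b)
B = sum(sum(sum(data[i, j]) for i in a_levels) ** 2 for j in b_levels) / (n * a)

ss = {"A": A - CF, "B": B - CF, "AxB": S - A - B + CF,
      "Error": T - S, "Total": T - CF}
df = {"A": a - 1, "B": b - 1, "AxB": (a - 1) * (b - 1),
      "Error": a * b * (n - 1), "Total": a * b * n - 1}

ms_error = ss["Error"] / df["Error"]
F = {src: (ss[src] / df[src]) / ms_error for src in ("A", "B", "AxB")}

for src in ss:
    f_text = f"{F[src]:.2f}" if src in F else ""
    print(f"{src:6s} df={df[src]:2d}  SS={ss[src]:9.2f}  {f_text}")
```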
Example: Two-way ANOVA
(4x3 factorial ANOVA)
Twenty-four men, each 35 kg overweight, are
assigned to the 12 treatment combinations that arise from 4 diets and
3 levels of jogging, with 2 replicates per combination. Each man
consumes the same amount of food per day, but the
diets differ in their proportions of protein, fat, and
carbohydrate. The aim of the experiment is to reduce
weight.
(Data in the following slide)
4x3 Factorial Design
Data (cell totals in parentheses)
                                      Diet
Jogging (in minutes)   Equal       High Protein   High Fat    High Carbohyd.   Total
0'                     8.5         15.5           8.5         15.5             97
                       11.5 (20)   16.5 (32)      7.5 (16)    13.5 (29)
30'                    14          20             13          21               136
                       16 (30)     23 (43)        11 (24)     18 (39)
60'                    24.5        27             22          24.5             196
                       19.5 (44)   24 (51)        27 (49)     27.5 (52)
Total                  94          126            89          120              429
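The same computational formulas apply to this 4x3 layout. As a sketch, the code below treats jogging as factor A (a = 3 levels), diet as factor B (b = 4 levels) with n = 2 replicates, reproduces the marginal totals of the table, and partitions the total sum of squares. The labelling of the factors as A and B is an assumption made here for illustration.

```python
diets = ["Equal", "High Protein", "High Fat", "High Carbohyd."]
# cells[jogging minutes] lists the two replicates for each diet, in order.
cells = {
    0:  [[8.5, 11.5], [15.5, 16.5], [8.5, 7.5], [15.5, 13.5]],
    30: [[14, 16], [20, 23], [13, 11], [21, 18]],
    60: [[24.5, 19.5], [27, 24], [22, 27], [24.5, 27.5]],
}
a, b, n = len(cells), len(diets), 2

row_totals = {jog: sum(sum(c) for c in row) for jog, row in cells.items()}
col_totals = [sum(sum(cells[jog][j]) for jog in cells) for j in range(b)]
grand_total = sum(row_totals.values())

T = sum(x * x for row in cells.values() for c in row for x in c)
CF = grand_total ** 2 / (a * b * n)
S = sum(sum(c) ** 2 for row in cells.values() for c in row) / n
A = sum(t ** 2 for t in row_totals.values()) / (n * b)   # jogging
B = sum(t ** 2 for t in col_totals) / (n * a)            # diet

ss = {"Jogging": A - CF, "Diet": B - CF, "Jogging x Diet": S - A - B + CF,
      "Error": T - S, "Total": T - CF}
df = {"Jogging": a - 1, "Diet": b - 1, "Jogging x Diet": (a - 1) * (b - 1),
      "Error": a * b * (n - 1), "Total": a * b * n - 1}

for src in ss:
    print(f"{src:15s} df={df[src]:2d}  SS={ss[src]:8.3f}")
```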
Example: Three-way ANOVA
(3x2x2 factorial ANOVA)
Maxwell and Delaney (1990) studied the effect of three different
treatments:
1. Drug (a medication with three levels: X, Y and Z)
2. Biofeedback (a psychological feedback: present or absent)
3. Diet (special diet given or not given)
Seventy-two subjects suffering from hypertension were
recruited to the study, with subjects being randomly allocated to each of
the 12 treatment combinations. Blood pressure measurements
were made on each subject after treatment.
(Data in the following slide)
Data (a 3x2x2 design)
Treatments                 Special Diet
Biofeedback   Drug   No                        Yes
Present       X      170,175,165,180,160,158   161,173,157,152,181,190
              Y      186,194,201,215,219,209   164,166,159,182,187,174
              Z      180,187,199,170,204,194   162,184,183,156,180,173
Absent        X      173,194,197,190,176,198   164,190,169,164,176,175
              Y      189,194,217,206,199,195   171,173,196,169,199,180,203
              Z      202,228,190,206,224,204   205,199,170,160,179,179
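The random allocation behind such a design can be sketched in a few lines. The code below enumerates the 12 treatment combinations and assigns 72 subject numbers to them at random, assuming equal allocation (72/12 = 6 per combination); the subject numbering and the seed are arbitrary choices made for illustration, not details from the study.

```python
import random
from itertools import product

biofeedback = ["Present", "Absent"]
drugs = ["X", "Y", "Z"]
diets = ["No", "Yes"]

# All 2 x 3 x 2 = 12 treatment combinations.
combinations = list(product(biofeedback, drugs, diets))

subjects = list(range(1, 73))          # hypothetical subject numbers 1..72
rng = random.Random(2024)              # arbitrary seed, for reproducibility
rng.shuffle(subjects)

per_cell = len(subjects) // len(combinations)   # assumes equal allocation
allocation = {
    combo: sorted(subjects[i * per_cell:(i + 1) * per_cell])
    for i, combo in enumerate(combinations)
}

for combo, ids in allocation.items():
    print(combo, ids)
```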