Dr Aneeqa Noor
STATISTICAL TESTS Clinical Biostatistics
Lecture 10 - 25.11.2025
Z Test and T Test
Z Test for Statistical Inference
A z-test is a statistical tool used to compare or determine the significance of statistical measures, particularly the mean, in a sample from a normally distributed population or between two independent samples.
Like t-tests, z-tests are based on the normal probability distribution.
The z-test is one of the most commonly used statistical tools in research methodology, applied in studies where the sample size is large (n > 30).
In the case of the z-test, the population variance is known.
A z-score is a number indicating how many standard deviations a value lies above or below the population mean.
One Sample Z Test formula
z = (x̄ − µ) / (σ / √n)
where x̄ is the sample mean, µ is the assumed (population) mean, σ is the population standard deviation, and n is the number of observations.
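As a minimal sketch, the formula above can be computed directly with the standard normal CDF; the input figures here (x̄ = 105, µ = 100, σ = 15, n = 36) are illustrative, not taken from the lecture:

```python
from math import erf, sqrt

def one_sample_z(xbar, mu, sigma, n):
    """Return the z statistic and two-tailed p-value for a one-sample z-test."""
    z = (xbar - mu) / (sigma / sqrt(n))
    # standard normal CDF via the error function
    p = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p

# Hypothetical example: sample mean 105 vs assumed mean 100
z, p = one_sample_z(xbar=105, mu=100, sigma=15, n=36)
```

With these numbers z = 2.0, and the two-tailed p-value falls below 0.05, so the sample mean differs significantly from the assumed mean.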
Z Test for Two Populations
Z test when the difference in population means is not equal to 0, for example:
H0: µ1 − µ2 = 3.2
H1: µ1 − µ2 > 3.2
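A sketch of the two-sample z-test for a nonzero hypothesized difference; all sample figures below are hypothetical, chosen only to illustrate the hypotheses H0: µ1 − µ2 = 3.2 vs H1: µ1 − µ2 > 3.2:

```python
from math import erf, sqrt

def two_sample_z(x1, x2, sd1, sd2, n1, n2, diff=0.0):
    """z statistic for H0: mu1 - mu2 = diff, with known population SDs."""
    return ((x1 - x2) - diff) / sqrt(sd1**2 / n1 + sd2**2 / n2)

# Hypothetical data: group means 28 and 22, known SDs 5 and 6
z = two_sample_z(x1=28.0, x2=22.0, sd1=5.0, sd2=6.0, n1=40, n2=50, diff=3.2)
p = 1 - 0.5 * (1 + erf(z / sqrt(2)))  # upper-tail p-value for the one-sided H1
```

Because H1 is one-sided (µ1 − µ2 > 3.2), only the upper tail of the normal distribution is used for the p-value.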
Paired Z Test
The Paired Samples Z-Test is a statistical test
used to determine if 2 paired groups are
significantly different from each other on your
variable of interest. Your variable of interest
should be continuous, be normally distributed,
and have a similar spread between your 2
groups. Your 2 groups should be paired (often
two observations from the same group) and you
should have enough data (more than 30 values in
each group) or know your population variance.
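The paired z-test reduces to a one-sample z-test on the pairwise differences. A sketch under that assumption, with a hypothetical known SD of the differences:

```python
from math import erf, sqrt

def paired_z(before, after, sigma_d):
    """Paired z-test: a one-sample z-test on the pairwise differences.
    sigma_d is the known population SD of the differences; H0: mean diff = 0."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    d_bar = sum(diffs) / n
    z = d_bar / (sigma_d / sqrt(n))
    p = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # two-tailed p-value
    return z, p
```

For example, 36 pairs whose measurements each rose by 2 units, with an assumed sigma_d of 3, give z = 4.0 and a highly significant two-tailed p-value.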
ANOVA
ANOVA, which stands for Analysis of
Variance, is a statistical test used to analyze
the difference between the means of more
than two groups.
A one-way ANOVA uses one independent
variable, while a two-way ANOVA uses two
independent variables.
As a biomedical researcher, you want to test
the effect of three different nanoparticle
mixtures on disease duration. You can use a
one-way ANOVA to find out if there is a
difference between the three groups.
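A minimal sketch of how the one-way ANOVA F statistic is computed from raw group data (MS between groups divided by MS within groups); the three small groups below are illustrative, not the nanoparticle data:

```python
def one_way_anova_f(*groups):
    """One-way ANOVA F statistic: MS_between / MS_within."""
    k = len(groups)                      # number of groups
    n = sum(len(g) for g in groups)      # total observations
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    return (ss_between / (k - 1)) / (ss_within / (n - k))

f = one_way_anova_f([1, 2, 3], [2, 3, 4], [3, 4, 5])
```

The resulting F value would then be compared to the critical F with (k − 1, n − k) degrees of freedom.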
ANOVA determines whether the groups created by the levels of
the independent variable are statistically different by
calculating whether the means of the treatment levels are
different from the overall mean of the dependent variable.
If any of the group means is significantly different from the
overall mean, then the null hypothesis is rejected.
ANOVA uses the F test for statistical significance. This allows
for comparison of multiple means at once, because the error is
calculated for the whole set of comparisons rather than for
each individual two-way comparison (which would happen with
a t test).
The F test compares the variance between group means to the variance within groups. If the variance within groups is smaller than the variance between groups, the F test yields a higher F value, and therefore a higher likelihood that the observed difference is real and not due to chance.
Assumptions of ANOVA
The assumptions of the ANOVA test are the same as the general assumptions for any parametric test:
Independence of observations: the data were collected using statistically valid sampling methods, and there are no hidden relationships among observations. If your data fail to meet this assumption because you have a confounding variable that you need to control for statistically, use an ANOVA with blocking variables.
Normally-distributed response variable: the values of the dependent variable follow a normal distribution.
Homogeneity of variance: the variation within each group being compared is similar for every group. If the variances are different among the groups, then ANOVA probably isn’t the right fit for the data.
Two-way ANOVA
Using the two-factor analysis of
variance, you can now answer
three things:
Does factor 1 have an effect on
the dependent variable?
Does factor 2 have an effect on
the dependent variable?
Is there an interaction between
factor 1 and factor 2?
Assumptions of Two-way ANOVA
For a two-factor analysis of variance without repeated measures, the following assumptions must be met:
Dependent variable: should be numeric.
Independence: the measurements should be independent, i.e., the measured value of one group should not be influenced by the measured value of another group. If this were the case, we would need an analysis of variance with repeated measures.
Homogeneity: the variances in each group should be approximately equal. This can be checked with Levene's test.
Normal distribution: the data within the groups should be normally distributed.
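A sketch of a two-factor ANOVA without repeated measures, assuming one observation per cell (rows are levels of factor 1, columns are levels of factor 2); the 2×2 table used at the end is hypothetical:

```python
def two_way_anova_f(table):
    """Two-way ANOVA without replication: one observation per cell.
    Rows = levels of factor A, columns = levels of factor B.
    Returns (F_A, F_B), the F statistics for the two main effects."""
    r, c = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (r * c)
    row_means = [sum(row) / c for row in table]
    col_means = [sum(table[i][j] for i in range(r)) / r for j in range(c)]
    ss_a = c * sum((m - grand) ** 2 for m in row_means)          # factor A effect
    ss_b = r * sum((m - grand) ** 2 for m in col_means)          # factor B effect
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_error = ss_total - ss_a - ss_b                            # residual
    ms_error = ss_error / ((r - 1) * (c - 1))
    return (ss_a / (r - 1)) / ms_error, (ss_b / (c - 1)) / ms_error

f_a, f_b = two_way_anova_f([[1, 2], [4, 3]])  # hypothetical 2x2 data
```

Note that without replication the interaction cannot be separated from the error term, which is why only the two main-effect F values are returned.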
Performing T-test and ANOVA on GraphPad
T-Test: [Link] QOhSk1sQMA
One-way ANOVA: [Link] GDAetOrFo
Two-Way ANOVA: [Link] 4lPpKyNyE
Repeated Measures ANOVA
Repeated Measures ANOVA rests on the following assumptions:
Independence: Each of the observations should be independent.
Normality: The distribution of the response variable is normally distributed.
Sphericity: The variances of the differences between all combinations of related
groups must be equal.
Repeated Measures ANOVA is used when the same participants are measured
multiple times across different conditions or time points, making it ideal for
longitudinal studies, pre-post-follow-up designs, and experiments where each
subject experiences all levels of the independent variable. It is applied when
the dependent variable is continuous and when researchers want to control for
individual differences, which increases statistical power compared to
independent designs.
This test is appropriate when three or more related conditions are being
compared, and when the repeated observations are correlated because they
come from the same subjects. It is particularly useful for evaluating changes
over time, effects of training or treatment, and performance across repeated
trials.
Multiple Comparisons
The Analysis of Variance (ANOVA) test has long been an important tool for researchers conducting studies on multiple experimental groups and one or more control groups.
However, ANOVA cannot provide detailed information on differences among the various study groups, or on complex combinations of study groups. To fully understand group differences in an ANOVA, researchers must conduct tests of the differences between particular pairs of experimental and control groups.
Tests conducted on subsets of data tested previously in another analysis are called post hoc tests.
A class of post hoc tests that provide this type of detailed information for ANOVA results are called "multiple comparison analysis" tests. The most commonly used multiple comparison analysis statistics include the following tests: Tukey, Newman-Keuls, Scheffé, Bonferroni, and Dunnett.
Multiple Comparisons Problem
The table below shows how increasing the number of groups in your
study causes the number of comparisons to rise, which in turn raises
the family-wise error rate. Notice how quickly the quantity of
comparisons increases by adding just a few groups! Correspondingly,
the experiment-wise error rate rapidly becomes problematic.
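The growth described above follows directly from two formulas: with k groups there are k(k − 1)/2 pairwise comparisons, and running m independent tests at level α gives a family-wise error rate of 1 − (1 − α)^m. A short sketch:

```python
def family_wise_error(n_groups, alpha=0.05):
    """Return (number of pairwise comparisons, family-wise error rate)."""
    m = n_groups * (n_groups - 1) // 2   # k(k-1)/2 pairwise comparisons
    return m, 1 - (1 - alpha) ** m       # chance of at least one false positive

# 3 groups -> 3 comparisons, FWER about 0.14
# 6 groups -> 15 comparisons, FWER above 0.5
```

So with only six groups, an uncorrected analysis is more likely than not to produce at least one false positive.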
In post hoc tests, you set the experiment-wise error rate you want
for the entire set of comparisons. Then, the post hoc test calculates
the significance level for all individual comparisons that produces the
familywise error rate you specify.
The adjusted p-value identifies the group comparisons that are
significantly different while limiting the family error rate to your
significance level.
Simply compare the adjusted p-values to your significance level.
When adjusted p-values are less than the significance level, the
difference between those group means is statistically significant.
Importantly, this process controls the family-wise error rate to your
significance level. We can be confident that this entire set of
comparisons collectively has an error rate of 0.05.
Bonferroni
The Bonferroni correction counteracts the family-wise error rate problem by adjusting the alpha value based on the number of tests.
To find your adjusted significance level, divide the significance level (α) for a single test by the number of tests (n). Hence, Bonferroni Correction = α / n.
For example, if your original, single-test alpha is 0.05 and you have a set of five hypothesis tests, your adjusted significance level is 0.05 / 5 = 0.01. Your results are statistically significant when your p-value is less than or equal to the adjusted significance level.
This adjusted alpha value helps control the overall rate of false positives, ensuring your findings are more reliable.
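The Bonferroni rule above is a one-liner in code; the p-values in the example call are hypothetical:

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag each test whose p-value clears the Bonferroni-adjusted level alpha/n."""
    adjusted = alpha / len(p_values)            # Bonferroni Correction = alpha / n
    return [p <= adjusted for p in p_values]

# Five hypothetical tests at alpha = 0.05: adjusted level is 0.01
flags = bonferroni_significant([0.005, 0.02, 0.04, 0.009, 0.3])
```

Only the tests with p ≤ 0.01 (the first and fourth here) remain significant after correction, even though three of the five raw p-values are below 0.05.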
Tukey
The Tukey Honestly Significant Difference (HSD) test is a post hoc test used to compare all possible pairs of means after an analysis of variance (ANOVA) has identified significant differences among group means.
The main principle of Tukey HSD is to control the familywise error rate, which is the probability of making one or more Type I errors (false positives) in a set of comparisons.
The Tukey HSD test achieves this by using a critical value based on the studentized range distribution. It compares the difference between each pair of means to this critical value; if the difference is greater than the critical value, it is considered statistically significant.
In summary, the Tukey HSD test allows for a comprehensive examination of pairwise differences between group means while maintaining control over the overall Type I error rate.
Once you have the results of the ANOVA, the next step is the calculation of the q statistic:
Calculate the df within groups.
Use the studentized range distribution (q table) to find the q value.
In the given example: q is 3.77, MSE is 15.9, and n is 5 (samples per group).
Find the critical value using HSD = q × √(MSE / n), which gives 3.77 × √(15.9 / 5) ≈ 6.72 in this example.
Post Hoc Tests
Non-Parametric Tests
Wilcoxon test
The Wilcoxon test, which can refer to either the rank sum test or the signed rank test version, is a nonparametric statistical test that compares two paired groups.
The tests essentially calculate the difference between sets of pairs and analyze these differences to establish if they are statistically significantly different from one another.
These models assume that the data come from two matched, or dependent, populations, following the same person or unit through time or place. The data are also assumed to be continuous as opposed to discrete.
Because it is a nonparametric test, it does not require a particular probability distribution of the dependent variable in the analysis.
In order to investigate whether adults
report verbally presented material
more accurately from their right than
from their left ear, a dichotic listening
task was carried out. The data were
found to be positively skewed. The
number of words heard accurately is shown in the table.
For conducting this test, the following
steps must be carried out;
Find the differences
Rank the differences. Omit the
participants with no (0)
difference.
Ignoring the sign of the difference (whether it is positive or negative), the lowest absolute difference is 1, of which there are 4 instances. So, we add up the ranks they would take, i.e., 1 + 2 + 3 + 4 = 10, and then divide this by the number of tied values, so 10 / 4 = 2.5.
The next lowest absolute difference is 2 (there are both positive and negative differences here, but ignore the signs). So, add together ranks 5 + 6 = 11. The rank assigned to each would therefore be 11 / 2 = 5.5.
Add together the ranks belonging to scores
with a positive sign (5.5 + 7.5 = 13 ).
Add together the ranks belonging to scores
with a negative sign (7.5 + 2.5 + 2.5 + 9 +
2.5 + 2.5 + 5.5 + 10 + 11 = 53 ).
Whichever of these sums is smaller is our value of W.
N is the number of differences (omitting "0" differences). Here we have 12 − 1 = 11 differences.
Use the table of critical Wilcoxon values. With an N of 11, what is the critical value for a two-tailed test at the 0.05 significance level?
With the Wilcoxon test, an obtained W is significant if it is LESS than the critical value.
Obtained W = 13
Critical value = 14
Since the obtained value of 13 is less than 14, we can conclude that there is a significant difference between the number of words recalled from the right ear and the number recalled from the left ear.
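The ranking procedure above (drop zero differences, average tied ranks, take the smaller of the positive and negative rank sums) can be sketched in code; the short example call uses made-up data, not the dichotic listening scores:

```python
def wilcoxon_w(before, after):
    """Wilcoxon signed-rank statistic W (the smaller of the signed rank sums)."""
    diffs = [a - b for a, b in zip(after, before) if a != b]  # drop 0 differences
    # rank the absolute differences, giving tied values the average rank
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j + 2) / 2            # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return min(w_plus, w_minus)

# Hypothetical differences 1, -1, 2, -3: tied ranks 1.5, 1.5, then 3 and 4
w = wilcoxon_w([0, 0, 0, 0], [1, -1, 2, -3])
```

The returned W would then be compared against the table of critical Wilcoxon values for the appropriate N.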
ONE SAMPLE?
Mann-Whitney U test
The Mann-Whitney U test is a commonly used non-parametric test for comparing two independent samples.
It assesses whether the distributions of the two groups are identical or if one tends to have higher values than the other.
This test is invaluable when dealing with skewed data.
The test compares two populations. Assumptions of this test include:
1. The dependent variable should be measured on an ordinal or continuous scale.
2. The independent variable should consist of two categorical, independent groups.
3. Observations should be independent. In other words, there should be no relationship between the two groups or within each group.
4. Observations are not normally distributed. However, they should follow the same shape (e.g., both bell-shaped, or both skewed left).
To perform the Mann-Whitney U test for two independent samples, the rankings of the individual values must first be determined.
These rankings are then added up for the two groups. In the
example above, the rank sum T1 of the women is 37 and the
rank sum of the men T2 is 29.
In the next step, the U values are calculated from the rank sums T1 and T2, using:
U1 = n1·n2 + n1(n1 + 1)/2 − T1
U2 = n1·n2 + n2(n2 + 1)/2 − T2
where n1 and n2 are the number of elements in the first and second group respectively. The smaller of U1 and U2 is the test statistic U.
Example: Is the difference between the reaction times of males and females significant?
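The whole procedure (rank the combined sample with averaged ties, sum the ranks per group, convert rank sums to U values) can be sketched as follows; the two small groups in the example call are hypothetical:

```python
def mann_whitney_u(group1, group2):
    """Mann-Whitney U statistic: the smaller of U1 and U2 from the rank sums."""
    combined = sorted(group1 + group2)

    def rank(v):
        # average rank of value v in the combined sample (handles ties)
        lo = combined.index(v) + 1
        hi = lo + combined.count(v) - 1
        return (lo + hi) / 2

    n1, n2 = len(group1), len(group2)
    t1 = sum(rank(v) for v in group1)        # rank sum of group 1
    t2 = sum(rank(v) for v in group2)        # rank sum of group 2
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - t1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - t2
    return min(u1, u2)

# Hypothetical reaction times: fully separated groups give U = 0
u = mann_whitney_u([1, 2, 3], [4, 5, 6])
```

The smaller U is then compared against the critical U value for n1 and n2; note that complete separation of the two groups yields U = 0, the most extreme possible value.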
Kruskal Wallis H test
This test is used when the assumptions for ANOVA aren’t met (like the assumption of normality).
The Kruskal Wallis test will tell you if there is a significant difference between groups. However, it won’t tell you which groups are different. For that, you’ll need to run a post hoc test.
The assumptions of this test are as follows:
Three or more levels (independent groups). The test is more commonly used when you have three or more levels. For two levels, consider using the Mann Whitney U Test instead.
All groups should have the same shape distributions
The H statistic is calculated as:
H = [12 / (n(n + 1))] · Σ (Tj² / nj) − 3(n + 1)
where:
n = sum of sample sizes for all groups
c = number of groups
Tj = sum of ranks in the jth sample
nj = size of the jth sample
The steps are:
1. Sort the data for all groups/samples into ascending order in one combined set.
2. Assign ranks to the sorted data points. Give tied values the average rank.
3. Add up the ranks for each group/sample.
H = 6.72
Find the critical value with c − 1 degrees of freedom. For 3 − 1 = 2 degrees of freedom and an alpha level of .05, the critical chi-square value is 5.9915.
If the critical chi-square value is less than the H statistic, reject the null hypothesis that the medians are equal. Here, 6.72 > 5.9915, so we reject the null hypothesis.
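The three steps and the H formula can be sketched together in code; the three small groups in the example call are illustrative, not the lecture's dataset:

```python
def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H = [12 / (n(n+1))] * sum(Tj^2 / nj) - 3(n+1)."""
    combined = sorted(x for g in groups for x in g)   # step 1: combined sort

    def rank(v):
        # step 2: average rank for v (tied values share the average rank)
        lo = combined.index(v) + 1
        hi = lo + combined.count(v) - 1
        return (lo + hi) / 2

    n = len(combined)
    # step 3: rank sum Tj per group, then apply the H formula
    sum_term = sum(sum(rank(v) for v in g) ** 2 / len(g) for g in groups)
    return 12 / (n * (n + 1)) * sum_term - 3 * (n + 1)

h = kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9])
```

The resulting H is then compared to the chi-square critical value with c − 1 degrees of freedom, exactly as in the lecture's example.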