Understanding Chi-Square and T-Tests

The Chi-Square (χ²) test is a non-parametric statistical method used to compare observed and expected frequencies in categorical data to determine independence or association between variables. It is commonly applied in goodness-of-fit tests, significance of association, and homogeneity tests, with a focus on frequency distributions. The Student's t-test is another statistical method used to compare means between populations, introduced by William Sealy Gosset, and includes one-sample, two-sample, and paired tests to assess significant differences.


CHI-SQUARE (Χ²) TEST

The Chi-Square (χ²) test is one of the most important tests of significance used in statistics. Symbolically
written as χ², it is a statistical measure used in sampling analysis to compare observed frequencies with
theoretically expected frequencies (and, in one of its forms, an observed variance with a hypothesized
population variance). Since it is a non-parametric test, it is used to determine whether categorical data show
dependency or whether two classifications are independent. It is also helpful for comparing theoretical
populations with actual data, especially when the data fall into categories.
The chi-square test is widely applied to:

1. Test the goodness of fit
2. Test the significance of association between two attributes
3. Test homogeneity or significance of population variance

 It examines whether the experimentally observed frequency distribution deviates significantly from a
proposed theoretical frequency distribution, making it an important tool in the analysis of
frequencies.
 The test compares observed results with what is expected theoretically under a hypothesis. The
formula for χ² is:

χ² = Σ [ (fₒ – fₑ)² / fₑ ]

Where:

 fₒ = observed frequency
 fₑ = expected frequency

A smaller χ² indicates close agreement between observation and theory, whereas a larger χ² suggests a real
divergence.
The degrees of freedom (df) are calculated as:

df = k – 1 (goodness of fit, with k classes)
df = (r – 1)(c – 1) (contingency table with r rows and c columns)
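To illustrate the contingency-table case, here is a minimal Python sketch of a χ² test of association on a hypothetical 2×2 table (all counts are invented for illustration):

```python
# Chi-square test of association on a hypothetical 2x2 contingency table
# (counts are made up; rows and columns are two categorical attributes).
observed = [[20, 30],
            [30, 20]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand = sum(row_totals)

# Expected frequency for each cell: (row total * column total) / grand total
chi_sq = 0.0
for i, row in enumerate(observed):
    for j, fo in enumerate(row):
        fe = row_totals[i] * col_totals[j] / grand
        chi_sq += (fo - fe) ** 2 / fe

df = (len(observed) - 1) * (len(observed[0]) - 1)   # (r - 1)(c - 1) = 1
critical = 3.841                                    # table value at the 5% level, df = 1

print(chi_sq)               # 4.0
print(chi_sq >= critical)   # True -> association is significant
```

Here every expected frequency is 25, each cell contributes (5)²/25 = 1, and χ² = 4.0 exceeds the 5% table value for df = 1, so the association would be judged significant.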

Logic of the Chi-Square (χ²) Test


The χ² test is used to determine how well the observed distribution fits an assumed theoretical distribution.
It measures the divergence between actual and expected frequencies. In sampling studies, small differences
between observed and expected values may occur due to chance, and the χ² test helps identify whether these
differences are significant or can be ignored.
If there is no difference between observed and expected frequencies, χ² = 0, meaning perfect agreement. Hence,
the χ² test evaluates the discrepancy between theory and observation.

Characteristics of the Chi-Square (χ²) Test


 It is based on events or frequencies, not on parameters like mean or standard
deviation.
 It is mainly used for drawing inferences, especially in hypothesis testing, but
not for estimation.
 It can be applied to the entire set of observed and expected frequencies.
 With every increase in degrees of freedom, a new χ² distribution is formed.
 It can be applied to complex contingency tables with several classes, making it
highly useful in research.
 It is a non-parametric test, requiring no strict assumptions about population
type, parameter values, or complex mathematics.

Assumptions of the Chi-Square (χ²) Test


 No assumption of normality of the population distribution is required.
 No pre-computed statistic or parameter estimate is used in its calculation.
 It is applicable to very small samples.
 It can be used for discrete, nominal, or ordinal variables.
 It determines whether an association between two categorical variables in a
sample reflects a real association in the population.
 It is used with frequency data, or data that can be converted into frequencies
(including proportions or probabilities).

Chi-Square (χ²) Goodness-of-Fit Test


The goodness-of-fit test checks how well an assumed theoretical distribution fits observed data. After fitting
a theoretical distribution to given data, the χ² value helps determine whether the fit is good.

 If the calculated χ² is less than the table value at a chosen significance level,
the fit is considered good, meaning the differences are due to sampling
fluctuations.
 If the calculated χ² is greater than the table value, the fit is not good, indicating
significant deviation from the expected distribution.

This test compares the observed frequencies (fₒ) with expected frequencies (fₑ) based on theoretical
distributions like normal, binomial, Mendelian ratios, or equal probability distributions.

The χ² computed from (fₒ – fₑ) values is significant if it equals or exceeds the critical value; in such cases, the
observed distribution differs significantly from the proposed distribution. If not significant, the observed
frequencies fit the proposed distribution.

The classical formula based on (fₒ – fₑ) is used for goodness-of-fit problems; the alternative formula for
contingency tables applies only when the data are arranged in a contingency table. Yates' correction must be
applied when the expected frequency in any class is below 5 and the degrees of freedom equal 1.
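The goodness-of-fit steps above can be sketched in Python. The observed die-roll counts are invented; the critical value is the standard table entry for df = 5 at the 5% level:

```python
# Chi-square goodness-of-fit sketch: are 60 die rolls consistent with a fair die?
# (The observed counts below are invented for illustration.)
observed = [8, 9, 13, 7, 12, 11]              # frequencies f_o, sum = 60
expected = [10] * 6                           # fair die: f_e = 60 / 6 per face

chi_sq = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))
df = len(observed) - 1                        # k - 1 = 5
critical = 11.070                             # table value at the 5% level, df = 5

print(round(chi_sq, 2))                       # 2.8
print(chi_sq >= critical)                     # False -> the fit is good
```

Since the calculated χ² (2.8) is well below the table value (11.070), the differences would be attributed to sampling fluctuations and the fit judged good.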

STUDENT’S T-TEST
A t-test is a statistical hypothesis test in which the test statistic follows the Student’s t-distribution under the
null hypothesis.

It is mainly used when the test statistic would normally follow a normal distribution, but the scaling term
(usually the standard deviation) is unknown and must be estimated from sample data. Under these conditions,
the resulting statistic follows a Student’s t-distribution.

The t-test is commonly used to determine whether two sets of data differ significantly from each other.

History of the t-test


The t-statistic was introduced in 1908 by William Sealy Gosset, a chemist at the Guinness brewery in Dublin.
He published his work under the pseudonym “Student” because company policy did not allow chemists to
publish research under their own names. Gosset developed the t-test as an economical way to monitor stout
quality.

During 1906–1907, he worked in Karl Pearson’s Biometric Laboratory at University College London, where
his identity became known to Pearson and other statisticians. His work was later published in the journal
Biometrika.

Uses of Student’s t-test


The most common types of t-tests include:

1. One-sample t-test
Used to test whether the mean of a population equals a value specified in the null hypothesis.

2. Two-sample (independent) t-test


Used to test whether the means of two populations are equal.
This version assumes the variances of both populations are equal; when this assumption is not made,
Welch's t-test is used instead.
These are also known as unpaired or independent-samples t-tests, used when the two samples do not
overlap.

3. Paired (dependent) t-test


Used when two responses are measured on the same statistical unit, and the test checks whether the mean
difference is zero.
Example: measuring tumor size before and after treatment in the same patients.

4. Test for regression slope


Used to test whether the slope of a regression line is significantly different from zero.
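The unpaired comparison in item 2 can be sketched with a pooled variance, under the equal-variance assumption (both samples are invented for illustration):

```python
import math
import statistics

# Pooled two-sample t-test sketch, assuming equal population variances.
# Both samples are invented for illustration.
group_a = [22.1, 24.3, 23.5, 25.0, 22.8]
group_b = [20.4, 21.7, 22.0, 20.9, 21.3]

n1, n2 = len(group_a), len(group_b)
m1, m2 = statistics.fmean(group_a), statistics.fmean(group_b)
v1, v2 = statistics.variance(group_a), statistics.variance(group_b)

# Pooled variance weights each sample's variance by its degrees of freedom.
sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
t = (m1 - m2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
df = n1 + n2 - 2

print(round(t, 2), df)      # 3.87 8
```

The resulting t is compared against the t-distribution with n₁ + n₂ − 2 degrees of freedom; Welch's version would instead use the two variances separately.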
Assumptions of the t-test

Most t-test statistics follow the form:

t = Z / s

Where:

 Z is a function of the data and is sensitive to the alternative hypothesis
 s is a scaling parameter, allowing the distribution of t to be determined

For a one-sample t-test, the formula is:

t = (x̄ – μ) / (s / √n)

Where:

 x̄ = sample mean
 μ = hypothesized population mean
 n = sample size
 s = sample standard deviation (so s⁄√n is the standard error of the mean)
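A minimal numeric sketch of the one-sample test in Python (the sample values and μ are invented for illustration):

```python
import math
import statistics

# One-sample t-test sketch: does the sample mean differ from mu = 50?
# (Measurements are invented for illustration.)
sample = [51.2, 49.8, 52.4, 50.6, 48.9, 53.1, 50.2, 51.7]
mu = 50.0

n = len(sample)
x_bar = statistics.fmean(sample)
s = statistics.stdev(sample)        # sample standard deviation (n - 1 divisor)
t = (x_bar - mu) / (s / math.sqrt(n))
df = n - 1

print(round(t, 2), df)              # 2.01 7
```

The computed t would then be compared against the t-distribution with n − 1 = 7 degrees of freedom at the chosen significance level.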

Unpaired and Paired Two-sample t-tests


1. Independent (Unpaired) Samples
Used when two separate, independent samples are collected from two populations.
Example: 100 subjects are selected, and 50 each are assigned to a treatment group and a control group.
Randomization is not essential. For example, comparing mean age across genders using phone-survey data
also uses an independent t-test.

2. Paired Samples
Paired t-tests use dependent samples, where the same statistical unit provides two related measurements.
Paired tests act as a form of blocking, increasing power when paired units share similar “noise factors.”
They also help reduce the effects of confounding factors in observational studies.
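The paired procedure can be sketched as follows (a minimal version; all measurements are invented for illustration):

```python
import math
import statistics

# Paired t-test sketch: tumor size before vs. after treatment in the same
# patients (all values invented for illustration). H0: mean difference = 0.
before = [5.1, 4.8, 6.2, 5.5, 4.9, 5.8]
after = [4.6, 4.9, 5.4, 5.0, 4.4, 5.1]

diffs = [b - a for b, a in zip(before, after)]
n = len(diffs)
d_bar = statistics.fmean(diffs)
s_d = statistics.stdev(diffs)       # standard deviation of the differences
t = d_bar / (s_d / math.sqrt(n))
df = n - 1

print(round(t, 2), df)              # 3.79 5
```

Because each patient serves as their own control, the per-patient differences absorb shared "noise factors", which is why the paired design gains power over an unpaired comparison of the same data.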

Types of t-tests

Test            | Purpose                                                                                         | Example
1-sample t-test | Tests whether the mean of one population equals a target value                                  | Is the mean height of female college students > 5.5 ft?
2-sample t-test | Tests whether the difference between means of two independent populations equals a target value | Do male and female college students differ significantly in height?
Paired t-test   | Tests whether the mean of differences between paired observations equals a target value         | Is weight loss significant before vs. after a weight-loss pill?