Chi-Square Goodness-of-Fit Analysis

This document discusses categorical data analysis and nonparametric statistics. It introduces goodness-of-fit tests to determine if sample data fits a hypothesized distribution. The chi-square goodness-of-fit test is demonstrated using an example of call volume by day of the week. Population parameters may be estimated from the data. The Jarque-Bera test can test if data fits a normal distribution based on sample skewness and kurtosis. Contingency tables are used to classify observations by two attributes and form the basis for tests of independence.

Uploaded by

emina.hajdarevic

Analysis of Categorical Data
IBU, International Burch University

Introduction

◼ Nonparametric Statistics
  ◼ Fewer restrictive assumptions about data levels and underlying probability distributions
  ◼ Population distributions may be skewed
  ◼ The level of data measurement may only be ordinal or nominal

Goodness-of-Fit Tests: Specified Probabilities

◼ Does sample data conform to a hypothesized distribution?
◼ Examples:
  ◼ Do sample results conform to specified expected probabilities?
  ◼ Are technical support calls equal across all days of the week? (i.e., do calls follow a uniform distribution?)
  ◼ Do measurements from a production process follow a normal distribution?

Chi-Square Goodness-of-Fit Test

◼ Are technical support calls equal across all days of the week? (i.e., do calls follow a uniform distribution?)
◼ Sample data for 10 days per day of week:

Day          Sum of calls for this day
Monday       290
Tuesday      250
Wednesday    238
Thursday     257
Friday       265
Saturday     230
Sunday       192
Total (Σ)    1722

Logic of Goodness-of-Fit Test

◼ If calls are uniformly distributed, the 1722 calls would be expected to be equally divided across the 7 days:

$$\frac{1722}{7} = 246 \text{ expected calls per day if uniform}$$

◼ Chi-Square Goodness-of-Fit Test: test to see if the sample results are consistent with the expected results

Observed vs. Expected Frequencies

Day          Observed (Oi)    Expected (Ei)
Monday       290              246
Tuesday      250              246
Wednesday    238              246
Thursday     257              246
Friday       265              246
Saturday     230              246
Sunday       192              246
TOTAL        1722             1722
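
The expected column above can be reproduced in a few lines. This is only an illustrative sketch: the helper name `uniform_expected` is made up here and is not part of the slides; the counts are the ones in the table.

```python
# Observed call totals from the slide (10 days sampled per day of the week).
observed = {
    "Monday": 290, "Tuesday": 250, "Wednesday": 238, "Thursday": 257,
    "Friday": 265, "Saturday": 230, "Sunday": 192,
}

def uniform_expected(counts):
    """Expected count per category if the total were spread evenly (hypothetical helper)."""
    total = sum(counts.values())
    return {category: total / len(counts) for category in counts}

expected = uniform_expected(observed)
print(sum(observed.values()))  # 1722
print(expected["Monday"])      # 246.0
```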

Chi-Square Test Statistic

H0: The distribution of calls is uniform over days of the week
H1: The distribution of calls is not uniform

◼ The test statistic is

$$\chi^2 = \sum_{i=1}^{K} \frac{(O_i - E_i)^2}{E_i} \qquad (\text{where d.f.} = K - 1)$$

where:
K = number of categories
Oi = observed frequency for category i
Ei = expected frequency for category i
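
The Rejection Region

H0: The distribution of calls is uniform over days of the week
H1: The distribution of calls is not uniform

$$\chi^2 = \sum_{i=1}^{K} \frac{(O_i - E_i)^2}{E_i}$$

◼ Reject H0 if $\chi^2 > \chi^2_{\alpha}$ (with K − 1 degrees of freedom)

[Figure: chi-square density starting at 0; "do not reject H0" region below $\chi^2_{\alpha}$, "reject H0" region of area α in the upper tail]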

Observed vs. Expected Frequencies

Day          Observed (Oi)   Expected (Ei)   (Oi − Ei)   (Oi − Ei)^2   (Oi − Ei)^2/Ei
Monday       290             246             44          1936          7.870
Tuesday      250             246             4           16            0.065
Wednesday    238             246             -8          64            0.260
Thursday     257             246             11          121           0.492
Friday       265             246             19          361           1.467
Saturday     230             246             -16         256           1.041
Sunday       192             246             -54         2916          11.854

TOTAL        1722            1722                                      χ² = 23.049

Chi-Square Test Statistic

H0: The distribution of calls is uniform over days of the week
H1: The distribution of calls is not uniform

$$\chi^2 = \frac{(290 - 246)^2}{246} + \frac{(250 - 246)^2}{246} + \cdots + \frac{(192 - 246)^2}{246} = 23.049$$

◼ K − 1 = 6 (7 days of the week), so use 6 degrees of freedom: for α = .05, $\chi^2_{.05} = 12.5916$

Conclusion: $\chi^2 = 23.05 > \chi^2_{.05} = 12.5916$, so reject H0 and conclude that the distribution is not uniform

[Figure: chi-square density with 6 d.f.; "do not reject H0" region below $\chi^2_{.05} = 12.5916$, "reject H0" region of area α = .05 above it]

Goodness-of-Fit Tests: Population Parameters Unknown

Idea:
◼ Test whether data follow a specified distribution (such as binomial, Poisson, or normal) . . .
◼ . . . without assuming the parameters of the distribution are known
◼ Use sample data to estimate the unknown population parameters

Goodness-of-Fit Tests: Population Parameters Unknown (continued)

◼ Suppose that a null hypothesis specifies category probabilities that depend on the estimation (from the data) of m unknown population parameters
◼ The appropriate goodness-of-fit test is the same as in the previous section . . .

$$\chi^2 = \sum_{i=1}^{K} \frac{(O_i - E_i)^2}{E_i}$$

◼ . . . except that the number of degrees of freedom for the chi-square random variable is

Degrees of Freedom = (K − m − 1)

◼ where K is the number of categories
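
The degrees-of-freedom rule can be written as a one-line helper. This is an illustrative sketch only: the function name is made up, and the second call uses a hypothetical case (6 categories, one estimated parameter such as a Poisson mean) that does not come from the slides.

```python
def gof_degrees_of_freedom(num_categories, num_estimated_params=0):
    """Degrees of freedom K - m - 1 for a chi-square goodness-of-fit test."""
    df = num_categories - num_estimated_params - 1
    if df < 1:
        raise ValueError("need at least one degree of freedom")
    return df

# Call-volume example: 7 categories, no parameters estimated from the data.
print(gof_degrees_of_freedom(7))      # 6

# Hypothetical case: 6 categories with one estimated parameter.
print(gof_degrees_of_freedom(6, 1))   # 4
```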

Test of Normality

◼ The assumption that data follow a normal distribution is common in statistics
◼ Evidence of normality was assessed in prior chapters (for example, with normal probability plots in Chapter 5)
◼ Here, a chi-square test is developed
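
Test of Normality (continued)

◼ Two population parameters can be estimated using sample data:

$$\text{Skewness} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^3}{n s^3}$$

$$\text{Kurtosis} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^4}{n s^4}$$

where $\bar{x}$ is the sample mean and s is the sample standard deviation

◼ For a normal distribution,
Skewness = 0
Kurtosis = 3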
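
The two formulas translate directly into code. The sketch below makes two assumptions that the slides do not state: s is taken to be the sample standard deviation with an (n − 1) divisor, and the data are a made-up symmetric sample used only to exercise the function.

```python
import math

def sample_skewness_kurtosis(data):
    """Return (skewness, kurtosis) using the sum / (n * s^k) form shown above."""
    n = len(data)
    mean = sum(data) / n
    # Assumption: s uses the (n - 1) divisor; the slide does not say which one.
    s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    skewness = sum((x - mean) ** 3 for x in data) / (n * s ** 3)
    kurtosis = sum((x - mean) ** 4 for x in data) / (n * s ** 4)
    return skewness, kurtosis

symmetric_sample = [1.0, 2.0, 3.0, 4.0, 5.0]  # made-up data, symmetric about 3
skew, kurt = sample_skewness_kurtosis(symmetric_sample)
print(skew, kurt)
```

For a sample that is symmetric about its mean, the cubed deviations cancel, so the skewness comes out as zero.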

Jarque-Bera Test for Normality

◼ Consider the null hypothesis that the population distribution is normal
◼ The Jarque-Bera Test for Normality is based on the closeness of the sample skewness to 0 and the sample kurtosis to 3
◼ The test statistic is

$$JB = n\left[\frac{(\text{Skewness})^2}{6} + \frac{(\text{Kurtosis} - 3)^2}{24}\right]$$

◼ As the number of sample observations becomes very large, this statistic has a chi-square distribution with 2 degrees of freedom
◼ The null hypothesis is rejected for large values of the test statistic

Jarque-Bera Test for Normality (continued)

◼ The chi-square approximation is close only for very large sample sizes
◼ The test statistic is compared to significance points from text Table 14.9

Sample size n    10% point    5% point
20               2.13         3.26
30               2.49         3.71
40               2.70         3.99
50               2.90         4.26
75               3.09         4.27
100              3.14         4.29
125              3.31         4.34
150              3.43         4.39
200              3.48         4.43
250              3.54         4.61
300              3.68         4.60
400              3.76         4.74
500              3.91         4.82
800              4.32         5.46
∞                4.61         5.99

Example: Jarque-Bera Test for Normality

◼ The average daily temperature has been recorded for 200 randomly selected days, with sample skewness 0.232 and kurtosis 3.319
◼ Test the null hypothesis that the true distribution is normal

$$JB = n\left[\frac{(\text{Skewness})^2}{6} + \frac{(\text{Kurtosis} - 3)^2}{24}\right] = 200\left[\frac{(0.232)^2}{6} + \frac{(3.319 - 3)^2}{24}\right] = 2.642$$

◼ From Table 14.9 the 10% critical value for n = 200 is 3.48, so there is not sufficient evidence to reject that the population is normal
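
The temperature example can be verified numerically. This is a small sketch with a hypothetical function name; the 10% significance point is copied from the table in the slides, not derived.

```python
def jarque_bera(n, skewness, kurtosis):
    """JB = n * [skewness^2 / 6 + (kurtosis - 3)^2 / 24]."""
    return n * (skewness ** 2 / 6 + (kurtosis - 3) ** 2 / 24)

jb = jarque_bera(n=200, skewness=0.232, kurtosis=3.319)

# 10% significance point for n = 200, taken from the table above.
CRITICAL_10PCT_N200 = 3.48
reject_normality = jb > CRITICAL_10PCT_N200

print(round(jb, 3), reject_normality)  # 2.642 False
```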

Contingency Tables

◼ Used to classify sample observations according to a pair of attributes
◼ Also called a cross-classification or cross-tabulation table
◼ Assume r categories for attribute A and c categories for attribute B
  ◼ Then there are (r x c) possible cross-classifications

r x c Contingency Table

                         Attribute B
Attribute A    1      2      ...    c      Totals
1              O11    O12    ...    O1c    R1
2              O21    O22    ...    O2c    R2
.              .      .      ...    .      .
r              Or1    Or2    ...    Orc    Rr
Totals         C1     C2     ...    Cc     n

Test for Association

◼ Consider n observations tabulated in an r x c contingency table
◼ Denote by Oij the number of observations in the cell that is in the ith row and the jth column
◼ The null hypothesis is
H0: No association exists between the two attributes in the population
◼ The appropriate test is a chi-square test with (r − 1)(c − 1) degrees of freedom

Test for Association (continued)

◼ Let Ri and Cj be the row and column totals
◼ The expected number of observations in cell row i and column j, given that H0 is true, is

$$E_{ij} = \frac{R_i C_j}{n}$$

◼ A test of association at a significance level α is based on the chi-square distribution and the following decision rule

$$\text{Reject } H_0 \text{ if } \chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} > \chi^2_{(r-1)(c-1),\,\alpha}$$

Contingency Table Example

Left-Handed vs. Gender
▪ Dominant Hand: Left vs. Right
▪ Gender: Male vs. Female

H0: There is no association between hand preference and gender
H1: Hand preference is not independent of gender

Contingency Table Example (continued)

Sample results organized in a contingency table:

sample size = n = 300:
120 Females, 12 were left handed
180 Males, 24 were left handed

             Hand Preference
Gender     Left    Right    Total
Female     12      108      120
Male       24      156      180
Total      36      264      300

Logic of the Test

H0: There is no association between hand preference and gender
H1: Hand preference is not independent of gender

◼ If H0 is true, then the proportion of left-handed females should be the same as the proportion of left-handed males
◼ The two proportions above should be the same as the proportion of left-handed people overall

Finding Expected Frequencies

120 Females, 12 were left handed
180 Males, 24 were left handed

Overall: P(Left Handed) = 36/300 = .12

If no association, then
P(Left Handed | Female) = P(Left Handed | Male) = .12

So we would expect 12% of the 120 females and 12% of the 180 males to be left handed, i.e., we would expect
(120)(.12) = 14.4 females to be left handed
(180)(.12) = 21.6 males to be left handed

Expected Cell Frequencies (continued)

◼ Expected cell frequencies:

$$E_{ij} = \frac{R_i C_j}{n} = \frac{(i\text{th row total})(j\text{th column total})}{\text{total sample size}}$$

Example:

$$E_{11} = \frac{(120)(36)}{300} = 14.4$$
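
The same rule applies to every cell, which a short function makes explicit. This is a sketch under the assumption that the table is stored as a list of rows; the name `expected_frequencies` is hypothetical.

```python
def expected_frequencies(table):
    """Expected count R_i * C_j / n for each cell of an r x c table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Rows: Female, Male; columns: Left, Right (counts from the example).
observed_table = [[12, 108], [24, 156]]
expected_table = expected_frequencies(observed_table)

for row in expected_table:
    print([round(value, 1) for value in row])
# [14.4, 105.6]
# [21.6, 158.4]
```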

Observed vs. Expected Frequencies

Observed frequencies vs. expected frequencies:

             Hand Preference
Gender     Left                Right                Total
Female     Observed = 12       Observed = 108       120
           Expected = 14.4     Expected = 105.6
Male       Observed = 24       Observed = 156       180
           Expected = 21.6     Expected = 158.4
Total      36                  264                  300

The Chi-Square Test Statistic

The chi-square test statistic is:

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \qquad \text{with d.f.} = (r - 1)(c - 1)$$

◼ where:
Oij = observed frequency in cell (i, j)
Eij = expected frequency in cell (i, j)
r = number of rows
c = number of columns

Observed vs. Expected Frequencies

             Hand Preference
Gender     Left                Right                Total
Female     Observed = 12       Observed = 108       120
           Expected = 14.4     Expected = 105.6
Male       Observed = 24       Observed = 156       180
           Expected = 21.6     Expected = 158.4
Total      36                  264                  300

$$\chi^2 = \frac{(12 - 14.4)^2}{14.4} + \frac{(108 - 105.6)^2}{105.6} + \frac{(24 - 21.6)^2}{21.6} + \frac{(156 - 158.4)^2}{158.4} = 0.7576$$
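
Contingency Analysis

$\chi^2 = 0.7576$ with d.f. = (r − 1)(c − 1) = (1)(1) = 1

Decision Rule (α = 0.05, $\chi^2_{.05} = 3.841$):
If $\chi^2 > 3.841$, reject H0; otherwise, do not reject H0

Here, $\chi^2 = 0.7576 < 3.841$, so we do not reject H0: there is not sufficient evidence to conclude that gender and hand preference are associated

[Figure: chi-square density with 1 d.f.; "do not reject H0" region below $\chi^2_{.05} = 3.841$, "reject H0" region of area α = 0.05 above it]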
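
The whole test of association can be reproduced from the observed table alone. This is a minimal sketch: the function name is an assumption of this example, and the critical value for 1 degree of freedom is copied from the slide rather than computed.

```python
def chi_square_association(table):
    """Return (chi-square statistic, degrees of freedom) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # E_ij = R_i * C_j / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df

# Rows: Female, Male; columns: Left, Right.
hand_by_gender = [[12, 108], [24, 156]]
chi2_assoc, df_assoc = chi_square_association(hand_by_gender)

# Critical value for alpha = 0.05 with 1 d.f., taken from the slide.
CRITICAL_VALUE_05_DF1 = 3.841
reject_independence = chi2_assoc > CRITICAL_VALUE_05_DF1

print(round(chi2_assoc, 4), df_assoc, reject_independence)  # 0.7576 1 False
```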
