Descriptive Statistics Overview
Engineers
Sixth Edition
Douglas C. Montgomery George C. Runger
Chapter 6
Descriptive Statistics
Copyright © 2014 John Wiley & Sons, Inc. All rights reserved.
Numerical Summaries of Data
• Statistics is the science of data.
• An important aspect of dealing with data is
organizing and summarizing the data in
ways that facilitate their interpretation and
subsequent analysis (descriptive statistics).
\[ \bar{x} = \frac{\sum_{i=1}^{8} x_i}{8} = \frac{12.6 + 12.9 + \cdots + 13.1}{8} = \frac{104}{8} = 13.0 \text{ pounds} \]

i      xi
1      12.6
2      12.9
3      13.4
4      12.3
5      13.6
6      13.5
7      12.6
8      13.1
Mean   13.00   = AVERAGE($B2:$B9)
Table 6‐1
\[ s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2}{n-1} = \frac{1{,}353.60 - (104.0)^2/8}{7} = \frac{1.60}{7} = 0.2286 \text{ pounds}^2 \]

i      xi      xi^2
1      12.6    158.76
2      12.9    166.41
3      13.4    179.56
4      12.3    151.29
5      13.6    184.96
6      13.5    182.25
7      12.6    158.76
8      13.1    171.61
sums   104.0   1,353.60
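As a quick check of the two summaries above, the following sketch recomputes the sample mean and sample variance from the eight observations in Table 6-1. The variable names are illustrative, and the use of Python's `statistics` module is a choice made here, not something the slides prescribe.

```python
import statistics

# The eight observations x_i from Table 6-1 (pounds).
x = [12.6, 12.9, 13.4, 12.3, 13.6, 13.5, 12.6, 13.1]
n = len(x)

# Sample mean and sample variance (divisor n - 1) from the standard library.
mean = statistics.mean(x)
variance = statistics.variance(x)

# The same variance via the shortcut formula used on the slide.
sum_x = sum(x)
sum_x2 = sum(v * v for v in x)
variance_shortcut = (sum_x2 - sum_x ** 2 / n) / (n - 1)

print(f"mean = {mean:.2f}, s^2 = {variance:.4f}, shortcut = {variance_shortcut:.4f}")
```

Both routes to the variance should agree up to floating-point rounding; the shortcut form is convenient by hand but can lose precision when the mean is large relative to the spread.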
                  Value of indexed item
f      Index      i th     (i+1)th     Quartile
0.25   20.25      143      144         143.25
0.50   40.50      160      163         161.50
0.75   60.75      181      181         181.00
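The quartile column in the table above is consistent with linear interpolation between the i-th and (i+1)-th ordered values, using the fractional part of the index. The sketch below reproduces that column from the table rows. The function name is hypothetical, and the reading of the index as f(n + 1) with n = 80 is an inference from 0.25 × 81 = 20.25 rather than something stated on the slide.

```python
import math

def quartile_from_neighbors(index, lower, upper):
    """Interpolate between two adjacent ordered values using the fractional index."""
    frac = index - math.floor(index)
    return lower + frac * (upper - lower)

# (index, i-th value, (i+1)-th value) taken from the table rows.
rows = [(20.25, 143, 144), (40.50, 160, 163), (60.75, 181, 181)]
quartiles = [quartile_from_neighbors(*row) for row in rows]
print(quartiles)
```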
Starting point = 70
(b) Symmetric distribution has identical mean, median and mode measures.
Figure 6-17 A digidot plot of the compressive strength data in Table 6-2.
In order to find a
relationship between two
variables
[Figure omitted; horizontal axis: Battery Life (x) in Hours]
Chapter 7
Point Estimation of Parameters and Sampling Distributions
Point Estimation
• A point estimate is a reasonable value of a
population parameter.
• X1, X2,…, Xn are random variables.
• Functions of these random variables, such as X̄
and S², are also random variables called
statistics.
• Statistics have their unique distributions
which are called sampling distributions.
\[ \bar{x} = \frac{25 + 30 + 29 + 31}{4} = 28.75 \]
\[ \sigma^2 = \frac{1}{3}, \qquad \sigma_{\bar{X}}^2 = \frac{\sigma^2}{n} = \frac{1/3}{40} = \frac{1}{120} \]
Figure 7-5 The distributions of X and X̄ for Example 7-2.
Sampling Distribution of a Difference in Sample Means
• If we have two independent populations with means μ1
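and μ2, and variances σ1² and σ2², and
• If X̄1 and X̄2 are the sample means of two
independent random samples of sizes n1 and n2 from
these populations:
• Then the sampling distribution of
\[ Z = \frac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}} \]
is approximately standard normal when the conditions of the central limit
theorem apply, and exactly standard normal when both populations are normal.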
• Moments
• Maximum likelihood
• Bayesian Estimation
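\[ \mu = \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad \mu^2 + \sigma^2 = \frac{1}{n}\sum_{i=1}^{n} X_i^2 \]
\[ \hat{\mu} = \bar{X}, \qquad \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} X_i^2 - \bar{X}^2 = \frac{\sum_{i=1}^{n} X_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} X_i\right)^2}{n} = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^2 \quad \text{(biased)} \]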
Example 7-9: Gamma Distribution Moment Estimators-1
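Suppose that X1, X2, …, Xn is a random sample from a gamma distribution with
parameters r and λ, where E(X) = r/λ and E(X²) = r(r+1)/λ².
\[ \frac{r}{\lambda} = E(X) = \bar{X} \quad \text{is the mean,} \]
\[ \frac{r}{\lambda^2} = E(X^2) - [E(X)]^2 \quad \text{is the variance, or} \quad \frac{r(r+1)}{\lambda^2} = E(X^2), \]
and now solving for r and λ:
\[ \hat{r} = \frac{\bar{X}^2}{(1/n)\sum_{i=1}^{n} X_i^2 - \bar{X}^2}, \qquad \hat{\lambda} = \frac{\bar{X}}{(1/n)\sum_{i=1}^{n} X_i^2 - \bar{X}^2} \]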
Using the time to failure data in the table, we can estimate the
parameters of the gamma distribution.

xi       xi^2
11.96    143.0416
5.03     25.3009
67.40    4542.7600
16.07    258.2449
31.50    992.2500
7.73     59.7529
11.10    123.2100
22.38    500.8644

x̄ = 21.646, Σxi² = 6645.4247

\[ \hat{r} = \frac{\bar{X}^2}{(1/n)\sum_{i=1}^{n} X_i^2 - \bar{X}^2} = \frac{21.646^2}{(1/8)\,6645.4247 - 21.646^2} = 1.29 \]
\[ \hat{\lambda} = \frac{\bar{X}}{(1/n)\sum_{i=1}^{n} X_i^2 - \bar{X}^2} = \frac{21.646}{(1/8)\,6645.4247 - 21.646^2} = 0.0598 \]
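The moment estimators for the gamma example can be recomputed directly from the eight failure times. This is a minimal sketch with illustrative variable names; it follows the formulas for r̂ and λ̂ given above and nothing more.

```python
# Time-to-failure data from the table.
failure_times = [11.96, 5.03, 67.40, 16.07, 31.50, 7.73, 11.10, 22.38]
n = len(failure_times)

x_bar = sum(failure_times) / n                      # first sample moment
mean_sq = sum(x * x for x in failure_times) / n     # second sample moment
spread = mean_sq - x_bar ** 2                       # (1/n) * sum(x^2) - x_bar^2

r_hat = x_bar ** 2 / spread        # moment estimator of r
lambda_hat = x_bar / spread        # moment estimator of lambda

print(f"x_bar = {x_bar:.3f}, r_hat = {r_hat:.2f}, lambda_hat = {lambda_hat:.4f}")
```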
Maximum Likelihood Estimators
• Suppose that X is a random variable with
probability distribution f(x;θ), where θ is a single
unknown parameter. Let x1, x2, …, xn be the
observed values in a random sample of size n.
Then the likelihood function of the sample is:
\[ L(\theta) = f(x_1;\theta)\, f(x_2;\theta) \cdots f(x_n;\theta) \]
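\[ L(p) = \prod_{i=1}^{n} p^{x_i}(1-p)^{1-x_i} = p^{\sum_{i=1}^{n} x_i}(1-p)^{\,n - \sum_{i=1}^{n} x_i} \]
\[ \ln L(p) = \left(\sum_{i=1}^{n} x_i\right)\ln p + \left(n - \sum_{i=1}^{n} x_i\right)\ln(1-p) \]
\[ \frac{d \ln L(p)}{dp} = \frac{\sum_{i=1}^{n} x_i}{p} - \frac{n - \sum_{i=1}^{n} x_i}{1-p} \]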
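Setting the derivative above to zero gives the sample proportion as the maximizer. The sketch below checks this numerically by evaluating the log likelihood on a grid. The sample is hypothetical (made up for illustration), and the grid search is only a sanity check, not a recommended estimation method.

```python
import math

def bernoulli_log_likelihood(p, xs):
    """ln L(p) = (sum x_i) ln p + (n - sum x_i) ln(1 - p)."""
    total = sum(xs)
    n = len(xs)
    return total * math.log(p) + (n - total) * math.log(1 - p)

# Hypothetical 0/1 sample, not taken from the slides.
sample = [1, 0, 0, 1, 1, 0, 1, 1, 0, 1]

# Closed-form maximizer: the sample proportion.
p_hat = sum(sample) / len(sample)

# Grid search over p in (0, 1) as a numerical cross-check.
grid = [k / 1000 for k in range(1, 1000)]
p_grid = max(grid, key=lambda p: bernoulli_log_likelihood(p, sample))

print(p_hat, p_grid)
```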
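\[ L(\mu) = \prod_{i=1}^{n} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x_i-\mu)^2/(2\sigma^2)} = \frac{1}{(2\pi\sigma^2)^{n/2}}\, e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2} \]
\[ \ln L(\mu) = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2 \]
\[ \frac{d \ln L(\mu)}{d\mu} = \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i-\mu) \]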
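\[ \ln L(\lambda) = n\ln\lambda - \lambda\sum_{i=1}^{n} x_i \]
\[ \frac{d \ln L(\lambda)}{d\lambda} = \frac{n}{\lambda} - \sum_{i=1}^{n} x_i \]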
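\[ L(\mu,\sigma^2) = \frac{1}{(2\pi\sigma^2)^{n/2}}\, e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2} \]
\[ \ln L(\mu,\sigma^2) = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2 \]
\[ \frac{\partial \ln L(\mu,\sigma^2)}{\partial\mu} = \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i-\mu) = 0 \]
\[ \frac{\partial \ln L(\mu,\sigma^2)}{\partial(\sigma^2)} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{n}(x_i-\mu)^2 = 0 \]
\[ \hat{\mu} = \bar{X} \quad \text{and} \quad \hat{\sigma}^2 = \frac{\sum_{i=1}^{n}(x_i-\bar{X})^2}{n} \]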
Notes:
• Mathematical statisticians will often prefer MLEs because of
these properties. Properties (1) and (2) state that, for large
samples, MLEs are approximately MVUEs.
• To use MLEs, the distribution of the population must be known
or assumed.
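\[ L(r,\lambda) = \prod_{i=1}^{n} \frac{\lambda^{r} x_i^{\,r-1} e^{-\lambda x_i}}{\Gamma(r)} \]
\[ \ln L(r,\lambda) = nr\ln\lambda + (r-1)\sum_{i=1}^{n}\ln x_i - n\ln\Gamma(r) - \lambda\sum_{i=1}^{n} x_i \]
\[ \frac{\partial \ln L(r,\lambda)}{\partial r} = n\ln\lambda + \sum_{i=1}^{n}\ln x_i - n\,\frac{\Gamma'(r)}{\Gamma(r)} \]
\[ \frac{\partial \ln L(r,\lambda)}{\partial\lambda} = \frac{nr}{\lambda} - \sum_{i=1}^{n} x_i \]
Equating the above derivatives to zero we get
\[ \hat{\lambda} = \frac{\hat{r}}{\bar{x}}, \qquad n\ln\hat{\lambda} + \sum_{i=1}^{n}\ln x_i = n\,\frac{\Gamma'(\hat{r})}{\Gamma(\hat{r})} \]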
Figure 7‐11 Log likelihood for the gamma distribution using the failure
time data. (a) Log likelihood surface. (b) Contour plot.
\[ f(\theta \mid x_1, x_2, \ldots, x_n) = \frac{f(x_1, x_2, \ldots, x_n, \theta)}{f(x_1, x_2, \ldots, x_n)} \]
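Let X1, X2, …, Xn be a random sample from a normal distribution with unknown mean μ and
known variance σ². Assume that the prior distribution for μ is:
\[ f(\mu) = \frac{1}{\sqrt{2\pi}\,\sigma_0}\, e^{-(\mu-\mu_0)^2/(2\sigma_0^2)} = \frac{1}{\sqrt{2\pi\sigma_0^2}}\, e^{-(\mu^2 - 2\mu_0\mu + \mu_0^2)/(2\sigma_0^2)} \]
The joint distribution of the sample is
\[ f(x_1, x_2, \ldots, x_n \mid \mu) = \frac{1}{(2\pi\sigma^2)^{n/2}}\, e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2} = \frac{1}{(2\pi\sigma^2)^{n/2}}\, e^{-\frac{1}{2\sigma^2}\left(\sum_{i=1}^{n} x_i^2 - 2\mu\sum_{i=1}^{n} x_i + n\mu^2\right)} \]
The posterior distribution is
\[ f(\mu \mid x_1, x_2, \ldots, x_n) = e^{-\frac{1}{2}\left(\frac{1}{\sigma_0^2} + \frac{1}{\sigma^2/n}\right)\left[\mu - \frac{(\sigma^2/n)\mu_0 + \sigma_0^2\bar{x}}{\sigma_0^2 + \sigma^2/n}\right]^2} h(x_1, x_2, \ldots, x_n, \sigma^2, \mu_0, \sigma_0^2) \]
with posterior variance
\[ V(\mu) = \left(\frac{1}{\sigma_0^2} + \frac{1}{\sigma^2/n}\right)^{-1} = \frac{\sigma_0^2(\sigma^2/n)}{\sigma_0^2 + \sigma^2/n} \]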
To illustrate:
– The parameters are: μ0 = 0, σ0² = 1
– Sample: n = 10, x̄ = 0.75, σ² = 4
\[ \frac{(\sigma^2/n)\mu_0 + \sigma_0^2\bar{x}}{\sigma_0^2 + \sigma^2/n} = \frac{(4/10)(0) + 1(0.75)}{1 + 4/10} = 0.536 \]
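The posterior mean in the illustration is a weighted average of the prior mean and the sample mean. The following sketch evaluates it, together with the posterior variance, for the stated values. Variable names are illustrative.

```python
# Prior parameters and sample summary from the illustration.
mu0, sigma0_sq = 0.0, 1.0          # prior mean and prior variance
n, x_bar, sigma_sq = 10, 0.75, 4.0 # sample size, sample mean, known variance

var_of_mean = sigma_sq / n  # sigma^2 / n, the variance of the sample mean

# Posterior mean: ((sigma^2/n) * mu0 + sigma0^2 * x_bar) / (sigma0^2 + sigma^2/n)
posterior_mean = (var_of_mean * mu0 + sigma0_sq * x_bar) / (sigma0_sq + var_of_mean)

# Posterior variance: (1/sigma0^2 + n/sigma^2)^(-1)
posterior_var = 1.0 / (1.0 / sigma0_sq + n / sigma_sq)

print(f"posterior mean = {posterior_mean:.3f}, posterior variance = {posterior_var:.4f}")
```

Because the prior mean is 0, the posterior mean is pulled from the sample mean 0.75 toward 0; a larger n would shrink σ²/n and reduce that pull.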
Chapter 8
Statistical Intervals for a Single Sample
8-1.1 Confidence Interval and its Properties
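l ≤ μ ≤ u,
The required quantities are zα/2 = z0.025 = 1.96, n = 10, σ = 1, and x̄ = 64.46.
\[ n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2 \tag{8-2} \]
\[ \mu \le \bar{x} + z_{\alpha}\,\sigma/\sqrt{n} \tag{8-3} \]
\[ \bar{x} - z_{\alpha}\,\sigma/\sqrt{n} = l \le \mu \tag{8-4} \]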
The same data for impact testing from Example 8-1 are used
to construct a lower, one-sided 95% confidence interval for
the mean impact energy.
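Recall that zα = 1.64, n = 10, σ = 1, and x̄ = 64.46.
\[ \bar{x} - z_{\alpha}\frac{\sigma}{\sqrt{n}} \le \mu \]
\[ 64.46 - 1.64\,\frac{1}{\sqrt{10}} \le \mu \]
\[ 63.94 \le \mu \]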
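The known-variance intervals for the impact-energy data can be evaluated with the standard library's `NormalDist`. This sketch uses exact normal quantiles rather than the rounded table values 1.96 and 1.64, so the last digit may differ slightly from a hand calculation. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

x_bar, sigma, n = 64.46, 1.0, 10
std_error = sigma / sqrt(n)

# Two-sided 95% interval: x_bar +/- z_{0.025} * sigma / sqrt(n)
z_two_sided = NormalDist().inv_cdf(0.975)
two_sided = (x_bar - z_two_sided * std_error, x_bar + z_two_sided * std_error)

# Lower one-sided 95% bound: x_bar - z_{0.05} * sigma / sqrt(n)
z_one_sided = NormalDist().inv_cdf(0.95)
lower_bound = x_bar - z_one_sided * std_error

print(f"two-sided: ({two_sided[0]:.2f}, {two_sided[1]:.2f}), lower bound: {lower_bound:.2f}")
```

The one-sided bound sits closer to x̄ than the lower end of the two-sided interval because all of α is placed in one tail.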
8-1.4 A Large-Sample Confidence Interval for μ
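Because n > 40, the assumption of normality is not necessary to use Equation
8-5. The required values are n = 53, x̄ = 0.5250, s = 0.3486, and z0.025 = 1.96.
\[ \hat{\theta} - z_{\alpha/2}\,\sigma_{\hat{\theta}} \le \theta \le \hat{\theta} + z_{\alpha/2}\,\sigma_{\hat{\theta}} \]
\[ T = \frac{\bar{X} - \mu}{S/\sqrt{n}} \tag{8-6} \]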
If x̄ and s are the mean and standard deviation of a random sample
from a normal distribution with unknown variance σ², a 100(1 − α)%
confidence interval on μ is given by
\[ \bar{x} - t_{\alpha/2,n-1}\, s/\sqrt{n} \le \mu \le \bar{x} + t_{\alpha/2,n-1}\, s/\sqrt{n} \]
where tα/2,n−1 is the upper 100α/2 percentage point of the t distribution with
n − 1 degrees of freedom.
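The resulting CI is
\[ \bar{x} - t_{\alpha/2,n-1}\, s/\sqrt{n} \le \mu \le \bar{x} + t_{\alpha/2,n-1}\, s/\sqrt{n} \]
\[ 13.71 - 2.080(3.55)/\sqrt{22} \le \mu \le 13.71 + 2.080(3.55)/\sqrt{22} \]
\[ 13.71 - 1.57 \le \mu \le 13.71 + 1.57 \]
\[ 12.14 \le \mu \le 15.28 \]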
Interpretation: The CI is fairly wide because there is a lot of variability in the
measurements. A larger sample size would have led to a shorter interval.
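\[ X^2 = \frac{(n-1)S^2}{\sigma^2} \tag{8-8} \]
\[ \frac{(n-1)s^2}{\chi^2_{\alpha/2,n-1}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{1-\alpha/2,n-1}} \tag{8-9} \]
where χ²α/2,n−1 and χ²1−α/2,n−1 are the upper and lower
100α/2 percentage points of the chi-square distribution with
n – 1 degrees of freedom, respectively.
\[ \frac{(n-1)s^2}{\chi^2_{\alpha,n-1}} \le \sigma^2 \quad \text{and} \quad \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{1-\alpha,n-1}} \tag{8-10} \]
\[ \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{0.95,19}} \]
\[ \sigma^2 \le \frac{(20-1)\,0.0153}{10.117} = 0.0287 \]
A confidence interval on the standard deviation can be obtained by taking the square
root on both sides, resulting in
\[ \sigma \le 0.17 \]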
8-4 A Large-Sample Confidence Interval For a Population Proportion
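\[ Z = \frac{X - np}{\sqrt{np(1-p)}} = \frac{\hat{P} - p}{\sqrt{\dfrac{p(1-p)}{n}}} \]
The quantity √(p(1 − p)/n) is called the standard error of the point estimator P̂.
\[ \hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \le p \le \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \tag{8-11} \]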
A point estimate of the proportion of bearings in the population that exceeds the
roughness specification is p̂ = x/n = 10/85 = 0.12.
\[ \hat{p} - z_{0.025}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \le p \le \hat{p} + z_{0.025}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \]
\[ 0.12 - 1.96\sqrt{\frac{0.12(0.88)}{85}} \le p \le 0.12 + 1.96\sqrt{\frac{0.12(0.88)}{85}} \]
\[ 0.0509 \le p \le 0.1891 \]
Interpretation: This is a wide CI. Although the sample size does not appear to
be small (n = 85), the value of p̂ is fairly small, which leads to a large standard
error for p̂ contributing to the wide CI.
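The large-sample interval for a proportion is simple enough to evaluate directly. The sketch below applies Equation 8-11 to p̂ = 0.12 (the rounded estimate used in the example) and n = 85. The function name is hypothetical, and the exact normal quantile from `NormalDist` stands in for the table value 1.96.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(p_hat, n, confidence=0.95):
    """Large-sample CI: p_hat +/- z_{alpha/2} * sqrt(p_hat (1 - p_hat) / n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

# Rounded point estimate 10/85 ~ 0.12, as in the example.
lower, upper = proportion_interval(0.12, 85)
print(f"{lower:.4f} <= p <= {upper:.4f}")
```

Using the unrounded estimate 10/85 instead of 0.12 shifts both limits down slightly, which is worth keeping in mind when comparing against hand calculations.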
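\[ n = \left(\frac{z_{\alpha/2}}{E}\right)^2 (0.25) \tag{8-13} \]
\[ \hat{p} - z_{\alpha}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \le p \quad \text{and} \quad p \le \hat{p} + z_{\alpha}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \tag{8-14} \]
respectively.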
Interpretation: The two CIs would agree more closely if the sample size were
larger.
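\[ \bar{x} - t_{\alpha/2,n-1}\, s\sqrt{1 + \frac{1}{n}} \le X_{n+1} \le \bar{x} + t_{\alpha/2,n-1}\, s\sqrt{1 + \frac{1}{n}} \tag{8-15} \]
The prediction interval for Xn+1 will always be longer than the confidence
interval for μ.
\[ \bar{x} - t_{\alpha/2,n-1}\, s\sqrt{1 + \frac{1}{n}} \le X_{n+1} \le \bar{x} + t_{\alpha/2,n-1}\, s\sqrt{1 + \frac{1}{n}} \]
\[ 13.71 - 2.080(3.55)\sqrt{1 + \frac{1}{22}} \le X_{23} \le 13.71 + 2.080(3.55)\sqrt{1 + \frac{1}{22}} \]
\[ 6.16 \le X_{23} \le 21.26 \]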
Interpretation: The prediction interval is considerably longer than the CI. This is
because the CI is an estimate of a parameter, but the PI is an interval estimate of a
single future observation.
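To make the comparison between the two intervals concrete, this sketch computes both the t confidence interval on the mean and the prediction interval for the next observation from the same summary statistics (x̄ = 13.71, s = 3.55, n = 22). The critical value 2.080 is the table value quoted in the example, entered by hand; the standard library has no t quantile function.

```python
from math import sqrt

x_bar, s, n = 13.71, 3.55, 22
t_crit = 2.080  # t_{0.025,21}, table value quoted in the example

# Confidence interval on the mean: x_bar +/- t * s / sqrt(n)
ci_half = t_crit * s / sqrt(n)
ci = (x_bar - ci_half, x_bar + ci_half)

# Prediction interval for the next observation: x_bar +/- t * s * sqrt(1 + 1/n)
pi_half = t_crit * s * sqrt(1 + 1 / n)
pi = (x_bar - pi_half, x_bar + pi_half)

print(f"CI: ({ci[0]:.2f}, {ci[1]:.2f})  PI: ({pi[0]:.2f}, {pi[1]:.2f})")
```

The only difference between the two half-widths is the factor under the square root: 1/n for the mean versus 1 + 1/n for a single future value, so the prediction interval does not shrink to zero as n grows.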
\[ (\bar{x} - ks,\ \bar{x} + ks) \]
From Appendix Table XII, the tolerance factor k for n = 22, γ = 0.90, and 95%
confidence is k = 2.264.
\[ (\bar{x} - ks,\ \bar{x} + ks) \]
\[ [13.71 - 2.264(3.55),\ 13.71 + 2.264(3.55)] \]
\[ (5.67,\ 21.74) \]
Interpretation: We can be 95% confident that at least 90% of the values of load at
failure for this particular alloy lie between 5.67 and 21.74.
Chapter 9
Tests of Hypotheses for a Single Sample
9-1 Hypothesis Testing
9-1.1 Statistical Hypotheses
A statistical hypothesis is a statement about the parameters of one or
more populations.
Let H0 : μ = 50 centimeters per second and H1 : μ ≠ 50 centimeters per second
The statement H0 : μ = 50 is called the null hypothesis.
The statement H1 : μ ≠ 50 is called the alternative hypothesis.
Figure 9-1 Decision criteria for testing H0: μ = 50 centimeters per second
versus H1: μ ≠ 50 centimeters per second.
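\[ \alpha = P(\bar{X} < 48.5 \text{ when } \mu = 50) + P(\bar{X} > 51.5 \text{ when } \mu = 50) \]
The z-values that correspond to the critical values 48.5 and 51.5 are
\[ z_1 = \frac{48.5 - 50}{0.79} = -1.90 \quad \text{and} \quad z_2 = \frac{51.5 - 50}{0.79} = 1.90 \]
Therefore
\[ \alpha = P(Z < -1.90) + P(Z > 1.90) = 0.0287 + 0.0287 = 0.0574 \]
which implies 5.74% of all random samples would lead to rejection of the
hypothesis H0: μ = 50.
\[ \beta = P(48.5 \le \bar{X} \le 51.5 \text{ when } \mu = 52) \]
The z-values corresponding to 48.5 and 51.5 when μ = 52 are
\[ z_1 = \frac{48.5 - 52}{0.79} = -4.43 \quad \text{and} \quad z_2 = \frac{51.5 - 52}{0.79} = -0.63 \]
Hence,
\[ \beta = P(-4.43 \le Z \le -0.63) = P(Z \le -0.63) - P(Z \le -4.43) = 0.2643 - 0.0000 = 0.2643 \]
which means that the probability that we will fail to reject the false
null hypothesis is 0.2643.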
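Both error probabilities above are areas under the sampling distribution of X̄, so they can be checked with `NormalDist`. The sketch below takes the standard error 0.79 and the acceptance region 48.5 to 51.5 from the slide; the function name is hypothetical. Because it does not round the z-values to two decimals, its results differ from the slide's in the fourth decimal place.

```python
from statistics import NormalDist

std_error = 0.79            # standard error of the sample mean, as used on the slide
lower, upper = 48.5, 51.5   # acceptance region for the sample mean

def acceptance_probability(true_mean):
    """P(lower <= X-bar <= upper) when X-bar ~ Normal(true_mean, std_error)."""
    dist = NormalDist(true_mean, std_error)
    return dist.cdf(upper) - dist.cdf(lower)

alpha = 1 - acceptance_probability(50)  # type I error: reject when mu = 50
beta = acceptance_probability(52)       # type II error: fail to reject when mu = 52

print(f"alpha = {alpha:.4f}, beta = {beta:.4f}")
```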
• For example, consider the propellant burning rate problem when we are testing
H0 : μ = 50 centimeters per second against H1 : μ ≠ 50 centimeters per
second. Suppose that the true value of the mean is μ = 52.
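\[ H_0: \mu = \mu_0, \quad H_1: \mu > \mu_0 \qquad \text{or} \qquad H_0: \mu = \mu_0, \quad H_1: \mu < \mu_0 \]
\[ \bar{x} = 51.3 \]
\[ H_0: \mu = \mu_0, \quad H_1: \mu \ne \mu_0 \]
\[ Z_0 = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \tag{9-1} \]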
9-2 Tests on the Mean of a Normal Distribution,
Variance Known
EXAMPLE 9-2 Propellant Burning Rate
4. Test statistic: The test statistic is
\[ z_0 = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} \]
5. Reject H0 if: Reject H0 if the P-value is less than 0.05. The boundaries of
the critical region would be z0.025 = 1.96 and −z0.025 = −1.96.
9-2.2 Type II Error and Choice of Sample Size
Finding the Probability of Type II Error
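Consider the two-sided hypothesis test H0: μ = μ0 and H1: μ ≠ μ0.
Suppose the null hypothesis is false and the true value of the mean is
μ = μ0 + δ, where δ > 0.
The test statistic Z0 is
\[ Z_0 = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} = \frac{\bar{X} - (\mu_0 + \delta)}{\sigma/\sqrt{n}} + \frac{\delta\sqrt{n}}{\sigma} \]
\[ \beta = \Phi\left(z_{\alpha/2} - \frac{\delta\sqrt{n}}{\sigma}\right) - \Phi\left(-z_{\alpha/2} - \frac{\delta\sqrt{n}}{\sigma}\right) \tag{9-3} \]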
Sample Size for a Two‐Sided Test
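\[ n \approx \frac{(z_{\alpha/2} + z_{\beta})^2\sigma^2}{\delta^2} \quad \text{where } \delta = \mu - \mu_0 \tag{9-4} \]
and, for a one-sided test,
\[ n = \frac{(z_{\alpha} + z_{\beta})^2\sigma^2}{\delta^2} \quad \text{where } \delta = \mu - \mu_0 \tag{9-5} \]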
EXAMPLE 9-3 Propellant Burning Rate Type II Error
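Consider the rocket propellant problem of Example 9-2. Suppose that the true burning rate
is 49 centimeters per second. Find β for the two-sided test with α = 0.05,
σ = 2, and n = 25.
Here δ = 1 and zα/2 = 1.96, so from Equation 9-3
\[ \beta = \Phi\left(1.96 - \frac{\sqrt{25}}{2}\right) - \Phi\left(-1.96 - \frac{\sqrt{25}}{2}\right) = \Phi(-0.54) - \Phi(-4.46) = 0.295 \]
The probability is about 0.3 that the test will fail to reject the null hypothesis
when the true burning rate is 49 centimeters per second.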
EXAMPLE 9-3 Propellant Burning Rate Type II Error - Continuation
Suppose that the analyst wishes to design the test so that if the true mean
burning rate differs from 50 centimeters per second by as much as 1 centimeter
per second, the test will detect this (i.e., reject H0: μ = 50) with a high probability,
say, 0.90. Now, we note that σ = 2, δ = 51 − 50 = 1, α = 0.05, and β = 0.10.
Since zα/2 = z0.025 = 1.96 and zβ = z0.10 = 1.28, the sample size required to detect
this departure from H0: μ = 50 is found by Equation 9-4 as
\[ n \approx \frac{(z_{\alpha/2} + z_{\beta})^2\sigma^2}{\delta^2} = \frac{(1.96 + 1.28)^2\, 2^2}{1^2} \approx 42 \]
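Equations 9-3 and 9-4 translate directly into two small functions. The sketch below evaluates β for the n = 25 case and the approximate sample size for the design target just described. Function names are hypothetical, and exact normal quantiles are used in place of the rounded 1.96 and 1.28, so the sample-size result lands slightly above the value obtained with the rounded quantiles.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def beta_two_sided(delta, sigma, n, alpha):
    """Type II error of the two-sided z-test (Equation 9-3)."""
    z = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = delta * sqrt(n) / sigma
    return STD_NORMAL.cdf(z - shift) - STD_NORMAL.cdf(-z - shift)

def sample_size_two_sided(delta, sigma, alpha, beta):
    """Approximate n for a two-sided z-test (Equation 9-4), before rounding up."""
    z_a = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_b = STD_NORMAL.inv_cdf(1 - beta)
    return (z_a + z_b) ** 2 * sigma ** 2 / delta ** 2

beta_n25 = beta_two_sided(delta=1, sigma=2, n=25, alpha=0.05)
n_approx = sample_size_two_sided(delta=1, sigma=2, alpha=0.05, beta=0.10)

print(f"beta (n = 25) = {beta_n25:.3f}, approximate n = {n_approx:.2f}")
```

Since Equation 9-4 is itself an approximation, it is reasonable to round the result to a nearby integer and then confirm the achieved β with `beta_two_sided`.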
9-2.3 Large Sample Test
A test procedure for the null hypothesis H0: μ = μ0, assuming that the
population is normally distributed and that σ² is known, has been developed. In
most practical situations, σ² will be unknown. Furthermore, we may not be
certain that the population is normally distributed.
9-3 Tests on the Mean of a Normal Distribution,
Variance Unknown
9-3.1 Hypothesis Tests on the Mean
One-Sample t-Test
Consider the two-sided hypothesis test
\[ H_0: \mu = \mu_0, \qquad H_1: \mu \ne \mu_0 \]
Test statistic:
\[ T_0 = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} \]
EXAMPLE 9-6 Golf Club Design
An experiment was performed in which 15 drivers produced by a particular club maker
were selected at random and their coefficients of restitution measured. It is of interest to
determine if there is evidence (with α = 0.05) to support a claim that the mean coefficient
of restitution exceeds 0.82.
The observations are:
0.8411 0.8191 0.8182 0.8125 0.8750
0.8580 0.8532 0.8483 0.8276 0.7983
0.8042 0.8730 0.8282 0.8359 0.8660
The sample mean and sample standard deviation are x̄ = 0.83725 and s = 0.02456. The
objective of the experimenter is to demonstrate that the mean coefficient of restitution
exceeds 0.82, hence a one-sided alternative hypothesis is appropriate.
The seven‐step procedure for hypothesis testing is as follows:
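The individual steps are not reproduced here, but the core computation can be sketched from the 15 observations listed above: the sample mean, the sample standard deviation, and the one-sample t statistic for H0: μ = 0.82 against H1: μ > 0.82. The critical value 1.761 (upper 5% point of t with 14 degrees of freedom) is taken from a standard t table and does not appear on the slide; treat it as an assumption of this sketch.

```python
from math import sqrt
from statistics import mean, stdev

# Coefficients of restitution for the 15 drivers.
cor = [
    0.8411, 0.8191, 0.8182, 0.8125, 0.8750,
    0.8580, 0.8532, 0.8483, 0.8276, 0.7983,
    0.8042, 0.8730, 0.8282, 0.8359, 0.8660,
]
n = len(cor)
x_bar = mean(cor)
s = stdev(cor)       # sample standard deviation (divisor n - 1)

mu0 = 0.82
t0 = (x_bar - mu0) / (s / sqrt(n))   # one-sample t statistic

t_crit = 1.761       # assumed table value t_{0.05,14}; not from the slide
reject_h0 = t0 > t_crit

print(f"x_bar = {x_bar:.5f}, s = {s:.5f}, t0 = {t0:.2f}, reject H0: {reject_h0}")
```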
9-3.2 Type II Error and Choice of Sample Size
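The type II error of the two-sided alternative would be
\[ \beta = P(-t_{\alpha/2,n-1} \le T_0 \le t_{\alpha/2,n-1} \mid \delta \ne 0) = P(-t_{\alpha/2,n-1} \le T_0' \le t_{\alpha/2,n-1}) \]
where T0′ denotes the noncentral t random variable.
Curves are provided for two-sided alternatives on Charts VIIe and VIIf. The abscissa scale
factor d on these charts is defined as
\[ d = \frac{|\mu - \mu_0|}{\sigma} = \frac{|\delta|}{\sigma} \]
For the one-sided alternative μ > μ0 or μ < μ0, use Charts VIIg and VIIh. The abscissa scale
factor d on these charts is defined as
\[ d = \frac{|\mu - \mu_0|}{\sigma} = \frac{|\delta|}{\sigma} \]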
EXAMPLE 9-7 Golf Club Design Sample Size
Consider the golf club testing problem from Example 9-6. If the mean
coefficient of restitution exceeds 0.82 by as much as 0.02, is the
sample size n = 15 adequate to ensure that H0: μ = 0.82 will be rejected
with probability at least 0.8?
9-4 Hypothesis Tests on the Variance and
Standard Deviation of a Normal Distribution
9-4.1 Hypothesis Test on the Variance
Suppose that we wish to test the hypothesis that the variance of a normal
population σ² equals a specified value, say σ0², or equivalently, that the
standard deviation σ is equal to σ0. Let X1, X2, ..., Xn be a random sample
of n observations from this population. To test
\[ H_0: \sigma^2 = \sigma_0^2, \qquad H_1: \sigma^2 \ne \sigma_0^2 \tag{9-6} \]
we use the test statistic
\[ X_0^2 = \frac{(n-1)S^2}{\sigma_0^2} \tag{9-7} \]
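Reject H0 if
\[ \chi_0^2 > \chi^2_{\alpha/2,n-1} \quad \text{or if} \quad \chi_0^2 < \chi^2_{1-\alpha/2,n-1} \]
where χ²α/2,n−1 and χ²1−α/2,n−1 are the upper and lower 100α/2
percentage points of the chi-square distribution with n − 1 degrees of
freedom, respectively. Figure 9-17(a) shows the critical region.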
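The same test statistic is used for one-sided alternative hypotheses. For
the one-sided hypotheses
\[ H_0: \sigma^2 = \sigma_0^2, \qquad H_1: \sigma^2 > \sigma_0^2 \tag{9-8} \]
we would reject H0 if χ0² > χ²α,n−1, whereas for the other one-sided
hypotheses
\[ H_0: \sigma^2 = \sigma_0^2, \qquad H_1: \sigma^2 < \sigma_0^2 \tag{9-9} \]
we would reject H0 if χ0² < χ²1−α,n−1.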
EXAMPLE 9-8 Automated Filling
An automated filling machine is used to fill bottles with liquid detergent.
A random sample of 20 bottles results in a sample variance of fill
volume of s² = 0.0153 (fluid ounces)². If the variance of fill volume
exceeds 0.01 (fluid ounces)², an unacceptable proportion of bottles will
be underfilled or overfilled. Is there evidence in the sample data to
suggest that the manufacturer has a problem with underfilled or
overfilled bottles? Use α = 0.05, and assume that fill volume has a
normal distribution.
Example 9-8
4. Test statistic: The test statistic is
\[ \chi_0^2 = \frac{(n-1)s^2}{\sigma_0^2} \]
5. Reject H0: Use α = 0.05, and reject H0 if χ0² > χ²0.05,19 = 30.14.
6. Computations:
\[ \chi_0^2 = \frac{19(0.0153)}{0.01} = 29.07 \]
7. Conclusions: Since χ0² = 29.07 < χ²0.05,19 = 30.14, we conclude that
there is no strong evidence that the variance of fill volume exceeds 0.01
(fluid ounces)². So there is no strong evidence of a problem with
incorrectly filled bottles.
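The computation in steps 4 through 7 is short enough to script. This sketch evaluates the chi-square statistic for the filling data and compares it with the critical value 30.14 quoted in step 5, which is entered by hand because the standard library has no chi-square quantile function. Variable names are illustrative.

```python
n = 20                # number of bottles sampled
s_sq = 0.0153         # sample variance, (fluid ounces)^2
sigma0_sq = 0.01      # hypothesized variance, (fluid ounces)^2

# Test statistic: (n - 1) s^2 / sigma0^2
chi2_stat = (n - 1) * s_sq / sigma0_sq

chi2_crit = 30.14     # upper 5% point with 19 degrees of freedom, quoted in step 5
reject_h0 = chi2_stat > chi2_crit

print(f"chi-square statistic = {chi2_stat:.2f}, reject H0: {reject_h0}")
```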
EXAMPLE 9‐9 Automated Filling Sample Size
Consider the bottle-filling problem from Example 9-8. If the variance of the filling
process exceeds 0.01 (fluid ounces)², too many bottles will be underfilled. Thus, the
hypothesized value of the standard deviation is σ0 = 0.10. Suppose that if the true
standard deviation of the filling process exceeds this value by 25%, we would like to
detect this with probability at least 0.8. Is the sample size of n = 20 adequate?
To solve this problem, note that we require
\[ \lambda = \frac{\sigma}{\sigma_0} = \frac{0.125}{0.10} = 1.25 \]
This is the abscissa parameter for Chart VIIk. From this chart, with n = 20 and
λ = 1.25, we find that β ≈ 0.6. Therefore, there is only about a 40% chance that the null
hypothesis will be rejected if the true standard deviation is really as large as σ = 0.125
fluid ounce.
To reduce the β-error, a larger sample size must be used. From the operating
characteristic curve with β = 0.20 and λ = 1.25, we find that n = 75, approximately.
Thus, if we want the test to perform as required above, the sample size must be at
least 75 bottles.
9-5 Tests on a Population Proportion
9-5.1 Large-Sample Tests on a Proportion
\[ H_0: p = p_0, \qquad H_1: p \ne p_0 \]
An appropriate test statistic is
\[ Z_0 = \frac{X - np_0}{\sqrt{np_0(1-p_0)}} \tag{9-10} \]
5. Reject H0 if: Reject H0: p = 0.05 if the p‐value is less than 0.05.
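\[ Z_0 = \frac{X/n - p_0}{\sqrt{p_0(1-p_0)/n}} \quad \text{or} \quad Z_0 = \frac{\hat{P} - p_0}{\sqrt{p_0(1-p_0)/n}} \]
\[ n = \left[\frac{z_{\alpha/2}\sqrt{p_0(1-p_0)} + z_{\beta}\sqrt{p(1-p)}}{p - p_0}\right]^2 \tag{9-14} \]
\[ n = \left[\frac{z_{\alpha}\sqrt{p_0(1-p_0)} + z_{\beta}\sqrt{p(1-p)}}{p - p_0}\right]^2 \tag{9-15} \]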
Thus, the probability is about 0.7 that the semiconductor manufacturer will fail to
conclude that the process is capable if the true process fraction defective is
p = 0.03 (3%). That is, the power of the test against this particular alternative is only
about 0.3. This appears to be a large β-error (or small power), but the difference
between p = 0.05 and p = 0.03 is fairly small, and the sample size n = 200 is not
particularly large.
The required sample size can be computed from Equation 9-15 as follows:
\[ n = \left[\frac{1.645\sqrt{0.05(0.95)} + 1.28\sqrt{0.03(0.97)}}{0.03 - 0.05}\right]^2 \approx 832 \]
where we have used p = 0.03 in Equation 9-15.
Conclusion: Note that n = 832 is a very large sample size. However, we are
trying to detect a fairly small deviation from the null value p0 = 0.05.
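The sample-size calculation from Equation 9-15 can be wrapped in a small function. In this sketch the function name is hypothetical, and exact normal quantiles replace the rounded values 1.645 and 1.28, so the result differs from the hand calculation by a fraction of a unit.

```python
from math import sqrt
from statistics import NormalDist

def sample_size_one_sided_proportion(p0, p, alpha, beta):
    """Equation 9-15: n for a one-sided large-sample test on a proportion."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(1 - beta)
    numerator = z_alpha * sqrt(p0 * (1 - p0)) + z_beta * sqrt(p * (1 - p))
    return (numerator / (p - p0)) ** 2

# Null value p0 = 0.05, alternative p = 0.03, alpha = 0.05, beta = 0.10.
n_required = sample_size_one_sided_proportion(0.05, 0.03, 0.05, 0.10)
print(f"required n is about {n_required:.0f}")
```

Halving the distance |p − p0| roughly quadruples the required n, which is why small departures from the null value are expensive to detect.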
9-7 Testing for Goodness of Fit
• The test is based on the chi-square distribution.
• Assume there is a sample of size n from a population whose probability
distribution is unknown.
• Let Oi be the observed frequency in the ith class interval.
• Let Ei be the expected frequency in the ith class interval.
The test statistic is
\[ X_0^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} \tag{9-16} \]
Poisson Distribution
The number of defects in printed circuit boards is hypothesized to follow a
Poisson distribution. A random sample of n = 60 printed boards has been
collected, and the following number of defects observed.
Number of Defects    Observed Frequency
0                    32
1                    15
2                    9
3                    4

Number of Defects    Probability    Expected Frequency
0                    0.472          28.32
1                    0.354          21.24
2                    0.133          7.98
3 (or more)          0.041          2.46
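\[ \chi_0^2 = \sum_{i=1}^{k} \frac{(o_i - E_i)^2}{E_i} \]
6. Computations: with the last two cells combined (observed 9 + 4 = 13,
expected 7.98 + 2.46 = 10.44),
\[ \chi_0^2 = \frac{(32 - 28.32)^2}{28.32} + \frac{(15 - 21.24)^2}{21.24} + \frac{(13 - 10.44)^2}{10.44} = 2.94 \]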
7. Conclusions: We find from Appendix Table III that χ²0.10,1 = 2.71 and
χ²0.05,1 = 3.84. Because χ0² = 2.94 lies between these values, we conclude
that the P-value is between 0.05 and 0.10. Therefore, since the P-value
exceeds 0.05 we are unable to reject the null hypothesis that the distribution
of defects in printed circuit boards is Poisson. The exact P-value computed
from Minitab is 0.0864.
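The goodness-of-fit calculation can be retraced from the observed frequencies alone. This sketch estimates the Poisson mean from the data, builds the expected frequencies, combines the last two cells, and evaluates the chi-square statistic. It carries unrounded probabilities throughout, so the statistic comes out slightly above the hand-computed value, which uses probabilities rounded to three decimals. Names are illustrative.

```python
from math import exp, factorial

# Observed frequencies by number of defects (0, 1, 2, 3).
observed = {0: 32, 1: 15, 2: 9, 3: 4}
n = sum(observed.values())

# Estimate the Poisson mean by the sample average number of defects.
mean_defects = sum(k * count for k, count in observed.items()) / n

def poisson_pmf(k, lam):
    return exp(-lam) * lam ** k / factorial(k)

# Cell probabilities for 0, 1, 2 defects, and "3 or more" as the remainder.
probs = [poisson_pmf(k, mean_defects) for k in range(3)]
probs.append(1 - sum(probs))
expected = [n * p for p in probs]

# Combine the last two cells, as in the hand calculation.
obs_cells = [observed[0], observed[1], observed[2] + observed[3]]
exp_cells = [expected[0], expected[1], expected[2] + expected[3]]

chi2_stat = sum((o - e) ** 2 / e for o, e in zip(obs_cells, exp_cells))
dof = len(obs_cells) - 1 - 1   # k - p - 1 with one estimated parameter

print(f"mean = {mean_defects}, chi-square = {chi2_stat:.2f}, df = {dof}")
```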
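\[ \hat{u}_i = \frac{1}{n}\sum_{j=1}^{c} O_{ij} \tag{9-17} \]
\[ \hat{v}_j = \frac{1}{n}\sum_{i=1}^{r} O_{ij} \]
\[ \chi_0^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \tag{9-19} \]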
The opinions of a random sample of 500 employees are shown in Table 9-3.
Table 9-3 Observed Data for Example 9-14
                        Health Insurance Plan
Job Classification      1      2      3      Totals
Salaried workers        160    140    40     340
Hourly workers          40     60     60     160
Totals                  200    200    100    500
To find the expected frequencies, we must first compute û1 = (340/500) = 0.68,
û2 = (160/500) = 0.32, v̂1 = (200/500) = 0.40, v̂2 = (200/500) = 0.40, and
v̂3 = (100/500) = 0.20.
For example, the expected number of salaried workers favoring health insurance
plan 1 is
\[ E_{11} = n\,\hat{u}_1\hat{v}_1 = 500(0.68)(0.40) = 136 \]
\[ \chi_0^2 = \frac{(160-136)^2}{136} + \frac{(140-136)^2}{136} + \frac{(40-68)^2}{68} + \frac{(40-64)^2}{64} + \frac{(60-64)^2}{64} + \frac{(60-32)^2}{32} = 49.63 \]
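The expected frequencies and the chi-square statistic for the contingency table follow mechanically from the row and column totals. This is a minimal sketch using plain lists; the variable names are illustrative.

```python
# Observed counts from Table 9-3: rows are job classifications, columns are plans 1-3.
observed = [
    [160, 140, 40],   # salaried workers
    [40, 60, 60],     # hourly workers
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Under independence, E_ij = n * u_i * v_j = (row total * column total) / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi2_stat = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
dof = (len(row_totals) - 1) * (len(col_totals) - 1)

print(f"E11 = {expected[0][0]:.0f}, chi-square = {chi2_stat:.2f}, df = {dof}")
```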
• The sign test is used to test hypotheses about the median μ̃ of a continuous distribution.
• Test procedure: Let X1, X2, ..., Xn be a random sample from the population of interest.
Form the differences Xi − μ̃0, i = 1, 2, …, n.
• An appropriate test statistic is the number of these differences that are positive, say R+.
• The P-value for the observed number of plus signs r+ can be calculated directly from the
binomial distribution.
• If the computed P-value is less than or equal to the significance level α, we will reject H0.
3. Alternative hypothesis: H1: μ̃ ≠ 2000 psi
4. Test statistic: The test statistic is the observed number of plus differences in
Table 9-5, i.e., r+ = 14.
5. Reject H0 if: The P-value corresponding to r+ = 14 is less than or equal to
α = 0.05.
6. Computations: r+ = 14 is greater than n/2 = 20/2 = 10.
\[ P\text{-value} = 2P\left(R^+ \ge 14 \text{ when } p = \tfrac{1}{2}\right) = 2\sum_{r=14}^{20}\binom{20}{r}(0.5)^r(0.5)^{20-r} = 0.1153 \]
7. Conclusions: Since the P-value is greater than α = 0.05 we cannot reject
the null hypothesis that the median shear strength is 2000 psi.
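The sign-test P-value is a binomial tail sum, which can be evaluated exactly with integer arithmetic. This sketch doubles the upper-tail probability for r+ = 14 out of n = 20 under p = 1/2; variable names are illustrative.

```python
from math import comb

n = 20        # number of differences
r_plus = 14   # observed number of plus signs

# P(R+ >= 14) when p = 1/2: each outcome has probability (1/2)^20.
upper_tail = sum(comb(n, r) for r in range(r_plus, n + 1)) / 2 ** n

# Two-sided P-value: double the tail, since r_plus > n/2.
p_value = 2 * upper_tail

alpha = 0.05
reject_h0 = p_value <= alpha

print(f"P-value = {p_value:.4f}, reject H0: {reject_h0}")
```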
9-9 Nonparametric Procedures
9-9.2 The Wilcoxon Signed-Rank Test
• Rank the absolute differences |Xi − μ0| in ascending order, and give the ranks
the signs of their corresponding differences.
• Let W+ be the sum of the positive ranks and W– be the absolute value of the sum of the
negative ranks, and let W = min(W+, W−).
Critical values of W can be found in Appendix Table IX.
• If the computed value is less than or equal to the critical value, we will reject H0.
Let’s illustrate the Wilcoxon signed rank test by applying it to the propellant shear strength
data from Table 9-5. Assume that the underlying distribution is a continuous symmetric
distribution. Test the hypothesis that the median shear strength is 2000 psi, using
α = 0.05.
3. Alternative hypothesis: H1: μ ≠ 2000 psi
Observation i   Difference xi − 2000   Signed Rank
16 53.50 1
4 61.30 2
1 158.70 3
11 165.20 4
18 200.50 5
5 207.50 6
7 –215.30 –7
13 –220.20 –8
15 –234.70 –9
20 –246.30 –10
10 256.70 11
6 –291.70 –12
3 316.00 13
2 –321.85 –14
14 336.75 15
9 357.90 16
12 399.55 17
17 414.40 18
8 575.10 19
19 654.20 20
7. Conclusions: Since W = 60 is not ≤ 52 we fail to reject the null hypothesis that the mean
or median shear strength is 2000 psi.
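The rank sums behind W = 60 can be recomputed from the differences in the table. This sketch ranks the absolute differences, sums the ranks by sign, and compares min(W+, W−) with the critical value 52 quoted in the conclusion. It assumes there are no tied absolute differences, which holds for these data; a general implementation would need average ranks for ties.

```python
# Differences x_i - 2000 from the table (any order; they are re-sorted below).
differences = [
    53.50, 61.30, 158.70, 165.20, 200.50, 207.50, -215.30, -220.20,
    -234.70, -246.30, 256.70, -291.70, 316.00, -321.85, 336.75,
    357.90, 399.55, 414.40, 575.10, 654.20,
]

# Rank by absolute value, smallest first (no ties in this data set).
ordered = sorted(differences, key=abs)

w_plus = sum(rank for rank, d in enumerate(ordered, start=1) if d > 0)
w_minus = sum(rank for rank, d in enumerate(ordered, start=1) if d < 0)
w = min(w_plus, w_minus)

critical_value = 52   # critical value quoted in the conclusion
reject_h0 = w <= critical_value

print(f"W+ = {w_plus}, W- = {w_minus}, W = {w}, reject H0: {reject_h0}")
```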