Statistical Inference II Course Outline
STA 221 – STATISTICAL INFERENCE II
COURSE OUTLINE
➢ Sampling and Sampling Distributions
➢ Point Estimation and Interval Estimation
➢ Principles of Hypotheses Testing
➢ Tests of Hypotheses concerning population means, proportions and
variances for Large and Small Samples
➢ Goodness-of-fit Tests
➢ Analysis of Variance
Texts
(1) Afonja B., Olubusoye O. E., Ossai E. and Arinola J. (2014):
“Introductory Statistics – A Learner’s Motivated Approach.” Evans
Brothers Ltd. Ibadan.
(2) Hamburg Morris (1970). “Statistics Analysis for Decision
Making.” New York: Harcourt, Brace & World, Inc.
(3) Hogg R. V and Craig A. T (1970). “Introduction to Mathematical
Statistics.” 3rd Edition. New York: Macmillan Publishing Co., Inc.
London: Collier Macmillan Publishers.
(4) Hogg R. V and Craig A. T (1995). “Introduction to Mathematical
Statistics.” 5th Edition. London: Prentice-Hall, Inc.
(5) Hogg R. V and Tanis E. A (1993). “Probability and Statistical
Inference.” New York: Macmillan Publishing Company.
(6) Larson Harold J (1982). “Introduction to Probability Theory and
Statistical Inference.” Third Edition. New York: John Wiley &
Sons.
(7) Lindgren Bernard W (1976). “Statistical Theory” Third Edition.
New York: Macmillan Publishing Co., Inc.
(8) Montgomery C. Douglas and Runger C. George (2003).
“Applied Statistics and Probability for Engineers.” 3rd Ed. John
Wiley & Sons, Inc. New York.
(11) Ross S (1994). “A First Course in Probability.” 4th Ed., New Jersey:
Prentice-Hall, Inc. A Simon & Schuster Company, Englewood
Cliffs.
(12) Ross M. Sheldon (2004). “Introduction to Probability and statistics
for engineers and scientists.” Third Edition. Elsevier Academic
Press, San Diego, USA.
(13) Roussas George G. (1972). “A First Course in Mathematical
Statistics.” Reading Massachusetts: Addison-Wesley Publishing
Company.
(14) Schay Géza (2007). “Introduction to Probability with Statistical
Applications.” Birkhäuser, Boston, USA.
(15) Shangodoyin D. K, Olubusoye O. E, Shittu O. I and Adepoju A.
A (2002). “Statistical Theory and Methods” Nigeria: Joytal Printing
Press, Ibadan. ISBN: 978-2906-23-9.
(16) Spanos Aris (2003). “Probability Theory and Statistical Inference:
Econometric Modeling with Observational Data.” Cambridge
University Press, New York.
(17) Spiegel Murray R (1972). “Theory and problems of Statistics.” SI
(metric) Edition. New York: McGraw-Hill Book Company (UK)
Limited.
LECTURE ONE
Sampling
Introduction
Have you ever tasted a hot soup and decided whether the soup was tasty or not?
If yes, then you are a sampler. Sampling is a part of our day-to-day life, which
we use either advertently or inadvertently. Another example is a pathologist
who takes a few drops of blood and tests for any abnormality in the blood of the
whole body. The process of using information obtained from the smaller
quantity to make statements about the larger quantity is called sampling. In this
lecture, we shall examine why this process is sometimes necessary and the
various techniques for doing it. We shall first learn some fundamental concepts,
which are related to sampling.
Objectives
At the end of this lecture, you should be able to:
1. distinguish between census, population and sample;
2. discuss the reasons for sampling; and
3. discuss the various procedures of sampling.
Pre-Test
1. Have you heard of a census before? What do you understand by it?
2. Mention different kinds of statistical investigations you are familiar
with.
CONTENT
A. Some Basic Concepts
A1. Census
A census involves a complete count (or a complete enumeration) of every
individual member of the population of interest, such as persons in a country,
households in a town, shops in a city, students in a college, and so on. Apart
from the cost and the large amount of resources (such as enumerators, clerical
assistance, etc.) that are required, the main problem is the time required to
process the data. Thus, the results are not known immediately.
A2. Population
In the statistical sense, a population is a group of items, units or subjects
under study. It is often referred to as the universe by statisticians and
scientists. The inhabitants of a region, cars in a city, workers in a factory,
students in a university, insects in a field, etc., are a few examples of
populations. Generally, populations are classified into four categories:
Finite population: the number of items or units is fixed, limited and countable,
e.g. workers in a factory.
Infinite population: the number of items or units is uncountable, e.g. stars in the
sky.
Real population: the items or units in the population are all physically present
or visible.
Hypothetical population: the population results from repeated trials, e.g.
tossing a coin repeatedly results in a hypothetical population of heads and
tails, and rolling a die again and again gives rise to a hypothetical population of
numbers from 1 to 6.
A3. Sample
A sample is a part or fraction of a population selected on some basis. In
principle, a sample should be such that it is a true representative of the
population. The process of selecting a sample from the population is called
sampling, and the manner or scheme through which the required number of
units is selected is called the sampling method. The foremost purpose of
sampling is to gather maximum information about the population under
consideration at minimum cost, time and resources. Specifically, sampling is
unavoidable in the following situations:
➢ when the population is infinite;
➢ when the item or unit is destroyed under investigation;
➢ when the results are required in a short time;
➢ when resources are limited, particularly in respect of money and trained
persons;
➢ when the population is either constantly changing or in a state of movement;
➢ when the items or units are scattered.
B. Sampling Methods
Several sampling methods are available. They are classified into two
categories: random sampling and non-random sampling.
Summary
In this lecture, we have defined some fundamental concepts such as census,
population and sample. To understand the characteristics of any population with
absolute accuracy, we need to have all the possible relevant information about
every member of that population. This is usually not possible because such
information is either not available or the task of data collection is not desirable
in terms of time and/or money. So we settle for a part or a fraction of the
population, which is called a sample. Sampling has been defined as the process of
selecting units from the population, and there are several techniques or methods
of doing this. The methods are classified into random sampling and non-random
sampling.
Post-Test
1. Briefly explain:
a. The fundamental reason for sampling
b. Some of the reasons why a sample is chosen instead of testing the
entire population.
2. Distinguish between sampling and non-sampling errors. What are their
sources? How can these errors be controlled?
3. To study the average effect of fish on human cholesterol level (in
blood), a researcher randomly selects 500 males of 25 years of age who
have never taken fish more than once a week and measures their
cholesterol level. The researcher then serves each individual 8 ounces
of fish every day for one year. After one year the researcher measures the
cholesterol level of each individual again, and calculates the difference
from the value of the year before (difference = pre-diet level minus post-diet
level). Determine the
a. population
b. sample
c. variable under study and
d. the parameter of interest
4. List and explain the various sampling methods.
5. Define the following:
a. sampling unit
b. sampling frame
c. sampling interval
d. sampling method
e. sampling error
LECTURE TWO
Introduction
In the previous lecture, we discussed population and sampling. Imagine that
you have a large population to study and that its characteristics cannot be
described by a census. To make statistical inferences, samples of a given size
are drawn repeatedly from the population and a ‘statistic’ is computed for each
sample. The computed value of the statistic will generally differ from sample
to sample. Thus, it is theoretically possible to construct a frequency table
showing the values assumed by the statistic and their frequencies of occurrence.
This distribution of values of a statistic is called a sampling distribution,
because the values are the outcome of a sampling process. Since the value of a
statistic depends on which random sample happens to be drawn, the statistic is
itself a random variable.
Objectives
At the end of this lecture, you should be able to:
1. explain the concept of a sampling distribution; and
2. explain the concept of standard error and differentiate it from standard
deviation;
Pre- Test
1. Distinguish between a parameter and a statistic, and give an example of each.
2. Give the formula for sample mean of n observations.
3. Sample standard deviation of the variate values x1 , x2 ,..., xn can be
computed from the formula.
CONTENT
Population Distribution
The population distribution is the distribution of values of its members and has
mean denoted by $\mu$, variance $\sigma^2$ and standard deviation $\sigma$. For example, a
population consisting of the numbers 0, 2, 4 and 6 has mean $\mu = 3$ and standard
deviation $\sigma = \sqrt{5}$.
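As a quick check of the figures above, the following minimal sketch computes the population mean, variance and standard deviation for the population 0, 2, 4, 6 with Python's standard `statistics` module. The variable names are illustrative only and are not part of the lecture's notation.

```python
import math
import statistics

population = [0, 2, 4, 6]

# Population parameters: the divisor is N (not N - 1), because the list
# contains every member of the population rather than a sample from it.
mu = statistics.fmean(population)
sigma_squared = statistics.pvariance(population)
sigma = statistics.pstdev(population)

print("mean:", mu)
print("variance:", sigma_squared)
print("standard deviation:", round(sigma, 4))
```

Note that `pvariance` and `pstdev` are the population versions; `variance` and `stdev` in the same module divide by n - 1 and are meant for samples.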
Example
A population consists of the numbers 0, 2, 4 and 6. List all possible samples of
size 2 that can be drawn:
1. with replacement
2. without replacement
Solution
1. The population size N = 4 and sample size n = 2; therefore, $4^2 = 16$
possible samples can be drawn with replacement. The list of the possible
samples is given as follows:
sample number sample elements
1 0, 0
2 0, 2
3 0, 4
4 0, 6
5 2, 0
6 2, 2
7 2, 4
8 2, 6
9 4, 0
10 4, 2
11 4, 4
12 4, 6
13 6, 0
14 6, 2
15 6, 4
16 6, 6
2. The population size N = 4 and sample size n = 2; therefore,
${}^{4}C_{2} = \frac{4!}{2!(4-2)!} = 6$ possible samples can be drawn without replacement. The
list of the possible samples is given as follows:
sample number sample elements
1 0, 2
2 0, 4
3 0, 6
4 2, 4
5 2, 6
6 4, 6
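The two lists in this example can also be generated mechanically. The sketch below uses `itertools.product` for sampling with replacement (ordered pairs, repeats allowed) and `itertools.combinations` for sampling without replacement (unordered pairs of distinct units); the variable names are illustrative.

```python
from itertools import combinations, product

population = [0, 2, 4, 6]
n = 2  # sample size

# With replacement: ordered pairs in which the same unit may appear twice.
# There are N**n such samples.
with_replacement = list(product(population, repeat=n))

# Without replacement: unordered pairs of distinct units.
# There are "N choose n" such samples.
without_replacement = list(combinations(population, n))

print(len(with_replacement), "samples with replacement:")
print(with_replacement)
print(len(without_replacement), "samples without replacement:")
print(without_replacement)
```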
Sampling Distribution of a sample statistic
If a particular statistic (e.g. sample mean, sample standard deviation, etc.) is
computed for each of the possible samples, the value of the statistic will differ
from sample to sample. Thus, it would be theoretically possible to construct a
frequency table showing the values assumed by the statistic and their frequency
of occurrence. This distribution of values of a statistic is called a sampling
distribution. Thus, we see that there would be an overall mean (where it is
centered), a standard deviation (representing the spread) and a shape if the
histogram is plotted. So, we can talk of the mean of the sampling distribution of a
statistic (denoted $\mu_m$ if $m$ is the statistic), and the standard deviation of the sampling
distribution of a statistic (denoted $\sigma_m$ if $m$ is the statistic). These properties help
lay down rules for making statistical inferences about a population on the basis
of a single sample drawn from it, that is, without even repeating the sampling
process.
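To make the idea of a frequency table of a statistic concrete, here is a small sketch that builds the sampling distribution of the sample mean for the population 0, 2, 4, 6 with samples of size 2. It assumes sampling with replacement, so that all 16 ordered samples are equally likely; the names used are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [0, 2, 4, 6]
n = 2  # sample size

# All equally likely samples of size n drawn with replacement.
samples = list(product(population, repeat=n))

# The statistic of interest: the sample mean, kept as an exact fraction.
sample_means = [Fraction(sum(s), n) for s in samples]

# Frequency table: value of the statistic -> number of samples giving it.
frequency = Counter(sample_means)

for value in sorted(frequency):
    probability = Fraction(frequency[value], len(samples))
    print(f"mean = {float(value):.1f}  frequency = {frequency[value]}  "
          f"probability = {probability}")
```

Replacing the sample mean with another statistic (for example the sample range) in the list comprehension gives the sampling distribution of that statistic instead.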
The standard deviation of a sampling distribution, called the standard error of
the statistic, measures the variability among values of the statistic due to
sampling error. The standard error is a measure of a reasonable
difference between a particular sample statistic and the population parameter. It
is used in tests of whether a particular sample could have been drawn from a
given parent population. It is also used in working out confidence limits and
confidence intervals.
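The difference between a standard deviation and a standard error can be seen numerically. The sketch below, which again assumes samples of size 2 drawn with replacement from the population 0, 2, 4, 6, computes the standard deviation of the population values and the standard deviation of the sample means (the standard error of the mean), and compares the latter with the population standard deviation divided by the square root of the sample size.

```python
import math
import statistics
from itertools import product

population = [0, 2, 4, 6]
n = 2  # sample size

# Standard deviation: spread of the individual population values.
sigma = statistics.pstdev(population)

# Standard error: spread of the statistic (here the sample mean) over all
# equally likely samples of size n drawn with replacement.
sample_means = [sum(s) / n for s in product(population, repeat=n)]
standard_error = statistics.pstdev(sample_means)

print("population standard deviation:", round(sigma, 4))
print("standard error of the mean:   ", round(standard_error, 4))
print("sigma / sqrt(n):              ", round(sigma / math.sqrt(n), 4))
```

The last two printed values agree because the sampling here is with replacement; without replacement from a small population the spread of the sample means is smaller.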
Summary
In this lecture, we have learnt that:
1. there are $N^n$ possible samples of size n that can be drawn with
replacement from a population having N elements;
2. there are ${}^{N}C_{n} = \frac{N!}{n!(N-n)!}$ possible samples of size n that can
be drawn without replacement from a population having N
elements;
3. a sampling distribution is the probability distribution of all
possible values of a given statistic from all the distinct
possible samples of equal size drawn from a population; and
4. the standard error of a statistic measures the amount of chance error
in the sampling process.
Post- Test
1. A population consists of the following numbers 12, 7, 9, 11, and 13.
a. Calculate the population mean μ.
b. Calculate the population standard deviation σ.
c. List all possible samples of size 2 that can be taken with replacement
from the population.
d. List all possible samples of size 2 that can be taken without
replacement from the population.
2. Explain the concept of standard error. Discuss the relevance of standard
error in statistical inference.
3. What is the distinction between a standard deviation and a standard
error?
LECTURE THREE
Introduction
The sample mean is a point estimate of the population mean.
For example, if you are interested in the mean rent charged for a 2-bedroom
apartment in the Bodija area of Ibadan, you may draw a random sample and
compute the sample mean from it. This sample mean is a single number
that estimates the population mean rent for a 2-bedroom apartment in the area.
The sampling distribution of the mean is the distribution of all the
possible sample means that could be obtained if you selected all possible samples
of a given size. In general, the sampling distribution of the sample mean
depends on the distribution of the population from which the sample is drawn.
If a population is normally distributed, then the sampling distribution of the
sample mean is also normally distributed regardless of the sample size. Even if
the population is not normally distributed, the sampling distribution of the
sample mean tends towards a normal distribution as the sample size becomes
sufficiently large.
Objectives
At the end of this lecture, you should be able to:
1. list the properties of the sampling distribution of the sample mean;
2. determine the sampling distribution of mean when population has
normal distribution;
3. determine the sampling distribution of mean when population has non-
normal distribution; and
4. determine the sampling distribution of the difference between two
sample means.
Pre-Test
1. What are the parameters of normal distribution? What information is
provided by these parameters?
2. What are the chief properties of normal distribution? Describe briefly
the importance of normal distribution in statistical analysis.
CONTENT
A. Properties of the Sampling Distribution of the Sample Mean
There are three very important properties associated with the sampling
distribution of the sample mean. These properties are the centre, spread and
shape of the sampling distribution.
1. Centre: The sample mean is an unbiased estimator
The arithmetic mean $\mu_{\bar{X}}$ of the sampling distribution of mean values (also called
the mean of means) is equal to the population mean $\mu$ regardless of the form of
the population distribution, that is, $\mu_{\bar{X}} = \mu$.
Example 1
A population consists of the numbers 0, 2, 4 and 6. The population mean μ = 3.
Now, consider all possible samples of size 2 without replacement from the
population and their means as shown in the following table.
sample number sample elements sample mean
1 0, 2 1
2 0, 4 2
3 0, 6 3
4 2, 4 3
5 2, 6 4
6 4, 6 5
The arithmetic mean of the sampling distribution of mean values
is $\mu_{\bar{X}} = \frac{1+2+3+3+4+5}{6} = 3$.
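The table in Example 1 can be reproduced with a few lines of code. This sketch lists every sample of size 2 drawn without replacement from 0, 2, 4, 6, computes each sample mean, and checks that the mean of the sample means equals the population mean; the variable names are illustrative.

```python
from itertools import combinations
from statistics import fmean

population = [0, 2, 4, 6]
n = 2  # sample size

# One mean per possible sample drawn without replacement.
sample_means = [fmean(s) for s in combinations(population, n)]

mean_of_means = fmean(sample_means)

print("sample means:   ", sample_means)
print("mean of means:  ", mean_of_means)
print("population mean:", fmean(population))
```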
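The standardized variable for the difference between two sample means is
$$z = \frac{(\bar{X}_1 - \bar{X}_2) - \mu_{\bar{X}_1 - \bar{X}_2}}{\sigma_{\bar{X}_1 - \bar{X}_2}} = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sigma_{\bar{X}_1 - \bar{X}_2}}$$
where
$\mu_{\bar{X}_1 - \bar{X}_2} = \mu_{\bar{X}_1} - \mu_{\bar{X}_2} = \mu_1 - \mu_2$ is the mean of the sampling distribution of the difference of
two means;
$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\sigma_{\bar{X}_1}^2 + \sigma_{\bar{X}_2}^2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$ is the standard error of the sampling distribution of the
difference of two means;
$n_1$ and $n_2$ are the sizes of the independent random samples drawn from the first and second
populations, respectively.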
Example
The strength of the wire produced by company A has a mean of 4,500 kg and a
standard deviation of 200 kg. The wire produced by company B has a mean
strength of 4,000 kg and a standard deviation of 300 kg. If 50 wires from
company A and 100 wires from company B are selected at random and tested for
strength, what is the probability that the sample mean strength of A will be at
least 600 kg more than that of B?
Solution
We are given the following information:
Company A: μA = 4,500, σA = 200 and nA = 50
Company B: μB = 4,000, σB = 300 and nB = 100.
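Thus,
$$\mu_{\bar{X}_A - \bar{X}_B} = \mu_{\bar{X}_A} - \mu_{\bar{X}_B} = \mu_A - \mu_B = 4{,}500 - 4{,}000 = 500$$
and
$$\sigma_{\bar{X}_A - \bar{X}_B} = \sqrt{\frac{\sigma_A^2}{n_A} + \frac{\sigma_B^2}{n_B}} = \sqrt{\frac{40{,}000}{50} + \frac{90{,}000}{100}} = 41.23$$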
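The remaining step of this example is to standardize the value 600 and read off an upper-tail probability. The sketch below carries the calculation through, assuming (as the large sample sizes suggest) that the difference of the two sample means is approximately normal. The helper `standard_normal_upper_tail` is a hypothetical name introduced here; it uses the complementary error function from Python's `math` module instead of a printed normal table.

```python
import math

# Given information for the two companies.
mu_a, sigma_a, n_a = 4500, 200, 50
mu_b, sigma_b, n_b = 4000, 300, 100

# Mean and standard error of the difference of the two sample means.
mean_diff = mu_a - mu_b
se_diff = math.sqrt(sigma_a**2 / n_a + sigma_b**2 / n_b)

# Standardize the required difference of 600 kg.
z = (600 - mean_diff) / se_diff


def standard_normal_upper_tail(z_value):
    """Return P(Z >= z_value) for a standard normal variable Z."""
    return 0.5 * math.erfc(z_value / math.sqrt(2))


probability = standard_normal_upper_tail(z)

print("standard error:", round(se_diff, 2))
print("z:", round(z, 2))
print("P(mean of A exceeds mean of B by at least 600):", round(probability, 4))
```

The probability obtained this way may differ in the last decimal place from a table lookup, because a table requires z to be rounded to two decimals first.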
Summary
In this lecture, we have learnt the following:
1. Properties of the Sampling Distribution of the Sample Mean are:
a. The sample mean is an unbiased estimator of the population
mean.
b. The standard error of the mean equals the population standard
deviation divided by the square root of the sample size.
c. The sampling distribution of sample mean values from a normally
distributed population is normal for samples of all
sizes.
2. If the sample size is at least 30, the sampling distribution of the mean
$\bar{X}$ is assumed to be approximately normal, regardless of the form of
the population distribution.
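The second point of the summary can be illustrated by simulation. The following sketch is only an illustration under stated assumptions: the population is taken to be exponential with rate 1 (a strongly skewed distribution with mean 1 and standard deviation 1), the sample size is 30, and the seed and number of replications are arbitrary choices made so that the run is repeatable.

```python
import math
import random
import statistics

random.seed(221)     # arbitrary fixed seed so the run is repeatable

n = 30               # sample size
replications = 5000  # number of simulated samples

# Assumed population: exponential with rate 1 (mean 1, standard deviation 1).
sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(replications)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.pstdev(sample_means)
theoretical_se = 1 / math.sqrt(n)

# For a normal distribution, about 68% of values lie within one standard
# deviation of the mean; check how close the simulated sample means come.
within_one_se = sum(
    abs(m - 1) <= theoretical_se for m in sample_means
) / replications

print("mean of sample means:", round(mean_of_means, 3))
print("sd of sample means:  ", round(sd_of_means, 3))
print("sigma / sqrt(n):     ", round(theoretical_se, 3))
print("share within one standard error:", round(within_one_se, 3))
```

Even though the population is far from normal, the simulated sample means centre on the population mean, their spread is close to the population standard deviation divided by the square root of n, and roughly two thirds of them fall within one standard error of the mean.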
Post-Test
1. What are the properties of the sampling distribution of the sample mean?
2. Random samples of size 2 are taken from the finite population which
consists of the numbers 0, 2, 4, 6, 8, and 10.
a. Show that the mean and the standard deviation of this population are
$\mu = 5$ and $\sigma = \sqrt{35/3}$.
b. List the 15 possible samples of size 2 that can be taken from this
finite population and calculate their respective means.
c. Calculate the mean and the standard deviation of the sampling
distribution of means obtained in b.
3. The finite population in 2 above can be converted into an infinite
population if we sample with replacement.
a. List the 36 possible samples of size 2 that can be drawn with
replacement from the population.
b. Calculate the mean of each of the 36 samples obtained in part a, and
construct the sampling distribution of the mean.
c. Calculate the mean and standard deviation of the sampling
distribution of means obtained in b.
4. Assume that the heights of 300 soldiers in an army battalion are
normally distributed with mean 68 inches and standard deviation 3
inches. If 80 samples consisting of 25 soldiers each are taken, what
would be the expected mean and standard deviation of the resulting
sampling distribution of means if the sampling is done (a) with
replacement and (b) without replacement?
LECTURE FOUR
Introduction
There are many situations in which each individual member of the population
can be classified into two mutually exclusive categories, such as success or
failure, accept or reject, head or tail of a coin, and so on. For instance, the
population could be registered voters living in a city, and the attribute is “plans
to vote for party A in presidential elections”. We take a random sample from
the population and observe the number in the sample planning to vote for party A
in the presidential elections. There are two possible outcomes, “success” and
“failure.” Success on voter i means that voter i plans to vote for party A, and
failure means that voter i does not. The sample proportion can then be defined as the number of
successes divided by the sample size. With the same logic of sampling
distribution of mean, the sampling distribution of sample proportion can be
derived.
Objectives
At the end of this lecture, you should be able to:
1. define sample proportion;
2. list the properties of the sampling distribution of sample proportion; and
3. determine the sampling distribution of the difference of two proportions.
Pre-Test
1. Toss a coin ten times. If the outcome of a head is a “success”, count the
number of successes observed. What is the proportion of heads?
2. Randomly select a sample of 10 students from your class register. Count
a student as a “success” if the student is of your own sex, and count the
number of successes. What is the proportion of successes? Select another
sample of ten students at random and determine the proportion of successes
in that sample. Repeat the exercise many times, say 30, and construct the
frequency distribution of the proportions.
CONTENT
A. Sample Proportion
The sample proportion $\bar{p}$ is defined as:
$$\bar{p} = \frac{\text{Number of successes, } x}{\text{Sample size, } n}.$$
The sample proportion $\bar{p}$ having the characteristic of interest is the best statistic
to use for statistical inferences about the population proportion parameter $p$.
For example, a company writing industrial accident insurance might estimate as
0.71 the proportion of its policyholders who file at least one claim per year, if a
sample check of 200 policies shows that 142 had at least one claim filed during
2006.
$$\sigma_{\bar{p}} = \sqrt{\frac{pq}{n}} = \sqrt{\frac{p(1-p)}{n}}$$
if sampling is with replacement and
$$\sigma_{\bar{p}} = \sqrt{\frac{pq}{n}} \sqrt{\frac{N-n}{N-1}}$$
if sampling is without replacement.
3. If a large sample ($n \ge 30$) satisfies the following two conditions:
i. $np \ge 5$;
ii. $n(1 - p) \ge 5$;
then the sampling distribution of the sample proportion is approximately
normal. Thus, to standardize the sample proportion $\bar{p}$, we use the
standard normal variable
$$z = \frac{\bar{p} - \mu_{\bar{p}}}{\sigma_{\bar{p}}} = \frac{\bar{p} - p}{\sqrt{p(1-p)/n}}$$
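The standard error of a sample proportion, the conditions for the normal approximation and the standardization step can be collected into small helper functions. This is a minimal sketch: the function names and the illustrative values p = 0.5, n = 100 and N = 1000 are assumptions made here and do not come from the lecture.

```python
import math


def proportion_standard_error(p, n, N=None):
    """Standard error of the sample proportion.

    With N=None, sampling is treated as being with replacement. Otherwise
    the finite population correction sqrt((N - n) / (N - 1)) is applied.
    """
    se = math.sqrt(p * (1 - p) / n)
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))
    return se


def normal_approximation_ok(p, n):
    """Check the conditions n >= 30, np >= 5 and n(1 - p) >= 5."""
    return n >= 30 and n * p >= 5 and n * (1 - p) >= 5


def z_score(p_bar, p, n):
    """Standardize an observed sample proportion p_bar."""
    return (p_bar - p) / proportion_standard_error(p, n)


# Illustrative values only.
print(proportion_standard_error(0.5, 100))          # with replacement
print(proportion_standard_error(0.5, 100, N=1000))  # without replacement
print(normal_approximation_ok(0.5, 100))
print(normal_approximation_ok(0.01, 100))           # np is only 1
print(z_score(0.6, 0.5, 100))
```

The correction factor is always below 1 when n is greater than 1, so the standard error without replacement is smaller than the one with replacement.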
Example
A manager in the billing section of a mobile phone company checks on the
proportion of customers who are paying their bills late. Company policy
dictates that this proportion should not exceed 20 per cent. Suppose that the
proportion of all invoices that were paid late is 20 per cent. In a random sample
of 140 invoices, determine the probability that more than 28 per cent of the
invoices are paid late.
Solution
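Given $\mu_{\bar{p}} = p = 0.20$, $n = 140$;
$$\sigma_{\bar{p}} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.20 \times 0.80}{140}} = 0.0338$$
$$P(\bar{p} > 0.28) = P\left(z > \frac{\bar{p} - \mu_{\bar{p}}}{\sigma_{\bar{p}}}\right) = P\left(z > \frac{0.28 - 0.20}{0.0338}\right) = P(z > 2.37) = 0.0089$$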
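The arithmetic of this example can be checked with a short script. The sketch below assumes the normal approximation to the sampling distribution of the sample proportion and evaluates the upper-tail probability with `math.erfc` rather than a printed table; the variable names are illustrative. Because it keeps full precision in the standard error, its result can differ slightly from a hand calculation that rounds intermediate values.

```python
import math

p = 0.20          # population proportion of invoices paid late
n = 140           # sample size
threshold = 0.28  # sample proportion of interest

# Standard error of the sample proportion (sampling with replacement).
se = math.sqrt(p * (1 - p) / n)

# Standardize the threshold and take the upper-tail normal probability.
z = (threshold - p) / se
probability = 0.5 * math.erfc(z / math.sqrt(2))

print("standard error:", round(se, 4))
print("z:", round(z, 2))
print("P(sample proportion > 0.28):", round(probability, 4))
```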
If the sample sizes $n_1$ and $n_2$ are large, that is, $n_1 \ge 30$ and $n_2 \ge 30$, then the
sampling distribution of the difference of proportions is closely approximated by a
normal distribution.
Example
Ten per cent of machines produced by company A are defective, and five per
cent of those produced by company B are defective. A random sample of 250
machines is taken from company A and a random sample of 300 machines from
company B. What is the probability that the difference in sample proportions is
less than or equal to 0.02?
Solution
We are given the following information:
$\mu_{\bar{p}_A - \bar{p}_B} = \mu_{\bar{p}_A} - \mu_{\bar{p}_B} = p_A - p_B = 0.10 - 0.05 = 0.05$; $n_A = 250$ and $n_B = 300$.
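The rest of the calculation can be sketched as follows. Two assumptions are made here: the question is read as asking for the probability that the sample proportion of A minus the sample proportion of B is at most 0.02, and the difference is treated as approximately normal because both samples are large. The standard error is the square root of the sum of the two variances of the sample proportions; the variable names are illustrative.

```python
import math

p_a, n_a = 0.10, 250  # company A: defect rate and sample size
p_b, n_b = 0.05, 300  # company B: defect rate and sample size

# Mean and standard error of the difference of the two sample proportions.
mean_diff = p_a - p_b
se_diff = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)

# Standardize 0.02 and take the lower-tail normal probability P(Z <= z).
z = (0.02 - mean_diff) / se_diff
probability = 0.5 * math.erfc(-z / math.sqrt(2))

print("standard error:", round(se_diff, 4))
print("z:", round(z, 2))
print("P(difference <= 0.02):", round(probability, 4))
```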
Summary
In the course of this lecture, we discussed the following:
1. The sample proportion is defined as the number of units in the
sample having the characteristic of interest divided by the
sample size.
2. The sampling distribution of sample proportion has the
following properties:
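1. $\mu_{\bar{p}} = p$.
2. $\sigma_{\bar{p}} = \sqrt{\frac{pq}{n}} = \sqrt{\frac{p(1-p)}{n}}$.
3. $z = \frac{\bar{p} - \mu_{\bar{p}}}{\sigma_{\bar{p}}} = \frac{\bar{p} - p}{\sqrt{p(1-p)/n}}$
is approximately the standard normal.
Standard Deviation: $\sigma_{\bar{p}_1 - \bar{p}_2} = \sqrt{\sigma_{\bar{p}_1}^2 + \sigma_{\bar{p}_2}^2} = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$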
Post-Test
1. Distinguish clearly between sample proportion and population
proportion.
2. If the population proportion is 0.5 and standard error of sample
proportion is 0.01, determine the sample size required.
3. A coin is tossed 20 times, and each toss that lands heads is counted as a
success. Suppose the probability of success is 0.5. What is the
probability that the number of successes is less than or equal to 12?
4. A sales manager of a firm believes that 30 percent of the firm’s orders
come from first time customers. A simple random sample of 100 orders
will be used to estimate the proportion of first-time customers. Assume
that the sales manager is correct and proportion is 0.30.
a. Justify sampling distribution of proportion for this case.
b. What is the probability that the sample proportion will be between
0.20 and 0.40?