Understanding Statistics: Scope & Types

Statistics is the study of data collection, organization, analysis, and inference about populations, divided into descriptive and inferential statistics. Descriptive statistics summarize data, while inferential statistics use sample data to make estimates about a population. Understanding statistics enhances communication, technical literacy, and career advancement, making it essential in various fields including business, healthcare, and engineering.

Uploaded by

princedeniyiasade

MODULE ONE

SCOPE AND ORIGIN OF STATISTICS

What is statistics?
Statistics is a field of study concerned with:
(a) the collection, organization, and analysis of data, and
(b) the drawing of inferences about a population.
It can be seen that statistics can be classified into two main branches – descriptive statistics
and inferential statistics.

Descriptive statistics consists of methods dealing with the collection, tabulation,
summarization, and presentation of data.

These methods describe the various aspects of a data set. Descriptive statistical methods have
their beginning in the inventories kept by early civilizations, such as the Babylonians,
Egyptians, and the Chinese. For example, the Old Testament of the Bible refers to the
numbering or counting of the people of Israel and to the casting of lots for selection by chance,
and the Romans kept careful counts of people, possessions, and wealth in the territories they
conquered. Similarly, the Domesday Book of the late eleventh century enumerated the lands
and wealth of England. The Middle Ages also saw the growth of governments and religious
institutions and their recording of births, deaths, and marriages. These early methods were
primarily lists and counts kept for purposes of taxation and military conscription.

Inferential statistics consists of methods that permit one to reach conclusions and make
estimates about populations based upon information from a sample.

Census, parameter and statistic

If every member of a population is evaluated, a census has been performed, and any summary
value of all the individual measurements is called a parameter. If only a subset of a population
has been evaluated, any summary value of such measurement is called a statistic. Inferential
statistics, therefore, involves using sample statistics to estimate population parameters.

A census is an enumeration or evaluation of every member of a population.

A parameter is any measurement that describes an entire population. Usually, the parameter
value is unknown since we rarely can observe the entire population. Parameters are often (but
not always) denoted by Greek letters, such as µ, θ and σ.
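
The distinction between a parameter and a statistic can be illustrated with a minimal sketch. The population below is hypothetical (1,000 simulated heights, generated only for illustration), and the names mu and x_bar are assumptions of this example, not notation fixed by the module.

```python
import random
import statistics

# Hypothetical population: 1,000 simulated heights in cm (illustration only).
rng = random.Random(42)
population = [rng.gauss(170, 10) for _ in range(1000)]

# Census: every member is evaluated, so the summary value is a parameter.
mu = statistics.mean(population)

# Sample: only a subset is evaluated, so the summary value is a statistic.
sample = rng.sample(population, 50)
x_bar = statistics.mean(sample)

print(f"parameter (population mean): {mu:.2f}")
print(f"statistic (sample mean):     {x_bar:.2f}")
```

In practice the population is rarely fully observed, so only the sample mean would be available, and it would be used to estimate the unknown population mean.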

Why study statistics?

Knowing statistics will make you a better consumer of other people’s data. Even if you don’t
plan to be a professional statistician, you should know enough to handle everyday data
problems, to feel confident that others cannot deceive you with spurious arguments, and to
know when you’ve reached the limits of your expertise. Statistical knowledge gives your
company a competitive advantage against organizations that cannot understand their internal
or external market data. And mastery of basic statistics gives you, the individual manager, a
competitive advantage as you work your way through the promotion process, or when you
move to a new employer. Here are some reasons why we study statistics.
Communication
The language of statistics is widely used in science, education, health care, engineering, and
even the humanities. In all areas of business (accounting, finance, human resources,
marketing, information systems, operations management), workers use statistical jargon to
facilitate communication. In fact, statistical terminology has reached the highest corporate
strategic levels. And in a multinational environment, the specialized vocabulary of statistics
cuts across language barriers to improve problem-solving across national boundaries.
Computer Skills
Whatever your computer skill level, it can be improved. Every time you create a spreadsheet
for data analysis, write a report, or make an oral presentation, you bring together skills you
already have, and learn new ones. Specialists with advanced training design the databases
and decision support systems, but you must expect to handle daily data problems without
experts. Besides, you can’t always find an “expert” and, if you do, the “expert” may not
understand your application very well. You need to be able to analyze data, use software with
confidence, prepare your own charts, write your own reports, and make electronic
presentations on technical topics.
Information Management
Statistics can help you handle either too little or too much information. When insufficient data
are available, statistical surveys and samples can be used to obtain the necessary market
information. But most large organizations are closer to drowning in data than starving for it.
Statistics can help you to summarize large amounts of data and reveal underlying
relationships. Have you heard of data mining? Statistics is the pick and shovel that you take
to the data mine.
Technical Literacy
Many of the best career opportunities are in growth industries propelled by advanced
technology. Marketing staff may work with engineers, scientists, and manufacturing experts
as new products and services are developed. Sales representatives must understand and
explain technical products like pharmaceuticals, medical equipment, and industrial tools to
potential customers. Purchasing managers must evaluate suppliers’ claims about the quality
of raw materials, components, software, or parts.
Career Advancement
Whenever there are customers to whom services are delivered, statistical literacy can enhance
your career mobility. Multi-billion-dollar companies like Barclays Bank, Citibank, Microsoft,
and Wal-Mart use statistics to control costs, achieve efficiency, and improve quality. Without
a solid understanding of data and statistical measures, you may be left behind.
Quality improvement
Large manufacturing firms like Coca-Cola and General Motors have formal systems for
continuous quality improvement. The same is true of insurance companies and financial
service firms like Vanguard, Fidelity, and Barclays Bank. Statistics helps firms oversee their
suppliers and monitor their internal operations. Quality improvement goes far beyond statistics,
but every university graduate is expected to know enough statistics to understand its
role in quality improvement.

Medicine
An experimental drug to treat asthma is given to 75 patients, of whom 24 get better. A placebo
is given to a control group of 75 volunteers, of whom 12 get better. Is the new drug better
than the placebo, or is the difference within the realm of chance?
Forecasting
A large company carries 50 000 different products. To manage this vast inventory, it needs a
weekly order forecasting system that can respond to developing patterns in consumer demand.
Is there a way to predict weekly demand and place orders with suppliers for every item,
without an unreasonable commitment of staff time?
Product warranty
A major automaker wants to know the average dollar cost of an engine warranty claim on a new
hybrid engine. It has collected warranty cost data on 4 300 warranty claims during the first 6
months after the engines are introduced. Using these warranty claims as an estimate of future
costs, what is the margin of error associated with this estimate?

Variables and types of variables

Variables and constants
Variables
Any type of observation which can take different values for different people, or different
values at different times, or places, is called a variable. The following are examples of
variables:
(a) family size, number of hospital beds, year of birth, number of schools in a country, etc.
(b) height, mass, blood pressure, temperature, blood glucose level, etc.
There are, broadly speaking, two types of variables – quantitative and qualitative (or
categorical) variables.
Constants
Constants are characteristics that have values that do not change. Examples of constants are
pi (π), the ratio of the circumference of a circle to its diameter (π = 3.14159...), and e, the
base of the natural (or Napierian) logarithms (e = 2.71828...).
Types of variables
Quantitative variables
A quantitative variable is one that can take numerical values. The variables in (a) and (b),
above, are examples of quantitative variables. Quantitative variables may be characterized
further as to whether they are discrete or continuous.
Discrete variables
The variables in (a), above, can be counted. These are examples of discrete variables. A
discrete variable is characterized by gaps or interruptions in the values that it can assume.
Any variable phrased as “the number of …”, is discrete, because it is possible to list its
possible values {0,1, …}. Any variable with a finite number of possible values is discrete.
A variable is considered a discrete variable if its unit of measurement cannot be broken down
or subdivided into finer or smaller units. For example, the variable "Children ever born per
woman" is discrete. Its unit of measurement is human beings or persons, and these exist only
as whole numbers (integers). Children ever born could take on values ranging from 0 (no
children), 1 child, 2 children, or 3 children to the highest possible number of children born,
but this variable will never take on values such as 0.1 children or 2.5 children. Other examples
of discrete variables are household size, number of living children, number of married
couples, number of cars per garage, and so on.
The following example illustrates the point. The number of daily admissions to a hospital is
a discrete variable since it can be represented by a whole number, such as 0, 1, 2 or 3. The
number of daily admissions on a given day cannot be a number such as 1.8, 3.96 or 5.33.
Continuous variables
The variables in (b), above, can be measured. These are examples of continuous variables. A
continuous variable does not possess the gaps or interruptions characteristic of a discrete
variable. A continuous variable can assume any value within a specific relevant interval of
values assumed by the variable. Notice that age is continuous since an individual does not
age in discrete jumps.
Continuous variables are those whose numerical values can be broken down or sub-divided
into finer units almost indefinitely. Age qualifies as an example of a continuous variable in
that it could be broken down into years, months, days, and beyond. Other examples are
weight, height, time, income, educational attainment, and so on. Many other continuous
variables are formed by taking ratios or rates. The homicide rate (homicides per 100,000
residents) is a continuous variable because it divides homicides by population, and the
calculation could be carried out to a number of decimal places. One hallmark of continuous
variables is that, in addition to the researcher's ability to break them down into finer
gradations, they can assume decimal values.

Categorical variables
A variable is called categorical when the measurement scale is a set of categories. For
example, marital status, with categories (single, married, widowed), is categorical. For
Ghanaians, the region of residence is categorical, with categories Greater Accra, Eastern, and
so on. Other categorical variables are whether employed (yes, no), religious affiliation
(Protestant, Catholic, Jewish, Muslim, others, none), political party preference, favorite
type of music (classical, country, folk, jazz, rock), place of birth, nationality, colour, colour
of hair, gender, blood group, smoking habit, surname, and rank in the military. Categorical variables
are often called qualitative. It can be seen that categorical variables can neither be measured
nor counted.

Levels of measurement and measurement scales

Variables can further be classified according to the following four levels of measurement:
nominal, ordinal, interval and ratio. A detailed discussion of this can be found in Stevens
(1946), and Ofosu and Hesse (2011).

Nominal scale
This scale of measurement applies to qualitative variables only. On the nominal scale, no order is
required. For example, gender is nominal, blood group is nominal, and marital status is also
nominal. On the nominal scale, categories are mutually exclusive. Thus an item must belong
to exactly one category. Notice that we cannot perform arithmetic operations on data
measured on the nominal scale.

Ordinal scale
This scale also applies to qualitative data. On the ordinal scale, order is necessary. This means
that one category is lower than the next one or vice versa. For example, in the Army, the rank
of private is lower than the rank of captain, which is lower than the rank of major, and so on.
Thus, the rank of an army officer is measured on the ordinal scale. In universities, the rank of
an academic staff member is measured on the ordinal scale. Grades are also ordinal, as excellent is
higher than very good, which in turn is higher than good, and so on.
It should be noted that, in the ordinal scale, differences between category values have no
meaning. For example, although Professor is higher than Lecturer, the difference between
these two ranks cannot be expressed numerically. Similarly, if 4 denotes “excellent”, 3 denotes
“very good”, 2 denotes “good” and 1 denotes “fair”, it does not mean that a candidate who is
rated “excellent” is twice as competent as a candidate who is rated “good”, just because
“excellent” is denoted by 4 and “good” is denoted by 2.
Interval scale
This scale of measurement applies to quantitative data only. In this scale, the zero point does
not indicate a total absence of the quantity being measured. An example of such a scale is
temperature on the Celsius or Fahrenheit scale. Suppose the minimum temperatures of 3
cities, A, B and C, on a particular day were 0°C, 20°C and 10°C, respectively. It is clear that
we can find the differences between these temperatures. For example, city B is 20°C hotter
than city A. However, we cannot say that city A has no temperature. Note that city A has a
temperature equivalent to 32°F. Moreover, we cannot say that city B is twice as hot as city C,
just because city B is 20°C and city C is 10°C. The reason is that, in the interval scale, the
ratio between two numbers is not meaningful.
Ratio scale
This scale of measurement also applies to quantitative data only and has all the properties of
the interval scale. In addition to these properties, the ratio scale has a meaningful zero starting
point and a meaningful ratio between 2 numbers.
An example of a variable measured on the ratio scale is weight. A weighing scale that reads
0 kg gives an indication that there is absolutely no weight on it. So, the zero starting point is
meaningful. If Yaw weighs 40 kg and Akosua weighs 20 kg, then Yaw weighs twice as much as
Akosua. Another example of a variable measured on the ratio scale is temperature measured
on the Kelvin scale. This has a true zero point.
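
The contrast between the interval and ratio scales can be checked numerically. The sketch below is illustrative only: the helper names c_to_f and c_to_k and the rounded kilogram-to-pound factor are assumptions of this example. It shows that the "twice as hot" ratio for cities B and C depends on the unit chosen, whereas the weight ratio for Yaw and Akosua does not.

```python
import math

def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32

def c_to_k(celsius):
    """Convert degrees Celsius to kelvins."""
    return celsius + 273.15

city_b_c, city_c_c = 20.0, 10.0

# Interval scale: the ratio changes when the unit changes, so it carries no meaning.
ratio_c = city_b_c / city_c_c                    # 2.0 in Celsius
ratio_f = c_to_f(city_b_c) / c_to_f(city_c_c)    # 68 / 50 = 1.36 in Fahrenheit

# Ratio scale (Kelvin has a true zero): this is the physically meaningful ratio.
ratio_k = c_to_k(city_b_c) / c_to_k(city_c_c)    # about 1.035

# Ratio scale (weight): the ratio is the same in kilograms and in pounds.
KG_TO_LB = 2.20462  # rounded conversion factor, assumed for illustration
weight_ratio_kg = 40 / 20
weight_ratio_lb = (40 * KG_TO_LB) / (20 * KG_TO_LB)

print(ratio_c, ratio_f, round(ratio_k, 3))
print(weight_ratio_kg, round(weight_ratio_lb, 3))
```

Differences on the interval scale do survive a change of unit in a consistent way: a gap of 10 Celsius degrees is always a gap of 18 Fahrenheit degrees.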

Summary of types of variables

Fig. 1.1 shows a chart, summarizing the relationships between the various types of variables
and measurement scales.

Fig. 1.1: Types of variables

Methods of data collection
Introduction
Most research techniques and many statistical process-control techniques involve the use of
sampling. A sample is selected, evaluated and studied in an effort to gain information about
the larger population from which the sample was drawn. Above, we learned that a sample is
defined as a subset or part of a population. Although, by definition, samples will be smaller
than the population from which they are drawn, samples can be very small or very large. A
single student can be considered a sample of students from a given university, while a very
large sample consisting of millions of households can be selected to respond to a lengthy
questionnaire that is part of a census.
A sample represents a population, and information obtained from a sample is generalized to
be true for the entire population from which it was drawn. The validity or accuracy of
generalizations from samples to populations depends on how well a sample represents its
population. A well-selected sample can provide information comparable to that obtained by
a census.

Advantages of sampling
Studying a sample instead of a population can have the following advantages.
1. Cost – Samples can be studied at much lower cost. The smaller number of units or
individuals involved in a sample requires less time and money to evaluate. Samples can
provide affordable, accurate, and useful information in cases where a census would cost more
than the value of the information obtained.
2. Time – Samples can be evaluated more quickly than a population. If a decision had to wait
for the results of a census, a critical advantage might be missed, or the information might be
made obsolete by events or changes that took place while the data were being collected and
analyzed.
3. Accuracy – Any time data are collected, there is a chance for errors to occur. Errors of
measurement, incorrect recording of data, transposition of digits, recording of information in
the wrong area of a form, and errors in entering data into a computer can all influence the
accuracy of results. In general, the larger the data set, the more opportunity there is for errors
to occur. A sample can provide a data set that is small enough to monitor carefully and can
permit careful training and supervision of data gatherers and handlers.
4. Feasibility – In some research situations, the population of interest is not available for
study. A substantial portion of the population might not yet exist or might no longer be
available for evaluation. In other cases, evaluation of an item requires its destruction. For
example, a manufacturer interested in how much pressure could be applied to a part before it
cracked, could not perform a census without destroying the entire production run.
5. Scope of information – In a sample survey, a greater variety of information can be
collected, which may be impracticable in a complete census due to constraints such
as a limited number of trained personnel and equipment. When evaluating a smaller group, it
is sometimes possible to gather more extensive information on each unit evaluated.

Sample designs
There are two categories of sample designs, namely, probability (or random) sampling and
non-probability sampling.
1. Probability Sampling
In this sub-section, we introduce important sampling methods which incorporate
randomization, which means that the selection is not consciously influenced by human
choice. The major principle of these designs is to avoid bias in the selection procedure and to
achieve the maximum precision for a given outlay of resources. The main types of probability
sampling designs are: simple random sampling, systematic sampling, stratified sampling,
cluster sampling and multi-stage sampling.
(i) Simple random sample
Subjects of a population to be sampled could be families, schools, cities, hospitals, records of
reported crimes, and so on. Simple random sampling is a method of sampling for which every
possible sample has an equal chance of selection. Let n denote the number of subjects in the
sample. This number is called the sample size.

A simple random sample of n subjects from a population is one in which each possible sample
of that size has the same probability (chance) of being selected.

A simple random sample is often just called a random sample. The adjective simple is used
to distinguish this type of sampling from more complex sampling schemes.
Why is it a good idea to use random sampling? Because everyone has the same chance of
inclusion in the sample, so it provides fairness. This reduces the chance that the sample is
seriously biased in some way, leading to inaccurate inferences about the population. Most
inferential statistical methods assume randomization of the sort provided by random
sampling.
How to select a simple random sample
One way of obtaining a simple random sample is to use the ‘lottery system’.
The lottery system
The lottery system consists of writing the name of each item in the sample frame on a slip of
paper or a card and then drawing them from a container one after the other. To ensure a
bias-free selection, shuffle the cards or the slips of paper before each draw.
Advantages of the lottery system
It avoids selection bias.
Disadvantages of the lottery system
It is time-consuming and cumbersome when the population is large.

Tables of random numbers

Another method for selecting a random sample is to use a table of random numbers. A table
of random numbers has the property that, no matter how we select our digits (up, down,
diagonally, etc.) each digit, 0 through 9, is equally likely to be selected. Table 1.2 shows
48 five-digit blocks of random digits arranged in 8 columns and 6 rows. The
random numbers were generated by using the MINITAB software.
Random numbers are numbers that are computer generated according to a scheme whereby
each digit is equally likely to be any of the integers 0, 1, 2, …, 9 and does not depend on the
other digits generated.
The numbers fluctuate according to no set pattern. Any particular digit has the same chance
of being a 0, 1, 2, …, or 9. The numbers are chosen independently, so any digit chosen has
no influence on any other selection. If the first digit in a row of the table is 9, for instance, the
next digit is still just as likely to be a 9 as a 0 or a 1 or any other digit. Random numbers
are available in published tables and can be generated with software and many statistical
calculators.

Example 1.3
Suppose you want to select a simple random sample of 10 students from a class of 20 students.
The sampling frame is a directory of these students. You can select the students by using two-
digit random numbers to identify them, as follows:
(1) Assign the numbers 01 to 20 to the students in the directory, using 01 for the first student
in the list, 02 for the second student, and so on.
(2) Starting at any point in Table 1.2, choose successive two-digit numbers until you obtain
10 distinct numbers between 01 and 20.
(3) Include in the sample the students with the assigned numbers equal to the random numbers
selected.
For example, using the first row of Table 1.2, the first 5 two-digit random numbers selected
are 10, 15, 01, 02 and 14. Notice that we skipped the numbers which are greater than 20 since
no student in the directory has an assigned number greater than 20.
After using the first row of Table 1.2, move to the next row of numbers and continue. The
column (or row) from which you begin selecting the number does not matter, since the
numbers have no set pattern. Most statistical software can do this all for you.
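
The procedure of Example 1.3 can be sketched in code. This is a minimal illustration, not a reproduction of Table 1.2: the digit stream is generated here by Python's random module, and the function name sample_from_digits is an assumption of this example. The last line shows the shortcut that most software offers.

```python
import random

def sample_from_digits(digits, population_size, sample_size):
    """Read successive two-digit numbers from a string of digits, keeping
    distinct values between 1 and population_size until sample_size are found."""
    chosen = []
    for i in range(0, len(digits) - 1, 2):
        number = int(digits[i:i + 2])
        # Skip 00, numbers above the population size, and repeats.
        if 1 <= number <= population_size and number not in chosen:
            chosen.append(number)
            if len(chosen) == sample_size:
                break
    return chosen

# Stand-in for a table of random digits (generated, not the module's Table 1.2).
rng = random.Random(2024)
digits = "".join(str(rng.randrange(10)) for _ in range(2000))

by_table = sample_from_digits(digits, population_size=20, sample_size=10)
print(by_table)

# The same task done directly by the library: 10 distinct students out of 20.
by_library = sorted(rng.sample(range(1, 21), 10))
print(by_library)
```

Feeding the function the digits 10, 15, 01, 02, 14 quoted in the example returns students 10, 15, 1, 2 and 14, in that order.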
(ii) Systematic random sample
Another method of random sampling is to choose every kth item from the list, starting from a
randomly chosen entry among the first k items on the list. This is called systematic sampling.
The number k is called the skip number. Fig. 1.2 shows how to sample every fourth item,
starting from item 2, resulting in a sample of size n = 20 items from a list of N = 78 items.
A systematic sample of n items from a population of N items requires that the skip number
be approximately N/n. When sampling from a sampling frame, it is simpler to select a systematic
random sample than a simple random sample because it uses only one random number.

Fig. 1.2: Systematic sampling
An attraction of systematic sampling is that it can be used with unlistable or infinite
populations, such as production processes (e.g., testing every 5000th light bulb) or political
polling (e.g., surveying every tenth voter who emerges from the polling place). Systematic
sampling is also well-suited to linearly organized physical populations (e.g., pulling every
tenth patient folder from alphabetized filing drawers in a veterinary clinic).
Example 1.4
Suppose we want a systematic random sample of 100 students from a population of
30000 students listed in a campus directory. Here, n = 100 and N = 30000, and so k = 30000/100
= 300. The population size is 300 times the sample size. Therefore, we have to select one of
every 300 students. We select one student at random from the first 300 students in the
directory and then select every 300th student after the one
selected randomly. This produces a sample of size 100. The first three digits in Table 1.2 are
104, which falls between 001 and 300, so we first select the student numbered 104. The
numbers of the other students selected are 104 + 300 = 404, 404 + 300 = 704, 704 + 300 =
1004, 1004 +300 = 1304, and so on. The 100th student selected is listed in the last 300 names
in the directory.
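
The arithmetic of Example 1.4 can be reproduced with a short sketch. The function name systematic_sample and its parameters are assumptions of this example; the skip number is taken as N/n rounded to the nearest whole number, which is one reasonable reading of "approximately N/n".

```python
import random

def systematic_sample(population_size, sample_size, start=None, rng=None):
    """Return the positions (1-based) selected by systematic sampling."""
    rng = rng or random.Random()
    k = max(1, round(population_size / sample_size))  # skip number
    if start is None:
        start = rng.randint(1, k)  # random start among the first k items
    # Take every kth position from the start; positions past the end of the
    # list are dropped, so the result can be slightly smaller than sample_size.
    return list(range(start, population_size + 1, k))[:sample_size]

# Example 1.4: N = 30000, n = 100, random start 104.
students = systematic_sample(30000, 100, start=104)
print(students[:5])   # [104, 404, 704, 1004, 1304]
print(students[-1])   # 29804, which lies in the last 300 names

# The Fig. 1.2 setting: every fourth item from item 2 in a list of 78.
items = systematic_sample(78, 20, start=2)
print(len(items), items[0], items[-1])   # 20 2 78
```

Only one random number (the starting point) is needed, which is why the method is simpler to carry out than simple random sampling.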
(iii) Stratified random sample
Another probability sampling method, useful in social science research for studies comparing
groups, is stratified random sampling.
A stratified random sample divides the population into subgroups called strata, and then
selects a simple random sample from each stratum.
Stratified random sampling is called proportional if the sampled strata proportions are the
same as those in the entire population. For example, if 90% of the population of interest are
men and 10% are women, then the sampling is proportional if the sample size for men is nine
times the sample size for women.
Stratified random sampling is called disproportional if the sampled strata proportions differ
from the population proportions. This is useful when the population size for a stratum is
relatively small. A group that comprises a small part of the population may not have enough
representation in a simple random sample to allow precise inferences.
Example 1.5
Suppose we want to estimate the smallpox vaccination rate among employees in a university, and
we know that our target population (those individuals we are trying to study) is 55% male and
45% female. Suppose our budget only allows a sample of size 200. To ensure the correct
gender balance, we could sample 110 males and 90 females.
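
The allocation in Example 1.5 can be written as a small sketch. The function name proportional_allocation and the staff lists are hypothetical, introduced only to show proportional stratified sampling end to end: allocate the sample across the strata, then draw a simple random sample within each stratum.

```python
import random

def proportional_allocation(total_sample_size, strata_shares):
    """Split a total sample size across strata in proportion to their shares.
    Rounding can make the parts sum to slightly more or less than the total."""
    return {name: round(total_sample_size * share)
            for name, share in strata_shares.items()}

allocation = proportional_allocation(200, {"male": 0.55, "female": 0.45})
print(allocation)   # {'male': 110, 'female': 90}

# Hypothetical sampling frame: 5,500 male and 4,500 female employee IDs.
frame = {
    "male": [f"M{i:04d}" for i in range(1, 5501)],
    "female": [f"F{i:04d}" for i in range(1, 4501)],
}

# Draw a simple random sample of the allocated size within each stratum.
rng = random.Random(1)
stratified = {name: rng.sample(frame[name], size)
              for name, size in allocation.items()}
print({name: len(ids) for name, ids in stratified.items()})
```

A disproportional design would simply replace the computed allocation with sizes chosen by the researcher, for example to enlarge a small stratum.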
(iv) Cluster random sampling
Simple, systematic, and stratified random sampling are often difficult to implement, because
they require a complete sampling frame. Such lists are easy to obtain when sampling cities or
hospitals for example, but more difficult to obtain when sampling individuals or families.
Cluster samples are essentially strata consisting of geographical regions. We divide a region
(say a city) into sub-regions (say, blocks, sub-divisions, or schools). In a one-stage cluster
sampling, our sample consists of all elements in each of k randomly chosen sub-regions (or
clusters). In a two-stage cluster sampling, we first randomly select k sub-regions (clusters)
and then choose a random sample of elements within each cluster. Fig. 1.3 illustrates how
four elements could be sampled from each of five randomly chosen clusters, using a two-
stage cluster sampling.
Cluster sampling is useful when:

Although cluster sampling is cheap and quick, it is often reasonably accurate because people
in the same neighbourhood tend to be similar in income, ethnicity, educational background,
and so on. Cluster sampling is useful in political polling, surveys of gasoline pump prices,
studies of crime victimization, or lead contamination in soil. A hospital may contain
clusters (floors) of similar patients. A warehouse may have clusters (pallets) of inventory parts.
Forest sections may be viewed as clusters to be sampled for disease or timber growth rates.

Fig. 1.3: Two-stage cluster sampling

Example 1.6
A study might plan to sample about 1% of the families in a city, using city blocks as clusters.
Using a map to identify city blocks, it could select a simple random sample of 1% of the
blocks and then sample every family on each block. A study of patient care in mental hospitals
in Ghana could first randomly sample mental hospitals (the clusters) and then collect data for
patients within these hospitals.
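
One-stage and two-stage cluster sampling can be contrasted in a short sketch. The city below is hypothetical (100 blocks of 20 families each), and the function names are assumptions of this example. The one-stage call mirrors Example 1.6 (1% of the blocks, every family on each chosen block); the two-stage call mirrors the Fig. 1.3 description (four elements from each of five clusters).

```python
import random

def one_stage_cluster_sample(clusters, k, rng):
    """Choose k clusters at random and keep every element in them."""
    chosen = rng.sample(sorted(clusters), k)
    return [unit for name in chosen for unit in clusters[name]]

def two_stage_cluster_sample(clusters, k, m, rng):
    """Choose k clusters at random, then m elements at random within each."""
    chosen = rng.sample(sorted(clusters), k)
    sample = []
    for name in chosen:
        units = clusters[name]
        sample.extend(rng.sample(units, min(m, len(units))))
    return sample

# Hypothetical city: 100 blocks, each containing 20 families.
clusters = {
    f"block-{b:03d}": [f"block-{b:03d}/family-{f:02d}" for f in range(1, 21)]
    for b in range(1, 101)
}

rng = random.Random(7)
one_stage = one_stage_cluster_sample(clusters, k=1, rng=rng)
two_stage = two_stage_cluster_sample(clusters, k=5, m=4, rng=rng)

print(len(one_stage), "families from 1 block")
print(len(two_stage), "families from 5 blocks")
```

Only a list of blocks is needed at the first stage; families have to be listed only within the blocks that were actually selected.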
Example 1.7
What is the difference between a stratified sample and a cluster sample?
Solution
A stratified sample uses every stratum. The strata are usually groups we want to compare. By
contrast, a cluster sample uses a sample of the clusters, rather than all of them. In cluster
sampling, clusters are merely ways of easily identifying groups of subjects. The goal is not to
compare the clusters but to use them to obtain a sample. Most clusters are not represented in
the eventual sample.
(v) Multi-stage Sampling
A random sample of a population of interest often incurs considerable expense in collecting
the data from a wide area. A cheaper solution is to use multi-stage sampling, which starts by
dividing the country into a number of regions. Some of these are selected at random and
subdivided further, e.g. into rural, suburban and inner city areas. Again, some of these are
selected at random and subdivided again, e.g. into parliamentary wards and a further random
selection made. The process can be repeated until individual households or companies or units
of interest are identified.
The Family Expenditure Survey makes use of multi-stage sampling. The Survey uses the
Small Users File of the Postcode Address File, and the primary sampling unit is the postal sector.
The benefit of this approach is that the resulting samples are concentrated in relatively few
geographical areas which reduces the cost of data collection.
2. Non-probability sampling
Non-probability sampling designs select samples without relying on randomness.
The selection of the elements in the sample rests solely on personal judgement. The chance of
selecting an element cannot be determined. For this reason, there is no means of measuring
the risk of drawing erroneous conclusions from non-probability samples. Thus the
reliability of results (i.e. sampling errors) cannot be assessed, and the results cannot be used
to draw valid conclusions about the population. The main methods of non-probability sampling
are convenience, judgement and quota sampling.
(i) Convenience sample
The sole virtue of convenience sampling is that it is quick. The idea is to grab whatever
sample is handy. The convenience sample is simply one that happens to come your way. An
accounting professor who wants to know how many MBA students would take a summer
elective in international accounting can just survey the class she is currently teaching. The
students polled may not be representative of all MBA students, but an answer (although
imperfect) will be available immediately.
A newspaper reporter doing a story on perceived airport security might interview co-workers
who travel frequently. An executive might ask department heads if they think non-business
Web surfing is widespread.
You might think that convenience sampling is rarely used or, when it is, that the results are
used with caution. However, this does not appear to be the case. Since convenience samples
often sound the first alarm on timely issues, their results have a way of attracting attention and
have probably influenced quite a few business decisions. The mathematical properties of
convenience samples are unknowable, but they do serve a purpose and their influence cannot
be ignored.
(ii) Judgment sample
Judgment sampling is a non-probability sampling method that relies on the expertise of the
sampler to choose items that are representative of the population. The sample obtained by this
method is based on personal judgment and some pre-knowledge of the population. For
example, to estimate the corporate spending on research and development (R&D) in the
medical equipment industry, we might ask an industry expert to select several “typical” firms.
Unfortunately, subconscious biases can affect experts, too. In this context, “bias” does not
mean prejudice, but rather non-randomness in the choice. Judgment samples may be the best
alternative in some cases, but we can’t be sure whether the sample was random.
(iii) Quota Sampling
Quota sampling is a special kind of judgment sampling, in which the interviewer chooses a
certain number of people in each category (e.g., men/women). Quota sampling first
classifies the population into non-overlapping sub-populations, called strata. The
sample is then obtained by selecting the individual elements from each stratum based on a
specified quota. The selection is made by the interviewer,
who has been given quotas to fill from specified sub-groups of the population. For example,
an interviewer may be told to sample 50 females between the ages of 45 and 60.
Since the selection of the sample is non-random, the enumerator is allowed to use his/her
own judgement to meet the various quotas. This introduces a large degree of bias. The
lack of randomness is, however, compensated for by lower cost and administrative
convenience.
Sampling with or without replacement
Consider the lottery system on page 9. If a selected item is put back in the box before the next
item is drawn, we are sampling with replacement: because each item is returned and the
contents are stirred before the next draw, an item can be chosen again. If a selected item is not
returned to the box, we are sampling without replacement, and no item can be chosen twice.
When sampling with replacement, duplicates are unlikely when the sample size n is much
smaller than the population size N.
People instinctively prefer sampling without replacement because drawing the same item
more than once seems to add nothing to our knowledge. However, using the same sample
item more than once does not introduce any bias (i.e. no systematic tendency to over or
underestimate whatever parameter we are trying to measure).
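The difference between the two schemes can be sketched in a few lines of Python. This is only an illustration: the population of 20 numbered items, the sample size of 5 and the seed are assumptions chosen for the example, not values from these notes.

```python
import random

# Hypothetical population: items numbered 1..N, like slips of paper in a box.
N = 20
n = 5
population = list(range(1, N + 1))

rng = random.Random(42)  # fixed seed (an arbitrary choice) so the run is repeatable

# Without replacement: a drawn slip stays out of the box, so no item repeats.
without_replacement = rng.sample(population, n)

# With replacement: each slip goes back before the next draw, so repeats can occur.
with_replacement = rng.choices(population, k=n)

print("without replacement:", without_replacement)
print("with replacement:   ", with_replacement)
```

Raising n towards N makes repeated items in the second sample increasingly likely, which matches the remark that duplicates are unlikely only when n is much smaller than N.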
Computers and statistical analysis
The recent widespread use of computers has had a tremendous impact on statistical analysis.
Computers can perform more calculations faster and far more accurately than can human
technicians. The use of computers makes it possible for investigators to devote more time to
the improvement of the quality of raw data and the interpretation of the results.
The current prevalence of microcomputers and the abundance of statistical software packages
have further revolutionized statistical computing. The researcher in search of a statistical
software package will find the book by Woodward et al. (1987) extremely helpful. This book
describes approximately 140 packages. Among the most prominent ones are: Statistical
Package for the Social Sciences (SPSS), S-plus, MINITAB, SAS and GENSTAT. The
spreadsheet, Excel, also has facilities for statistical analysis.

MODULE TWO
DESCRIPTIVE STATISTICS
We have seen that statistical methods are descriptive or inferential. The purpose of descriptive
statistics is to summarize data to make it easier to assimilate the information. In this module,
we present basic methods of descriptive statistics.

Frequency distribution
Table 2.1 gives the number of children per family for 54 families selected from Obo, a town
in Ghana. The data, presented in the form in which it was collected, is called raw data.

From Table 2.1, it can be seen that the minimum and the maximum numbers of children per
family are 0 and 4, respectively. Apart from these numbers, it is impossible, without further
careful study, to extract any exact information from the data. By breaking down the data into
the form of Table 2.2, however, certain features of the data become apparent. For instance,
from Table 2.2, it can easily be seen that most of the 54 families selected have two children.
This information cannot easily be obtained from the raw data in Table 2.1.

Table 2.2 is called a frequency table or a frequency distribution. It is so called because it
gives the frequency or number of times each observation occurs. Thus, by finding the
frequency of each observation, a more intelligible picture is obtained.
The steps for constructing a frequency distribution may be summarized as follows:
(i) List all values of the variable in ascending order of magnitude.
(ii) Form a tally column, that is, for each value in the data, record a stroke in the tally column
next to that value. In the tally, each fifth stroke is made across the first four. This makes it
easy to count the entries and enter the frequency of each observation. (Note: Values with
frequency zero are omitted.)
(iii) Check that the frequencies sum to the total number of observations.
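The steps above can be sketched in Python. The raw data below are hypothetical (they are not the values of Table 2.1), and the tally column is replaced by a counter that does the same job.

```python
from collections import Counter

# Hypothetical raw data: number of children per family (NOT the values in Table 2.1).
children = [2, 0, 3, 2, 1, 4, 2, 1, 2, 3, 0, 2, 1, 2, 3]

# Steps (i)-(ii): count how often each value occurs, then list the values
# in ascending order. Values that never occur are simply absent.
frequency = Counter(children)
table = sorted(frequency.items())

print("Children  Frequency")
for value, count in table:
    print(f"{value:>8}  {count:>9}")

# Step (iii): the frequencies must sum to the total number of observations.
total = sum(count for _, count in table)
print("Total:", total, "(observations:", len(children), ")")
```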
Grouped frequency distribution

Table 2.3 gives the body masses of 22 patients, measured to the nearest kilogram.

It can be seen that the minimum and the maximum body masses are 42 kg and 83 kg,
respectively. A frequency distribution giving every body mass between 42 kg and 83 kg
would be very long and would not be very informative. The problem is overcome by grouping
the data into classes. If we choose the classes 41 – 49, 50 – 58, 59 – 67, 68 – 76 and 77 – 85,
we obtain the frequency distribution given in Table 2.4.

These are, of course, not the only classes which could be chosen. Table 2.4 gives the
frequency of each group or class; it is therefore called a grouped frequency table or a grouped
frequency distribution. Using this grouped frequency distribution, it is easier to obtain
information about the data than using the raw data in Table 2.3. For instance, it can be seen
from Table 2.4, that 17 of the 22 patients have body masses between 50 kg and 76 kg (both
inclusive). This information cannot easily be obtained from the raw data in Table 2.3.
It should be noted that, even though Table 2.4 is concise, some information is lost. For
example, the grouped frequency distribution does not give us the exact body masses of the
patients. Thus, the individual body masses of the patients are lost in our effort to obtain an
overall picture. However, Table 2.4 is far more comprehensible, and its contents are easier to
grasp than Table 2.3.
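As a rough sketch of the grouping step, the following Python code counts observations in the classes 41 – 49, 50 – 58, 59 – 67, 68 – 76 and 77 – 85. The class limits come from the text; the list of body masses is hypothetical and is not the data of Table 2.3.

```python
# Class limits taken from the text; both limits are inclusive.
classes = [(41, 49), (50, 58), (59, 67), (68, 76), (77, 85)]

# Hypothetical body masses in kg, to the nearest kilogram (NOT the values in Table 2.3).
masses = [42, 47, 51, 55, 58, 60, 62, 63, 66, 67, 70, 71, 75, 79, 83]

def grouped_frequencies(data, class_limits):
    """Count how many observations fall inside each (lower, upper) class."""
    counts = [0] * len(class_limits)
    for x in data:
        for i, (lower, upper) in enumerate(class_limits):
            if lower <= x <= upper:
                counts[i] += 1
                break
        else:
            # No class contained x: the chosen classes do not cover the data.
            raise ValueError(f"{x} does not fall in any class")
    return counts

frequencies = grouped_frequencies(masses, classes)
for (lower, upper), f in zip(classes, frequencies):
    print(f"{lower} - {upper}: {f}")
```

The individual masses cannot be recovered from the printed counts, which is the loss of information described above.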
We now define the terms that are used in grouped frequency tables.
(i) Class limits
The intervals into which the observations are put are called class intervals. The end points of
the class intervals are called class limits. For example, the class interval 41 – 49, has lower
class limit 41 and upper class limit 49.
(ii) Class boundaries
The raw data in Table 2.3 were recorded to the nearest kilogram. Thus, a body mass of 49.5
kg would have been recorded as 50 kg, a body mass of 58.4 kg would have been recorded as
58 kg, while a body mass of 58.5 kg would have been recorded as 59 kg. It can therefore be
seen that, the class interval 50 – 58, consists of measurements greater than or equal to 49.5
kg and less than 58.5 kg. The numbers 49.5 and 58.5 are called the lower and upper
boundaries of the class interval 50 – 58. The class boundaries of the other class intervals are
given in Table 2.5.

How to obtain the lower and upper class boundaries

Lower class boundary: subtract 0.5 from the lower class limit.
Upper class boundary: add 0.5 to the upper class limit.
(The value 0.5 applies here because the data were recorded to the nearest whole unit.)
(iii) Class mark
The mid-point of a class interval is called the class mark or class mid-point of the class
interval. It is the average of the upper and lower class limits of the class interval. It is also the
average of the upper and lower class boundaries of the class interval. For example, in Table
2.5, the class mark of the third class interval was found as follows:
$$\text{class mark} = \tfrac{1}{2}(59 + 67) = \tfrac{1}{2}(58.5 + 67.5) = 63.$$

(iv) Class width

The difference between the upper and lower class boundaries of a class interval is called the
class width of the class interval. Class widths of class intervals can also be found by
subtracting two consecutive lower class limits, or by subtracting two consecutive upper
class limits. In particular:
The width of the ith class interval is the numerical difference between the upper class limits
of the ith and the (i -1)th class intervals (i = 2, 3, …). It is also the numerical difference between
the lower class limits of the ith and the (i+1)th class intervals (i = 1, 2, …).
In Table 2.5, the width of the first class interval is |41 - 50| = 9. This is the numerical difference
between the lower class limits of the first and the second class intervals. The width of the
second class interval is |50 - 59| = 9. This is the numerical difference between the lower class
limits of the second and the third class intervals. It is also equal to |58 - 49|, the numerical
difference between the upper class limits of the first and the second class intervals.
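The definitions of class boundaries, class mark and class width can be checked with a short Python sketch. It uses the class limits quoted in the text and assumes, as stated there, that the data are recorded to the nearest kilogram; the function name is only illustrative.

```python
# Class limits from the text. Data are recorded to the nearest kilogram,
# so each boundary lies 0.5 beyond the corresponding limit.
class_limits = [(41, 49), (50, 58), (59, 67), (68, 76), (77, 85)]

def describe(lower, upper, half_unit=0.5):
    """Return (lower boundary, upper boundary, class mark, class width)."""
    lower_boundary = lower - half_unit
    upper_boundary = upper + half_unit
    class_mark = (lower + upper) / 2          # same as the mean of the boundaries
    class_width = upper_boundary - lower_boundary
    return lower_boundary, upper_boundary, class_mark, class_width

summary = [describe(lower, upper) for lower, upper in class_limits]
for (lower, upper), (lb, ub, mark, width) in zip(class_limits, summary):
    print(f"{lower} - {upper}: boundaries {lb} to {ub}, mark {mark}, width {width}")
```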

Relative frequency
It is sometimes useful to know the proportion, rather than the number, of values falling within
a particular class interval. We obtain this information by dividing the frequency of the
particular class interval by the total number of observations. We refer to the proportion of
values falling within a class interval as the relative frequency of the class interval. In Table
2.13, the relative frequency of the first class interval is $\frac{687}{3088} = 0.2225$,
since the class frequency is 687 and the sum of the frequencies is 3088. Note that relative
frequencies must add up to 1, allowing for rounding errors.

Cumulative frequency
In many situations, we are not interested in the number of observations in a given class
interval, but in the number of observations which are less than (or greater than) a specified
value. For example, in Table 2.5, on page 26, it can be seen that 3 patients have body masses
less than 49.5 kg and 9 patients (i.e. 3 + 6) have body masses less than 58.5 kg. These
frequencies are called cumulative frequencies. A table of such cumulative frequencies is
called a cumulative frequency table or cumulative frequency distribution.
Table 2.14 shows the data in Table 2.5 along with the cumulative frequencies and the relative
frequencies. Notice that the last cumulative frequency is equal to the sum of all the
frequencies.
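Relative and cumulative frequencies can be computed together, as in the Python sketch below. The first two frequencies (3 and 6) and the total of 22 agree with the figures quoted in the text; the split of the remaining frequencies among the last three classes is an assumption made only for illustration.

```python
from itertools import accumulate

classes = ["41 - 49", "50 - 58", "59 - 67", "68 - 76", "77 - 85"]
# 3, 6 and the total of 22 follow the text; the values 7, 4, 2 are assumed.
frequencies = [3, 6, 7, 4, 2]

total = sum(frequencies)
relative = [f / total for f in frequencies]   # proportion in each class
cumulative = list(accumulate(frequencies))    # running total of frequencies

print("Class     Freq  Cum. freq  Rel. freq")
for name, f, cf, rf in zip(classes, frequencies, cumulative, relative):
    print(f"{name:<9} {f:>4}  {cf:>9}  {rf:>9.4f}")
```

The last cumulative frequency equals the total, and the relative frequencies add up to 1 apart from rounding in the printed values.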

Example 2.2
Table 2.15 gives the ages of a sample of patients who attended Hope Medical Hospital.
(a) Find the sample size. (b) Complete the blank cells.

Solution
(a) If the sample size is n, then the relative frequency of the second class interval is 8 ÷ n.
Hence, n satisfies the equation
$$\frac{8}{n} = 0.16 \implies n = \frac{8}{0.16} = 50.$$
The sample size is 50.
(b) Table 2.16 gives the completed blank cells.

Notice again that:

(i) the last cumulative frequency is equal to the sum of all the frequencies;
(ii) relative frequencies must add up to 1, allowing for rounding errors.

MODULE THREE
GRAPHICAL REPRESENTATION OF DATA
In the last module, we found that information given in a frequency distribution is easier to
interpret than raw data. Information given in a frequency distribution in a tabular form is
easier to grasp if presented graphically. Many types of diagrams are used in statistics,
depending on the nature of the data and the purpose for which the diagram is intended. In this
module, we discuss how statistical data can be presented by histograms, cumulative
frequency curves and other diagrams.
Histogram
A histogram consists of rectangles with:
(i) bases on a horizontal axis, centres at the class marks, and lengths equal to the class widths,
(ii) areas proportional to class frequencies.
If the class intervals are of equal size, then the heights of the rectangles are proportional to
the class frequencies, and it is then customary to take the heights of the rectangles numerically
equal to the class frequencies.
If the class intervals are of different widths, then the heights of the rectangles are proportional
to the ratio
$$\frac{\text{class frequency}}{\text{class width}}.$$
This ratio is called the frequency density.

Example 2.3
Table 2.17 shows the distribution of the heights of 40 students selected from St. Paul High
School. Draw a histogram to represent the data.
Table 2.17: Heights of students

Solution
Since the class intervals have different sizes, the heights of the rectangles of the histogram
are proportional to the frequency densities of the class intervals. The calculations of the
heights of the rectangles can be set up as shown in Table 2.18. If $d_i$ is the frequency density
of class interval $i$, then the height of the rectangle representing this class interval is $c d_i$, where
$c$ is any positive number (see Table 2.18, column 5). Fig. 2.1 shows a histogram for the data.
It was drawn by taking $c = 5$. Notice that the centres of the bases of the rectangles of the
histogram are at the class marks. If preferred, a histogram may be drawn showing class
boundaries instead of class marks.
Table 2.18: Work table for computing the heights of rectangles of a histogram
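A work table of this kind can be sketched in Python. The classes below are hypothetical (they are not the values of Table 2.17) and were chosen with unequal widths; only the scale factor c = 5 is taken from the text.

```python
# Hypothetical classes: (lower boundary, upper boundary, frequency).
# These are NOT the values of Table 2.17; the widths are deliberately unequal.
classes = [(140, 150, 4), (150, 155, 8), (155, 160, 12), (160, 170, 10), (170, 190, 6)]
c = 5  # any positive scale factor; the text uses c = 5 for Fig. 2.1

def frequency_density(lower, upper, frequency):
    """Frequency per unit of class width."""
    return frequency / (upper - lower)

rows = []
for lower, upper, freq in classes:
    d = frequency_density(lower, upper, freq)
    rows.append((lower, upper, freq, d, c * d))   # last entry: rectangle height

print("Class        Width  Freq  Density  Height")
for lower, upper, freq, d, height in rows:
    print(f"{lower:>3} - {upper:<3}  {upper - lower:>5}  {freq:>4}  {d:>7.2f}  {height:>6.2f}")
```

Multiplying each height by its class width gives c times the class frequency, so the areas of the rectangles stay proportional to the frequencies.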

Fig. 2.1: Histogram of the data in Table 2.17
Example 2.4
Table 2.19 shows the distribution of ages of 168 diabetic patients selected from Progress
Hospital. A histogram is drawn to represent the data. If the height of the rectangle representing
the second class interval is 3 cm, find the height of the rectangle which represents the third
class interval.
Table 2.19: Ages of diabetic patients

Solution
The calculations of the heights of the rectangles of the histogram can be set up as shown in
Table 2.20. Notice that the heights of the rectangles are proportional to the frequency densities
of the class intervals (see Table 2.20, column 5).
Table 2.20: Work table for computing the heights of rectangles of a histogram

If the height of the rectangle representing the second class interval is 3 cm, then
$$2.4c = 3 \implies c = \frac{3}{2.4} = 1.25.$$
The height of the rectangle which represents the third class interval is
4.8c cm = 4.8 × 1.25 cm = 6 cm.
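The arithmetic of this solution can be verified with a few lines of Python. The frequency densities 2.4 and 4.8 and the height of 3 cm are the values used in the solution; the variable names are only illustrative.

```python
# Frequency densities of the second and third class intervals, as used in the solution.
density_second = 2.4
density_third = 4.8
height_second_cm = 3.0

# Each height is c times the frequency density, so c follows from the known height.
c = height_second_cm / density_second
height_third_cm = c * density_third

print(f"c = {c:.2f}")
print(f"height of third rectangle = {height_third_cm:.1f} cm")
```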
Drawing a histogram
1. When drawing a histogram, suitable scales must be chosen for both the vertical and
horizontal axes. Scales like “2 cm to 5 units” or “2 cm to 10 units” are the best. Avoid using
scales like “2 cm to 3 units” or “2 cm to 7 units”.
2. Label the axes.
3. Give your graph a title.

Cumulative frequency curve

A graph obtained by plotting a cumulative frequency against the upper class boundary and
joining the points by a smooth curve, is called a cumulative frequency curve. The following
example illustrates an application of a cumulative frequency curve.
Example 2.5
Table 2.21 shows the frequency distribution of the body masses of 50 AIDS patients.
(a) Construct a cumulative frequency curve to represent the data.
(b) Use your cumulative frequency curve to estimate the number of patients whose body
masses are: (i) less than 65 kg, (ii) at least 75 kg.

Table 2.21: Body masses of 50 AIDS patients

Solution
(a) Table 2.22 gives the cumulative frequency distribution of the data in Table 2.21.

Table 2.22: Cumulative frequency distribution of the data in Table 2.21

Notice that a class with frequency zero is added before the first class. It can be seen that the
last cumulative frequency is equal to the total number of observations, a check on the accuracy
of our calculation. The corresponding cumulative frequency curve is shown in Fig. 2.2 on
page 40. The curve is obtained by marking the upper class boundary on the horizontal axis
and the cumulative frequencies on the vertical axis. All the points are joined by a smooth
curve.

Fig. 2.2: Cumulative frequency curve of the data in Table 2.21
(b) (i) Since the body masses of the patients are recorded to the nearest integer, body masses
less than 65 kg consist of all body masses less than 64.5 kg. Therefore, to estimate the number
of patients whose body masses are less than 65 kg, we obtain the cumulative frequency which
corresponds to the point 64.5 kg on the horizontal axis. From Fig. 2.2, we find that 33 patients
have body masses less than 65 kg.
(ii) To estimate the number of patients whose body masses are at least 75 kg, we first estimate
the number of patients whose body masses are less than 75 kg. Now, the upper boundary of
the interval “less than 75” is 74.5. From Fig. 2.2, the cumulative frequency which corresponds
to the point 74.5 kg on the horizontal axis, is 44. It follows that 44 patients have body masses
less than 75 kg. Thus, the number of patients whose body masses are at least 75 kg is (50 –
44) = 6.
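Readings of this kind can be approximated numerically. The sketch below replaces the smooth hand-drawn curve with straight-line interpolation between the plotted points, which is a simplification and not the method used in the solution. The upper class boundaries and cumulative frequencies are hypothetical (they are not the values of Table 2.22), so the results differ from the 33 and 6 read from Fig. 2.2.

```python
from bisect import bisect_left

# Hypothetical upper class boundaries (kg) and cumulative frequencies for 50 patients.
# The first point is the extra class with frequency zero added before the first class.
upper_boundaries = [39.5, 49.5, 59.5, 69.5, 79.5, 89.5]
cumulative = [0, 6, 20, 38, 47, 50]

def count_less_than(x):
    """Estimate the cumulative frequency at x by straight-line interpolation."""
    if x <= upper_boundaries[0]:
        return 0.0
    if x >= upper_boundaries[-1]:
        return float(cumulative[-1])
    j = bisect_left(upper_boundaries, x)      # index of the first boundary >= x
    x0, x1 = upper_boundaries[j - 1], upper_boundaries[j]
    y0, y1 = cumulative[j - 1], cumulative[j]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

# Masses are recorded to the nearest kg, so "less than 65 kg" is read at 64.5
# and "at least 75 kg" is the total minus the reading at 74.5.
below_65 = count_less_than(64.5)
at_least_75 = cumulative[-1] - count_less_than(74.5)

print("estimated number below 65 kg:   ", below_65)
print("estimated number at least 75 kg:", at_least_75)
```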

Frequency polygon
A grouped frequency table can also be represented by a frequency polygon, which is a special
kind of line graph. To construct a frequency polygon, we plot a graph of class frequencies
against the corresponding class mid-points and join successive points with straight lines. Fig.
2.3, on the next page, shows the frequency polygon for the data in Table 2.16, on page 33.

Fig. 2.3: Frequency polygon of the data in Table 2.16
Notice that the polygon is brought down to the horizontal axis at each end, at the points that would
be the mid-points of additional class intervals at each end of the corresponding
histogram. When the class intervals are of equal width, this makes the area under a frequency
polygon equal to the area under the corresponding histogram.
Fig. 2.4 shows the frequency polygon of Fig. 2.3 superimposed on the corresponding
histogram. This figure allows us to see, for the same set of data, the relationship between the
two graphic forms.

Fig. 2.4: Histogram and frequency polygon of the data in Table 2.16

Stem-and-leaf plot
A stem-and-leaf plot is a graphical device that is useful for representing a relatively small set
of data which takes numerical values. To construct a stem-and-leaf plot, we partition each
measurement into two parts. The first part is called the stem, and the second part is called the
leaf. The stem consists of one or more of the initial digits of the measurement, and the leaf
consists of one or more of the remaining digits. The stems
form an ordered column with the smallest stem at the top and the largest at the bottom. The
stems are separated from their leaves by a vertical line. We include in the stem column all
stems within the range of the data even when a measurement with that stem is not in the data
set. The rows of a stem-and-leaf plot contain the leaves, ordered and listed to the right of their
respective stems. When leaves consist of more than one digit, all digits after the first may be
omitted. Decimals, when present in the original data, are omitted in a stem-and-leaf plot.

A stem-and-leaf plot conveys information similar to that of a histogram. Turned on its side, it has
the same shape as the histogram. In fact, since the stem-and-leaf plot shows each observation,
it displays information that is lost in a histogram. A properly constructed stem-and-leaf plot,
like a histogram, provides information regarding the range of the data set, shows the location
of the highest concentration of measurements, and reveals the presence or absence of
symmetry. An advantage of a stem-and-leaf plot over a histogram, is the fact that it preserves
the information contained in the individual measurements. Such information is lost when we
construct a grouped frequency table. Another advantage of a stem-and-leaf plot is that it can
be constructed during the tallying process, so the intermediate step of preparing an ordered
array is eliminated.

Stem-and-leaf plots are useful for quick portrayal of a small data set. As the sample size
increases, you can accommodate the increase in leaves by splitting the stems. For instance,
you can list each stem twice, putting leaves of 0 to 4 on one line and leaves of 5 to 9 on
another. When a number has several digits, it is simplest for graphical portrayal to drop the
last digit or two. For instance, for a stem-and-leaf plot of annual income in thousands of
Ghana cedis, a value of GH¢27.1 thousand has a stem of 2 and a leaf of 7 and a value of GH¢106.4
thousand has a stem of 10 and a leaf of 6.
Example 2.6
The following are the marks scored by 30 candidates in an English test. Construct a stem-
and-leaf plot for the data.

Solution
Since all the measurements are two-digit numbers, we will have one-digit stems and one-digit
leaves. For example, the mark 85 has a stem of 8 and a leaf of 5. Fig. 2.5 is the required stem-
and-leaf plot. The four numbers in the first row represent 52, 53, 56 and 56.

Fig. 2.5: Stem-and-leaf plot of the data in Example 2.6
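A stem-and-leaf plot for two-digit data can be produced with a short Python sketch. The marks below are hypothetical: the first four (52, 53, 56, 56) echo the first row described in the solution, while the rest are invented and are not the data of Example 2.6.

```python
from collections import defaultdict

# Hypothetical marks; only 52, 53, 56, 56 are taken from the text.
marks = [52, 53, 56, 56, 61, 64, 68, 70, 73, 73, 75, 79, 82, 85, 88, 91, 94]

def stem_and_leaf(values):
    """Return a list of (stem, leaves) rows, stems ascending, leaves ordered."""
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)    # leading digit(s) form the stem, units the leaf
        rows[stem].append(leaf)
    # Include every stem within the range of the data, even one with no leaves.
    return [(s, rows.get(s, [])) for s in range(min(rows), max(rows) + 1)]

plot = stem_and_leaf(marks)
for stem, leaves in plot:
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Because every leaf is printed, the individual marks can be read back from the plot, which is the advantage over a grouped frequency table noted earlier.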

Bar chart
A bar chart is a diagram consisting of a series of horizontal or vertical bars of equal width.
The bars represent various categories of the data. There are three types of bar charts, and these
are simple bar charts, component bar charts and grouped bar charts.
(i) Simple bar chart
In a simple bar chart, the height (or length) of each bar is equal to the frequency it represents.
Example 2.7
Table 2.23 gives the production of timber in five districts of Ghana in a certain year. Draw a
bar chart to illustrate the data. The bars are separated to emphasize that the variable is
qualitative rather than quantitative.

Table 2.23: Production of timber in 5 districts in Ghana

Solution
Fig. 2.6, on the next page, represents the required bar chart. Notice that the bars are of equal
width and the distances between them are equal. Since district is a nominal variable, there is
no particular natural order for the bars.

Fig. 2.6: A simple bar chart for the data in Table 2.23
(ii) Component bar chart
In a component bar chart, the bar for each category is subdivided into component parts; hence
its name. Component bar charts are therefore used to show the division of items into
components. This is illustrated in the following example.

Example 2.8
Table 2.24 shows the distribution of sales of agricultural produce from Asiedu Farm in 1995,
1996 and 1997. Illustrate the information with a component bar chart.

Table 2.24: Sales of agricultural produce from Asiedu Farm

Solution
Fig. 2.7, on the next page, shows a component bar chart for the data. The sales of agricultural
produce consist of three components: the sales of coffee, cocoa, and palm oil. The component
bar chart shows the changes of each component over the years as well as the comparison of
the total sales between different years.

Fig. 2.7: A component bar chart of the data in Table 2.24

(iii) Grouped bar chart
For a grouped bar chart, the components are grouped together and drawn side by side. We
illustrate this with the following example.
Example 2.9
Illustrate the data in Table 2.24 with a grouped bar chart.
Solution
Fig. 2.8 shows the required grouped bar chart.

Fig. 2.8: A grouped bar chart of the data in Table 2.24
Pie Charts
A pie chart is a circular graph divided into sectors, each sector representing a different value
or category. The angle of each sector of a pie chart is proportional to the value of the part of
the data it represents. The bar chart is more precise than the pie chart for visual comparison
of categories with similar relative frequencies.
The following are the steps for constructing a pie chart
(1) Find the sum of the category values.
(2) Calculate the angle of the sector for each category, using the following result:
$$\text{angle of the sector for category } A = \frac{\text{value of category } A}{\text{sum of category values}} \times 360^\circ$$
(3) Construct a circle and mark the centre.
(4) Use a protractor to divide the circle into sectors, using the angles obtained in step 2.
(5) Label each sector clearly.

Example 2.10
A housewife spent the following sums of money on buying ingredients for a family Christmas
cake in 2020.
Flour .................................... GH¢24
Margarine ............................ GH¢96
Sugar .................................... GH¢18
Eggs ..................................... GH¢60
Baking powder..................... GH¢12
Miscellaneous ...................... GH¢30
Represent the above information on a pie chart.

Solution
The angles of the sectors are calculated as shown in Table 2.25. Fig. 2.9, on the next page,
shows the required pie chart.

Table 2.25: Work table for computing the angles of the sectors of a pie chart
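The sector angles can be computed as in the following Python sketch, which applies steps (1) and (2) to the amounts given in Example 2.10. The variable names are only illustrative.

```python
# Amounts in GH¢ from Example 2.10.
spending = {
    "Flour": 24,
    "Margarine": 96,
    "Sugar": 18,
    "Eggs": 60,
    "Baking powder": 12,
    "Miscellaneous": 30,
}

# Step (1): sum of the category values.
total = sum(spending.values())

# Step (2): angle = (value of category / sum of category values) x 360 degrees.
angles = {item: amount * 360 / total for item, amount in spending.items()}

print(f"Total spent: GH¢{total}")
for item, angle in angles.items():
    print(f"{item:<14} {angle:6.1f} degrees")
```

As a check, the angles must add up to 360 degrees.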

Fig. 2.9: A pie chart of the data in Example 2.10

Applications of Statistics
Statistics and Sociology
Sociology is one of the social sciences aiming to discover the basic structure of human
society, to identify the main forces that hold groups together or weaken them and to learn the
conditions that transform social life. It highlights and illuminates aspects of social life that
otherwise might be only obscurely recognized and understood. The sociologist may be called
upon for help with a special problem such as social conflict, urban plight or the war on poverty
or crime. His practical contribution lies in the ability to clarify the underlying nature of
social problems, to estimate more exactly their dimensions and to identify aspects that seem
most amenable to remedy with the knowledge and skills at hand. He naturally turns to
sociological research, which is the purposeful effort to learn more about society than one can
in the ordinary course of living. Keeping the problem in view, he sets forth his objectives,
collects materials or data, and uses statistical techniques and the knowledge and theory already
established on similar topics to achieve his objectives. Statistical data and statistical
methods are therefore indispensable for sociological research. There has recently been a growing
emphasis on social survey methods or research methodology in all faculties of arts.
Sociologists seek the help of statistical tools to study cultural change in society, family
patterns, prostitution, crime, marriage systems etc. They also study statistically the relation
between prostitution and poverty, crime and poverty, drunkenness and crime, illiteracy and
crime etc. Thus, statistics is of immense use in various sociological studies.
Statistics and Government
The functions of a government are varied and complex. Various departments in the state are
required to collect and record statistical data in a systematic manner for effective
administration. Data pertaining to various fields, namely population, natural resources,
production (both agricultural and industrial), finance, trade, exports and imports, prices, labour,
transport and communication, health, education, defence, crime etc., are the most fundamental
requirements of the state for its administration. It is only on the basis of such data that the
government decides on the priority areas, gives more attention to them through target-oriented
programmes and studies the impact of the programmes as a guide for the future.

Statistics and Planning

The modern age is an age of planning, and statistics are indispensable for planning. According to
Tippett, planning, to a greater or lesser degree according to the government in power, is the order
of the day, and without statistics planning is inconceivable. Proper planning can be made only
on the basis of a correct assessment of the various resources, both human and material, of the
country. A study of data relating to population, agriculture, industry, prices, employment,
health and education enables the planners to fix time-bound targets on the social and economic
fronts. Such economic and social programmes are also evaluated at different stages, by means of
related data gathered continuously and systematically, to decide whether the
programmes are on course towards the goals or targets set.
Statistics and Economics
In the field of economics it is almost impossible to think of a problem which does not require
an extensive use of statistical data. Most of the laws in economics are based on a study of a
large number of units and their analysis is enabled by statistical data and the statistical
methods. The important economic aspects like production, consumption, exchange and
distribution are described, compared and correlated with the aid of statistical tools. By a
statistical study of time series on prices, sales and production, one can study their trends,
fluctuations and the underlying causes. Thus, statistics is indispensable in economic
analysis.
DATA CLASSIFICATION
(a) Primary vs. Secondary Data
Data can be collected in two different ways. One way is to collect data directly from the
respondent. The person who answers the questions of the investigator is called the respondent.
Statistical information thus collected is called primary data and the source of such information
is called a primary source. This data is original because it is collected for the first time by the
investigator himself. For example, if the investigator collects the information about the
salaries of National Institute of Open Schooling employees by approaching them, then it is
primary data for him. Another way is to adopt the data already collected by someone else.

The investigator only adopts the data. Statistical information thus obtained is called secondary
data. The source of such information is called a secondary source. For example, if the
investigator collects the information about the salaries of employees of National Institute of
Open Schooling from the salary register maintained by its accounts branch, then it is
secondary data for him.

(b) Methods for collecting primary data

There are several methods for collecting primary data. Some of these are:
1. Direct personal interview: In this method the investigator (also called the interviewer) has to be
face-to-face with the person from whom he wants information. The person from whom this
information is collected is called the respondent.
2. Indirect oral investigation: Under this method data are collected through indirect sources.
Questions relating to the inquiry are put to different persons and their
answers are recorded. This method is most suitable when the person from whom the
information is sought is either unavailable or unwilling.
3. Questionnaire method: In this method a list of questions, called a questionnaire, is prepared
and either sent to respondents by post or given to them personally. This method is
suitable where the field of inquiry is wide.
There are some advantages of using primary data.
The investigator can collect the data according to his requirements. It is reliable and sufficient
for the purpose of the investigation. However, it also has disadvantages: it involves
a lot of cost in terms of money, time and energy. This makes it unsuitable when the field of enquiry
is very large. Many times, with some modifications, the same purpose may be served by using
data collected by other persons or agencies.

(c) Sources of secondary data: As already discussed, secondary data are not collected by the
investigator himself but are obtained by him from other sources. Broadly, there are two
sources:
(a) Published data and
(b) Unpublished data.
I. Published Sources: There are certain agencies which collect the data and publish them in
the form of either regular journals or reports. These agencies/sources are known as published
sources of data.
Sources of secondary macro-economic data in Nigeria include
(i) Governmental Agencies: National Bureau of Statistics (NBS), Central Bank of
Nigeria (CBN), National Agricultural Extension and Research Liaison Services
(NAERLS) etc.;
(ii) International Agencies: The World Bank, IMF, FAO, IFDC etc.; and several
non-governmental organisations.
II. Unpublished Sources: Secondary data are also available from unpublished sources,
because not all statistical data are published. For example, information recorded in
various government and private offices, studies made by research scholars etc. can be
important sources of secondary data.

ORGANISING AND CONDENSING DATA

Suppose a psychological investigator wants to analyse the marks obtained by 40 students in
a class. He collects the data and finds that the marks obtained by the 40 students are:
20 25 28 27 34 31 30 32 33 40
43 43 40 43 42 43 42 45 43 47
48 46 47 48 46 49 58 54 56 50
53 51 39 38 36 38 35 35 37 40
Put yourself in the position of the investigator. Which aspects of these data would interest you?
Perhaps you would want to know the highest mark obtained by any student. You
may also want to know the lowest mark obtained by a student. Another point of
interest may be the level around which most of the students' marks lie.

The above data are unorganized. To refine these data for comparison and analysis, they should be
arranged in an orderly sequence or into groups on the basis of some similarity. This
process of arranging and grouping the data into a meaningful arrangement is a first step
towards analysis of the data. Data can be arranged in two forms:
(a) Arrays and
(b) Frequency distributions.

(a) Arrays
A simple array is one method of presenting an individual series. An orderly arrangement
of raw data is called an ‘array’. Arrays are of two types:
(i) Simple array, and
(ii) Frequency array.

(i) Simple Array: A simple array is an arrangement of data in ascending or descending order.
Let us construct the simple arrays of the data about the marks of 40 students. The data are
arranged in ascending order in table 6.1 and in descending order in table 6.2 (read down each
column in turn).

Table 6.1: Ascending Array of the Marks obtained by 40 students in class:

20 35 42 47
25 36 43 48
27 37 43 48
28 38 43 49
30 38 43 50
31 39 43 51
32 40 45 53
33 40 46 54
34 40 46 56
35 42 47 58

Table 6.2: Descending Array of the Marks obtained by 40 students in class.

58 47 42 35
56 46 40 34
54 46 40 33
53 45 40 32
51 43 39 31
50 43 38 30
49 43 38 28
48 43 37 27
48 43 36 25
47 42 35 20

The above arrays clearly reveal two things. First, the highest mark obtained
by any student is 58. Second, the lowest mark obtained by any student is 20.
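
The two arrays can also be produced mechanically. The following is a minimal Python sketch, assuming the 40 raw marks listed earlier; the variable names (`marks`, `ascending`, `descending`) are illustrative and not part of the original notes.

```python
# Raw marks of the 40 students, as listed at the start of this section.
marks = [
    20, 25, 28, 27, 34, 31, 30, 32, 33, 40,
    43, 43, 40, 43, 42, 43, 42, 45, 43, 47,
    48, 46, 47, 48, 46, 49, 58, 54, 56, 50,
    53, 51, 39, 38, 36, 38, 35, 35, 37, 40,
]

# A simple array is just the data sorted in ascending or descending order.
ascending = sorted(marks)
descending = sorted(marks, reverse=True)

# The ends of either array give the lowest and highest marks.
lowest = ascending[0]
highest = descending[0]

print("Ascending array: ", ascending)
print("Descending array:", descending)
print("Lowest mark:", lowest, "Highest mark:", highest)
```

Sorting once is enough to read off both extremes, which is exactly the information the arrays are said to reveal.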

Organising the data in the form of a simple array is convenient if the number of items is small. As
the number of items increases, the series becomes too long and unmanageable. As such, there
is a need to condense the data. Making a frequency array is one method of condensing data.

(ii) Frequency Array: A frequency array is a series formed on the basis of the frequency with
which each item is repeated in the series. The main steps in constructing a frequency array are:

1. Prepare a table with three columns: the first for the values of the items, the second for the
tally sheet and the third for the corresponding frequency. Frequency means the number of times
a value appears in a series. For example, in table 6.1 the mark 43 appears five times, so the
frequency of 43 is 5.
2. Put the items in the first column in ascending order in such a way that each item is recorded
once only.
3. Prepare the tally sheet in the second column, marking one bar for each item. Make blocks of
five tally bars to avoid mistakes in counting. Note that every fifth bar is drawn across the
previous four bars.
4. Count the tally bars and record the total number in the third column. This column represents
the frequencies of the corresponding items.

Let us now explain the construction of the frequency array of the marks obtained by the 40
students. In table 6.3 the marks are arranged in ascending order in the first column. This not
only helps to find the maximum and minimum values but also makes it easy to draw the bars.

Now, for each mark in the data, make one bar (/) in the second column against its value and cross that item off the data.
Table 6.3: Frequency array of marks obtained by 40 students.

Marks (X)   Tally Sheet   Frequency

20          /             1
25          /             1
27          /             1
28          /             1
30          /             1
31          /             1
32          /             1
33          /             1
34          /             1
35          //            2
36          /             1
37          /             1
38          //            2
39          /             1
40          ///           3
42          //            2
43          ////\         5
45          /             1
46          //            2
47          //            2
48          //            2
49          /             1
50          /             1
51          /             1
53          /             1
54          /             1
56          /             1
58          /             1
Total Frequency = 40
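
The tallying steps can be sketched in Python as well. This is only an illustration, assuming the same 40 raw marks; `Counter` from the standard library stands in for the hand-drawn tally sheet, and the names `marks` and `frequency` are chosen here for convenience.

```python
from collections import Counter

# Raw marks of the 40 students, as listed at the start of this section.
marks = [
    20, 25, 28, 27, 34, 31, 30, 32, 33, 40,
    43, 43, 40, 43, 42, 43, 42, 45, 43, 47,
    48, 46, 47, 48, 46, 49, 58, 54, 56, 50,
    53, 51, 39, 38, 36, 38, 35, 35, 37, 40,
]

# Counter records how many times each distinct mark occurs.
frequency = Counter(marks)

print(f"{'Marks (X)':<10}{'Tally':<8}{'Frequency'}")
for value in sorted(frequency):          # each value listed once, ascending
    count = frequency[value]
    tally = "/" * count                  # one bar per occurrence
    print(f"{value:<10}{tally:<8}{count}")

total = sum(frequency.values())
print("Total Frequency =", total)
```

The total of the frequency column should always equal the number of observations, which gives a quick check on the tally.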

The main limitation of a frequency array is that it does not show the characteristics
of a group. For example, it does not tell us how many students have obtained marks
between 40 and 45. It is therefore not possible to compare the characteristics of different groups.
This limitation is removed by the frequency distribution.

FREQUENCY DISTRIBUTION
Data in a frequency array are ungrouped. To group the data, we need to make a ‘frequency
distribution’. A frequency distribution classifies the data into groups. For example, it tells us
how many students have secured marks between 40 and 45.

Before constructing a frequency distribution, it is necessary to learn the following important
concepts (see tables 6.4 and 6.5):

1. Class: A class is a group of magnitudes bounded by two ends called class limits. For example,
20-25, 25-30 etc. or 20-24, 25-29 etc., as the case may be, each represent a class.
2. Class Limits: Every class has two boundaries or limits, called the lower limit (L1) and the upper
limit (L2). For example, in the class (20-30), L1 = 20 and L2 = 30.
3. Class Interval: The difference between the two limits of a class is called the class interval, also
called the class width. It is equal to the upper limit minus the lower limit: Class interval = L2 – L1.
For example, for the class (20-30) the class interval is 30 – 20 = 10.
4. Class Frequency: The total number of items falling in a class, that is, having values between
L1 and L2, is the class frequency. For example, in table 6.4 the class frequency of the class (40-45) is 10.
Similarly, the frequency of the class (50-55) is 4.
5. Mid-Point/Mid-Value (M.V.): The mid-value of a class, also called the mid-point, is obtained
by dividing the sum of the lower limit and the upper limit of the class by 2. It is the average
of the two limits of a class and falls exactly in the middle of the class:

M.V. = (L1 + L2) / 2

For example, the mid-value of the class (20-30) is (20 + 30) / 2 = 25.
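
The class interval and mid-value formulas can be checked with a few lines of Python. This is a minimal sketch; the function names `class_interval` and `mid_value` are illustrative and do not come from the original notes.

```python
def class_interval(lower, upper):
    """Class interval (class width): upper limit minus lower limit, L2 - L1."""
    return upper - lower


def mid_value(lower, upper):
    """Mid-value (mid-point) of a class: (L1 + L2) / 2."""
    return (lower + upper) / 2


# Worked example for the class (20-30): L1 = 20, L2 = 30.
print("Class interval:", class_interval(20, 30))  # 10
print("Mid-value:", mid_value(20, 30))            # 25.0
```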

Construction of Frequency Distribution

Frequency distributions can be constructed in many ways. We will explain here the
construction of the following types:
(a) Exclusive series
(b) Inclusive series
(c) Open-end classes
(d) Unequal classes
(e) Cumulative frequency
In constructing a frequency distribution, the same steps are followed as for the frequency
array. The only difference is that classes such as (20-25), (25-30), (30-35), …, (55-60) are
recorded in the first column in place of individual items such as 20, 25, …, 56, 58.
(a) Exclusive series: In this type one of the class limits (generally the upper limit L2) is excluded
while making the tally sheet. Any item whose value equals the upper limit of a class is
counted in the next class. For example, the class (20-25) contains all items with a value of 20
or more but less than 25.

An item with the value 25 is counted in the next class, (25-30). Using the same data as for the
frequency array and a class interval of 5, a frequency distribution of the exclusive type is
shown in table 6.4.
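
The exclusive counting rule can be sketched in Python as follows. This is an illustration under stated assumptions: it uses the 40 raw marks from earlier, classes of width 5 starting at 20, and the names `marks` and `exclusive` are chosen here for convenience.

```python
# Raw marks of the 40 students, as listed at the start of this section.
marks = [
    20, 25, 28, 27, 34, 31, 30, 32, 33, 40,
    43, 43, 40, 43, 42, 43, 42, 45, 43, 47,
    48, 46, 47, 48, 46, 49, 58, 54, 56, 50,
    53, 51, 39, 38, 36, 38, 35, 35, 37, 40,
]

WIDTH = 5  # class interval

# Exclusive type: a mark m belongs to (L1-L2) when L1 <= m < L2,
# so a mark equal to the upper limit falls in the next class.
exclusive = {}
for lower in range(20, 60, WIDTH):
    upper = lower + WIDTH
    exclusive[(lower, upper)] = sum(1 for m in marks if lower <= m < upper)

for (lower, upper), freq in exclusive.items():
    print(f"{lower}-{upper}: {freq}")
```

With these data the class (40-45) receives 10 items and the class (50-55) receives 4, in agreement with the class frequencies quoted above.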

(b) Inclusive Series: In this type the lower limit of each class is one more than the
upper limit of the previous class. Items with values equal to either the lower or the upper limit of a
class are counted in that same class, which is why such a frequency distribution is
called the inclusive type. For example, the class (20-24) includes both 20 and 24.
Similarly, the class (40-44) includes both 40 and 44. The following table
has been formed from the same data as used for the exclusive type.

Table 6.5: Construction of Frequency Distribution – “Inclusive Type”
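
The inclusive counting rule can be sketched in Python in the same way. This is only an illustration, assuming the 40 raw marks from earlier and classes (20-24), (25-29), …, (55-59); the names `marks` and `inclusive` are chosen here for convenience.

```python
# Raw marks of the 40 students, as listed at the start of this section.
marks = [
    20, 25, 28, 27, 34, 31, 30, 32, 33, 40,
    43, 43, 40, 43, 42, 43, 42, 45, 43, 47,
    48, 46, 47, 48, 46, 49, 58, 54, 56, 50,
    53, 51, 39, 38, 36, 38, 35, 35, 37, 40,
]

# Inclusive type: a mark m belongs to (L1-L2) when L1 <= m <= L2,
# and the next class starts at L2 + 1.
inclusive = {}
for lower in range(20, 60, 5):
    upper = lower + 4
    inclusive[(lower, upper)] = sum(1 for m in marks if lower <= m <= upper)

for (lower, upper), freq in inclusive.items():
    print(f"{lower}-{upper}: {freq}")
```

Because the marks are whole numbers, the class (20-24) of the inclusive type holds the same items as the class (20-25) of the exclusive type; only the way the limits are written differs.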

(c) Open-end Classes: An open-end frequency distribution is one that has at least one of its
ends open: either the lower limit of the first class, or the upper limit of the last class, or
both, are not given. In table 6.6 the first class (below 25) and the last class (55 and above)
are open-end classes.

Table 6.6: Open-end Classes Frequency Distribution
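
A distribution with open-end classes can be sketched in Python as follows. The two open classes (below 25, and 55 and above) come from the text; the interior classes of width 5 counted by the exclusive rule, and the helper name `open_end_label`, are assumptions made for this illustration.

```python
from collections import Counter

# Raw marks of the 40 students, as listed at the start of this section.
marks = [
    20, 25, 28, 27, 34, 31, 30, 32, 33, 40,
    43, 43, 40, 43, 42, 43, 42, 45, 43, 47,
    48, 46, 47, 48, 46, 49, 58, 54, 56, 50,
    53, 51, 39, 38, 36, 38, 35, 35, 37, 40,
]


def open_end_label(mark):
    """Return the class label for a mark (hypothetical helper)."""
    if mark < 25:
        return "Below 25"        # lower limit of the first class is open
    if mark >= 55:
        return "55 and above"    # upper limit of the last class is open
    lower = (mark // 5) * 5      # assumed interior classes: 25-30, ..., 50-55
    return f"{lower}-{lower + 5}"


open_end = Counter(open_end_label(m) for m in marks)

labels = ["Below 25"] + [f"{a}-{a + 5}" for a in range(25, 55, 5)] + ["55 and above"]
for label in labels:
    print(f"{label:<14}{open_end[label]}")
```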

(d) Unequal Classes: In a frequency distribution with unequal classes, the widths of the different
classes (i.e. L2 – L1) need not be the same. In table 6.7, the class (30-40) has width 10 while
the class (40-55) has width 15.

Table 6.7: Unequal Classes Frequency Distribution
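
Counting with unequal classes can be sketched in Python using class edges. Only the classes (30-40) and (40-55) are named in the text; the remaining edges (20 and 60) are hypothetical choices made for this illustration, and items are counted by the exclusive rule.

```python
from bisect import bisect_right

# Raw marks of the 40 students, as listed at the start of this section.
marks = [
    20, 25, 28, 27, 34, 31, 30, 32, 33, 40,
    43, 43, 40, 43, 42, 43, 42, 45, 43, 47,
    48, 46, 47, 48, 46, 49, 58, 54, 56, 50,
    53, 51, 39, 38, 36, 38, 35, 35, 37, 40,
]

# Assumed class edges: classes (20-30), (30-40), (40-55), (55-60).
edges = [20, 30, 40, 55, 60]

counts = [0] * (len(edges) - 1)
for m in marks:
    # bisect_right finds the class whose lower limit is <= m < upper limit.
    # All marks here lie in [20, 60), so the index is always valid.
    counts[bisect_right(edges, m) - 1] += 1

widths = [upper - lower for lower, upper in zip(edges, edges[1:])]

for lower, upper, width, freq in zip(edges, edges[1:], widths, counts):
    print(f"{lower}-{upper}  width {width:<3} frequency {freq}")
```

When widths differ, frequencies of different classes are not directly comparable, which is why the width is printed alongside each frequency.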

(e) Cumulative Frequency: A ‘Cumulative Frequency Distribution’ is formed by taking
successive totals of the given frequencies. This can be done in two ways:
(i) From above, such as 1, 4 (i.e. 1+3), 9 (i.e. 4+5), 16 (i.e. 9+7), and so on.
Such a distribution is called a ‘Less-than’ cumulative frequency distribution. It shows the total
number of observations (frequencies) having less than a particular value of the variable (here
marks). For example, there are 4 (i.e. 1+3) students who got marks less than 30, 9 (i.e. 4+5)
students who got marks less than 35, and so on. Table 6.8 gives the less-than cumulative
frequency distribution.

Table 6.8: ‘Less-than’ Cumulative Frequency Distribution
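
The successive totals "from above" can be sketched in Python with a running sum. The class frequencies 1, 3, 5, 7, 10, 8, 4, 2 are those quoted in the text for classes of width 5 from 20 to 60; the variable names are illustrative.

```python
from itertools import accumulate

# Upper limits of the classes (20-25), (25-30), ..., (55-60)
upper_limits = [25, 30, 35, 40, 45, 50, 55, 60]
# Class frequencies, in the same order
frequencies = [1, 3, 5, 7, 10, 8, 4, 2]

# Running total from the first class downwards
less_than = list(accumulate(frequencies))

for limit, cum in zip(upper_limits, less_than):
    print(f"Less than {limit}: {cum}")
```

The last cumulative frequency equals the total number of students, which is a useful check.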

(ii) From below, such as 2, 6 (i.e. 2+4), 14 (i.e. 6+8), 24 (i.e. 14+10), and so on. Such a
distribution is called a ‘More-than’ cumulative frequency distribution. It shows the total
number of observations (frequencies) having a particular value of the variable (here marks)
or more. For example, there are 6 (i.e. 2+4) students who got marks of 50 or more, 14 (i.e. 2
+ 4 + 8) students who got marks of 45 or more, etc. See table 6.9.

Table 6.9: ‘More-than’ Cumulative Frequency Distribution
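
The successive totals "from below" can be sketched in Python by accumulating the frequencies in reverse order. As before, the class frequencies are those quoted in the text and the variable names are illustrative.

```python
from itertools import accumulate

# Lower limits of the classes (20-25), (25-30), ..., (55-60)
lower_limits = [20, 25, 30, 35, 40, 45, 50, 55]
# Class frequencies, in the same order
frequencies = [1, 3, 5, 7, 10, 8, 4, 2]

# Running total from the last class upwards, then restored to class order
more_than = list(accumulate(reversed(frequencies)))[::-1]

for limit, cum in zip(lower_limits, more_than):
    print(f"{limit} or more: {cum}")
```

Each entry counts the students whose marks are at or above the lower limit of the class, so the first entry equals the total number of students.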

SELF REVIEW QUESTIONS
1. For each of the following variables, state whether it is quantitative or qualitative and specify
the measurement scale that is employed when taking measurements on each.
(a) gender of babies born in a hospital, (b) marital status,
(c) temperature measured on the Kelvin scale, (d) nationality,
(e) masses of babies in kg, (f) temperature in °C,
(g) prices of items in a shop, (h) position in an exam,
(i) the rank of an academic staff member in a university.
2. For each of the following situations, answer questions (a) through (d):
(a) What is the variable in the study? (b) What is the population?
(c) What is the sample size? (d) What measurement scale was used?
A. A study of 150 students from St. Ann School showed that 10% of the students had blood
group A.
B. A study of 100 patients admitted to St. Paul’s Hospital showed that 25 patients lived 8 km
from the hospital.
C. A study of 50 teachers in Town A showed that 5% of the teachers earn N8000.00 per
month.
3. A team of ornithologists is doing field research by using a mist net to capture migrating
birds. They collect the following information:
(a) Species, (b) Weight (c) Wing span (d) Condition, either poor, fair, good, or excellent,
(e) Band ID number, (f) Approximate age.
Indicate whether each of these is an attribute measure or a variable measure.
4. Explain what is meant by inferential statistics.
5. Define the following terms:
(a) population, (b) qualitative variable,
(c) discrete variable, (d) sample,
(e) continuous variable, (f) quantitative variable.
6. For each of the following, indicate whether it is a discrete or a continuous variable.
(a) The number of minutes it takes to read a page in this text.
(b) The number of chapters in the text.
(c) The weight of the text.
(d) The number of problems in the text.
(e) The number of times the letter e appears on a page.
(f) The length of a page in inches.
7. Suppose that the following information is obtained from Ms Ofosu on her application for
a home mortgage. For each of the following responses, indicate whether it is a continuous
variable and which type of measurement scale it represents.
(a) Place of residence: in Accra.
(b) Type of residence: Single family home.
(c) Date of birth: August 13, 1966.
(d) Projected monthly payments: GH¢2 479.
(e) Occupation: Director of Food and Drug Board.
(f) Employer: Methodist University.
(g) Number of years at Job: 10.
(h) Annual income: GH¢140 000.
(i) Amount of mortgage requested: GH¢220 000.
8. Which scale of measurement (nominal, ordinal, or interval) is most appropriate for:
(a) Attitude toward legalization of marijuana (favour, neutral, oppose).

(b) Gender (male, female).
(c) Number of children in a family (0, 1, 2, …).
(d) Political party affiliation (APP, PDP, CPP).
(e) Religious affiliation (Catholic, Jewish, Protestant, Muslim, Others).
(f) Political philosophy (very liberal, somewhat liberal, moderate, somewhat conservative,
very conservative).
(g) Years of school completed (0, 1, 2, 3, …).
(h) Highest degree attained (none, high school, bachelor’s, master’s, doctorate).
(i) Employment status (employed full time, employed part time, unemployed).
9. Give two reasons why it is sometimes necessary to take a sample from a population.
10. State two ways of obtaining primary data.
11. State two sources of secondary data.
12. State two advantages and two disadvantages of the lottery system for taking a simple
random sample from a population.
13. State two disadvantages and one advantage of telephone interview, as a means of
collecting data.
14. Briefly describe the difference between descriptive statistics and inferential statistics.
15. A doctor examined a patient to determine the cause of a disease. He took a drop of blood
and used it to determine the state of health of the patient. What aspect of statistics is the doctor
employing in order to form a judgement?
16. In your own words, explain and give an example of each of the following statistical terms:
(a) population, (b) sample.
17. Mrs. Akrong wants to check whether the pot of soup she is cooking has the right taste and
quantity of salt. She did this by tasting a small portion of the soup scooped in a ladle. What
aspect of statistics is she employing in order to form a judgement? Briefly explain why she
decided to use this particular method?
18. Explain the difference between qualitative and quantitative data. Give examples of
qualitative and quantitative data.
19. List the four levels of measurement and give examples.
20. Explain the difference between:
(a) nominal and ordinal data, (b) a census and a sample survey.
21. Clusters versus strata
(a) With a cluster random sample, do you take a sample of (i) the clusters? (ii) the subjects
within every cluster?
(b) With a stratified random sample, do you take a sample of (i) the strata? (ii) the subjects
within every stratum?
(c) Summarize the main differences between cluster sampling and stratified sampling in terms
of whether you sample the groups or sample from within the groups that form the clusters or
strata.
22. A class has 50 students. Use the column of the first two digits in the random number table
(Table 1.2) to select a simple random sample of three students. If the students are numbered
01 to 50, what are the numbers of the three students selected?
23. In cluster random sample with equal-sized clusters, every subject has the same chance of
selection. However, the sample is not a simple random sample. Explain why not.
24. The following are the ages of 30 patients seen in the emergency room of a hospital on a
Monday night. Construct a stem-and-leaf plot for the data.

25. The following table gives the ages (in years) of 60 cancer patients.

A histogram is drawn to represent this data. If the height of the rectangle representing the fifth
class interval is 2 cm, find the heights of the rectangles representing the first, second and the
third class intervals. Construct a histogram to represent the data.
26. The following table gives the distribution of the heights of 100 children, to the nearest
centimetre.

Draw a cumulative frequency curve for the data and use it to estimate:
(a) the number of children whose heights are between 142 cm and 152 cm (inclusive),
(b) the number of children whose heights are greater than 156 cm.
27. The following table gives the distribution of the marks scored by 40 students in an
examination.

Draw a cumulative frequency curve for the data and use it to estimate:
(a) the number of students who scored between 42% and 62% (inclusive),
(b) the least mark a student must score if he/she is to be placed in the top 25% of the class.
28. The following table gives the enrolments in primary and secondary schools in Kenya.

Illustrate the information with (a) a component bar chart, (b) a grouped bar chart.
29. The heights, in centimetres, of 30 boys are as follows:

Construct a stem-and-leaf plot of the data (for a boy whose height is 162 cm, record this as
a stem of 16 and a leaf of 2).

30. The following table shows the amount of rainfall in Asarekrom during the first five
months of 2006. Construct a bar chart to illustrate the data.

31. The following table gives the frequency distribution of the results of an examination taken
by students from two schools M and N. Construct a grouped bar chart to represent this
information.

32. The following information gives the proportion in which Yaro spends his annual salary.

(a) Construct a pie chart to illustrate the above information.

(b) If Yaro’s annual salary is GH¢1 800.00, calculate the amount he spends on food.
33. Distinguish between primary and secondary data. Describe the methods for collecting
primary data.
34. What is secondary data? Name some of its sources in Nigeria.
35. Distinguish between simple array and frequency array with examples.
36. On the basis of the following data about the wages of 20 workers in a factory, prepare a
frequency array; 450, 580,600, 480, 540, 620, 400, 475, 500, 480, 620, 480, 570, 600, 650,
410, 550, 600, 650, 450.
37. Explain the concept of ‘frequency distribution’. How is it different from a ‘frequency
array’?
38. On the basis of the data in question 36, prepare a frequency distribution by the exclusive method.
39. Distinguish between ‘exclusive method’ and ‘inclusive method’ of frequency distribution
with examples.
40. Write short notes on: (a) Open-end frequency distribution. (b) Frequency distribution with
unequal classes. (c) Cumulative frequency distribution.
