Introduction to Descriptive Statistics

Uploaded by

johnoladotun16

STA 111 DESCRIPTIVE STATISTICS

Course Content

Statistical Data: types, sources and methods of collection
Presentation of Data: tables, charts and graphs
Error and Approximation
Frequency and cumulative distributions
Measures of location, partition, dispersion, skewness and kurtosis
Rate, ratio and index numbers

STATISTICAL DATA: TYPE, SOURCE AND METHODS OF DATA COLLECTION

What is Statistics?
To a layman, Statistics is synonymous with figures, data, numbers or information. However, Statistics is
simply the science of collecting, organizing, presenting, analyzing and drawing conclusions from data. In
the plural sense, it refers to collections of numerical and non-numerical data, for example educational
statistics, economic statistics, health statistics, crime statistics, labour statistics, etc.

Uses of Statistics
1. To evaluate the existing condition
2. To provide information that can be useful in formulating plans for development programmes
3. To measure progress
4. To guide research
5. For decision making and forecasting

Basic Statistical Concepts

Some of the basic terms in Statistics are discussed below.
1. Data: The basic raw material for statistical investigation. Data are the values (measurements or
observations) that a variable can assume. Data form a basis for discussion and action. A collection of
data values forms a data set.
2. Experiment: Any study that can yield one or several outcomes.
3. Variable: A characteristic of interest which, when observed, takes different values on different units.
Put simply, it is the characteristic of interest being measured.
4. Population: All the individuals, units or entities of interest or under discussion. In other words, it
consists of all the subjects (human or otherwise) being studied. A population can be finite or infinite.
i) Finite population: it is possible to list all the units or entities in the population. In other words, they
can be counted.
ii) Infinite population: it is impossible to list all the elements of the population.
5. Sample: A portion or part taken from a population, that is, a subset of the population.
6. Unit: Any individual member of a population.
7. Census: A complete enumeration of the characteristics of the population. It involves taking
observations from every unit of the population.
8. Survey: It involves taking observations from only a fraction of the population.
9. Parameter: A numerical value that describes a characteristic of a population. Note that a parameter
is a fixed constant, but its value is unknown in most cases.
10. Statistic: A single numerical value that describes a characteristic of a sample. It is derived from a
sample and may be used to estimate the value of a parameter. Note that the value of a statistic is known
once a sample has been taken, and it changes from sample to sample.
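
The difference between a parameter and a statistic can be seen in a small simulation. The sketch below is only an illustration: the population of 1,000 ages, the seed and the sample size of 50 are all made-up assumptions, not data from these notes.

```python
import random
import statistics

# Hypothetical finite population: ages of 1,000 people (made-up values).
random.seed(111)  # fixed seed so the run is repeatable
population = [random.randint(18, 60) for _ in range(1000)]

# Parameter: computed from every unit of the population (as in a census).
population_mean = statistics.mean(population)

# Statistic: computed from a sample, so its value changes from sample to sample.
sample_means = []
for _ in range(3):
    sample = random.sample(population, 50)  # a sample is a subset of the population
    sample_means.append(statistics.mean(sample))

print("Population mean (parameter):", population_mean)
print("Sample means (statistics):", sample_means)
```

The population mean is a single fixed number, while the three sample means will generally differ from it and from each other.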

Sources of Data
1. Primary source of data
2. Secondary source of data

1. Primary source of data: These are data used for the specific purpose for which they were collected. In
other words, they are data drawn from their original source. Examples: census, sample survey, records of
vital statistics (births, deaths, etc.) and related information.
Advantages of using a primary source of data
1. It constitutes the exact information being sought.
2. It is more reliable.
3. It contains more detailed information.
Disadvantages of using a primary source of data
1. It is expensive to collect, i.e. it is costly.
2. It is time-consuming and requires more professional personnel.
2. Secondary source of data: These are data made available by others. They can also be described as
data used for some purpose other than that for which they were collected, especially data from
administrative sources, e.g. CBN publications and NBS publications.
Advantages of using a secondary source of data
1. The required information is quickly gathered or collected.
2. It allows for timely results.
3. It is less expensive to collect.
Disadvantages of using a secondary source of data
1. It usually involves loss of detail.
2. It is less reliable.

Methods of data collection


1. Questionnaire: A set of printed and standardized questions designed to get information from
respondents on a given subject of research or administration, e.g. employment, admission, etc. A
questionnaire is impersonal and self-administered: respondents record their responses in the spaces
provided or select one of the options provided in the questionnaire. It may be delivered by hand or by
post.
Advantages of questionnaire
i. It is relatively cheap to administer over a very large sample.
ii. It has a wide coverage.
iii. It eliminates the biasing errors associated with interviews.
iv. It offers greater anonymity to the subject.
v. Anonymity enhances the reliability of the information supplied, especially when the subject of study
is sensitive.

Disadvantages of questionnaire

i. A reasonable level of literacy is required.
ii. It does not allow adequate probing into issues that may arise from the respondent's answers.
iii. The researcher is not able to clarify questions that are not understood by respondents, and their
answers may therefore be inadequate.
iv. A person other than the intended respondent may complete the form.
v. The response rate is lower than with the interview approach.
vi. There are many instances of uncompleted questionnaires.
vii. The researcher is unable to observe body gestures that may provide insight into the respondent's
reactions.

Design of Questionnaire
Things to note: In the design of a questionnaire the following points should be observed.
i. Questions should be simple, brief and unambiguous.
ii. Questions should allow printed answers to be ticked.
iii. Questions should be neither too personal nor irrelevant.
iv. Leading questions should not be asked.
v. The questionnaire should be designed so that the questions fall into a logical sequence, to enable the
respondent to understand its purpose and to improve the quality of the answers.
2. Interview Method
An interview involves face-to-face interaction in which the respondent is asked a set of printed and
standardized questions designed to get information on a given subject of research or administration.
The main difference between a questionnaire and an interview is that the former is self-administered
while the latter is completed by the interviewer as the respondent verbalizes responses. Interviews may
also be conducted by telephone. Furthermore, interviews may be recorded in audio or video format.
Advantages of interview
i. It is flexible because the interviewer can clarify questions or probe responses.
ii. A higher response rate is obtained through this technique.
iii. It can be conducted with both literate and non-literate samples.
iv. Non-verbal behavior of the respondent can be observed, and this can help the interviewer to
determine the accuracy of responses and the reactions of the respondent.
v. Only respondents who are intended are actually interviewed.
vi. Most questions are answered by the respondent because of the presence and clarification of the
interviewer.
Disadvantages of interview
i. High cost of administration: interviewers have to be paid.
ii. It takes a longer time to administer.
iii. The presence of an interviewer undermines the anonymity of the respondent and leads to both
interviewer and respondent bias.
iv. Interviewer errors associated with reading a question and recording a response.

3. Observation Method
This entails direct contact between the researcher and the respondent or event under investigation. It
involves systematic observation of events, activities or behavior as they occur. Observation can be
classified into participant and non-participant.
* Participant observation refers to the involvement or participation of a researcher in the event or
activities he or she is investigating.
* Non-participant observers are present at the event or activities but do not get directly involved.
Advantages of observation method
i. More comprehensive information can be obtained.
ii. The context for behavior can be observed and a better understanding achieved.
iii. Behavior and activities are observed and recorded as they occur, thereby reducing errors
associated with forgetfulness.
iv. Both literate and non-literate samples, as well as events such as traditional and religious rites, can be
studied in their natural settings.
Disadvantages of observation method
i. The presence of the researcher may encourage the people being studied to distort their activities or
behavior.
ii. Access to the event or to the site of the activities and behavior may be difficult.
iii. Information from observation is difficult to organize and analyze.
iv. The objectivity of the observation may be impaired.
v. The researcher has very little control over when or how the event or behavior of interest may take
place.
4. Documentary Method
Documentary sources are used to get information when it is unnecessary to interview. All that is
needed is to go to the archives (library and administrative office) and collect the data. Data collected in
this way are usually from secondary sources.

PRESENTATION OF DATA
Bar Chart
There is no fixed set of rules for drawing bar charts, but the following considerations are useful.
Note: A bar chart is applicable only to discrete, categorical, nominal and ordinal data.
i. Bars should be neither too short nor very long and narrow.
ii. Bars should be separated by spaces which are about one and a half of the width of a bar.
iii. The length of each bar should be proportional to the frequency of its category.
iv. Guide notes should be provided to ease the reading of the chart.
Bar charts are used for making comparisons among categories. In the simplest form, several
items are presented graphically by horizontal or vertical bars of uniform width, with lengths
proportional to the values they represent.

Simple Bar chart


Example
In a study of 267 students, 165 were male and the rest were female. Show the information in a
frequency distribution table and draw a simple bar chart.

Solution

Sex      Frequency
Male     165
Female   102
Total    267

[Figure: simple bar chart of frequency (vertical axis, 0 to 200) against Sex (Male, Female)]
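
As a rough sketch of how the frequency table above could be built from raw observations, the following counts each category and prints a text bar chart. The list of observations is reconstructed from the stated counts, and the scale of one symbol per 5 students is an arbitrary assumption made only to keep the bars short.

```python
from collections import Counter

# Raw observations reconstructed from the example: 165 males and 102 females.
observations = ["Male"] * 165 + ["Female"] * 102

# Frequency distribution: count how often each category occurs.
frequency = Counter(observations)
total = sum(frequency.values())

# Text bar chart: bar length proportional to frequency.
SCALE = 5  # assumed scale: one '#' for every 5 students
for category, count in frequency.items():
    bar = "#" * round(count / SCALE)
    print(f"{category:<7}{count:>4}  {bar}")
print(f"{'Total':<7}{total:>4}")
```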

Multiple Bar Charts

These charts enable comparisons of more than one variable to be made at the same time. For
example, one could go further by considering age and sex together.
Example
Using the information from the frequency table below, draw a multiple bar chart.

Age group   Male   Female   Total
21 – 30       44       49      93
31 – 40       75       33     108
41 – 50       40       17      57
51 – 60        5        3       8
Above 60       1        0       1
Total        165      102     267

Graph of Multiple Bar Chart

[Figure: multiple bar chart of frequency (vertical axis, 0 to 80) against Age (21-30, 31-40, 41-50, 51-60, above 60), with separate bars for Male and Female]
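
Before drawing a multiple bar chart it can help to check that the table is internally consistent. The sketch below copies the age group by sex frequencies from the table above and recomputes the row and column totals; the dictionary layout is just one convenient assumption about how to hold the data.

```python
# Frequencies copied from the age group by sex table: (male, female).
table = {
    "21-30": (44, 49),
    "31-40": (75, 33),
    "41-50": (40, 17),
    "51-60": (5, 3),
    "above 60": (1, 0),
}

# Row totals: the total for each age group.
row_totals = {age: male + female for age, (male, female) in table.items()}

# Column totals: the overall count for each sex.
male_total = sum(male for male, _ in table.values())
female_total = sum(female for _, female in table.values())

# In a multiple bar chart each age group gets two side-by-side bars,
# one with the male height and one with the female height.
for age, (male, female) in table.items():
    print(f"{age:<9} Male={male:<3} Female={female:<3} Total={row_totals[age]}")
print("Totals:", male_total, female_total, male_total + female_total)
```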

Component Bar Chart

Similarly, component bar charts enable comparisons of more than one variable to be made at the same
time. A component bar chart displays a total magnitude for each category, with each bar divided into
sections that represent the components making up that total.

Example

Age group   Male   Female   Total
21 – 30       44       49      93
31 – 40       75       33     108
41 – 50       40       17      57
51 – 60        5        3       8
Above 60       1        0       1
Total        165      102     267

[Figure: component bar chart of frequency (vertical axis, 0 to 120) against Age (21-30, 31-40, 41-50, 51-60, above 60), with each bar divided into Male and Female sections]

Pie Chart

A pie chart (or a circle graph) is a circular chart divided into sectors, illustrating relative
magnitudes or frequencies. In a pie chart, the arc length of each sector (and consequently its
central angle and area), is proportional to the quantity it represents. Together, the sectors create a
full disk. It is named for its resemblance to a pie which has been sliced.

A pie chart gives an immediate visual idea of the relative sizes of the shares as a whole. It is a
good method of representation if one wishes to compare a part of a group with the whole group.
You could use a pie chart to show sex of respondents in a given study, market share for different
brands or different types of sandwiches sold by a store.

In order to draw a pie chart, you must have data for which you need to show the proportion of
each category as a part of the whole. Then the process is as below.

1. Collect the data so the number per category can be counted. In other words, decide on the data
that you wish to represent and collect it all together in a format that shows shares of the whole.
2. Decide on a clear title. The title should be a brief description of the data you wish to show. For
example, if you wish to show the sex of the respondents you could call the pie chart ‘sex of the
respondents in the study’.
3. Determine the total number of responses and the number of categories. Here the number of
categories is two (male and female).
4. Calculate the angle (in degrees) for each category.

As an example, here is the calculation of the degree share for the sex of the respondents in a
given study.

Sex of the respondent   Frequency
Male                    165
Female                  102
Total                   267

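\[ \text{Angle of Male} = \frac{\text{Number of Male}}{\text{Total Number}} \times 360^\circ = \frac{165}{267} \times 360^\circ \approx 222.5^\circ \]

\[ \text{Angle of Female} = \frac{\text{Number of Female}}{\text{Total Number}} \times 360^\circ = \frac{102}{267} \times 360^\circ \approx 137.5^\circ \]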

[Figure: pie chart of Sex, with sectors Male = 222.5 deg. and Female = 137.5 deg.]
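
The angle calculation above can be sketched in a few lines. The code only assumes the two frequencies from the example; each angle is the category's share of the total multiplied by 360 degrees.

```python
frequencies = {"Male": 165, "Female": 102}
total = sum(frequencies.values())

# Each sector's central angle is its share of the total times 360 degrees.
angles = {sex: count / total * 360 for sex, count in frequencies.items()}

for sex, angle in angles.items():
    print(f"{sex}: {angle:.1f} degrees")
```

The angles should always add up to 360 degrees, which is a quick check on the arithmetic.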

Histogram
This is the most widely used graphical presentation of a frequency distribution. The histogram is a
development of the simple bar chart, with the following differences:

Note: A histogram is applicable only to continuous data, such as height, weight and so on. In a
histogram the bars have to touch each other, unlike in a bar chart.

1. The area (A) of each rectangular bar, not its height, is proportional to the frequency of the class.
That is, A = width × height = frequency. Only when the class intervals are equal is the height itself
proportional to the frequency.
2. Each rectangular bar is constructed to cover the class it represents, without gaps.
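
To illustrate the area rule, the sketch below computes bar heights for classes of unequal width. The class limits and frequencies are hypothetical values invented for the illustration; the only thing taken from the notes is the relation area = width × height = frequency.

```python
# Hypothetical grouped data with unequal class widths: (lower, upper, frequency).
classes = [
    (0, 5, 20),
    (5, 10, 30),
    (10, 20, 30),
    (20, 40, 20),
]

# Height = frequency / width, so that area = width * height = frequency.
heights = [freq / (upper - lower) for lower, upper, freq in classes]

for (lower, upper, freq), height in zip(classes, heights):
    print(f"class {lower}-{upper}: frequency={freq}, height={height}")
```

Note that the second and third classes have the same frequency but different heights, because the third class is twice as wide.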


Exercises
The data below come from a survey of physiotherapists in Nigeria, who were asked questions about
patients with osteoarthritis of the knee. The questions asked were:
What is your age group and sex?
For how long have you been practicing physiotherapy?
In a typical week, how many patients do you see?
On average, about how many minutes do you spend treating a patient?

1. Create simple bar charts for age group and for sex.
2. Create a multiple bar chart for age group with the bars divided by sex.
3. Create a component bar chart for age group with sex as the two components.
4. For years of practice, suggest why we did not draw a bar chart.
5. Create a pie chart for the age group of the physiotherapists.
6. Create a pie chart for the sex of the physiotherapists.
7. Create a histogram for years of practice.
8. Create a histogram for Typical (patients seen in a typical week).
9. Create a histogram for Ave. (average minutes spent treating a patient).
10. For sex, suggest why we did not draw a histogram.

S/No Age group Sex Years of practice Typical Ave.
1 31-40 Female 4 2 30
2 31-40 Male 14 20 45
3 21-30 Female 8 3 45
4 21-30 Male 3 5 55
5 31-40 Female 10 25 25
6 31-40 Male 10 15 30
7 31-40 Female 9 30 30
8 21-30 Female 2 150 20
9 31-40 Female 2 100 15
10 41-50 Male 17 40 45
11 21-30 Male 5 40 20
12 41-50 Male 17 15 15
13 21-30 Male 3 55 30
14 31-40 Male 11 20 20
15 31-40 Female 10 25 20
16 31-40 Male 3 15 60
17 31-40 Male 14 10 40
18 21-30 Female 2 9 45
19 51-60 Female 29 12 45
20 31-40 Male 6 10 45
21 21-30 Female 4 50 30
22 21-30 Female 5 12 35
23 21-30 Female . 30 15
24 31-40 Female 18 50 40
25 31-40 Male 5 20 45
26 21-30 Male 5 15 30
27 31-40 Male 10 1 30
28 41-50 Male 13 10 45
29 21-30 Female 2 20 15
30 31-40 Male 7 22 30
31 31-40 Female 13 40 30
32 41-50 Male 22 40 40
33 21-30 Female 8 75 20
34 31-40 Male 9 5 20
35 31-40 Male 7 30 30
36 41-50 Male 20 13 30
37 31-40 Male 5 200 40
38 41-50 Female 24 10 20
39 31-40 Male 3 30 45
40 41-50 Female 16 30 15
41 21-30 Male 3 60 45

42 31-40 Female 11 5 20
43 31-40 Male 7 25 30
44 51-60 Male 25 3 30
45 21-30 Female 4 20 25
46 21-30 Female 3 30 30
47 31-40 Female 11 30 30
48 21-30 Male 3 4 30
49 21-30 Male 5 60 30
50 31-40 Female 16 92 60
51 21-30 Female 7 45 30
52 21-30 Male 3 10 20
53 21-30 Female 4 5 30
54 41-50 Male 16 7 30
55 31-40 Male 10 225 25
56 41-50 Male 17 40 60
57 31-40 Male 15 40 25
58 21-30 Male 2 15 40
59 21-30 Female 1 7 80
60 21-30 Female 2 2 180

ERROR AND APPROXIMATIONS
Meaning of Error
In numerical computation, an error is the difference between the true value and the approximate
(calculated) value.
Error = True Value - Approximate Value
Since the true value is often unknown, error analysis helps us understand how accurate our
approximations are.

Types of Errors
1. Absolute Error: The magnitude of the difference between true and approximate value.
Absolute Error = |True Value - Approximate Value |
Example:
True value = 25.83, Approx = 25.80 → Absolute error = 0.03

2. Relative Error: Error compared to the true value.
Relative Error = Absolute Error / True Value
It shows how large the error is in proportion to the true value.

3. Percentage Error
Percentage Error = Relative Error × 100 %
Example: relative error = 0.02 → percentage error = 2%
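
The three error measures can be sketched as small functions. The function names are illustrative, and taking the absolute value of the true value in the denominator is an added assumption so that the relative error stays non-negative for negative true values.

```python
def absolute_error(true_value, approx_value):
    # Magnitude of the difference between true and approximate value.
    return abs(true_value - approx_value)

def relative_error(true_value, approx_value):
    # Absolute error as a proportion of the true value (assumes true_value != 0).
    return absolute_error(true_value, approx_value) / abs(true_value)

def percentage_error(true_value, approx_value):
    # Relative error expressed as a percentage.
    return relative_error(true_value, approx_value) * 100

# Values from the absolute error example above.
print(round(absolute_error(25.83, 25.80), 4))
print(round(relative_error(25.83, 25.80), 6))
print(round(percentage_error(25.83, 25.80), 4))
```

The results are rounded before printing because floating-point subtraction itself introduces a tiny computational error, which is a small instance of the topic of this section.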

Causes of Errors
1. Instrumental Errors: Caused by imperfections in measuring instruments. Example: faulty ruler,
uncalibrated scale.
2. Observational Errors: Caused by humans when taking readings. Example: reading a scale at the wrong
angle (parallax error).
3. Computational Errors: Due to rounding, truncation, or approximations in calculations.
4. Theoretical/Model Errors: Arise when the model used does not fully describe the real system.

Approximations
Approximations occur when exact values are difficult or impossible to obtain.

Forms of Approximation
1. Rounding Off: Reducing digits to a required number of decimal places. Round-off error arises when
numbers are rounded during calculations.

Example:
3.141592 → 3.14 (round-off)

2. Significant Figures: The number of digits that carry meaningful information.

Example:
0.00450 → 3 significant figures
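
Both forms of approximation can be tried out with Python's standard library. This is a sketch only: the helper `significant_figures` is a hypothetical name, and it takes the number as a string because trailing zeros (as in 0.00450) are lost once a value is stored as a float.

```python
from decimal import Decimal

# Rounding off to a required number of decimal places.
rounded = round(3.141592, 2)

def significant_figures(text):
    """Count significant figures in a decimal numeral given as a string.

    Leading zeros are not counted; trailing zeros after the decimal point are.
    Trailing zeros of a whole number such as "1500" are ambiguous in general
    and are counted here.
    """
    return len(Decimal(text).as_tuple().digits)

print(rounded)
print(significant_figures("0.00450"))
```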

Minimizing Errors

Use calibrated instruments

Take repeated measurements

Use proper significant figures

Use better numerical methods

Avoid unnecessary rounding in intermediate steps

Importance of Error and Approximation

Ensures accuracy and reliability of results

Essential in engineering, science, physics, statistics

Helps judge the quality of measurements and models

Useful in predicting uncertainty in results

Measures of location, partition, dispersion, skewness and kurtosis

Measures of location
Measures of location, also called averages or measures of central tendency, are single values that
describe the position or central tendency of a data set. Common measures include the mean, median and
mode, which represent the average, the middle value and the most frequent value respectively.
Arithmetic Mean

The arithmetic mean, or simply the mean, of a set of N numbers $X_1, X_2, X_3, \ldots, X_N$ is defined as

\[ \bar{X} = \frac{X_1 + X_2 + X_3 + \cdots + X_N}{N} = \frac{\sum_{j=1}^{N} X_j}{N} = \frac{\sum X}{N} \]
Example
The arithmetic mean of the numbers 8, 3, 5, 12, and 10 is

\[ \bar{X} = \frac{8 + 3 + 5 + 12 + 10}{5} = \frac{38}{5} = 7.6 \]

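If the numbers $X_1, X_2, X_3, \ldots, X_K$ occur $f_1, f_2, f_3, \ldots, f_K$ times respectively, the arithmetic mean is

\[ \bar{X} = \frac{f_1 X_1 + f_2 X_2 + f_3 X_3 + \cdots + f_K X_K}{f_1 + f_2 + f_3 + \cdots + f_K} = \frac{\sum_{j=1}^{K} f_j X_j}{\sum_{j=1}^{K} f_j} = \frac{\sum fX}{\sum f} = \frac{\sum fX}{N} \]

where $N = \sum f$ is the total frequency.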

Example
If 5, 8, 6, 2 occur with frequencies 3, 2, 4 and 1 respectively, the arithmetic mean is

\[ \bar{X} = \frac{(3)(5) + (2)(8) + (4)(6) + (1)(2)}{3 + 2 + 4 + 1} = \frac{15 + 16 + 24 + 2}{10} = \frac{57}{10} = 5.7 \]
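
Both mean formulas can be checked numerically. The sketch below uses the numbers from the two examples above; the variable names are illustrative only.

```python
import statistics

# Simple arithmetic mean: sum of the values divided by how many there are.
values = [8, 3, 5, 12, 10]
simple_mean = statistics.mean(values)

# Mean of values with frequencies: sum of f*X divided by the sum of f.
xs = [5, 8, 6, 2]
fs = [3, 2, 4, 1]
weighted_mean = sum(f * x for f, x in zip(fs, xs)) / sum(fs)

print(simple_mean)
print(weighted_mean)
```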

The Median
The median of a set of numbers arranged in order of magnitude (that is, in an array) is either the middle
value or the arithmetic mean of the two middle values.
Example: The set of numbers 8, 6, 4, 5, 4, 8, 3, 8 and 10, arranged in order as 3, 4, 4, 5, 6, 8, 8, 8, 10, has median 6.
Example: The set of numbers 5, 5, 7, 9, 11, 12, 15 and 18 has median (9 + 11)/2 = 10.
For grouped data, the median is obtained by interpolation, as given below

\[ \text{Median} = L_1 + \left( \frac{\frac{N}{2} - \left(\sum f\right)_1}{f_{\text{median}}} \right) c \]

where
$L_1$ = lower class boundary of the median class
$N$ = number of items in the data
$(\sum f)_1$ = sum of frequencies of all classes lower than the median class
$f_{\text{median}}$ = frequency of the median class
$c$ = size of the median class interval
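
As a sketch of the interpolation formula, the function below finds the median class and applies the formula to it. For illustration it reuses the age group totals from the bar chart section (93, 108, 57, 8, 1); the class boundaries at half-units and the closing of the open "above 60" class at 70.5 are assumptions made here, not part of the original table.

```python
def grouped_median(classes):
    """classes: list of (lower_boundary, upper_boundary, frequency), ascending."""
    n = sum(freq for _, _, freq in classes)
    cumulative = 0  # sum of frequencies of all classes below the current one
    for lower, upper, freq in classes:
        if freq > 0 and cumulative + freq >= n / 2:
            # This is the median class: interpolate within it.
            return lower + (n / 2 - cumulative) / freq * (upper - lower)
        cumulative += freq
    raise ValueError("no class with positive frequency")

# Age group totals, with assumed class boundaries.
age_classes = [
    (20.5, 30.5, 93),
    (30.5, 40.5, 108),
    (40.5, 50.5, 57),
    (50.5, 60.5, 8),
    (60.5, 70.5, 1),  # open class "above 60" closed at 70.5 by assumption
]
print(grouped_median(age_classes))
```

Here N = 267, so N/2 = 133.5 falls in the 31-40 class, and the calculation is 30.5 + ((133.5 - 93)/108) × 10 = 34.25.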


The Mode
The mode of a set of numbers is the value which occurs with the greatest frequency; that is, it is the
most common value. The mode may not exist, and even if it does exist it may not be unique.
Example: The set 2, 2, 5, 7, 9, 9, 9, 10, 10, 11, 12 and 18 has mode 9.
The set 3, 5, 8, 10, 12, 15 and 16 has no mode.
The set 2, 3, 4, 4, 4, 5, 5, 7, 7, 7 and 9 has two modes, 4 and 7, and is called bimodal.
The empirical relationship between the mean, median and mode is
Mean − Mode = 3(Mean − Median)
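
The three mode examples above can be reproduced with a short function. This is a sketch under one stated convention: when every value occurs exactly once, the function reports no mode (an empty list), matching the second example. The name `modes` is illustrative.

```python
from collections import Counter

def modes(values):
    """Return the sorted list of modes; an empty list means there is no mode."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1 and len(counts) > 1:
        return []  # every value occurs once: no mode
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([2, 2, 5, 7, 9, 9, 9, 10, 10, 11, 12, 18]))
print(modes([3, 5, 8, 10, 12, 15, 16]))
print(modes([2, 3, 4, 4, 4, 5, 5, 7, 7, 7, 9]))
```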

Measures of Dispersion
The degree to which numerical data tend to spread about an average value is called the dispersion, or
variation, of the data. Various measures of dispersion are available, the most common being the range,
mean deviation, and standard deviation.

The Range
The range of a set of numbers is the difference between the largest and the smallest numbers in the set.
Example: the range of the set 2, 3, 3, 5, 5, 5, 8, 10, 12 is 12 – 2 = 10

The mean deviation


The mean deviation, or average deviation, of a set of N numbers $X_1, X_2, X_3, \ldots, X_N$ is defined by

\[ MD = \frac{\sum_{j=1}^{N} |X_j - \bar{X}|}{N} = \frac{\sum |X - \bar{X}|}{N} \]

Example: Find the mean deviation of the set 2, 3, 6, 8, 11.
Solution

\[ \text{Arithmetic mean } \bar{X} = \frac{2 + 3 + 6 + 8 + 11}{5} = 6 \]

\[ MD = \frac{|2 - 6| + |3 - 6| + |6 - 6| + |8 - 6| + |11 - 6|}{5} = \frac{4 + 3 + 0 + 2 + 5}{5} = 2.8 \]
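
The range and the mean deviation can be sketched as two short functions, applied here to the example sets from the text. The function names are illustrative.

```python
def value_range(values):
    # Difference between the largest and the smallest number in the set.
    return max(values) - min(values)

def mean_deviation(values):
    # Average of the absolute deviations from the arithmetic mean.
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)

print(value_range([2, 3, 3, 5, 5, 5, 8, 10, 12]))
print(mean_deviation([2, 3, 6, 8, 11]))
```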

The standard deviation


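The standard deviation of a set of N numbers $X_1, X_2, X_3, \ldots, X_N$ is defined by

\[ s = \sqrt{\frac{\sum_{j=1}^{N} (X_j - \bar{X})^2}{N}} = \sqrt{\frac{\sum (X - \bar{X})^2}{N}} \]

Example
Find the standard deviation s of each of the sets of numbers (a) 12, 6, 7, 3, 15, 10, 18, 5 and
(b) 9, 3, 8, 8, 9, 8, 9, 18.
Solution
(a) The arithmetic mean is

\[ \bar{X} = \frac{12 + 6 + 7 + 3 + 15 + 10 + 18 + 5}{8} = \frac{76}{8} = 9.5 \]

The standard deviation is

\[ s = \sqrt{\frac{\sum (X - \bar{X})^2}{N}} = \sqrt{\frac{(12-9.5)^2 + (6-9.5)^2 + (7-9.5)^2 + (3-9.5)^2 + (15-9.5)^2 + (10-9.5)^2 + (18-9.5)^2 + (5-9.5)^2}{8}} = \sqrt{23.75} \approx 4.87 \]

(b) The arithmetic mean is $\bar{X} = 72/8 = 9$, the squared deviations sum to 120, and so
$s = \sqrt{120/8} = \sqrt{15} \approx 3.87$.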
The variance

The variance of a set of data is defined as the square of the standard deviation and is thus given by $s^2$.
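
The standard deviation and variance of both sets in the example above can be checked with Python's `statistics` module. The sketch uses `pvariance` and `pstdev`, which divide by N as in the definition given here; the similarly named `variance` and `stdev` divide by N − 1 instead and would give different values.

```python
import statistics

set_a = [12, 6, 7, 3, 15, 10, 18, 5]
set_b = [9, 3, 8, 8, 9, 8, 9, 18]

# Population variance: mean of the squared deviations (divides by N).
var_a = statistics.pvariance(set_a)
var_b = statistics.pvariance(set_b)

# Population standard deviation: square root of the population variance.
sd_a = statistics.pstdev(set_a)
sd_b = statistics.pstdev(set_b)

print(f"(a) variance={var_a}, standard deviation={sd_a:.2f}")
print(f"(b) variance={var_b}, standard deviation={sd_b:.2f}")
```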

Skewness
Skewness is the degree of asymmetry, or departure from symmetry, of a distribution. If the frequency
curve of a distribution has a longer tail to the right of the central maximum than to the left, the
distribution is said to be skewed to the right, or to have positive skewness. If the reverse is true, it is
said to be skewed to the left, or to have negative skewness.

\[ \text{Skewness} = \frac{\text{mean} - \text{mode}}{\text{standard deviation}} = \frac{\bar{X} - \text{mode}}{s} \]

To avoid using the mode, we can employ the empirical formula

\[ \text{Skewness} = \frac{3(\text{mean} - \text{median})}{\text{standard deviation}} = \frac{3(\bar{X} - \text{median})}{s} \]

These two measures are called the first and second coefficients of skewness respectively.
Example: Find the first and second coefficients of skewness for the wage distribution of the 65 employees
at the P & R company.
Mean = $279.76, median = $279.06, mode = $277.50, and standard deviation s = $15.60
Solution

\[ \text{First coefficient of skewness} = \frac{\text{mean} - \text{mode}}{\text{standard deviation}} = \frac{\$279.76 - \$277.50}{\$15.60} \approx 0.1449 \]

\[ \text{Second coefficient of skewness} = \frac{3(\text{mean} - \text{median})}{\text{standard deviation}} = \frac{3(\$279.76 - \$279.06)}{\$15.60} \approx 0.1346 \]
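
The two skewness coefficients can be computed directly from the summary values given in the example. The function names below are illustrative, and the results are rounded to four decimal places for display.

```python
def first_skewness(mean, mode, std_dev):
    # (mean - mode) / standard deviation
    return (mean - mode) / std_dev

def second_skewness(mean, median, std_dev):
    # 3 * (mean - median) / standard deviation
    return 3 * (mean - median) / std_dev

# Summary values from the wage distribution example.
mean, median, mode, s = 279.76, 279.06, 277.50, 15.60

first = first_skewness(mean, mode, s)
second = second_skewness(mean, median, s)
print(round(first, 4), round(second, 4))
```

Both coefficients are positive, so the wage distribution is slightly skewed to the right.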

Rate, Ratio, and Index Numbers
In descriptive statistics we often encounter data that need to be compared, normalized or transformed in
some way to provide meaningful insights. Rates, ratios and index numbers are tools that help in making
these comparisons. They allow us to describe the relationship between different quantities in a more
understandable form.
Rates
A rate is a ratio that compares two quantities of different units, typically involving time, place or other
factors. It describes how one quantity changes relative to another.

\[ \text{Rate} = \frac{\text{Quantity of interest}}{\text{Unit of comparison}} \]

Example
Birth rate: the number of live births per 1,000 people in a specific time period.
Crime rate: the number of reported crimes per 100,000 people in a given year.
Ratios
A ratio is a relationship between two numbers or quantities showing how many times the first contains
the second. Unlike rates, ratios compare quantities of the same unit.

\[ \text{Ratio} = \frac{\text{Quantity 1}}{\text{Quantity 2}} \]

Example
Student to teacher ratio: if there are 30 students and 5 teachers, the ratio of students to teachers is 6:1.
Male to female ratio: if a class has 20 males and 15 females, the ratio of males to females is 4:3.
Index numbers
An index number is a statistical measure used to represent the relative change in a variable over time or
space. Index numbers are typically used to track the movement or trends in an economic or social
variable.

\[ \text{Index Number} = \frac{\text{value of variable at current time}}{\text{value of variable at base time}} \times 100 \]

Example
The consumer price index (CPI) measures the average change in prices paid by consumers for goods and
services over time. If the CPI in 2020 is 110 and the CPI in the base year 2010 is 100, then the index
number indicates that prices have increased by 10% since the base year.
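
The three formulas can be sketched as small helper functions. The function names are illustrative; the ratio and index inputs come from the examples above, while the birth rate input (250 births in a population of 50,000) is a hypothetical figure chosen only to show the calculation.

```python
from math import gcd

def rate_per(events, population, per=1000):
    # Number of events per `per` units of population.
    return events * per / population

def simplify_ratio(a, b):
    # Reduce a:b to lowest terms (assumes a and b are positive integers).
    divisor = gcd(a, b)
    return a // divisor, b // divisor

def index_number(current_value, base_value):
    # Current value expressed as a percentage of the base value.
    return current_value * 100 / base_value

print(rate_per(250, 50_000))   # hypothetical: births per 1,000 people
print(simplify_ratio(30, 5))   # students to teachers
print(simplify_ratio(20, 15))  # males to females
print(index_number(110, 100))  # CPI in 2020 relative to base year 2010
```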

Exercises
1. If there are 5,000 live births in a city with a population of 1,000,000, calculate the birth rate per 1,000
people.
2. A school has 200 boys and 150 girls. What is the boy to girl ratio?
3. The price of bread in 2020 is 2,000 and the price in the base year 2010 was 500. Calculate the index
number for the bread price in 2020.
