
Central Tendency Measures Explained

Chapter 3 discusses numerical descriptive measures, focusing on measures of central tendency such as mean, median, and mode, along with their relationships and examples. It also covers measures of dispersion including range, variance, and standard deviation, highlighting their significance in data analysis. The chapter emphasizes the importance of understanding these measures for both ungrouped and grouped data.

Uploaded by

tasnim.monir

CHAPTER 3

NUMERICAL DESCRIPTIVE MEASURES
MEASURES OF CENTRAL TENDENCY FOR UNGROUPED DATA

◼ Mean
◼ Median
◼ Mode
◼ Relationships among the Mean, Median, and Mode
Mean
The mean for ungrouped data is obtained
by dividing the sum of all values by the
number of values in the data set. Thus,

Mean for population data:
\[ \mu = \frac{\sum x}{N} \]
Mean for sample data:
\[ \bar{x} = \frac{\sum x}{n} \]
Example 3-1

Table 3.1 lists the total sales (rounded to billions of dollars) of six U.S. companies for 2008.

Table 3.1 2008 Sales of Six U.S. Companies

Find the 2008 mean sales for these six companies.
Example 3-1: Solution
\[ \sum x = x_1 + x_2 + x_3 + x_4 + x_5 + x_6 = 149 + 406 + 183 + 107 + 426 + 97 = 1368 \]
\[ \bar{x} = \frac{\sum x}{n} = \frac{1368}{6} = 228 = \$228 \text{ billion} \]

Thus, the mean 2008 sales of these six companies


was 228, or $228 billion.
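
The arithmetic in Example 3-1 can be checked with a few lines of Python. This is only an illustrative sketch; the variable names are chosen here and are not part of the original example.

```python
# Sales (in billions of dollars) of the six companies, as listed in the solution.
sales = [149, 406, 183, 107, 426, 97]

total = sum(sales)        # sum of all values
n = len(sales)            # number of values in the sample
mean_sales = total / n    # sample mean = (sum of x) / n

print(total, n, mean_sales)
```

The same pattern applies to population data; only the symbol changes (μ and N instead of x̄ and n).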
Example 3-2

The following are the ages (in years) of all

eight employees of a small company:

53 32 61 27 39 44 49 57

Find the mean age of these employees.


Example 3-2: Solution

\[ \mu = \frac{\sum x}{N} = \frac{362}{8} = 45.25 \text{ years} \]

Thus, the mean age of all eight employees of


this company is 45.25 years, or 45 years and
3 months.
Example 3-3
Table 3.2 lists the total philanthropic giving (in millions of dollars) by six
companies during 2007.
Example 3-3

Notice that the charitable contributions


made by Wal-Mart are very large
compared to those of other companies.
Hence, it is an outlier. Show how the
inclusion of this outlier affects the value of
the mean.
Example 3-3: Solution

If we do not include the charitable giving of Wal-Mart (the outlier), the mean of the
charitable contributions of the five companies is
\[ \text{Mean} = \frac{22.4 + 31.8 + 19.8 + 9.0 + 27.5}{5} = \$22.1 \text{ million} \]
Example 3-3: Solution

Now, to see the impact of the outlier on the value of the mean, we include the
contributions of Wal-Mart and find the mean contributions of the six companies.
This mean is
\[ \text{Mean} = \frac{22.4 + 31.8 + 19.8 + 9.0 + 27.5 + 337.9}{6} = \$74.73 \text{ million} \]
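
The effect of the outlier in Example 3-3 can be reproduced with a short Python sketch. It uses the standard-library `statistics.mean`; the variable names are illustrative only.

```python
from statistics import mean

# Charitable contributions (in millions of dollars) of the five companies
# other than Wal-Mart, followed by Wal-Mart's contribution (the outlier).
without_outlier = [22.4, 31.8, 19.8, 9.0, 27.5]
outlier = 337.9

mean_without = mean(without_outlier)
mean_with = mean(without_outlier + [outlier])

print(round(mean_without, 2))  # mean of the five companies
print(round(mean_with, 2))     # mean of all six companies
```

A single extreme value more than triples the mean here, which is why the mean is described as sensitive to outliers.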
Median
Definition
The median is the value of the middle term in a data set
that has been ranked in increasing order.

The calculation of the median consists of the following two


steps:
◼ Rank the data set in increasing order.
◼ Find the middle term. The value of this term is the
median.
Example 3-4

The following data give the prices (in thousands of


dollars) of seven houses selected from all houses sold
last month in a city.

312 257 421 289 526 374 497

Find the median.


Example 3-4: Solution
First, we rank the given data in increasing order as
follows:
257 289 312 374 421 497 526
There are seven homes in this data set, so the middle term is the fourth term,
whose value is 374.

Thus, the median price of a house is 374, or $374,000.


Example 3-5

Table 3.3 gives the 2008 profits (rounded to


billions of dollars) of 12 companies selected from
all over the world.
Table 3.3 Profits of 12 Companies for 2008
Find the median
of these data.
Example 3-5: Solution
First we rank the given profits as follows:

7 8 9 10 11 12 13 13 14 17 17 45

There are 12 values in this data set. Because there is an


even number of values in the data set, the median is given
by the average of the two middle values.
Example 3-5: Solution
The two middle values are the sixth and
seventh in the foregoing list of data, and these
two values are 12 and 13.

\[ \text{Median} = \frac{12 + 13}{2} = \frac{25}{2} = 12.5 = \$12.5 \text{ billion} \]

Thus, the median profit of these 12 companies


is $12.5 billion.
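
The two-step procedure (rank the data, then take the middle term or the average of the two middle terms) can be sketched in Python as follows. The function name `median_by_ranking` is chosen for this illustration; the standard library's `statistics.median` gives the same result.

```python
from statistics import median

def median_by_ranking(values):
    """Rank the data, then return the middle term (odd n) or the
    average of the two middle terms (even n)."""
    ranked = sorted(values)
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]
    return (ranked[mid - 1] + ranked[mid]) / 2

# Example 3-4: house prices in thousands of dollars (odd number of values).
house_prices = [312, 257, 421, 289, 526, 374, 497]
# Example 3-5: ranked profits in billions of dollars (even number of values).
profits = [7, 8, 9, 10, 11, 12, 13, 13, 14, 17, 17, 45]

print(median_by_ranking(house_prices), median(house_prices))
print(median_by_ranking(profits), median(profits))
```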
Advantages of Median

The median gives the center of a histogram, with half the


data values to the left of the median and half to the right of
the median. The advantage of using the median as a
measure of central tendency is that it is not influenced by
outliers. Consequently, the median is preferred over the
mean as a measure of central tendency for data sets that
contain outliers.
Mode

Definition

The mode is the value that occurs with the highest


frequency in a data set.
Example 3-6

The following data give the speeds (in miles


per hour) of eight cars that were stopped on I-
95 for speeding violations.

77 82 74 81 79 84 74 78

Find the mode.


Example 3-6: Solution

In this data set, 74 occurs twice and each of the remaining


values occurs only once. Because 74 occurs with the
highest frequency, it is the mode. Therefore,

Mode = 74 miles per hour


Mode

A major shortcoming of the mode is that a data set may


have none or may have more than one mode, whereas it
will have only one mean and only one median.

◼ Unimodal: A data set with only one mode.

◼ Bimodal: A data set with two modes.

◼ Multimodal: A data set with more than two modes.


Example 3-7 (Data set with no mode)

Last year’s incomes of five randomly selected


families were
$76,150 $95,750 $124,985 $87,490 $53,740

Find the mode.


Example 3-7: Solution

Because each value in this data set occurs


only once, this data set contains no mode.
Example 3-8 (Data set with two modes)

Refer to the data on 2008 profits of 12 companies


given in Table 3.3 of Example 3-5. Find the
mode for these data.
Example 3-8: Solution

In the data given in Example 3-5, each of the two values 13 and 17 occurs twice, and
each of the remaining values occurs only once. Therefore, that data set has two
modes: $13 billion and $17 billion.
Example 3-9 (Data set with three modes)

The ages of 10 randomly selected students from a


class are 21, 19, 27, 22, 29, 19, 25, 21, 22 and 30
years, respectively. Find the mode.
Example 3-9: Solution

This data set has three modes: 19, 21 and


22. Each of these three values occurs with a
(highest) frequency of 2.
Mode

One advantage of the mode is that it can be calculated for both kinds of data,
quantitative and qualitative, whereas the mean and median can be
calculated for only quantitative data.
Example 3-10

The statuses of five students who are members of the student senate at a college are

senior, sophomore, junior, senior, senior

Find the mode.


Example 3-10: Solution

Because senior occurs more frequently than the


other categories, it is the mode for this data set.
We cannot calculate the mean and median for this
data set.
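
The mode examples above (3-6 through 3-10) can be reproduced with one small helper. This is a sketch under the convention used in this chapter, namely that a data set in which every value occurs only once has no mode; the function name `modes` is an assumption made for the illustration.

```python
from collections import Counter

def modes(values):
    """Return all values that occur with the highest frequency.
    An empty list means no mode (every value occurs only once)."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []
    return sorted(v for v, c in counts.items() if c == highest)

speeds = [77, 82, 74, 81, 79, 84, 74, 78]                      # Example 3-6
incomes = [76150, 95750, 124985, 87490, 53740]                 # Example 3-7
profits = [7, 8, 9, 10, 11, 12, 13, 13, 14, 17, 17, 45]        # Example 3-8
ages = [21, 19, 27, 22, 29, 19, 25, 21, 22, 30]                # Example 3-9
status = ["senior", "sophomore", "junior", "senior", "senior"]  # Example 3-10

for data in (speeds, incomes, profits, ages, status):
    print(modes(data))
```

Because the helper only counts occurrences, it works for qualitative data such as the student statuses as well as for numbers.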
Relationships among the Mean, Median, and Mode

1. For a symmetric histogram and frequency curve with


one peak (Figure 3.2), the values of the mean,
median, and mode are identical, and they lie at the
center of the distribution.
Figure 3.2 Mean, median, and mode for a
symmetric histogram and frequency curve.
Relationships among the Mean, Median, and Mode
2. For a histogram and a frequency curve skewed to the right
(Figure 3.3), the value of the mean is the largest, that of the
mode is the smallest, and the value of the median lies between
these two. (Notice that the mode always occurs at the peak
point.) The value of the mean is the largest in this case
because it is sensitive to outliers that occur in the right tail.
These outliers pull the mean to the right.
Figure 3.3 Mean, median, and mode for a histogram and
frequency curve skewed to the right.
Relationships among the Mean, Median, and Mode

3. If a histogram and a distribution curve are skewed to


the left (Figure 3.4), the value of the mean is the
smallest and that of the mode is the largest, with the
value of the median lying between these two. In this
case, the outliers in the left tail pull the mean to the
left.
Figure 3.4 Mean, median, and mode for a histogram and
frequency curve skewed to the left
MEASURES OF DISPERSION FOR UNGROUPED
DATA

Consider the following two data sets on the ages (in


years) of all workers working for each of two small
companies

Company 1: 47 38 35 40 36 45 39

Company 2: 70 33 18 52 27

What is the mean age of workers of both companies?


In this situation, the mean, median, or mode is usually not a sufficient
measure to reveal the shape of the distribution of a data set. We also
need a measure that can provide some information about the variation
among data values.
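
A short Python sketch makes the point concrete for the two companies above. It compares their mean ages with the difference between the largest and smallest age, used here simply as a rough indicator of spread; the variable names are illustrative.

```python
from statistics import mean

company_1 = [47, 38, 35, 40, 36, 45, 39]
company_2 = [70, 33, 18, 52, 27]

for name, ages in (("Company 1", company_1), ("Company 2", company_2)):
    spread = max(ages) - min(ages)  # largest age minus smallest age
    print(name, "mean:", mean(ages), "spread:", spread)
```

Both companies have the same mean age, yet the ages in Company 2 vary far more, so the mean alone does not distinguish the two data sets.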
MEASURES OF DISPERSION FOR UNGROUPED DATA

◼ Range

◼ Variance and Standard Deviation


Range

Finding Range for Ungrouped Data

Range = Largest value – Smallest Value


Example 3-11

Table 3.4 gives the total areas in square miles of the four
western South-Central states of the United States.

Find the range for this data set.


Table 3.4
Example 3-11: Solution

Range = Largest value – Smallest Value

= 267,277 – 49,651

= 217,626 square miles


Thus, the total areas of these four states are
spread over a range of 217,626 square miles.
Range
Disadvantages
◼ The range, like the mean, has the disadvantage of being
influenced by outliers. Consequently, the range is not a
good measure of dispersion to use for a data set that
contains outliers.

◼ Its calculation is based on two values only: the largest
and the smallest. All other values in a data set are
ignored when calculating the range.
Variance and Standard Deviation

◼ The standard deviation is the most widely used measure of dispersion.

◼ The value of the standard deviation tells how closely the values of a
data set are clustered around the mean.

◼ In general, a lower value of the standard deviation for a data set
indicates that the values of that data set are spread over a relatively
small range around the mean.

◼ In contrast, a large value of the standard deviation for a data set
indicates that the values of that data set are spread over a relatively
large range around the mean.
Variance and Standard Deviation

◼ The variance calculated for population data is
denoted by σ² (read as sigma squared), and the
variance calculated for sample data is denoted by s².

◼ The standard deviation calculated for population data
is denoted by σ, and the standard deviation calculated
for sample data is denoted by s.
Variance and Standard Deviation
The formula for calculating the variance
Variance and Standard Deviation

Formulas for the Variance and Standard Deviation for Ungrouped Data

\[ \sigma^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N} \quad \text{and} \quad s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} \]

where σ² is the population variance and s² is the sample


variance.
Variance and Standard Deviation
Formulas for Standard Deviation for Ungrouped Data

The standard deviation is obtained by taking the positive


square root of the variance.

Population standard deviation: \( \sigma = \sqrt{\sigma^2} \)

Sample standard deviation: \( s = \sqrt{s^2} \)


Example 3-12
The following table gives the 2008 market values
(rounded to billions of dollars) of five international
companies. Find the variance and standard deviation for
these data.
Example 3-12: Solution
Let x denote the 2008 market value of a company. The
values of Σx and Σx² are calculated in Table 3.6.
Example 3-12: Solution

\[ s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{114{,}600 - \frac{(662)^2}{5}}{5 - 1} = \frac{114{,}600 - 87{,}648.80}{4} = 6737.80 \]
\[ s = \sqrt{6737.80} = 82.0841 \approx \$82.08 \text{ billion} \]

Thus, the standard deviation of the market


values of these five companies is $82.08 billion.
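
Since the individual market values are not listed in this text, the following Python sketch works only from the totals quoted in the solution (n = 5, Σx = 662, Σx² = 114,600). The function name `sample_variance_from_sums` is an assumption made for this illustration.

```python
from math import sqrt

def sample_variance_from_sums(sum_x, sum_x2, n):
    """Short-cut formula: s^2 = (sum of x^2 - (sum of x)^2 / n) / (n - 1)."""
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)

# Totals quoted in Example 3-12 (market values in billions of dollars).
s2 = sample_variance_from_sums(sum_x=662, sum_x2=114_600, n=5)
s = sqrt(s2)

print(round(s2, 2), round(s, 2))
```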
Two Observations

1. The values of the variance and the standard deviation


are never negative.

2. The measurement units of variance are always the


square of the measurement units of the original data.
Example 3-13

Following are the 2009 earnings (in thousands of dollars)

before taxes for all six employees of a small company.

88.50 108.40 65.50 52.50 79.80 54.60

Calculate the variance and standard deviation for these


data.
Example 3-13: Solution
Let x denote the 2009 earnings before taxes of an
employee of this company. The values of Σx and Σx²
are calculated in Table 3.7.
Example 3-13: Solution

\[ \sigma^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N} = \frac{35{,}978.51 - \frac{(449.30)^2}{6}}{6} = 388.90 \]
\[ \sigma = \sqrt{388.90} = \$19.721 \text{ thousand} = \$19{,}721 \]

Thus, the standard deviation of the 2009 earnings of


all six employees of this company is $19,721.
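
Example 3-13 lists the raw data, so the short-cut formula can be checked against Python's standard-library `statistics.pvariance` and `statistics.pstdev`, which treat the data as a whole population. This is a minimal sketch; the variable names are illustrative.

```python
from statistics import pvariance, pstdev

# 2009 earnings before taxes (in thousands of dollars) of all six employees.
earnings = [88.50, 108.40, 65.50, 52.50, 79.80, 54.60]

N = len(earnings)
sum_x = sum(earnings)
sum_x2 = sum(x * x for x in earnings)

# Short-cut formula for the population variance.
shortcut_variance = (sum_x2 - sum_x ** 2 / N) / N

print(round(sum_x, 2), round(sum_x2, 2))
print(round(shortcut_variance, 2), round(pvariance(earnings), 2))
print(round(pstdev(earnings), 3))
```

Dividing by N rather than N − 1 is appropriate here because the six employees make up the entire population.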
Population Parameters and Sample Statistics

◼ A numerical measure such as the mean, median,
mode, range, variance, or standard deviation
calculated for a population data set is called a
population parameter, or simply a parameter.

◼ A summary measure calculated for a sample data
set is called a sample statistic, or simply a
statistic.
MEAN, VARIANCE AND STANDARD DEVIATION FOR GROUPED DATA

◼ Mean for Grouped Data

◼ Variance and Standard Deviation for Grouped Data


Mean for Grouped Data

Calculating Mean for Grouped Data

Mean for population data:
\[ \mu = \frac{\sum mf}{N} \]
Mean for sample data:
\[ \bar{x} = \frac{\sum mf}{n} \]
where m is the midpoint and f is the frequency of a
class.
Example 3-14

Table 3.8 gives the frequency distribution of the daily


commuting times (in minutes) from home to work for all
25 employees of a company.

Calculate the mean of the daily commuting times.


Table 3.8
Example 3-14: Solution
Example 3-14: Solution

\[ \mu = \frac{\sum mf}{N} = \frac{535}{25} = 21.40 \text{ minutes} \]

Thus, the employees of this company spend an


average of 21.40 minutes a day commuting from
home to work.
Example 3-15

Table 3.10 gives the frequency distribution of the number


of orders received each day during the past 50 days at the
office of a mail-order company.

Calculate the mean.


Table 3.10
Example 3-15: Solution
Example 3-15: Solution

\[ \bar{x} = \frac{\sum mf}{n} = \frac{832}{50} = 16.64 \text{ orders} \]

Thus, this mail-order company received an average of


16.64 orders per day during these 50 days.
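
The grouped-data mean can be sketched in Python as below. Tables 3.8 and 3.10 are not reproduced in this text, so the class midpoints and frequencies in the code are assumed values, chosen only so that they reproduce the totals quoted in the solutions (Σmf = 535 with N = 25, and Σmf = 832 with n = 50). The function name `grouped_mean` is likewise an assumption.

```python
def grouped_mean(midpoints, frequencies):
    """Mean for grouped data: (sum of m * f) / (sum of f)."""
    total_f = sum(frequencies)
    sum_mf = sum(m * f for m, f in zip(midpoints, frequencies))
    return sum_mf / total_f

# ASSUMED frequency tables (not given in the text), consistent with the
# totals quoted in Examples 3-14 and 3-15.
commute_midpoints = [5, 15, 25, 35, 45]   # minutes
commute_freqs = [4, 9, 6, 4, 2]           # 25 employees in total
orders_midpoints = [11, 14, 17, 20]       # orders per day
orders_freqs = [4, 12, 20, 14]            # 50 days in total

print(grouped_mean(commute_midpoints, commute_freqs))
print(grouped_mean(orders_midpoints, orders_freqs))
```

Because each observation is replaced by its class midpoint, the result is an approximation of the mean of the underlying raw data.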
Variance and Standard Deviation for Grouped
Data
Formulas for the Variance and Standard Deviation for Grouped Data

\[ \sigma^2 = \frac{\sum m^2 f - \frac{(\sum mf)^2}{N}}{N} \quad \text{and} \quad s^2 = \frac{\sum m^2 f - \frac{(\sum mf)^2}{n}}{n - 1} \]
where σ² is the population variance, s² is the sample
variance, and m is the midpoint of a class.
Variance and Standard Deviation for Grouped
Data

Formulas for Standard Deviation for Grouped Data

The standard deviation is obtained by taking the positive


square root of the variance.

Population standard deviation: \( \sigma = \sqrt{\sigma^2} \)

Sample standard deviation: \( s = \sqrt{s^2} \)


Example 3-16

Table 3.8 gives the frequency distribution of the daily


commuting times (in minutes) from home to work for all
25 employees of a company. Calculate the variance and
standard deviation.
Table 3.8
Example 3-16: Solution
Example 3-16: Solution

\[ \sigma^2 = \frac{\sum m^2 f - \frac{(\sum mf)^2}{N}}{N} = \frac{14{,}825 - \frac{(535)^2}{25}}{25} = \frac{3376}{25} = 135.04 \]
\[ \sigma = \sqrt{\sigma^2} = \sqrt{135.04} = 11.62 \text{ minutes} \]

Thus, the standard deviation of the daily commuting


times for these employees is 11.62 minutes.
Example 3-17

Table 3.10 gives the frequency distribution of the number


of orders received each day during the past 50 days at the
office of a mail-order company.

Calculate the variance and standard deviation.


Table 3.10
Example 3-17: Solution
Example 3-17: Solution

\[ s^2 = \frac{\sum m^2 f - \frac{(\sum mf)^2}{n}}{n - 1} = \frac{14{,}216 - \frac{(832)^2}{50}}{50 - 1} = 7.5820 \]
\[ s = \sqrt{s^2} = \sqrt{7.5820} = 2.75 \text{ orders} \]

Thus, the standard deviation of the number of orders
received at the office of this mail-order company during
the past 50 days is 2.75 orders.
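
The grouped-data variance formulas can be sketched in Python as follows. As Tables 3.8 and 3.10 are not reproduced in this text, the midpoints and frequencies below are assumed values, chosen only to match the totals quoted in the solutions (Σmf = 535 and Σm²f = 14,825 for the commuting times; Σmf = 832 and Σm²f = 14,216 for the orders). The function name `grouped_variance` and its `sample` flag are also assumptions.

```python
from math import sqrt

def grouped_variance(midpoints, frequencies, sample):
    """Short-cut variance for grouped data.
    Divides by n - 1 for a sample and by N for a population."""
    n = sum(frequencies)
    sum_mf = sum(m * f for m, f in zip(midpoints, frequencies))
    sum_m2f = sum(m * m * f for m, f in zip(midpoints, frequencies))
    numerator = sum_m2f - sum_mf ** 2 / n
    return numerator / (n - 1 if sample else n)

# ASSUMED frequency tables (not given in the text), consistent with the
# totals quoted in Examples 3-16 and 3-17.
commute_var = grouped_variance([5, 15, 25, 35, 45], [4, 9, 6, 4, 2], sample=False)
orders_var = grouped_variance([11, 14, 17, 20], [4, 12, 20, 14], sample=True)

print(round(commute_var, 2), round(sqrt(commute_var), 2))
print(round(orders_var, 4), round(sqrt(orders_var), 2))
```

The commuting times cover all 25 employees, so the population formula is used; the 50 days of orders are treated as a sample.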
MEASURES OF POSITION

◼ Quartiles

◼ Percentiles
Quartiles and Interquartile Range
Definition

Quartiles are three summary measures that divide a
ranked data set into four equal parts. The second quartile
is the same as the median of a data set. The first quartile
is the value of the middle term among the observations
that are less than the median, and the third quartile is the
value of the middle term among the observations that are
greater than the median.
Figure 3.11 Quartiles.
Quartiles and Interquartile Range

Calculating Interquartile Range

The difference between the third and first quartiles gives


the interquartile range; that is,

IQR = Interquartile range = Q3 – Q1


Example 3-20
Refer to Table 3.3 in Example 3-5, which gives the 2008
profits (rounded to billions of dollars) of 12 companies
selected from all over the world. That table is reproduced
below.

a) Find the values of the three quartiles. Where do the
2008 profits of Merck & Co fall in relation to these
quartiles?

b) Find the interquartile range.


Table 3.3
Example 3-20: Solution
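a) Ranking the 12 profits in increasing order gives

7 8 9 10 11 12 13 13 14 17 17 45

Q1 = (9 + 10) / 2 = 9.5

Q2 = (12 + 13) / 2 = 12.5

Q3 = (14 + 17) / 2 = 15.5

By looking at the position of $8 billion, which is the
2008 profit of Merck & Co, we can state that this
value lies in the bottom 25% of the profits for 2008.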
Example 3-20: Solution

b)
IQR = Interquartile range = Q3 – Q1
= 15.5 – 9.5
= $6 billion
Example 3-21

The following are the ages (in years) of nine employees of


an insurance company:

47 28 39 51 33 37 59 24 33

a) Find the values of the three quartiles. Where does the


age of 28 fall in relation to the ages of the employees?

b) Find the interquartile range.


Example 3-21: Solution
a) The ranked data are

24 28 33 33 37 39 47 51 59

Values less than the median: 24 28 33 33

Values greater than the median: 39 47 51 59

Q1 = (28 + 33) / 2 = 30.5

Q2 = 37

Q3 = (47 + 51) / 2 = 49

The age of 28 falls in the lowest 25% of the ages.


Example 3-21: Solution

b)
IQR = Interquartile range = Q3 – Q1
= 49 – 30.5
= 18.5 years
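
The quartile calculations in Examples 3-20 and 3-21 can be sketched in Python as follows. The helper splits the ranked data into a lower half and an upper half (leaving out the middle term when the number of values is odd) and takes the median of each half. Other software, including `statistics.quantiles`, may use different interpolation rules and give slightly different quartiles. The function name `quartiles` is chosen for this illustration.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ranked = sorted(values)
    half = len(ranked) // 2
    lower = ranked[:half]    # terms below the median position
    upper = ranked[-half:]   # terms above the median position
    return median(lower), median(ranked), median(upper)

profits = [7, 8, 9, 10, 11, 12, 13, 13, 14, 17, 17, 45]  # Example 3-20
ages = [47, 28, 39, 51, 33, 37, 59, 24, 33]              # Example 3-21

for data in (profits, ages):
    q1, q2, q3 = quartiles(data)
    print(q1, q2, q3, "IQR =", q3 - q1)
```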
BOX-AND-WHISKER PLOT
Definition
A plot that shows the center, spread, and skewness of a
data set. It is constructed by drawing a box and two
whiskers that use the median, the first quartile, the third
quartile, and the smallest and the largest values in the
data set between the lower and the upper inner fences.
Example 3-24

The following data are the incomes (in thousands of


dollars) for a sample of 12 households.

75 69 84 112 74 104 81 90 94 79 98 144

Construct a box-and-whisker plot for these data.


Example 3-24: Solution

69 74 75 79 81 84 90 94 98 104 112 144

Smallest value = 69

Highest value = 144

Median or Q2 = (84 + 90) / 2 = 87

Q1 = (75 + 79) / 2 = 77

Q3 = (98 + 104) / 2 = 101


Find the points that are 1.5×IQR below Q1 and 1.5×IQR above Q3.
These two points are called the lower and the upper inner fences, respectively.

IQR = Q3 – Q1 = 101 – 77 = 24

Lower inner fence = Q1 – 1.5×IQR = 77 – 1.5×24 = 41

Upper inner fence = Q3 + 1.5×IQR = 101 + 1.5×24 = 137

Any value outside the interval [41, 137] is an outlier. In this data set, 144 is the only such value.


Determine the smallest and the largest values in the
given data set within the two inner fences. These two
values for our example are as follows:

Smallest value within the two inner fences=69

Largest value within the two inner fences=112
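
The quantities needed for the box-and-whisker plot in Example 3-24 can be computed with the short Python sketch below. It uses the same median-of-halves rule for the quartiles as the worked solution; the variable names are illustrative, and the drawing itself is left out.

```python
from statistics import median

# Incomes (in thousands of dollars) for the sample of 12 households.
incomes = [75, 69, 84, 112, 74, 104, 81, 90, 94, 79, 98, 144]

ranked = sorted(incomes)
half = len(ranked) // 2
q1 = median(ranked[:half])
q2 = median(ranked)
q3 = median(ranked[-half:])

iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers end at the smallest and largest values inside the inner fences.
inside = [x for x in ranked if lower_fence <= x <= upper_fence]
outliers = [x for x in ranked if x < lower_fence or x > upper_fence]
whisker_low, whisker_high = inside[0], inside[-1]

print(q1, q2, q3, iqr)
print(lower_fence, upper_fence)
print(whisker_low, whisker_high, outliers)
```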


Example 3-24: Solution

Figure 3.14
Example 25 (Self test)
The following data give the lengths of time (in weeks) taken to find a
full-time job by 18 computer science majors who graduated in 2008
from a small college.

30 43 32 21 65 8 4 18 16
38 9 44 33 23 24 81 42 55

Make a box-and-whisker plot. Does this data set contain any outliers?
