
Interquartile Amplitude in Statistics

1. This document discusses the indicators used to analyze unidimensional frequency distribution series in statistics. 2. It defines key terms such as frequency, absolute frequency, relative frequency and cumulative frequency. Frequency is the number of statistical units in which a given value (or class of values) of the variable is found. 3. The measures of central tendency discussed are the mean, the median and the mode. The mean, or average, represents the central value of a data set and is calculated by summing all values and dividing by the total number of units.

Uploaded by Mihai Constantin
Copyright © All Rights Reserved

Ass. Prof. Ph. D. L. Marcu, 2019-2020

Chapter 4. Indicators of Unidimensional Distribution Series

1. Statistical Series

A statistical series is a way of presenting statistical data as two or more rows of data: the first row contains the grouping variable, and the other rows contain the frequencies obtained for each value of the grouping variable.
Depending on the grouping variable, statistical series can be:
- Frequency distribution series;
- Territorial series;
- Time series.
The unidimensional frequency distribution series is one type of frequency distribution series. It consists of two rows of data: the first contains the values (or classes) of the variable, and the second contains the number of statistical units in which each value is found.
Frequency distribution series are analyzed using indicators that show:
- the central tendency of the phenomenon described by the series;
- the spread of individual values around the central tendency;
- the asymmetry of the distribution series;
- the concentration of individual values around a certain value of the series or according to a certain variable.

2. The Frequency

When statistical data are grouped into classes, the frequency of a class interval is the number of units for which the value of the studied variable lies between the lower and the upper limit of that class.
In other words, frequency is the number of units in which a given value (or class of values) of the analyzed variable is found. Frequencies can be:
- Absolute frequencies;
- Relative frequencies;
- Cumulative frequencies (ascending or descending).
The absolute frequency of class interval "i", denoted fi (or ni), is obtained by tallying the individual data, i.e. by counting the units whose value of the variable falls in class interval "i".
The relative frequency of class interval "i", denoted fi* (or ni*), is the ratio between the number of units in class interval "i" and the total number of units studied (Σfi). The relative frequency may be expressed as a ratio or as a percentage (if the ratio is multiplied by 100):
$$f_i^* = \frac{f_i}{\sum f_i} \qquad \text{or} \qquad f_i^* = \frac{f_i}{\sum f_i} \cdot 100$$
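As an illustration, relative frequencies can be computed in a few lines of Python. This is a minimal sketch: the function name `relative_frequencies` is hypothetical, and the sample absolute frequencies are the store counts from the median example later in this chapter.

```python
def relative_frequencies(freqs, percent=False):
    """Return fi* = fi / sum(fi) for each class; multiplied by 100 if percent=True."""
    total = sum(freqs)
    scale = 100 if percent else 1
    # Multiply before dividing so that whole-number percentages stay exact.
    return [f * scale / total for f in freqs]


f = [5, 12, 17, 9, 7]  # absolute frequencies fi (number of stores per class)
print(relative_frequencies(f))                # [0.1, 0.24, 0.34, 0.18, 0.14]
print(relative_frequencies(f, percent=True))  # [10.0, 24.0, 34.0, 18.0, 14.0]
```

The relative frequencies always add up to 1 (or 100%), which is a quick consistency check on a frequency table.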
Cumulative increasing frequency of class interval "i" (fci) is the sum of the frequencies from the first class interval up to and including class interval "i".
$$fc_k = \sum_{i=1}^{k} f_i$$

Example: fc1 = f1, fc2 = f1 + f2, ..., fcn = f1 + f2 + ... + fn


Cumulative decreasing frequency of class interval "i" (fdi) is the difference between the total number of individual cases (Σfi) and the sum of the frequencies from the first class interval up to the interval preceding the analyzed interval "i".
$$fd_k = \sum_{i=1}^{n} f_i - \sum_{i=1}^{k-1} f_i$$

Example: fd1 = Σfi − 0, fd2 = Σfi − f1, ..., fdn = Σfi − (f1 + f2 + ... + fn−1) = fn

Cumulative frequencies are used to determine the location indicators that characterize the distribution series.
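Both kinds of cumulative frequency can be sketched in Python as follows (the function names are illustrative, not from the source). Applied to the store frequencies 5, 12, 17, 9, 7 from the median example, the sketch reproduces the increasing and decreasing columns of that table.

```python
from itertools import accumulate


def cumulative_increasing(freqs):
    """fc_k = f_1 + f_2 + ... + f_k."""
    return list(accumulate(freqs))


def cumulative_decreasing(freqs):
    """fd_k = sum(f) - (f_1 + ... + f_{k-1}), with an empty sum (0) for k = 1."""
    total = sum(freqs)
    preceding = [0] + cumulative_increasing(freqs)[:-1]
    return [total - p for p in preceding]


f = [5, 12, 17, 9, 7]
print(cumulative_increasing(f))  # [5, 17, 34, 43, 50]
print(cumulative_decreasing(f))  # [50, 45, 33, 16, 7]
```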

3. Measures of Central Tendency¹

Measures of central tendency (sometimes called “measures of central location” or “measures of location”) express quantitatively the tendency of a phenomenon to concentrate around a central value.
These indicators are: the mean (often called the average), the median and the mode.

3.1. The Mean

The mean (the average) is an indicator that summarizes all the individual values in a single numeric value and expresses what is essential and common to all the statistical units considered.
Depending on the calculation method and the nature of the phenomenon studied, the mean can be:
- Arithmetic mean;
- Harmonic mean;
- Geometric mean;
- Square mean.

The arithmetic mean is the most commonly used statistical indicator. It represents the value the variable would take if the causal factors acted constantly and the influence of random factors were zero.
Calculation of the arithmetic mean:
a) For individual data:
$$\bar{x} = \frac{\sum x_i}{n}$$
xi = individual values of the variable
n = number of population units
x̄ = mean of the variable
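A minimal Python version of the formula for individual data; the function name is hypothetical, and the sample values are the seven terms used in the median example later in this chapter.

```python
def arithmetic_mean(values):
    """Simple arithmetic mean: sum of the individual values divided by n."""
    if not values:
        raise ValueError("the mean of an empty series is undefined")
    return sum(values) / len(values)


print(arithmetic_mean([20, 18, 21, 27, 35, 19, 22]))  # 162 / 7, about 23.14
```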
b) For data systematized by classes (weighted mean²):
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i f_i}{\sum_{i=1}^{n} f_i}$$
xi = centre of grouping class "i"
fi = frequency of grouping class "i"
n = number of classes
x̄ = mean of the variable

¹ In Romanian: “Indicatorii tendintei centrale”.
² In Romanian: “Media aritmetică ponderată”.

For the arithmetic mean to be meaningful, the number of units must be large enough (to meet the requirements of the law of large numbers), or the population must present a certain uniformity of cases (be homogeneous).
The law of large numbers, first formulated by Jakob Bernoulli, assumes a sufficient number of individual cases, so that deviations above the average are compensated by deviations below the average.
Arithmetic mean properties:
- It lies within the range of variation of the variable: xmin ≤ x̄ ≤ xmax.
- If all terms of the series are equal to each other and to a constant „k”, the mean is equal to „k”:
$$x_1 = x_2 = \dots = x_n = k \;\Rightarrow\; \bar{x} = \frac{k + k + \dots + k}{n} = \frac{nk}{n} = k$$
Equivalently, if every term is replaced by the mean, the total is unchanged: Σxi = n·x̄.
- The sum of the individual deviations from the mean is zero:
$$\sum (x_i - \bar{x}) = 0 \ \text{(simple series)}, \qquad \sum (x_i - \bar{x})\, n_i = 0 \ \text{(frequency series)}$$
- If all individual values increase/decrease by a constant „a”, the mean increases/decreases by the same constant:
$$\bar{x}' = \frac{\sum (x_i \pm a)}{n} = \frac{\sum x_i}{n} \pm \frac{n a}{n} = \bar{x} \pm a$$
$$\bar{x}' = \frac{\sum (x_i \pm a)\, n_i}{\sum n_i} = \frac{\sum x_i n_i}{\sum n_i} \pm \frac{a \sum n_i}{\sum n_i} = \bar{x} \pm a$$
„a” may be chosen as the centre of the class with the highest frequency.
- If all values are multiplied/divided by a constant „h”, the mean is multiplied/divided by „h”:
$$\bar{x}' = \frac{\sum \frac{x_i}{h}}{n} = \frac{1}{h} \cdot \frac{\sum x_i}{n} = \frac{\bar{x}}{h}$$
$$\bar{x}' = \frac{\sum \frac{x_i}{h}\, n_i}{\sum n_i} = \frac{1}{h} \cdot \frac{\sum x_i n_i}{\sum n_i} = \frac{\bar{x}}{h}$$
„h” may be chosen as the width of the class with the highest frequency.


- The arithmetic mean is additive: the mean of the sum of two variables equals the sum of their means:
$$\overline{x + y} = \bar{x} + \bar{y}$$
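The properties above can be checked numerically. The Python sketch below is illustrative only: the helper names and the sample series are hypothetical, and `math.isclose` is used because floating-point sums are rarely exact.

```python
import math


def mean(values):
    return sum(values) / len(values)


def check_mean_properties(x, y, a, h):
    """Check the listed properties of the arithmetic mean on series x (and y, same length)."""
    m = mean(x)
    return {
        "within_range": min(x) <= m <= max(x),
        "constant_series": math.isclose(mean([a] * len(x)), a),
        "deviations_sum_to_zero": math.isclose(sum(xi - m for xi in x), 0.0, abs_tol=1e-9),
        "shift_by_a": math.isclose(mean([xi + a for xi in x]), m + a),
        "divide_by_h": math.isclose(mean([xi / h for xi in x]), m / h),
        "additive": math.isclose(mean([xi + yi for xi, yi in zip(x, y)]), m + mean(y)),
    }


print(check_mean_properties([18, 19, 20, 21, 22, 27, 35], [1, 2, 3, 4, 5, 6, 7], a=5, h=2))
```

Every entry should print as True; a False would point to a data or coding error rather than to an exception to the properties.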

Example:
Alfa trading company, which has 50 employees, has the following data on net sales in March. Determine the average sales for March.

Classes by sales    Class centre    Number of         xi·fi
(thousand lei)      (xi)            employees (fi)
below 5             4,5             5                 22,5
5-6                 5,5             7                 38,5
6-7                 6,5             10                65
7-8                 7,5             12                90
8-9                 8,5             9                 76,5
9 and over          9,5             7                 66,5
Total               x               50                359

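First, we close the first and the last classes (below 5 becomes 4-5, and 9 and over becomes 9-10), since all classes have an equal width of 1 thousand lei. The average sales will be:
$$\bar{x} = \frac{\sum x_i f_i}{\sum f_i} = \frac{4{,}5 \cdot 5 + 5{,}5 \cdot 7 + 6{,}5 \cdot 10 + 7{,}5 \cdot 12 + 8{,}5 \cdot 9 + 9{,}5 \cdot 7}{50} = \frac{359}{50} = 7{,}18 \text{ thousand lei}$$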
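The weighted mean for grouped data can be sketched in Python as below; `weighted_mean` is a hypothetical helper, and the inputs are the Alfa class centres (after closing the open classes) and the employee counts from the table above.

```python
def weighted_mean(centres, freqs):
    """Weighted arithmetic mean: sum(x_i * f_i) / sum(f_i)."""
    return sum(x * f for x, f in zip(centres, freqs)) / sum(freqs)


centres = [4.5, 5.5, 6.5, 7.5, 8.5, 9.5]  # class centres after closing the open classes
employees = [5, 7, 10, 12, 9, 7]          # frequencies fi
print(round(weighted_mean(centres, employees), 2))  # 7.18 (thousand lei)
```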
The harmonic mean is used to average speeds, times or distances. In economic calculations, the harmonic mean is used to determine the average price index. The formulas are:
a) For individual data:
$$\bar{x}_h = \frac{n}{\sum \frac{1}{x_i}}$$
b) For frequency distribution series:
$$\bar{x}_h = \frac{\sum n_i}{\sum \frac{1}{x_i}\, n_i}$$
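A minimal Python sketch of both harmonic-mean formulas (the helper name and the speed example are hypothetical). Driving equal distances at 40 km/h and at 60 km/h gives an average speed equal to the harmonic mean of the two speeds, not their arithmetic mean of 50 km/h.

```python
def harmonic_mean(values, weights=None):
    """n / sum(1/x_i) for individual data; sum(n_i) / sum(n_i / x_i) with frequencies n_i.
    All values must be non-zero (positive in practice)."""
    if weights is None:
        weights = [1] * len(values)
    return sum(weights) / sum(n / x for x, n in zip(values, weights))


# Hypothetical example: equal distances driven at 40 km/h and at 60 km/h.
print(harmonic_mean([40, 60]))  # about 48 km/h, below the arithmetic mean of 50
```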
The square (quadratic) mean is used to study the variation of a phenomenon around its central value (it underlies the standard deviation). The formulas are:
a) For individual data:
$$\bar{x}_p = \sqrt{\frac{\sum x_i^2}{n}}$$
b) For frequency distribution series:
$$\bar{x}_p = \sqrt{\frac{\sum x_i^2\, n_i}{\sum n_i}}$$
The geometric mean is used in economic calculations such as time series analysis (determining the average index of change of an economic phenomenon). The formulas are:
a) For individual data:
$$\bar{x}_g = \sqrt[n]{\prod x_i}$$
b) For frequency distribution series:
$$\bar{x}_g = \sqrt[\sum n_i]{\prod x_i^{n_i}}$$

For positive values, the four types of means satisfy the following relation (with equality only when all values are equal):
$$\bar{x}_h \le \bar{x}_g \le \bar{x} \le \bar{x}_p$$
As a result, for the same set of data, the harmonic mean has the lowest value and the square mean the highest.
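The Python sketch below (hypothetical helper names and data) computes the geometric mean through logarithms, which is mathematically equivalent to the n-th root of the product but avoids overflow for long series, and then checks the ordering of the four means on a small positive sample.

```python
import math


def geometric_mean(values, weights=None):
    """(prod x_i^n_i)^(1 / sum n_i), computed as exp(sum(n_i * ln x_i) / sum n_i); x_i > 0."""
    if weights is None:
        weights = [1] * len(values)
    log_sum = sum(n * math.log(x) for x, n in zip(values, weights))
    return math.exp(log_sum / sum(weights))


def four_means(values):
    """Return (harmonic, geometric, arithmetic, square) means of positive individual data."""
    n = len(values)
    harmonic = n / sum(1 / x for x in values)
    arithmetic = sum(values) / n
    square = math.sqrt(sum(x * x for x in values) / n)
    return harmonic, geometric_mean(values), arithmetic, square


h, g, a, p = four_means([2, 8])
print(round(h, 4), round(g, 4), round(a, 4), round(p, 4))  # 3.2 4.0 5.0 5.831
assert h <= g <= a <= p
```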
3.2. The Median³

The median is the value of the variable that divides the data series into two equal parts in terms of the number of observations. There are two situations for determining the median:
- The case of individual data;
- The case of data grouped into classes and presented as frequency distribution.

For a series of individual data:
- The individual data are sorted in increasing order;
- If the series has an odd number of terms, the median is the value of the central term (observation);
- If the series has an even number of terms, the median is the simple average of the two central terms.
Example of determining the median for a series of seven terms: 20, 18, 21, 27, 35, 19, 22.
- We sort the values in increasing order: 18, 19, 20, 21, 22, 27, 35.
- The central value is 21, so Me = 21 (there are 3 terms before Me and 3 terms after Me).
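The procedure for individual data translates directly into Python. The sketch below is illustrative (the function name is hypothetical); Python's standard `statistics.median` applies the same odd/even rule.

```python
def median(values):
    """Median of individual data: central term, or mean of the two central terms."""
    s = sorted(values)          # step 1: increasing order
    n = len(s)
    mid = n // 2
    if n % 2 == 1:              # odd number of terms
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2   # even number of terms


print(median([20, 18, 21, 27, 35, 19, 22]))  # 21
print(median([20, 18, 21, 27, 35, 19]))      # (20 + 21) / 2 = 20.5
```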

For a frequency distribution series:
- We determine the median location (Loc Me) with the formula:
$$\text{Loc Me} = \frac{\sum f_i + 1}{2}$$
This gives the rank of the term that divides the series into two equal parts in terms of the number of observations.
- We determine the interval that contains the median. To do so, we calculate the cumulative increasing frequencies. The median interval is the first interval whose cumulative increasing frequency reaches or exceeds the median location (Loc Me).
- We calculate the median value with the formula:
$$Me = x_0 + h \cdot \frac{\frac{\sum f_i + 1}{2} - \sum f_{prec}}{f_{Me}}$$
where: x0 = lower limit of the median interval; h = width of the median interval; fMe = frequency of the median interval; Σfprec = the sum of the frequencies from the first interval up to the interval preceding the median interval (i.e. the cumulative increasing frequency of the preceding interval); (Σfi + 1)/2 = median location.

Example: The following table shows the distribution of a retailer's stores by monthly turnover. Determine the median.
³ In Romanian: “mediana”.


Turnover          Number of    Cumulative frequencies
(thousand lei)    stores       increasing    decreasing
50-60             5            5             50
60-70             12           17            45
70-80             17           34            33
80-90             9            43            16
90-100            7            50            7
Total             50           x             x

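$$\text{Loc Me} = \frac{\sum f_i + 1}{2} = \frac{50 + 1}{2} = 25{,}5$$

Median interval: 70-80 (the first interval whose cumulative increasing frequency, 34, exceeds 25,5).

$$Me = x_0 + h \cdot \frac{\frac{\sum f_i + 1}{2} - \sum f_{prec}}{f_{Me}} = 70 + 10 \cdot \frac{25{,}5 - 17}{17} = 75 \text{ thousand lei}$$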
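The steps for a frequency distribution series can be sketched in Python as follows. The function name and argument layout (lower class limits, a common width h, frequencies) are assumptions made for illustration, and the sketch assumes equal-width classes, as in the example.

```python
from itertools import accumulate


def grouped_median(lower_limits, width, freqs):
    """Me = x0 + h * (Loc Me - sum_f_prec) / f_Me for equal-width classes."""
    loc = (sum(freqs) + 1) / 2                 # median location
    cum = list(accumulate(freqs))              # cumulative increasing frequencies
    # Median interval: first class whose cumulative frequency reaches Loc Me.
    k = next(i for i, c in enumerate(cum) if c >= loc)
    f_prec = cum[k - 1] if k > 0 else 0        # cumulative frequency of the preceding class
    return lower_limits[k] + width * (loc - f_prec) / freqs[k]


print(grouped_median([50, 60, 70, 80, 90], 10, [5, 12, 17, 9, 7]))  # 75.0
```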

The median can also be determined graphically from the polygons of cumulative frequencies. In this case, the median lies on the horizontal axis, directly below the point where the cumulative increasing frequency curve intersects the cumulative decreasing frequency curve.

[Figure: Median determination by graphical representation]

As noted, the median does not depend on the extreme values, but only on the number of terms and on how they are distributed in the first half of the series. As a result, the median can replace the mean when the series has an approximately symmetric distribution and when we know only the total number of terms of the series and the distribution of its first part.

3.3. The Mode⁴

⁴ In Romanian: “modul”.


The mode (Mo) is the value of the variable around which the terms tend to concentrate. Calculating the mode makes sense only to the extent that the individual data do tend to concentrate around specific values.
For individual data, the mode is the value of the variable that occurs most frequently.
Example: in one day, a store sold men's shoes in the following sizes: 39, 45, 39, 42, 43, 44, 42, 43, 42, 40, 45, 42. The most frequently sold size is 42 (4 times), so Mo = 42.
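For individual data, the mode comes down to counting frequencies. The sketch below (hypothetical helper name) returns every value that reaches the highest frequency, so bimodal and multimodal series are handled too; Python's standard `statistics.multimode` behaves similarly.

```python
from collections import Counter


def modes(values):
    """All values that occur with the highest frequency, in increasing order."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


sizes = [39, 45, 39, 42, 43, 44, 42, 43, 42, 40, 45, 42]
print(modes(sizes))  # [42]
```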

For data grouped in the form of distribution series:
- The mode interval is the interval with the highest frequency.
- The mode is calculated with the formula:
$$Mo = x_0 + h \cdot \frac{\Delta_1}{\Delta_1 + \Delta_2}$$
where x0 = the lower limit of the mode interval, h = width of the mode interval, and Δ1 and Δ2 are differences calculated in relation to the frequency of the mode interval: Δ1 = frequency of the mode interval − frequency of the previous interval; Δ2 = frequency of the mode interval − frequency of the next interval.

Example: mode calculation for the Alfa trading company (mentioned above), with the first and last classes closed:

Classes by sales    Number of
(thousand lei)      employees (fi)
4-5                 5
5-6                 7
6-7                 10
7-8                 12
8-9                 9
9-10                7
Total               50

Mode interval: 7-8

$$Mo = 7 + 1 \cdot \frac{12 - 10}{(12 - 10) + (12 - 9)} = 7{,}4 \text{ thousand lei}$$

That is, the most common sales value is 7,4 thousand lei.
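A Python sketch of the grouped-data formula, using the Alfa classes. Two assumptions are not stated in the text: if the mode interval is the first or last class, the missing neighbouring frequency is taken as 0, and if several classes share the highest frequency, only the first is used (such a series is multimodal and should be examined separately). The function name is hypothetical.

```python
def grouped_mode(lower_limits, width, freqs):
    """Mo = x0 + h * d1 / (d1 + d2) for equal-width classes."""
    k = max(range(len(freqs)), key=lambda i: freqs[i])  # first class with the highest frequency
    prev_f = freqs[k - 1] if k > 0 else 0               # assumption: 0 before the first class
    next_f = freqs[k + 1] if k + 1 < len(freqs) else 0  # assumption: 0 after the last class
    d1 = freqs[k] - prev_f
    d2 = freqs[k] - next_f
    return lower_limits[k] + width * d1 / (d1 + d2)


print(grouped_mode([4, 5, 6, 7, 8, 9], 1, [5, 7, 10, 12, 9, 7]))  # 7.4
```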

In practice, individual data often tend to concentrate around two or more values. In this case the series has more than one mode, and we are dealing with a bimodal series (two peaks) or a multimodal series (several peaks).

4. Measures of Variability⁵

Measures of variability quantify the spread of individual values around the central tendency.
⁵ In Romanian: “Indicatorii variatiei”.


4.1. Simple indicators of variability

A. Maximum absolute amplitude⁶

Maximum absolute amplitude (A) is calculated as the difference between the maximum value of the variable studied (Xmax) and its minimum value (Xmin). The higher the maximum absolute amplitude, the greater the spread is considered to be.
$$A = X_{max} - X_{min}$$
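In Python the maximum absolute amplitude is a one-line calculation; here it is applied to the shoe sizes from the mode example (the helper name is hypothetical). Because it uses only the two extreme values, a single outlier can change it considerably.

```python
def max_absolute_amplitude(values):
    """A = Xmax - Xmin."""
    return max(values) - min(values)


sizes = [39, 45, 39, 42, 43, 44, 42, 43, 42, 40, 45, 42]
print(max_absolute_amplitude(sizes))  # 45 - 39 = 6
```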

B. Interquartile deviation and interdecile deviation⁷

Quartiles are the values of the variable studied that divide the data series, sorted in ascending order, into four equal parts in terms of the number of observations. Therefore, there are three quartiles: Q1, Q2 and Q3, where Q2 coincides with the median. Quartiles are determined in the same way as the median.
The interquartile deviation is the difference between Q3 and Q1:
$$A_{interq} = Q_3 - Q_1$$
Deciles are the values of the variable studied that divide the data series, sorted in ascending order, into ten equal parts in terms of the number of observations. Consequently, there are nine deciles (D1, D2, D3, ..., D9). The interdecile deviation is the difference between D9 and D1:
$$A_{interd} = D_9 - D_1$$
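The text states only that quartiles and deciles are found in the same way as the median. The Python sketch below assumes that the location of the quantile of order p is p·(Σfi + 1), by analogy with Loc Me = (Σfi + 1)/2; some textbooks use p·Σfi instead, which gives slightly different values. The function name is hypothetical, and the data are the store-turnover classes from the median example.

```python
from itertools import accumulate


def grouped_quantile(lower_limits, width, freqs, p):
    """Quantile of order p (0 < p < 1) for equal-width grouped data.
    ASSUMPTION: location = p * (sum(f) + 1), generalising Loc Me = (sum(f) + 1) / 2."""
    loc = p * (sum(freqs) + 1)
    cum = list(accumulate(freqs))
    k = next(i for i, c in enumerate(cum) if c >= loc)   # first class reaching the location
    f_prec = cum[k - 1] if k > 0 else 0
    return lower_limits[k] + width * (loc - f_prec) / freqs[k]


lows, h, f = [50, 60, 70, 80, 90], 10, [5, 12, 17, 9, 7]
q1, q3 = grouped_quantile(lows, h, f, 0.25), grouped_quantile(lows, h, f, 0.75)
d1, d9 = grouped_quantile(lows, h, f, 0.10), grouped_quantile(lows, h, f, 0.90)
print(round(q3 - q1, 2), round(d9 - d1, 2))  # 18.26 34.06 (thousand lei)
```

With p = 0.5 the same function returns the median of 75 thousand lei, which is a useful check that it follows the median procedure.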

C. Absolute deviation⁸

Absolute deviation is the difference between an individual value of the variable and the mean of the variable:
$$d_i = x_i - \bar{x}$$
The main disadvantage of this indicator is that it cannot be used in further calculations: deviations of values above the mean are offset by deviations of values below the mean, so their sum is always zero:
$$\sum (x_i - \bar{x}) = 0$$
Therefore, although simple indicators of variability offer an insight into the level of variation, they have limitations that may distort the findings of a statistical analysis. As a result, indicators were developed that express the full extent of variation in a single value.

4.2. Synthetic indicators of variability

A. Mean absolute deviation (MAD)⁹


⁶ In Romanian: “Amplitudinea maximă absolută”.
⁷ In Romanian: “Abaterea interquartilica si abaterea interdecilica”.
⁸ In Romanian: “Abaterea lineară”.
⁹ In Romanian: “Abaterea medie lineară”.


Mean absolute deviation (or average absolute deviation) (d̄) is the simple or weighted arithmetic mean of the absolute deviations taken in absolute value. To overcome the disadvantage of the absolute deviation (Σdi = 0), the deviations are taken as positive values, which allows them to be summed.
$$\bar{d} = \frac{\sum |x_i - \bar{x}|\, f_i}{\sum f_i}$$
The mean absolute deviation is expressed in the unit of measurement of the variable. The higher its value, the greater the spread; the lower its value, the more homogeneous the series.
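A Python sketch of the weighted formula (the helper name is hypothetical), applied to the Alfa class centres and employee counts used for the mean. The result is in the unit of the variable, thousand lei.

```python
def mean_absolute_deviation(values, freqs=None):
    """d = sum(|x_i - mean| * f_i) / sum(f_i); f_i = 1 for individual data."""
    if freqs is None:
        freqs = [1] * len(values)
    n = sum(freqs)
    mean = sum(x * f for x, f in zip(values, freqs)) / n
    return sum(abs(x - mean) * f for x, f in zip(values, freqs)) / n


centres = [4.5, 5.5, 6.5, 7.5, 8.5, 9.5]  # Alfa class centres
employees = [5, 7, 10, 12, 9, 7]
print(round(mean_absolute_deviation(centres, employees), 4))  # 1.2784 (thousand lei)
```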

B. Variance¹⁰

Variance is a synthetic indicator that measures the spread of values. It is calculated as the simple or weighted arithmetic mean of the squared deviations of the individual values from the mean.
Determination of the variance for individual data:
$$\sigma^2 = \frac{\sum (x_i - \bar{x})^2}{n}$$
Determination of the variance for frequency distribution series:
$$\sigma^2 = \frac{\sum (x_i - \bar{x})^2 f_i}{\sum f_i}$$
The properties of the variance:
- If all terms of the series are equal to each other and to a constant, the variance is zero.
- If each term of the series is divided by a constant "K", the variance of the new series is "K²" times smaller.
- The mean of the squared deviations of the individual values from a constant "a" exceeds the variance (computed from the deviations from the mean) by the square of the difference between the arithmetic mean and the constant "a":
$$\sigma^2_{x-a} = \sigma^2_x + (\bar{x} - a)^2, \qquad \text{where } \sigma^2_{x-a} = \frac{\sum (x_i - a)^2}{n}$$
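A Python sketch of the variance for grouped data, with a second helper for the mean squared deviation from an arbitrary constant "a" (both names are hypothetical). On the Alfa data with a = 7,5 it illustrates the third property: 2,2976 + (7,18 − 7,5)² = 2,4.

```python
def variance(values, freqs=None):
    """sigma^2 = sum((x_i - mean)^2 * f_i) / sum(f_i); f_i = 1 for individual data."""
    if freqs is None:
        freqs = [1] * len(values)
    n = sum(freqs)
    m = sum(x * f for x, f in zip(values, freqs)) / n
    return sum((x - m) ** 2 * f for x, f in zip(values, freqs)) / n


def mean_square_from(values, a, freqs=None):
    """Mean of squared deviations from a constant a (sigma^2_{x-a} in the text)."""
    if freqs is None:
        freqs = [1] * len(values)
    return sum((x - a) ** 2 * f for x, f in zip(values, freqs)) / sum(freqs)


centres = [4.5, 5.5, 6.5, 7.5, 8.5, 9.5]  # Alfa class centres
employees = [5, 7, 10, 12, 9, 7]
print(round(variance(centres, employees), 4))               # 2.2976
print(round(mean_square_from(centres, 7.5, employees), 4))  # 2.4
```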

The higher the variance, the greater the spread; the lower the variance, the more homogeneous the series. The disadvantage of the variance is that it is expressed in the square of the variable's unit of measurement, so it cannot be interpreted directly in the units of the data.

C. Standard deviation¹¹

The standard deviation measures the variation of a phenomenon and is calculated as the square (quadratic) mean of the deviations of the individual data from the mean. The standard deviation has the same units as the original data analysed. The higher its value, the more scattered the values of the series; the lower its value, the more homogeneous the series.
¹⁰ In Romanian: “Dispersia”.
¹¹ In Romanian: “Abaterea medie pătratică”.


$$\sigma = \sqrt{\sigma^2}$$
For individual data:
$$\sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}}$$
For data grouped into classes:
$$\sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2 f_i}{\sum f_i}}$$
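A Python sketch of the grouped formula (hypothetical helper name), cross-checked against the standard library. `statistics.pstdev` divides by n, like the formulas above, whereas `statistics.stdev` divides by n − 1 (sample standard deviation), so the two are not interchangeable here.

```python
import math
import statistics


def std_dev(values, freqs=None):
    """sigma = sqrt(sum((x_i - mean)^2 * f_i) / sum(f_i)); f_i = 1 for individual data."""
    if freqs is None:
        freqs = [1] * len(values)
    n = sum(freqs)
    mean = sum(x * f for x, f in zip(values, freqs)) / n
    return math.sqrt(sum((x - mean) ** 2 * f for x, f in zip(values, freqs)) / n)


centres = [4.5, 5.5, 6.5, 7.5, 8.5, 9.5]  # Alfa class centres
employees = [5, 7, 10, 12, 9, 7]
sigma = std_dev(centres, employees)
# Expand the grouped data into 50 individual values to cross-check with the standard library.
expanded = [x for x, f in zip(centres, employees) for _ in range(f)]
print(round(sigma, 4))                                   # 1.5158 (thousand lei)
print(math.isclose(sigma, statistics.pstdev(expanded)))  # True
```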
D. Coefficient of variation

The coefficient of variation allows the spread of two or more data series to be compared, even when they describe different phenomena expressed in different units of measurement. It is a dimensionless indicator (expressed as a ratio or a percentage).
$$v = \frac{\sigma}{\bar{x}} \cdot 100 \qquad \text{or} \qquad v = \frac{\bar{d}}{\bar{x}} \cdot 100$$
The greater the coefficient of variation, the more spread out the series; the closer it is to zero, the more homogeneous the series. A series is considered homogeneous if its coefficient of variation is below 35%, in which case the mean is representative. If the coefficient of variation exceeds 35% (0,35), the series is not homogeneous and the mean should be regarded with some reserve.
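Applied to the Alfa data, a Python sketch (hypothetical helper name) gives a coefficient of variation of roughly 21% based on σ, below the 35% threshold, so by the rule above the mean of 7,18 thousand lei would be considered representative.

```python
import math


def variation_coefficients(centres, freqs):
    """Return (v based on sigma, v based on the mean absolute deviation), in percent."""
    n = sum(freqs)
    mean = sum(x * f for x, f in zip(centres, freqs)) / n
    sigma = math.sqrt(sum((x - mean) ** 2 * f for x, f in zip(centres, freqs)) / n)
    mad = sum(abs(x - mean) * f for x, f in zip(centres, freqs)) / n
    return sigma / mean * 100, mad / mean * 100


v_sigma, v_mad = variation_coefficients([4.5, 5.5, 6.5, 7.5, 8.5, 9.5], [5, 7, 10, 12, 9, 7])
print(round(v_sigma, 1), round(v_mad, 1))                     # 21.1 17.8
print("homogeneous" if v_sigma < 35 else "not homogeneous")  # homogeneous
```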

5. Asymmetry Indicators

Asymmetry indicators are especially useful when analysing a whole sector (for example, the SME sector).
A series can be symmetrical or asymmetrical. In an asymmetric distribution, the frequencies of the variable investigated are unevenly distributed around the central tendency values of the series (x̄, Me, Mo).
A symmetric series satisfies the condition x̄ = Me = Mo. The asymmetry is measured by the difference as = x̄ − Mo.
An asymmetric series can be in one of two states:
- left asymmetry if Mo < x̄, so that (x̄ − Mo) > 0 (positive asymmetry; the peak of the distribution lies to the left of the mean);
- right asymmetry if x̄ < Mo, so that (x̄ − Mo) < 0 (negative asymmetry; the peak of the distribution lies to the right of the mean).
The degree of asymmetry is measured by K. Pearson's coefficient:
$$C_{as}^{P} = \frac{\bar{x} - Mo}{\sigma}, \qquad -1 \le C_{as}^{P} \le 1$$
The series is moderately asymmetric if |C_as^P| ≤ 0.3; as C_as^P tends to zero, the series tends to be symmetrical.
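For illustration, Pearson's coefficient for the Alfa data can be computed in Python from the mean (7,18) and the mode (7,4) found in the examples above, with σ from the grouped formula. The helper name is hypothetical. The negative result means x̄ < Mo (right, negative asymmetry in the terms above), and its absolute value is within the moderate range.

```python
import math


def pearson_asymmetry(mean, mode, sigma):
    """C_as^P = (mean - Mo) / sigma."""
    return (mean - mode) / sigma


centres = [4.5, 5.5, 6.5, 7.5, 8.5, 9.5]  # Alfa class centres
employees = [5, 7, 10, 12, 9, 7]
n = sum(employees)
mean = sum(x * f for x, f in zip(centres, employees)) / n                # 7.18
sigma = math.sqrt(sum((x - mean) ** 2 * f for x, f in zip(centres, employees)) / n)
mode = 7.4                                                               # from the mode example
c = pearson_asymmetry(mean, mode, sigma)
print(round(c, 3), abs(c) <= 0.3)  # -0.145 True
```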
For a bimodal series (a series with two maximum frequencies), Fisher's coefficient is used:


$$C_{as}^{F} = \frac{3(\bar{x} - Me)}{\sigma}, \qquad -3 \le C_{as}^{F} \le 3$$

If |C_as^F| > 0.3, there is a strong asymmetry and the measures of central tendency are unrepresentative.
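The formula given here as Fisher's coefficient, 3(x̄ − Me)/σ, is also widely known as Pearson's second (median) skewness coefficient; because it uses the median rather than the mode, it stays defined when a series has two modes. The Python sketch below applies it to the Alfa data purely for illustration (that series is not bimodal); the helper name is hypothetical, and the median is computed with the grouped-median procedure from section 3.2.

```python
import math
from itertools import accumulate


def fisher_asymmetry(mean, median, sigma):
    """C_as^F = 3 * (mean - Me) / sigma, as defined in the text."""
    return 3 * (mean - median) / sigma


centres = [4.5, 5.5, 6.5, 7.5, 8.5, 9.5]  # Alfa class centres
lower_limits = [4, 5, 6, 7, 8, 9]         # class width h = 1
employees = [5, 7, 10, 12, 9, 7]
n = sum(employees)
mean = sum(x * f for x, f in zip(centres, employees)) / n
sigma = math.sqrt(sum((x - mean) ** 2 * f for x, f in zip(centres, employees)) / n)
# Grouped median: Loc Me = (n + 1) / 2, first class whose cumulative frequency reaches it.
loc = (n + 1) / 2
cum = list(accumulate(employees))
k = next(i for i, cf in enumerate(cum) if cf >= loc)
prev_cum = cum[k - 1] if k > 0 else 0
median = lower_limits[k] + 1 * (loc - prev_cum) / employees[k]
c = fisher_asymmetry(mean, median, sigma)
print(round(median, 4), round(c, 3))  # 7.2917 -0.221
```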
