Convergence in Distribution

Central Limit Theorem


Statistics 110

Summer 2006

Copyright © 2006 by Mark E. Irwin
Convergence in Distribution

Theorem. Let $X \sim \mathrm{Bin}(n, p)$ and let $\lambda = np$. Then

$$\lim_{n\to\infty} P[X = x] = \lim_{n\to\infty} \binom{n}{x} p^x (1-p)^{n-x} = \frac{e^{-\lambda}\lambda^x}{x!}$$

So when $n$ gets large, we can approximate binomial probabilities with
Poisson probabilities.

Proof.

$$\lim_{n\to\infty} \binom{n}{x} p^x (1-p)^{n-x} = \lim_{n\to\infty} \binom{n}{x} \left(\frac{\lambda}{n}\right)^{x} \left(1 - \frac{\lambda}{n}\right)^{n-x}$$

$$= \frac{n!}{x!(n-x)!} \, \frac{1}{n^x} \, \lambda^x \left(1 - \frac{\lambda}{n}\right)^{-x} \left(1 - \frac{\lambda}{n}\right)^{n}$$

$$= \frac{\lambda^x}{x!} \lim_{n\to\infty} \underbrace{\frac{n!}{(n-x)!\,(n-\lambda)^x}}_{\to\, 1} \; \underbrace{\left(1 - \frac{\lambda}{n}\right)^{n}}_{\to\, e^{-\lambda}}$$

$$= \frac{e^{-\lambda}\lambda^x}{x!} \qquad \Box$$

Note that the approximation works better when $n$ is large and $p$ is small, as
can be seen in the following plot. If $p$ is relatively large, a different
approximation should be used. This is coming later.

(Note that in the plot, bars correspond to the true binomial probabilities and the
red circles correspond to the Poisson approximation.)
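As a quick numerical check (a sketch of mine, not part of the original slides), the two pmfs can be compared directly using only the standard library:

```python
from math import comb, exp, factorial

def binom_pmf(x, n, p):
    """Exact binomial probability P[X = x]."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def poisson_pmf(x, lam):
    """Poisson probability with rate lam = n*p."""
    return exp(-lam) * lam**x / factorial(x)

# lambda = 1 held fixed while n grows and p = lambda/n shrinks
for n in [10, 50, 200]:
    p = 1 / n
    print(n, [round(binom_pmf(x, n, p), 4) for x in range(4)],
             [round(poisson_pmf(x, 1.0), 4) for x in range(4)])
```

The rows agree more closely as $n$ increases, matching the plot below.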

[Figure: Poisson approximation to the binomial for $\lambda = 1$ ($n = 10, p = 0.1$; $n = 50, p = 0.02$; $n = 200, p = 0.005$) and $\lambda = 5$ ($n = 10, p = 0.5$; $n = 50, p = 0.1$; $n = 200, p = 0.025$). Bars give the true binomial $p(x)$; red circles give the Poisson approximation.]
Example: Let $Y_1, Y_2, \ldots$ be iid $\mathrm{Exp}(1)$. Then

$$X_n = Y_1 + Y_2 + \ldots + Y_n \sim \mathrm{Gamma}(n, 1)$$

which has

$$E[X_n] = n; \quad \mathrm{Var}(X_n) = n; \quad \mathrm{SD}(X_n) = \sqrt{n}$$

Thus $Z_n = \frac{X_n - n}{\sqrt{n}}$ has mean $= 0$ and variance $= 1$.

Let's compare its distribution to $Z \sim N(0, 1)$. I.e., is

$$P[-1 \le Z_n \le 2] \approx P[-1 \le Z \le 2]?$$

Let

$$Z_n = \frac{X_n - n}{\sqrt{n}}; \qquad X_n = n + \sqrt{n}\, Z_n$$

$$f_{Z_n}(z) = f_{X_n}(n + \sqrt{n}\, z) \times \sqrt{n}$$
$$P[a \le Z_n \le b] = \int_a^b f_{Z_n}(z)\, dz = \int_a^b \sqrt{n}\, f_{X_n}(n + \sqrt{n}\, z)\, dz = \int_a^b \sqrt{n}\, \frac{(n + \sqrt{n}\, z)^{n-1}}{(n-1)!}\, e^{-(n + \sqrt{n}\, z)}\, dz$$

To go further we need Stirling's formula: $n! \approx n^n e^{-n} \sqrt{2\pi n}$. So

$$f_{X_n}(n + \sqrt{n}\, z)\sqrt{n} = \frac{e^{-n - z\sqrt{n}}\, (n + z\sqrt{n})^{n-1}\, \sqrt{n}}{(n-1)!} \approx \frac{e^{-n - z\sqrt{n}}\, (n + z\sqrt{n})^{n-1}\, \sqrt{n}}{(n-1)^{n-1}\, e^{-n+1}\, \sqrt{2\pi n}} \approx \frac{1}{\sqrt{2\pi}} \underbrace{e^{-z\sqrt{n}} \left(1 + \frac{z}{\sqrt{n}}\right)^{n}}_{g_n(z)}$$
$$\log(g_n(z)) = -z\sqrt{n} + n \log\left(1 + \frac{z}{\sqrt{n}}\right) = -z\sqrt{n} + n\left[\frac{z}{\sqrt{n}} - \frac{1}{2}\frac{z^2}{n} + \frac{1}{3}\frac{z^3}{n^{3/2}} - \ldots\right] \approx -\frac{1}{2} z^2 + O\!\left(\frac{1}{\sqrt{n}}\right)$$

so

$$f_{X_n}(n + z\sqrt{n})\sqrt{n} \approx \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}$$

Thus

$$P[a \le Z_n \le b] \to \int_a^b \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\, dz = P[a \le Z \le b]$$

So as $n$ increases, the distribution of $Z_n$ gets closer and closer to a $N(0, 1)$.

Another way of thinking of this is that the distribution of $X_n = n + \sqrt{n}\, Z_n$
approaches that of a $N(n, n)$.
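The question posed above, whether $P[-1 \le Z_n \le 2] \approx P[-1 \le Z \le 2]$, can also be checked numerically; a sketch (not from the slides), assuming SciPy is available:

```python
import math
from scipy.stats import gamma, norm

# P[-1 <= Zn <= 2] where Zn = (Xn - n)/sqrt(n) and Xn ~ Gamma(n, 1)
for n in [2, 5, 10, 50, 100]:
    lo, hi = n - math.sqrt(n), n + 2 * math.sqrt(n)
    p_gamma = gamma.cdf(hi, a=n) - gamma.cdf(lo, a=n)
    print(n, round(p_gamma, 4))

# limiting value P[-1 <= Z <= 2] for Z ~ N(0, 1)
print("N(0,1):", round(norm.cdf(2) - norm.cdf(-1), 4))
```

The exact gamma probabilities approach the normal value as $n$ grows.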

[Figure: $\mathrm{Gamma}(n, 1)$ densities $f(x)$ for $n = 2, 5, 10, 20, 50, 100$, each plotted against the approximating $N(n, n)$ density.]
Definition. Let $X_1, X_2, \ldots$ be a sequence of RVs with cumulative
distribution functions $F_1, F_2, \ldots$ and let $X$ be a RV with distribution function
$F$. We say $X_n$ converges in distribution to $X$, written $X_n \xrightarrow{D} X$, if

$$\lim_{n\to\infty} F_n(x) = F(x)$$

at every point at which $F$ is continuous.

An equivalent statement to this is that for all $a$ and $b$ where $F$ is continuous,

$$P[a \le X_n \le b] \to P[a \le X \le b]$$

Note that if $X_n$ and $X$ are discrete distributions, this condition reduces to
$P[X_n = x_i] \to P[X = x_i]$ for all support points $x_i$.
Note that an equivalent definition of convergence in distribution is that
$X_n \xrightarrow{D} X$ if $E[g(X_n)] \to E[g(X)]$ for all bounded, continuous functions
$g(\cdot)$.

This statement of convergence in distribution is needed to help prove the
following theorem.

Theorem. [Continuity Theorem] Let $X_n$ be a sequence of random
variables with cumulative distribution functions $F_n(x)$ and corresponding
moment generating functions $M_n(t)$. Let $X$ be a random variable with
cumulative distribution function $F(x)$ and moment generating function
$M(t)$. If $M_n(t) \to M(t)$ for all $t$ in an open interval containing zero, then
$F_n(x) \to F(x)$ at all continuity points of $F$. That is, $X_n \xrightarrow{D} X$.

Thus the previous two examples (Binomial/Poisson and Gamma/Normal)
could be proved this way.
For the Gamma/Normal example,

$$M_{Z_n}(t) = M_{X_n}\!\left(\frac{t}{\sqrt{n}}\right) e^{-t\sqrt{n}} = \left(\frac{1}{1 - \frac{t}{\sqrt{n}}}\right)^{n} e^{-t\sqrt{n}}$$

Similarly to the earlier proof, it's easier to work with $\log M_{Z_n}(t)$:

$$\log M_{Z_n}(t) = -t\sqrt{n} - n \log\left(1 - \frac{t}{\sqrt{n}}\right) = -t\sqrt{n} - n\left[-\frac{t}{\sqrt{n}} - \frac{1}{2}\frac{t^2}{n} - \frac{1}{3}\frac{t^3}{n^{3/2}} - \ldots\right] = \frac{1}{2} t^2 + O\!\left(\frac{1}{\sqrt{n}}\right)$$

Thus

$$M_{Z_n}(t) \to e^{t^2/2}$$

which is the MGF for a standard normal.
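This convergence can be seen numerically as well; a small sketch of mine (valid for $t < \sqrt{n}$, where the gamma MGF exists):

```python
import math

def log_mgf_Zn(t, n):
    # log MGF of Zn = (Xn - n)/sqrt(n) for Xn ~ Gamma(n, 1); requires t < sqrt(n)
    return -t * math.sqrt(n) - n * math.log(1 - t / math.sqrt(n))

t = 0.5
for n in [10, 100, 10_000]:
    print(n, round(math.exp(log_mgf_Zn(t, n)), 5))
print("limit e^(t^2/2):", round(math.exp(t**2 / 2), 5))
```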
Central Limit Theorem

Theorem. [Central Limit Theorem (CLT)] Let $X_1, X_2, X_3, \ldots$ be a
sequence of independent RVs having mean $\mu$ and variance $\sigma^2$ and a
common distribution function $F(x)$ and moment generating function $M(t)$
defined in a neighbourhood of zero. Let

$$S_n = \sum_{i=1}^{n} X_i$$

Then

$$\lim_{n\to\infty} P\left[\frac{S_n - n\mu}{\sigma\sqrt{n}} \le x\right] = \Phi(x)$$

That is,

$$\frac{S_n - n\mu}{\sigma\sqrt{n}} \xrightarrow{D} N(0, 1)$$


Proof. Without loss of generality, we can assume that $\mu = 0$. So let
$Z_n = \frac{S_n}{\sigma\sqrt{n}}$. Since $S_n$ is the sum of $n$ iid RVs,

$$M_{S_n}(t) = (M(t))^n; \qquad M_{Z_n}(t) = \left(M\!\left(\frac{t}{\sigma\sqrt{n}}\right)\right)^{n}$$

Taking a Taylor series expansion of $M(t)$ around 0 gives

$$M(t) = M(0) + M'(0)\, t + \frac{1}{2} M''(0)\, t^2 + \epsilon_t = 1 + \frac{1}{2}\sigma^2 t^2 + O(t^3)$$

since $M(0) = 1$, $M'(0) = \mu = 0$, $M''(0) = \sigma^2$. So

$$M\!\left(\frac{t}{\sigma\sqrt{n}}\right) = 1 + \frac{1}{2}\sigma^2\left(\frac{t}{\sigma\sqrt{n}}\right)^{2} + O\!\left(\left(\frac{t}{\sigma\sqrt{n}}\right)^{3}\right) = 1 + \frac{t^2}{2n} + O\!\left(\frac{1}{n^{3/2}}\right)$$


This gives

$$M_{Z_n}(t) = \left(1 + \frac{t^2}{2n} + O\!\left(\frac{1}{n^{3/2}}\right)\right)^{n} \to e^{t^2/2} \qquad \Box$$

Note that the requirement of a MGF is not needed for the theorem to hold.
In fact, all that is needed is that $\mathrm{Var}(X_i) = \sigma^2 < \infty$. A standard proof of
this more general theorem uses the characteristic function (which is defined
for any distribution)

$$\phi(t) = \int_{-\infty}^{\infty} e^{itx} f(x)\, dx = M(it)$$

instead of the moment generating function $M(t)$, where $i = \sqrt{-1}$.

Thus the CLT holds for distributions such as the log normal, even though
it doesn't have a MGF.


Also, the CLT is often presented in the following equivalent form:

$$Z_n = \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} = \sqrt{n}\, \frac{\bar{X}_n - \mu}{\sigma} \xrightarrow{D} N(0, 1)$$

To see this is the same, just multiply the numerator and denominator by $n$
in the first form to get the statement about $S_n$.

The common way that this is used is that

$$S_n \overset{\text{approx.}}{\sim} N(n\mu,\, n\sigma^2) \qquad \text{or} \qquad \bar{X}_n \overset{\text{approx.}}{\sim} N\!\left(\mu,\, \frac{\sigma^2}{n}\right)$$


Example: Insurance claims

Suppose that an insurance company has 10,000 policyholders. The expected
yearly claim per policyholder is $240 with a standard deviation of $800.
What is the approximate probability that the total yearly claims $S_{10{,}000} >$ $2.6 million?

$$E[S_{10{,}000}] = 10{,}000 \times 240 = 2{,}400{,}000$$

$$\mathrm{SD}(S_{10{,}000}) = \sqrt{10{,}000} \times 800 = 80{,}000$$

$$P[S_{10{,}000} > 2{,}600{,}000] = P\left[\frac{S_{10{,}000} - 2{,}400{,}000}{80{,}000} > \frac{2{,}600{,}000 - 2{,}400{,}000}{80{,}000}\right] \approx P[Z > 2.5] = 0.0062$$

Note that this probability statement does not use anything about the
distribution of the original policy claims except their mean and standard
deviation. It's probable that their distribution is highly skewed right (since
$\mu_x \ll \sigma_x$), but the calculations ignore this fact.
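A sketch of this calculation, plus a Monte Carlo check under an assumed claim distribution (the lognormal model below is purely my illustration, matched to the stated mean and SD; it is not from the slides):

```python
import numpy as np
from scipy.stats import norm

mu_c, sd_c, n = 240.0, 800.0, 10_000
mean_S, sd_S = n * mu_c, np.sqrt(n) * sd_c

# Normal approximation from the CLT
z = (2_600_000 - mean_S) / sd_S
print("normal approx:", norm.sf(z))           # P[Z > 2.5], about 0.0062

# Monte Carlo under a hypothetical lognormal claim distribution
s2 = np.log(1 + (sd_c / mu_c) ** 2)           # matches mean 240, SD 800
m = np.log(mu_c) - s2 / 2
rng = np.random.default_rng(0)
totals = np.array([rng.lognormal(m, np.sqrt(s2), n).sum() for _ in range(2_000)])
print("simulated:", (totals > 2_600_000).mean())
```

The simulated tail probability lands near the normal answer despite the heavy right skew of the individual claims, which is the point of the example.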


One consequence of the CLT is the normal approximation to the binomial.
If $X_n \sim \mathrm{Bin}(n, p)$ and $\hat{p}_n = \frac{X_n}{n}$, then (since $X_n$ can be thought of as the sum
of $n$ Bernoullis)

$$\frac{X_n - np}{\sqrt{np(1-p)}} \xrightarrow{D} N(0, 1); \qquad \frac{\hat{p}_n - p}{\sqrt{p(1-p)/n}} \xrightarrow{D} N(0, 1)$$

Another way of thinking of this is that

$$X_n \overset{\text{approx.}}{\sim} N(np,\, np(1-p)); \qquad \hat{p}_n \overset{\text{approx.}}{\sim} N\!\left(p,\, \frac{p(1-p)}{n}\right)$$

This approximation works better when $p$ is closer to $\frac{1}{2}$ than when $p$ is near
0 or 1.

A rule of thumb is that it is ok to use the normal approximation when $np \ge 5$
and $n(1 - p) \ge 5$ (expect at least 5 successes and 5 failures). (Other books
sometimes suggest other values, with the most popular alternative being
10.)


[Figure: Normal approximation to the binomial for $p = 0.25$ and $p = 0.5$ with $n = 10, 50, 200$; bars give the binomial $p(x)$ with the approximating normal density overlaid.]


[Figure: The same comparison for $p = 0.01$ with $n = 10, 100, 1000$ and $p = 0.1$ with $n = 10, 100, 1000$.]


Continuity correction to the binomial approximation

Suppose that $X \sim \mathrm{Bin}(50, 0.3)$ and we are interested in

$$P[\hat{p} \le 0.24] = P[X \le 12]$$

Notice that for the bar corresponding to $X = 12$, the normal curve only
picks up about half the area, as the bar actually goes from 11.5 to
12.5. The normal approximation can be improved if we ask for the area
under the normal curve up to 12.5.

[Figure: $\mathrm{Bin}(50, 0.3)$ probabilities ($p = 0.3$, $n = 50$) with the approximating normal density; the second panel zooms in on $x = 10$ to $15$.]


Let $Y \sim N(15, 10.5)$ (the approximating normal). Then

$$P[X \le 12] = 0.2229 \quad \text{(true probability)}$$
$$P[Y \le 12] = 0.1773 \quad \text{(no correction)}$$
$$P[Y \le 12.5] = 0.2202 \quad \text{(with correction)}$$

[Figure: $\mathrm{Bin}(50, 0.3)$ probabilities ($p = 0.3$, $n = 50$) with the approximating normal density.]
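These three numbers are easy to reproduce; a sketch assuming SciPy is available:

```python
import math
from scipy.stats import binom, norm

n, p = 50, 0.3
mu, sd = n * p, math.sqrt(n * p * (1 - p))  # 15 and sqrt(10.5)

print(binom.cdf(12, n, p))     # 0.2229  true probability
print(norm.cdf(12, mu, sd))    # 0.1773  no correction
print(norm.cdf(12.5, mu, sd))  # 0.2202  with correction
```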


While this does give a better answer for many problems, normally I
recommend ignoring it. If the correction makes a difference, you probably
want to be doing an exact probability calculation instead.

When will the CLT work better?

• Big $n$.

• Distribution of $X_i$ close to normal. The approximation holds exactly, even
for $n = 1$, if $X_i \sim N(\mu, \sigma^2)$.

• $X_i$ roughly symmetric. As we observed with the binomial examples, the
closer $p$ was to 0.5, and thus the closer to symmetry, the better the approximation
works. The more skewness there is in the distribution of the observations,
the bigger $n$ needs to be.

In the following plots, the histogram represents 10,000 simulated $\bar{X}$s, the
black curves are the true densities or CDFs, and the red curves are the
normal approximations.
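Before the plots, here is a sketch of how one such panel could be generated (my reconstruction, assuming NumPy and Matplotlib; the slides do not show the original plotting code):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

rng = np.random.default_rng(1)
n = 5                                            # sample size for each mean
xbars = rng.exponential(1.0, size=(10_000, n)).mean(axis=1)

plt.hist(xbars, bins=60, density=True)           # 10,000 simulated X-bars
grid = np.linspace(xbars.min(), xbars.max(), 400)
# CLT approximation: X-bar ~ N(mu, sigma^2/n), with mu = sigma = 1 for Exp(1)
plt.plot(grid, norm.pdf(grid, loc=1.0, scale=1.0 / np.sqrt(n)), "r")
plt.title(f"Exponential samples - n = {n}")
plt.show()
```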


[Figure: Histograms of 10,000 simulated $\bar{X}$s from $\mathrm{Exp}(1)$ samples, $n = 1, 2, 5, 10, 50, 100$, with the true density (black) and the normal approximation (red).]


[Figure: The same density comparison for $\mathrm{Gamma}(5, 1)$ samples, $n = 1, 2, 5, 10, 50, 100$.]


[Figure: CDF versions of the $\mathrm{Exp}(1)$ panels: CDFs $F(\bar{X})$ for $n = 1, 2, 5, 10, 50, 100$, true CDF (black) against the normal approximation (red).]


[Figure: CDF versions of the $\mathrm{Gamma}(5, 1)$ panels, $n = 1, 2, 5, 10, 50, 100$.]


There are other forms of the CLT, which relax the assumptions about the
distribution. One example is:

Theorem. [Liapunov's CLT] Let $X_1, X_2, \ldots$ be independent random
variables with $E[X_i] = \mu_i$, $\mathrm{Var}(X_i) = \sigma_i^2$, and $E[|X_i - \mu_i|^3] = \beta_i$. Let

$$B_n = \left(\sum_{i=1}^{n} \beta_i\right)^{1/3} \qquad c_n = \left(\sum_{i=1}^{n} \sigma_i^2\right)^{1/2} = \mathrm{SD}\left(\sum_{i=1}^{n} X_i\right).$$

Then

$$Y_n = \frac{\sum_{i=1}^{n} (X_i - \mu_i)}{c_n} \xrightarrow{D} Z \sim N(0, 1)$$

if $\frac{B_n}{c_n} \to 0$.

Proof. Omitted $\Box$

The condition involving $B_n$ and $c_n$ has to do with each term in the sum
having roughly the same weight. We don't want the sum to be dominated
by a few terms.
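For intuition: in the iid case with $\beta_i \equiv \beta$ and $\sigma_i \equiv \sigma$, we get $B_n = (n\beta)^{1/3}$ and $c_n = \sigma\sqrt{n}$, so $B_n/c_n \propto n^{-1/6} \to 0$ and the condition always holds. A tiny numerical sketch (the moment values are illustrative, not from the slides):

```python
import math

beta, sigma = 2.0, 1.0                 # illustrative third absolute moment and SD
for n in [10, 10**3, 10**6]:
    Bn = (n * beta) ** (1 / 3)         # (sum of beta_i)^(1/3)
    cn = sigma * math.sqrt(n)          # SD of the sum
    print(n, Bn / cn)                  # decays like n**(-1/6)
```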


Example: Regression through the origin

Let $X_i$ = weight of car $i$ and $Y_i$ = fuel in gallons to go 100 miles.
Model: $Y_i = \theta X_i + \epsilon_i$ where the $\epsilon_i$ are independent errors with

$$E[\epsilon_i] = 0, \quad \mathrm{Var}(\epsilon_i) = \sigma^2, \quad E[|\epsilon_i|^3] < \infty$$

[Figure: scatterplot of fuel use against car weight (0 to 4000).]


How to estimate $\theta$ from the data? Minimize the least squares criterion

$$SS(\theta) = \sum_{i=1}^{n} (Y_i - \theta X_i)^2$$

which is minimized by

$$\hat{\theta} = \frac{\sum_{i=1}^{n} X_i Y_i}{\sum_{i=1}^{n} X_i^2}$$

[Figure: scatterplot of fuel use against car weight (2000 to 4000) with the fitted line.]

What is the distribution of $\hat{\theta} - \theta$?

$$\hat{\theta} = \frac{\sum_{i=1}^{n} X_i(\theta X_i + \epsilon_i)}{\sum_{i=1}^{n} X_i^2} = \theta + \frac{\sum_{i=1}^{n} X_i \epsilon_i}{\sum_{i=1}^{n} X_i^2}$$


Let $Z_i = X_i \epsilon_i$. Thus $E[Z_i] = 0$, $\mathrm{Var}(Z_i) = X_i^2 \sigma^2$. Thus

$$\frac{\sum_{i=1}^{n} (X_i \epsilon_i - 0)}{\sqrt{\sum_{i=1}^{n} X_i^2 \sigma^2}} \xrightarrow{D} N(0, 1)$$

Note that

$$\frac{\sum_{i=1}^{n} X_i \epsilon_i}{\sqrt{\sum_{i=1}^{n} X_i^2 \sigma^2}} \times \frac{\sigma}{\sqrt{\sum_{i=1}^{n} X_i^2}} = \hat{\theta} - \theta$$

implying

$$(\hat{\theta} - \theta)\, \sqrt{\sum_{i=1}^{n} X_i^2} \xrightarrow{D} N(0, \sigma^2)$$

So

$$\hat{\theta} \overset{\text{approx.}}{\sim} N\!\left(\theta,\, \frac{\sigma^2}{\sum_{i=1}^{n} X_i^2}\right)$$


If weight is measured in 100's of pounds, the estimate of $\theta$ is $\hat{\theta} = 0.114$
(which implies that each additional 100 pounds of weight appears to add
0.114 gallons to the fuel use on average).

The estimate of $\sigma$ is $s = 0.3811$. This gives a standard error of

$$\frac{s}{\sqrt{\sum_{i=1}^{93} X_i^2}} = 0.00126$$

which implies we are estimating $\theta$ very precisely in this case.

$$\hat{\theta} \overset{\text{approx.}}{\sim} N(\theta, 0.00126^2)$$

[Figure: fuel use against car weight. Red line: fitted line. Green lines: 95% confidence intervals of the fitted line.]
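A sketch of the fit itself, using the formulas from the slide; the data arrays below are hypothetical stand-ins for the 93 cars (the actual data set is not given here):

```python
import numpy as np

rng = np.random.default_rng(2)
weight = rng.uniform(20, 40, size=93)            # hypothetical weights, 100s of pounds
fuel = 0.114 * weight + rng.normal(0, 0.38, 93)  # hypothetical fuel use per 100 miles

theta_hat = (weight * fuel).sum() / (weight ** 2).sum()
resid = fuel - theta_hat * weight
s = np.sqrt((resid ** 2).sum() / (len(weight) - 1))  # estimate of sigma (one parameter fit)
se = s / np.sqrt((weight ** 2).sum())                # standard error of theta-hat
print(theta_hat, s, se)
```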


There are also versions of the CLT that allow the variables to have limited
levels of dependency.

They all have the basic form (under different technical conditions)

$$\frac{S_n - E[S_n]}{\mathrm{SD}(S_n)} \xrightarrow{D} N(0, 1) \qquad \text{or} \qquad \frac{\bar{X}_n - E[\bar{X}_n]}{\mathrm{SD}(\bar{X}_n)} \xrightarrow{D} N(0, 1)$$

which imply

$$S_n \overset{\text{approx.}}{\sim} N(E[S_n],\, \mathrm{Var}(S_n)) \qquad \text{or} \qquad \bar{X}_n \overset{\text{approx.}}{\sim} N(E[\bar{X}_n],\, \mathrm{Var}(\bar{X}_n))$$


These mathematical results suggest why the normal distribution is so
commonly seen with real data.

They say that when an effect is the sum of a large number of small,
roughly equally weighted terms, the effect should be approximately normally
distributed.

For example, people's heights are influenced by a (potentially) large number
of genes and by various environmental effects.

Histograms of adult men's and women's heights are both well described by
normal densities.


Theorem. [Slutsky's Theorems] Suppose $X_n \xrightarrow{D} X$ and $Y_n \xrightarrow{P} c$
(a constant). Then

1. $X_n + Y_n \xrightarrow{D} X + c$

2. $X_n Y_n \xrightarrow{D} cX$

3. If $c \ne 0$, $\frac{X_n}{Y_n} \xrightarrow{D} \frac{X}{c}$

4. Let $f(x, y)$ be a continuous function. Then $f(X_n, Y_n) \xrightarrow{D} f(X, c)$

Example: Suppose $X_1, X_2, \ldots$ are iid with $E[X_i] = \mu$, $\mathrm{Var}(X_i) = \sigma^2$. What
is the distribution of the t-statistic

$$T_n = \frac{\bar{X}_n - \mu}{S_n/\sqrt{n}}$$

as $n \to \infty$? (Here $S_n$ denotes the sample standard deviation.)


As we have seen before:

1. By the central limit theorem,

$$\frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \xrightarrow{D} N(0, 1)$$

2. $S_n^2 \xrightarrow{P} \sigma^2$, so $\frac{S_n}{\sigma} \xrightarrow{P} 1$

Thus by Slutsky's theorems,

$$T_n = \left.\left[\frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}}\right] \right/ \frac{S_n}{\sigma} \xrightarrow{D} \frac{N(0, 1)}{1} = N(0, 1)$$

This result proves that the $t_n$ distributions converge to the $N(0, 1)$
distribution.
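This convergence is visible in the quantiles; a quick check (a sketch, assuming SciPy):

```python
from scipy.stats import t, norm

# The 97.5% quantile of t_df approaches the N(0,1) quantile as df grows
for df in [5, 30, 100, 1000]:
    print(df, round(t.ppf(0.975, df), 4))
print("N(0,1):", round(norm.ppf(0.975), 4))   # about 1.96
```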
