Section 53
Summer 2006
If $X \sim \text{Binomial}(n, p)$ with $p = \lambda/n$, then
$$\lim_{n\to\infty} P[X = x] = \lim_{n\to\infty} \binom{n}{x} p^{x} (1-p)^{n-x} = \frac{e^{-\lambda}\lambda^x}{x!}$$
Proof.
$$\lim_{n\to\infty} \binom{n}{x} p^{x} (1-p)^{n-x} = \lim_{n\to\infty} \binom{n}{x} \left(\frac{\lambda}{n}\right)^{x} \left(1 - \frac{\lambda}{n}\right)^{n-x}$$

$$= \frac{n!}{x!\,(n-x)!} \cdot \frac{\lambda^x}{n^x} \left(1 - \frac{\lambda}{n}\right)^{-x} \left(1 - \frac{\lambda}{n}\right)^{n}$$

$$= \frac{\lambda^x}{x!} \lim_{n\to\infty} \underbrace{\frac{n!}{(n-x)!\,(n-\lambda)^x}}_{\to\, 1} \, \underbrace{\left(1 - \frac{\lambda}{n}\right)^{n}}_{\to\, e^{-\lambda}} = \frac{e^{-\lambda}\lambda^x}{x!} \qquad \Box$$

Convergence in Distribution 1
(Note in the plot, bars correspond to the true binomial probabilities and the
red circles correspond to the Poisson approximation.)
[Figure: binomial probabilities $p(x)$ (bars) with the Poisson($\lambda = 1$) approximation (red circles) for $n = 10,\ p = 0.1$; $n = 50,\ p = 0.02$; and $n = 200,\ p = 0.005$.]
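The limit above can be checked numerically. Below is a minimal sketch in Python (standard library only; the helper names `binom_pmf` and `poisson_pmf` are ours, not from the notes), using the same $(n, p)$ settings as the plots.

```python
from math import comb, exp, factorial

def binom_pmf(x, n, p):
    # Exact binomial probability P[X = x] = C(n, x) p^x (1 - p)^(n - x)
    return comb(n, x) * p**x * (1 - p)**(n - x)

def poisson_pmf(x, lam):
    # Poisson probability e^(-lambda) lambda^x / x!
    return exp(-lam) * lam**x / factorial(x)

lam = 1.0
for n in (10, 50, 200):
    p = lam / n  # keep n*p = lambda fixed as n grows
    worst = max(abs(binom_pmf(x, n, p) - poisson_pmf(x, lam)) for x in range(5))
    print(f"n = {n:4d}: max |binomial - Poisson| over x = 0..4 is {worst:.5f}")
```

The maximum discrepancy shrinks as $n$ grows, matching the picture of the red circles settling onto the bars.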
Example: Let Y1, Y2, . . . be iid Exp(1). Then
Xn = Y1 + Y2 + . . . + Yn ∼ Gamma(n, 1)
which has
$$E[X_n] = n; \qquad \mathrm{Var}(X_n) = n; \qquad \mathrm{SD}(X_n) = \sqrt{n}$$

Thus $Z_n = \dfrac{X_n - n}{\sqrt{n}}$ has mean 0 and variance 1. Let
$$Z_n = \frac{X_n - n}{\sqrt{n}}; \qquad X_n = n + \sqrt{n}\,Z_n$$
$$f_{Z_n}(z) = f_{X_n}(n + \sqrt{n}\,z) \times \sqrt{n}$$
$$P[a \le Z_n \le b] = \int_a^b f_{Z_n}(z)\,dz = \int_a^b \sqrt{n}\, f_{X_n}(n + \sqrt{n}\,z)\,dz = \int_a^b \sqrt{n}\,\frac{(n + \sqrt{n}\,z)^{n-1}}{(n-1)!}\, e^{-(n + \sqrt{n}\,z)}\,dz$$

To go further we need Stirling's Formula: $n! \approx n^n e^{-n}\sqrt{2\pi n}$. So

$$f_{X_n}(n + \sqrt{n}\,z)\sqrt{n} = \frac{e^{-n - z\sqrt{n}}\,(n + z\sqrt{n})^{n-1}\,\sqrt{n}}{(n-1)!} \approx \frac{e^{-n - z\sqrt{n}}\,(n + z\sqrt{n})^{n-1}\,\sqrt{n}}{(n-1)^{n-1}\, e^{-n+1}\, \sqrt{2\pi n}} \approx \frac{1}{\sqrt{2\pi}}\,\underbrace{e^{-z\sqrt{n}}\left(1 + \frac{z}{\sqrt{n}}\right)^{n}}_{g_n(z)}$$
$$\log(g_n(z)) = -z\sqrt{n} + n\log\left(1 + \frac{z}{\sqrt{n}}\right) = -z\sqrt{n} + n\left[\frac{z}{\sqrt{n}} - \frac{1}{2}\frac{z^2}{n} + \frac{1}{3}\frac{z^3}{n^{3/2}} - \cdots\right] = -\frac{1}{2}z^2 + O\!\left(\frac{1}{\sqrt{n}}\right)$$

so
$$f_{X_n}(n + z\sqrt{n})\sqrt{n} \approx \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}$$

Thus
$$P[a \le Z_n \le b] \to \int_a^b \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\,dz = P[a \le Z \le b]$$
Another way of thinking of this is that the distribution of $X_n = n + \sqrt{n}\,Z_n$ approaches that of a $N(n, n)$.
[Figure: Gamma($n$, 1) densities $f(x)$ compared with the $N(n, n)$ approximation for $n = 2, 5, 10, 20, 50, 100$.]
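The agreement seen in the plots can be measured directly. This sketch (standard-library Python; the names `gamma_pdf` and `normal_pdf` are ours) computes the largest pointwise gap between the Gamma($n$, 1) density and the approximating $N(n, n)$ density.

```python
from math import exp, factorial, pi, sqrt

def gamma_pdf(x, n):
    # Gamma(n, 1) density: x^(n-1) e^(-x) / (n-1)!  (x > 0, integer n)
    return x**(n - 1) * exp(-x) / factorial(n - 1)

def normal_pdf(x, mean, var):
    # N(mean, var) density
    return exp(-(x - mean)**2 / (2 * var)) / sqrt(2 * pi * var)

for n in (2, 10, 100):
    # Grid of positive points within 3 SDs of the mean n
    grid = [x for x in (n + 0.1 * k * sqrt(n) for k in range(-30, 31)) if x > 0]
    worst = max(abs(gamma_pdf(x, n) - normal_pdf(x, n, n)) for x in grid)
    print(f"n = {n:3d}: max |Gamma - Normal| = {worst:.4f}")
```

The gap shrinks steadily with $n$, mirroring the plots above.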
Definition. Let $X_1, X_2, \ldots$ be a sequence of RVs with cumulative distribution functions $F_1, F_2, \ldots$ and let $X$ be a RV with distribution $F$. We say $X_n$ converges in distribution to $X$, written
$$X_n \xrightarrow{D} X$$
if
$$P[a \le X_n \le b] \to P[a \le X \le b]$$
at every point at which $F$ is continuous.
Note that an equivalent definition of convergence in distribution is that $X_n \xrightarrow{D} X$ if $E[g(X_n)] \to E[g(X)]$ for all bounded, continuous functions $g(\cdot)$.
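For the Gamma/Exponential example above, this equivalence can be illustrated by Monte Carlo: with $g(z) = \cos(z)$, which is bounded and continuous, $E[g(Z)] = e^{-1/2}$ for $Z \sim N(0,1)$. A sketch (the function name `simulate_Zn` is ours; seeded for reproducibility):

```python
import random
from math import cos, exp, sqrt

random.seed(0)

def simulate_Zn(n, reps=10000):
    # Z_n = (X_n - n)/sqrt(n), where X_n is a sum of n iid Exp(1) draws
    out = []
    for _ in range(reps):
        xn = sum(random.expovariate(1.0) for _ in range(n))
        out.append((xn - n) / sqrt(n))
    return out

# g(z) = cos(z) is bounded and continuous; E[cos(Z)] = e^(-1/2) for Z ~ N(0,1)
for n in (5, 50):
    z = simulate_Zn(n)
    print(n, sum(cos(v) for v in z) / len(z))
print("limit:", exp(-0.5))
```

As $n$ grows, the simulated average of $g(Z_n)$ settles near $e^{-1/2} \approx 0.607$.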
For the Gamma/Normal example,
$$M_{Z_n}(t) = M_{X_n}\!\left(\frac{t}{\sqrt{n}}\right) e^{-t\sqrt{n}} = e^{-t\sqrt{n}}\left(\frac{1}{1 - \frac{t}{\sqrt{n}}}\right)^{n}$$

Similarly to the earlier proof, it's easier to work with $\log M_{Z_n}(t)$:

$$\log M_{Z_n}(t) = -t\sqrt{n} - n\log\left(1 - \frac{t}{\sqrt{n}}\right) = -t\sqrt{n} - n\left[-\frac{t}{\sqrt{n}} - \frac{1}{2}\frac{t^2}{n} - \frac{1}{3}\frac{t^3}{n^{3/2}} - \cdots\right] = \frac{1}{2}t^2 + O\!\left(\frac{1}{\sqrt{n}}\right)$$

Thus
$$M_{Z_n}(t) \to e^{t^2/2}$$
which is the MGF of a standard normal.
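The convergence of this MGF can be verified numerically; a small sketch (the name `mgf_Zn` is ours), valid for $t < \sqrt{n}$:

```python
from math import exp, sqrt

def mgf_Zn(t, n):
    # M_{Z_n}(t) = e^(-t sqrt(n)) * (1 - t/sqrt(n))^(-n), defined for t < sqrt(n)
    return exp(-t * sqrt(n)) * (1 - t / sqrt(n)) ** (-n)

t = 0.5
limit = exp(t**2 / 2)
for n in (10, 100, 10000):
    print(f"n = {n:5d}: M_Zn({t}) = {mgf_Zn(t, n):.6f}  (limit {limit:.6f})")
```

The error shrinks like the $O(1/\sqrt{n})$ term in the expansion above.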
Central Limit Theorem

Let $X_1, X_2, \ldots$ be iid with mean $\mu$ and variance $\sigma^2$, and let
$$S_n = \sum_{i=1}^{n} X_i$$
Then
$$\lim_{n\to\infty} P\!\left[\frac{S_n - n\mu}{\sigma\sqrt{n}} \le x\right] = \Phi(x)$$
That is,
$$\frac{S_n - n\mu}{\sigma\sqrt{n}} \xrightarrow{D} N(0, 1)$$
Taking $\mu = 0$ without loss of generality, so that $Z_n = S_n/(\sigma\sqrt{n})$,
$$M_{S_n}(t) = (M(t))^n; \qquad M_{Z_n}(t) = \left(M\!\left(\frac{t}{\sigma\sqrt{n}}\right)\right)^{n}$$

$$M(t) = M(0) + M'(0)\,t + \frac{1}{2}M''(0)\,t^2 + \epsilon_t = 1 + \frac{1}{2}\sigma^2 t^2 + O(t^3)$$

$$M\!\left(\frac{t}{\sigma\sqrt{n}}\right) = 1 + \frac{1}{2}\sigma^2\left(\frac{t}{\sigma\sqrt{n}}\right)^{2} + O\!\left(\left(\frac{t}{\sigma\sqrt{n}}\right)^{3}\right) = 1 + \frac{t^2}{2n} + O\!\left(\frac{1}{n^{3/2}}\right)$$

$$M_{Z_n}(t) = \left(1 + \frac{t^2}{2n} + O\!\left(\frac{1}{n^{3/2}}\right)\right)^{n} \to e^{t^2/2} \qquad \Box$$
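The same MGF argument can be watched converge for a concrete distribution. Taking $X_i \sim$ Bernoulli($p$), so $M(t) = 1 - p + pe^t$, $\mu = p$, and $\sigma^2 = p(1-p)$, the following sketch (the name `mgf_Zn` is ours) evaluates $M_{Z_n}(t) = e^{-tn\mu/(\sigma\sqrt{n})}\, M(t/(\sigma\sqrt{n}))^n$:

```python
from math import exp, sqrt

p = 0.3                      # Bernoulli success probability
mu, sigma = p, sqrt(p * (1 - p))

def mgf_Zn(t, n):
    # MGF of Z_n = (S_n - n*mu)/(sigma*sqrt(n)) for S_n ~ Binomial(n, p)
    s = t / (sigma * sqrt(n))
    return exp(-mu * n * s) * (1 - p + p * exp(s)) ** n

t = 1.0
for n in (10, 100, 10000):
    print(f"n = {n:5d}: {mgf_Zn(t, n):.6f}  (limit {exp(t**2 / 2):.6f})")
```

The values approach $e^{t^2/2}$, the standard normal MGF, as the proof predicts.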
Note that the requirement of an MGF is not needed for the theorem to hold. In fact, all that is needed is that $\mathrm{Var}(X_i) = \sigma^2 < \infty$. A standard proof of this more general theorem uses the characteristic function (which is defined for any distribution)
$$\phi(t) = \int_{-\infty}^{\infty} e^{itx} f(x)\,dx = M(it)$$
instead of the moment generating function $M(t)$, where $i = \sqrt{-1}$.

Thus the CLT holds for distributions such as the log normal, even though it doesn't have an MGF.
$$Z_n = \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} = \sqrt{n}\,\frac{\bar{X}_n - \mu}{\sigma} \xrightarrow{D} N(0, 1)$$
To see this is the same, just multiply the numerator and denominator by n
in the first form to get the statement about Sn.
Note that this probability statement does not use anything about the distribution of the original policy claims except their mean and standard deviation. It's probable that their distribution is highly skewed right (since $\mu_x \ll \sigma_x$), but the calculations ignore this fact.
$$\frac{X_n - np}{\sqrt{np(1-p)}} \xrightarrow{D} N(0, 1); \qquad \frac{\hat{p}_n - p}{\sqrt{p(1-p)/n}} \xrightarrow{D} N(0, 1)$$

$$X_n \overset{\text{approx.}}{\sim} N\bigl(np,\ np(1-p)\bigr); \qquad \hat{p}_n \overset{\text{approx.}}{\sim} N\!\left(p,\ \frac{p(1-p)}{n}\right)$$
This approximation works better when $p$ is closer to $1/2$ than when $p$ is near 0 or 1.
A rule of thumb is that it is OK to use the normal approximation when $np \ge 5$ and $n(1-p) \ge 5$ (expect at least 5 successes and 5 failures). (Other books sometimes suggest other values, with the most popular alternative being 10.)
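The rule of thumb is easy to encode; a trivial sketch (the function name and its default threshold are ours):

```python
def normal_approx_ok(n, p, threshold=5):
    # Rule of thumb: expect at least `threshold` successes and failures
    return n * p >= threshold and n * (1 - p) >= threshold

print(normal_approx_ok(50, 0.3))   # 15 expected successes, 35 expected failures
print(normal_approx_ok(50, 0.05))  # only 2.5 expected successes
```

Swapping `threshold=10` gives the stricter rule some books prefer.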
[Figures: binomial probability histograms $p(x)$ with normal approximation curves for various combinations of $n$ and $p$.]
Example ($p = 0.3$, $n = 50$):
$$P[\hat{p} \le 0.24] = P[X \le 12]$$

Notice that for the bar corresponding to $X = 12$, the normal curve only picks up about half the area, as the bar actually goes from 11.5 to 12.5. The normal approximation can be improved by taking the area under the normal curve up to 12.5 instead of 12.

[Figure: $p = 0.3$, $n = 50$; binomial probabilities with the normal approximation, zoomed in on the bar from 11.5 to 12.5.]
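The continuity correction is easy to check numerically for this example ($n = 50$, $p = 0.3$): compare the exact $P[X \le 12]$ with the normal approximation evaluated at 12 and at 12.5. A sketch using only the standard library (`phi` is our name for the standard normal CDF):

```python
from math import comb, erf, sqrt

def phi(z):
    # Standard normal CDF via the error function
    return 0.5 * (1 + erf(z / sqrt(2)))

n, p, x = 50, 0.3, 12
mean, sd = n * p, sqrt(n * p * (1 - p))

exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))
plain = phi((x - mean) / sd)            # integrate the normal only up to 12
corrected = phi((x + 0.5 - mean) / sd)  # continuity correction: up to 12.5

print(f"exact {exact:.4f}, no correction {plain:.4f}, corrected {corrected:.4f}")
```

The corrected value recovers the half-bar from 12 to 12.5 and lands much closer to the exact binomial probability.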
In the following plots, the histogram represents 10,000 simulated X̄s, the
black curves are the true densities or CDFs, and the red curves are the
normal approximations.
[Figures: histograms of the simulated $\bar{X}$'s with true densities and CDFs $F(\bar{x})$ (black) and normal approximations (red), for Exponential samples and Gamma(5, 1) samples with $n = 10, 50, 100$.]
Then
$$Y_n = \frac{\sum_{i=1}^{n}(X_i - \mu_i)}{c_n} \xrightarrow{D} Z \sim N(0, 1)$$
if
$$\frac{B_n}{c_n} \to 0$$

Proof. Omitted. $\Box$

The condition involving the $B_i$ and $c_i$ has to do with each term in the sum having roughly the same weight. We don't want the sum to be dominated by a few terms.
$$SS(\theta) = \sum_{i=1}^{n} (Y_i - \theta X_i)^2$$
which is minimized by
$$\hat{\theta} = \frac{\sum_{i=1}^{n} X_i Y_i}{\sum_{i=1}^{n} X_i^2}$$

What is the distribution of $\hat{\theta} - \theta$?

[Figure: scatterplot of Fuel Use (2.0–5.0) versus Car Weight (2000–4000).]
$$\hat{\theta} = \frac{\sum_{i=1}^{n} X_i(\theta X_i + \epsilon_i)}{\sum_{i=1}^{n} X_i^2} = \theta + \frac{\sum_{i=1}^{n} X_i \epsilon_i}{\sum_{i=1}^{n} X_i^2}$$

Note that
$$\frac{\sum_{i=1}^{n} X_i \epsilon_i}{\sqrt{\sum_{i=1}^{n} X_i^2}\,\sigma} \times \frac{\sigma}{\sqrt{\sum_{i=1}^{n} X_i^2}} = \hat{\theta} - \theta$$
implying
$$(\hat{\theta} - \theta)\sqrt{\sum_{i=1}^{n} X_i^2} \xrightarrow{D} N(0, \sigma^2)$$
So
$$\hat{\theta} \overset{\text{approx.}}{\sim} N\!\left(\theta,\ \frac{\sigma^2}{\sum_{i=1}^{n} X_i^2}\right)$$
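This sampling distribution can be exercised on synthetic data. The sketch below (all names and numbers are hypothetical, chosen only to loosely mirror the fuel-use example's sample size) generates data from $Y_i = \theta X_i + \epsilon_i$ and computes $\hat{\theta}$ and its standard error from the formulas above.

```python
import random
from math import sqrt

random.seed(1)

# Hypothetical setup: regression through the origin, Y_i = theta*X_i + eps_i
theta_true, sigma, n = 0.00114, 0.38, 93
X = [random.uniform(2000, 4000) for _ in range(n)]
Y = [theta_true * x + random.gauss(0, sigma) for x in X]

sxx = sum(x * x for x in X)
theta_hat = sum(x * y for x, y in zip(X, Y)) / sxx

# Estimate sigma from the residuals (one fitted parameter, so n - 1 df)
s = sqrt(sum((y - theta_hat * x) ** 2 for x, y in zip(X, Y)) / (n - 1))
se = s / sqrt(sxx)

print(f"theta_hat = {theta_hat:.6f}, se = {se:.6f}")
```

With $\sum X_i^2$ this large, the standard error is tiny and $\hat{\theta}$ lands very close to the true slope.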
The estimate $\hat{\theta}$ implies that each additional 100 pounds of weight appears to add 0.114 gallons to the fuel use on average.

The estimate of $\sigma$ is $s = 0.3811$. This gives a standard error of
$$\frac{s}{\sqrt{\sum_{i=1}^{93} X_i^2}} = 0.00126$$

[Figure: Fuel Use versus Car Weight with the fitted line (red) and 95% confidence intervals of the fitted line (green).]
They all have the basic form (under different technical conditions), which implies
$$S_n \overset{\text{approx.}}{\sim} N\bigl(E[S_n], \mathrm{Var}(S_n)\bigr) \qquad \text{or} \qquad \bar{X}_n \overset{\text{approx.}}{\sim} N\bigl(E[\bar{X}_n], \mathrm{Var}(\bar{X}_n)\bigr)$$
They say that when an effect is the sum of a large number of small, roughly equally weighted terms, the effect should be approximately normally distributed.
Histograms of adult men and women’s heights are both well described by
normal densities.
Suppose $X_n \xrightarrow{D} X$ and $Y_n \xrightarrow{P} c$. Then:

1. $X_n + Y_n \xrightarrow{D} X + c$
2. $X_n Y_n \xrightarrow{D} cX$
3. If $c \ne 0$, $\dfrac{X_n}{Y_n} \xrightarrow{D} \dfrac{X}{c}$
4. Let $f(x, y)$ be a continuous function. Then $f(X_n, Y_n) \xrightarrow{D} f(X, c)$
Example: Suppose $X_1, X_2, \ldots$ are iid with $E[X_i] = \mu$, $\mathrm{Var}(X_i) = \sigma^2$. What is the distribution of the t-statistic
$$T_n = \frac{\bar{X}_n - \mu}{S_n/\sqrt{n}}$$
as $n \to \infty$?
1. $\dfrac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \xrightarrow{D} N(0, 1)$

2. $S_n^2 \xrightarrow{P} \sigma^2$, or $\dfrac{S_n}{\sigma} \xrightarrow{P} 1$

Thus
$$T_n = \left.\left[\frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}}\right] \right/ \frac{S_n}{\sigma} \xrightarrow{D} \frac{N(0, 1)}{1} = N(0, 1)$$