Transformation and Expectation
1 Function of a random variable
Assume that X is a random variable with pmf/pdf fX , cdf FX .
Denote the sample space of X by X .
Then any function of X, say Y = g(X), is also a random variable.
• Examples: Y = X, X + 5, 3X^2, e^{-X}, |X|, I{X > 0}.
• Denote the sample space of Y by Y. So g : X → Y.
Question: How to determine the distribution of Y ?
For any subset A ⊂ Y, we have
P (Y ∈ A) = P (g(X) ∈ A)
= P ({x ∈ X : g(x) ∈ A})
= P (X ∈ g −1 (A)),
where
g −1 (A) = {x ∈ X : g(x) ∈ A}.
Note that
• g^{-1}(A) is the set of points in X that g maps into the set A
• g −1 is an inverse mapping from subsets of Y to subsets of X . It can
be defined for any function g (g is not necessarily one-to-one and/or
onto).
2 Transformation of Discrete X
Assume X is discrete with the pmf fX (x) = P (X = x).
Let Y = g(X), then the sample space of Y is
Y = {y : y = g(x), x ∈ X }.
• Since X is countable, so is Y. Therefore, Y is also a discrete random
variable.
• The pmf of Y can be computed as follows:
f_Y(y) = P(Y = y) = P(g(X) = y)
       = P(X ∈ g^{-1}({y}))
       = Σ_{x ∈ g^{-1}(y)} f_X(x),  for any y ∈ Y.
Example. The distribution of X is

    x        −2    −1    0     1     2
    f_X(x)   0.1   0.2   0.4   0.2   0.1
Y = |X|. Find the pmf of Y .
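The preimage sum above can be sketched in a few lines of Python (a numeric check, not part of the notes): we collect the mass of every x with |x| = y.

```python
from collections import defaultdict

# A minimal sketch: compute the pmf of Y = |X| from the tabulated pmf of X
# by summing f_X(x) over the preimage g^{-1}({y}).
f_X = {-2: 0.1, -1: 0.2, 0: 0.4, 1: 0.2, 2: 0.1}

f_Y = defaultdict(float)
for x, p in f_X.items():
    f_Y[abs(x)] += p      # every x with |x| = y contributes its mass to f_Y(y)
```

This gives f_Y(0) = 0.4, f_Y(1) = 0.2 + 0.2 = 0.4, f_Y(2) = 0.1 + 0.1 = 0.2.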
Example: (Binomial Transformation) Toss a coin n times whose probability of heads is p. Let X be the number of heads. Then X has a binomial distribution, denoted X ∼ Bin(n, p), with the pmf
f_X(x) = P(X = x) = \binom{n}{x} p^x (1 − p)^{n−x},  x = 0, · · · , n.
Let Y denote the number of tails, i.e., Y = n − X. Find the pmf of Y .
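The answer is Y ∼ Bin(n, 1 − p), since P(Y = y) = P(X = n − y). A quick numeric sketch (not part of the notes) confirms the identity:

```python
from math import comb

# Sketch: if X ~ Bin(n, p) counts heads, then Y = n - X (number of tails)
# should have the Bin(n, 1 - p) pmf; we check P(X = n - y) against it.
def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.3
max_err = max(abs(binom_pmf(n - y, n, p) - binom_pmf(y, n, 1 - p))
              for y in range(n + 1))   # should be ~0 up to rounding
```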
3 Transformation of Continuous X
Assume that both X and Y are continuous. It is convenient to define
X = {x : fX (x) > 0}, Y = {y : y = g(x) for some x ∈ X }.
The set {x : fX (x) > 0} is the support set of X.
The cdf of Y = g(X) is
F_Y(y) = P(Y ≤ y) = P(g(X) ≤ y)
       = P({x ∈ X : g(x) ≤ y})
       = ∫_{{x ∈ X : g(x) ≤ y}} f_X(x) dx.
Example: (Uniform Transformation) Suppose X has a uniform distribution on the interval (0, 2π). Let Y = sin^2(X). Describe the cdf of Y.
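The cdf works out to the arcsine law F_Y(y) = (2/π) arcsin(√y) on (0, 1), by the symmetry of |sin| over the four quarter-periods. A Monte Carlo sketch (an added check, not part of the notes) compares the empirical cdf against this candidate:

```python
import math, random

# Sketch: simulate Y = sin^2(X) for X ~ Unif(0, 2*pi) and compare the
# empirical cdf with the arcsine-law candidate F_Y(y) = (2/pi)*asin(sqrt(y)).
random.seed(0)
n = 200_000
ys = [math.sin(random.uniform(0.0, 2.0 * math.pi)) ** 2 for _ in range(n)]

def F_Y(y):                      # candidate cdf on 0 <= y <= 1
    return (2.0 / math.pi) * math.asin(math.sqrt(y))

max_gap = max(abs(sum(v <= y for v in ys) / n - F_Y(y))
              for y in (0.1, 0.25, 0.5, 0.75, 0.9))   # should be small
```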
3.1 g Strictly Monotone (Increasing or Decreasing)
Theorem. Let X ∼ FX with support X . Let Y = g(X) ∼ FY with the
sample space Y.
(a) If g is increasing, then FY (y) = FX (g −1 (y)).
(b) If g is decreasing, then FY (y) = 1 − FX (g −1 (y)).
(c) Assume X has density f_X. If g is monotone and g^{-1} is continuously differentiable on Y, then
f_Y(y) = f_X(g^{-1}(y)) · |d/dy g^{-1}(y)|  for y ∈ Y,
f_Y(y) = 0  otherwise.
Proof.
Example. X ∼ Unif(0, 1).
Y = X^3.
Y = 1/X.
Y = − log X. (uniform–exponential relationship)
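For the last case, g(x) = − log x is decreasing, so part (b) of the theorem gives F_Y(y) = 1 − F_X(e^{−y}) = 1 − e^{−y}, i.e. Y ∼ Exp(1). A quick simulation check (added, not from the notes):

```python
import math, random

# Sketch: for X ~ Unif(0, 1), Y = -log X should have cdf 1 - e^{-y}.
random.seed(1)
n = 100_000
# 1 - random.random() lies in (0, 1], so the logarithm is always defined
ys = [-math.log(1.0 - random.random()) for _ in range(n)]

max_gap = max(abs(sum(v <= y for v in ys) / n - (1.0 - math.exp(-y)))
              for y in (0.5, 1.0, 2.0, 3.0))   # empirical vs. Exp(1) cdf
```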
Example. (Inverted Gamma) Assume X ∼ Gamma(α, β). Y = 1/X.
Theorem. (cdf Transformation)
Let X have continuous cdf FX (x). Define the cdf transformation Y =
FX (X). Then Y ∼ Unif[0, 1].
Application: The cdf transformation gives a general method for generating random variables.
Let F be any cdf. If you want to generate an observation from a popu-
lation with cdf F , do the following
(i) Generate a uniform random number u from (0, 1), i.e. U ∼ Unif(0, 1).
(ii) If F is strictly monotone, define x = F −1 (u); otherwise define x as
inf{x : F (x) ≥ u}.
Then X ∼ F .
Example. Exponential F (x) = 1 − e−x/β , x > 0.
Example. Logistic F(x) = e^x / (1 + e^x), −∞ < x < ∞.
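The two-step recipe is easy to sketch in code (an illustration, not part of the notes). For the exponential example, F(x) = 1 − e^{−x/β} inverts to F^{-1}(u) = −β log(1 − u):

```python
import math, random

# Sketch of steps (i)-(ii) for the exponential cdf F(x) = 1 - exp(-x/beta):
# draw u ~ Unif(0, 1), then set x = F^{-1}(u) = -beta * log(1 - u).
random.seed(2)
beta = 2.0
n = 100_000
xs = [-beta * math.log(1.0 - random.random()) for _ in range(n)]

sample_mean = sum(xs) / n      # should be close to E(X) = beta
```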
Example. (Normal-chi squared relationship) Assume X ∼ N (0, 1), with
the pdf
f_X(x) = (1/√(2π)) e^{−x^2/2},  −∞ < x < ∞.
Let Y = X^2.
Let Y = X^4.
Example. Assume X ∼ Uniform (−1, 1).
Let Y = X^2 for X ≤ 0 and Y = X for X > 0.
Example. Assume X ∼ N (0, 1).
Let Y = X^2 for X ≤ 0 and Y = X for X > 0.
4 Expected Values
Let X be a random variable with pdf or pmf f (x). The expected value or
mean of g(X) is defined as
E(g(X)) = ∫_{−∞}^{∞} g(x) f(x) dx,  if X is continuous,
E(g(X)) = Σ_{x ∈ X} g(x) f(x),  if X is discrete,
provided that the integral or sum exists.
If E|g(X)| = ∞, we say that the expectation does not exist.
Example. X ∼ Unif(0, 1). Find E(X) and E(X 2 ).
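The exact answers are E(X) = 1/2 and E(X^2) = 1/3. A small numeric sketch (added, not from the notes) approximates the defining integrals with a midpoint rule, using f_X(x) = 1 on (0, 1):

```python
# Sketch: approximate E(X) = ∫_0^1 x dx and E(X^2) = ∫_0^1 x^2 dx
# with a midpoint rule on a fine grid.
n = 10_000
h = 1.0 / n
mid = [(i + 0.5) * h for i in range(n)]

EX = sum(x * h for x in mid)        # exact answer: 1/2
EX2 = sum(x * x * h for x in mid)   # exact answer: 1/3
```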
Example. X ∼ Binomial(n, p) with pmf
f_X(x) = \binom{n}{x} p^x (1 − p)^{n−x},  for x = 0, . . . , n.
Example. X has an Exp(β), β > 0, distribution with pdf
f_X(x) = β^{−1} e^{−x/β} for x > 0, and f_X(x) = 0 otherwise.
Find E(X) and E(eX ).
Example. (Cauchy mean) X has pdf fX (x) = π −1 (1 + x2 )−1 for all x.
Theorem. Assume E g_1(X) and E g_2(X) exist.
• E(a g_1(X) + b g_2(X) + c) = a E(g_1(X)) + b E(g_2(X)) + c.
• g(x) ≥ 0 =⇒ E(g(X)) ≥ 0.
• g_1(x) ≥ g_2(x) =⇒ E(g_1(X)) ≥ E(g_2(X)).
• a ≤ g(x) ≤ b =⇒ a ≤ E(g(X)) ≤ b.
• E(X − E(X)) = 0.
Result. The expected value of X is a good predictor of X in the following sense: E(X − b)^2 is minimized by b = E(X).
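This minimization can be checked numerically (an added sketch, not from the notes), here on the discrete pmf tabulated in the earlier example:

```python
# Sketch: E(X - b)^2, viewed as a function of b, is minimized at b = E(X);
# we scan a grid of b values for the pmf with support x = -2, ..., 2.
f = {-2: 0.1, -1: 0.2, 0: 0.4, 1: 0.2, 2: 0.1}
EX = sum(x * p for x, p in f.items())          # E(X) = 0 for this pmf

def mse(b):
    return sum((x - b) ** 2 * p for x, p in f.items())

grid = [i / 100.0 - 2.0 for i in range(401)]   # b in [-2, 2], step 0.01
best_b = min(grid, key=mse)                    # grid minimizer of E(X - b)^2
```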
Remark 1: Assume Y = g(X), with X ∼ f_X and Y ∼ f_Y. There are two ways of computing E(Y):
(1) E(g(X)) = ∫ g(x) f_X(x) dx.
(2) E(Y) = ∫ y f_Y(y) dy.
Are they equal?
Example. X ∼ Unif(0, 1). Let g(X) = − log(X).
Remark 2: (Expectation of non-negative random variables)
Assume X ≥ 0 and has the cdf FX . Then
E(X) = ∫_0^∞ (1 − F_X(x)) dx,  if X is continuous,
E(X) = Σ_{k=0}^∞ (1 − F_X(k)),  if X is discrete.
Example. Mean of the exponential.
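For X ∼ Exp(β) the tail is 1 − F_X(x) = e^{−x/β}, so the formula should return β. A numeric sketch (added, not from the notes), truncating the integral at a large upper limit:

```python
import math

# Sketch: E(X) = ∫_0^∞ (1 - F_X(x)) dx = ∫_0^∞ exp(-x/beta) dx = beta,
# approximated by a midpoint rule on [0, 40].
beta = 2.0
h, upper = 0.001, 40.0
n = int(upper / h)
EX = h * sum(math.exp(-(i + 0.5) * h / beta) for i in range(n))   # ≈ beta
```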
5 Median
Assume X is continuous and has the cdf FX . Its median m is the value
which satisfies FX (m) = 1/2, that is,
F_X(m) = ∫_{−∞}^{m} f_X(x) dx = ∫_{m}^{∞} f_X(x) dx = 1/2.
Equivalently, m = F_X^{-1}(1/2).
Example. Assume X has the pdf f (x) = 3x2 , 0 < x < 1.
Example. Assume X ∼ Exponential(β).
Example. Cauchy.
Example. Symmetric.
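The first example can be checked numerically (an added sketch, not from the notes): for f(x) = 3x^2 on (0, 1) the cdf is F(x) = x^3, so the median solves m^3 = 1/2, i.e. m = (1/2)^{1/3}.

```python
# Sketch: solve F(m) = 1/2 by bisection for the cdf F(x) = x^3 on (0, 1).
def F(x):
    return x ** 3

lo, hi = 0.0, 1.0
for _ in range(60):              # bisection halves the bracket each step
    m = (lo + hi) / 2.0
    if F(m) < 0.5:
        lo = m
    else:
        hi = m
m = (lo + hi) / 2.0              # ≈ (1/2)^(1/3)
```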
6 Variance and Standard Deviation
The variance of a random variable X is defined as
var(X) = E(X − E(X))2 .
Write σ^2 = var(X). The standard deviation of X is the square root of var(X):
σ = √var(X).
• Both variance and standard deviation measure the degree of spread of
a distribution around its mean E(X).
Theorem. Computational formula:
var(X) = E(X 2 ) − (E(X))2 ,
var(aX + b) = a2 var(X).
Example. X ∼ Unif[0,1]. Compute var(X).
Example. Binomial. Assume X is discrete and has the pmf
f_X(x) = \binom{n}{x} p^x (1 − p)^{n−x},  for x = 0, . . . , n.
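A numeric sketch (added, not from the notes) applies the computational formula var(X) = E(X^2) − (E(X))^2 directly to the binomial pmf and compares with the closed form np(1 − p):

```python
from math import comb

# Sketch: compute var(X) from the Bin(n, p) pmf via the computational formula.
n, p = 12, 0.4
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

EX = sum(k * pmf[k] for k in range(n + 1))        # n*p = 4.8
EX2 = sum(k * k * pmf[k] for k in range(n + 1))
var = EX2 - EX ** 2                               # n*p*(1-p) = 2.88
```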
7 Moments
For each positive integer r, the r-th (raw) moment of X is
μ'_r = E(X^r).
The r-th central moment is
μ_r = E(X − E(X))^r.
Note that
μ'_0 = μ_0 = 1,  μ'_1 = E(X),  μ_1 = 0,  μ_2 = var(X).
Example. Binomial.
The rth factorial moment of X is defined as
µ[r] = E(X(X − 1) · · · (X − r + 1)), ∀r ≥ 1.
It is useful for the calculation of moments of discrete distributions.
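For instance, for X ∼ Bin(n, p) the standard closed form is μ_[r] = n(n − 1) · · · (n − r + 1) p^r. A numeric check from the pmf (an added sketch, not part of the notes):

```python
from math import comb

# Sketch: verify the factorial-moment formula for the binomial from its pmf.
n, p, r = 10, 0.3, 3
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

def falling(x, j):               # falling factorial x(x-1)...(x-j+1)
    out = 1
    for i in range(j):
        out *= x - i
    return out

mu_fact = sum(falling(k, r) * pmf[k] for k in range(n + 1))
closed = falling(n, r) * p ** r  # n(n-1)(n-2) * p^3 = 720 * 0.027 = 19.44
```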
Two special moments: Skewness and Kurtosis
Skewness of a distribution measures its departure from symmetry:
γ_1 = μ_3 / σ^3 = μ_3 / μ_2^{3/2}.
• For symmetric random variables, μ_3 = E(X − E(X))^3 = 0, and hence γ_1 = 0.
• γ_1 is unit free, since μ_3 and σ^3 carry the same units.
Example. Exponential.
Kurtosis of a distribution measures its peakedness, defined using the fourth central moment:
γ_2 = μ_4 / μ_2^2 − 3 = μ_4 / σ^4 − 3.
• Division by σ 4 is to make kurtosis a pure number.
• Subtraction of 3 is a convention, so that kurtosis is zero for the normal
distribution.
• γ_2 > 0: leptokurtic (high peak, fat tails); γ_2 < 0: platykurtic (low peak, thin tails).
• For the normal, γ1 = γ2 = 0.
Example. Uniform.
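For Unif(0, 1) the central moments are μ_2 = 1/12 and μ_4 = 1/80, so γ_2 = (1/80)/(1/144) − 3 = −6/5. A numeric sketch (added, not from the notes) via midpoint integration:

```python
# Sketch: kurtosis of Unif(0, 1) computed from its central moments;
# the exact value is mu_4/mu_2^2 - 3 = 144/80 - 3 = -6/5 (platykurtic).
n = 100_000
h = 1.0 / n
mid = [(i + 0.5) * h for i in range(n)]

mu2 = sum((x - 0.5) ** 2 * h for x in mid)   # ≈ 1/12
mu4 = sum((x - 0.5) ** 4 * h for x in mid)   # ≈ 1/80
gamma2 = mu4 / mu2 ** 2 - 3.0                # ≈ -1.2
```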
8 Moment Generating Function
Assume X has the cdf F_X and the pmf/pdf f_X. The moment generating function (MGF) of X is defined as
M_X(t) = E(e^{tX}) = Σ_x e^{tx} f_X(x),  if X is discrete,
M_X(t) = E(e^{tX}) = ∫_{−∞}^{∞} e^{tx} f_X(x) dx,  if X is continuous.
• Interpretation: Note
e^{tX} = 1 + tX/1! + t^2 X^2/2! + · · ·
so intuitively
E(e^{tX}) = 1 + t E(X)/1! + t^2 E(X^2)/2! + · · ·
Thus, the coefficient of tr /r! in the infinite Taylor series expansion
of MX (t) is µ0r = E(X r ). That’s why M (t) is called the moment
generating function — its expansion generates the moments.
• In the same way, µr is obtained from the expansion of E(et(X−E(X)) ) =
e−tE(X) MX (t).
• Note that the above step needs some justification, as we know that the expectation of a sum equals the sum of the expectations only when finitely many terms are involved. The interchange is justified whenever M(t) < ∞ for all t in an open interval containing 0.
Computation Formulas
M(0) = 1.
M_{aX+b}(t) = e^{tb} M_X(at).
μ'_r = E(X^r) = (d^r/dt^r) M_X(t) |_{t=0}.
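The derivative formula can be sketched numerically (an added check, not from the notes). For X ∼ Bin(n, p) the MGF is the standard M(t) = (1 − p + p e^t)^n, and the first derivative at t = 0 should recover E(X) = np; here we differentiate by central finite difference rather than symbolically.

```python
import math

# Sketch: recover E(X) = n*p from the binomial MGF via d/dt M(t) at t = 0.
n, p = 8, 0.25

def M(t):
    return (1.0 - p + p * math.exp(t)) ** n   # MGF of Bin(n, p)

eps = 1e-5
first_moment = (M(eps) - M(-eps)) / (2.0 * eps)   # ≈ n*p = 2.0
```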
Example. Binomial.
Example. Exponential.
Example. Uniform
Gamma function.
An expression that often appears is an integral of the form ∫_0^∞ e^{−x} x^{α−1} dx, where α > 0. It can be shown that for all α > 0 this integral is finite. Its value obviously depends on α; we denote it by Γ(α), i.e.,
Γ(α) = ∫_0^∞ e^{−x} x^{α−1} dx,
called the gamma function. Note that
• Γ(1) = ∫_0^∞ e^{−x} dx = 1.
• For α > 0, integration by parts gives
Γ(α + 1) = [x^α (−e^{−x})]_0^∞ − ∫_0^∞ (−e^{−x}) α x^{α−1} dx = α Γ(α).
• In particular, for any positive integer n,
Γ(n) = (n − 1)Γ(n − 1) = (n − 1)(n − 2)Γ(n − 2) = · · · = (n − 1)(n − 2) · · · 1 · Γ(1) = (n − 1)!.
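Both identities can be checked with the standard library's gamma function (an added sketch, not part of the notes):

```python
import math

# Sketch: verify Gamma(alpha + 1) = alpha * Gamma(alpha) for a non-integer
# alpha, and Gamma(n) = (n - 1)! for small positive integers n.
alpha = 2.7
lhs = math.gamma(alpha + 1.0)
rhs = alpha * math.gamma(alpha)

fact_ok = all(math.isclose(math.gamma(k), math.factorial(k - 1))
              for k in range(1, 10))
```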
What determines the distribution uniquely?
• If X and Y have bounded supports, and share the same moment se-
quence, then they have the same distribution.
• If MX (t) = MY (t) < ∞ in a neighborhood of 0, then X and Y have
the same distribution.
The above also holds in a limiting sense.
• If Xn is a sequence of random variables and X is another such that
MXn (t) → MX (t), ∀t in a neighborhood of 0,
then the distribution of Xn “converges” to the distribution of X. We
say that Xn converges to X in distribution.
(If X is continuous, the “convergence” means that F_{X_n}(x) → F_X(x) for all x at which F_X is continuous.)
(If Xn and X are all discrete taking values on {0, 1, . . .}, then “convergence”
means convergence of pmf’s.)
Example. Convergence of binomial to Poisson.
Let X be the number of successes in n trials, each with probability of success p.
When n is large and p is small but np is moderate, take
n → ∞, p → 0, np → λ.
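The limit can be illustrated directly (a numeric sketch, not part of the notes): with np = λ held fixed, the Bin(n, λ/n) pmf approaches the Poisson(λ) pmf e^{−λ} λ^k / k! as n grows.

```python
import math

# Sketch: compare the Bin(n, lam/n) pmf with the Poisson(lam) pmf for
# small k; the gap shrinks as n increases with np = lam fixed.
lam = 3.0

def binom_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def pois_pmf(k):
    return math.exp(-lam) * lam**k / math.factorial(k)

n = 100_000
max_gap = max(abs(binom_pmf(k, n, lam / n) - pois_pmf(k)) for k in range(15))
```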