Transformation and Expectation

1 Function of a random variable


Assume that X is a random variable with pmf/pdf fX , cdf FX .
Denote the sample space of X by X .
Then any function of X, say Y = g(X), is also a random variable.

• Examples: Y = X, X + 5, 3X^2, e^{−X}, |X|, I{X > 0}.

• Denote the sample space of Y by Y. So g : X → Y.

Question: How to determine the distribution of Y ?


For any subset A ⊂ Y, we have

P(Y ∈ A) = P(g(X) ∈ A)
         = P({x ∈ X : g(x) ∈ A})
         = P(X ∈ g^{-1}(A)),

where
g^{-1}(A) = {x ∈ X : g(x) ∈ A}.
Note that

• g^{-1}(A) is the set of points in X that g maps into the set A.

• g^{-1} is an inverse mapping from subsets of Y to subsets of X. It can
be defined for any function g (g need not be one-to-one or onto).

2 Transformation of Discrete X
Assume X is discrete with the pmf fX (x) = P (X = x).
Let Y = g(X), then the sample space of Y is

Y = {y : y = g(x), x ∈ X }.

• Since X is countable, so is Y. Therefore, Y is also a discrete random
variable.

• The pmf of Y can be computed as follows:

fY(y) = P(Y = y) = P(g(X) = y)
      = P(X ∈ g^{-1}({y}))
      = \sum_{x ∈ g^{-1}(y)} fX(x), for any y ∈ Y.

Example. The distribution of X is


x        −2    −1     0     1     2
fX(x)   0.1   0.2   0.4   0.2   0.1

Y = |X|. Find the pmf of Y.
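The preimage sum in the formula above can be computed mechanically. A minimal sketch (not part of the notes; the helper name `pmf_of_transform` is hypothetical) for Y = |X| with the pmf table above:

```python
# pmf of X from the example table above, as a dict x -> fX(x)
fX = {-2: 0.1, -1: 0.2, 0: 0.4, 1: 0.2, 2: 0.1}

def pmf_of_transform(fX, g):
    """pmf of Y = g(X) for discrete X: sum fX over the preimage g^{-1}({y})."""
    fY = {}
    for x, p in fX.items():
        y = g(x)
        fY[y] = fY.get(y, 0.0) + p
    return fY

fY = pmf_of_transform(fX, abs)
# fY collects P(Y = 0) = 0.4, P(Y = 1) = 0.2 + 0.2, P(Y = 2) = 0.1 + 0.1
```

The same helper works for any g, monotone or not, because it only uses the preimage sum.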

Example: (Binomial Transformation) Toss a coin with probability p of
heads n times. Let X be the number of heads. Then X has a binomial
distribution, denoted X ∼ Bin(n, p), with the pmf

fX(x) = P(X = x) = \binom{n}{x} p^x (1 − p)^{n−x},  x = 0, . . . , n.

Let Y denote the number of tails, i.e., Y = n − X. Find the pmf of Y.
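The answer (Y ∼ Bin(n, 1 − p)) can be spot-checked numerically. A sketch with arbitrary illustrative values n = 7, p = 0.3, comparing P(Y = y) = P(X = n − y) against the Bin(n, 1 − p) pmf term by term:

```python
from math import comb

def binom_pmf(k, n, p):
    # Bin(n, p) pmf at k
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 7, 0.3
# P(Y = y) = P(X = n - y); it should coincide with the Bin(n, 1-p) pmf at y
mismatch = max(abs(binom_pmf(n - y, n, p) - binom_pmf(y, n, 1 - p))
               for y in range(n + 1))
# mismatch should be (essentially) zero
```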

3 Transformation of Continuous X
Assume that both X and Y are continuous. It is convenient to define

X = {x : fX(x) > 0},  Y = {y : y = g(x) for some x ∈ X}.

The set {x : fX(x) > 0} is the support set of X.


The cdf of Y = g(X) is

FY(y) = P(Y ≤ y) = P(g(X) ≤ y)
      = P({x ∈ X : g(x) ≤ y})
      = \int_{{x ∈ X : g(x) ≤ y}} fX(x) dx.

Example: (Uniform Transformation) Suppose X has a uniform distribution
on the interval (0, 2π). Let Y = sin^2(X). Describe the cdf of Y.

3.1 g Strictly Monotone (Increasing or Decreasing)

Theorem. Let X ∼ FX with support X. Let Y = g(X) ∼ FY with sample
space Y.
(a) If g is increasing, then FY(y) = FX(g^{-1}(y)).
(b) If g is decreasing, then FY(y) = 1 − FX(g^{-1}(y)).
(c) Assume X has density fX. If g is monotone and g^{-1} is continuously
differentiable on Y, then

fY(y) = fX(g^{-1}(y)) \left| \frac{d}{dy} g^{-1}(y) \right| for y ∈ Y, and fY(y) = 0 otherwise.

Proof.

Example. X ∼ Unif(0, 1).
Y = X^3.

Y = 1/X.

Y = − log X. (uniform-exponential)
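For the third example, part (c) of the theorem gives g^{-1}(y) = e^{−y}, |d/dy g^{-1}(y)| = e^{−y}, and fX = 1 on (0, 1), so fY(y) = e^{−y} on (0, ∞): the Exponential(1) density. A small numeric sketch of both the formula and a Monte Carlo sanity check:

```python
import math, random

def f_Y(y):
    # density from the theorem: fX(g^{-1}(y)) * |d/dy g^{-1}(y)| = 1 * e^{-y}
    return math.exp(-y) if y > 0 else 0.0

# Monte Carlo check: the sample mean of -log(U), U ~ Unif(0,1), should be
# close to 1, the mean of Exponential(1)
random.seed(1)
N = 100_000
mean = sum(-math.log(random.random()) for _ in range(N)) / N
```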

Example. (Inverted Gamma) Assume X ∼ Gamma(α, β). Y = 1/X.

Theorem. (cdf Transformation)
Let X have continuous cdf FX(x). Define the cdf transformation Y =
FX(X). Then Y ∼ Unif[0, 1].
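A Monte Carlo sketch of the theorem (illustrative, using X ∼ Exp(1), so FX(x) = 1 − e^{−x}): the transformed sample should have the Unif[0, 1] mean 1/2 and variance 1/12.

```python
import math, random

rng = random.Random(7)
N = 100_000
# draw X ~ Exp(1) and apply its own cdf: Y = 1 - e^{-X}
ys = [1 - math.exp(-rng.expovariate(1.0)) for _ in range(N)]
mean = sum(ys) / N
var = sum((y - mean) ** 2 for y in ys) / N
# mean should be near 1/2, var near 1/12, as for Unif[0, 1]
```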

Application: The cdf transformation gives a general method for generating
random variables.
Let F be any cdf. If you want to generate an observation from a popu-
lation with cdf F , do the following

(i) Generate a uniform random number u from (0, 1), i.e. U ∼ Unif(0, 1).

(ii) If F is strictly monotone, define x = F^{-1}(u); otherwise define
x = inf{x : F(x) ≥ u}.

Then X ∼ F .

Example. Exponential: F(x) = 1 − e^{−x/β}, x > 0.

Example. Logistic: F(x) = e^x/(1 + e^x), −∞ < x < ∞.
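Steps (i)–(ii) applied to the exponential example, where F^{-1}(u) = −β log(1 − u). A sketch; β = 2 is an arbitrary illustrative value:

```python
import math, random

def sample_exponential(beta, rng):
    u = rng.random()                 # step (i): U ~ Unif(0, 1)
    return -beta * math.log(1 - u)   # step (ii): x = F^{-1}(u)

rng = random.Random(42)
beta = 2.0
N = 100_000
xs = [sample_exponential(beta, rng) for _ in range(N)]
mean = sum(xs) / N   # should be close to beta, the exponential mean
```

The same recipe with F^{-1}(u) = log(u/(1 − u)) would generate logistic variates.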

Example. (Normal-chi squared relationship) Assume X ∼ N(0, 1), with
the pdf

fX(x) = \frac{1}{\sqrt{2π}} e^{−x^2/2},  −∞ < x < ∞.

Let Y = X^2.

Let Y = X^4.

Example. Assume X ∼ Uniform(−1, 1).
Let Y = X^2 for X ≤ 0 and Y = X for X > 0.

Example. Assume X ∼ N(0, 1).
Let Y = X^2 for X ≤ 0 and Y = X for X > 0.

4 Expected Values

Let X be a random variable with pdf or pmf f(x). The expected value or
mean of g(X) is defined as

E(g(X)) = \int_{−∞}^{∞} g(x) f(x) dx if X is continuous, and
E(g(X)) = \sum_{x ∈ X} g(x) f(x) if X is discrete,

provided that the integral or sum exists.


If E|g(X)| = ∞, we say that the expectation does not exist.

Example. X ∼ Unif(0, 1). Find E(X) and E(X^2).
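The answers E(X) = 1/2 and E(X^2) = 1/3 can be checked by crude midpoint-rule integration of g(x)·fX(x) with fX = 1 on (0, 1). A sketch, not part of the notes:

```python
# midpoint rule on (0, 1) with n cells
n = 100_000
h = 1.0 / n
mid = [(i + 0.5) * h for i in range(n)]
EX = sum(x for x in mid) * h        # integral of x * 1 over (0, 1) = 1/2
EX2 = sum(x * x for x in mid) * h   # integral of x^2 * 1 over (0, 1) = 1/3
```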

Example. X ∼ Binomial(n, p) with pmf

fX(x) = \binom{n}{x} p^x (1 − p)^{n−x},  for x = 0, . . . , n.

Example. X has an Exp(β), β > 0, distribution with pdf

fX(x) = β^{-1} e^{−x/β} for x > 0, and fX(x) = 0 otherwise.

Find E(X) and E(e^X).

Example. (Cauchy mean) X has pdf fX(x) = π^{-1}(1 + x^2)^{-1} for all x.

Theorem. Assume Eg1(X) and Eg2(X) exist.
(a) E(ag1(X) + bg2(X) + c) = aE(g1(X)) + bE(g2(X)) + c.
(b) g(x) ≥ 0 =⇒ E(g(X)) ≥ 0.
(c) g1(x) ≥ g2(x) =⇒ E(g1(X)) ≥ E(g2(X)).
(d) a ≤ g(x) ≤ b =⇒ a ≤ E(g(X)) ≤ b.
(e) E(X − E(X)) = 0.

Result. The expected value of X is a good predictor of X in the
following sense: E(X − b)^2 is minimized by b = E(X).

Remark 1: Assume Y = g(X), with X ∼ fX and Y ∼ fY. There are two ways
of computing E(Y):
(1) E(g(X)) = \int g(x) fX(x) dx.
(2) E(Y) = \int y fY(y) dy.
Are they equal?

Example. X ∼ Unif(0, 1). Let g(X) = − log(X).
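For this example both routes give 1: (1) ∫₀¹ −log x dx = 1, and (2) ∫₀^∞ y e^{−y} dy = 1, since Y = −log X has the Exponential(1) density. A crude midpoint-rule sketch of both integrals (truncating the second at 40, far into the tail):

```python
import math

# (1) integral of -log(x) * fX(x) over (0, 1), fX = 1
n = 200_000
h = 1.0 / n
e1 = sum(-math.log((i + 0.5) * h) for i in range(n)) * h

# (2) integral of y * fY(y) over (0, inf) with fY(y) = e^{-y}, truncated at 40
m, top = 200_000, 40.0
k = top / m
e2 = sum(((j + 0.5) * k) * math.exp(-(j + 0.5) * k) for j in range(m)) * k
# both e1 and e2 should be close to 1
```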

Remark 2: (Expectation of non-negative random variables)
Assume X ≥ 0 and has the cdf FX. Then

E(X) = \int_0^{∞} (1 − FX(x)) dx if X is continuous, and
E(X) = \sum_{k=0}^{∞} (1 − FX(k)) if X is discrete, taking values in {0, 1, 2, . . .}.

Example. Mean of the exponential.

5 Median
Assume X is continuous and has the cdf FX. Its median m is the value
which satisfies FX(m) = 1/2, that is,

FX(m) = \int_{−∞}^{m} fX(x) dx = \int_{m}^{∞} fX(x) dx = \frac{1}{2}.

Equivalently, m = FX^{-1}(1/2).

Example. Assume X has the pdf f(x) = 3x^2, 0 < x < 1.

Example. Assume X ∼ Exponential(β).

Example. Cauchy.

Example. Symmetric.
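For the first two examples the medians are m = (1/2)^{1/3} (since F(x) = x^3) and m = β log 2. A small sketch (the helper name `median_by_bisection` is hypothetical) that recovers both by bisection on FX(m) = 1/2:

```python
import math

def median_by_bisection(F, lo, hi, tol=1e-12):
    # F is assumed continuous and increasing on [lo, hi] with F(lo) < 1/2 < F(hi)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if F(mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# f(x) = 3x^2 on (0, 1) has cdf F(x) = x^3
m1 = median_by_bisection(lambda x: x ** 3, 0.0, 1.0)
# Exponential(beta) has cdf 1 - e^{-x/beta}; beta = 2 is arbitrary
beta = 2.0
m2 = median_by_bisection(lambda x: 1 - math.exp(-x / beta), 0.0, 50.0)
```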

6 Variance and Standard Deviation
The variance of a random variable X is defined as

var(X) = E(X − E(X))2 .

Write σ^2 = var(X). The standard deviation of X is the square root of the
variance,

σ = \sqrt{var(X)}.

• Both variance and standard deviation measure the degree of spread of
a distribution around its mean E(X).

Theorem. Computational formula:

var(X) = E(X 2 ) − (E(X))2 ,


var(aX + b) = a2 var(X).

Example. X ∼ Unif[0,1]. Compute var(X).

Example. Binomial. Assume X is discrete and has the pmf

fX(x) = \binom{n}{x} p^x (1 − p)^{n−x},  for x = 0, . . . , n.
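A numeric check (a sketch with arbitrary values n = 10, p = 0.4) of the computational formula on this pmf: E(X^2) − (E X)^2 should equal np(1 − p) = 2.4.

```python
from math import comb

n, p = 10, 0.4
pmf = [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]
EX = sum(x * w for x, w in enumerate(pmf))        # should be n*p = 4.0
EX2 = sum(x * x * w for x, w in enumerate(pmf))
var = EX2 - EX ** 2                               # should be n*p*(1-p) = 2.4
```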

7 Moments
For each positive integer r, the rth (raw) moment of X is

µ'_r = E(X^r).

The rth central moment is

µ_r = E(X − E(X))^r.

Note that

µ'_0 = µ_0 = 1, µ'_1 = E(X), µ_1 = 0, µ_2 = var(X).

Example. Binomial.

The rth factorial moment of X is defined as

µ[r] = E(X(X − 1) · · · (X − r + 1)), ∀r ≥ 1.

It is useful for the calculation of moments of discrete distributions.
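For instance, for Bin(n, p) the second factorial moment is E[X(X − 1)] = n(n − 1)p^2, which together with µ'_1 = np gives var(X) = np(1 − p). A sketch checking the identity by direct summation (arbitrary values n = 6, p = 0.5):

```python
from math import comb

n, p = 6, 0.5
# E[X(X-1)] computed term by term from the binomial pmf
mu_2_factorial = sum(x * (x - 1) * comb(n, x) * p**x * (1 - p)**(n - x)
                     for x in range(n + 1))
# n(n-1)p^2 = 6 * 5 * 0.25 = 7.5
```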

Two special moments: Skewness and Kurtosis

Skewness of a distribution measures its departure from symmetry:

γ1 = µ3/σ^3 = µ3/µ2^{3/2}.

• For symmetric random variables, µ3 = E(X − E(X))^3 = 0, and hence so is γ1.

• γ1 is unit free: rescaling X by a positive constant scales µ3 and σ^3 by the same factor.

Example. Exponential.

Kurtosis of a distribution measures its peakedness, defined using the
fourth central moment:

γ2 = µ4/µ2^2 − 3 = µ4/σ^4 − 3.

• Division by σ 4 is to make kurtosis a pure number.

• Subtraction of 3 is a convention, so that kurtosis is zero for the normal
distribution.

• γ2 > 0: leptokurtic (high peak, fat tails); γ2 < 0: platykurtic (low peak,
thin tails).

• For the normal, γ1 = γ2 = 0.

Example. Uniform.
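For Unif(0, 1), symmetry gives γ1 = 0, and µ2 = 1/12, µ4 = 1/80 give γ2 = (1/80)/(1/12)^2 − 3 = 144/80 − 3 = −6/5, so the uniform is platykurtic. A midpoint-rule sketch (not part of the notes):

```python
# central moments of Unif(0,1) by midpoint-rule integration
n = 200_000
h = 1.0 / n
mid = [(i + 0.5) * h for i in range(n)]
mean = 0.5
mu2 = sum((x - mean) ** 2 for x in mid) * h   # 1/12
mu3 = sum((x - mean) ** 3 for x in mid) * h   # 0 by symmetry
mu4 = sum((x - mean) ** 4 for x in mid) * h   # 1/80
gamma1 = mu3 / mu2 ** 1.5                     # skewness, should be ~0
gamma2 = mu4 / mu2 ** 2 - 3                   # kurtosis, should be ~ -6/5
```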

8 Moment Generating Function
Assume X has the cdf FX and the pmf/pdf fX. The moment generating
function (MGF) of X is defined as

MX(t) = E(e^{tX}) = \sum_x e^{tx} fX(x) if X is discrete, and
MX(t) = E(e^{tX}) = \int_{−∞}^{∞} e^{tx} fX(x) dx if X is continuous.

• Interpretation: Note

e^{tX} = 1 + \frac{tX}{1!} + \frac{t^2 X^2}{2!} + · · · ,

so intuitively

E(e^{tX}) = 1 + \frac{t E(X)}{1!} + \frac{t^2 E(X^2)}{2!} + · · · .

Thus, the coefficient of t^r/r! in the infinite Taylor series expansion
of MX(t) is µ'_r = E(X^r). That is why M(t) is called the moment
generating function: its expansion generates the moments.

• In the same way, µ_r is obtained from the expansion of E(e^{t(X−E(X))}) =
e^{−tE(X)} MX(t).

• Note that the step above needs some justification: the expectation of
a sum equals the sum of expectations only when there are finitely many
terms. Exchanging expectation and the infinite sum is justified whenever
M(t) < ∞ for all t in an open interval containing 0.

Computation Formulas

M(0) = 1.
M_{aX+b}(t) = e^{tb} MX(at).
µ'_r = E(X^r) = \frac{d^r}{dt^r} MX(t) \Big|_{t=0}.
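A numeric illustration of the differentiation formula (a sketch; β = 2 is arbitrary), using the Exp(β) MGF M(t) = (1 − βt)^{-1}, valid for t < 1/β, and central finite differences in place of exact derivatives:

```python
beta = 2.0

def M(t):
    # MGF of Exp(beta), finite for t < 1/beta
    return 1.0 / (1.0 - beta * t)

h = 1e-4
# first derivative at 0 ~ E(X) = beta
mu1 = (M(h) - M(-h)) / (2 * h)
# second derivative at 0 ~ E(X^2) = 2 * beta^2
mu2 = (M(h) - 2 * M(0.0) + M(-h)) / h ** 2
```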

Example. Binomial.

Example. Exponential.

Example. Uniform.

Gamma function.
An expression that often appears is an integral of the form \int_0^{∞} e^{−x} x^{α−1} dx,
where α > 0. It can be shown that for all α > 0, this integral is finite.
Obviously, its value depends on α. We denote the value by Γ(α), i.e.,

Γ(α) = \int_0^{∞} e^{−x} x^{α−1} dx,

called the gamma function. Note that


• Γ(1) = \int_0^{∞} e^{−x} dx = 1.

• If α > 0, then by integration by parts,

Γ(α + 1) = \left[ x^α (−e^{−x}) \right]_0^{∞} − \int_0^{∞} (−e^{−x}) α x^{α−1} dx = αΓ(α).

• In particular, for any integer n > 1,

Γ(n) = (n−1)Γ(n−1) = (n−1)(n−2)Γ(n−2) = · · · = (n−1)(n−2) · · · 1 · Γ(1) = (n−1)!.
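Both identities can be spot-checked with the standard library's gamma function (a sketch; a = 2.7 is an arbitrary illustrative value):

```python
import math

a = 2.7
lhs = math.gamma(a + 1)       # Gamma(a + 1)
rhs = a * math.gamma(a)       # a * Gamma(a): should match by the recursion
fact_check = math.gamma(6)    # Gamma(6) should be 5! = 120
```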

What determines the distribution uniquely?

• If X and Y have bounded supports and share the same moment sequence,
then they have the same distribution.
• If MX(t) = MY(t) < ∞ in a neighborhood of 0, then X and Y have
the same distribution.

The above also holds in a limiting sense.

• If Xn is a sequence of random variables and X is another such that

MXn(t) → MX(t), ∀t in a neighborhood of 0,

then the distribution of Xn “converges” to the distribution of X. We
say that Xn converges to X in distribution.

(If X is continuous, the “convergence” means that FXn(x) → FX(x) for all
x, since every x is then a continuity point of FX.)
(If Xn and X are all discrete taking values on {0, 1, . . .}, then “convergence”
means convergence of pmf’s.)

Example. Convergence of binomial to Poisson.

Let X be the number of successes in n trials with probability of success p.
n large, p small, but np moderate: n → ∞, p → 0, np → λ.
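A numeric sketch of the limit (not part of the notes): with λ = 2 fixed and p = λ/n, the Bin(n, p) pmf approaches the Poisson(λ) pmf e^{−λ} λ^k / k! as n grows.

```python
from math import comb, exp, factorial

lam = 2.0

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    return exp(-lam) * lam**k / factorial(k)

# largest pointwise gap over small k for a large n; it shrinks like O(1/n)
n = 10_000
max_gap = max(abs(binom_pmf(k, n, lam / n) - poisson_pmf(k, lam))
              for k in range(10))
```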
