Inverse Problems: Gaussian Sampling Techniques
Sommersemester 2022
Vesa Kaarnioja
Noting that
\[ \frac{1}{\det C} = \det(C^{-1}) = \det(R^T)\det(R) = (\det R)^2, \]
we obtain
\[ \pi_w(w) = \frac{1}{(2\pi)^{n/2}} \exp\Big(-\frac{1}{2}\|w\|^2\Big), \]
i.e., \( w \sim N(0, I) \).
Coloring transformation
Let \( x \sim N(x_0, C) \) (\(C\) positive definite) and \( w \sim N(0, I) \), with \( C^{-1} = R^T R \). Then it holds that
\[ w = R(x - x_0) \iff x = R^{-1} w + x_0. \]
This is the basis of the coloring transformation:
\[ x = R^{-1} w + x_0. \]
Remark: MATLAB also has the function mvnrnd which can be used to
draw from multivariate normal distributions. However, if the Cholesky
factor R (or its inverse) has been precomputed, it may be slightly more
efficient to apply the coloring transformation, e.g., as
x = R \ randn(n,1) + x0;
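The slides use MATLAB; the same coloring transformation can be sketched in Python/NumPy. The covariance C and mean x0 below are illustrative choices, not from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative positive definite covariance C and mean x0 (hypothetical data).
C = np.array([[2.0, 0.5],
              [0.5, 1.0]])
x0 = np.array([1.0, -1.0])

# Factor C^{-1} = R^T R. With the Cholesky factorization C = L L^T,
# the choice R = L^{-1} works: C^{-1} = L^{-T} L^{-1} = (L^{-1})^T (L^{-1}).
L = np.linalg.cholesky(C)

# Coloring transformation: x = R^{-1} w + x0 = L w + x0 with w ~ N(0, I).
n_samples = 200_000
w = rng.standard_normal((2, n_samples))
x = L @ w + x0[:, None]

# The sample mean and sample covariance should approach x0 and C.
assert np.allclose(x.mean(axis=1), x0, atol=0.05)
assert np.allclose(np.cov(x), C, atol=0.1)
```

Note that computing R explicitly is unnecessary: applying the Cholesky factor L directly plays the role of R⁻¹.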
Sampling from general univariate distributions
If a closed-form expression for the inverse CDF is not available, then a computationally attractive formula for obtaining the value \( \Phi^{-1}(t) \) at a point \( t \in (0, 1) \) is based on the identity
\[ \Phi^{-1}(t) = \inf\{x \in \mathbb{R} : \Phi(x) \ge t\}. \]
Remark: The above formula is the expression for the generalized inverse CDF: the formula with the infimum is valid even in the general case of weakly monotonic and right-continuous CDFs.
“Draw t ∼ U(0, 1) and find the smallest value of x such that Φ(x) ≥ t.”
Remarks:
The inverse transform sampling method can be used to sample univariate densities π(u). Moreover, if the components of a multivariate density are mutually independent, i.e., π(u1 , . . . , un ) = π(u1 ) · · · π(un ) holds a.e., then inverse transform sampling can be used to generate samples componentwise.
Unfortunately, the components of multivariate posterior distributions
are generally not mutually independent. In the next two weeks, we
will discuss importance sampling and MCMC methods for sampling
high-dimensional (posterior) distributions. These methods are
applicable even when the components of multivariate distributions are
not mutually independent.
Example
Suppose that we have the PDF \( \pi(x) := (6x - 6x^2)\chi_{(0,1)}(x) \). We can design the following simple scheme based on inverse transform sampling to draw samples from this distribution.
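One such scheme, sketched here in Python/NumPy rather than the MATLAB used elsewhere in the slides: the CDF of this density is Φ(x) = 3x² − 2x³ on (0, 1), and since no convenient closed-form inverse is available, we apply the generalized inverse via bisection (the iteration count and sample size below are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)

def cdf(x):
    # CDF of pi(x) = 6x - 6x^2 on (0, 1): the integral is 3x^2 - 2x^3.
    return 3 * x**2 - 2 * x**3

def inverse_cdf(t, iters=60):
    # Generalized inverse CDF via bisection on [0, 1], where Phi is increasing:
    # return the smallest x with Phi(x) >= t.
    lo, hi = np.zeros_like(t), np.ones_like(t)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        below = cdf(mid) < t
        lo = np.where(below, mid, lo)
        hi = np.where(below, hi, mid)
    return 0.5 * (lo + hi)

# Inverse transform sampling: draw t ~ U(0, 1) and map through Phi^{-1}.
t = rng.uniform(size=100_000)
samples = inverse_cdf(t)

# Sanity check: this density (6x(1 - x), a Beta(2,2) density) has mean 1/2.
assert abs(samples.mean() - 0.5) < 0.01
```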
The prior density should reflect our beliefs on the unknown variable of
interest before taking the measurements into account.
Gaussian densities
\[ \pi(x) = \frac{1}{(2\pi)^{d/2}\sqrt{\det C}} \exp\Big(-\frac{1}{2}\|x - m\|_{C^{-1}}^2\Big) \]
are the most commonly used prior distributions in statistical inverse problems. They are easy to construct and form a versatile class of distributions. They also often lead to explicit estimators.
Since we consider images, we add a positivity constraint to our prior. For the \( \ell_1 \) prior, we set
\[ \pi(x) = \alpha^d \chi_+(x) \exp(-\alpha\|x\|_1), \qquad \alpha > 0, \]
where \( \chi_+(x) = 1 \) if \( x_j > 0 \) for all \( j \) and \( \chi_+(x) = 0 \) otherwise. The components \( x_j \) are independent and each have the cumulative distribution function
\[ \Phi(t) = \alpha \int_0^t e^{-\alpha s}\, ds = 1 - e^{-\alpha t} \quad \text{for all } t \ge 0. \]
Now, we can draw samples of \( x_j \) using
\[ x_j = \Phi^{-1}(u_j) = -\frac{1}{\alpha}\ln(1 - u_j), \]
where the uj are independent random draws from the uniform distribution
U(0, 1).
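A minimal Python/NumPy sketch of this componentwise inverse transform (the rate α and dimension below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
alpha = 2.0       # rate parameter of the l1 (exponential) prior; illustrative
d = 500_000       # number of independent components; illustrative

# Componentwise inverse transform sampling: x_j = -ln(1 - u_j) / alpha,
# with u_j independent draws from U(0, 1).
u = rng.uniform(size=d)
x = -np.log(1.0 - u) / alpha

# The exponential distribution with rate alpha has mean 1/alpha,
# and the positivity constraint holds by construction.
assert abs(x.mean() - 1.0 / alpha) < 0.01
assert np.all(x >= 0)
```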
Similarly, the components xj of the Cauchy prior with positivity constraint
are independent and have the CDF
\[ \Phi(t) = \frac{2\alpha}{\pi} \int_0^t \frac{1}{1 + \alpha^2 s^2}\, ds = \frac{2}{\pi}\arctan(\alpha t), \]
so that the inverse cumulative distribution function is \( \Phi^{-1}(t) = \frac{1}{\alpha}\tan\frac{\pi t}{2} \).
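The Cauchy case admits the same treatment; a Python/NumPy sketch (with an illustrative α), checking the empirical CDF of the draws against Φ:

```python
import numpy as np

rng = np.random.default_rng(3)
alpha = 1.5  # illustrative scale parameter

# Inverse transform for the Cauchy prior with positivity constraint:
# Phi^{-1}(t) = tan(pi * t / 2) / alpha.
u = rng.uniform(size=200_000)
x = np.tan(0.5 * np.pi * u) / alpha

# Check against the CDF Phi(t) = (2/pi) * arctan(alpha * t):
# the empirical fraction of draws below t0 should be close to Phi(t0).
t0 = 1.0
empirical = np.mean(x <= t0)
assert abs(empirical - (2 / np.pi) * np.arctan(alpha * t0)) < 0.01
```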
Random draws from the white noise prior with positivity constraint, the
impulse (`1 ) prior, and the Cauchy prior:
In particular, the uj are independent of each other, so that they can each be drawn from a one-dimensional Cauchy density. Also note that
x = (x1 , . . . , xd )T ∈ Rd satisfies x = Lu, where L ∈ Rd×d is a lower
triangular matrix with Lij = 1 for i ≥ j.† Generalizing the idea behind the
above prior leads, e.g., to total variation priors.
† Note that in MATLAB, it is more efficient to implement this as x = cumsum(u).
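The equivalence between applying L and taking the cumulative sum (MATLAB's cumsum) can be checked directly in Python/NumPy:

```python
import numpy as np

rng = np.random.default_rng(4)
d = 64

# Lower triangular L with L_ij = 1 for i >= j.
L = np.tril(np.ones((d, d)))

u = rng.standard_normal(d)

# x = L u is exactly the cumulative sum of u, so the O(d) cumsum
# replaces the O(d^2) matrix-vector product.
assert np.allclose(L @ u, np.cumsum(u))
```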
Hierarchical models
Suppose that the prior depends on a parameter α, i.e., the prior is the conditional distribution P(x|α). We model the unknown α with a hyperprior P(α) = πh (α) and write the joint distribution of x and α as
\[ P(x, \alpha) = P(x \mid \alpha)\, \pi_h(\alpha). \]
In this chapter we study the linear Gaussian setting, where the forward
map F is linear and both the prior distribution and the distribution of the
observational noise η are Gaussian.
For several reasons, it plays a central role in the study of inverse problems.
It arises frequently in applications, either directly or in the form of
posterior distributions that are asymptotically Gaussian in the large data
limit. It also allows computing explicit solutions which can be used to gain
a general understanding. Apart from that, many methods employed in a
nonlinear or non-Gaussian setting build on ideas from the linear Gaussian
case by performing linearization or Gaussian approximation.
Let us suppose that the unknown x ∈ Rd and the data y ∈ Rk follow the
relation
y = Ax + η, (1)
where
1 The forward model is linear, i.e., A ∈ Rk×d .
2 The prior is Gaussian: x ∼ N (x0 , Γpr ) with Γpr symmetric and positive definite.
3 The noise is Gaussian: η ∼ N (η0 , Γn ) with Γn symmetric and positive definite.
4 The unknown x and the noise η are mutually independent.
Theorem
Under assumptions 1–4, the posterior distribution corresponding to (1) is
Gaussian with x|y ∼ N (µpost , Γpost ), where we have the posterior mean
\[ \mu_{\mathrm{post}} = (\Gamma_{\mathrm{pr}}^{-1} + A^T \Gamma_n^{-1} A)^{-1}\big(A^T \Gamma_n^{-1}(y - \eta_0) + \Gamma_{\mathrm{pr}}^{-1} x_0\big) \]
and covariance
\[ \Gamma_{\mathrm{post}} = (\Gamma_{\mathrm{pr}}^{-1} + A^T \Gamma_n^{-1} A)^{-1}. \]
Proof. Noting that \( \Gamma_{\mathrm{post}} = (\Gamma_{\mathrm{pr}}^{-1} + A^T\Gamma_n^{-1}A)^{-1} \) and \( \mu_{\mathrm{post}} = \Gamma_{\mathrm{post}}\big(A^T\Gamma_n^{-1}(y - \eta_0) + \Gamma_{\mathrm{pr}}^{-1}x_0\big) \), we obtain
\[
\begin{aligned}
\pi^y(x) &\propto \exp\Big(-\tfrac12 (y - Ax - \eta_0)^T \Gamma_n^{-1} (y - Ax - \eta_0)\Big)\exp\Big(-\tfrac12 (x - x_0)^T \Gamma_{\mathrm{pr}}^{-1}(x - x_0)\Big) \\
&= \exp\Big(-\tfrac12\big[\, y^T\Gamma_n^{-1}y - y^T\Gamma_n^{-1}Ax - y^T\Gamma_n^{-1}\eta_0 \\
&\qquad\qquad - x^TA^T\Gamma_n^{-1}y + x^TA^T\Gamma_n^{-1}Ax + x^TA^T\Gamma_n^{-1}\eta_0 \\
&\qquad\qquad - \eta_0^T\Gamma_n^{-1}y + \eta_0^T\Gamma_n^{-1}Ax + \eta_0^T\Gamma_n^{-1}\eta_0 \\
&\qquad\qquad + x^T\Gamma_{\mathrm{pr}}^{-1}x - 2x^T\Gamma_{\mathrm{pr}}^{-1}x_0 + x_0^T\Gamma_{\mathrm{pr}}^{-1}x_0 \,\big]\Big) \\
&\propto \exp\Big(-\tfrac12\big[\, x^T\underbrace{(\Gamma_{\mathrm{pr}}^{-1} + A^T\Gamma_n^{-1}A)}_{=\Gamma_{\mathrm{post}}^{-1}}x - 2x^T\underbrace{\big(A^T\Gamma_n^{-1}(y-\eta_0) + \Gamma_{\mathrm{pr}}^{-1}x_0\big)}_{=\Gamma_{\mathrm{post}}^{-1}\mu_{\mathrm{post}}}\,\big]\Big).
\end{aligned}
\]
On the previous slide, we arrived at
\[ \pi^y(x) \propto \exp\Big(-\frac12\big(x^T\Gamma_{\mathrm{post}}^{-1}x - 2x^T\Gamma_{\mathrm{post}}^{-1}\mu_{\mathrm{post}}\big)\Big). \]
To finish the proof, we “complete the square” by multiplying and dividing by \( \exp(-\frac12 \mu_{\mathrm{post}}^T \Gamma_{\mathrm{post}}^{-1}\mu_{\mathrm{post}}) \). Since this term does not depend on x, we can absorb the denominator into the implied coefficient to obtain
\[
\begin{aligned}
\pi^y(x) &\propto \exp\Big(-\frac12\big(x^T\Gamma_{\mathrm{post}}^{-1}x - 2x^T\Gamma_{\mathrm{post}}^{-1}\mu_{\mathrm{post}}\big)\Big)\exp\Big(-\frac12 \mu_{\mathrm{post}}^T\Gamma_{\mathrm{post}}^{-1}\mu_{\mathrm{post}}\Big) \\
&= \exp\Big(-\frac12\big(x^T\Gamma_{\mathrm{post}}^{-1}x - 2x^T\Gamma_{\mathrm{post}}^{-1}\mu_{\mathrm{post}} + \mu_{\mathrm{post}}^T\Gamma_{\mathrm{post}}^{-1}\mu_{\mathrm{post}}\big)\Big) \\
&= \exp\Big(-\frac12 (x - \mu_{\mathrm{post}})^T\Gamma_{\mathrm{post}}^{-1}(x - \mu_{\mathrm{post}})\Big),
\end{aligned}
\]
as desired.
Remark: The previous proof shows that if x ∼ N (x0 , Γpr ) and
η ∼ N (η0 , Γn ), then
x|y ∼ N (µpost , Γpost ),
where
\[ \Gamma_{\mathrm{post}} = (\Gamma_{\mathrm{pr}}^{-1} + A^T \Gamma_n^{-1} A)^{-1} \tag{2} \]
and
\[ \mu_{\mathrm{post}} = \Gamma_{\mathrm{post}}\big(A^T\Gamma_n^{-1}(y - \eta_0) + \Gamma_{\mathrm{pr}}^{-1}x_0\big). \tag{3} \]
One also has the following alternative representations for the posterior mean
\[ \mu_{\mathrm{post}} = x_0 + \Gamma_{\mathrm{pr}} A^T (A\Gamma_{\mathrm{pr}}A^T + \Gamma_n)^{-1}(y - Ax_0 - \eta_0) \tag{4} \]
and the posterior covariance
\[ \Gamma_{\mathrm{post}} = \Gamma_{\mathrm{pr}} - \Gamma_{\mathrm{pr}} A^T (A\Gamma_{\mathrm{pr}}A^T + \Gamma_n)^{-1} A\Gamma_{\mathrm{pr}}. \tag{5} \]
Formula (5) can be proved, e.g., by using the
Sherman–Morrison–Woodbury formula on (2). Formula (4) can be proved
by plugging the formula (5) into (3) and simplifying the expression
(homework).
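The agreement of (2)–(3) with the alternative representations (4)–(5) can be verified numerically. A Python/NumPy sketch on randomly generated illustrative data (all matrices below are hypothetical, chosen only to be well-conditioned):

```python
import numpy as np

rng = np.random.default_rng(5)
k, d = 6, 4

# Illustrative problem data (hypothetical, not from the slides).
A = rng.standard_normal((k, d))
y = rng.standard_normal(k)
x0 = rng.standard_normal(d)
eta0 = rng.standard_normal(k)

def random_spd(n):
    # Symmetric positive definite matrix, shifted to be well-conditioned.
    M = rng.standard_normal((n, n))
    return M @ M.T + n * np.eye(n)

G_pr, G_n = random_spd(d), random_spd(k)

# Posterior covariance and mean via formulas (2)-(3).
G_post = np.linalg.inv(np.linalg.inv(G_pr) + A.T @ np.linalg.inv(G_n) @ A)
mu_post = G_post @ (A.T @ np.linalg.inv(G_n) @ (y - eta0)
                    + np.linalg.inv(G_pr) @ x0)

# Alternative representations (4)-(5), via S = A G_pr A^T + G_n.
S = A @ G_pr @ A.T + G_n
mu_alt = x0 + G_pr @ A.T @ np.linalg.solve(S, y - A @ x0 - eta0)
G_alt = G_pr - G_pr @ A.T @ np.linalg.solve(S, A @ G_pr)

assert np.allclose(mu_post, mu_alt)
assert np.allclose(G_post, G_alt)
```

Note that (4)–(5) only require inverting the k×k matrix S, which is cheaper than (2)–(3) whenever k is much smaller than d.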
As the posterior distribution is Gaussian, its mean and its mode coincide.
This means that the conditional mean estimator and the MAP estimator
coincide in the linear Gaussian setting.
Corollary
The conditional mean estimator and the maximum a posteriori estimator
coincide in the linear Gaussian setting and are given by
x̂CM = x̂MAP = µpost .
Example
Let \( \Gamma_n = \gamma^2 I \), \( \eta_0 = 0 \), \( \Gamma_{\mathrm{pr}} = \sigma^2 I \), \( x_0 = 0 \), and set \( \lambda = \frac{\gamma^2}{\sigma^2} \). Then \( \mu_{\mathrm{post}} \) minimizes
\[ J_\lambda(x) := \|y - Ax\|^2 + \lambda\|x\|^2 \]
and therefore satisfies the normal equations
\[ (A^T A + \lambda I)\,\mu_{\mathrm{post}} = A^T y. \]
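This correspondence between the posterior mean and the Tikhonov-regularized solution can be checked numerically; a Python/NumPy sketch on illustrative random data:

```python
import numpy as np

rng = np.random.default_rng(6)
k, d = 8, 5

# Hypothetical forward map and data, for illustration only.
A = rng.standard_normal((k, d))
y = rng.standard_normal(k)

gamma, sigma = 0.5, 2.0
lam = gamma**2 / sigma**2

# Posterior mean from (2)-(3) with Gamma_n = gamma^2 I, Gamma_pr = sigma^2 I,
# eta0 = 0, and x0 = 0.
G_post = np.linalg.inv(np.eye(d) / sigma**2 + A.T @ A / gamma**2)
mu_post = G_post @ (A.T @ y / gamma**2)

# Minimizer of J_lambda(x) = ||y - Ax||^2 + lambda ||x||^2, i.e., the
# solution of the normal equations (A^T A + lambda I) x = A^T y.
x_tik = np.linalg.solve(A.T @ A + lam * np.eye(d), A.T @ y)

assert np.allclose(mu_post, x_tik)
```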
Suppose that we set a Gaussian prior for the unknown x ∼ N (x0 , Γpr ),
where Γpr is a symmetric, positive definite covariance matrix.
Now the posterior probability density of x given the measurement y is
\[ \pi^y(x) \propto \exp\Big(-\frac12 (x - \mu_{\mathrm{post}})^T \Gamma_{\mathrm{post}}^{-1} (x - \mu_{\mathrm{post}})\Big). \]
Let
π(x) ∝ exp(− 21 (x − µ)T Γ−1 (x − µ))
be a multivariate Gaussian PDF with mean µ and positive definite and
symmetric covariance matrix Γ.
Q: What is Γii ?
A: \( \sigma_i^2 := \Gamma_{ii} \) is the variance of the marginal distribution with PDF
\[ \pi(x_i) = \int_{\mathbb{R}^{n-1}} \pi(x_1, \ldots, x_i, \ldots, x_n)\, dx_1 \cdots dx_{i-1}\, dx_{i+1} \cdots dx_n. \]
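This fact can be illustrated by sampling; a Python/NumPy sketch with a hypothetical 2×2 covariance Γ, checking that the empirical variance of each component matches the corresponding diagonal entry:

```python
import numpy as np

rng = np.random.default_rng(7)

# Illustrative mean and covariance (hypothetical, not from the slides).
Gamma = np.array([[1.5, 0.4],
                  [0.4, 0.8]])
mu = np.array([0.0, 1.0])

# Draw samples via the coloring transformation x = mu + L w, w ~ N(0, I).
L = np.linalg.cholesky(Gamma)
x = mu[:, None] + L @ rng.standard_normal((2, 500_000))

# The marginal variance of component i is Gamma_ii.
assert np.allclose(x.var(axis=1), np.diag(Gamma), rtol=0.05)
```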
[Figure panels: samples from the white noise prior (γ = 0.6864 and γ = 2) and from the smoothness prior (γ = 0.0064 and γ = 0.02).]
Samples drawn from the white noise prior and the smoothness prior for several values of γ.
[Figure panels: posterior samples with the white noise prior (γ = 0.2, 0.6864, 2) and with the smoothness prior (γ = 0.001, 0.0064, 0.02); each panel includes the ground truth and the posterior mean.]
Samples drawn from the posterior corresponding to both the white noise prior and the
smoothness prior for several values of γ. We also plot the ground truth solution and the
posterior mean.