
Inverse Problems: Gaussian Sampling Techniques

The document discusses inverse problems in the context of sampling from multivariate Gaussian distributions and prior modeling, focusing on techniques such as inverse transform sampling and the coloring transformation. It explains the mathematical foundations for sampling, including change of variables and the use of Gaussian priors, and provides examples of how to implement these methods in MATLAB. Additionally, it addresses challenges in modeling prior distributions and introduces impulse priors for specific applications in image processing.


Inverse Problems

Sommersemester 2022

Vesa Kaarnioja
[Link]@[Link]

FU Berlin, FB Mathematik und Informatik

Seventh lecture, June 13, 2022


Today’s lecture

- Sampling from multivariate Gaussian distributions, inverse transform sampling
- Prior modeling
- The linear Gaussian setting
- Numerical example
Change of variables

Consider two random variables x ∈ R^n and y ∈ R^n which are related via the formula

y = f(x),

where f is continuously differentiable and one-to-one (these conditions can be relaxed). Then, for any B ∈ B(R^n), it holds that

P(x ∈ B) = P(y ∈ f(B)) = ∫_{f(B)} π_y(y) dy = ∫_B π_y(f(x)) |det Df(x)| dx,

where Df(x) ∈ R^{n×n} is the Jacobian matrix of f. Consequently,

π_x(x) = π_y(f(x)) |det Df(x)|.


Sampling from Gaussian distributions

Suppose that we want to create a sample of realizations for a multivariate Gaussian random variable x ∼ N(x₀, C), with the probability density

π_x(x) = (1/((2π)^n det C))^{1/2} exp(−½ (x − x₀)^T C^{−1} (x − x₀)).

Since C^{−1} is (by assumption) symmetric and positive definite, it has a Cholesky decomposition

C^{−1} = R^T R,

where R is an upper triangular matrix. The probability density of x can alternatively be written as

π_x(x) = (1/((2π)^n det C))^{1/2} exp(−½ ‖R(x − x₀)‖²).

Let us define a new random variable w = R(x − x₀) ⇔ x = R^{−1}w + x₀.


On the last slide, we defined w = R(x − x₀) ⇔ x = R^{−1}w + x₀, where x ∼ N(x₀, C). The change of variables formula yields

π_w(w) = π_x(R^{−1}w + x₀) |det R^{−1}| = π_x(R^{−1}w + x₀) |det R|^{−1}.

Noting that

1/det C = det(C^{−1}) = det R^T det R = (det R)²,

we obtain

π_w(w) = (1/(2π)^{n/2}) exp(−½ ‖w‖²).

In consequence, w is Gaussian white noise, i.e., w ∼ N(0, I).
Coloring transformation
Let x ∼ N(x₀, C) (C positive definite) and w ∼ N(0, I), with C^{−1} = R^T R. Then it holds that

w = R(x − x₀) ⇔ x = R^{−1}w + x₀.

This is the basis of the coloring transformation:

1. Draw w ∈ R^n from N(0, I).
2. A realization of x ∈ R^n from N(x₀, C) can be obtained via x = R^{−1}w + x₀.

Remark: MATLAB also has the function mvnrnd, which can be used to draw from multivariate normal distributions. However, if the Cholesky factor R (or its inverse) has been precomputed, it may be slightly more efficient to apply the coloring transformation, e.g., as

x = R \ randn(n,1) + x0;
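The two-step coloring transformation above can also be sketched in Python/NumPy; this is an assumed translation of the MATLAB one-liner, and the example mean x0 and covariance C are illustrative values, not from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative target N(x0, C); any symmetric positive definite C works
x0 = np.array([1.0, -2.0])
C = np.array([[2.0, 0.6],
              [0.6, 1.0]])

# Cholesky factor of the precision matrix: C^{-1} = R^T R with R upper triangular
# (np.linalg.cholesky returns the lower factor L with C^{-1} = L L^T, so R = L^T)
R = np.linalg.cholesky(np.linalg.inv(C)).T

# Coloring transformation: draw w ~ N(0, I) and map x = R^{-1} w + x0
n_samples = 200_000
W = rng.standard_normal((2, n_samples))
X = np.linalg.solve(R, W) + x0[:, None]   # analogue of R \ randn(n,1) + x0

# The empirical mean and covariance should approach x0 and C
print(X.mean(axis=1), np.cov(X))
```

Since x = R^{−1}w + x₀ with w ∼ N(0, I), the covariance of x is R^{−1}R^{−T} = (R^T R)^{−1} = C, which the empirical statistics confirm for large sample sizes.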
Sampling from general univariate distributions

In order to sample a real-valued random variable x directly, we can use its inverse distribution function. Let us assume that the probability density π(x) of x is almost surely positive (this condition can be relaxed). Then the cumulative distribution function Φ: R → (0, 1) of x is defined by

Φ(t) = P(x < t) = ∫_{−∞}^t π(x) dx.

In other words, Φ is the antiderivative of π. It follows from the fundamental theorem of calculus that Φ is strictly increasing. In particular, its inverse Φ^{−1}: (0, 1) → R exists.
Now, we define a new random variable u = Φ(x). First, we observe that

P(u < t) = P(Φ(x) < t) = P(x < Φ^{−1}(t))

for all t ∈ (0, 1). However, by the definition of the cumulative distribution function,

P(x < Φ^{−1}(t)) = ∫_{−∞}^{Φ^{−1}(t)} π(x) dx = ∫_{−∞}^{Φ^{−1}(t)} Φ′(x) dx = Φ(Φ^{−1}(t)) − lim_{x→−∞} Φ(x) = t.

Hence P(u < t) = t, meaning that u ∼ U(0, 1) is distributed uniformly on the interval (0, 1). On the other hand, if u ∼ U(0, 1) is given, then we obtain a random variable x with density π by setting x = Φ^{−1}(u). This reduces drawing a sample from the distribution π to drawing a sample from a uniform distribution, which can for example be performed in MATLAB using the rand command.
Inverse transform sampling (“Golden rule”)

An algorithm for drawing from the density π with CDF Φ:

1. Draw t ∼ U(0, 1).
2. Calculate x = Φ^{−1}(t).

If a closed form expression for the inverse CDF is not available, then a
computationally attractive formula for obtaining the value Φ−1 (t) at a
point t ∈ (0, 1) is based on the identity

Φ−1 (t) = inf{x | Φ(x) ≥ t}.

Remark: The above formula is the expression for the generalized inverse
CDF: the formula with the infimum is valid even in the general case of
weakly monotonic and right-continuous CDFs.
“Draw t ∼ U(0, 1). Then find the smallest value of x such that Φ(x) ≥ t.”
Remarks:
- The inverse transform sampling method can be used to sample univariate densities π(u). However, if the components of a multivariate density are mutually independent, i.e., π(u₁, …, u_n) = π(u₁) ⋯ π(u_n) holds a.e., then inverse transform sampling can be used to generate samples componentwise.
- Unfortunately, the components of multivariate posterior distributions are generally not mutually independent. In the next two weeks, we will discuss importance sampling and MCMC methods for sampling high-dimensional (posterior) distributions. These methods are applicable even when the components of multivariate distributions are not mutually independent.
Example
Suppose that we have the PDF π(x) := (6x − 6x²) χ_{(0,1)}(x). We can design the following simple scheme based on inverse transform sampling to draw samples from this distribution.

n = 1e5; % number of samples
x = linspace(0,1);
p = @(x) 6*x - 6*x.^2; % PDF
P = cumsum(p(x)); P = P/P(end); % "empirical" CDF of p
samples = zeros(1,n); % preallocate for efficiency
for iter = 1:n
    u = rand; % draw sample from U(0,1)
    ind = find(u <= P,1,'first'); % inverse CDF rule
    samples(iter) = x(ind); % store sample
end
histogram(samples,'Normalization','pdf'); % draw a histogram
hold on, plot(x,p(x),'LineWidth',3), legend('samples','pdf');
hold off;
Figure: 10^5 samples drawn from the distribution given on the previous page, organized as a histogram.
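For comparison, the same inverse transform scheme can be sketched in Python/NumPy; the grid size and sample count are illustrative choices mirroring the MATLAB example:

```python
import numpy as np

rng = np.random.default_rng(1)

# PDF pi(x) = 6x - 6x^2 on (0, 1), as in the MATLAB example
x = np.linspace(0.0, 1.0, 1000)
p = 6*x - 6*x**2
P = np.cumsum(p)
P /= P[-1]                           # normalized "empirical" CDF

# Inverse transform sampling via the generalized inverse CDF:
# for each u ~ U(0, 1), take the smallest grid point x with P(x) >= u
n = 100_000
u = rng.uniform(size=n)
samples = x[np.searchsorted(P, u)]   # vectorized find(u <= P, 1, 'first')

# For this density, E[x] = 1/2 and Var[x] = 1/20
print(samples.mean(), samples.var())
```

np.searchsorted returns, for each u, the leftmost index i with P[i] ≥ u, which is exactly the generalized inverse CDF rule quoted above.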
Prior modeling

The prior density should reflect our beliefs on the unknown variable of
interest before taking the measurements into account.

Often, the prior knowledge is qualitative in nature, and transferring the information into quantitative form expressed through a prior density can be challenging.

The prior probability distribution should be concentrated on those values of x we expect to see and assign a clearly higher probability to them than to the unexpected ones.
Gaussian priors

Gaussian densities

π(x) = (1/((2π)^{d/2} √(det C))) exp(−½ ‖x − m‖²_{C^{−1}})

are the most commonly used prior distributions in statistical inverse problems. They are easy to construct and form a versatile class of distributions. They also often lead to explicit estimators.

Random samples from a standard normal distribution N(0, I) can usually be generated directly, for example in MATLAB via randn. Samples from a general normal distribution N(m, C) and from a wide class of other distributions can then be derived from those, so that it is often not necessary to employ the inverse transform method.
Let us consider an image. We divide this region into n × n pixels and label the pixels f_{i,j} for i, j ∈ {1, …, n}:

P_{i,j} := {(x, y) : −1 + 2(j−1)/n < x < −1 + 2j/n, −1 + 2(i−1)/n < y < −1 + 2i/n}.

It is convenient to reshape the matrix/image (f_{i,j}) into a vector x of length d = n² so that

x_{(j−1)n+i} = f_{i,j}, i, j ∈ {1, …, n}.

The image on the left illustrates the new numbering corresponding to the pixels. Note that x = f(:) and f = reshape(x,n,n).


As an example, consider a problem where the unknown is a two-dimensional pixel image, arranged as a vector x ∈ R^d. The components x_j represent the intensity of the j-th pixel. Since we consider images, it is natural to add a positivity constraint to our prior. Assuming that x_i and x_j are independent for i ≠ j, the Gaussian white noise density with positivity constraint is

π(x) ∝ χ₊(x) exp(−½ ‖x‖²),

where χ₊(x) = 1 if x_j > 0 for all j and χ₊(x) = 0 otherwise. Since we assumed that each component is independent of the others, random draws can be performed componentwise.
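The constrained density factorizes into one-dimensional half-normal densities, so one way to sketch the componentwise draws in Python/NumPy (an illustration added here, not code from the lecture) is:

```python
import numpy as np

rng = np.random.default_rng(2)

# Each component of the positivity-constrained white noise prior is a
# standard normal restricted to x_j > 0, i.e. a half-normal random
# variable; |w_j| with w_j ~ N(0, 1) has exactly this distribution.
d = 100_000
x = np.abs(rng.standard_normal(d))

# The half-normal mean is sqrt(2/pi), and all draws are nonnegative
print(x.mean())
```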
Impulse priors
We assume again that the unknown is a two-dimensional pixel image. Assume that our prior information is that the image contains small and well localized objects in an almost constant background. In such a case we could assume an impulse prior density, which means that it gives a low average amplitude but allows outliers. The tail of such a prior distribution is long, although the expected value is small.

Let x ∈ R^d represent the pixel image, where the component x_j is the intensity of the j-th pixel. In what follows, x_i and x_j are assumed to be independent for i ≠ j.

One example of an impulse prior is the ℓ¹ prior. It has the density

π(x) = (α/2)^d exp(−α‖x‖₁)

with α > 0, where the ℓ¹-norm is defined as

‖x‖₁ = Σ_{j=1}^d |x_j|.

The impulse effect can be enhanced by choosing an even smaller power p ∈ (0, 1) of the components of x, that is, using Σ_{j=1}^d |x_j|^p instead of the ℓ¹-norm.
Another choice that produces images with few distinctly different pixels and a low-amplitude background is the Cauchy density

π(x) = (α/π)^d ∏_{j=1}^d 1/(1 + α²x_j²)

with α > 0.
Since we consider images, we add a positivity constraint to our prior. For the ℓ¹ prior, we set

π(x) = α^d χ₊(x) exp(−α‖x‖₁),

where χ₊(x) = 1 if x_j > 0 for all j and χ₊(x) = 0 otherwise. The components x_j are independent and each have the cumulative distribution function

Φ(t) = ∫₀ᵗ α e^{−αs} ds = 1 − e^{−αt} for all t ≥ 0.

Now, we can draw samples of x_j using

x_j = Φ^{−1}(u_j) = −(1/α) ln(1 − u_j),

where the u_j are independent random draws from the uniform distribution U(0, 1).
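These componentwise draws can be sketched in Python/NumPy (α and d are illustrative values):

```python
import numpy as np

rng = np.random.default_rng(3)

alpha = 2.0
d = 100_000

# Positivity-constrained l1 prior: each component has CDF 1 - exp(-alpha*t),
# so the inverse transform is x_j = -ln(1 - u_j)/alpha with u_j ~ U(0, 1)
u = rng.uniform(size=d)
x = -np.log(1.0 - u) / alpha

# Exponential distribution: mean 1/alpha, all components nonnegative
print(x.mean())
```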
Similarly, the components x_j of the Cauchy prior with positivity constraint are independent and have the CDF

Φ(t) = (2α/π) ∫₀ᵗ 1/(1 + α²s²) ds = (2/π) arctan(αt),

so that the inverse cumulative distribution is Φ^{−1}(t) = (1/α) tan(πt/2).
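The analogous Python/NumPy sketch for the positivity-constrained Cauchy prior (again with an illustrative α):

```python
import numpy as np

rng = np.random.default_rng(4)

alpha = 1.0
d = 100_000

# Inverse CDF of the positivity-constrained Cauchy prior:
# Phi^{-1}(t) = tan(pi*t/2)/alpha
t = rng.uniform(size=d)
x = np.tan(np.pi * t / 2) / alpha

# The median of this distribution is Phi^{-1}(1/2) = tan(pi/4)/alpha = 1/alpha
print(np.median(x))
```

Note that the sample mean is useless as a check here, since the Cauchy distribution has no finite mean; the median is the natural summary statistic.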
Random draws from the white noise prior with positivity constraint, the impulse (ℓ¹) prior, and the Cauchy prior:

[Figure: three 100 × 100 pixel images, one random draw from each of the three priors.]

Note that as long as all components are independent, drawing can be done componentwise using inverse transform sampling. Here, for each pixel x_j, we draw t_j from U(0, 1) and calculate x_j = Φ^{−1}(t_j).
Discontinuities
Assume that we want to estimate a one-dimensional signal f: [0, 1] → R with f(0) = 0 from indirect observations. Our prior knowledge is that the signal is usually relatively stable but can have large jumps every now and then. We may also have information on the size of the jumps or the rate of their occurrence.

We obtain one possible prior by taking the finite difference approximation of the derivative of f and assigning an impulsive noise distribution to it. Let us discretize the interval [0, 1] by points t_j = j/d and write x_j = f(t_j). Consider the density

π(x) = (α/π)^d ∏_{j=1}^d 1/(1 + α²(x_j − x_{j−1})²).

To draw samples from the above distribution, we define new random variables for the jumps

u_j = x_j − x_{j−1}, j = 1, …, d.

Their joint density is

π(u) = (α/π)^d ∏_{j=1}^d 1/(1 + α²u_j²).

In particular, the u_j are independent from each other, so that they can be drawn from a one-dimensional Cauchy density. Also note that x = (x₁, …, x_d)^T ∈ R^d satisfies x = Lu, where L ∈ R^{d×d} is a lower triangular matrix with L_{ij} = 1 for i ≥ j.† Generalizing the idea behind the above prior leads, e.g., to total variation priors.

† Note that in MATLAB, it is more efficient to implement this as x = cumsum(u).
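The jump construction can be sketched in Python/NumPy as follows (the signal length d and scale α are illustrative; np.cumsum plays the role of MATLAB's cumsum):

```python
import numpy as np

rng = np.random.default_rng(5)

d = 500
alpha = 10.0

# Draw the jumps u_j from a one-dimensional Cauchy density with scale 1/alpha
# via inverse transform sampling: u = tan(pi*(t - 1/2))/alpha, t ~ U(0, 1)
t = rng.uniform(size=d)
u = np.tan(np.pi * (t - 0.5)) / alpha

# x = L u with L lower triangular of ones, i.e. a cumulative sum of the jumps
L = np.tril(np.ones((d, d)))
x = np.cumsum(u)

# Both constructions agree (up to floating point roundoff)
print(np.allclose(x, L @ u))
```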
Hierarchical models

The prior density may depend on some parameter, such as variance or mean. So far we have assumed that these parameters are known. However, we often do not know how to choose them. If a parameter is not known, it can be estimated as a part of the statistical inference problem on the data. This leads to hierarchical models that include hypermodels for the parameters defining the prior density.

Assume that the prior distribution depends on a parameter α, which is assumed to be unknown. We then write the prior as a conditional density P(x|α). We model the unknown α with a hyperprior P(α) = π_h(α) and write the joint distribution of x and α as

P(x, α) = P(x|α) P(α).

Assuming we have a likelihood model P(y|x) for the measurement y, we get the posterior density for x and α, given y, using Bayes' formula:

P(x, α|y) ∝ P(y|x, α) P(x, α) = P(y|x, α) P(x|α) P(α).

The hyperprior density π_h may again depend on some hyperparameter α₀. The main reason for the use of a hyperprior model is that the construction of the posterior is assumed to be more robust with respect to fixing a value for the hyperparameter α₀ than fixing a value for α.
The linear Gaussian setting

In this chapter we study the linear Gaussian setting, where the forward
map F is linear and both the prior distribution and the distribution of the
observational noise η are Gaussian.
For several reasons, it plays a central role in the study of inverse problems.
It arises frequently in applications, either directly or in the form of
posterior distributions that are asymptotically Gaussian in the large data
limit. It also allows computing explicit solutions which can be used to gain
a general understanding. Apart from that, many methods employed in a
nonlinear or non-Gaussian setting build on ideas from the linear Gaussian
case by performing linearization or Gaussian approximation.
Let us suppose that the unknown x ∈ Rd and the data y ∈ Rk follow the
relation
y = Ax + η, (1)
where

1. The forward model is linear, i.e., A ∈ R^{k×d}.
2. The prior distribution is Gaussian: x ∼ π(x) = N(x₀, Γ_pr), where Γ_pr is symmetric and positive definite.
3. The noise is Gaussian: η ∼ ν(η) = N(η₀, Γ_n), where Γ_n is symmetric and positive definite.
4. x and η are independent.

Theorem
Under assumptions 1–4, the posterior distribution corresponding to (1) is Gaussian with x|y ∼ N(µ_post, Γ_post), where we have the posterior mean

µ_post = (Γ_pr^{−1} + A^T Γ_n^{−1} A)^{−1} (A^T Γ_n^{−1}(y − η₀) + Γ_pr^{−1} x₀)

and covariance

Γ_post = (Γ_pr^{−1} + A^T Γ_n^{−1} A)^{−1}.
Proof. Noting that Γ_post = (Γ_pr^{−1} + A^T Γ_n^{−1} A)^{−1} and µ_post = Γ_post (A^T Γ_n^{−1}(y − η₀) + Γ_pr^{−1} x₀), we obtain

π^y(x) ∝ exp(−½ (y − Ax − η₀)^T Γ_n^{−1} (y − Ax − η₀)) exp(−½ (x − x₀)^T Γ_pr^{−1} (x − x₀))

= exp(−½ [ y^T Γ_n^{−1} y − y^T Γ_n^{−1} Ax − y^T Γ_n^{−1} η₀
           − x^T A^T Γ_n^{−1} y + x^T A^T Γ_n^{−1} Ax + x^T A^T Γ_n^{−1} η₀
           − η₀^T Γ_n^{−1} y + η₀^T Γ_n^{−1} Ax + η₀^T Γ_n^{−1} η₀
           + x^T Γ_pr^{−1} x − 2x^T Γ_pr^{−1} x₀ + x₀^T Γ_pr^{−1} x₀ ])

∝ exp(−½ [ x^T (Γ_pr^{−1} + A^T Γ_n^{−1} A) x − 2x^T (A^T Γ_n^{−1}(y − η₀) + Γ_pr^{−1} x₀) ]),

where the proportionality sign absorbs all factors that do not depend on x; the two terms in the last line are x^T Γ_post^{−1} x and 2x^T Γ_post^{−1} µ_post, respectively.
On the previous slide, we arrived at

π^y(x) ∝ exp(−½ [x^T Γ_post^{−1} x − 2x^T Γ_post^{−1} µ_post]).

To finish the proof, we “complete the square” by multiplying and dividing by exp(−½ µ_post^T Γ_post^{−1} µ_post). Since this term does not depend on x, we can absorb the denominator into the implied coefficient to obtain

π^y(x) ∝ exp(−½ [x^T Γ_post^{−1} x − 2x^T Γ_post^{−1} µ_post]) exp(−½ µ_post^T Γ_post^{−1} µ_post)
       = exp(−½ [x^T Γ_post^{−1} x − 2x^T Γ_post^{−1} µ_post + µ_post^T Γ_post^{−1} µ_post])
       = exp(−½ (x − µ_post)^T Γ_post^{−1} (x − µ_post)),

as desired.
Remark: The previous proof shows that if x ∼ N(x₀, Γ_pr) and η ∼ N(η₀, Γ_n), then

x|y ∼ N(µ_post, Γ_post),

where

Γ_post = (Γ_pr^{−1} + A^T Γ_n^{−1} A)^{−1}    (2)

and

µ_post = Γ_post (A^T Γ_n^{−1}(y − η₀) + Γ_pr^{−1} x₀).    (3)

One also has the following alternative representations for the posterior mean

µ_post = x₀ + Γ_pr A^T (A Γ_pr A^T + Γ_n)^{−1} (y − Ax₀ − η₀)    (4)

and the posterior covariance

Γ_post = Γ_pr − Γ_pr A^T (A Γ_pr A^T + Γ_n)^{−1} A Γ_pr.    (5)

Formula (5) can be proved, e.g., by using the Sherman–Morrison–Woodbury formula on (2). Formula (4) can be proved by plugging formula (5) into (3) and simplifying the expression (homework).
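The equivalence of the two representations can also be checked numerically. The following Python/NumPy sketch uses a small randomly generated problem (all sizes and matrices are illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)

k, d = 6, 4
A = rng.standard_normal((k, d))
y = rng.standard_normal(k)
x0 = rng.standard_normal(d)
eta0 = rng.standard_normal(k)

# Random symmetric positive definite prior and noise covariances
M = rng.standard_normal((d, d)); G_pr = M @ M.T + d * np.eye(d)
N = rng.standard_normal((k, k)); G_n = N @ N.T + k * np.eye(k)

Gpr_inv = np.linalg.inv(G_pr)
Gn_inv = np.linalg.inv(G_n)

# Precision form, formulas (2)-(3)
G_post = np.linalg.inv(Gpr_inv + A.T @ Gn_inv @ A)
mu_post = G_post @ (A.T @ Gn_inv @ (y - eta0) + Gpr_inv @ x0)

# Covariance form, formulas (4)-(5) (Sherman-Morrison-Woodbury)
S_inv = np.linalg.inv(A @ G_pr @ A.T + G_n)
mu_alt = x0 + G_pr @ A.T @ S_inv @ (y - A @ x0 - eta0)
G_alt = G_pr - G_pr @ A.T @ S_inv @ A @ G_pr
```

Both pairs of formulas produce the same posterior mean and covariance up to floating point roundoff.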
As the posterior distribution is Gaussian, its mean and its mode coincide.
This means that the conditional mean estimator and the MAP estimator
coincide in the linear Gaussian setting.
Corollary
The conditional mean estimator and the maximum a posteriori estimator coincide in the linear Gaussian setting and are given by

x̂_CM = x̂_MAP = µ_post.
Example
Let Γ_n = γ²I, η₀ = 0, Γ_pr = σ²I, x₀ = 0, and set λ = γ²/σ². Then µ_post minimizes

J_λ(x) := ‖y − Ax‖² + λ‖x‖²

and therefore satisfies

(A^T A + λI) µ_post = A^T y.    (6)

This example provides a connection between Bayesian inference and variational regularization: J_λ can be interpreted as the objective functional in a linear regression model with a regularization term λ‖x‖². Equation (6) for µ_post is then exactly the normal equation. In the general case, the equation µ_post = (Γ_pr^{−1} + A^T Γ_n^{−1} A)^{−1}(A^T Γ_n^{−1}(y − η₀) + Γ_pr^{−1} x₀) can thus be viewed as a generalized normal equation. This point of view helps to understand the structure of Bayesian regularization by linking it to well-understood optimization approaches for inverse problems.
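A quick numerical check of identity (6) in Python/NumPy (problem sizes and parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)

k, d = 8, 5
A = rng.standard_normal((k, d))
y = rng.standard_normal(k)
gamma, sigma = 0.3, 0.9
lam = gamma**2 / sigma**2

# Posterior mean for Gamma_n = gamma^2 I, eta_0 = 0, Gamma_pr = sigma^2 I, x_0 = 0
G_post = np.linalg.inv(np.eye(d) / sigma**2 + A.T @ A / gamma**2)
mu_post = G_post @ (A.T @ y / gamma**2)

# mu_post satisfies the regularized normal equation (A^T A + lam*I) mu = A^T y
residual = (A.T @ A + lam * np.eye(d)) @ mu_post - A.T @ y
print(np.abs(residual).max())
```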
Numerical example: one-dimensional deconvolution
Let us revisit the deconvolution example from last week: we are interested in estimating a signal f: [0, 1] → R from noisy, blurred observations modeled as

y_i = y(s_i) = ∫₀¹ K(s_i, t) f(t) dt + η_i, i ∈ {1, …, k},

where the blurring kernel is

K(s, t) = exp(−(s − t)²/(2ω²)), ω = 0.5,

and we have Gaussian measurement noise η ∼ N(η₀, Γ_noise) with a symmetric, positive definite covariance matrix Γ_noise.

If s_i = i/k − 1/(2k) for i ∈ {1, …, k} and we discretize the integral using the midpoint rule with t_j = j/d − 1/(2d) and x_j = f(t_j) for j ∈ {1, …, d}, then we have the discrete linear model

y = Ax + η, where A_{i,j} = (1/d) K(s_i, t_j).
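Assembling the midpoint-rule matrix A can be sketched in Python/NumPy following the formulas above (the Gaussian kernel form exp(−(s−t)²/(2ω²)) is taken as stated):

```python
import numpy as np

k = d = 120
omega = 0.5

# Midpoint grids s_i = i/k - 1/(2k) and t_j = j/d - 1/(2d)
s = (np.arange(1, k + 1) - 0.5) / k
t = (np.arange(1, d + 1) - 0.5) / d

# A_{i,j} = K(s_i, t_j)/d with the Gaussian blurring kernel
A = np.exp(-(s[:, None] - t[None, :])**2 / (2 * omega**2)) / d
```

Since k = d and the kernel depends only on (s − t)², the resulting matrix is symmetric with diagonal entries 1/d.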
Linear Gaussian setting

Suppose that we set a Gaussian prior for the unknown, x ∼ N(x₀, Γ_pr), where Γ_pr is a symmetric, positive definite covariance matrix. Now the posterior probability density of x given the measurement y is

π^y(x) ∝ exp(−½ (x − x̄)^T Γ_post^{−1} (x − x̄)),

where we have the posterior mean

x̄ = x₀ + Γ_pr A^T (A Γ_pr A^T + Γ_noise)^{−1} (y − Ax₀ − η₀)

and posterior covariance

Γ_post = Γ_pr − Γ_pr A^T (A Γ_pr A^T + Γ_noise)^{−1} A Γ_pr.

With additive noise η ∼ ν(η) = N(η₀, σ²I), we have the likelihood

P(y|x) = ν(y − Ax) ∝ exp(−(1/(2σ²)) ‖y − Ax − η₀‖²).

Let L = tridiag(−1, 2, −1) and consider the following priors:

π_pr,1(x) ∝ exp(−(1/(2γ²)) ‖x − x₀‖²) with covariance Γ_pr,1 = γ²I;

π_pr,2(x) ∝ exp(−(1/(2γ²)) ‖L(x − x₀)‖²) = exp(−(1/(2γ²)) (x − x₀)^T (L^T L)(x − x₀)) with covariance Γ_pr,2 = γ²(L^T L)^{−1},

where x₀ ∈ R^d is the prior mean (assumed to be the same in both cases). Hence (from the previous page)

x̄_j = x₀ + Γ_pr,j A^T G_j^{−1} (y − Ax₀ − η₀),
Γ_post,j = Γ_pr,j − Γ_pr,j A^T G_j^{−1} A Γ_pr,j,

where G_j = A Γ_pr,j A^T + Γ_noise and Γ_noise = σ²I.


For the numerical experiment, we simulate measurements using the (smooth) ground truth signal

f(t) = 8t³ − 16t² + 8t,

which satisfies f(0) = f(1) = 0. The measurements are contaminated with zero-mean 10% relative noise (σ ≈ 0.0618) and we set d = k = 120.

Remark: When we simulate the measurement data, it is important to avoid the inverse crime. One way to do this is to generate the measurement data using a denser grid and then interpolate the forward solution onto a coarser computational grid, which is actually used to compute the reconstruction.
Since both the prior and the posterior are now Gaussian, we can use the
coloring transformation to draw samples from the prior and posterior.

See the MATLAB script week7.m on the course webpage!


A note on marginal Gaussian distributions

Let

π(x) ∝ exp(−½ (x − µ)^T Γ^{−1} (x − µ))

be a multivariate Gaussian PDF with mean µ and positive definite and symmetric covariance matrix Γ.

Q: What is Γ_ii?
A: σ_i² := Γ_ii is the variance of the marginal distribution with PDF

π(x_i) = ∫_{R^{n−1}} π(x₁, …, x_i, …, x_n) dx₁ ⋯ dx_{i−1} dx_{i+1} ⋯ dx_n,

which is itself a (univariate) Gaussian PDF with mean µ_i. This is why we can obtain the credibility envelopes by taking the square roots of the diagonal values of Γ_post,j.
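In Python/NumPy (with hypothetical numbers), a ±2σ credibility envelope is read off the diagonal of the posterior covariance like this:

```python
import numpy as np

# Hypothetical posterior mean and covariance
mu = np.array([0.0, 1.0])
Gamma = np.array([[4.0, 1.0],
                  [1.0, 9.0]])

# Marginal standard deviations are the square roots of the diagonal entries
std = np.sqrt(np.diag(Gamma))
lower, upper = mu - 2 * std, mu + 2 * std
print(std, lower, upper)
```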
[Figure: six panels of prior samples on [0, 1]; white noise prior with γ = 0.2, 0.6864, 2 (left column) and smoothness prior with γ = 0.001, 0.0064, 0.02 (right column).]

Samples drawn from the white noise prior and the smoothness prior for several values of γ.
[Figure: six panels of posterior samples on [0, 1]; posterior with the white noise prior (γ = 0.2, 0.6864, 2, left column) and with the smoothness prior (γ = 0.001, 0.0064, 0.02, right column), each panel showing the ground truth and the posterior mean with a ±2σ credibility envelope.]

Samples drawn from the posterior corresponding to both the white noise prior and the smoothness prior for several values of γ. We also plot the ground truth solution and the posterior mean.
