Inverse Problems: Gaussian Sampling Techniques
Sommersemester 2022
Vesa Kaarnioja
Noting that
\[ \frac{1}{\det C} = \det(C^{-1}) = \det(R^T)\det(R) = (\det R)^2, \]
we obtain
\[ \pi_w(w) = \frac{1}{(2\pi)^{n/2}} \exp\Big(-\frac{1}{2}\|w\|^2\Big), \]
i.e., \( w \sim N(0, I) \).
Coloring transformation
Let \( x \sim N(x_0, C) \) (\(C\) positive definite) and \( w \sim N(0, I) \), with \( C^{-1} = R^T R \). Then it holds that
\[ w = R(x - x_0) \iff x = R^{-1} w + x_0. \]
This is the basis of the coloring transformation:
\[ x = R^{-1} w + x_0. \]
Remark: MATLAB also has the function mvnrnd which can be used to
draw from multivariate normal distributions. However, if the Cholesky
factor R (or its inverse) has been precomputed, it may be slightly more
efficient to apply the coloring transformation, e.g., as
x = R \ randn(n,1) + x0;
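The slides use MATLAB; the same coloring transformation can be sketched in Python/NumPy. The covariance C and mean x0 below are illustrative choices, not from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative positive definite covariance C and mean x0 (hypothetical data).
C = np.array([[2.0, 0.5],
              [0.5, 1.0]])
x0 = np.array([1.0, -1.0])

# Factor C^{-1} = R^T R. With the Cholesky factorization C = L L^T,
# the choice R = L^{-1} works: C^{-1} = L^{-T} L^{-1} = (L^{-1})^T (L^{-1}).
L = np.linalg.cholesky(C)

# Coloring transformation: x = R^{-1} w + x0 = L w + x0 with w ~ N(0, I).
n_samples = 200_000
w = rng.standard_normal((2, n_samples))
x = L @ w + x0[:, None]

# The sample mean and sample covariance should approach x0 and C.
assert np.allclose(x.mean(axis=1), x0, atol=0.05)
assert np.allclose(np.cov(x), C, atol=0.1)
```

Note that computing R explicitly is unnecessary: applying the Cholesky factor L directly plays the role of R⁻¹.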
Sampling from general univariate distributions
If a closed-form expression for the inverse CDF is not available, then a computationally attractive formula for obtaining the value \( \Phi^{-1}(t) \) at a point \( t \in (0, 1) \) is based on the identity
\[ \Phi^{-1}(t) = \inf\{x \in \mathbb{R} : \Phi(x) \ge t\}. \]
Remark: The above formula is the expression for the generalized inverse CDF: the formula with the infimum is valid even in the general case of weakly monotonic and right-continuous CDFs.
“Draw t ∼ U(0, 1) and find the smallest value of x such that Φ(x) ≥ t.”
Remarks:
The inverse transform sampling method can be used to sample univariate densities π(u). Moreover, if the components of a multivariate density are mutually independent, i.e., π(u1 , . . . , un ) = π(u1 ) · · · π(un ) holds a.e., then inverse transform sampling can be used to generate samples componentwise.
Unfortunately, the components of multivariate posterior distributions
are generally not mutually independent. In the next two weeks, we
will discuss importance sampling and MCMC methods for sampling
high-dimensional (posterior) distributions. These methods are
applicable even when the components of multivariate distributions are
not mutually independent.
Example
Suppose that we have the PDF \( \pi(x) := (6x - 6x^2)\chi_{(0,1)}(x) \). We can design the following simple scheme based on inverse transform sampling to draw samples from this distribution.
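One such scheme, sketched here in Python/NumPy rather than the MATLAB used elsewhere in the slides: the CDF of this density is Φ(x) = 3x² − 2x³ on (0, 1), and since no convenient closed-form inverse is available, we apply the generalized inverse via bisection (the iteration count and sample size below are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)

def cdf(x):
    # CDF of pi(x) = 6x - 6x^2 on (0, 1): the integral is 3x^2 - 2x^3.
    return 3 * x**2 - 2 * x**3

def inverse_cdf(t, iters=60):
    # Generalized inverse CDF via bisection on [0, 1], where Phi is increasing:
    # return the smallest x with Phi(x) >= t.
    lo, hi = np.zeros_like(t), np.ones_like(t)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        below = cdf(mid) < t
        lo = np.where(below, mid, lo)
        hi = np.where(below, hi, mid)
    return 0.5 * (lo + hi)

# Inverse transform sampling: draw t ~ U(0, 1) and map through Phi^{-1}.
t = rng.uniform(size=100_000)
samples = inverse_cdf(t)

# Sanity check: this density (6x(1 - x), a Beta(2,2) density) has mean 1/2.
assert abs(samples.mean() - 0.5) < 0.01
```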
The prior density should reflect our beliefs on the unknown variable of
interest before taking the measurements into account.
Gaussian densities
\[ \pi(x) = \frac{1}{(2\pi)^{d/2}\sqrt{\det C}} \exp\Big(-\frac{1}{2}\|x - m\|_{C^{-1}}^2\Big) \]
are the most commonly used prior distributions in statistical inverse problems. They are easy to construct and form a versatile class of distributions. They also often lead to explicit estimators.
Since we consider images, we add a positivity constraint to our prior. For the \( \ell_1 \) prior, we set
\[ \pi(x) = \alpha^d \chi_+(x) \exp(-\alpha\|x\|_1), \qquad \alpha > 0, \]
where \( \chi_+(x) = 1 \) if \( x_j > 0 \) for all \( j \) and \( \chi_+(x) = 0 \) otherwise. The components \( x_j \) are independent and each have the cumulative distribution function
\[ \Phi(t) = \alpha \int_0^t e^{-\alpha s}\, ds = 1 - e^{-\alpha t} \quad \text{for all } t \ge 0. \]
Now, we can draw samples of \( x_j \) using
\[ x_j = \Phi^{-1}(u_j) = -\frac{1}{\alpha}\ln(1 - u_j), \]
where the uj are independent random draws from the uniform distribution
U(0, 1).
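A minimal Python/NumPy sketch of this componentwise inverse transform (the rate α and dimension below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
alpha = 2.0       # rate parameter of the l1 (exponential) prior; illustrative
d = 500_000       # number of independent components; illustrative

# Componentwise inverse transform sampling: x_j = -ln(1 - u_j) / alpha,
# with u_j independent draws from U(0, 1).
u = rng.uniform(size=d)
x = -np.log(1.0 - u) / alpha

# The exponential distribution with rate alpha has mean 1/alpha,
# and the positivity constraint holds by construction.
assert abs(x.mean() - 1.0 / alpha) < 0.01
assert np.all(x >= 0)
```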
Similarly, the components xj of the Cauchy prior with positivity constraint
are independent and have the CDF
\[ \Phi(t) = \frac{2\alpha}{\pi} \int_0^t \frac{1}{1 + \alpha^2 s^2}\, ds = \frac{2}{\pi}\arctan(\alpha t), \]
so that the inverse cumulative distribution function is \( \Phi^{-1}(t) = \frac{1}{\alpha}\tan\frac{\pi t}{2} \).
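The Cauchy case admits the same treatment; a Python/NumPy sketch (with an illustrative α), checking the empirical CDF of the draws against Φ:

```python
import numpy as np

rng = np.random.default_rng(3)
alpha = 1.5  # illustrative scale parameter

# Inverse transform for the Cauchy prior with positivity constraint:
# Phi^{-1}(t) = tan(pi * t / 2) / alpha.
u = rng.uniform(size=200_000)
x = np.tan(0.5 * np.pi * u) / alpha

# Check against the CDF Phi(t) = (2/pi) * arctan(alpha * t):
# the empirical fraction of draws below t0 should be close to Phi(t0).
t0 = 1.0
empirical = np.mean(x <= t0)
assert abs(empirical - (2 / np.pi) * np.arctan(alpha * t0)) < 0.01
```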
Random draws from the white noise prior with positivity constraint, the
impulse (`1 ) prior, and the Cauchy prior:
In particular, the uj are independent of each other, so that they can each be drawn from a one-dimensional Cauchy density. Also note that
x = (x1 , . . . , xd )T ∈ Rd satisfies x = Lu, where L ∈ Rd×d is a lower
triangular matrix with Lij = 1 for i ≥ j.† Generalizing the idea behind the
above prior leads, e.g., to total variation priors.
† Note that in MATLAB, it is more efficient to implement this as x = cumsum(u).
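The equivalence between applying L and taking the cumulative sum (MATLAB's cumsum) can be checked directly in Python/NumPy:

```python
import numpy as np

rng = np.random.default_rng(4)
d = 64

# Lower triangular L with L_ij = 1 for i >= j.
L = np.tril(np.ones((d, d)))

u = rng.standard_normal(d)

# x = L u is exactly the cumulative sum of u, so the O(d) cumsum
# replaces the O(d^2) matrix-vector product.
assert np.allclose(L @ u, np.cumsum(u))
```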
Hierarchical models
Suppose that the prior depends on a parameter α, i.e., the prior is the conditional distribution P(x|α). We model the unknown α with a hyperprior P(α) = πh (α) and write the joint distribution of x and α as
\[ P(x, \alpha) = P(x \mid \alpha)\, \pi_h(\alpha). \]
In this chapter we study the linear Gaussian setting, where the forward
map F is linear and both the prior distribution and the distribution of the
observational noise η are Gaussian.
For several reasons, it plays a central role in the study of inverse problems.
It arises frequently in applications, either directly or in the form of
posterior distributions that are asymptotically Gaussian in the large data
limit. It also allows computing explicit solutions which can be used to gain
a general understanding. Apart from that, many methods employed in a
nonlinear or non-Gaussian setting build on ideas from the linear Gaussian
case by performing linearization or Gaussian approximation.
Let us suppose that the unknown x ∈ Rd and the data y ∈ Rk follow the
relation
y = Ax + η, (1)
where
1 The forward model is linear, i.e., A ∈ Rk×d .
2 The prior is Gaussian: x ∼ N (x0 , Γpr ) with Γpr symmetric and positive definite.
3 The noise is Gaussian: η ∼ N (η0 , Γn ) with Γn symmetric and positive definite.
4 The unknown x and the noise η are mutually independent.
Theorem
Under assumptions 1–4, the posterior distribution corresponding to (1) is
Gaussian with x|y ∼ N (µpost , Γpost ), where we have the posterior mean
\[ \mu_{\mathrm{post}} = (\Gamma_{\mathrm{pr}}^{-1} + A^T \Gamma_n^{-1} A)^{-1}\big(A^T \Gamma_n^{-1}(y - \eta_0) + \Gamma_{\mathrm{pr}}^{-1} x_0\big) \]
and covariance
\[ \Gamma_{\mathrm{post}} = (\Gamma_{\mathrm{pr}}^{-1} + A^T \Gamma_n^{-1} A)^{-1}. \]
Proof. Noting that \( \Gamma_{\mathrm{post}} = (\Gamma_{\mathrm{pr}}^{-1} + A^T\Gamma_n^{-1}A)^{-1} \) and \( \mu_{\mathrm{post}} = \Gamma_{\mathrm{post}}\big(A^T\Gamma_n^{-1}(y - \eta_0) + \Gamma_{\mathrm{pr}}^{-1}x_0\big) \), we obtain
\[
\begin{aligned}
\pi^y(x) &\propto \exp\Big(-\tfrac12 (y - Ax - \eta_0)^T \Gamma_n^{-1} (y - Ax - \eta_0)\Big)\exp\Big(-\tfrac12 (x - x_0)^T \Gamma_{\mathrm{pr}}^{-1}(x - x_0)\Big) \\
&= \exp\Big(-\tfrac12\big[\, y^T\Gamma_n^{-1}y - y^T\Gamma_n^{-1}Ax - y^T\Gamma_n^{-1}\eta_0 \\
&\qquad\qquad - x^TA^T\Gamma_n^{-1}y + x^TA^T\Gamma_n^{-1}Ax + x^TA^T\Gamma_n^{-1}\eta_0 \\
&\qquad\qquad - \eta_0^T\Gamma_n^{-1}y + \eta_0^T\Gamma_n^{-1}Ax + \eta_0^T\Gamma_n^{-1}\eta_0 \\
&\qquad\qquad + x^T\Gamma_{\mathrm{pr}}^{-1}x - 2x^T\Gamma_{\mathrm{pr}}^{-1}x_0 + x_0^T\Gamma_{\mathrm{pr}}^{-1}x_0 \,\big]\Big) \\
&\propto \exp\Big(-\tfrac12\big[\, x^T\underbrace{(\Gamma_{\mathrm{pr}}^{-1} + A^T\Gamma_n^{-1}A)}_{=\Gamma_{\mathrm{post}}^{-1}}x - 2x^T\underbrace{\big(A^T\Gamma_n^{-1}(y-\eta_0) + \Gamma_{\mathrm{pr}}^{-1}x_0\big)}_{=\Gamma_{\mathrm{post}}^{-1}\mu_{\mathrm{post}}}\,\big]\Big).
\end{aligned}
\]
On the previous slide, we arrived at
\[ \pi^y(x) \propto \exp\Big(-\frac12\big(x^T\Gamma_{\mathrm{post}}^{-1}x - 2x^T\Gamma_{\mathrm{post}}^{-1}\mu_{\mathrm{post}}\big)\Big). \]
To finish the proof, we “complete the square” by multiplying and dividing by \( \exp(-\frac12 \mu_{\mathrm{post}}^T \Gamma_{\mathrm{post}}^{-1}\mu_{\mathrm{post}}) \). Since this term does not depend on x, we can absorb the denominator into the implied coefficient to obtain
\[
\begin{aligned}
\pi^y(x) &\propto \exp\Big(-\frac12\big(x^T\Gamma_{\mathrm{post}}^{-1}x - 2x^T\Gamma_{\mathrm{post}}^{-1}\mu_{\mathrm{post}}\big)\Big)\exp\Big(-\frac12 \mu_{\mathrm{post}}^T\Gamma_{\mathrm{post}}^{-1}\mu_{\mathrm{post}}\Big) \\
&= \exp\Big(-\frac12\big(x^T\Gamma_{\mathrm{post}}^{-1}x - 2x^T\Gamma_{\mathrm{post}}^{-1}\mu_{\mathrm{post}} + \mu_{\mathrm{post}}^T\Gamma_{\mathrm{post}}^{-1}\mu_{\mathrm{post}}\big)\Big) \\
&= \exp\Big(-\frac12 (x - \mu_{\mathrm{post}})^T\Gamma_{\mathrm{post}}^{-1}(x - \mu_{\mathrm{post}})\Big),
\end{aligned}
\]
as desired.
Remark: The previous proof shows that if x ∼ N (x0 , Γpr ) and
η ∼ N (η0 , Γn ), then
x|y ∼ N (µpost , Γpost ),
where
\[ \Gamma_{\mathrm{post}} = (\Gamma_{\mathrm{pr}}^{-1} + A^T \Gamma_n^{-1} A)^{-1} \tag{2} \]
and
\[ \mu_{\mathrm{post}} = \Gamma_{\mathrm{post}}\big(A^T\Gamma_n^{-1}(y - \eta_0) + \Gamma_{\mathrm{pr}}^{-1}x_0\big). \tag{3} \]
One also has the following alternative representations for the posterior mean
\[ \mu_{\mathrm{post}} = x_0 + \Gamma_{\mathrm{pr}} A^T (A\Gamma_{\mathrm{pr}}A^T + \Gamma_n)^{-1}(y - Ax_0 - \eta_0) \tag{4} \]
and the posterior covariance
\[ \Gamma_{\mathrm{post}} = \Gamma_{\mathrm{pr}} - \Gamma_{\mathrm{pr}} A^T (A\Gamma_{\mathrm{pr}}A^T + \Gamma_n)^{-1} A\Gamma_{\mathrm{pr}}. \tag{5} \]
Formula (5) can be proved, e.g., by using the
Sherman–Morrison–Woodbury formula on (2). Formula (4) can be proved
by plugging the formula (5) into (3) and simplifying the expression
(homework).
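The agreement of (2)–(3) with the alternative representations (4)–(5) can be verified numerically. A Python/NumPy sketch on randomly generated illustrative data (all matrices below are hypothetical, chosen only to be well-conditioned):

```python
import numpy as np

rng = np.random.default_rng(5)
k, d = 6, 4

# Illustrative problem data (hypothetical, not from the slides).
A = rng.standard_normal((k, d))
y = rng.standard_normal(k)
x0 = rng.standard_normal(d)
eta0 = rng.standard_normal(k)

def random_spd(n):
    # Symmetric positive definite matrix, shifted to be well-conditioned.
    M = rng.standard_normal((n, n))
    return M @ M.T + n * np.eye(n)

G_pr, G_n = random_spd(d), random_spd(k)

# Posterior covariance and mean via formulas (2)-(3).
G_post = np.linalg.inv(np.linalg.inv(G_pr) + A.T @ np.linalg.inv(G_n) @ A)
mu_post = G_post @ (A.T @ np.linalg.inv(G_n) @ (y - eta0)
                    + np.linalg.inv(G_pr) @ x0)

# Alternative representations (4)-(5), via S = A G_pr A^T + G_n.
S = A @ G_pr @ A.T + G_n
mu_alt = x0 + G_pr @ A.T @ np.linalg.solve(S, y - A @ x0 - eta0)
G_alt = G_pr - G_pr @ A.T @ np.linalg.solve(S, A @ G_pr)

assert np.allclose(mu_post, mu_alt)
assert np.allclose(G_post, G_alt)
```

Note that (4)–(5) only require inverting the k×k matrix S, which is cheaper than (2)–(3) whenever k is much smaller than d.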
As the posterior distribution is Gaussian, its mean and its mode coincide.
This means that the conditional mean estimator and the MAP estimator
coincide in the linear Gaussian setting.
Corollary
The conditional mean estimator and the maximum a posteriori estimator
coincide in the linear Gaussian setting and are given by
x̂CM = x̂MAP = µpost .
Example
Let \( \Gamma_n = \gamma^2 I \), \( \eta_0 = 0 \), \( \Gamma_{\mathrm{pr}} = \sigma^2 I \), \( x_0 = 0 \), and set \( \lambda = \frac{\gamma^2}{\sigma^2} \). Then \( \mu_{\mathrm{post}} \) minimizes
\[ J_\lambda(x) := \|y - Ax\|^2 + \lambda\|x\|^2 \]
and therefore satisfies the normal equations
\[ (A^T A + \lambda I)\,\mu_{\mathrm{post}} = A^T y. \]
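This correspondence between the posterior mean and the Tikhonov-regularized solution can be checked numerically; a Python/NumPy sketch on illustrative random data:

```python
import numpy as np

rng = np.random.default_rng(6)
k, d = 8, 5

# Hypothetical forward map and data, for illustration only.
A = rng.standard_normal((k, d))
y = rng.standard_normal(k)

gamma, sigma = 0.5, 2.0
lam = gamma**2 / sigma**2

# Posterior mean from (2)-(3) with Gamma_n = gamma^2 I, Gamma_pr = sigma^2 I,
# eta0 = 0, and x0 = 0.
G_post = np.linalg.inv(np.eye(d) / sigma**2 + A.T @ A / gamma**2)
mu_post = G_post @ (A.T @ y / gamma**2)

# Minimizer of J_lambda(x) = ||y - Ax||^2 + lambda ||x||^2, i.e., the
# solution of the normal equations (A^T A + lambda I) x = A^T y.
x_tik = np.linalg.solve(A.T @ A + lam * np.eye(d), A.T @ y)

assert np.allclose(mu_post, x_tik)
```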
Suppose that we set a Gaussian prior for the unknown x ∼ N (x0 , Γpr ),
where Γpr is a symmetric, positive definite covariance matrix.
Now the posterior probability density of x given the measurement y is
\[ \pi^y(x) \propto \exp\Big(-\frac12 (x - \mu_{\mathrm{post}})^T \Gamma_{\mathrm{post}}^{-1} (x - \mu_{\mathrm{post}})\Big). \]
Let
π(x) ∝ exp(− 21 (x − µ)T Γ−1 (x − µ))
be a multivariate Gaussian PDF with mean µ and positive definite and
symmetric covariance matrix Γ.
Q: What is Γii ?
A: \( \sigma_i^2 := \Gamma_{ii} \) is the variance of the marginal distribution with PDF
\[ \pi(x_i) = \int_{\mathbb{R}^{n-1}} \pi(x_1, \ldots, x_i, \ldots, x_n)\, dx_1 \cdots dx_{i-1}\, dx_{i+1} \cdots dx_n. \]
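This fact can be illustrated by sampling; a Python/NumPy sketch with a hypothetical 2×2 covariance Γ, checking that the empirical variance of each component matches the corresponding diagonal entry:

```python
import numpy as np

rng = np.random.default_rng(7)

# Illustrative mean and covariance (hypothetical, not from the slides).
Gamma = np.array([[1.5, 0.4],
                  [0.4, 0.8]])
mu = np.array([0.0, 1.0])

# Draw samples via the coloring transformation x = mu + L w, w ~ N(0, I).
L = np.linalg.cholesky(Gamma)
x = mu[:, None] + L @ rng.standard_normal((2, 500_000))

# The marginal variance of component i is Gamma_ii.
assert np.allclose(x.var(axis=1), np.diag(Gamma), rtol=0.05)
```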
[Figure panels: samples from the white noise prior (γ = 0.6864 and γ = 2) and from the smoothness prior (γ = 0.0064 and γ = 0.02).]
Samples drawn from the white noise prior and the smoothness prior for several values of γ.
[Figure panels: posterior samples with the white noise prior (γ = 0.2, 0.6864, 2) and with the smoothness prior (γ = 0.001, 0.0064, 0.02); each panel includes the ground truth and the posterior mean.]
Samples drawn from the posterior corresponding to both the white noise prior and the
smoothness prior for several values of γ. We also plot the ground truth solution and the
posterior mean.