Bayesian Inverse Problems Overview

The document outlines a lecture on inverse problems, focusing on Bayes' theorem and its application in estimating signals from noisy observations. It discusses the modeling of prior and likelihood distributions, and presents a case study on one-dimensional deconvolution with examples of posterior distributions and Bayesian estimators. The document also emphasizes the challenges of high-dimensional problems and the importance of credible sets in quantifying uncertainty.
Inverse Problems

Sommersemester 2022

Vesa Kaarnioja
[Link]@[Link]

FU Berlin, FB Mathematik und Informatik

Sixth lecture, May 30, 2022


Public holiday next Monday (June 6)

Monday June 6 (next week) is a public holiday.


→ There will be no lecture or exercise session on Monday June 6.
→ The next lecture and exercise session will be on Monday June 13
(two weeks from today)! The deadline for the sixth exercise sheet will also
be on Monday June 13.
Recap: Bayes’ formula for inverse problems
We are interested in the inverse problem of solving x ∈ Rd from
y = F (x) + η,
where y ∈ Rk is the measurement vector, F : Rd → Rk the forward
mapping, and η ∈ Rk is noise. We model x, y , and η as random variables.
Then we have:
Theorem (Bayes’ theorem)
We assume:
The noise η has the probability density ν on Rk .
The parameter x has the probability density π on Rd .
The random variables x and η are independent.
Then the likelihood is P(y|x) = ν(y − F(x)) and we can write

π^y(x) := P(x|y) = P(y|x)P(x) / P(y) =: ν(y − F(x))π(x) / Z(y),

provided that Z(y) := ∫_{R^d} ν(y − F(x))π(x) dx > 0.
Bayes' formula:

π^y(x) = ν(y − F(x))π(x) / Z(y).

The prior model π(x) describes a priori information. It should assign high probability to objects x which are typical in light of a priori information, and low probability to unexpected x.
The likelihood model P(y|x) = ν(y − F(x)) processes measurement information. It gives low probability to objects that produce simulated data very different from the measured data.
The number Z (y ) can be seen as a normalization constant.
The posterior distribution π y (x) = P(x|y ) represents the updated
knowledge about the parameter of interest x, given the evidence y .
Since the normalization constant Z(y) is often not of interest in our considerations, we frequently write Bayes' formula as

π^y(x) ∝ ν(y − F(x))π(x),

where the symbol ∝ means equality up to a constant factor.


Case study: one-dimensional deconvolution

As motivation†, suppose that we are interested in estimating a signal f : [0, 1] → R from noisy, blurred observations modeled as

y_i = y(s_i) = ∫₀¹ K(s_i, t) f(t) dt + η_i,   i ∈ {1, . . . , k},

where the blurring kernel is

K(s, t) = exp(−(s − t)² / (2ω²)),   ω = 0.5,

and η ∈ R^k is measurement noise.


† We will consider the so-called "linear-Gaussian setting", as well as computational techniques for sampling posterior densities, in more detail in a couple of weeks. Specifically, we will not consider the question of how to draw samples from the posterior density today; we will revisit this question at a later time.
Discrete model
Midpoint rule:

y_i = ∫₀¹ K(s_i, t) f(t) dt + η_i ≈ (1/d) Σ_{j=1}^{d} K(s_i, t_j) x_j + η_i,

where t_j = j/d − 1/(2d) and x_j = f(t_j) for j ∈ {1, . . . , d}.

If we have s_i = i/k − 1/(2k) for i ∈ {1, . . . , k}, then we have the discrete linear model

y = Ax + η,   where A_{i,j} = (1/d) K(s_i, t_j).
To employ the Bayesian approach, we treat y, η, and x as random variables. We assume that η is Gaussian noise with covariance σ²I,

η ∼ N(0, σ²I),   ν(η) ∝ exp(−‖η‖² / (2σ²)).

The likelihood is then given by

P(y|x) = ν(y − Ax) ∝ exp(−‖y − Ax‖² / (2σ²)).

Next, we have to choose a prior distribution for the unknown. Assume that we know that x(0) = x(1) = 0 and that x is quite smooth, that is, the value of x(t) at a point is more or less the same as at its neighbors. We will then model the unknown as

x_j = (1/2)(x_{j−1} + x_{j+1}) + W_j,   j = 1, . . . , d,   (1)

where the term W_j follows a Gaussian distribution N(0, γ²) and we use the convention x₀ = x_{d+1} = 0, reflecting the boundary conditions.
The variance γ² determines how much the reconstructed function x departs from the smoothness model x_j = (1/2)(x_{j−1} + x_{j+1}). We can write (1) as

Lx = W,   where L := (1/2) ·
    [  2  −1                 ]
    [ −1   2  −1             ]
    [      ⋱   ⋱   ⋱         ]
    [         −1   2  −1     ]
    [             −1   2     ]

This leads to the so-called smoothness prior

π(x) ∝ exp(−‖Lx‖² / (2γ²)).

Using Bayes' formula, we get the posterior distribution

π^y(x) ∝ exp(−‖y − Ax‖² / (2σ²) − ‖Lx‖² / (2γ²)).
For the numerical experiment, we simulate measurements using the (smooth) ground truth signal

f(t) = 8t³ − 16t² + 8t,

which satisfies f(0) = f(1) = 0. The measurements are contaminated with 10% relative noise (σ ≈ 0.0618) and we set d = k = 120.
Let us draw samples from the prior and posterior. As comparison, we also consider a posterior obtained using the white noise prior, i.e.,

π₀^y(x) ∝ exp(−‖y − Ax‖² / (2σ²)) π_{pr,0}(x),   π_{pr,0}(x) ∝ exp(−‖x‖² / (2γ²)).
Remark: When we simulate the measurement data, it is important to
avoid the inverse crime. One way to do this is to generate the
measurement data using a denser grid and then interpolate the forward
solution onto a coarser computational grid, which is actually used to
compute the reconstruction. (See week6.m on the course webpage!)
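In this linear-Gaussian setting the posterior is itself Gaussian, with covariance C = (AᵀA/σ² + LᵀL/γ²)⁻¹ and mean m = C Aᵀy/σ², so the posterior mean and samples can be computed directly. The sketch below is our own NumPy reconstruction of the experiment (not the course's week6.m), uses an arbitrary seed, and for brevity commits the inverse crime by simulating data on the same grid:

```python
import numpy as np

rng = np.random.default_rng(0)
d = k = 120
omega, sigma, gamma = 0.5, 0.0618, 0.0032

t = (np.arange(1, d + 1) - 0.5) / d               # quadrature nodes t_j
s = (np.arange(1, k + 1) - 0.5) / k               # observation points s_i
A = np.exp(-(s[:, None] - t[None, :]) ** 2 / (2 * omega ** 2)) / d

# Smoothness prior operator L = (1/2) tridiag(-1, 2, -1)
L = 0.5 * (2 * np.eye(d) - np.eye(d, k=1) - np.eye(d, k=-1))

f = 8 * t ** 3 - 16 * t ** 2 + 8 * t              # ground truth signal
y = A @ f + sigma * rng.standard_normal(k)        # simulated noisy data

# Gaussian posterior: C = (A^T A/sigma^2 + L^T L/gamma^2)^{-1}, m = C A^T y/sigma^2
P = A.T @ A / sigma ** 2 + L.T @ L / gamma ** 2
C = np.linalg.inv(P)
C = 0.5 * (C + C.T)                               # symmetrize against round-off
m = C @ (A.T @ y) / sigma ** 2                    # posterior (= conditional) mean

# Posterior samples via a Cholesky factor of the covariance
samples = m[:, None] + np.linalg.cholesky(C) @ rng.standard_normal((d, 5))
```

The direct inversion of P is affordable here only because d = 120; for large d one would factorize P instead.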
[Figure: six panels of prior samples on [0, 1] — white noise prior with γ = 0.2, 0.6864, 2 (left column) and smoothness prior with γ = 0.0005, 0.0032, 0.01 (right column).]
Samples drawn from the white noise prior and the smoothness prior for several values of γ.
[Figure: six panels of posterior samples on [0, 1] — posterior with white noise prior for γ = 0.2, 0.6864, 2 and with smoothness prior for γ = 0.0005, 0.0032, 0.01; each panel also shows the ground truth and the posterior mean.]
Samples drawn from the posterior corresponding to both the white noise prior and the
smoothness prior for several values of γ. We also plot the ground truth solution and the posterior
mean. The solutions in the middle row roughly satisfy the Morozov discrepancy principle.
As the previous example illustrates, many practical problems tend to be high-dimensional. The measurement model for the discretized deconvolution example is

y = Ax + η,

with A ∈ R^{k×d}, x ∈ R^d, and y, η ∈ R^k, where k is the number of points s₁, . . . , s_k at which we observe the signal and d is the number of quadrature points t₁, . . . , t_d discretizing the unknown quantity x.
A grid with only k = d = 120 points already corresponds to a
120-dimensional posterior, so visualization of the posterior density is highly
nontrivial.
In practice, we are often interested in various point estimates, statistics,
samples, or the spread of the posterior distribution.
Bayesian estimators
The posterior distribution can be used to define estimators for the conditional random variable x|y ∼ π^y(x). In general, an estimator x̂ is any function of the data y. The estimate x̂(y) is itself an R^d-valued random variable whose properties give information about the usefulness and quality of the estimator.
Bayesian estimators are those defined via the posterior distribution π^y. We present the two most prominent ones. The conditional mean (CM) estimator is defined as the mean

x̂_CM(y) = E[x|y] = ∫_{R^d} u π^y(u) du

of the posterior distribution.

The maximum a posteriori (MAP) estimator is defined as the mode

x̂_MAP(y) = arg max_{u ∈ R^d} π^y(u)

of the posterior distribution (if a unique mode exists).


One way to estimate spread is via Bayesian credible sets. A level 1 − α credible set C_α with α ∈ (0, 1) satisfies

P(x ∈ C_α | y) = ∫_{C_α} π^y(u) du = 1 − α.

For small α, it is a region that contains a large fraction of the posterior mass.
Another way of quantifying uncertainty is to consider the problem

y† = F(x†) + η,

where x† is thought of as a deterministic "true" value of the unknown.

We would then like to find random sets C_α that frequently contain the truth x†, that is,

P(x† ∈ C_α) = 1 − α.

Such a set C_α is called a frequentist confidence region of level 1 − α.
Deconvolution example: posteriors with 2σ credibility envelopes.
[Figure: six panels of posterior samples with 2σ credibility envelopes — posterior with white noise prior for γ = 0.2, 0.6864, 2 and with smoothness prior for γ = 0.0005, 0.0032, 0.01; each panel shows the ground truth and the posterior mean ± 2σ.]
Example. Assume that x ∈ R and that the posterior density is given by

π^y(u) = (c/σ₁) φ(u/σ₁) + ((1 − c)/σ₂) φ((u − 1)/σ₂),

where c ∈ (0, 1), σ₁, σ₂ > 0, and φ is the density of the standard normal distribution, φ(u) = (1/√(2π)) exp(−u²/2). In this case,

x̂_CM = 1 − c   and   x̂_MAP = 0 if c/σ₁ > (1 − c)/σ₂, while x̂_MAP = 1 if c/σ₁ < (1 − c)/σ₂.

If c = 1/2 and σ₁, σ₂ are small, the probability that x takes values near x̂_CM is small. On the other hand, if σ₁ = cσ₂, then c/σ₁ = 1/σ₂ > (1 − c)/σ₂, so that x̂_MAP = 0. If c is small, this is, however, a bad estimate for x, since the probability for x to take values near 0 is small. Last of all, we notice that when the conditional mean gives a poor estimate, this is reflected in a larger posterior variance

σ² = ∫_{−∞}^{∞} (u − x̂_CM)² π^y(u) du.

We cannot say that one estimator is better than the other in all applications.

Left: the density with σ₁ = 0.08, σ₂ = 0.04, and c = 1/2. The CM estimate represents the distribution poorly. Notice that when the CM gives a poor estimate, this is reflected in a wider variance (1 standard deviation is depicted as a red line). Right: the density with σ₁ = 0.001, σ₂ = 0.1, and c = 0.01. The MAP gives a poor estimate since it lies in an unlikely part of the computational domain.
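The claims in the example are easy to verify numerically on a grid. The parameter values below are those of the right-hand figure (c = 0.01, σ₁ = 0.001, σ₂ = 0.1); the grid bounds and resolution are our own choices:

```python
import numpy as np

c, s1, s2 = 0.01, 0.001, 0.1                       # right-hand figure parameters
u = np.linspace(-0.5, 1.5, 400001)                 # fine grid resolving the narrow mode
du = u[1] - u[0]

def phi(z):
    """Standard normal density."""
    return np.exp(-z ** 2 / 2) / np.sqrt(2 * np.pi)

post = c / s1 * phi(u / s1) + (1 - c) / s2 * phi((u - 1) / s2)

x_cm = np.sum(u * post) * du                       # conditional mean, exactly 1 - c
x_map = u[np.argmax(post)]                         # mode: 0, since c/s1 = 10 > (1-c)/s2 = 9.9
```

Here x_cm ≈ 0.99 = 1 − c, while x_map ≈ 0 even though only 1% of the posterior mass lies near 0 — precisely the situation criticized in the text.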
The maximum likelihood estimate

x̂_ML(y) = arg max_{u ∈ R^d} P(y|u)

answers the question: "which value of the unknown is most likely to produce the measured data?"

The ML estimate is a non-Bayesian estimate and, in the case of ill-posed inverse problems, often not useful. It is analogous to solving a classical inverse problem without regularization.
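For the deconvolution example this failure is easy to observe: the matrix A is severely ill-conditioned, so the unregularized least-squares ("ML") reconstruction amplifies the noise enormously. A small sketch of our own, with an arbitrary seed and again ignoring the inverse crime:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 120
t = (np.arange(1, d + 1) - 0.5) / d
A = np.exp(-(t[:, None] - t[None, :]) ** 2 / (2 * 0.5 ** 2)) / d   # blur matrix, omega = 0.5
f = 8 * t ** 3 - 16 * t ** 2 + 8 * t                               # ground truth
y = A @ f + 0.0618 * rng.standard_normal(d)                        # 10% relative noise

# Unregularized least-squares reconstruction (the ML estimate for Gaussian noise)
x_ml = np.linalg.lstsq(A, y, rcond=None)[0]
```

np.linalg.cond(A) is astronomically large here, and ‖x_ml‖ exceeds ‖f‖ by orders of magnitude: the data are fitted, but the reconstruction is useless without the regularizing effect of a prior.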
Well-posedness
Assume that the posterior density is given by

π^y(x) = (1/Z) g(x) π(x)

with likelihood g(x) and prior density π(x). Now consider an approximation

π_δ^y(x) = (1/Z_δ) g_δ(x) π(x)

resulting from an approximated likelihood g_δ(x). Such an approximation can result, for example, from an approximation F_δ of the forward operator F or from perturbed data y_δ.

The question is therefore:

does |g − g_δ| = O(δ) imply d(π^y, π_δ^y) = O(δ)

for small enough δ > 0 and some metric d(·, ·) on probability densities?
Well-posedness refers to the continuity of the method of obtaining the posterior distribution with respect to perturbations in the parameters. In practice, this could mean, for example, the following: if we have two measurements close to each other, are the corresponding posterior distributions close in some metric? Recall that ill-posed problems are generally discontinuous in this regard, i.e., without regularization, small differences in measurements can induce arbitrarily large differences in the reconstruction. Does the Bayesian approach then regularize the problem? The answer is yes, under certain assumptions on the modeling.
We will proceed to show that, under certain conditions, π^y and π_δ^y satisfy

d(π^y, π_δ^y) ≤ cδ

for δ small enough, some c > 0, and some metric d(·, ·) on probability densities.
To this end, we define two metrics for probability densities: the total
variation distance and the Hellinger distance.
Metrics for probability densities

We introduce the total variation distance and the Hellinger distance, both
of which have been used to show well-posedness results. Here, we will use
the Hellinger distance to establish the well-posedness of Bayesian inverse
problems.
Let π and π′ be the probability densities of two random variables with values in R^d. We define the total variation distance between π and π′ as

d_TV(π, π′) = (1/2) ∫_{R^d} |π(x) − π′(x)| dx = (1/2) ‖π − π′‖_{L¹},

and the Hellinger distance between π and π′ as

d_H(π, π′) = ((1/2) ∫_{R^d} (√π(x) − √π′(x))² dx)^{1/2} = (1/√2) ‖√π − √π′‖_{L²}.

The normalization constants are chosen in such a way that the largest
possible distance between two densities is one, as can be seen in the
following lemma.
Lemma
For any two probability densities π and π′,

0 ≤ d_TV(π, π′) ≤ 1   and   0 ≤ d_H(π, π′) ≤ 1.

Proof. The lower bounds follow immediately from the definitions of d_TV and d_H. It remains to prove the upper bounds. To this end, we estimate

d_TV(π, π′) = (1/2) ∫_{R^d} |π(x) − π′(x)| dx ≤ (1/2) ∫_{R^d} π(x) dx + (1/2) ∫_{R^d} π′(x) dx = 1

and

d_H(π, π′)² = (1/2) ∫_{R^d} (√π(x) − √π′(x))² dx
= (1/2) ∫_{R^d} (π(x) + π′(x) − 2√(π(x)π′(x))) dx
≤ (1/2) ∫_{R^d} (π(x) + π′(x)) dx = 1. □
In what follows, we will establish bounds between Hellinger and total
variation distance and show how both distances can be used to bound the
difference of expected values with respect to two different densities; these
results will be useful in subsequent lectures.
Lemma
For any two probability densities π and π′, the total variation and Hellinger distances are related by the inequalities

(1/√2) d_TV(π, π′) ≤ d_H(π, π′) ≤ √(d_TV(π, π′)).

Proof. Using the Cauchy–Schwarz inequality and (a + b)² ≤ 2a² + 2b², we obtain

d_TV(π, π′) = (1/2) ∫_{R^d} |√π(x) − √π′(x)| · |√π(x) + √π′(x)| dx
≤ ((1/2) ∫_{R^d} (√π(x) − √π′(x))² dx)^{1/2} ((1/2) ∫_{R^d} (√π(x) + √π′(x))² dx)^{1/2}
≤ d_H(π, π′) ((1/2) ∫_{R^d} (2π(x) + 2π′(x)) dx)^{1/2} = √2 d_H(π, π′).

Notice that |√π(x) − √π′(x)| ≤ √π(x) + √π′(x), since √π(x), √π′(x) ≥ 0. Thus we have

d_H(π, π′)² = (1/2) ∫_{R^d} (√π(x) − √π′(x))² dx
≤ (1/2) ∫_{R^d} |√π(x) − √π′(x)| · (√π(x) + √π′(x)) dx
= (1/2) ∫_{R^d} |π(x) − π′(x)| dx = d_TV(π, π′). □
The following lemmata show that if two densities are close in total
variation or Hellinger distance, expectations computed with respect to
both densities are also close.
Lemma
Let f be a real-valued function on R^d such that E^π[f²] + E^{π′}[f²] =: f₂² < ∞. Then

|E^π[f] − E^{π′}[f]| ≤ 2 f₂ d_H(π, π′).   (2)

Proof. We estimate

|E^π[f] − E^{π′}[f]| = |∫_{R^d} f(x) (π(x) − π′(x)) dx|
= |∫_{R^d} f(x) (√π(x) − √π′(x)) (√π(x) + √π′(x)) dx|
≤ ((1/2) ∫_{R^d} (√π(x) − √π′(x))² dx)^{1/2} (2 ∫_{R^d} |f(x)|² (√π(x) + √π′(x))² dx)^{1/2}
≤ d_H(π, π′) (4 ∫_{R^d} |f(x)|² (π(x) + π′(x)) dx)^{1/2} = 2 f₂ d_H(π, π′). □
Lemma
Let f be a real-valued function on R^d such that sup_{x∈R^d} |f(x)| =: ‖f‖_∞ < ∞. Then

|E^π[f] − E^{π′}[f]| ≤ 2 ‖f‖_∞ d_TV(π, π′).

Moreover, the following variational characterization of the total variation distance holds:

d_TV(π, π′) = (1/2) sup_{‖f‖_∞ ≤ 1} |E^π[f] − E^{π′}[f]|.

Remark: Note that the result for the Hellinger distance only assumes that
f is square integrable with respect to π and π 0 , whereas the result for the
total variation distance requires that f is bounded.
Proof. For the first part of the lemma, note that

|E^π[f] − E^{π′}[f]| = |∫_{R^d} f(x) (π(x) − π′(x)) dx|
≤ 2‖f‖_∞ · (1/2) ∫_{R^d} |π(x) − π′(x)| dx = 2‖f‖_∞ d_TV(π, π′).

This in particular shows that, for any f with ‖f‖_∞ = 1,

d_TV(π, π′) ≥ (1/2) |E^π[f] − E^{π′}[f]|.

Our goal now is to exhibit a choice of f with ‖f‖_∞ = 1 that achieves equality. Define f(x) := sign(π(x) − π′(x)), so that f(x)(π(x) − π′(x)) = |π(x) − π′(x)|. Then ‖f‖_∞ = 1 and

d_TV(π, π′) = (1/2) ∫_{R^d} |π(x) − π′(x)| dx = (1/2) ∫_{R^d} f(x)(π(x) − π′(x)) dx = (1/2) (E^π[f] − E^{π′}[f]).

This completes the proof of the variational characterization. □
Approximation theorem

We denote by

g(x) = ν(y − F(x))   and   g_δ(x) = ν(y − F_δ(x))

the likelihoods associated with F and F_δ, so that

π^y(x) = (1/Z) g(x) π(x)   and   π_δ^y(x) = (1/Z_δ) g_δ(x) π(x)

with corresponding normalising constants Z, Z_δ > 0. We make the following assumptions on g and g_δ.

Assumption 1. There exist δ⁺ > 0, constants K₁, K₂ > 0, and a function ϕ : R^d → R such that E^π[ϕ²] ≤ K₁ and, for all δ ∈ (0, δ⁺),

1. |√g(x) − √(g_δ(x))| ≤ ϕ(x) δ for all x ∈ R^d,
2. √g(x) + √(g_δ(x)) ≤ K₂ for all x ∈ R^d.
Lemma
Under Assumption 1 there exist δ̃⁺ > 0 and c₁, c₂ ∈ (0, ∞) such that

|Z − Z_δ| ≤ c₁δ   and   Z, Z_δ > c₂   for δ ∈ (0, δ̃⁺).

Proof. Since Z = ∫_{R^d} g(x)π(x) dx and Z_δ = ∫_{R^d} g_δ(x)π(x) dx, we have

|Z − Z_δ| = |∫_{R^d} (g(x) − g_δ(x)) π(x) dx|
≤ (∫_{R^d} (√g(x) − √(g_δ(x)))² π(x) dx)^{1/2} (∫_{R^d} (√g(x) + √(g_δ(x)))² π(x) dx)^{1/2}
≤ δ (∫_{R^d} ϕ(x)² π(x) dx)^{1/2} (∫_{R^d} K₂² π(x) dx)^{1/2}
≤ √K₁ K₂ δ,   δ ∈ (0, δ⁺).

And when δ ≤ δ̃⁺ := min{ Z/(2√K₁ K₂), δ⁺ }, we have

Z_δ ≥ Z − |Z − Z_δ| ≥ (1/2) Z.

The lemma follows by taking c₁ = √K₁ K₂ and c₂ = (1/2) Z. □
Theorem (Well-posedness)

Under Assumption 1, there exist δ̃⁺ > 0 and c > 0 such that

d_H(π^y, π_δ^y) ≤ cδ   for all δ ∈ (0, δ̃⁺).

Proof. We break the distance into two error parts, one caused by the difference between Z and Z_δ, the other caused by the difference between g and g_δ:

d_H(π^y, π_δ^y) = (1/√2) ‖√(π^y) − √(π_δ^y)‖_{L²}
= (1/√2) ‖√(gπ/Z) − √(gπ/Z_δ) + √(gπ/Z_δ) − √(g_δπ/Z_δ)‖_{L²}
≤ (1/√2) ‖√(gπ/Z) − √(gπ/Z_δ)‖_{L²} + (1/√2) ‖√(gπ/Z_δ) − √(g_δπ/Z_δ)‖_{L²}.

Using the previous lemma, for δ ∈ (0, δ̃⁺), we have for the first term

‖√(gπ/Z) − √(gπ/Z_δ)‖_{L²} = |1/√Z − 1/√(Z_δ)| (∫_{R^d} g(x)π(x) dx)^{1/2}
= |1 − √(Z/Z_δ)| = |√(Z_δ) − √Z| / √(Z_δ) = |Z − Z_δ| / (√(Z_δ)(√Z + √(Z_δ))) ≤ (c₁/(2c₂)) δ,

where the underbraced integral equals Z. For the second term, we obtain

‖√(gπ/Z_δ) − √(g_δπ/Z_δ)‖_{L²} = (1/√(Z_δ)) (∫_{R^d} (√g(x) − √(g_δ(x)))² π(x) dx)^{1/2} ≤ √(K₁/c₂) δ.

Therefore

d_H(π^y, π_δ^y) ≤ (1/√2) (c₁/(2c₂)) δ + (1/√2) √(K₁/c₂) δ = cδ,

with c = (1/√2) c₁/(2c₂) + (1/√2) √(K₁/c₂) independent of δ. □
Notice that, together with (2), i.e., the inequality

|E^π[f] − E^{π′}[f]| ≤ 2 f₂ d_H(π, π′),   f₂² := E^π[f²] + E^{π′}[f²],

this theorem guarantees that expectations computed with respect to π^y and π_δ^y are of order δ apart.
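The O(δ) rate can also be observed numerically in a toy one-dimensional problem. The forward map F(x) = x, the perturbation F_δ(x) = x + δ, and all parameter values below are our own illustrative assumptions, not part of the lecture:

```python
import numpy as np

x = np.linspace(-6, 6, 120001)
dx = x[1] - x[0]
prior = np.exp(-x ** 2 / 2)                        # standard normal prior (unnormalized)
y_obs, sig = 0.3, 0.5                              # datum and noise level (arbitrary)

def posterior(delta):
    """Normalized posterior density for the perturbed forward map F_delta(x) = x + delta."""
    g = np.exp(-(y_obs - (x + delta)) ** 2 / (2 * sig ** 2))   # perturbed likelihood
    dens = g * prior
    return dens / (np.sum(dens) * dx)              # normalize on the grid

p0 = posterior(0.0)
deltas = (0.02, 0.01, 0.005)
dists = [np.sqrt(0.5 * np.sum((np.sqrt(p0) - np.sqrt(posterior(dl))) ** 2) * dx)
         for dl in deltas]
```

Halving δ halves the Hellinger distance, so dists[0]/dists[2] ≈ 4, consistent with d_H(π^y, π_δ^y) = O(δ).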
