MCMC Sampling Methods Overview

This document provides an introduction to Markov chain Monte Carlo (MCMC) methods. It discusses how MCMC can be used to sample from complicated probability distributions by creating a Markov chain with the desired limiting distribution. Specifically, it covers rejection sampling and importance sampling as classical solutions, and then introduces Markov chains and the key concepts of irreducibility and aperiodicity which ensure a Markov chain converges to a unique limiting distribution. It also discusses the detailed balance property of reversible Markov chains which can be used to determine the limiting distribution. The document emphasizes how MCMC sampling methods like Metropolis-Hastings allow sampling from complex high-dimensional distributions that arise in many applications.

Uploaded by

Paul Wanjoli

Introduction to Machine Learning

CMU-10701
Markov Chain Monte Carlo Methods

Barnabás Póczos & Aarti Singh

Contents
Markov Chain Monte Carlo Methods
• Goal & Motivation
Sampling
• Rejection
• Importance
Markov Chains
• Properties
MCMC sampling
• Hastings-Metropolis
• Gibbs

2
Monte Carlo Methods

3
The importance of MCMC
A recent survey places the Metropolis algorithm among the 10 algorithms that have had the greatest influence on the development and practice of science and engineering in the 20th century (Beichl & Sullivan, 2000).

The Metropolis algorithm is an instance of a large class of sampling algorithms, known as Markov chain Monte Carlo (MCMC).

4
MCMC Applications
MCMC plays a significant role in statistics, econometrics, physics, and computing science.

• Sampling from high-dimensional, complicated distributions
• Bayesian inference and learning
  • Marginalization
  • Normalization
  • Expectation
• Global optimization
5
The Monte Carlo principle
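Our goal is to estimate the following integral:

I(f) = ∫_X f(x) p(x) dx

The idea of Monte Carlo simulation is to draw an i.i.d. set of samples {x^(i)}, i = 1, …, N, from a target density p(x) defined on a high-dimensional space X.

Estimator:

I_N(f) = (1/N) Σ_{i=1}^{N} f(x^(i))

6
The Monte Carlo principle

Theorems
• a.s. consistent: I_N(f) → I(f) almost surely as N → ∞
• Unbiased estimation: E[I_N(f)] = I(f)
• Independent of dimension d! The variance var(I_N(f)) = σ_f²/N, with σ_f² = var_p(f(x)), has no explicit dependence on d.
• Asymptotically normal: √N (I_N(f) − I(f)) converges in distribution to N(0, σ_f²), provided σ_f² < ∞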
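
As a small illustration of the Monte Carlo estimator, the sketch below averages f over i.i.d. draws. The target p = N(0, 1), the test function f(x) = x², and the helper name `monte_carlo_estimate` are assumptions chosen for illustration; the exact value of the integral in this case is 1.

```python
import random

def monte_carlo_estimate(f, sampler, n, rng):
    """Average f over n i.i.d. draws produced by sampler(rng)."""
    return sum(f(sampler(rng)) for _ in range(n)) / n

rng = random.Random(0)  # fixed seed so the run is reproducible

# Estimate E[x^2] for x ~ N(0, 1); the exact value is 1.
estimate = monte_carlo_estimate(
    lambda x: x * x,                # f(x)
    lambda r: r.gauss(0.0, 1.0),    # draws from p(x)
    100_000,
    rng,
)
print(estimate)
```

Here var(f) = 2, so the standard error with 100,000 samples is roughly 0.0045.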

7
The Monte Carlo principle

One “tiny” problem…

Monte Carlo methods need samples from the distribution p(x).

When p(x) has a standard form, e.g. uniform or Gaussian, it is straightforward to sample from it using easily available routines.

However, when this is not the case, we need to introduce more sophisticated sampling techniques. ⇒ MCMC sampling

8
Sampling

Rejection sampling
Importance sampling

9
Main Goal

Sample from a distribution p(x) that is only known up to a proportionality constant.

For example,
p(x) ∝ 0.3 exp(−0.2x²) + 0.7 exp(−0.2(x − 10)²)

10
Rejection Sampling

11
Rejection Sampling Conditions
Suppose that
p(x) is known up to a proportionality constant
p(x) ∝ 0.3 exp(−0.2x²) + 0.7 exp(−0.2(x − 10)²)

It is easy to sample from q(x) that satisfies p(x) ≤ M q(x), M < ∞

M is known

12
Rejection Sampling Algorithm
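
The algorithm figure from this slide did not survive in the text. A minimal sketch of the standard accept/reject scheme for the bimodal example follows: draw x from q and u from U(0, 1), and accept x when u·M·q(x) < p̃(x). The proposal is an assumption made for illustration: an equal-weight mixture of N(0, 2.5) and N(10, 2.5), for which the bound p̃(x) ≤ M q(x) holds with M = 1.4·√(5π), because exp(−0.2x²) = √(5π)·N(x; 0, 2.5).

```python
import math
import random

SIGMA = math.sqrt(2.5)             # exp(-0.2 x^2) = exp(-x^2 / (2 * 2.5))
M = 1.4 * math.sqrt(5 * math.pi)   # bound: p_tilde(x) <= M * q_pdf(x) for all x

def p_tilde(x):
    """Unnormalized bimodal target from the slides."""
    return 0.3 * math.exp(-0.2 * x**2) + 0.7 * math.exp(-0.2 * (x - 10)**2)

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma)**2) / (sigma * math.sqrt(2 * math.pi))

def q_pdf(x):
    """Assumed proposal: equal mixture of N(0, 2.5) and N(10, 2.5)."""
    return 0.5 * normal_pdf(x, 0.0, SIGMA) + 0.5 * normal_pdf(x, 10.0, SIGMA)

def q_sample(rng):
    mu = 0.0 if rng.random() < 0.5 else 10.0
    return rng.gauss(mu, SIGMA)

def rejection_sample(n, rng):
    """Return n accepted samples and the number of proposals that were needed."""
    samples, proposals = [], 0
    while len(samples) < n:
        x = q_sample(rng)
        u = rng.random()
        proposals += 1
        if u * M * q_pdf(x) < p_tilde(x):   # accept with prob p_tilde / (M q)
            samples.append(x)
    return samples, proposals

samples, proposals = rejection_sample(5000, random.Random(0))
print(sum(samples) / len(samples), len(samples) / proposals)
```

With this proposal the expected acceptance rate is 1/1.4 ≈ 0.71, and the sample mean should be close to 0.3·0 + 0.7·10 = 7.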

13
Rejection Sampling
Theorem

The accepted x^(i) can be shown to be distributed according to p(x) (Robert & Casella, 1999, p. 49).

Severe limitations:
• It is not always possible to bound p(x)/q(x) with a reasonable constant M over the whole space X.
• If M is too large, the acceptance probability is too small.
• In high-dimensional spaces it can be exponentially slow to sample points. (The points will usually be rejected.)

14
Importance Sampling

15
Importance Sampling
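Goal: Sample from a distribution p(x) that is only known up to a proportionality constant.

Importance sampling is an alternative “classical” solution that goes back to the 1940s.

Let us introduce, again, an arbitrary importance proposal distribution q(x) such that its support includes the support of p(x).

Then we can rewrite I(f) as follows:

I(f) = ∫ f(x) w(x) q(x) dx, where w(x) = p(x) / q(x) is the importance weight.

16
Importance Sampling

Consequently, drawing i.i.d. samples x^(i) ~ q(x), i = 1, …, N, gives the estimator

Î_N(f) = (1/N) Σ_{i=1}^{N} f(x^(i)) w(x^(i))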

17
Importance Sampling

Theorem
• This estimator is unbiased.
• Under weak assumptions, the strong law of large numbers applies: the importance sampling estimate converges almost surely to I(f) as N → ∞.

Some proposal distributions q(x) will obviously be preferable to others.

Which one should we choose?

18
Importance Sampling
Find one that minimizes the variance of the estimator!

Theorem
The variance is minimal when we adopt the following optimal importance distribution:

q*(x) = |f(x)| p(x) / ∫ |f(x)| p(x) dx

20
Importance Sampling
The optimal proposal is not very useful in the sense that it is not easy to sample from.

High sampling efficiency is achieved when we focus on sampling from p(x) in the important regions where |f(x)| p(x) is relatively large; hence the name importance sampling.

Importance sampling estimates can be super-efficient: for a given function f(x), it is possible to find a distribution q(x) that yields an estimate with a lower variance than when using q(x) = p(x)!

In high dimensions it is not efficient either…
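
The super-efficiency remark can be illustrated with a rare-event example. The sketch below estimates P(X > 3) for X ~ N(0, 1), once by sampling from p directly and once with the weighted estimator (1/N) Σ f(x) w(x), w = p/q. The choice f(x) = 1[x > 3], the shifted proposal q = N(3, 1), and the function names are assumptions made for this illustration.

```python
import math
import random

def tail_prob_plain(n, rng, c=3.0):
    """Plain Monte Carlo: sample from p = N(0, 1) and average the indicator."""
    return sum(1.0 for _ in range(n) if rng.gauss(0.0, 1.0) > c) / n

def tail_prob_importance(n, rng, c=3.0):
    """Importance sampling with the assumed proposal q = N(c, 1)."""
    total = 0.0
    for _ in range(n):
        x = rng.gauss(c, 1.0)          # x ~ q
        if x > c:                      # f(x) = 1[x > c]
            # w(x) = p(x) / q(x) = exp(-x^2/2) / exp(-(x-c)^2/2) = exp(-c*x + c^2/2)
            total += math.exp(-c * x + 0.5 * c * c)
    return total / n

exact = 0.5 * math.erfc(3.0 / math.sqrt(2.0))   # closed form for P(X > 3)

print(exact)
print(tail_prob_plain(20_000, random.Random(0)))
print(tail_prob_importance(20_000, random.Random(0)))
```

Because q places about half of its samples in the region where f is nonzero, the weighted estimate typically lands within a few percent of the exact value, while plain sampling sees only a handful of hits at this sample size.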

21
MCMC sampling - Main ideas
Create a Markov chain, which has the desired limiting distribution!

22
Andrey Markov

Markov Chains

23
Markov Chains
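Markov chain: P(X_{t+1} = x | X_t, X_{t−1}, …, X_1) = P(X_{t+1} = x | X_t)

Homogeneous Markov chain: the transition probabilities P(X_{t+1} = j | X_t = i) do not depend on t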

24
Markov Chains
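Assume that the state space is finite, with s states: {1, …, s}

1-Step state transition matrix: T_ij = P(X_{t+1} = j | X_t = i)

Lemma: The state transition matrix is stochastic: T_ij ≥ 0 and Σ_j T_ij = 1 for every i

t-Step state transition matrix: T^(t)_ij = P(X_{t+n} = j | X_n = i)

Lemma: T^(t) = T^t, the t-th power of T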

25
Markov Chains Example
Markov chain with three states (s = 3)

[Figure: transition matrix and transition graph of the three-state chain]

26
Markov Chains, stationary distribution
Definition:
[stationary distribution, invariant distribution, steady state distribution]
A probability vector π is a stationary distribution of the chain if πT = π.

The stationary distribution might not be unique (e.g. T = identity matrix)

27
Markov Chains, limit distributions
Some Markov chains have a unique limit distribution:

If the probability vector for the initial state is µ(x₁), it follows that the distribution after one step is µ(x₁)T and, after several iterations (multiplications by T), the product µ(x₁)Tᵗ converges to the limit distribution, no matter what the initial distribution µ(x₁) was.

The chain has forgotten its past.
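
The forgetting behaviour can be checked numerically by repeated multiplication by T. The slide's own matrix is not recoverable from the text, so the three-state matrix below is an assumption chosen for illustration (it is irreducible, and the self-loop on the second state makes it aperiodic); the helper names are likewise hypothetical.

```python
def step(mu, T):
    """One step of the chain on distributions: row vector mu times matrix T."""
    n = len(T)
    return [sum(mu[i] * T[i][j] for i in range(n)) for j in range(n)]

def evolve(mu, T, t):
    """Distribution after t steps, mu T^t."""
    for _ in range(t):
        mu = step(mu, T)
    return mu

# Assumed example: row-stochastic, irreducible, aperiodic.
T = [[0.0, 1.0, 0.0],
     [0.0, 0.1, 0.9],
     [0.6, 0.4, 0.0]]

from_state_1 = evolve([1.0, 0.0, 0.0], T, 200)
from_state_3 = evolve([0.0, 0.0, 1.0], T, 200)

print(from_state_1)
print(from_state_3)
```

Both starting points end at the same vector, (27/122, 50/122, 45/122) ≈ (0.221, 0.410, 0.369), which is also a fixed point of `step`.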

28
Markov Chains
Our goal is to find conditions under which the Markov chain
converges to a unique limit distribution (independently from its
starting state distribution)

Observation:
If this limiting distribution exists, it has to be the stationary distribution.

29
Limit Theorem of Markov Chains
Theorem:

If the Markov chain is irreducible and aperiodic, then:

lim_{t→∞} (Tᵗ)_ij = π_j for all states i, j

That is, the chain will converge to the unique stationary distribution.
30
Markov Chains
Definition

Irreducibility:
For each pair of states (i, j), there is a positive probability, starting in state i, that the process will ever enter state j.
= The matrix T cannot be reduced to separate smaller matrices
= The transition graph is (strongly) connected.

It is possible to get to any state from any state.

31
Markov Chains
Definition

Aperiodicity: The chain cannot get trapped in cycles.

Definition
A state i has period k if any return to state i must occur in multiples of k time steps. Formally, the period of a state i is defined as

k = gcd{ n > 0 : P(X_n = i | X_0 = i) > 0 }

(where "gcd" is the greatest common divisor)

For example, suppose it is possible to return to the state in {6, 8, 10, 12, ...} time steps. Then k = 2.
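
The period in this example can be computed directly as a gcd. The sketch below uses only the first few return times, which is enough here because every later element of the set is also even; the function name `period` is a hypothetical helper.

```python
from functools import reduce
from math import gcd

def period(return_times):
    """Period of a state: gcd of the step counts at which a return is possible."""
    return reduce(gcd, return_times)

k = period([6, 8, 10, 12])
print(k)  # 2
```

A state whose possible return times include two coprime numbers, for example 3 and 5, has period 1 and is therefore aperiodic.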
32
Markov Chains
Definition

Aperiodicity: The chain cannot get trapped in cycles.

In other words, a state i is aperiodic if there exists n such that for all n' ≥ n,

P(X_{n'} = i | X_0 = i) > 0

Definition
A Markov chain is aperiodic if every state is aperiodic.

33
Markov Chains
Example for periodic Markov chain:

Let

In this case

If we start the chain from (1,0) or (0,1), then the chain gets trapped in a cycle; it doesn’t forget its past.

It has a stationary distribution, but no limiting distribution!
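
The matrix on this slide is missing from the text. A minimal sketch, assuming the canonical two-state flip chain T = [[0, 1], [1, 0]], shows the behaviour described: started from (1, 0) the distribution alternates forever, yet (0.5, 0.5) is left unchanged by T.

```python
def step(mu, T):
    """One step of the chain on distributions: row vector mu times matrix T."""
    n = len(T)
    return [sum(mu[i] * T[i][j] for i in range(n)) for j in range(n)]

# Assumed example of a periodic chain: always jump to the other state.
T_flip = [[0.0, 1.0],
          [1.0, 0.0]]

mu = [1.0, 0.0]
trajectory = []
for _ in range(4):
    trajectory.append(mu)
    mu = step(mu, T_flip)

pi = [0.5, 0.5]
print(trajectory)        # alternates between [1, 0] and [0, 1]
print(step(pi, T_flip))  # [0.5, 0.5]: stationary
```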

34
Reversible Markov chains
(Detailed Balance Property)
How can we find the limiting distribution of an irreducible and aperiodic Markov chain?

Definition: reversibility / detailed balance condition:

π_i T_ij = π_j T_ji for all states i, j

Theorem:

A sufficient, but not necessary, condition to ensure that a particular π is the desired invariant distribution of the Markov chain is the detailed balance condition. Indeed, summing both sides over i gives Σ_i π_i T_ij = π_j Σ_i T_ji = π_j, that is, πT = π.
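
Both halves of the statement ("sufficient, but not necessary") can be checked on small examples. The two matrices below are assumptions chosen for illustration: `T_rev` satisfies detailed balance with π = (0.25, 0.5, 0.25), while `T_cyc` has a stationary distribution that does not satisfy it.

```python
def satisfies_detailed_balance(pi, T, tol=1e-12):
    """Check pi_i T_ij == pi_j T_ji for every pair of states."""
    n = len(T)
    return all(abs(pi[i] * T[i][j] - pi[j] * T[j][i]) <= tol
               for i in range(n) for j in range(n))

def is_stationary(pi, T, tol=1e-12):
    """Check pi T == pi."""
    n = len(T)
    return all(abs(sum(pi[i] * T[i][j] for i in range(n)) - pi[j]) <= tol
               for j in range(n))

# Assumed reversible example (a lazy walk on three states in a line).
T_rev = [[0.5, 0.5, 0.0],
         [0.25, 0.5, 0.25],
         [0.0, 0.5, 0.5]]
pi_rev = [0.25, 0.5, 0.25]

# Assumed non-reversible example (probability flows around a cycle).
T_cyc = [[0.0, 1.0, 0.0],
         [0.0, 0.1, 0.9],
         [0.6, 0.4, 0.0]]
pi_cyc = [27 / 122, 50 / 122, 45 / 122]

print(satisfies_detailed_balance(pi_rev, T_rev), is_stationary(pi_rev, T_rev))
print(satisfies_detailed_balance(pi_cyc, T_cyc), is_stationary(pi_cyc, T_cyc))
```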
35
How fast can Markov chains forget the past?
MCMC samplers are irreducible and aperiodic Markov chains that have the target distribution as their invariant distribution. One way to guarantee this is to make sure the detailed balance condition is satisfied.

It is also important to design samplers that converge quickly.

36
Spectral properties
Theorem: If πT = π, then π is a left eigenvector of the matrix T with eigenvalue 1.

For an irreducible, aperiodic chain, the Perron-Frobenius theorem from linear algebra tells us that the remaining eigenvalues have absolute value less than 1.

The second largest eigenvalue (in absolute value), therefore, determines the rate of convergence of the chain, and should be as small as possible.
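
The role of the second eigenvalue is easiest to see on a two-state chain, where it is available in closed form. The sketch below assumes T = [[1−a, a], [b, 1−b]] with hypothetical values a = 0.3 and b = 0.1; its eigenvalues are 1 and 1 − a − b, and the distance to the stationary distribution shrinks by exactly that factor at every step.

```python
def step(mu, T):
    """One step of the chain on distributions: row vector mu times matrix T."""
    n = len(T)
    return [sum(mu[i] * T[i][j] for i in range(n)) for j in range(n)]

def total_variation(mu, nu):
    return 0.5 * sum(abs(m - n) for m, n in zip(mu, nu))

a, b = 0.3, 0.1                      # assumed transition probabilities
T = [[1 - a, a],
     [b, 1 - b]]
pi = [b / (a + b), a / (a + b)]      # stationary distribution (0.25, 0.75)
lambda_2 = 1 - a - b                 # second eigenvalue (trace of T minus 1)

mu = [1.0, 0.0]
errors = []
for _ in range(10):
    errors.append(total_variation(mu, pi))
    mu = step(mu, T)

ratios = [errors[t + 1] / errors[t] for t in range(len(errors) - 1)]
print(lambda_2, ratios)
```

A smaller |1 − a − b| means each step removes a larger share of the remaining error, which is the sense in which the second eigenvalue should be as small as possible.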

37
The Hastings-Metropolis Algorithm

38
The Hastings-Metropolis Algorithm

Our goal:

Generate samples from the following discrete distribution:

π_j = b_j / B, j = 1, …, m, where B = Σ_j b_j

We don’t know B!

The main idea is to construct a time-reversible Markov chain with (π₁, …, π_m) as its limit distribution.

Later we will discuss what to do when the distribution is continuous.

39

The Hastings-Metropolis Algorithm
Let {1, 2, …, m} be the state space of a Markov chain that we can simulate.

No rejection: we use all X₁, X₂, …, Xₙ, …

40
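
The algorithm on this slide was an image and is not reproduced in the text. A minimal sketch of the standard Hastings-Metropolis scheme for a discrete target follows, under stated assumptions: the target is proportional to unnormalized weights `b` (the normalizer is never computed), the chain we can simulate is a symmetric ±1 walk on a ring of m states (a hypothetical choice), and a proposed move from x to y is accepted with probability min(1, b[y] / b[x]). When a move is not accepted, the current state is recorded again, so every X_n is used.

```python
import random

def hastings_metropolis(b, n_steps, rng, start=0):
    """Sample states 0..m-1 with probability proportional to b[j]."""
    m = len(b)
    x = start
    chain = []
    for _ in range(n_steps):
        y = (x + rng.choice((-1, 1))) % m    # symmetric proposal on a ring
        alpha = min(1.0, b[y] / b[x])        # acceptance probability
        if rng.random() < alpha:
            x = y                            # move to the proposed state
        chain.append(x)                      # otherwise the old state repeats
    return chain

b = [1.0, 2.0, 3.0, 4.0, 10.0]               # assumed unnormalized weights
chain = hastings_metropolis(b, 100_000, random.Random(0))
freqs = [chain.count(j) / len(chain) for j in range(len(b))]
print(freqs)
```

The empirical frequencies should approach b[j] / sum(b), that is (0.05, 0.10, 0.15, 0.20, 0.50), even though the sampler only ever looks at ratios of weights.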

Example for Large State Space
Let {1,2,…,m} be the state space of a Markov chain that we
can simulate.
d-dimensional grid:

Max 2d possible movements at each grid point (linear in d)

Exponentially large state space in dimension d

41
The Hastings-Metropolis Algorithm

Theorem

Proof
42
The Hastings-Metropolis Algorithm
Observation

Proof:
Corollary

Theorem

43
The Hastings-Metropolis Algorithm

Theorem

Proof:

Note:
44
The Hastings-Metropolis Algorithm

It is not rejection sampling; we use all the samples!

45

Continuous Distributions

The same algorithm can be used for continuous distributions as well. In this case, the state space is continuous.

46
Experiment with HM
An application for continuous distributions

Bimodal target distribution: p(x) ∝ 0.3 exp(−0.2x²) + 0.7 exp(−0.2(x − 10)²)

Proposal: q(x | x^(i)) = N(x^(i), 100), 5000 iterations

47
Good proposal distribution is important
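
The experiment can be sketched as a random-walk sampler on the bimodal target. The code below assumes that N(x^(i), 100) denotes variance 100 (standard deviation 10) and that the chain starts at x = 0; the function names and the alternative proposal widths 0.1 and 100 are illustrative choices, not values taken from the slides. Because the Gaussian proposal is symmetric, the acceptance probability reduces to min(1, p̃(y) / p̃(x)).

```python
import math
import random

def p_tilde(x):
    """Unnormalized bimodal target from the slides."""
    return 0.3 * math.exp(-0.2 * x**2) + 0.7 * math.exp(-0.2 * (x - 10)**2)

def metropolis_hastings(n_iter, proposal_sd, rng, x0=0.0):
    """Random-walk sampler; returns the chain and the fraction of accepted moves."""
    x = x0
    chain = []
    accepted = 0
    for _ in range(n_iter):
        y = rng.gauss(x, proposal_sd)             # symmetric proposal q(y | x)
        alpha = min(1.0, p_tilde(y) / p_tilde(x)) # acceptance probability
        if rng.random() < alpha:
            x = y
            accepted += 1
        chain.append(x)                           # rejected moves repeat x
    return chain, accepted / n_iter

for sd in (0.1, 10.0, 100.0):
    chain, rate = metropolis_hastings(5000, sd, random.Random(1))
    print(sd, round(rate, 3), round(sum(chain) / len(chain), 2))
```

A very narrow proposal accepts almost every move but explores slowly and can stay in one mode for a long time, while a very wide proposal is rejected most of the time; an intermediate width visits both modes, and the chain mean then approaches 7.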

48
HM on Combinatorial Sets

Generate uniformly distributed samples from the set of permutations (x₁, …, xₙ) of (1, …, n) whose weighted sum Σ_j j·x_j exceeds a given constant a.

Let n = 3 and a = 12:
{1,2,3}: 1+4+9 = 14
{1,3,2}: 1+6+6 = 13
{2,3,1}: 2+6+3 = 11
{2,1,3}: 2+2+9 = 13
{3,1,2}: 3+2+6 = 11
{3,2,1}: 3+4+3 = 10
49
HM on Combinatorial Sets
To define a simple Markov chain on this set, we need the concept of neighboring elements (permutations):

Definition: Two permutations are neighbors if one results from the interchange of two of the positions of the other:

(1,2,3,4) and (1,2,4,3) are neighbors.

(1,2,3,4) and (1,3,4,2) are not neighbors.
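
A minimal sketch of a neighbor-swap sampler on such a set follows. It rests on assumptions that the surviving text does not state explicitly: the set is taken to be the permutations with Σ_j j·x_j > a, a neighbor inside the set is proposed uniformly at random, and the move from s to t is accepted with probability min(1, |N(s)| / |N(t)|), where N(·) is the set of admissible neighbors. With this acceptance rule the uniform distribution satisfies detailed balance. The sketch also assumes every state has at least one admissible neighbor.

```python
import random
from itertools import combinations

def weighted_sum(perm):
    return sum(j * x for j, x in enumerate(perm, start=1))

def neighbors(perm, a):
    """Permutations one swap away from perm that stay inside the set."""
    result = []
    for i, j in combinations(range(len(perm)), 2):
        q = list(perm)
        q[i], q[j] = q[j], q[i]
        q = tuple(q)
        if weighted_sum(q) > a:
            result.append(q)
    return result

def sample_permutations(start, a, n_steps, rng):
    s = start
    chain = []
    for _ in range(n_steps):
        ns = neighbors(s, a)
        t = rng.choice(ns)                 # uniform proposal among neighbors
        nt = neighbors(t, a)               # non-empty: s is a neighbor of t
        if rng.random() < min(1.0, len(ns) / len(nt)):
            s = t
        chain.append(s)
    return chain

chain = sample_permutations((1, 2, 3), 12, 30_000, random.Random(0))
for perm in [(1, 2, 3), (1, 3, 2), (2, 1, 3)]:
    print(perm, chain.count(perm) / len(chain))
```

For n = 3 and a = 12 the set has three elements, and each should appear with frequency close to 1/3.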

50
HM on Combinatorial Sets

That is what we wanted!

51
Gibbs Sampling: The Problem

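Suppose that we can generate samples from the conditional distributions p(x_i | x_1, …, x_{i−1}, x_{i+1}, …, x_n), i = 1, …, n.

Our goal is to generate samples from the joint distribution p(x_1, …, x_n).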

52
Gibbs Sampling: Pseudo Code
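
The pseudo code on this slide did not survive in the text. As a stand-in, here is a minimal sketch of Gibbs sampling on an assumed example: a zero-mean bivariate normal with unit variances and correlation ρ, whose conditionals are x1 | x2 ~ N(ρ·x2, 1 − ρ²) and x2 | x1 ~ N(ρ·x1, 1 − ρ²). Each sweep resamples one coordinate at a time from its conditional given the current value of the other.

```python
import math
import random

def gibbs_bivariate_normal(rho, n_iter, rng, x1=0.0, x2=0.0):
    """Gibbs sampler for a standard bivariate normal with correlation rho."""
    sd = math.sqrt(1.0 - rho * rho)       # conditional standard deviation
    samples = []
    for _ in range(n_iter):
        x1 = rng.gauss(rho * x2, sd)      # draw x1 given the current x2
        x2 = rng.gauss(rho * x1, sd)      # draw x2 given the new x1
        samples.append((x1, x2))
    return samples

def correlation(samples):
    n = len(samples)
    m1 = sum(s[0] for s in samples) / n
    m2 = sum(s[1] for s in samples) / n
    cov = sum((s[0] - m1) * (s[1] - m2) for s in samples) / n
    v1 = sum((s[0] - m1) ** 2 for s in samples) / n
    v2 = sum((s[1] - m2) ** 2 for s in samples) / n
    return cov / math.sqrt(v1 * v2)

samples = gibbs_bivariate_normal(0.8, 20_000, random.Random(0))
print(correlation(samples))
```

Only the conditionals are ever sampled, yet the empirical correlation of the pairs approaches the joint distribution's ρ = 0.8.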

53
Gibbs Sampling: Theory
Consider the following HM sampler:
Let

and let

Observation: By construction, this HM sampler would sample from

We will prove that this HM sampler = Gibbs sampler.

54

Gibbs Sampling is a Special HM
Theorem: The Gibbs sampling is a special case of HM with

Proof:
By definition:

55
Gibbs Sampling is a Special HM
Proof:

56
Gibbs Sampling in Practice

57
Simulated Annealing

58
Simulated Annealing

Goal: Find

59
Simulated Annealing

Theorem:

Proof:

60
Simulated Annealing
Main idea
• Let λ be big.
• Generate a Markov chain with limit distribution P_λ(x).
• In the long run, the Markov chain will jump among the maximum points of P_λ(x).

Introduce the relationship of neighboring vectors:

61
Simulated Annealing
Uniform distribution

Use the Hastings-Metropolis sampling:

62
Simulated Annealing: Pseudo Code

With prob. α, accept the new state;

with prob. (1 − α), do not accept and stay in the current state.
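
The accept/stay rule can be sketched as follows. Several pieces are assumptions, since the slide's formulas are not in the text: the limit distribution is taken as P_λ(x) ∝ exp(λ·V(x)), neighbors are the two adjacent points on a ring of integers, λ grows linearly from near 0 to `lam_max` (so the temperature 1/λ falls), and the objective V is a made-up function with a local maximum at 5 and a global maximum at 22.

```python
import math
import random

def simulated_annealing(V, n_states, start, n_steps, lam_max, rng):
    """Maximize V over {0, ..., n_states-1}; returns (best state seen, final state)."""
    x = start
    best = x
    for t in range(n_steps):
        lam = lam_max * (t + 1) / n_steps             # assumed linear schedule
        y = (x + rng.choice((-1, 1))) % n_states      # symmetric neighbor proposal
        delta = V(y) - V(x)
        # alpha = min(1, exp(lam * delta)); an uphill move has alpha = 1
        if delta >= 0 or rng.random() < math.exp(lam * delta):
            x = y
        if V(x) > V(best):
            best = x
    return best, x

def V(x):
    """Hypothetical objective: local maximum at 5, global maximum at 22."""
    return 5.0 * math.exp(-(x - 5) ** 2 / 8.0) + 10.0 * math.exp(-(x - 22) ** 2 / 8.0)

best, final = simulated_annealing(V, 31, start=5, n_steps=20_000, lam_max=5.0,
                                  rng=random.Random(0))
print(best, final)
```

While λ is small the chain moves almost freely and can leave the local maximum at 5; as λ grows, downhill moves become rare and the chain settles near the global maximum.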

63
Simulated Annealing: Special case

In this special case:

With prob. α = 1, accept the new state, since we increased V.

64
Simulated Annealing: Problems

65
Simulated Annealing
Temperature = 1/ λ

66
Simulated Annealing

67
Monte Carlo EM
E Step:

Monte Carlo EM:

Then the integral can be approximated! ☺

68
Monte Carlo EM

69
Thanks for the Attention! ☺
