MCMC with Caching for Gaussian Processes

This document summarizes research on using temporary mapping and caching to improve the efficiency of Markov chain Monte Carlo (MCMC) methods for approximating distributions like the posterior in Gaussian process regression. It describes using a subset of the data to form an approximating distribution π* that allows mapping proposals to a temporary space to mix faster before mapping back. Experiments on a synthetic dataset show this "mapping" method leads to faster mixing than standard MCMC, as evidenced by shorter autocorrelation times. Ongoing work includes exploring other approximation methods and using "tempered transitions" for the mappings.

MCMC with Temporary Mapping and Caching with Application on Gaussian Process Regression

Advisor: Professor Radford Neal Chunyi Wang Department of Statistics, University of Toronto

Joint Statistical Meeting August 3rd, 2011

Markov Chain Monte Carlo Methods

We construct an ergodic Markov chain with transition T(x'|x) which leaves the target distribution π(x) invariant, i.e.

∫ π(x) T(x'|x) dx = π(x')

Metropolis algorithm: propose a move from x to x* (according to a proposal distribution S(x*|x)), and accept the proposal with probability min[1, π(x*)/π(x)]. This satisfies the detailed balance condition

π(x) T(x'|x) = π(x') T(x|x')

and thus the chain (called reversible) leaves the target distribution invariant.
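As an illustration, here is a minimal random-walk Metropolis sampler for a generic log target; the function and parameter names are illustrative, not from the talk:

```python
import math
import random

def metropolis(log_pi, x0, n_steps, step=1.0, rng=None):
    """Random-walk Metropolis: propose x* = x + step * N(0, 1),
    accept with probability min[1, pi(x*)/pi(x)]."""
    rng = rng or random.Random(0)
    x, log_px = x0, log_pi(x0)
    samples = []
    for _ in range(n_steps):
        x_star = x + step * rng.gauss(0.0, 1.0)
        log_px_star = log_pi(x_star)
        # accept with probability min[1, exp(log pi(x*) - log pi(x))]
        if rng.random() < math.exp(min(0.0, log_px_star - log_px)):
            x, log_px = x_star, log_px_star
        samples.append(x)
    return samples

# target: standard normal, pi(x) proportional to exp(-x^2 / 2)
draws = metropolis(lambda x: -0.5 * x * x, x0=0.0, n_steps=5000)
```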

Caching Results for Future Re-use

The evaluation of π(x) is typically expensive, so we should always save the value of π(x) if it might be used in the future:
- If a Metropolis proposal x* is rejected, the current state will still be x, and therefore π(x) is needed for the next update;
- If a Metropolis proposal x* is accepted, the current state will be x*, and so π(x*) is needed for the next update;
- If the state space is discrete, previously visited states may recur, so their cached π values can be re-used.
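A minimal sketch of this caching idea, assuming hashable states, is a memoizing wrapper around the log density (the class name and counter are illustrative):

```python
class CachedDensity:
    """Cache log pi(x) values so a rejected proposal's pi(x) and an
    accepted proposal's pi(x*) need not be recomputed at the next update."""
    def __init__(self, log_pi):
        self._log_pi = log_pi
        self._cache = {}
        self.evaluations = 0  # counts actual (expensive) evaluations

    def __call__(self, x):
        if x not in self._cache:
            self.evaluations += 1
            self._cache[x] = self._log_pi(x)
        return self._cache[x]

cached = CachedDensity(lambda x: -0.5 * x * x)
cached(1.0)
cached(1.0)   # second call is a cache hit, no new evaluation
```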

MCMC with Temporary Mapping


We can combine three stochastic mappings T̂, T̄ and Ť to form the transition T(x'|x), as follows:

x →(T̂) y →(T̄) y' →(Ť) x'

where x ∈ X is the original sample space and y ∈ Y is a temporary space. To leave the target distribution invariant these mappings have to satisfy

∫ π(x) T̂(y|x) dx = ρ(y)
∫ ρ(y) T̄(y'|y) dy = ρ(y')
∫ ρ(y') Ť(x'|y') dy' = π(x')

for some distribution ρ on Y.

Mapping to a Discretizing Chain


Suppose we have a Markov chain which leaves a distribution π* invariant. We can map to a space of realizations of such a chain: the current state x is mapped to a chain with one time step (whose value is x) marked.

[Figure: the state x in X is mapped to a marked position on a realized chain in Y.]

We don't actually compute everything beforehand, but simulate new states (and save them for future re-use) when needed.

Mapping to a Discretizing Chain - Continued


We then attempt to move the marker along the chain to another state (whose value is x'), with acceptance probability

min[1, (π(x')/π*(x')) / (π(x)/π*(x))]

We can do multiple such updates in this space before mapping back to the original space.

[Figure: marker moves along the chain in Y; solid line segments are the updates that are actually simulated, while the dashed segments are not.]
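A rough sketch of one marker move along an already-realized π*-chain segment; a full implementation must also simulate the chain lazily in both directions, which is omitted here, and all names are illustrative:

```python
import math
import random

def marker_update(chain, idx, log_pi, log_pi_star, rng):
    """Propose moving the marker one step along a realized pi*-chain;
    accept with prob min[1, (pi(x')/pi*(x')) / (pi(x)/pi*(x))]."""
    j = idx + rng.choice([-1, 1])
    if j < 0 or j >= len(chain):
        return idx                      # no state simulated there yet
    x, x_new = chain[idx], chain[j]
    log_ratio = (log_pi(x_new) - log_pi_star(x_new)) \
              - (log_pi(x) - log_pi_star(x))
    if rng.random() < math.exp(min(0.0, log_ratio)):
        return j
    return idx

# toy check: when pi = pi*, every in-range marker move is accepted
rng = random.Random(1)
chain = [0.0, 0.5, 1.0, 1.5]
f = lambda x: -0.5 * x * x
idx = marker_update(chain, 1, f, f, rng)
```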

Gaussian Process Regression: Model

We observe n training cases (x1, y1), ..., (xn, yn) where xi is a vector of inputs of length p, and yi is the corresponding scalar response, which we assume is a function of the inputs plus some noise:

yi = f(xi) + εi,  where εi iid ~ N(0, σ²)

In a Gaussian process regression model, the prior mean of the function f is 0, and the covariance of the responses is

Cov(yi, yj) = k(xi, xj) + σ² δij
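A small sketch of building this covariance matrix for a 1-d input, assuming a squared-exponential k with illustrative hyperparameter names (eta2, ell2):

```python
import numpy as np

def cov_matrix(X, eta2, ell2, sigma2):
    """Cov(y_i, y_j) = k(x_i, x_j) + sigma^2 * delta_ij, with a
    squared-exponential k(x, x') = eta2 * exp(-(x - x')^2 / ell2)."""
    d2 = (X[:, None] - X[None, :]) ** 2
    return eta2 * np.exp(-d2 / ell2) + sigma2 * np.eye(len(X))

X = np.linspace(0.0, 3.0, 5)
C = cov_matrix(X, eta2=4.0, ell2=1.0, sigma2=0.25)
```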

Gaussian Process Regression: Prediction


We wish to predict the response y*, for a test case x*, based on the training data. The predictive distribution for the response y* is Gaussian:

E[y*|y] = kᵀ C⁻¹ y
Var[y*|y] = v − kᵀ C⁻¹ k

where C is the covariance matrix for the training responses, k is the vector of covariances between y* and each of the yi, and v is the prior variance of y* [i.e. Cov(y*, y*)]. To do this in the Bayesian framework, we obtain a random sample from the posterior density for the hyperparameters θ:

π(θ|y) ∝ (2π)^(−n/2) det(C)^(−1/2) exp(−(1/2) yᵀ C⁻¹ y) π(θ)

where π(θ) is the prior for θ.
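The two predictive formulas can be sketched directly (the helper name gp_predict is illustrative):

```python
import numpy as np

def gp_predict(C, k, v, y):
    """Predictive mean and variance for a test case:
    E[y*|y] = k^T C^{-1} y,  Var[y*|y] = v - k^T C^{-1} k."""
    Cinv_y = np.linalg.solve(C, y)
    Cinv_k = np.linalg.solve(C, k)
    return k @ Cinv_y, v - k @ Cinv_k

# toy check with C = I: mean = k^T y, var = v - k^T k
mean, var = gp_predict(np.eye(2), np.array([0.5, 0.0]), 1.0,
                       np.array([2.0, 0.0]))
```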

Complexity for the GP Regression Model


The posterior density is

π(θ|y) ∝ (2π)^(−n/2) det(C)^(−1/2) exp(−(1/2) yᵀ C⁻¹ y) π(θ)

The time needed to perform the following major computations is (asymptotically, with implementation-specific constant coefficients):

C           p n²
det(C)      n³
C⁻¹         n³
yᵀ C⁻¹ y    n²

In practice we compute C (p n²) and the Cholesky decomposition of C (n³); then we can cheaply obtain det(C) and yᵀ C⁻¹ y.
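A minimal sketch of that Cholesky trick, on a tiny hand-checkable matrix:

```python
import numpy as np

# one O(n^3) Cholesky factorization C = L L^T gives both quantities
C = np.array([[2.0, 0.5],
              [0.5, 1.0]])
y = np.array([1.0, -1.0])

L = np.linalg.cholesky(C)                     # C = L L^T
log_det = 2.0 * np.sum(np.log(np.diag(L)))    # log det(C) = 2 sum log L_ii
z = np.linalg.solve(L, y)                     # solve L z = y, O(n^2)
quad = z @ z                                  # y^T C^{-1} y = z^T z
```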

π* as an Approximation of π: dimension reduction

We wish to find some π* that is easier to compute than π while similar in distribution to π. Some approximation methods are listed as candidates:

Subset of data (SoD): π* is the posterior given only a subset (of m observations) of (x1, y1), ..., (xn, yn). Needs time proportional to p m² to compute C*, and m³ to invert C*.

Linear combination of responses: let ỹ = A y where A is of rank m. ỹ is also Gaussian, with lower dimension. π* is the posterior based on the covariance matrix for ỹ, C* = A C Aᵀ, of rank m.

Others: subset of regressors (SoR), Bayesian Committee Machine, etc.
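A sketch of the SoD step, which just keeps m of the n training cases (the function name and seeding are illustrative):

```python
import numpy as np

def subset_of_data(X, y, m, rng=None):
    """Subset-of-data: keep m of the n training cases, so the m x m
    covariance costs O(p m^2) to form and O(m^3) to factor."""
    rng = rng or np.random.default_rng(0)
    idx = rng.choice(len(X), size=m, replace=False)
    return X[idx], y[idx]

X = np.arange(10.0)
y = 2 * X
Xs, ys = subset_of_data(X, y, m=4)
```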

π* as an Approximation of π: diagonal plus low rank


C is usually of the form σ² I + C⁰, where C⁰ is non-negative definite. If C⁰ can be approximated by some lower-rank matrix, then we can reduce the computation by these lemmas:

(D + U W Vᵀ)⁻¹ = D⁻¹ − D⁻¹ U (W⁻¹ + Vᵀ D⁻¹ U)⁻¹ Vᵀ D⁻¹
det(D + U W Vᵀ) = det(W⁻¹ + Vᵀ D⁻¹ U) det(W) det(D)

Eigen-method: C* = σ² I + B Λm Bᵀ, where Λm is the diagonal matrix with the eigenvalues λ1 ≥ λ2 ≥ ... ≥ λm of C⁰ on its diagonal, and B is an n × m matrix whose columns are the corresponding orthonormal eigenvectors. Need to compute C (p n²) and the first m eigenvalues and eigenvectors of C⁰ (m n², with a large constant factor).

Nyström method: C* = σ² I + C⁰₍n,m₎ [C⁰₍m,m₎]⁻¹ C⁰₍m,n₎, where C⁰₍n,m₎ is an n × m matrix whose m columns are m randomly selected columns from C⁰. Need to compute C⁰₍n,m₎ (p m n), then find the Cholesky decomposition of some m × m matrix (m³).
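The two lemmas (the Woodbury identity and the matrix determinant lemma) can be checked numerically on a small random diagonal-plus-low-rank matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 6, 2
D = np.diag(rng.uniform(1.0, 2.0, n))    # diagonal part, e.g. sigma^2 I
U = rng.standard_normal((n, m))
W = np.diag(rng.uniform(0.5, 1.5, m))
V = U                                    # symmetric low-rank case

A = D + U @ W @ V.T
Dinv = np.diag(1.0 / np.diag(D))

# Woodbury: (D + U W V^T)^{-1}, inverting only an m x m matrix
small = np.linalg.inv(W) + V.T @ Dinv @ U
A_inv = Dinv - Dinv @ U @ np.linalg.inv(small) @ V.T @ Dinv

# matrix determinant lemma: det(D + U W V^T)
det_A = np.linalg.det(small) * np.linalg.det(W) * np.linalg.det(D)
```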

Example: Use SoD to form the π*


We generate a synthetic dataset as follows:

y = 3 sin(x²) + 2 sin(1.5x + 1) + ε,  where x ~ Unif(0, 3) and ε ~ N(0, 0.5²)

We generated 500 observations as the training set, and another 1000 for the testing set.

We use a squared exponential covariance function:

Cov(y, y') = 10² + η² exp(−(x − x')²/ℓ²) + σ² δ

and the priors are

log η² ~ N(3, 3²)
log ℓ² ~ N(2, 3²)
log σ² ~ N(0, 3²)

[Figure: scatter plot of the training set, y against x over (0, 3).]
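The synthetic dataset above can be generated directly (seed and function name are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2011)

def make_data(n, rng):
    """y = 3 sin(x^2) + 2 sin(1.5 x + 1) + eps,
    with x ~ Unif(0, 3) and eps ~ N(0, 0.5^2)."""
    x = rng.uniform(0.0, 3.0, n)
    y = 3 * np.sin(x ** 2) + 2 * np.sin(1.5 * x + 1) + rng.normal(0.0, 0.5, n)
    return x, y

x_train, y_train = make_data(500, rng)    # training set
x_test, y_test = make_data(1000, rng)     # testing set
```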
Example: Use SoD to form the π* - Predictions


The first 50 observations are used as the subset to form the π* used to implement the MCMC with mapping, and the results are compared to a standard Metropolis MCMC. The sample ACFs are adjusted so that they reflect the same number of evaluations of π(x).

[Figure: predictions on the testing cases from the Metropolis and mapping methods, plotted against x.]

Example: Use SoD to form the π* - Autocorrelations


[Figure: sample ACFs, up to lag 100 for Metropolis and up to lag 30 for mapping with SoD.]

Comparison of autocorrelation times:

          Metropolis   Mapping
log η²        37.9        8.5
log ℓ²        31.5        7.1
log σ²        12.0        1.9
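Autocorrelation times like those in the table can be estimated from the sampler output with a standard truncated-sum estimator; this is a minimal sketch (the truncation rule here is a simple illustrative choice, not the one used in the talk):

```python
import numpy as np

def autocorr_time(x, max_lag=100):
    """Integrated autocorrelation time tau = 1 + 2 * sum_k rho_k,
    truncating the sum at the first non-positive estimated rho_k."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    var = (x @ x) / len(x)
    tau = 1.0
    for k in range(1, min(max_lag, len(x) - 1)):
        rho = (x[:-k] @ x[k:]) / len(x) / var
        if rho <= 0.0:
            break
        tau += 2.0 * rho
    return tau

# for i.i.d. draws the autocorrelation time is about 1
iid = np.random.default_rng(0).standard_normal(20000)
tau_iid = autocorr_time(iid)
```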

Ongoing Work

Other approximation methods for π*: among the various approximation methods, which one is the best overall, and which is better in certain situations?

Mapping with tempered transitions: instead of directly mapping a state into the temporary space, we borrow the idea of tempered transitions and form a sequence of mappings.

References

1. Neal, R. M. (2006) Constructing Efficient MCMC Methods Using Temporary Mapping and Caching, talk at Columbia University, December 2006.
2. Neal, R. M. (1998) Regression and Classification Using Gaussian Process Priors, Bayesian Statistics 6, pp. 475-501, Oxford University Press.
3. Neal, R. M. (2008) Approximate Gaussian Process Regression Using Matrix Approximations and Linear Response Combinations, Tech. Report (Draft), Dept. of Statistics, University of Toronto.
4. Quiñonero-Candela, J., Rasmussen, C. E. and Williams, C. K. I. (2007) Approximation Methods for Gaussian Process Regression, Tech. Report MSR-TR-2007-124, Microsoft Research.
5. Rasmussen, C. E. and Williams, C. K. I. (2006) Gaussian Processes for Machine Learning, The MIT Press.
