Probabilistic Graphical Models
Lecture 8: State-Space Models
Based on slides by Richard Zemel
Sequential data
Turn attention to sequential data
– Time-series: stock market, speech, video analysis
– Ordered: text, gene
Simple example: Dealer A is fair; Dealer B is not
[State diagram: two states A and B; self-loops labeled C=t (keep current dealer), cross-arrows labeled C=h (switch dealer)]
Process (let Z be dealer A or B):
Loop until tired:
1. Flip coin C, use it to decide whether to switch dealer
2. Chosen dealer rolls die, record result
Fully observable formulation: data is sequence of dealer selections
AAAABBBBAABBBBBBBAAAAABBBBB
Simple example: Markov model
• If underlying process unknown, can construct model to predict
next letter in sequence
• In general, product rule expresses joint distribution for sequence
• First-order Markov chain: each observation independent of all
previous observations except most recent
• ML parameter estimates are easy
• Each pair of outputs is a training case; in this example:
P(X_t = B | X_{t-1} = A) = #[t s.t. X_t = B, X_{t-1} = A] / #[t s.t. X_{t-1} = A]
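As a quick illustration (not from the slides), a minimal Python sketch of these count-based ML estimates, applied to the dealer sequence above; the function name markov_ml is just for this example.

```python
from collections import Counter

def markov_ml(seq):
    """ML transition estimates for a first-order Markov chain over symbols."""
    pair_counts = Counter(zip(seq[:-1], seq[1:]))   # counts of (x_{t-1}, x_t) pairs
    prev_counts = Counter(seq[:-1])                 # counts of x_{t-1}
    return {(a, b): c / prev_counts[a] for (a, b), c in pair_counts.items()}

P = markov_ml("AAAABBBBAABBBBBBBAAAAABBBBB")
print(P[("A", "B")])   # estimate of P(X_t = B | X_{t-1} = A)
```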
Higher-order Markov models
• Consider example of text
• Can capture some regularities with bigrams (e.g., q nearly
always followed by u, very rarely by j) – probability of a
letter given just its preceding letter
• But probability of a letter depends on more than just
previous letter
• Can formulate as second-order Markov model (trigram
model)
• Need to take care: many counts may be zero in training
dataset
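A hedged sketch of one standard way to deal with zero counts, add-alpha (Laplace) smoothing for a trigram model; the smoothing constant and alphabet here are illustrative assumptions, not from the slides.

```python
from collections import Counter

def trigram_probs(text, alpha=1.0, alphabet="abcdefghijklmnopqrstuvwxyz "):
    """Second-order (trigram) estimates P(x_t | x_{t-2}, x_{t-1}) with add-alpha smoothing."""
    tri = Counter(zip(text, text[1:], text[2:]))   # counts of letter triples
    bi = Counter(zip(text, text[1:]))              # counts of letter pairs
    V = len(alphabet)
    def p(c, prev2, prev1):
        return (tri[(prev2, prev1, c)] + alpha) / (bi[(prev2, prev1)] + alpha * V)
    return p

p = trigram_probs("the quick brown fox jumps over the lazy dog")
print(p("e", "t", "h"))   # P('e' | preceding letters 't','h'); nonzero even for unseen triples
```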
Character recognition: transition probabilities
[Figure: matrix of transition probabilities between letters]
Hidden Markov model (HMM)
• Return to casino example -- now imagine that we do not observe
ABBAA, but instead just the sequence of die rolls (1-6)
• Generative process:
Loop until tired:
1. Flip coin C (Z = A or B)
2. Chosen dealer rolls die, record result X
Z is now a hidden state variable – a 1st-order Markov chain generates the
state sequence (path), governed by transition matrix A
Observations governed by emission probabilities, which convert the state
path into a sequence of observable symbols or vectors
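A minimal generative sketch of this casino HMM; the switch probability and the loaded-die distribution are illustrative assumptions (the slides do not give the exact numbers).

```python
import numpy as np

rng = np.random.default_rng(0)
p_switch = 0.3                                    # assumed P(C = h): switch dealer
emit = {"A": np.ones(6) / 6,                      # fair dealer
        "B": np.array([.1, .1, .1, .1, .1, .5])}  # assumed loaded die

def sample_casino(T, z="A"):
    """Sample a hidden dealer path z_{1:T} and the observed die rolls x_{1:T}."""
    zs, xs = [], []
    for _ in range(T):
        zs.append(z)
        xs.append(int(rng.choice(6, p=emit[z])) + 1)  # die roll 1..6
        if rng.random() < p_switch:                   # coin flip decides whether to switch
            z = "B" if z == "A" else "A"
    return zs, xs

print(sample_casino(10))
```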
Relationship to other models
• Can think of HMM as:
– Markov chain with stochastic measurements
– Mixture model with states coupled across time
• Hidden state is 1st-order Markov, but output not Markov of any
order
• Future is independent of past given present, but conditioning on
observations couples hidden states
Character Recognition Example
Which letters are these?
HMM: Character Recognition Example
Context matters: recognition easier based on sequence of
characters
How to apply HMM to this character string?
Main elements: states? emission, transition probabilities?
HMM: Semantics
[Figure: chain of hidden states z1 → z2 → z3 → z4 → z5, each zt ∈ {a, ..., z}, each emitting an observed character image xt]
Need 3 distributions:
1. Initial state: P(Z1)
2. Transition model: P(Zt|Zt-1)
3. Observation model (emission probabilities): P(Xt|Zt)
HMM: Main tasks
• Joint probabilities of hidden states and outputs:
P(x, z) = P(z_1) P(x_1 | z_1) \prod_{t=2}^{T} P(z_t | z_{t-1}) P(x_t | z_t)
• Three problems
1. Computing probability of observed sequence: forward-
backward algorithm [good for recognition]
2. Infer most likely hidden state sequence: Viterbi algorithm
[useful for interpretation]
3. Learning parameters: Baum-Welch algorithm (version of
EM)
Fully observed HMM
Learning fully observed HMM (observe both X and Z) is easy:
1. Initial state: P(Z1) – proportion of words that start with each
letter
2. Transition model: P(Zt|Zt-1) – proportion of times a given
letter follows another (bigram statistics)
3. Observation model (emission probabilities): P(Xt|Zt) – how
often particular image represents specific character, relative
to all images
But still have to do inference at test time: work out states given
observations
HMMs often used where hidden states are identified: words in
speech recognition; activity recognition; spatial position of
rat; genes; POS tagging
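A small sketch of this fully observed learning, assuming integer-coded states and symbols; all names and the toy data are illustrative, not from the slides.

```python
import numpy as np

def fully_observed_hmm(sequences, n_states, n_symbols):
    """ML estimates from fully observed (state, symbol) sequences, by counting."""
    pi = np.zeros(n_states)                 # initial-state counts
    A = np.zeros((n_states, n_states))      # transition counts
    B = np.zeros((n_states, n_symbols))     # emission counts
    for zs, xs in sequences:
        pi[zs[0]] += 1
        for t in range(len(zs)):
            B[zs[t], xs[t]] += 1
            if t > 0:
                A[zs[t - 1], zs[t]] += 1
    # normalize counts into probabilities
    return pi / pi.sum(), A / A.sum(axis=1, keepdims=True), B / B.sum(axis=1, keepdims=True)

# toy data: states 0/1, symbols 0..5
seqs = [([0, 0, 1, 1, 0], [2, 3, 5, 5, 1]), ([1, 1, 0], [5, 4, 0])]
print(fully_observed_hmm(seqs, n_states=2, n_symbols=6))
```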
HMM: Inference tasks
Important to infer distributions over hidden states:
§ If states are interpretable, infer interpretations
§ Also essential for learning
Can break down hidden state inference tasks to solve (each
based on all observations up to current time, X0:t)
1. Filtering: compute posterior over current hidden state:
P(Zt| X0:t)
2. Prediction: compute posterior over future hidden state:
P(Zt+k| X0:t)
3. Smoothing: compute posterior over past hidden state:
P(Zk| X0:t), 0<k<t
4. Fixed-lag smoothing: P(Zt-a| X0:t): compute posterior
over hidden state a few steps back
Filtering, Smoothing & Prediction
P(Z_t | X_{1:t}) = P(Z_t | X_t, X_{1:t-1})
                 ∝ P(X_t | Z_t, X_{1:t-1}) P(Z_t | X_{1:t-1})
                 = P(X_t | Z_t) P(Z_t | X_{1:t-1})
                 = P(X_t | Z_t) \sum_{z_{t-1}} P(Z_t | z_{t-1}, X_{1:t-1}) P(z_{t-1} | X_{1:t-1})
                 = P(X_t | Z_t) \sum_{z_{t-1}} P(Z_t | z_{t-1}) P(z_{t-1} | X_{1:t-1})
Filtering: for online estimation of state
Pr(state) ∝ observation probability × transition-model prediction
Smoothing: post hoc estimation of state (similar computation)
Prediction is filtering, but with no new evidence:
P(Z_{t+k} | X_{1:t}) = \sum_{z_{t+k-1}} P(Z_{t+k} | z_{t+k-1}) P(z_{t+k-1} | X_{1:t})
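A minimal numpy sketch of one step of this filtering recursion for a discrete HMM, assuming a transition matrix A and emission matrix B; the toy numbers are illustrative.

```python
import numpy as np

def filter_step(alpha_prev, A, B, x_t):
    """One filtering update: P(Z_t | X_{1:t}) from P(Z_{t-1} | X_{1:t-1})."""
    predict = A.T @ alpha_prev          # sum_{z_{t-1}} P(Z_t | z_{t-1}) P(z_{t-1} | X_{1:t-1})
    post = B[:, x_t] * predict          # multiply by observation probability P(X_t | Z_t)
    return post / post.sum()            # normalize

# toy 2-state, 6-symbol model (illustrative numbers)
A = np.array([[0.7, 0.3], [0.3, 0.7]])
B = np.vstack([np.full(6, 1 / 6), [.1, .1, .1, .1, .1, .5]])
belief = np.array([0.5, 0.5])
for x in [5, 5, 2, 5]:                  # observed die rolls (0-indexed faces)
    belief = filter_step(belief, A, B, x)
print(belief)
```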
HMM: Maximum likelihood
Having observed some dataset, use ML to learn the parameters
of the HMM
Need to marginalize over the latent variables:
p(X | \theta) = \sum_Z p(X, Z | \theta)
Difficult:
– does not factorize over time steps
– involves generalization of a mixture model
Approach: utilize EM for learning
Focus first on how to do inference efficiently
Forward recursion (α)
Clever recursion can compute huge sum efficiently
Backward recursion (β)
α(zt,j): total inflow of prob. to node (t,j)
β(zt,j): total outflow of prob. from node (t,j)
Forward-Backward algorithm
Estimate hidden state given observations
One forward pass to compute all α(zt,i), one backward
pass to compute all β(zt,i): total cost O(K2T)
Can compute likelihood at any time t based on α (zt,j)
and β(zt,j)
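A compact sketch of the forward and backward passes for a discrete HMM; this unscaled version is fine for short sequences (long sequences need scaling or log-space), and the parameter values are illustrative.

```python
import numpy as np

def forward_backward(pi, A, B, xs):
    """Alpha/beta recursions; returns posteriors P(z_t | x_{1:T}) and the likelihood."""
    T, K = len(xs), len(pi)
    alpha = np.zeros((T, K))
    beta = np.ones((T, K))
    alpha[0] = pi * B[:, xs[0]]
    for t in range(1, T):                       # forward pass: O(K^2 T)
        alpha[t] = B[:, xs[t]] * (A.T @ alpha[t - 1])
    for t in range(T - 2, -1, -1):              # backward pass: O(K^2 T)
        beta[t] = A @ (B[:, xs[t + 1]] * beta[t + 1])
    likelihood = alpha[-1].sum()                # P(x_{1:T})
    gamma = alpha * beta / likelihood           # P(z_t | x_{1:T}) for all t
    return gamma, likelihood

pi = np.array([0.5, 0.5])
A = np.array([[0.7, 0.3], [0.3, 0.7]])
B = np.vstack([np.full(6, 1 / 6), [.1, .1, .1, .1, .1, .5]])
gamma, L = forward_backward(pi, A, B, [5, 5, 2, 5, 5])
print(gamma, L)
```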
Baum-Welch training algorithm: Summary
Can estimate HMM parameters using maximum
likelihood
If state path known, then parameter estimation easy
Instead must estimate states, update parameters, re-
estimate states, etc. -- Baum-Welch (form of EM)
State estimation via forward-backward, also need
transition statistics (see next slide)
Update parameters (transition matrix A, emission
parameters) to maximize likelihood
Transition statistics
Need statistics for adjacent time-steps:
Expected number of transitions from state i to state j that
begin at time t-1, given the observations
Can be computed with the same α(zt,j) and β(zt,j)
recursions
Parameter updates
Initial state distribution: expected counts in state k at time 1
Estimate transition probabilities:
Emission probabilities are expected number of times observe
symbol in particular state:
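A hedged sketch of the corresponding M-step updates, assuming the expected counts gamma (state posteriors) and xi (pairwise transition posteriors) have already been computed with forward-backward; the function and argument names are illustrative.

```python
import numpy as np

def baum_welch_m_step(gammas, xis, xs_list, n_symbols):
    """M-step: re-estimate (pi, A, B) from expected counts.
    gammas[s][t, k] = P(z_t = k | x)            (from forward-backward, sequence s)
    xis[s][t, i, j] = P(z_t = i, z_{t+1} = j | x)
    xs_list[s]      = observed symbol sequence s (integer-coded)."""
    K = gammas[0].shape[1]
    pi = sum(g[0] for g in gammas)                       # expected counts in state k at time 1
    A = sum(xi.sum(axis=0) for xi in xis)                # expected number of i -> j transitions
    B = np.zeros((K, n_symbols))
    for g, xs in zip(gammas, xs_list):
        for t, x in enumerate(xs):
            B[:, x] += g[t]                              # expected emissions of symbol x per state
    return pi / pi.sum(), A / A.sum(axis=1, keepdims=True), B / B.sum(axis=1, keepdims=True)
```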
Using HMMs for recognition
Can train an HMM to classify a sequence:
1. train a separate HMM per class
2. evaluate prob. of unlabelled sequence under each
HMM
3. classify: HMM with highest likelihood
Assumes can solve two problems:
1. estimate model parameters given some training
sequences (we can find local maximum of
parameter space near initial position)
2. given model, can evaluate prob. of a sequence
Probability of observed sequence
Want to determine if given observation sequence is likely
under the model (for learning, or recognition)
Compute marginals to evaluate prob. of observed seq.: sum
across all paths of joint prob. of observed outputs and state
Take advantage of factorization to avoid exponential cost (#paths = K^T)
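One common way to evaluate this likelihood without underflow is a scaled forward pass; a minimal sketch (the scaling trick is standard, the parameter values are illustrative).

```python
import numpy as np

def log_likelihood(pi, A, B, xs):
    """log P(x_{1:T}) via a scaled forward pass: renormalize alpha at each step
    and accumulate the log of the normalizers."""
    alpha = pi * B[:, xs[0]]
    logp = np.log(alpha.sum())
    alpha = alpha / alpha.sum()
    for x in xs[1:]:
        alpha = B[:, x] * (A.T @ alpha)
        logp += np.log(alpha.sum())
        alpha = alpha / alpha.sum()
    return logp

pi = np.array([0.5, 0.5])
A = np.array([[0.7, 0.3], [0.3, 0.7]])
B = np.vstack([np.full(6, 1 / 6), [.1, .1, .1, .1, .1, .5]])
print(log_likelihood(pi, A, B, [5, 5, 2, 5]))
```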
Variants on basic HMM
• Input-output HMM
– Have additional observed variables U
• Semi-Markov HMM
– Improve model of state duration
• Autoregressive HMM
– Allow observations to depend on some previous
observations directly
• Factorial HMM
– Expand dim. of latent state
State Space Models
Instead of discrete latent state of the HMM, model Z as a
continuous latent variable
Standard formulation: linear-Gaussian (LDS), with (hidden
state Z, observation Y, other variables U)
– Transition model is linear
z_t = A_t z_{t-1} + B_t u_t + \epsilon_t
– with Gaussian noise
\epsilon_t \sim N(0, Q_t)
– Observation model is linear
y_t = C_t z_t + D_t u_t + \delta_t
– with Gaussian noise
\delta_t \sim N(0, R_t)
Model parameters typically independent of time: stationary
Kalman Filter
Algorithm for filtering in linear-Gaussian state space model
Everything is Gaussian, so can compute updates exactly
Dynamics update: predict next belief state
p(z_t | y_{1:t-1}, u_{1:t}) = \int N(z_t | A_t z_{t-1} + B_t u_t, Q_t) N(z_{t-1} | \mu_{t-1}, \Sigma_{t-1}) dz_{t-1}
                            = N(z_t | \mu_{t|t-1}, \Sigma_{t|t-1})
\mu_{t|t-1} = A_t \mu_{t-1} + B_t u_t
\Sigma_{t|t-1} = A_t \Sigma_{t-1} A_t^T + Q_t
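A direct transcription of this dynamics update as a small sketch; the matrix names follow the slide notation.

```python
import numpy as np

def kalman_predict(mu, Sigma, A, B, u, Q):
    """Dynamics update: map belief N(mu, Sigma) to predicted belief N(mu_pred, Sigma_pred)."""
    mu_pred = A @ mu + B @ u               # mu_{t|t-1} = A_t mu_{t-1} + B_t u_t
    Sigma_pred = A @ Sigma @ A.T + Q       # Sigma_{t|t-1} = A_t Sigma_{t-1} A_t^T + Q_t
    return mu_pred, Sigma_pred
```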
Kalman Filter: Measurement Update
Key step: update hidden state given new measurement:
p(z_t | y_{1:t}, u_{1:t}) ∝ p(y_t | z_t, u_t) p(z_t | y_{1:t-1}, u_{1:t})
First term a bit complicated, but can apply various identities
(such as the matrix inversion lemma, Bayes rule), obtain:
p(z_t | y_{1:t}, u_{1:t}) = N(z_t | \mu_t, \Sigma_t)
The mean update depends on Kalman gain matrix K, and the
residual or innovation r = y – E[y]
\mu_t = \mu_{t|t-1} + K_t r_t
K_t = \Sigma_{t|t-1} C_t^T S_t^{-1}
\hat{y}_t = E[y_t | y_{1:t-1}, u_t] = C_t \mu_{t|t-1} + D_t u_t
S_t = cov[r_t | y_{1:t-1}, u_{1:t}] = C_t \Sigma_{t|t-1} C_t^T + R_t
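A matching sketch of the measurement update, using the innovation, its covariance, and the Kalman gain as defined above; the posterior-covariance line (I − K C) Σ is the standard form, included here as an assumption since the slide does not show it.

```python
import numpy as np

def kalman_update(mu_pred, Sigma_pred, C, D, u, R, y):
    """Measurement update: fold observation y into the predicted belief from kalman_predict."""
    y_hat = C @ mu_pred + D @ u                  # predicted observation E[y_t | y_{1:t-1}, u_t]
    r = y - y_hat                                # innovation / residual
    S = C @ Sigma_pred @ C.T + R                 # innovation covariance S_t
    K = Sigma_pred @ C.T @ np.linalg.inv(S)      # Kalman gain K_t
    mu = mu_pred + K @ r                         # posterior mean
    Sigma = Sigma_pred - K @ C @ Sigma_pred      # posterior covariance, (I - K C) Sigma_pred
    return mu, Sigma
```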
Kalman Filter: Extensions
Learning similar to HMM
– Need to solve inference problem – local posterior marginals
for latent variables
– Use Kalman smoothing instead of forward-backward in E
step, re-derive updates in M step
Many extensions and elaborations
– Non-linear models: extended KF, unscented KF
– Non-Gaussian noise
– More general posteriors (multi-modal, discrete, etc.)
– Large systems with sparse structure (sparse information
filter)
Viterbi decoding
How to choose single best path through state space?
Choose state with largest probability at each time t: maximize
expected number of correct states
But this may not be the best path, i.e., the single path with the
highest probability of generating the data
To find best path – Viterbi decoding, form of dynamic
programming (forward-backward algorithm)
Same recursions, but replace ∑ with max (“brace” example)
Forward: retain best path into each node at time t
Backward: retrace path back from state where most
probable path ends
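A minimal sketch of Viterbi decoding in log space for a discrete HMM; the parameters reuse the illustrative casino numbers, and the function name is just for this example.

```python
import numpy as np

def viterbi(pi, A, B, xs):
    """Most likely state path argmax_z P(z, x) via max-product dynamic programming."""
    T, K = len(xs), len(pi)
    delta = np.log(pi) + np.log(B[:, xs[0]])      # log-prob of best path ending in each state
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + np.log(A)       # scores[i, j]: best path into i, then i -> j
        back[t] = scores.argmax(axis=0)           # best predecessor of each state j
        delta = scores.max(axis=0) + np.log(B[:, xs[t]])
    path = [int(delta.argmax())]                  # state where most probable path ends
    for t in range(T - 1, 0, -1):                 # retrace the best path backward
        path.append(int(back[t, path[-1]]))
    return path[::-1]

A = np.array([[0.7, 0.3], [0.3, 0.7]])
B = np.vstack([np.full(6, 1 / 6), [.1, .1, .1, .1, .1, .5]])
print(viterbi(np.array([0.5, 0.5]), A, B, [5, 5, 2, 0, 5, 5]))
```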