Natural Language
Processing
Lecture 4: Sequence Labeling with Hidden Markov
Models. Part-of-Speech Tagging.
11/6/2020
COMS W4705
Yassine Benajiba
Garden-Path Sentences
• The horse raced past the barn.
• The horse raced past the barn fell.
• The old dog the footsteps of the young.
Garden-Path Sentences
• Why does this happen?
The horse raced/VBD past the barn fell/???   (raced read as a past tense verb)
• raced can be a past tense verb or a past participle
(indicating passive voice).
• The verb interpretation is more likely before fell is read.
Garden-Path Sentences
• Why does this happen?
[The horse raced/VBN past the barn]NP fell/VBD   (raced read as a past participle)
• raced can be a past tense verb or a past participle
(indicating passive voice).
• Once fell is read, the verb interpretation is impossible.
Garden-Path Sentences
• Why does this happen?
[The old/JJ dog/NN]NP [the footsteps of the young]NP   (old read as an adjective, dog as a noun; no verb remains)
• dog can be a noun or a verb (plural, present tense)
Garden-Path Sentences
• Why does this happen?
[The old/NN]NP dog/VB [the footsteps of the young]NP   (dog read as a verb)
• dog can be a noun or a verb (plural, present tense)
Parts-of-Speech
• Classes of words that behave alike:
• Appear in similar contexts.
• Perform a similar grammatical function in the sentence.
• Undergo similar morphological transformations.
• ~9 traditional parts-of-speech:
• noun, pronoun, determiner, adjective, verb, adverb,
preposition, conjunction, interjection
Syntactic Ambiguities and
Parts-of-Speech
• Time (N or V?) flies (N or V?) like (V or preposition?) an arrow.
Syntactic Ambiguities and
Parts-of-Speech
• [Time/N flies/N]NP like/V an arrow.
Why do we need P.O.S.?
• Interacts with most levels of linguistic representation.
• Speech processing:
• OBject (noun) vs. obJECT (verb)
• CONtent (noun) vs. conTENT (adjective)
(the stress pattern depends on the part of speech)
• Syntactic parsing
• …
• P.O.S. tag-set should contain morphological and maybe syntactic
information.
Penn Treebank Tagset
CC: Coordinating conjunction
CD: Cardinal number
DT: Determiner
EX: Existential there
FW: Foreign word
IN: Preposition or subordinating conjunction
JJ: Adjective
JJR: Adjective, comparative
JJS: Adjective, superlative
LS: List item marker
MD: Modal
NN: Noun, singular or mass
NNS: Noun, plural
NNP: Proper noun, singular
NNPS: Proper noun, plural
PDT: Predeterminer
POS: Possessive ending
PRP: Personal pronoun
PRP$: Possessive pronoun
RB: Adverb
RBR: Adverb, comparative
RBS: Adverb, superlative
RP: Particle
SYM: Symbol
TO: to
UH: Interjection
VB: Verb, base form
VBD: Verb, past tense
VBG: Verb, gerund or present participle
VBN: Verb, past participle
VBP: Verb, non-3rd person singular present
VBZ: Verb, 3rd person singular present
WP: Wh-pronoun
WP$: Possessive wh-pronoun
WRB: Wh-adverb
(plus punctuation symbols)
P.O.S. Tagsets
• Tagsets are language-specific.
• Some languages capture more morphological information,
which should be reflected in the tag set.
• “Universal Part Of Speech Tags?”
• Petrov et al. 2011: mapping of 25 language-specific tagsets
to a common set of 12 universal tags.
Part-of-Speech Tagging
• Goal: Assign a part-of-speech label to each word in a
sentence.
DT NN VBD DT NNS IN DT NN .
the koala put the keys on the table .
• This is an example of a sequence labeling task.
• Think of this as a translation task from a sequence of
words (w1, w2, …, wn) ∈ V* to a sequence of tags
(t1, t2, …, tn) ∈ T*.
Determining Part-of-Speech
• A blue seat / A child seat: noun or adj?
• Syntactic tests:
• A very blue seat vs. *A very child seat
• This seat is blue vs. *This seat is child
• Morphological tests:
• bluer vs. *childer
Determining Part-of-Speech
• Preposition or Particle?
• He threw out the garbage. / He threw the garbage out.
→ out is a particle
• He threw the garbage out the door. / *He threw the garbage the door out.
→ out is a preposition
Part-of-Speech Tagging
• Goal: Translate from a sequence of words
(w1, w2, …, wn) ∈ V* to a sequence of tags
(t1, t2, …, tn) ∈ T*.
• NLP is full of translation problems from one structure to
another. Basic solution:
• For each translation step:
1. Construct the search space of possible translations.
2. Find the best path through this space (decoding) according
to some performance measure.
Bayesian Inference for
Sequence Labeling
• Recall Bayesian Inference (Generative Models): Given
some observation, infer the value of some hidden variable.
(see Naive Bayes)
• We can apply this approach to sequence labeling:
• Assume each word wi in the observed sequence
(w1, w2, …, wn) ∈ V* was generated by some hidden
variable ti.
• Infer the most likely sequence of hidden variables given
the sequence of observed words.
Noisy Channel Model
[Diagram: a tag sequence, e.g. “NN VBZ IN DT NN”, is generated with probability P(tags) and sent through a noisy channel, which emits the observed sentence “time flies like an arrow” with probability P(words | tags).]
• Goal: figure out what the original input to the channel
was. Use Bayes’ rule:
argmax_{tags} P(tags | words) = argmax_{tags} P(words | tags) ⋅ P(tags)
• This model is used widely (speech recognition, MT, …)
Hidden Markov Models (HMMs)
• Generative (Bayesian) probability model.
• Observations: sequences of words.
• Hidden states: sequences of part-of-speech tags.
hidden:   START  NN   VBZ   IN   DT  NN
observed:        time flies like an  arrow
• The hidden sequence is generated by an n-gram language
model (typically a bigram model), with t0 = START.
Markov Chains
[Figure: a Markov chain over the tag states start, DT, NN, IN, VBZ, with transition probabilities (0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 1.0, …) labeling the edges.]
• A Markov chain is a sequence of random variables X1, X2, …
• The domain of these variables is a set of states.
• Markov assumption: Next state depends only on current state.
• This is a special case of a weighted finite state automaton (WFSA).
Hidden Markov Models (HMMs)
• There are two types of probabilities:
Transition probabilities and Emission Probabilities.
[Figure: start → t1 → t2 → t3 linked by transition probabilities; each hidden state ti emits the word wi with an emission probability.]
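In formulas (a standard factorization that the figure implies, with t0 = START):

P(w1, …, wn, t1, …, tn) = ∏_{i=1..n} P(ti | ti-1) ⋅ P(wi | ti)

where the P(ti | ti-1) are the transition probabilities and the P(wi | ti) are the emission probabilities.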
Important Tasks on HMMs
• Decoding: Given a sequence of words, find the most likely
tag sequence.
(Bayesian inference using the Viterbi algorithm)
• Evaluation: Given a sequence of words, find the
total probability of this word sequence given an HMM.
Note that we can view the HMM as another type of language
model. (Forward algorithm)
• Training: Estimate emission and transition probabilities from
training data. (MLE, Forward-Backward a.k.a. Baum-Welch
algorithm)
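For the supervised case (tagged training data), the MLE estimates are simple relative frequencies over the corpus (a standard result, stated here for completeness):

P(t | t') = count(t', t) / count(t')        P(w | t) = count(t, w) / count(t)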
Decoding HMMs
[Trellis: one column of candidate tags (VBZ, IN, NN, DT) for each word of "time flies like an arrow"; edges connect the tags of adjacent words.]
Goal: Find the path with the highest total probability (given the words)
There are d^n possible paths for n words and d tags.
Viterbi Algorithm
• Input: Sequence of observed words w1, …, wn
• Create a table π, such that each entry π[k,t] contains the score of the
highest-probability tag sequence ending in tag t at time k.
• initialize π[0,start] = 1.0 and π[0,t] = 0.0 for all tags t ∈ T.
• for k = 1 to n:
• for t ∈ T:
• π[k,t] = max_{t'} π[k-1,t'] ⋅ P(t | t') ⋅ P(wk | t)
(transition probability P(t | t'), emission probability P(wk | t); keep a backpointer to the maximizing t')
• return max_{t ∈ T} π[n,t], following the backpointers to recover the best tag sequence
Emission Probabilities
• P(time | VBZ) = 0.2
P(flies | VBZ) = 0.3
P(like | VBZ) = 0.5
• P(time | NN) = 0.3
P(flies | NN) = 0.2
P(arrow | NN) = 0.5
• P(like | IN) = 1.0
• P(an | DT) = 1.0
Viterbi Algorithm: Worked Example
• Idea: Because of the Markov assumption, we only need
the probabilities for Xn to compute the probabilities for Xn+1.
This suggests a dynamic programming algorithm.
• Transition probabilities used in the example (all others are 0):
P(NN | start) = 0.2, P(VBZ | start) = 0.1,
P(VBZ | NN) = 0.6, P(NN | NN) = 0.2, P(IN | NN) = 0.2,
P(NN | VBZ) = 0.4, P(IN | VBZ) = 0.2, P(DT | VBZ) = 0.4,
P(DT | IN) = 0.7, P(NN | DT) = 1.0
• Filling the table column by column for "time flies like an arrow"
(entries not shown are 0):
k=1 (time): π[1,NN] = .2 × .3 = .06; π[1,VBZ] = .1 × .2 = .02
k=2 (flies): π[2,VBZ] = .06 × .6 × .3 = .0108;
π[2,NN] = max(.02 × .4 × .2 = .0016, .06 × .2 × .2 = .0024) = .0024
k=3 (like): π[3,VBZ] = .0024 × .6 × .5 = .00072;
π[3,IN] = max(.0108 × .2 × 1 = .00216, .0024 × .2 × 1 = .00048) = .00216
k=4 (an): π[4,DT] = max(.00072 × .4 × 1 = .000288, .00216 × .7 × 1 = .001512) = .001512
k=5 (arrow): π[5,NN] = .001512 × 1.0 × .5 = .000756
• Following the backpointers from π[5,NN] recovers the best path:
time/NN flies/VBZ like/IN an/DT arrow/NN
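A minimal Python sketch of this recurrence, using the toy tables above (the dictionary encoding, function name, and the convention that missing entries mean probability 0 are my own assumptions):

transition = {
    ("start", "NN"): 0.2, ("start", "VBZ"): 0.1,
    ("NN", "VBZ"): 0.6, ("NN", "NN"): 0.2, ("NN", "IN"): 0.2,
    ("VBZ", "NN"): 0.4, ("VBZ", "IN"): 0.2, ("VBZ", "DT"): 0.4,
    ("IN", "DT"): 0.7, ("DT", "NN"): 1.0,
}
emission = {
    ("VBZ", "time"): 0.2, ("VBZ", "flies"): 0.3, ("VBZ", "like"): 0.5,
    ("NN", "time"): 0.3, ("NN", "flies"): 0.2, ("NN", "arrow"): 0.5,
    ("IN", "like"): 1.0, ("DT", "an"): 1.0,
}
TAGS = ["NN", "VBZ", "IN", "DT"]

def viterbi(words):
    # pi[k][t] = probability of the best tag sequence ending in tag t at position k
    pi = [{"start": 1.0}]
    back = [{}]
    for k, w in enumerate(words, 1):
        pi.append({})
        back.append({})
        for t in TAGS:
            # maximize pi[k-1][t'] * P(t | t') * P(w | t) over previous tags t'
            best_prev, best = None, 0.0
            for prev, prev_score in pi[k - 1].items():
                score = (prev_score
                         * transition.get((prev, t), 0.0)   # transition probability
                         * emission.get((t, w), 0.0))       # emission probability
                if score > best:
                    best_prev, best = prev, score
            if best_prev is not None:
                pi[k][t] = best
                back[k][t] = best_prev          # backpointer to the maximizing t'
    # pick the best final tag and follow the backpointers
    best_tag = max(pi[-1], key=pi[-1].get)
    score = pi[-1][best_tag]
    tags, t = [best_tag], best_tag
    for k in range(len(words), 1, -1):
        t = back[k][t]
        tags.append(t)
    return list(reversed(tags)), score

print(viterbi("time flies like an arrow".split()))
# expected: (['NN', 'VBZ', 'IN', 'DT', 'NN'], ~0.000756), matching the table above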
Trigram Language Model
• Instead of using a unigram context P(ti | ti-1), use a bigram
context P(ti | ti-2, ti-1).
• Think of this as having states that represent pairs of tags.
• So the HMM probability for a given tag and word sequence
is:
P(w1, …, wn, t1, …, tn) = ∏_{i=1..n} P(ti | ti-2, ti-1) ⋅ P(wi | ti)
• Need to handle data sparseness when estimating transition
probabilities (for example using backoff or linear
interpolation).
More POS tagging tricks
• It is also often useful in practice to add an end-of-sentence marker (just like we
did for n-gram language models):
P(w1, …, wn, t1, …, tn) = [∏_{i=1..n} P(ti | ti-2, ti-1) ⋅ P(wi | ti)] ⋅ P(STOP | tn-1, tn)
where t-1 = t0 = START and tn+1 = STOP.
• Another useful trick is to replace rare words with “pseudo-words” representing an
entire class, as in the sketch below.
• For example: replace {“01”, “85”, “90”, …} with twoDigitNumber
replace {“1985”, “2018”, …} with fourDigitNumber
replace {“1”, “1.0”, “234.3”, …} with otherNum
replace {“IBM”, “DNC”, …} with allCaps, etc.
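A minimal sketch of such a word-class mapping (the regular expressions and the function name are illustrative assumptions; the classes are the ones listed above):

import re

def pseudo_word(word):
    # Map a word to a coarse class; classes follow the slide's examples.
    if re.fullmatch(r"\d{2}", word):
        return "twoDigitNumber"
    if re.fullmatch(r"\d{4}", word):
        return "fourDigitNumber"
    if re.fullmatch(r"\d+(\.\d+)?", word):
        return "otherNum"
    if word.isupper():
        return "allCaps"
    return word

print([pseudo_word(w) for w in ["01", "1985", "234.3", "IBM", "flies"]])
# expected: ['twoDigitNumber', 'fourDigitNumber', 'otherNum', 'allCaps', 'flies']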
Using a smoothed trigram HMM with these tricks, we can build a tagger that is
close to the state of the art (~97% accuracy on the Penn Treebank).
HMMs as Language Models
• We can also use an HMM as a language model (language
generation, MT, …), i.e. evaluate P(w1, …, wn) for a
given sentence.
What is the advantage over a plain word n-gram model?
• Problem: There are many tag sequences that could have
generated w1, …, wn.
• This is an example of spurious ambiguity.
• Need to compute:
P(w1, …, wn) = Σ_{t1, …, tn} P(w1, …, wn, t1, …, tn)
Forward Algorithm
• Input: Sequence of observed words w1, …, wn
• Create a table π, such that each entry π[k,t] contains the total
probability of all tag sequences ending in tag t at time k.
• initialize π[0,start] = 1.0 and π[0,t] = 0.0 for all tags t ∈ T.
• for k = 1 to n:
• for t ∈ T:
• π[k,t] = Σ_{t'} π[k-1,t'] ⋅ P(t | t') ⋅ P(wk | t)
• return Σ_{t ∈ T} π[n,t] = P(w1, …, wn)
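This is the Viterbi dynamic program with max replaced by sum; a minimal sketch, assuming the transition, emission, and TAGS tables from the Viterbi sketch above are in scope:

def forward(words):
    # pi[k][t] = total probability of all tag sequences ending in t at position k
    pi = [{"start": 1.0}]
    for k, w in enumerate(words, 1):
        pi.append({})
        for t in TAGS:
            total = sum(prev_score
                        * transition.get((prev, t), 0.0)   # transition probability
                        * emission.get((t, w), 0.0)        # emission probability
                        for prev, prev_score in pi[k - 1].items())
            if total > 0.0:
                pi[k][t] = total
    return sum(pi[-1].values())  # P(w1 ... wn) under the HMM

print(forward("time flies like an arrow".split()))
# sums over all tag sequences, so it exceeds the single best path: ~0.001276 here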
Named Entity Recognition as
Sequence Labeling
• Use 3 tags:
• O - outside of named entity
• I - inside named entity
• B - first word (beginning) of named entity
… identification/O of/O tetronic/B acid/I in/O …
• Other encodings are possible (for example, NE-type
specific tags such as B-PER and I-PER).
• This can also be used for phrase chunking.
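A minimal sketch of producing these tags from entity spans (the function name and the (start, end) span representation are illustrative assumptions):

def bio_encode(tokens, entity_spans):
    # entity_spans: (start, end) token index pairs, end exclusive
    tags = ["O"] * len(tokens)
    for start, end in entity_spans:
        tags[start] = "B"                  # first word of the entity
        for i in range(start + 1, end):
            tags[i] = "I"                  # inside the entity
    return tags

tokens = ["identification", "of", "tetronic", "acid", "in"]
print(bio_encode(tokens, [(2, 4)]))
# expected: ['O', 'O', 'B', 'I', 'O'], matching the example above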