Part-Of-Speech (POS) Tagging
Synchronic Model of Language
Pragmatic
Discourse
Semantic
Syntactic
Lexical
Morphological
What is Part-Of-Speech Tagging?
• The general purpose of a part-of-speech tagger is to associate each word in a text with its correct lexical-syntactic category (represented by a tag)
03/14/1999 (AFP)… the extremist Harkatul Jihad group,
reportedly backed by Saudi dissident Osama bin Laden ...
… the|DT extremist|JJ Harkatul|NNP Jihad|NNP group|NN ,|, reportedly|RB backed|VBD by|IN Saudi|NNP dissident|NN Osama|NNP bin|NN Laden|NNP …
What are Parts-of-Speech?
• Approximately 8 traditional basic word classes, sometimes called lexical classes or types
• These are the ones taught in grade school grammar
– N noun chair, bandwidth, pacing
– V verb study, debate, munch
– ADJ adjective purple, tall, ridiculous (includes articles)
– ADV adverb unfortunately, slowly
– P preposition of, by, to
– CON conjunction and, but
– PRO pronoun I, me, mine
– INT interjection um
Open Class Words
• Open classes – can add words to these basic word classes:
• Nouns, Verbs, Adjectives, Adverbs.
– Every known human language has nouns and verbs
• Nouns: people, places, things
– Classes of nouns
• proper vs. common
• count vs. mass
– Properties of nouns: can be preceded by a determiner, etc.
• Verbs: actions and processes
• Adjectives: properties, qualities
• Adverbs: hodgepodge!
– Unfortunately, John walked home extremely slowly yesterday
• Numerals: one, two, three, third, …
Closed Class Words
• Closed classes – words are not added to these classes:
– determiners: a, an, the
– pronouns: she, he, I
– prepositions: on, under, over, near, by, …
– over the river and through the woods
– particles: up, down, on, off, …
• Used with verbs, and have a slightly different meaning than when used as a preposition
– she turned the paper over
• Closed class words are often function words which have
structuring uses in grammar:
– of, it, and, you
• Differ more from language to language than open class words
Open and Closed Classes
• We may want to make more distinctions than 8 classes:
• Open class (lexical) words:
– Nouns: proper (IBM, Italy), common (cat / cats, snow)
– Verbs: main (see, registered)
– Adjectives: old, older, oldest
– Adverbs: slowly
– Numbers: 122,312, one
– … more
• Closed class (functional) words:
– Determiners: the, some
– Modals: can, had
– Prepositions: to, with
– Conjunctions: and, or
– Particles: off, up
– Pronouns: he, its
– Interjections: Ow, Eh
– … more
Prepositions from CELEX
• From the CELEX on-line dictionary with frequencies from
the COBUILD corpus
Charts from Jurafsky and Martin text
English Single-Word Particles
Pronouns in CELEX
Conjunctions
Auxiliary Verbs
Possible Tag Sets for English
• Kucera & Brown (Brown Corpus) – 87 POS tags
• C5 (British National Corpus) – 61 POS tags
– Tagged by Lancaster’s UCREL project
• Penn Treebank – 45 POS tags
– Most widely used of the tag sets today
Penn Treebank
• A corpus containing:
– over 1.6 million words of hand-parsed material from the
Dow Jones News Service, plus an additional 1 million
words tagged for part-of-speech.
– the first fully parsed version of the Brown Corpus, which
has also been completely retagged using the Penn
Treebank tag set.
– source code for several software packages which permit the user to search for specific constituents in tree structures.
• Costs $1,250 to $2,500 for research use
• Separate licensing needed for commercial use
Word Classes: Penn Treebank Tag Set
(full tag table not reproduced here; it includes, e.g., PRP for personal pronoun and PRP$ for possessive pronoun)
Example of Penn Treebank Tagging of
Brown Corpus Sentence
• The/DT grand/JJ jury/NN commented/VBD on/IN a/DT
number/NN of/IN other/JJ topics/NNS ./.
• VB DT NN .
Book that flight .
• VBZ DT NN VB NN ?
Does that flight serve dinner ?
Why is Part-Of-Speech Tagging Hard?
• Words may be ambiguous in different ways:
– A word may have multiple meanings as the same part-of-speech
• file – noun, a folder for storing papers
• file – noun, instrument for smoothing rough edges
– A word may function as multiple parts-of-speech
• a round table: adjective
• a round of applause: noun
• to round out your interests: verb
• to work the year round: adverb
Why is Part-Of-Speech Tagging Needed?
• May be useful to know what function the word plays, instead
of depending on the word itself.
• Internally, next higher levels of NL Processing:
– Phrase Bracketing
• Can write regexps like (Det) Adj* N+ over the output for phrases, etc.
– Parsing
• As input to or to speed up a full parser
• If you know the tag, you can back off to it in other tasks
– Semantics
• Applications that use POS tagging:
– Speech synthesis - Text-to-speech (how do we pronounce “lead”?)
– Information retrieval — stemming, selection of high-content words
– Word-sense disambiguation
– Machine Translation
– and others
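For example, phrase bracketing over tagger output can be sketched by running a regular expression such as (Det) Adj* N+ over the tag sequence. This is a minimal illustration; the helper name is made up:

```python
import re

# Sketch of phrase bracketing over tagger output (hypothetical helper):
# find noun phrases matching (Det) Adj* N+ by matching a regular
# expression against the space-joined tag sequence.
def bracket_noun_phrases(tagged):
    """tagged: list of (word, tag) pairs; returns the NP word spans."""
    tags = " ".join(tag for _, tag in tagged) + " "
    pattern = re.compile(r"(?:DT )?(?:JJ )*(?:NN[A-Z]* )+")
    phrases = []
    for m in pattern.finditer(tags):
        # Convert character offsets back to token indices by counting spaces.
        start = tags[:m.start()].count(" ")
        end = tags[:m.end()].count(" ")
        phrases.append(" ".join(w for w, _ in tagged[start:end]))
    return phrases

tagged = [("the", "DT"), ("grand", "JJ"), ("jury", "NN"),
          ("commented", "VBD"), ("on", "IN"), ("a", "DT"),
          ("number", "NN"), ("of", "IN"), ("other", "JJ"),
          ("topics", "NNS")]
print(bracket_noun_phrases(tagged))  # → ['the grand jury', 'a number', 'other topics']
```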
Overview of Approaches
• Rule-based Approach
– Simple and doesn’t require a tagged corpus, but not as accurate as
other approaches
• Stochastic Approach
– Refers to any approach which incorporates frequencies or
probabilities
– Requires a tagged corpus to learn frequencies
– N-gram taggers and Naïve Bayes taggers
– Hidden Markov Model (HMM) taggers
– . . .
• Other Issues: unknown words and evaluation
The Problem
• Words often have more than one word class: another
example is the word this
– This is a nice day = PRP
– This day is nice = DT
– You can go this far = RB
Word Class Ambiguity
(in the Brown Corpus)
• Unambiguous (1 tag): 35,340 word types
• Ambiguous (2-7 tags): 4,100 word types
2 tags 3,760
3 tags 264
4 tags 61
5 tags 12
6 tags 2
7 tags 1
(DeRose, 1988)
Rule-Based Tagging
• Uses a dictionary that gives possible tags for words
• Basic algorithm
– Assign all possible tags to words
– Remove tags according to set of rules of type:
• Example rule:
– if word+1 is an adj, adv, or quantifier and the following is a
sentence boundary and word-1 is not a verb like “consider” then
eliminate non-adv else eliminate adv.
– Typically more than 1000 hand-written rules, but may be machine-
learned
• This approach is not in serious use
N-gram Approach
• N-gram approach to probabilistic POS tagging:
– calculates the probability of a given sequence of tags occurring for a
sequence of words
– the best tag for a given word is determined by the (already
calculated) probability that it occurs with the n previous tags
– may be bi-gram, tri-gram, etc.
word_{-(n-1)} … word_{-2} word_{-1} word
tag_{-(n-1)} … tag_{-2} tag_{-1} ??
• Presented here as an introduction to HMM tagging
– And given in more detail in the NLTK
– In practice, bigram and trigram probabilities have the problem that
the combinations of words are sparse in the corpus
– Combine the taggers with a backoff approach
N-gram Tagging
• Initialize a tagger by learning probabilities from a tagged
corpus
word_{-(n-1)} … word_{-2} word_{-1} word
tag_{-(n-1)} … tag_{-2} tag_{-1} ??
– Probability that the sequence … tag_{-2} tag_{-1} word gives tag XX
– Note that initial sequences will include a start marker as part of the
sequence
• Use the tagger to tag word sequences (usually of length 2-3)
with unknown tags
– Sequence through the words:
• To determine the POS tag for the next word, use the previous
n-1 tags and the word to look up probabilities and use the
highest probability tag
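To make the procedure concrete, here is a toy bigram tagger with unigram backoff in plain Python. The two-sentence training corpus and the function names are invented for illustration; a real tagger would train on a large tagged corpus:

```python
from collections import Counter, defaultdict

# Minimal bigram tagger with unigram backoff (a sketch, not a real system).
# Counts of (previous tag, word) -> tag are learned from tagged sentences.
train = [
    [("book", "VB"), ("that", "DT"), ("flight", "NN")],
    [("that", "DT"), ("book", "NN"), ("is", "VBZ"), ("good", "JJ")],
]

bigram = defaultdict(Counter)   # (prev_tag, word) -> Counter of tags
unigram = defaultdict(Counter)  # word -> Counter of tags
for sent in train:
    prev = "<s>"                # start-of-sentence marker
    for word, tag_ in sent:
        bigram[(prev, word)][tag_] += 1
        unigram[word][tag_] += 1
        prev = tag_

def tag(words, default="NN"):
    out, prev = [], "<s>"
    for w in words:
        if bigram[(prev, w)]:            # bigram context seen in training
            t = bigram[(prev, w)].most_common(1)[0][0]
        elif unigram[w]:                 # back off to the unigram counts
            t = unigram[w].most_common(1)[0][0]
        else:                            # unknown word: default tag
            t = default
        out.append((w, t))
        prev = t
    return out

print(tag(["book", "that", "flight"]))  # → [('book', 'VB'), ('that', 'DT'), ('flight', 'NN')]
```

Note how "book" gets VB after the start marker but would get NN after a DT, which is exactly the context sensitivity the bigram counts provide.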
Need Longer Sequence Classification
• A more comprehensive approach to tagging
considers the entire sequence of words
– Secretariat is expected to race tomorrow
• What is the best sequence of tags which
corresponds to this sequence of observations?
• Probabilistic view:
– Consider all possible sequences of tags
– Out of this universe of sequences, choose the tag
sequence which is most probable given the observation
sequence of n words w1…wn.
Thanks to Jim Martin’s online class slides for the examples and equation typesetting in this section on HMMs.
Road to HMMs
• We want, out of all sequences of n tags t_1…t_n, the single tag sequence such that P(t_1…t_n | w_1…w_n) is highest.
– i.e. the probability of the tag sequence t_1…t_n given the word sequence w_1…w_n
t̂_1^n = argmax_{t_1…t_n} P(t_1…t_n | w_1…w_n)
• Hat ^ means “our estimate of the best one”
• Argmaxx f(x) means “the x such that f(x) is maximized”
– i.e. find the tag sequence that maximizes the probability
Road to HMMs
• This equation is guaranteed to give us the best tag sequence
• But how to make it operational? How to compute this
value?
• Intuition of Bayesian classification:
– Use Bayes rule to transform into a set of other probabilities that are
easier to compute
Thomas Bayes, 1701-1761
Using Bayes Rule
• Bayes rule: P(x | y) = P(y | x) P(x) / P(y)
• Applying Bayes rule: t̂_1^n = argmax_{t_1…t_n} P(w_1…w_n | t_1…t_n) P(t_1…t_n) / P(w_1…w_n)
• Note that this uses the conditional probability of the words given the tags: given a tag, what is the most likely word with that tag?
– Eliminate the denominator, as it is the same for every tag sequence
Likelihood and Prior
• Further simplify
• Likelihood: assume that the probability of the word depends only on its
tag
• Prior: use the bigram assumption that the tag only depends on the
previous tag
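The two simplifying assumptions can be written out as equations (reconstructed here in standard HMM-tagging notation, following Jurafsky and Martin, since the original slide formulas are not reproduced in this text):

```latex
% Likelihood: each word depends only on its own tag
P(w_1^n \mid t_1^n) \approx \prod_{i=1}^{n} P(w_i \mid t_i)

% Prior: bigram assumption, each tag depends only on the previous tag
P(t_1^n) \approx \prod_{i=1}^{n} P(t_i \mid t_{i-1})

% Combining the two (the denominator P(w_1^n) has already been dropped):
\hat{t}_1^n = \operatorname*{argmax}_{t_1^n} \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1})
```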
Two Sets of Probabilities (1)
• Tag transition probabilities P(t_i | t_{i-1}) (priors)
– Determiners likely to precede adjs and nouns
• That/DT flight/NN
• The/DT yellow/JJ hat/NN
• So we expect P(NN|DT) and P(JJ|DT) to be high
– Compute P(NN|DT) by counting in a labeled corpus:
P(NN|DT) = Count(DT NN sequence) / Count(DT)
Two Sets of Probabilities (2)
• Word likelihood probabilities P(w_i | t_i)
– VBZ (3sg Pres verb) likely to be “is”
– Compute P(is|VBZ) by counting in a labeled corpus:
P(is|VBZ) = Count(is tagged with VBZ) / Count(VBZ)
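Both sets of counts can be sketched in a few lines of Python (the two-sentence corpus is invented for illustration; a real tagger would also model sentence-end transitions):

```python
from collections import Counter

# Sketch: estimate P(t_i | t_{i-1}) = C(t_{i-1} t_i) / C(t_{i-1})
# and P(w_i | t_i) = C(t_i, w_i) / C(t_i) from a toy labeled corpus.
corpus = [[("that", "DT"), ("flight", "NN")],
          [("the", "DT"), ("yellow", "JJ"), ("hat", "NN")]]

tag_bigrams, tag_counts, emissions = Counter(), Counter(), Counter()
for sent in corpus:
    prev = "<s>"
    for word, tag in sent:
        tag_bigrams[(prev, tag)] += 1
        tag_counts[prev] += 1
        emissions[(tag, word)] += 1
        prev = tag
    # Count the sentence-final tag too, so emission denominators are
    # complete (a full model would also add an end-of-sentence transition).
    tag_counts[prev] += 1

def p_transition(t, prev):        # P(t | prev)
    return tag_bigrams[(prev, t)] / tag_counts[prev]

def p_emission(w, t):             # P(w | t)
    return emissions[(t, w)] / tag_counts[t]

print(p_transition("NN", "DT"))   # C(DT NN) / C(DT) = 1/2
print(p_emission("flight", "NN"))  # C(NN, flight) / C(NN) = 1/2
```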
An Example: the verb “race”
• Secretariat/NNP is/VBZ expected/VBN to/TO race/VB
tomorrow/NR
• People/NNS continue/VB to/TO inquire/VB the/DT reason/NN
for/IN the/DT race/NN for/IN outer/JJ space/NN
• How do we pick the right tag?
Disambiguating “race”
Which tag sequence is most likely?
Example
• The equations only differ in “to race tomorrow”
• Tag transition probabilities for the tag after TO:
– P(NN|TO) = .00047
– P(VB|TO) = .83
• Lexical likelihoods from the Brown corpus for “race” given POS tag NN or VB:
– P(race|NN) = .00057
– P(race|VB) = .00012
• Tag transition probabilities for NR given the previous tag, verb or noun:
– P(NR|VB) = .0027
– P(NR|NN) = .0012
• P(VB|TO) P(NR|VB) P(race|VB) = .00000027
• P(NN|TO) P(NR|NN) P(race|NN) = .00000000032
• So we (correctly) choose the verb tag.
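As a quick arithmetic check, the two products can be verified directly (probability values as quoted above):

```python
# Scores for the two candidate taggings of "race" in this sentence.
p_vb = 0.83 * 0.0027 * 0.00012     # P(VB|TO) * P(NR|VB) * P(race|VB)
p_nn = 0.00047 * 0.0012 * 0.00057  # P(NN|TO) * P(NR|NN) * P(race|NN)

print(f"{p_vb:.2e}")  # → 2.69e-07
print(f"{p_nn:.2e}")  # → 3.21e-10
print(p_vb > p_nn)    # → True: VB wins by roughly three orders of magnitude
```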
In-class Exercise
Hidden Markov Models
• What we’ve described with these two kinds of probabilities
is a Hidden Markov Model
– The observations are the sequence of words, and the hidden states are the POS tags for each word.
• When we evaluated the probabilities by hand for a sentence,
we could pick the optimum tag sequence
• But in general, we need an optimization algorithm to most
efficiently pick the best tag sequence without computing all
possible combinations of probabilities
Tag Transition Probabilities for an HMM
• The HMM hidden states can be represented in a graph
where the edges are the transition probabilities between
POS tags.
Observation likelihoods for POS HMM
• For each POS tag, give words with probabilities
The A matrix for the POS HMM
• Example of tag transition probabilities represented in a
matrix, usually called the A matrix in an HMM:
– The probability that VB follows <s> is .019, …
The B matrix for the POS HMM
• Word likelihood probabilities are represented in a matrix,
where for each tag, we show the probability that a word has
that tag
Using HMMs for POS tagging
• From the tagged corpus, create a tagger by computing the
two matrices of probabilities, A and B
– Straightforward for bigram HMM
– For higher-order HMMs, efficiently compute the matrices by the forward-backward algorithm
• To apply the HMM tagger to unseen text, we must find the
best sequence of transitions
– Given a sequence of words, find the sequence of states (POS tags)
with the highest probabilities along the path
– This task is sometimes called “decoding”
– Use the Viterbi algorithm
Viterbi intuition: we are looking for the best ‘path’
Each word has states representing its possible POS tags, e.g. for “promised to back the bill”:
– promised: VBN, VBD
– to: TO
– back: RB, NN, VB, JJ
– the: DT
– bill: NN, VB
Viterbi example
Each pair of tags is labeled with an edge giving the transition probability.
Each tag state is labeled with a Viterbi value: the max over the previous word’s states of (its Viterbi value * transition probability * word likelihood), representing the best path to this node.
Viterbi Algorithm sketch
• This algorithm fills in the elements of the array viterbi in the
previous slide (cols are words, rows are states (POS tags))
function Viterbi
for each state s, compute the initial column
viterbi[s, 1] = A[0, s] * B[s, word1]
for each word w from 2 to N (length of sequence)
for each state s, compute the column for w
viterbi[s, w] = max over s' ( viterbi[s', w-1] * A[s', s] * B[s, w] )
<save back pointer to trace final path>
return the trace of back pointers
where A is the matrix of state transitions
and B is the matrix of state/word likelihoods
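The sketch above can be made runnable. Below is a minimal Python version; the toy A and B tables are invented for illustration, and a real tagger would estimate them from a tagged corpus as described earlier:

```python
# Runnable version of the Viterbi sketch (toy probabilities, for illustration).
def viterbi(words, states, A, B):
    """A[s_prev][s] = transition prob; B[s][w] = emission prob.
    "<s>" is the start state. Returns the best tag sequence."""
    # Initial column: start-state transition times first-word likelihood.
    V = [{s: A["<s>"].get(s, 0.0) * B[s].get(words[0], 0.0) for s in states}]
    back = [{}]
    for w in words[1:]:
        col, ptr = {}, {}
        for s in states:
            # Best previous state for reaching s at this word.
            best_prev = max(states, key=lambda p: V[-1][p] * A[p].get(s, 0.0))
            col[s] = V[-1][best_prev] * A[best_prev].get(s, 0.0) * B[s].get(w, 0.0)
            ptr[s] = best_prev          # save back pointer
        V.append(col)
        back.append(ptr)
    # Trace back pointers from the best final state.
    s = max(states, key=lambda t: V[-1][t])
    path = [s]
    for ptr in reversed(back[1:]):
        s = ptr[s]
        path.append(s)
    return list(reversed(path))

states = ["DT", "NN", "VB"]
A = {"<s>": {"VB": 0.4, "DT": 0.4, "NN": 0.2},
     "VB": {"DT": 0.6, "NN": 0.3, "VB": 0.1},
     "DT": {"NN": 0.8, "VB": 0.1, "DT": 0.1},
     "NN": {"VB": 0.4, "NN": 0.3, "DT": 0.3}}
B = {"VB": {"book": 0.3}, "DT": {"that": 0.6}, "NN": {"flight": 0.2, "book": 0.1}}
print(viterbi(["book", "that", "flight"], states, A, B))  # → ['VB', 'DT', 'NN']
```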
Recall HMM
• So an HMM POS tagger computes the A matrix of tag transition probabilities and the B matrix of tag/word likelihood probabilities from a (training) corpus
• Then for each sentence that we want to tag, it uses the
Viterbi algorithm to find the path of the best sequence of
tags to fit that sentence.
• This is an example of a sequential classifier.
Evaluation: Is our POS tagger any good?
• Answer: we use a manually tagged corpus, which we will
call the “Gold Standard”
– We run our POS tagger on the gold standard and compare its
predicted tags with the gold tags
– We compute the accuracy (and other evaluation measures)
• Important: 100% is impossible even for human annotators.
– We estimate humans can do POS tagging at about 98% accuracy.
– Some tagging decisions are very subtle and hard to do:
• Mrs/NNP Shaefer/NNP never/RB got/VBD around/RP to/TO joining/VBG
• All/DT we/PRP gotta/VBN do/VB is/VBZ go/VB around/IN the/DT corner/NN
• Chateau/NNP Petrus/NNP costs/VBZ around/RB 250/CD
– The “Gold Standard” will have human mistakes; humans are subject to
fatigue, etc.
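A minimal sketch of this evaluation (toy data for illustration; a real evaluation would run over a held-out portion of the gold-standard corpus):

```python
# Accuracy against a gold standard: fraction of tokens whose predicted
# tag matches the gold tag.
def accuracy(predicted, gold):
    """Both are lists of (word, tag) pairs for the same token sequence."""
    assert len(predicted) == len(gold)
    correct = sum(1 for p, g in zip(predicted, gold) if p[1] == g[1])
    return correct / len(gold)

gold = [("Book", "VB"), ("that", "DT"), ("flight", "NN")]
pred = [("Book", "NN"), ("that", "DT"), ("flight", "NN")]
print(accuracy(pred, gold))  # 2 of 3 tags correct → 0.666...
```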
How can we improve our tagger?
• What are the main sources of information for our HMM
POS tagger?
– Knowledge of tags of neighboring words
– Knowledge of word tag probabilities
• man is rarely used as a verb….
• The latter proves the most useful, but the former also helps
• Unknown words can be a problem because we don’t have
this information
• And we are not including information about the features of
the words
Features of words
• Can do surprisingly well just looking at a word by itself:
– Word: the → DT (determiner)
– Lowercased word: Importantly → importantly → RB (adverb)
– Prefixes: unfathomable: un- → JJ (adjective)
– Suffixes: Importantly: -ly → RB; tangential: -al → JJ
– Capitalization: Meridian: CAP → NNP (proper noun)
– Word shapes: 35-year: d-x → JJ
• These properties can include information about the previous
or the next word(s)
– e.g. the word be appears to the left: pretty → JJ
• But not information about tags of the previous or next
words, unlike HMM
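A feature extractor of the kind described above might look like this sketch (the feature names are invented for illustration; note that it uses neighboring words but not neighboring tags):

```python
# Sketch of per-word feature extraction for a feature-based tagger.
def word_features(sent, i):
    w = sent[i]
    return {
        "word": w,
        "lower": w.lower(),
        "prefix2": w[:2],                    # e.g. "un-" signals JJ
        "suffix2": w[-2:],                   # e.g. "-ly" signals RB
        "is_capitalized": w[0].isupper(),    # signals NNP
        # Word shape: digits -> d, letters -> x, punctuation kept as-is,
        # so "35-year" -> "dd-xxxx".
        "shape": "".join("d" if c.isdigit() else "x" if c.isalpha() else c
                         for c in w),
        "prev_word": sent[i - 1] if i > 0 else "<s>",
        "next_word": sent[i + 1] if i < len(sent) - 1 else "</s>",
    }

sent = ["Prices", "hit", "35-year", "highs"]
print(word_features(sent, 2)["shape"])      # → "dd-xxxx"
print(word_features(sent, 0)["prev_word"])  # → "<s>"
```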
Feature-based Classifiers
• A feature-based classifier is an algorithm that will take a
word and assign a POS tag based on features of the word in
its context in the sentence.
• Many algorithms are used, just to name a few
– Naïve Bayes
– Maximum Entropy (MaxEnt)
– Support Vector Machines (SVM)
– We’ll be covering lots more about classifiers later in the course.
Overview of POS tagger Accuracies
• List produced by Chris Manning
• Rough accuracies: all words / unknown words
– Most freq tag: ~90% / ~50%
– Trigram HMM: ~95% / ~55% (most errors on unknown words)
– Maxent P(t|w): 93.7% / 82.6% (feature-based tagger)
– MEMM tagger: 96.9% / 86.9% (combines feature-based and HMM tagger)
– Bidirectional dependencies: 97.2% / 90.0%
– Upper bound: ~98% (human agreement)
Development process for features
• The tagged data should be separated into a training set and a
test set.
– The tagger is trained on the training set and evaluated on the test set
• May also hold out some data for development
– Evaluation numbers are not prejudiced by the training set
• If our feature-based tagger has errors, then we improve the
features.
– Suppose we incorrectly tag as as IN in the phrase as soon as, when
it should be RB:
PRP VBD IN RB IN PRP VBD .
They left as soon as he arrived .
– We could fix this with a feature that includes the next word.
POS taggers with online demos
• Many pages list downloadable taggers (and other resources)
such as this page from the Stanford NLP group and George
Dillon at U Washington
– [Link]
– [Link]
• There are not too many on-line taggers available for demos,
but here are two:
– Illinois (UIUC) tagger demo from the Cognitive Computation Group
– [Link] (colors!)
– Sequential tagger from U Penn using SVM classification
– [Link] (slow)
Conclusions
• Part-of-speech tagging is a doable task with high-performance results
• Contributes to many practical, real-world NLP applications and
is now used as a pre-processing module in most systems
• Computational techniques learned at this level can be applied
to NLP tasks at higher levels of language processing