1. SYNTACTIC PARSING
What is Syntactic Parsing?
Syntactic Parsing (also called parsing) is the process of:
Taking a sentence and analyzing its grammatical structure using a grammar.
It tells us:
✔ which words group together
✔ what their roles are
✔ the structure of the sentence (parse tree)
Example Sentence
“The boy eats an apple.”
A parser finds:
“The boy” → Noun phrase (NP)
“eats an apple” → Verb phrase (VP)
Entire sentence → Sentence (S)
Output is usually a parse tree.
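Such a parse tree can be represented programmatically. A minimal Python sketch for the example sentence, using nested tuples of the form (label, children...) — an illustrative representation, not a standard one:

```python
# Parse tree for "The boy eats an apple." as nested (label, children...) tuples.
tree = ("S",
        ("NP", ("Det", "The"), ("N", "boy")),
        ("VP", ("V", "eats"),
               ("NP", ("Det", "an"), ("N", "apple"))))

def leaves(t):
    """Return the words at the leaves of the tree, left to right."""
    if isinstance(t, str):          # a bare string is a word
        return [t]
    return [w for child in t[1:] for w in leaves(child)]

print(leaves(tree))  # ['The', 'boy', 'eats', 'an', 'apple']
```

Reading the leaves left to right recovers the original sentence, while the nesting records which words group together.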
Why is Parsing Important?
Machine Translation
Question Answering
Information Extraction
Grammar Checking
Sentiment Analysis
2. COCKE–YOUNGER–KASAMI (CKY) PARSING ALGORITHM
The CKY Algorithm is one of the most important algorithms in syntactic parsing.
2.1 What is the CKY Algorithm? (Very Simple)
CKY is a bottom-up parsing algorithm that parses sentences using:
Context-Free Grammar (CFG)
In Chomsky Normal Form (CNF)
It builds a dynamic programming table to find all possible parses.
2.2 Requirements of CKY
CKY works only with:
✔ CFG
✔ in CNF (Chomsky Normal Form)
CNF rules are of the form:
1. A → B C (B and C are non-terminals)
2. A → a (a is a terminal word)
No rule may have more than two symbols on the right-hand side.
2.3 CKY Parsing Table (Easy Explanation)
CKY uses a triangular table (matrix).
If the sentence length is n, the table is n × n, but only the upper-triangular half is used.
Cell (i, j) stores the non-terminals that can generate the substring from word i to word j.
2.4 CKY Algorithm Steps (Very Easy)
Step 1: Convert grammar to CNF
Example grammar:
S → NP VP
NP → Det N
VP → V NP
Det → “the”
N → “boy”
V → “saw”
Step 2: Fill diagonal cells (words of the sentence)
Sentence: “the boy saw”
Fill entries:
Cell(1,1) ← “the” → Det
Cell(2,2) ← “boy” → N
Cell(3,3) ← “saw” → V
Step 3: Fill upper cells bottom-up
Try all splits:
Example:
Substring (1,2): “the boy”
Check combinations:
Det + N → NP
So cell(1,2) = NP
Continue until the top cell contains S (with this toy grammar, VP → V NP requires an object NP, so a complete sentence such as “the boy saw the boy” is needed for S to appear).
Step 4: Accept the sentence if S is in table(1,n)
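The four steps can be sketched as a CKY recognizer in Python. The dictionaries below mirror the toy grammar from Step 1; the runnable check uses “the boy saw the boy”, which this grammar fully derives:

```python
# Toy CNF grammar from the example above.
unary = {              # terminal rules: word -> set of non-terminals
    "the": {"Det"},
    "boy": {"N"},
    "saw": {"V"},
}
binary = {             # binary rules A -> B C, keyed by (B, C)
    ("NP", "VP"): {"S"},
    ("Det", "N"): {"NP"},
    ("V", "NP"): {"VP"},
}

def cky_recognize(words):
    n = len(words)
    # table[i][j] = non-terminals deriving words[i..j] (0-indexed, inclusive)
    table = [[set() for _ in range(n)] for _ in range(n)]
    # Step 2: fill diagonal cells from the terminal rules
    for i, w in enumerate(words):
        table[i][i] = set(unary.get(w, ()))
    # Step 3: fill longer spans bottom-up, trying every split point k
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):
                for B in table[i][k]:
                    for C in table[k + 1][j]:
                        table[i][j] |= binary.get((B, C), set())
    # Step 4: accept if S derives the whole sentence
    return "S" in table[0][n - 1]

print(cky_recognize("the boy saw the boy".split()))  # True
```

Each cell combines every pair of adjacent sub-spans, so the whole table is filled in O(n³ · |grammar|) time.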
2.5 Advantages of CKY
Efficient dynamic programming
Finds all possible parses
Works well with PCFG (probabilistic version)
3. STATISTICAL PARSING BASICS
What is Statistical Parsing?
Statistical Parsing assigns probabilities to different parse trees and chooses the most likely one.
Why Statistical Parsing?
Grammar alone may produce many valid trees.
Example:
“I saw the man with a telescope.”
2 possible meanings → 2 parse trees.
Statistical parsing chooses the most probable interpretation.
Core Idea
Attach probabilities to:
Rules
Parses
Trees
Then use algorithms to compute the best parse.
4. PROBABILISTIC CONTEXT-FREE GRAMMAR (PCFG)
What is a PCFG?
A PCFG = CFG + Probability for each rule.
Each grammar rule has a probability:
Example:
NP → Det N 0.6
NP → NP PP 0.4
4.1 PCFG Rule Probability
P(A → β) = Count(A → β) / Count(A)
i.e., the number of times the rule appears divided by the total number of expansions of that non-terminal.
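This maximum-likelihood estimate can be sketched from rule counts; the counts below are hypothetical, chosen to reproduce the 0.6 / 0.4 probabilities of the example grammar:

```python
from collections import Counter

# Hypothetical treebank counts for expansions of NP.
counts = Counter({
    ("NP", ("Det", "N")): 60,   # NP -> Det N seen 60 times
    ("NP", ("NP", "PP")): 40,   # NP -> NP PP seen 40 times
})

# P(rule) = count(rule) / total expansions of its left-hand side.
total_np = sum(c for (lhs, _), c in counts.items() if lhs == "NP")
probs = {rule: c / total_np for rule, c in counts.items()}

print(probs[("NP", ("Det", "N"))])  # 0.6
```

The probabilities of all expansions of a given non-terminal sum to 1.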
4.2 Probability of a Parse Tree
Multiply the probabilities of all rules used in building the tree.
P(parse tree) = ∏ P(rule_i)
The most likely parse tree = tree with highest probability.
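The product rule in one line of Python; the three rule probabilities are illustrative values, not estimates from data:

```python
import math

# Probabilities of the rules used in one hypothetical parse tree,
# e.g. S -> NP VP (0.9), NP -> Det N (0.6), VP -> V NP (0.7).
rule_probs = [0.9, 0.6, 0.7]

# P(parse tree) = product of all rule probabilities used in the tree.
tree_prob = math.prod(rule_probs)
print(tree_prob)  # ~0.378
```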
4.3 Advantages of PCFG
Handles ambiguity
Gives quantitative ranking of parse trees
Used widely in early statistical NLP
5. PROBABILISTIC CKY PARSING OF PCFGs
This is the probabilistic version of the CKY algorithm.
✔ Same algorithm structure
✔ But instead of storing only symbols,
✔ each cell also stores their probabilities
✔ Choose the best parse tree using maximum probability
5.1 How Does Probabilistic CKY Work?
Step 1: For diagonal cells
For word “boy”:
N → boy P = 0.5
NP → boy P = 0.1
We store:
Cell(2,2):
N : 0.5
NP : 0.1
Step 2: For each upper cell
Combine possibilities:
Example:
Cell(1,2) from “the boy”:
Det (0.7) + N (0.5) → NP with rule probability 0.6
Total probability =
0.7 × 0.5 × 0.6 = 0.21
Store NP with probability 0.21.
Step 3: Continue filling table
For all spans (i, j) and all split points k:
P(A, i, j) = P(A → B C) × P(B, i, k) × P(C, k+1, j)
Pick the maximum probability if more than one derivation exists.
Step 4: Final parse
Top cell gives S with highest probability.
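The whole procedure can be sketched as a probabilistic (Viterbi) CKY in Python. The lexical probabilities reuse the worked example (Det 0.7, N 0.5, NP → Det N 0.6); the remaining probabilities are assumed for illustration:

```python
from collections import defaultdict

# Toy PCFG. Probabilities are illustrative assumptions, not treebank estimates.
lexical = {                       # word -> {A: P(A -> word)}
    "the": {"Det": 0.7},
    "boy": {"N": 0.5},
    "saw": {"V": 0.8},
}
binary = {                        # (B, C) -> {A: P(A -> B C)}
    ("NP", "VP"): {"S": 1.0},
    ("Det", "N"): {"NP": 0.6},
    ("V", "NP"): {"VP": 1.0},
}

def pcky(words):
    n = len(words)
    # best[i][j][A] = highest probability of A deriving words[i..j]
    best = [[defaultdict(float) for _ in range(n)] for _ in range(n)]
    # Step 1: diagonal cells from the lexical rules
    for i, w in enumerate(words):
        for A, p in lexical.get(w, {}).items():
            best[i][i][A] = p
    # Steps 2-3: fill longer spans bottom-up, keeping the max per non-terminal
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):
                for B, pb in best[i][k].items():
                    for C, pc in best[k + 1][j].items():
                        for A, pr in binary.get((B, C), {}).items():
                            p = pr * pb * pc        # P(rule) * P(B) * P(C)
                            if p > best[i][j][A]:   # Viterbi: keep the max
                                best[i][j][A] = p
    # Step 4: probability of the best S parse spanning the whole sentence
    return best[0][n - 1].get("S", 0.0)

print(pcky("the boy saw the boy".split()))  # best S probability (~0.03528)
```

Note that cell (0, 1) for “the boy” gets NP with 0.6 × 0.7 × 0.5 = 0.21, exactly the value computed in the worked example above.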
5.2 Advantages of Probabilistic CKY
Combines dynamic programming with probabilities
Finds the most probable parse, not just any parse
Works well with PCFG
Efficient and accurate
6. SHORT SUMMARY
1. Syntactic Parsing analyzes sentence grammatical structure and produces a parse tree.
2. CKY Algorithm is a bottom-up dynamic programming parser that works with a CFG in Chomsky Normal Form (CNF).
3. Statistical Parsing assigns probabilities to parse trees and selects the most likely one.
4. PCFG is a CFG where each production rule has a probability.
5. Probabilistic CKY Parsing extends CKY to PCFG by storing probabilities and choosing
the best parse tree using maximum likelihood.