Chapter Three
Context Free Languages
Context free grammar (CFG)
• A context-free grammar is a specification for the syntactic
structure of a programming language.
• A context-free grammar is a 4-tuple:
G = (V, T, P, S) where
– T is a finite set of terminals (a set of tokens)
– V is a finite set of non-terminals (syntactic variables)
– P is a finite set of productions of the form
A → α, where A is a non-terminal and
α is a string of terminals and non-terminals (including the empty
string)
– S ∈ V is a designated start symbol (one of the non-terminal
symbols)
Context-Free Grammars (CFG)
• There is a finite set of symbols that form the strings, i.e. there is a
finite alphabet. The alphabet symbols are called terminals.
• There is a finite set of variables, sometimes called
non-terminals or syntactic categories. Each variable represents a
language (i.e. a set of strings).
• One of the variables is the start symbol. Other variables may
exist to help define the language.
• There is a finite set of productions, or production rules, that
represent the recursive definition of the language.
A production consists of:
1. a single variable that is being defined, to the left of the
production symbol
2. the production symbol →
3. a string of zero or more terminals or variables, called the
body of the production. To form strings, we substitute the body
of one of a variable's productions wherever that variable appears.
• Definition: A CFG is G = (V, T, P, S) with productions of the
form A → β, where A ∈ V and β ∈ (V ∪ T)*.
• A language generated by a CFG is called a Context-Free
Language (CFL).
• Ex.
a) S → aB
   B → bA | b
   A → a
b) S → aB | A
   A → aA | a | CBA
   B → λ
   C → c
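To see which strings a grammar such as example (a) generates, one can enumerate leftmost derivations up to a length bound. The sketch below is illustrative only. It assumes single-character symbols, uppercase letters for variables, and no λ-productions (so a sentential form never shrinks and long forms can be pruned). The names `EXAMPLE_A` and `generate` are made up for this example.

```python
from collections import deque

# Example (a): S -> aB, B -> bA | b, A -> a
EXAMPLE_A = {"S": ["aB"], "B": ["bA", "b"], "A": ["a"]}

def generate(grammar, start="S", max_len=4):
    """Return all terminal strings of length <= max_len derivable from start.

    Assumes no lambda-productions, so forms longer than max_len are pruned.
    """
    words = set()
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        # Position of the leftmost variable, if any.
        idx = next((i for i, ch in enumerate(form) if ch.isupper()), None)
        if idx is None:
            words.add(form)  # only terminals left: a word of the language
            continue
        for body in grammar.get(form[idx], []):
            new_form = form[:idx] + body + form[idx + 1:]
            if len(new_form) <= max_len and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return words

print(sorted(generate(EXAMPLE_A)))
```

For example (a) the enumeration stops quickly because the grammar is not recursive.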
Derivation tree
• A derivation is a sequence of production-rule applications; a
derivation tree shows it as a tree.
• It is used to derive the input string.
• The root of the tree is S, and the derived string x is the sequence
of leaves read from left to right.
• Leftmost derivation: at each step, the leftmost non-terminal
is replaced.
• Rightmost derivation: at each step, the rightmost non-terminal
is replaced.
Parse tree
• A parse tree is a graphical representation of a derivation.
• It abstracts away the order in which productions are applied to
replace non-terminals.
• A parse tree corresponding to a derivation is a labeled tree in
which:
– the interior nodes are labeled by non-terminals,
– the leaf nodes are labeled by terminals, and
– the children of each internal node represent the
replacement of the associated non-terminal in one step
of the derivation.
Parse tree and Derivation
Grammar: E → E + E | E * E | ( E ) | - E | id
Let's examine this derivation:
E ⇒ -E ⇒ -(E) ⇒ -(E + E) ⇒ -(id + id)
[Figure: the parse tree built step by step, one tree per derivation step, ending with the leaves - ( id + id ).]
This is a top-down derivation
because we start building the
parse tree at the top.
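The derivation above can be replayed mechanically. The sketch below is only an illustration: `derive` and `STEPS` are hypothetical names, and the two `id` replacements that the slide shows as one step are listed separately. The same list of productions is applied once leftmost and once rightmost.

```python
VARIABLES = {"E"}

# Productions used, in order: E -> - E, E -> ( E ), E -> E + E, E -> id, E -> id
STEPS = [
    ("E", ["-", "E"]),
    ("E", ["(", "E", ")"]),
    ("E", ["E", "+", "E"]),
    ("E", ["id"]),
    ("E", ["id"]),
]

def derive(steps, start="E", leftmost=True):
    """Apply each production to the leftmost (or rightmost) non-terminal.

    Returns the list of sentential forms, starting with the start symbol.
    """
    form = [start]
    history = [" ".join(form)]
    for head, body in steps:
        positions = [i for i, sym in enumerate(form) if sym in VARIABLES]
        if not positions:
            raise ValueError("no non-terminal left to replace")
        i = positions[0] if leftmost else positions[-1]
        if form[i] != head:
            raise ValueError(f"expected {head} at position {i}")
        form[i:i + 1] = body  # replace the non-terminal by the production body
        history.append(" ".join(form))
    return history

print(" => ".join(derive(STEPS, leftmost=True)))
print(" => ".join(derive(STEPS, leftmost=False)))
```

Both runs end in the same string; they differ only in the intermediate form, where the leftmost run replaces the first E by id and the rightmost run replaces the second.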
Ambiguity
• If some string x has two different leftmost (or two
different rightmost) derivations, then the grammar
is said to be ambiguous. Otherwise it is
unambiguous.
• Equivalently, a grammar is ambiguous if it can produce more
than one parse tree for a particular sentence (a string
of terminals).
Ambiguity: example
E → E + E | E * E | ( E ) | - E | id
Construct a parse tree for the expression: id + id * id
[Figure: two parse trees for id + id * id built step by step, one with E + E at the root and one with E * E at the root.]
Which parse tree is correct?
Ambiguity: example…
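E → E + E | E * E | ( E ) | - E | id
Find a derivation for the expression: id + id * id
According to the grammar, both parse trees are correct.
[Figure: two parse trees for id + id * id. In the first, the root expands to E + E and the right E expands to E * E. In the second, the root expands to E * E and the left E expands to E + E.]
A grammar that produces more than one
parse tree for some input sentence is said
to be an ambiguous grammar.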
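The ambiguity can be checked by counting parse trees. The sketch below is one possible way to do this, not a standard library routine: `count_parse_trees` is a hypothetical helper that counts, for each span of the token list, how many trees derive it. It assumes the grammar has no λ-productions and no unit productions, which holds for the expression grammar above and guarantees that the recursion always moves to shorter spans.

```python
from functools import lru_cache

# E -> E + E | E * E | ( E ) | - E | id
GRAMMAR = {
    "E": [("E", "+", "E"), ("E", "*", "E"), ("(", "E", ")"), ("-", "E"), ("id",)],
}

def count_parse_trees(tokens, start="E"):
    """Count the distinct parse trees of the token sequence."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count_symbol(sym, i, j):
        # Number of trees with root sym whose leaves are tokens[i:j].
        if sym not in GRAMMAR:  # terminal: must match exactly one token
            return 1 if j - i == 1 and tokens[i] == sym else 0
        return sum(count_body(body, i, j) for body in GRAMMAR[sym])

    @lru_cache(maxsize=None)
    def count_body(body, i, j):
        # Number of ways the symbols in body derive tokens[i:j].
        if not body:
            return 1 if i == j else 0
        first, rest = body[0], body[1:]
        total = 0
        # first covers tokens[i:k]; every remaining symbol needs >= 1 token.
        for k in range(i + 1, j - len(rest) + 1):
            left = count_symbol(first, i, k)
            if left:
                total += left * count_body(rest, k, j)
        return total

    return count_symbol(start, 0, len(tokens))

print(count_parse_trees(["id", "+", "id", "*", "id"]))
```

A count greater than one for any sentence is enough to show that the grammar is ambiguous.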
1. G2 = (V, T, P, S) with productions:
S → SbS | ScS | a
Let x = abaca ∈ L(G2)
a) find a leftmost and a rightmost
derivation for x
b) draw the parse tree for x
2. Is G2 ambiguous?
Cont.
1. G1 = (V, T, P, S) with productions:
S → AB
A → aA | a
B → bB | b
Let x = aaabbb
a) find a leftmost and a rightmost
derivation for x
b) draw the parse tree for x
Parsing
S ➞ XY
X ➞ XA | a | b
Y ➞ AY | a
A ➞ a
Does this grammar generate “baaa”?
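Because every production in this grammar has either two variables or a single terminal on the right-hand side, the question can be answered with the CYK table-filling algorithm. The sketch below is a minimal version under that assumption; the names `BINARY`, `TERMINAL`, and `cyk` are chosen for this example.

```python
# S -> XY, X -> XA | a | b, Y -> AY | a, A -> a
BINARY = [("S", ("X", "Y")), ("X", ("X", "A")), ("Y", ("A", "Y"))]
TERMINAL = [("X", "a"), ("X", "b"), ("Y", "a"), ("A", "a")]

def cyk(word, binary_rules=BINARY, terminal_rules=TERMINAL, start="S"):
    """Return True if the grammar derives word (CYK membership test)."""
    n = len(word)
    if n == 0:
        return False
    # table[i][length] = variables that derive word[i:i + length]
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, ch in enumerate(word):
        table[i][1] = {head for head, t in terminal_rules if t == ch}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split]
                right = table[i + split][length - split]
                for head, (b, c) in binary_rules:
                    if b in left and c in right:
                        table[i][length].add(head)
    return start in table[0][n]

print(cyk("baaa"))
```

One derivation that the table finds is S ⇒ XY with X ⇒ XA ⇒ bA ⇒ ba and Y ⇒ AY ⇒ aY ⇒ aa.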
Parsing Arithmetic Expression
• Consider the following grammar:
E → T | E + T | E - T
T → F | T * F | T / F
F → a | b | c | ( E )
Draw parse trees for
a) a*b+c b) a+b*c c) (a+b)*c d) a-b-c
CFG and programming languages
• One of the most important uses of the theory of formal
languages is in the definition of programming languages
and in the construction of interpreters and compilers for
them.
• Regular languages help us model simple patterns in
programming languages; context-free languages are used to
model more complex aspects.
• The part of a programming language that can be modeled by a
context-free grammar is referred to as its syntax.
Simplifications of Context-Free Grammars
Simplifying Context-Free Grammars
• Variables in CFGs can often be eliminated
– If they are not recursive, you can substitute their rules
throughout.
A Substitution Rule
S → aB
A → aaA
A → abBc
B → aA
B → b

Substitute B → b

Equivalent grammar:
S → aB | ab
A → aaA
A → abBc | abbc
B → aA
A Substitution Rule
S → aB | ab
A → aaA
A → abBc | abbc
B → aA

Substitute B → aA

Equivalent grammar:
S → aB | ab | aaA
A → aaA
A → abBc | abbc | abaAc
In general:
A → xBz
B → y1

Substitute B → y1

Equivalent grammar:
A → xBz | xy1z
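The substitution step can be sketched in code. This is an illustration under stated assumptions: symbols are single characters, a grammar is a dictionary from a variable to a list of bodies, and `substitute` is a hypothetical helper. It removes the production var → body and, for every production that mentions var, adds the variants in which occurrences of var are replaced by body.

```python
from itertools import product

def substitute(grammar, var, body):
    """Substitute the production var -> body into the rest of the grammar."""
    result = {}
    for head, bodies in grammar.items():
        new_bodies = []
        for b in bodies:
            if head == var and b == body:
                continue  # the substituted production itself is dropped
            # Each occurrence of var may be kept or replaced by body.
            options = [(ch, body) if ch == var else (ch,) for ch in b]
            for choice in product(*options):
                candidate = "".join(choice)
                if candidate not in new_bodies:
                    new_bodies.append(candidate)
        if new_bodies:
            result[head] = new_bodies
    return result

grammar = {"S": ["aB"], "A": ["aaA", "abBc"], "B": ["aA", "b"]}
step1 = substitute(grammar, "B", "b")    # substitute B -> b
step2 = substitute(step1, "B", "aA")     # then substitute B -> aA
print(step1)
print(step2)
```

Applied to the example grammar, the two calls reproduce the two equivalent grammars shown on the substitution slides, up to the order of the alternatives.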
Nullable Variables
• Any variable that can eventually
terminate in the empty string is said to
be nullable.
– Note: a variable may be indirectly
nullable
– In general: if A ➞ V1V2…Vn, and all
the Vi are nullable, then A is also
nullable.
Nullable Variables
λ-production: A → λ
Nullable variable: A ⇒ … ⇒ λ
Removing Nullable Variables
Example Grammar:
S → aMb
M → aMb
M → λ
Nullable variable: M
S → aMb
M → aMb
M → λ

Substitute M → λ

Final Grammar
S → aMb
S → ab
M → aMb
M → ab
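The two steps, finding the nullable variables and then substituting them away, can be sketched as follows. This is a minimal illustration, not a complete treatment: bodies are tuples of symbols, the empty tuple stands for λ, and the function names are made up here. If the start symbol itself is nullable, this sketch drops λ from the language, which would have to be handled separately.

```python
from itertools import product

def nullable_variables(grammar):
    """Variables that can derive the empty string (fixed-point iteration)."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head in nullable:
                continue
            # A body of only nullable symbols (or the empty body) makes head nullable.
            if any(all(sym in nullable for sym in body) for body in bodies):
                nullable.add(head)
                changed = True
    return nullable

def remove_lambda_productions(grammar):
    """Drop lambda-productions, adding bodies with nullable symbols omitted."""
    nullable = nullable_variables(grammar)
    result = {}
    for head, bodies in grammar.items():
        new_bodies = []
        for body in bodies:
            # Each nullable symbol may be kept or left out.
            options = [((s,), ()) if s in nullable else ((s,),) for s in body]
            for choice in product(*options):
                candidate = tuple(s for part in choice for s in part)
                if candidate and candidate not in new_bodies:
                    new_bodies.append(candidate)
        result[head] = new_bodies
    return result

# S -> aMb, M -> aMb | lambda
G = {"S": [("a", "M", "b")], "M": [("a", "M", "b"), ()]}
print(nullable_variables(G))
print(remove_lambda_productions(G))
```

For the example grammar the result has the same productions as the final grammar above.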
Unit-Productions
Unit Production: A → B
(a single variable on both sides)
Removing Unit Productions
Observation:
A → A
is removed immediately
Example Grammar:
S → aA
A → a
A → B
B → A
B → bb
S → aA
A → a
A → B
B → A
B → bb

Substitute A → B

S → aA | aB
A → a
B → A | B
B → bb
S → aA | aB
A → a
B → A | B
B → bb

Remove B → B

S → aA | aB
A → a
B → A
B → bb
S → aA | aB
A → a
B → A
B → bb

Substitute B → A

S → aA | aB | aA
A → a
B → bb
Remove repeated productions

S → aA | aB | aA
A → a
B → bb

Final grammar
S → aA | aB
A → a
B → bb
Useless Productions
S → aSb
S → λ
S → A
A → aA    (useless production)
Some derivations never terminate...
S ⇒ A ⇒ aA ⇒ aaA ⇒ … ⇒ aa…aA ⇒ …
Another grammar:
S → A
A → aA
A → λ
B → bA    (useless production)
Not reachable from S
In general:
if S ⇒* xAy ⇒* w, where w ∈ L(G)
(w contains only terminals),
then variable A is useful;
otherwise, variable A is useless.
A production A → x is useless
if any of its variables is useless

Production    Variable      Production
S → aSb
S → λ
S → A                       useless
A → aA        A useless     useless
B → C         B useless     useless
C → D         C useless     useless
Removing Useless Productions
Example Grammar:
S → aS | A | C
A → a
B → aa
C → aCb
First: find all variables that can produce
strings with only terminals

S → aS | A | C
A → a
B → aa
C → aCb

Round 1: { A, B }       (from A → a and B → aa)
Round 2: { A, B, S }    (from S → A)
Keep only the variables
that produce terminal strings: { A, B, S }
(the remaining variables are useless)

S → aS | A | C
A → a
B → aa
C → aCb

Remove useless productions

S → aS | A
A → a
B → aa
Second: Find all variables
reachable from S

S → aS | A
A → a
B → aa

Use a dependency graph:
S → A        B (not reachable)
Keep only the variables
reachable from S
(the remaining variables are useless)

S → aS | A
A → a
B → aa

Remove useless productions

Final Grammar
S → aS | A
A → a
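The two passes can be sketched as follows. This is an illustrative version that assumes single-character symbols with uppercase letters as variables; the function names are chosen for this example. The first pass keeps variables that derive a terminal string, and the second keeps variables reachable from S.

```python
def generating_variables(grammar):
    """Variables that can derive a string consisting only of terminals."""
    generating = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head in generating:
                continue
            for body in bodies:
                if all(not s.isupper() or s in generating for s in body):
                    generating.add(head)
                    changed = True
                    break
    return generating

def reachable_variables(grammar, start="S"):
    """Variables reachable from the start symbol in the dependency graph."""
    reachable = {start}
    stack = [start]
    while stack:
        head = stack.pop()
        for body in grammar.get(head, []):
            for s in body:
                if s.isupper() and s not in reachable:
                    reachable.add(s)
                    stack.append(s)
    return reachable

def remove_useless(grammar, start="S"):
    """First drop non-generating variables, then unreachable ones."""
    gen = generating_variables(grammar)
    step1 = {
        h: [b for b in bodies if all(not s.isupper() or s in gen for s in b)]
        for h, bodies in grammar.items() if h in gen
    }
    reach = reachable_variables(step1, start)
    return {h: bodies for h, bodies in step1.items() if h in reach}

G = {"S": ["aS", "A", "C"], "A": ["a"], "B": ["aa"], "C": ["aCb"]}
print(remove_useless(G))
```

The order of the passes matters: removing the non-generating variable C first is what makes it safe to judge reachability afterwards.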
Removing All
Step 1: Remove Nullable Variables
Step 2: Remove Unit-Productions
Step 3: Remove Useless Variables
Normal Forms for Context-Free Grammars
Chomsky Normal Form
Each production has the form:
A → BC (B and C are variables) or A → a (a is a terminal)
Examples:

Chomsky Normal Form:
S → AS
S → a
A → SA
A → b

Not Chomsky Normal Form:
S → AS
S → AAS
A → SA
A → aa
Conversion to Chomsky Normal Form
Example:
S → ABa
A → aab
B → Ac
Not Chomsky Normal Form
Introduce variables for terminals: Ta, Tb, Tc

S → ABa
A → aab
B → Ac

becomes

S → ABTa
A → TaTaTb
B → ATc
Ta → a
Tb → b
Tc → c
Introduce intermediate variable: V1

S → ABTa
A → TaTaTb
B → ATc
Ta → a
Tb → b
Tc → c

becomes

S → AV1
V1 → BTa
A → TaTaTb
B → ATc
Ta → a
Tb → b
Tc → c
Introduce intermediate variable: V2

S → AV1
V1 → BTa
A → TaTaTb
B → ATc
Ta → a
Tb → b
Tc → c

becomes

S → AV1
V1 → BTa
A → TaV2
V2 → TaTb
B → ATc
Ta → a
Tb → b
Tc → c
Initial grammar
S → ABa
A → aab
B → Ac

Final grammar in Chomsky Normal Form:
S → AV1
V1 → BTa
A → TaV2
V2 → TaTb
B → ATc
Ta → a
Tb → b
Tc → c
Procedure
First remove:
Nullable variables
Unit productions
Then, for every terminal symbol a:
New variable: Ta
Add production Ta → a
In productions of length two or more: replace a with Ta
Replace any production A → C1C2…Cn
with A → C1V1
V1 → C2V2
…
Vn-2 → Cn-1Cn
New intermediate variables: V1, V2, …, Vn-2
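The procedure can be sketched in a few lines. This is a minimal illustration that assumes nullable variables and unit productions have already been removed, that terminals are lowercase letters, and that the generated names (Ta, V1, and so on) do not clash with existing variables. The function name `to_cnf` is chosen for this example.

```python
def to_cnf(grammar):
    """Convert a grammar (head -> list of symbol tuples) to Chomsky Normal Form."""
    cnf = {}
    terminal_vars = {}  # terminal -> name of its new variable, e.g. "a" -> "Ta"
    counter = 0         # numbering of the intermediate variables V1, V2, ...

    def var_for(terminal):
        name = "T" + terminal
        terminal_vars[terminal] = name
        return name

    for head, bodies in grammar.items():
        for body in bodies:
            # In bodies of length >= 2, replace each terminal a by Ta.
            if len(body) >= 2:
                body = tuple(var_for(s) if s.islower() else s for s in body)
            # Split long bodies: A -> C1 C2 ... Cn becomes A -> C1 V1, V1 -> C2 V2, ...
            current = head
            while len(body) > 2:
                counter += 1
                new_var = f"V{counter}"
                cnf.setdefault(current, []).append((body[0], new_var))
                current, body = new_var, body[1:]
            cnf.setdefault(current, []).append(body)

    # Add the productions Ta -> a for the terminals that were replaced.
    for terminal, name in terminal_vars.items():
        cnf[name] = [(terminal,)]
    return cnf

# S -> ABa, A -> aab, B -> Ac
G = {"S": [("A", "B", "a")], "A": [("a", "a", "b")], "B": [("A", "c")]}
for head, bodies in to_cnf(G).items():
    print(head, "->", " | ".join(" ".join(b) for b in bodies))
```

On the example grammar S → ABa, A → aab, B → Ac this produces the productions S → A V1, V1 → B Ta, A → Ta V2, V2 → Ta Tb, B → A Tc, together with Ta → a, Tb → b, and Tc → c.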
Greibach Normal Form
All productions have the form:
A → a V1V2…Vk, k ≥ 0
where a is a terminal symbol and V1, …, Vk are variables
Examples:

Greibach Normal Form:
S → cAB
A → aA | bB | b
B → b

Not Greibach Normal Form:
S → abSb
S → aa
Conversion to Greibach Normal Form:

S → abSb
S → aa

becomes

S → aTbSTb
S → aTa
Ta → a
Tb → b

Greibach Normal Form
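Whether a grammar is in Greibach Normal Form can be checked mechanically. The sketch below is a simple illustration in which the variables are taken to be exactly the keys of the dictionary and bodies are tuples of symbols; `is_gnf` is a hypothetical helper name.

```python
def is_gnf(grammar):
    """True if every body is one terminal followed by zero or more variables."""
    variables = set(grammar)
    for bodies in grammar.values():
        for body in bodies:
            if not body or body[0] in variables:
                return False  # must start with a terminal
            if any(sym not in variables for sym in body[1:]):
                return False  # everything after the first symbol must be a variable
    return True

# S -> cAB, A -> aA | bB | b, B -> b
GNF_EXAMPLE = {
    "S": [("c", "A", "B")],
    "A": [("a", "A"), ("b", "B"), ("b",)],
    "B": [("b",)],
}
# S -> abSb | aa
NOT_GNF = {"S": [("a", "b", "S", "b"), ("a", "a")]}
# S -> a Tb S Tb | a Ta, Ta -> a, Tb -> b
CONVERTED = {
    "S": [("a", "Tb", "S", "Tb"), ("a", "Ta")],
    "Ta": [("a",)],
    "Tb": [("b",)],
}

print(is_gnf(GNF_EXAMPLE), is_gnf(NOT_GNF), is_gnf(CONVERTED))
```

The check accepts the converted grammar because the terminals after the first position were replaced by the variables Ta and Tb.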