
Understanding Context-Free Grammars

Chapter Three discusses context-free languages (CFLs) and their defining characteristics, including context-free grammars (CFGs), which consist of terminals, non-terminals, productions, and a start symbol. It covers derivation trees, parse trees, ambiguity in grammars, and methods for simplifying CFGs, such as removing nullable variables and unit productions. The chapter also introduces normal forms for CFGs, particularly Chomsky Normal Form, and outlines procedures for converting grammars to these forms.

Uploaded by Tesfalegn Yakob
Copyright © All Rights Reserved

Chapter Three

Context Free Languages

Context-Free Grammar (CFG)

• A context-free grammar (CFG) is a formal specification of a language; in compilers it is used to specify the syntactic structure of a programming language.
• A context-free grammar is a 4-tuple G = (V, T, P, S), where:
– T is a finite set of terminals (a set of tokens)
– V is a finite set of non-terminals (syntactic variables)
– P is a finite set of productions of the form A → α, where A is a non-terminal and α is a string of terminals and non-terminals (possibly the empty string)
– S ∈ V is the designated start symbol (one of the non-terminals)

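The 4-tuple maps naturally onto a small data structure. The sketch below is a minimal Python illustration, assuming single-character symbols and production bodies written as strings (an empty string standing for the empty string λ); the class name `CFG` and its `validate` method are hypothetical, not taken from the slides. It checks the conditions the definition implies: V and T do not overlap, S ∈ V, and each production has one variable on the left and a string over V ∪ T on the right.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CFG:
    """G = (V, T, P, S). Sketch assumption: every symbol is one character."""
    variables: frozenset  # V: non-terminals
    terminals: frozenset  # T: terminals
    productions: tuple    # P: pairs (A, body); body "" stands for λ
    start: str            # S: the start symbol

    def validate(self) -> None:
        """Raise ValueError if the tuple is not a well-formed CFG."""
        if self.variables & self.terminals:
            raise ValueError("V and T must be disjoint")
        if self.start not in self.variables:
            raise ValueError("S must be a member of V")
        symbols = self.variables | self.terminals
        for head, body in self.productions:
            # Context-free: exactly one variable on the left-hand side.
            if head not in self.variables:
                raise ValueError(f"left side {head!r} is not a variable")
            # The body is a (possibly empty) string over V ∪ T.
            if any(s not in symbols for s in body):
                raise ValueError(f"body {body!r} uses an unknown symbol")


# Example (a) from the next slides: S → aB, B → bA | b, A → a
g = CFG(
    variables=frozenset("SAB"),
    terminals=frozenset("ab"),
    productions=(("S", "aB"), ("B", "bA"), ("B", "b"), ("A", "a")),
    start="S",
)
g.validate()  # no exception: the grammar is well formed
```

A production such as ("a", "S") would be rejected, because its left side is a terminal rather than a single variable, which is exactly what "context-free" rules out.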
Context-Free Grammars (CFG)

• There is a finite set of symbols that form the strings, i.e. there is a finite alphabet. The alphabet symbols are called terminals.
• There is a finite set of variables, sometimes called non-terminals or syntactic categories. Each variable represents a language (i.e. a set of strings).
• One of the variables is the start symbol. Other variables may exist to help define the language.

• There is a finite set of productions (production rules) that represent the recursive definition of the language. Each production consists of:
1. a single variable, the one being defined, on the left of the production;
2. the production symbol →;
3. a string of zero or more terminals or variables, called the body of the production.
To form strings, we repeatedly replace a variable, wherever it appears, by the body of one of its productions.

• Definition: a CFG G = (V, T, P, S) has productions of the form A → β, where A ∈ V and β ∈ (V ∪ T)*.
• The language generated by a CFG is called a context-free language (CFL).
• Examples:
a) S → aB
   B → bA | b
   A → a
b) S → aB | A
   A → aA | a | CBA
   B → λ
   C → c

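To see which strings a small grammar generates, one can search breadth-first over leftmost derivations. This is a minimal sketch, assuming single-character symbols with upper-case letters as variables and no λ-productions (so example b, which has B → λ, is out of scope); `enumerate_strings` is an illustrative name. Without λ-productions a sentential form never gets shorter, which is what makes the length cut-off safe.

```python
from collections import deque


def enumerate_strings(productions, start, max_len):
    """All terminal strings of length <= max_len derivable from start.

    Assumptions: single-character symbols, upper-case letters are variables,
    and no λ-productions, so forms longer than max_len can be discarded.
    """
    found = set()
    queue = deque([start])
    seen = {start}
    while queue:
        form = queue.popleft()
        # Position of the leftmost variable, or None if only terminals remain.
        idx = next((i for i, s in enumerate(form) if s.isupper()), None)
        if idx is None:
            found.add(form)  # a sentence of L(G)
            continue
        for head, body in productions:
            if head == form[idx]:
                new_form = form[:idx] + body + form[idx + 1:]
                if len(new_form) <= max_len and new_form not in seen:
                    seen.add(new_form)
                    queue.append(new_form)
    return sorted(found)


# Example (a): S → aB, B → bA | b, A → a
prods = [("S", "aB"), ("B", "bA"), ("B", "b"), ("A", "a")]
print(enumerate_strings(prods, "S", 5))  # ['ab', 'aba']
```

For example (a) the search finds exactly ab and aba, so that grammar generates a finite language of two strings.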
Derivation tree
• A derivation is a sequence of production-rule applications that rewrites the start symbol S into a string x; a derivation tree records this sequence as a tree.
• It shows how the input string is obtained from S.
• The root of the tree is S, and reading the leaves from left to right gives x.
• Leftmost derivation: at each step, replaces the leftmost non-terminal.
• Rightmost derivation: at each step, replaces the rightmost non-terminal.

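The difference between leftmost and rightmost derivations is only in which variable gets rewritten at each step. The sketch below, assuming single-character symbols with upper-case letters as variables, replays a list of productions and checks that each one rewrites the variable the chosen strategy selects; the function name `derive` and the small grammar S → AB, A → a, B → b are hypothetical examples.

```python
def derive(start, rules, leftmost=True):
    """Replay rules, always rewriting the leftmost (or rightmost) variable.

    rules is a list of (A, body) pairs. Returns every sentential form.
    Assumption: single-character symbols, upper-case letters are variables.
    """
    forms = [start]
    for head, body in rules:
        form = forms[-1]
        positions = [i for i, s in enumerate(form) if s.isupper()]
        if not positions:
            raise ValueError(f"{form!r} has no variable left to replace")
        i = positions[0] if leftmost else positions[-1]
        if form[i] != head:
            raise ValueError(f"expected to rewrite {form[i]}, got {head} → {body}")
        forms.append(form[:i] + body + form[i + 1:])
    return forms


# Hypothetical grammar: S → AB, A → a, B → b
print(" ⇒ ".join(derive("S", [("S", "AB"), ("A", "a"), ("B", "b")])))
# S ⇒ AB ⇒ aB ⇒ ab
print(" ⇒ ".join(derive("S", [("S", "AB"), ("B", "b"), ("A", "a")], leftmost=False)))
# S ⇒ AB ⇒ Ab ⇒ ab
```

Both derivations end in ab and correspond to the same parse tree; they differ only in the order in which A and B are replaced.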
Parse tree
• A parse tree is a graphical representation of a derivation.
• It filters out the order in which productions are applied to replace non-terminals: derivations that differ only in that order have the same parse tree.
• A parse tree corresponding to a derivation is a labeled tree in which:
– the interior nodes are labeled by non-terminals,
– the leaf nodes are labeled by terminals, and
– the children of each interior node represent the replacement of the associated non-terminal in one step of the derivation.

Parse tree and Derivation

Grammar: E → E + E | E * E | ( E ) | - E | id
Let's examine this derivation:
E ⇒ -E ⇒ -(E) ⇒ -(E + E) ⇒ -(id + E) ⇒ -(id + id)

[Figure: a sequence of partial parse trees growing from the single root E to the complete parse tree for -(id + id).]

This is a top-down derivation, because we start building the parse tree at the top (the root).

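The string a parse tree derives is its sequence of leaves read from left to right. As a sketch, assuming a hypothetical encoding in which a leaf is a terminal string and an interior node is a pair (non-terminal, children), the final tree of the derivation above can be written down and its yield recovered:

```python
def tree_yield(tree):
    """Concatenate the leaves of a parse tree from left to right.

    Tree encoding (an assumption of this sketch): a leaf is a terminal
    string; an interior node is a pair (non_terminal, list_of_children).
    """
    if isinstance(tree, str):
        return [tree]
    _label, children = tree
    leaves = []
    for child in children:
        leaves.extend(tree_yield(child))
    return leaves


# Parse tree for -(id + id) under E → E + E | E * E | ( E ) | - E | id
tree = ("E", ["-", ("E", ["(", ("E", [("E", ["id"]), "+", ("E", ["id"])]), ")"])])
print(" ".join(tree_yield(tree)))  # - ( id + id )
```

The tree keeps no record of whether the left or the right id was produced first, which is the sense in which a parse tree filters out the order of the derivation.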
Ambiguity
• If a string x has two different leftmost derivations (or two different rightmost derivations), the grammar is said to be ambiguous. Otherwise it is unambiguous.
• Equivalently, a grammar is ambiguous if it can produce more than one parse tree for some sentence (a string of terminals).

Ambiguity: example
E → E + E | E * E | ( E ) | - E | id
Construct a parse tree for the expression: id + id * id

[Figure: two sequences of partial parse trees for id + id * id; the first expands the root with E → E + E, the second with E → E * E.]

Which parse tree is correct?

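Ambiguity: example…
E → E + E | E * E | ( E ) | - E | id

Find a derivation for the expression: id + id * id

Parse tree 1: the root uses E → E + E; the left E derives id and the right E uses E → E * E (grouping id + (id * id)).
Parse tree 2: the root uses E → E * E; the left E uses E → E + E and the right E derives id (grouping (id + id) * id).

According to the grammar, both are correct.
A grammar that produces more than one parse tree for some input sentence is said to be an ambiguous grammar.
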
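For one particular sentence, ambiguity can be checked mechanically by counting its parse trees. The sketch below is specific to the expression grammar above (the function name is illustrative): it counts, with memoization, the ways E can derive each span of tokens. It only tests one sentence at a time; it does not decide whether a grammar is ambiguous in general.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count parse trees for tokens under E → E + E | E * E | ( E ) | - E | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of ways E derives tokens[i..j] (inclusive); empty spans: 0.
        if i > j:
            return 0
        total = 0
        if i == j and tokens[i] == "id":
            total += 1                                  # E → id
        if tokens[i] == "(" and tokens[j] == ")":
            total += count(i + 1, j - 1)                # E → ( E )
        if tokens[i] == "-":
            total += count(i + 1, j)                    # E → - E
        for k in range(i + 1, j):                       # E → E + E | E * E
            if tokens[k] in ("+", "*"):
                total += count(i, k - 1) * count(k + 1, j)
        return total

    return count(0, len(tokens) - 1)


print(count_parse_trees(["id", "+", "id", "*", "id"]))  # 2
print(count_parse_trees(["id", "+", "id"]))             # 1
```

The count of 2 for id + id * id matches the two parse trees above.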
1. G2 = (V, T, P, S) with productions:
S → SbS | ScS | a
Let x = abaca ∈ L(G2).
a) Find a leftmost and a rightmost derivation for x.
b) Draw the parse tree for x.
2. Is G2 ambiguous?

Cont.
1. G1 = (V, T, P, S) with productions:
S → AB
A → aA | a
B → bB | b
Let x = aaabbb.
a) Find a leftmost and a rightmost derivation for x.
b) Draw the parse tree for x.

Parsing

S → XY
X → XA | a | b
Y → AY | a
A → a

Does this grammar generate “baaa”?

Parsing Arithmetic Expression
• Consider the following grammar:
E → T | E + T | E - T
T → F | T * F | T / F
F → a | b | c | (E)

Draw parse trees for:
a) a*b+c   b) a+b*c   c) (a+b)*c   d) a-b-c

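Unlike the ambiguous E → E + E grammar, this grammar fixes precedence (* and / bind tighter than + and -) and left associativity. One way to see this is a recursive-descent parser, sketched below under the assumption that the left-recursive rules E → E + T and T → T * F are implemented as loops (the usual technique, since direct left recursion would recurse forever). For brevity it builds a simplified tree (leaf letter, or tuple (operator, left, right)) instead of the full parse tree with every E, T and F node; `parse` and `bracket` are illustrative names.

```python
def parse(text):
    """Recursive-descent parser for
        E → T | E + T | E - T
        T → F | T * F | T / F
        F → a | b | c | ( E )
    Loops replace the left recursion, keeping the operators left-associative.
    Returns a simplified tree: a leaf letter or (operator, left, right).
    """
    tokens = [ch for ch in text if not ch.isspace()]
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(tok):
        nonlocal pos
        if peek() != tok:
            raise ValueError(f"expected {tok!r} at position {pos}, found {peek()!r}")
        pos += 1

    def parse_e():
        node = parse_t()
        while peek() in ("+", "-"):   # E → E + T | E - T
            op = peek()
            expect(op)
            node = (op, node, parse_t())
        return node

    def parse_t():
        node = parse_f()
        while peek() in ("*", "/"):   # T → T * F | T / F
            op = peek()
            expect(op)
            node = (op, node, parse_f())
        return node

    def parse_f():
        tok = peek()
        if tok in ("a", "b", "c"):    # F → a | b | c
            expect(tok)
            return tok
        expect("(")                   # F → ( E )
        node = parse_e()
        expect(")")
        return node

    tree = parse_e()
    if pos != len(tokens):
        raise ValueError(f"unexpected {peek()!r} at position {pos}")
    return tree


def bracket(node):
    # Fully parenthesize the tree so that its grouping is visible.
    if isinstance(node, str):
        return node
    op, left, right = node
    return f"({bracket(left)}{op}{bracket(right)})"


print(bracket(parse("a/b/c")))  # ((a/b)/c)
print(bracket(parse("c-a*b")))  # (c-(a*b))
```

Each input has exactly one grouping, which reflects the fact that this grammar, unlike E → E + E | E * E | …, gives every sentence a single parse tree.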
CFG and programming languages
• One of the most important uses of the theory of formal languages is in the definition of programming languages and in the construction of interpreters and compilers for them.
• Regular languages help us model simple patterns in programming languages (such as identifiers and numbers), while context-free languages are used to model more complex aspects (such as nested expressions and blocks).
• The part of a programming language that can be modeled by a context-free grammar is referred to as its syntax.

Simplifications of Context-Free Grammars

Simplifying Context-Free Grammars
• Variables in CFGs can often be eliminated.
– If a variable is not recursive, its rules can be substituted everywhere it appears.

A Substitution Rule

S → aB
A → aaA
A → abBc
B → aA
B → b

Substitute B → b to obtain the equivalent grammar:

S → aB | ab
A → aaA
A → abBc | abbc
B → aA

A Substitution Rule

S → aB | ab
A → aaA
A → abBc | abbc
B → aA

Substitute B → aA to obtain the equivalent grammar:

S → aB | ab | aaA
A → aaA
A → abBc | abbc | abaAc

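In general:

A → xBz
B → y1

Substitute B → y1 to obtain the equivalent grammar:

A → xBz | xy1z

(B → y1 itself is removed once it has been substituted into every production that uses B.)
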
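The substitution rule can be sketched in Python under a few stated assumptions: a grammar is a dict from each variable to a list of body strings, symbols are single characters with upper-case letters as variables, and the substituted variable is neither the start symbol nor recursive (as the slides require). `substitute` is a hypothetical helper. When a body contains the variable more than once, every combination of keeping or replacing each occurrence is produced, which generalizes A → xBz | xy1z.

```python
from itertools import product


def substitute(grammar, var, body, start="S"):
    """Apply the substitution rule for var → body, then delete var → body.

    Assumptions: single-character symbols, upper-case = variable, bodies are
    strings, var is not the start symbol and no body of var mentions var.
    """
    if var == start:
        raise ValueError("the start symbol's productions cannot be substituted away")
    if any(var in b for b in grammar[var]):
        raise ValueError(f"{var} is recursive")
    if body not in grammar[var]:
        raise ValueError(f"{var} → {body} is not a production")
    result = {}
    for head, bodies in grammar.items():
        new_bodies = []
        for b in bodies:
            if head == var and b == body:
                continue  # drop var → body itself
            # Each occurrence of var is either kept or replaced by body.
            choices = [(s, body) if s == var else (s,) for s in b]
            for combo in product(*choices):
                candidate = "".join(combo)
                if candidate not in new_bodies:
                    new_bodies.append(candidate)
        result[head] = new_bodies
    return result


g = {"S": ["aB"], "A": ["aaA", "abBc"], "B": ["aA", "b"]}
g1 = substitute(g, "B", "b")
print(g1)  # {'S': ['aB', 'ab'], 'A': ['aaA', 'abBc', 'abbc'], 'B': ['aA']}
g2 = substitute(g1, "B", "aA")
print(g2)  # {'S': ['aB', 'aaA', 'ab'], 'A': ['aaA', 'abBc', 'abaAc', 'abbc'], 'B': []}
```

After the second step B has no productions left, so S → aB and A → abBc can no longer reach a terminal string; that is why the slide's result omits B, and such productions are removed later as useless.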
Nullable Variables
• A variable that can derive the empty string λ (in one or more steps) is said to be nullable.
– Note: a variable may be indirectly nullable.
– In general: if A → V1V2…Vn and all the Vi are nullable, then A is also nullable.

Nullable Variables

λ-production: A → λ

Nullable variable: A ⇒ … ⇒ λ

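The rule "if A → V1V2…Vn and all the Vi are nullable, then A is nullable" can be applied repeatedly until nothing changes. A minimal sketch, assuming a grammar encoded as a dict from each variable to a list of body strings, with "" standing for λ and upper-case letters as variables (`nullable_variables` is an illustrative name):

```python
def nullable_variables(grammar):
    """Return the set of variables that derive λ.

    Assumed encoding: dict variable -> list of body strings, "" is λ,
    upper-case letters are variables. Iterates to a fixed point.
    """
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head in nullable:
                continue
            # A body is nullable if every symbol is a nullable variable;
            # the empty body "" qualifies trivially (all() of nothing is True).
            if any(all(s in nullable for s in body) for body in bodies):
                nullable.add(head)
                changed = True
    return nullable


print(nullable_variables({"S": ["aMb"], "M": ["aMb", ""]}))  # {'M'}
print(sorted(nullable_variables({"S": ["AB"], "A": [""], "B": ["A", "b"]})))  # ['A', 'B', 'S']
```

In the second grammar B and S are only indirectly nullable: B ⇒ A ⇒ λ, and S ⇒ AB ⇒* λ.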
Removing Nullable Variables

Example grammar:

S → aMb
M → aMb
M → λ

Nullable variable: M

Final Grammar

S → aMb
M → aMb
M → λ

Substitute M → λ:

S → aMb
S → ab
M → aMb
M → ab

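Substituting M → λ generalizes to: in every body, each occurrence of a nullable variable may be kept or dropped, and empty bodies are discarded. The sketch below assumes the same dict-of-strings encoding and takes the nullable set as a parameter; it also assumes the start symbol is not nullable, since otherwise λ itself would be lost from the language. `remove_lambda_productions` is an illustrative name.

```python
from itertools import product


def remove_lambda_productions(grammar, nullable):
    """Remove λ-productions, given the set of nullable variables.

    Assumptions: dict variable -> list of body strings ("" is λ),
    single-character symbols, and the start symbol is not nullable.
    """
    result = {}
    for head, bodies in grammar.items():
        new_bodies = []
        for body in bodies:
            # Each nullable occurrence is either kept or erased.
            choices = [(s, "") if s in nullable else (s,) for s in body]
            for combo in product(*choices):
                candidate = "".join(combo)
                if candidate and candidate not in new_bodies:  # drop λ bodies
                    new_bodies.append(candidate)
        result[head] = new_bodies
    return result


g = {"S": ["aMb"], "M": ["aMb", ""]}
print(remove_lambda_productions(g, {"M"}))
# {'S': ['aMb', 'ab'], 'M': ['aMb', 'ab']}
```

The output is the final grammar of the slide.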
Unit-Productions

Unit production: A → B

(a single variable on both sides)

Removing Unit Productions

Observation: A → A is removed immediately.

Example Grammar:

S → aA
A → a
A → B
B → A
B → bb

S → aA
A → a
A → B
B → A
B → bb

Substitute A → B:

S → aA | aB
A → a
B → A | B
B → bb

S → aA | aB
A → a
B → A | B
B → bb

Remove B → B:

S → aA | aB
A → a
B → A
B → bb

S → aA | aB
A → a
B → A
B → bb

Substitute B → A:

S → aA | aB | aA
A → a
B → bb

Remove repeated productions

S → aA | aB | aA
A → a
B → bb

Final grammar:

S → aA | aB
A → a
B → bb

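The slides remove unit productions by substituting them one at a time. A common textbook alternative, used in the sketch below instead of the slides' method, works with "unit pairs": for each variable A, find every B reachable from A through unit productions only, then give A all non-unit bodies of those B. This assumes the dict-of-strings encoding with single-character symbols and upper-case variables; `remove_unit_productions` is an illustrative name.

```python
def remove_unit_productions(grammar):
    """Remove unit productions A → B via unit pairs (not the slides' method).

    Assumed encoding: dict variable -> list of body strings,
    single-character symbols, upper-case letters are variables.
    """
    def is_unit(body):
        return len(body) == 1 and body.isupper()

    result = {}
    for head in grammar:
        # Depth-first search along unit productions starting at head.
        reach, stack = [head], [head]
        while stack:
            var = stack.pop()
            for body in grammar.get(var, []):
                if is_unit(body) and body not in reach:
                    reach.append(body)
                    stack.append(body)
        # Copy every non-unit body of every variable reachable from head.
        new_bodies = []
        for var in reach:
            for body in grammar.get(var, []):
                if not is_unit(body) and body not in new_bodies:
                    new_bodies.append(body)
        result[head] = new_bodies
    return result


g = {"S": ["aA"], "A": ["a", "B"], "B": ["A", "bb"]}
print(remove_unit_productions(g))
# {'S': ['aA'], 'A': ['a', 'bb'], 'B': ['bb', 'a']}
```

The result differs in form from the slide's final grammar (S → aA | aB, A → a, B → bb), but both generate the same language {aa, abb}; here B simply becomes unreachable and can be dropped as useless. Self-loops such as A → A disappear automatically because unit bodies are never copied.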
Useless Productions

S → aSb
S → λ
S → A
A → aA

Useless productions: S → A and A → aA. Some derivations never terminate...

S ⇒ A ⇒ aA ⇒ aaA ⇒ … ⇒ aa…aA ⇒ …

Another grammar:

S → A
A → aA
A → λ
B → bA

Useless production: B → bA (B is not reachable from S)

In general: if

S ⇒ … ⇒ xAy ⇒ … ⇒ w,   where w ∈ L(G) contains only terminals,

then variable A is useful; otherwise, variable A is useless.

A production A → x is useless if any of its variables is useless.

Productions:
S → aSb
S → λ
S → A    (useless)
A → aA   (useless)
B → C    (useless)
C → D    (useless)

Useless variables: A, B, C (D, which has no productions, is also useless)

Removing Useless Productions

Example Grammar:

S → aS | A | C
A → a
B → aa
C → aCb

First: find all variables that can produce strings with only terminals.

S → aS | A | C
A → a
B → aa
C → aCb

Round 1: { A, B }
Round 2: { A, B, S }   (because S → A)

Keep only the variables that produce terminal symbols: { A, B, S }
(the remaining variables are useless)

S → aS | A | C
A → a
B → aa
C → aCb

Remove useless productions:

S → aS | A
A → a
B → aa

Second: find all variables reachable from S.

Use a dependency graph:

S → aS | A
A → a
B → aa

[Figure: dependency graph with nodes S, A, B and an edge from S to A; B is not reachable.]

Keep only the variables reachable from S
(the remaining variables are useless)

S → aS | A
A → a
B → aa

Remove useless productions. Final grammar:

S → aS | A
A → a

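Both steps can be sketched together, assuming the dict-of-strings encoding (single-character symbols, upper-case letters as variables, "" for λ) and a start symbol named S; `remove_useless` is an illustrative name. Step 1 is a fixed point like the rounds on the slide; step 2 is a graph search from S over the remaining productions. The order matters: reachability is computed only after non-generating variables have been removed.

```python
def remove_useless(grammar, start="S"):
    """Remove useless variables and productions in the two slide steps.

    Assumed encoding: dict variable -> list of body strings,
    single-character symbols, upper-case = variable, "" = λ.
    """
    def generates(body, generating):
        # True if every variable in body is already known to be generating.
        return all(not s.isupper() or s in generating for s in body)

    # Step 1: variables that derive a terminal-only string (fixed point).
    generating = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head not in generating and any(generates(b, generating) for b in bodies):
                generating.add(head)
                changed = True
    step1 = {
        head: [b for b in bodies if generates(b, generating)]
        for head, bodies in grammar.items()
        if head in generating
    }

    # Step 2: variables reachable from the start symbol.
    reachable, stack = {start}, [start]
    while stack:
        for body in step1.get(stack.pop(), []):
            for s in body:
                if s.isupper() and s not in reachable:
                    reachable.add(s)
                    stack.append(s)
    return {head: bodies for head, bodies in step1.items() if head in reachable}


g = {"S": ["aS", "A", "C"], "A": ["a"], "B": ["aa"], "C": ["aCb"]}
print(remove_useless(g))  # {'S': ['aS', 'A'], 'A': ['a']}
```

The output matches the final grammar on the slide: C is dropped in step 1 and B in step 2.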
Removing All

Step 1: Remove Nullable Variables

Step 2: Remove Unit-Productions

Step 3: Remove Useless Variables

Normal Forms for Context-Free Grammars

Chomsky Normal Form

Each production has the form:

A → BC   or   A → a

where B and C are variables and a is a terminal.

Examples:

Chomsky Normal Form:
S → AS
S → a
A → SA
A → b

Not Chomsky Normal Form:
S → AS
S → AAS
A → SA
A → aa

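Checking the definition is a one-line test per body. A minimal sketch, assuming the dict-of-strings encoding with single-character symbols, upper-case letters as variables and every other character a terminal (`is_cnf` is an illustrative name):

```python
def is_cnf(grammar):
    """True if every body is BC (two variables) or a (one terminal).

    Assumed encoding: dict variable -> list of body strings,
    single-character symbols, upper-case letters are variables.
    """
    def ok(body):
        if len(body) == 1:
            return not body.isupper()                  # A → a
        return len(body) == 2 and body.isupper()       # A → BC
    return all(ok(b) for bodies in grammar.values() for b in bodies)


print(is_cnf({"S": ["AS", "a"], "A": ["SA", "b"]}))    # True
print(is_cnf({"S": ["AS", "AAS"], "A": ["SA", "aa"]}))  # False
```

The second grammar fails on S → AAS (three variables) and A → aa (two terminals), matching the slide.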
Conversion to Chomsky Normal Form

Example (not in Chomsky Normal Form):

S → ABa
A → aab
B → Ac

Introduce variables for terminals: Ta, Tb, Tc

The grammar S → ABa, A → aab, B → Ac becomes:

S → ABTa
A → TaTaTb
B → ATc
Ta → a
Tb → b
Tc → c

Introduce intermediate variable: V1

S → ABTa is replaced by S → AV1 and V1 → BTa:

S → AV1
V1 → BTa
A → TaTaTb
B → ATc
Ta → a
Tb → b
Tc → c

Introduce intermediate variable: V2

A → TaTaTb is replaced by A → TaV2 and V2 → TaTb:

S → AV1
V1 → BTa
A → TaV2
V2 → TaTb
B → ATc
Ta → a
Tb → b
Tc → c

Final grammar in Chomsky Normal Form:

S → AV1
V1 → BTa
A → TaV2
V2 → TaTb
B → ATc
Ta → a
Tb → b
Tc → c

Initial grammar:

S → ABa
A → aab
B → Ac

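One practical payoff of Chomsky Normal Form, not covered on these slides, is the CYK membership algorithm: because every body is either one terminal or two variables, the variables deriving each substring can be built up from shorter substrings. The sketch below is a minimal Python version, assuming a grammar given as a dict from variable names to lists of body tuples; `cyk` is an illustrative name. It runs on the final grammar above, whose language is the single string aabaabca (S ⇒* aab · aabc · a).

```python
def cyk(grammar, start, word):
    """CYK membership test for a grammar in Chomsky Normal Form.

    Assumed encoding: dict variable -> list of bodies, each body either a
    1-tuple (terminal,) or a 2-tuple (B, C) of variables.
    table[i][l - 1] holds the variables that derive word[i:i + l].
    """
    n = len(word)
    if n == 0:
        return False  # a CNF grammar as defined above cannot derive λ
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(word):
        # Substrings of length 1: productions A → a.
        table[i][0] = {head for head, bodies in grammar.items() if (ch,) in bodies}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            cell = table[i][length - 1]
            for split in range(1, length):
                left = table[i][split - 1]                    # word[i:i + split]
                right = table[i + split][length - split - 1]  # the rest
                for head, bodies in grammar.items():
                    for body in bodies:
                        if len(body) == 2 and body[0] in left and body[1] in right:
                            cell.add(head)
    return start in table[0][n - 1]


# Final CNF grammar from the slide (equivalent to S → ABa, A → aab, B → Ac).
cnf = {
    "S": [("A", "V1")], "V1": [("B", "Ta")], "A": [("Ta", "V2")],
    "V2": [("Ta", "Tb")], "B": [("A", "Tc")],
    "Ta": [("a",)], "Tb": [("b",)], "Tc": [("c",)],
}
print(cyk(cnf, "S", "aabaabca"))  # True
print(cyk(cnf, "S", "aab"))       # False
```

For a word of length n the table has O(n²) cells and each cell tries O(n) split points, so the test takes O(n³) time times the grammar size.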
Procedure

First remove:

Nullable variables

Unit productions

Then, for every terminal a:

Add the production Ta → a (Ta is a new variable).

In every production whose body has two or more symbols, replace a with Ta.

Replace any production A → C1C2…Cn (n > 2)

with:
A → C1V1
V1 → C2V2
…
V(n-2) → C(n-1)Cn

New intermediate variables: V1, V2, …, V(n-2)

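The last two steps of the procedure can be sketched in Python for a grammar that already has no λ-productions and no unit productions. Assumptions: bodies are tuples of symbol names, terminals are lower-case strings, and the generated names "T" + a and V1, V2, … do not clash with existing variables; `to_cnf_steps` is an illustrative name.

```python
def to_cnf_steps(grammar):
    """Terminal replacement and body splitting for CNF conversion.

    Assumes no λ-productions or unit productions remain, bodies are tuples
    of symbols, terminals are lower-case, and "T" + a / "Vk" are unused names.
    """
    def is_terminal(s):
        return s.islower()

    # Step A: in bodies of length >= 2, replace each terminal a by Ta.
    replaced, term_vars = {}, {}
    for head, bodies in grammar.items():
        replaced[head] = []
        for body in bodies:
            if len(body) >= 2:
                body = tuple(term_vars.setdefault(s, "T" + s) if is_terminal(s) else s
                             for s in body)
            replaced[head].append(body)
    for t, var in term_vars.items():
        replaced[var] = [(t,)]  # add Ta → a

    # Step B: split A → C1 C2 ... Cn (n > 2) into A → C1 V1, V1 → C2 V2, ...
    result, counter = {}, 0
    for head, bodies in replaced.items():
        for body in bodies:
            current = head
            while len(body) > 2:
                counter += 1
                fresh = f"V{counter}"
                result.setdefault(current, []).append((body[0], fresh))
                current, body = fresh, body[1:]
            result.setdefault(current, []).append(body)
    return result


g = {"S": [("A", "B", "a")], "A": [("a", "a", "b")], "B": [("A", "c")]}
for head, bodies in to_cnf_steps(g).items():
    print(head, "→", " | ".join("".join(b) for b in bodies))
# S → AV1
# V1 → BTa
# A → TaV2
# V2 → TaTb
# B → ATc
# Ta → a
# Tb → b
# Tc → c
```

Applied to the example, the sketch reproduces the final Chomsky Normal Form grammar from the slides, including the names V1 and V2.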
Greibach Normal Form

All productions have the form:

A → aV1V2…Vk,   k ≥ 0

where a is a terminal symbol and V1, …, Vk are variables.

Examples:

Greibach Normal Form:
S → cAB
A → aA | bB | b
B → b

Not Greibach Normal Form:
S → abSb
S → aa

Conversion to Greibach Normal Form:

S → abSb
S → aa

becomes, in Greibach Normal Form:

S → aTbSTb
S → aTa
Ta → a
Tb → b

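The conversion on the slide only handles the easy case in which every body already starts with a terminal: each later terminal x is replaced by a new variable Tx with Tx → x. The sketch below covers exactly that case and checks the result; a general GNF conversion also has to remove left recursion, which this sketch does not attempt. Assumptions: bodies are tuples of symbol names, terminals are lower-case strings, and "T" + x is an unused name; `to_gnf_simple` and `is_gnf` are illustrative names.

```python
def to_gnf_simple(grammar):
    """Replace every non-leading terminal x with a new variable Tx.

    Only valid when each body already starts with a terminal (the slide's
    case). Assumes tuple bodies, lower-case terminals, unused "T" + x names.
    """
    result, added = {}, {}
    for head, bodies in grammar.items():
        new_bodies = []
        for body in bodies:
            if not body or not body[0].islower():
                raise ValueError(f"{head} → {body} does not start with a terminal")
            rest = []
            for s in body[1:]:
                if s.islower():
                    added.setdefault("T" + s, [(s,)])  # add Tx → x once
                    s = "T" + s
                rest.append(s)
            new_bodies.append((body[0], *rest))
        result[head] = new_bodies
    result.update(added)
    return result


def is_gnf(grammar):
    # Every body: one terminal followed by zero or more variables.
    return all(
        bool(body) and body[0].islower() and not any(s.islower() for s in body[1:])
        for bodies in grammar.values() for body in bodies
    )


g = {"S": [("a", "b", "S", "b"), ("a", "a")]}
gnf = to_gnf_simple(g)
for head, bodies in gnf.items():
    print(head, "→", " | ".join("".join(b) for b in bodies))
# S → aTbSTb | aTa
# Tb → b
# Ta → a
print(is_gnf(g), is_gnf(gnf))  # False True
```

Apart from the order in which Ta and Tb are listed, the output is the Greibach Normal Form grammar on the slide.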
THANK YOU
