Intermediate Code Generation in Compilers
Rupesh Nasre.
[Figure: compiler phases, split into Frontend and Backend: Lexical Analyzer (emits a token stream), Syntax Analyzer, Semantic Analyzer, Intermediate Code Generator (emits the intermediate representation), Machine-Independent Code Optimizer, Code Generator, Machine-Dependent Code Optimizer, plus the Symbol Table.]
Agenda
● IR forms
– 3AC, 2AC, 1AC
– SSA
● IR generation
– Types
– Declarations
– Assignments
– Conditionals
– Loops
Role of IR Generator
● To act as glue between the front-end and the backend (or between source and machine code).
● To lower the abstraction from the source level.
– To make life simple.
● To maintain some high-level information.
– To keep life interesting.
● To complete some syntactic checks and perform more semantic checks.
– e.g. break should be inside a loop or switch only.
Representations
● Syntax Trees
– Maintains structure of the construct
– Suitable for high-level representations
[Figure: syntax tree for 3 * 5 + 4, with + at the root, * over the leaves 3 and 5, and the leaf 4.]
● Three-Address Code
– Maximum three addresses in an instruction
– Suitable for both high and low-level representations
● Two-Address Code
● …
– e.g. Java

3AC:
    t1 = 3 * 5
    t2 = t1 + 4
2AC:
    mult 3, 5
    add 4
1AC or stack machine:
    push 3
    push 5
    mult
    push 4
    add
Syntax Trees and DAGs
Expression: a + a * (b – c) + (b – c) * d
● Trees represent replicated expressions.
● Cannot optimize processing.
● For optimizations, the structure changes to a DAG.
[Figure: syntax tree (left) and DAG (right) for the expression; in the DAG, the leaf a and the subexpression b – c are shared.]
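One common way to obtain the DAG is to look up each node in a table before creating it, so that identical subexpressions map to the same node. The sketch below assumes a tuple encoding of the tree; `build_dag` and `count_tree_nodes` are illustrative names, not something defined in these slides.

```python
def build_dag(expr, table=None):
    """Return (node_id, table). The table maps a leaf name or an
    (op, left_id, right_id) triple to a unique node id."""
    if table is None:
        table = {}
    if isinstance(expr, str):
        key = expr
    else:
        op, left, right = expr
        left_id, _ = build_dag(left, table)
        right_id, _ = build_dag(right, table)
        key = (op, left_id, right_id)
    if key not in table:          # create a node only if it is new
        table[key] = len(table)
    return table[key], table

def count_tree_nodes(expr):
    if isinstance(expr, str):
        return 1
    return 1 + count_tree_nodes(expr[1]) + count_tree_nodes(expr[2])

b_minus_c = ("-", "b", "c")
expr = ("+", ("+", "a", ("*", "a", b_minus_c)), ("*", b_minus_c, "d"))
root, dag = build_dag(expr)
tree_nodes = count_tree_nodes(expr)
dag_nodes = len(dag)
```

For this expression the tree has 13 nodes while the DAG needs only 9, because a and b – c are each stored once.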
Assignment statement: a = b * - c + b * - c;
Triples:
         op      arg1   arg2
    (0)  minus   c
    (1)  *       b      (0)
    (2)  minus   c
    (3)  *       b      (2)
    (4)  +       (1)    (3)
    (5)  =       a      (4)
Instruction list (pointers to triples), reordered:
    (0) → (2)
    (1) → (3)
    (2) → (0)
    (3) → (1)
    (4) → (4)
    (5) → (5)
Indirect triples can be reordered.
SSA
● Classwork: Allocate registers to variables.
● Some observations
– Definition of a variable kills its previous definition.
– A variable's use refers to its most recent definition.
– A variable holds a register for a long time, if it is live longer.

Code:
    p = a + b
    q = p – c
    p = q * d
    p = e – p
    q = p + q

With each definition renamed:
    p1 = a + b
    q1 = p1 – c
    p2 = q1 * d
    p3 = e – p2
    q2 = p3 + q1

Register allocation:
    variable   one register per variable   with reuse
    a          r1                          r1
    b          r2                          r2
    p          r3                          r1, r2, r2
    c          r4                          r2
    q          r5                          r1, r1
    d          r6                          r2
    e          r7                          r3
Can r3 be avoided?
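The renaming used in this example can be sketched for straight-line code as follows. Each instruction is assumed to be a `(dest, left, op, right)` tuple, and `to_ssa` is a hypothetical helper; branches (and therefore phi nodes) are deliberately out of scope here.

```python
def to_ssa(instructions):
    """Rename straight-line code so that every destination is defined once."""
    version = {}   # variable -> latest version number
    out = []

    def use(name):
        # A use refers to the most recent definition; inputs that are
        # never defined in this block keep their plain name.
        return f"{name}{version[name]}" if name in version else name

    for dest, left, op, right in instructions:
        left, right = use(left), use(right)      # read operands first
        version[dest] = version.get(dest, 0) + 1  # then create a new definition
        out.append(f"{dest}{version[dest]} = {left} {op} {right}")
    return out

program = [
    ("p", "a", "+", "b"),
    ("q", "p", "-", "c"),
    ("p", "q", "*", "d"),
    ("p", "e", "-", "p"),
    ("q", "p", "+", "q"),
]
ssa = to_ssa(program)
```

Operands are renamed before the destination gets its new version, which is what makes `p = e - p` read the old p and define a new one.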
SSA
● Static Single Assignment
– An IR
– Each definition refers to a different variable (instance)

Original:
    if (flag)
        x = -1;
    else
        x = 1;
    y = x * a;

SSA:
    if (flag)
        x1 = -1;
    else
        x2 = 1;
    x3 = Φ(x1, x2)
    y = x3 * a;

[Figure: control-flow graph in which the test on flag branches to x = -1 and x = 1, and both join at y = x * a.]
● A phi node is an abstract node.
SDT Applications
● Finding type expressions
– int a[2][3] is array of 2 arrays of 3 integers.
– in functional style: array(2, array(3, int))

    Production          Semantic Rules
    T → B id C          T.t = C.t
                        C.i = B.t
    B → int             B.t = int
    B → float           B.t = float
    C → [ num ] C1      C.t = array(num, C1.t)
                        C1.i = C.i
    C → ε               C.t = C.i

[Figure: type tree for array(2, array(3, int)).]
Classwork: Write productions and semantic rules for computing types and finding their widths in bytes.
SDT Applications
● Width can also be computed using S-attributed SDT.

    Production          Semantic Rules
    T → B id C          T.t = C.t; T.width = C.width;
                        C.i = B.t; C.iwidth = B.width;
    B → int             B.t = int; B.width = 4;
    B → double          B.t = double; B.width = 8;
    C → [ num ] C1      C.t = array(num, C1.t);
                        C1.i = C.i; C.width = num.value * C1.width;
    C → ε               C.t = C.i; C.width = C.iwidth;
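As a rough illustration of what these rules compute, the sketch below builds the type expression and the width together for a declaration such as int a[2][3]. The function `array_type` and the list-of-dimensions input are assumptions; the 4-byte int and 8-byte double are the widths used on the slide.

```python
BASE_WIDTH = {"int": 4, "double": 8}   # widths in bytes, as on the slide

def array_type(base, dims):
    """Return (type_expression, width_in_bytes) for `base id[d1][d2]...`."""
    if not dims:
        # C -> epsilon: the base type and its width are passed straight up.
        return base, BASE_WIDTH[base]
    # C -> [ num ] C1: wrap the element type, multiply the element width.
    elem_type, elem_width = array_type(base, dims[1:])
    return f"array({dims[0]}, {elem_type})", dims[0] * elem_width

type_expr, width = array_type("int", [2, 3])
```

Here `type_expr` comes out as array(2, array(3, int)) and `width` as 2 * 3 * 4 = 24 bytes.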
Types
● Types encode:
– Storage requirement (number of bits)
– Storage interpretation (meaning)
– Valid operations (manipulation)
For instance,
– 1100..00 may be char[4], int, float, int[1], …
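The point that the same bits admit several interpretations can be checked with Python's standard `struct` module. This is only a sketch: the little-endian layout ("<") and the choice of 1.0 as the stored value are assumptions made for the example.

```python
import struct

raw = struct.pack("<f", 1.0)             # four bytes holding the float 1.0
as_int = struct.unpack("<i", raw)[0]     # the same bits read as a 32-bit int
as_float = struct.unpack("<f", raw)[0]   # the same bits read as a float
as_chars = list(raw)                     # the same bits read as four bytes, like char[4]
```

The storage is identical in all three cases; only the type decides the meaning and the valid operations.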
Type Equivalence
● Two types are structurally equivalent iff one of the following conditions is true.
1. They are the same basic type.
2. They are formed by applying the same construction to structurally equivalent types.
3. One is a type name that denotes the other (typedef).
– Name equivalence: only conditions 1 and 2 apply; a type name stands for itself.
● Compare against assembly code:
– int a[2][3] is not equivalent to int b[3][2];
– int a is not equivalent to char b[4];
– struct {int, char} is not equivalent to struct {char, int};
– int * is not equivalent to void *.
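A minimal structural-equivalence check might look like the sketch below. The tuple encoding of types and the name `struct_equiv` are assumptions, and type names (condition 3) are not modelled.

```python
def struct_equiv(s, t):
    """Types are strings (basic types) or tuples such as
    ('array', size, elem), ('pointer', target), ('struct', f1, f2, ...)."""
    if isinstance(s, str) or isinstance(t, str):
        return s == t                    # condition 1: same basic type
    if s[0] != t[0] or len(s) != len(t):
        return False                     # different constructors
    # Condition 2: same constructor applied to equivalent components.
    # Integer components (array sizes) must match exactly.
    return all(
        a == b if isinstance(a, int) or isinstance(b, int) else struct_equiv(a, b)
        for a, b in zip(s[1:], t[1:])
    )

a23 = ("array", 2, ("array", 3, "int"))
b32 = ("array", 3, ("array", 2, "int"))
```

With this encoding the four non-equivalent pairs listed above all compare unequal, for example `struct_equiv(a23, b32)` is False.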
Type Equivalence
● Name equivalence is easy to check, but is strict.
    typedef int NumCarsType;
    typedef int NumTrucksType;
    NumCarsType ncars = 2;
    NumTrucksType ntrucks = 2;
    if (ncars == ntrucks): Type error
● Structural equivalence permits this, but then:
    DoublyLinkedListNode == BSTNode: No type error
● A language may follow different schemes for different types.
– C follows structural equivalence for primitives, but name equivalence for structures.
● May permit char [32] to be type-equiv. to char [24]
Expressions
● We have studied expressions at length.
● To generate 3AC, we will use our grammar and its associated SDT to generate IR.
● For instance, a = b + -c would be converted to
    t1 = minus c
    t2 = b + t1
    a = t2
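The translation of a = b + -c can be sketched as a recursive walk that returns the address holding each subexpression and appends instructions as a side effect. The tuple encoding and the name `gen_3ac` are assumptions made for this example.

```python
import itertools

def gen_3ac(stmt):
    """stmt is ('=', name, expr); expr is a name, ('minus', e) or (op, e1, e2)."""
    code = []
    temps = (f"t{i}" for i in itertools.count(1))

    def addr(e):
        """Emit code for e and return the name that holds its value."""
        if isinstance(e, str):
            return e
        if e[0] == "minus":
            operand = addr(e[1])
            t = next(temps)
            code.append(f"{t} = minus {operand}")
        else:
            left, right = addr(e[1]), addr(e[2])
            t = next(temps)
            code.append(f"{t} = {left} {e[0]} {right}")
        return t

    _, target, expr = stmt
    code.append(f"{target} = {addr(expr)}")
    return code

ir = gen_3ac(("=", "a", ("+", "b", ("minus", "c"))))
```

A temporary is allocated only after the operands have been translated, which gives the same numbering as the three instructions above.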
Array Expressions
● For instance, create IR for c + a[i][j].
● This requires us to know the types of a and c.
● Say, c is an integer (4 bytes) and a is int [2][3].
● Then, the IR is
    t1 = i * 12    ; 3 * 4 bytes
    t2 = j * 4     ; 1 * 4 bytes
    t3 = t1 + t2   ; offset from a
    t4 = a[t3]     ; assuming base[offset] is present in IR.
    t5 = c + t4
Array Expressions
● a[5] is a + 5 * sizeof(type)
● a[i][j] for a[3][5] is
a + i * 5 * sizeof(type) + j * sizeof(type)
● This works when arrays are zero-indexed.
● Classwork: Find array expression to be generated for accessing a[i][j][k] when indices start with low, and array is declared as type a[10][20][30].
● Classwork: What all computations can be performed at compile-time?
● Classwork: What happens for malloc’ed arrays?
Array Expressions
void fun(int a[][]) {
    a[0][0] = 20;
}
void main() {
    int a[5][10];
    fun(a);
    printf("%d\n", a[0][0]);
}
ERROR: type of formal parameter 1 is incomplete

We view an array to be a D-dimensional matrix. However, for the hardware, it is simply single dimensional.
● How to optimize computation of the offset for a long expression a[i][j][k][l] with declaration as int a[w4][w3][w2][w1]?
– i * w3 * w2 * w1 + j * w2 * w1 + k * w1 + l
– Use Horner's rule: ((i * w3 + j) * w2 + k) * w1 + l
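The two offset formulas can be compared directly. In this sketch the offsets are counted in elements (multiply by the element size to get bytes), and the function names and sample dimensions are made up for illustration.

```python
def offset_naive(indices, dims):
    """Sum of index * (product of all later dimension sizes)."""
    total = 0
    for pos, index in enumerate(indices):
        stride = 1
        for d in dims[pos + 1:]:
            stride *= d
        total += index * stride
    return total

def offset_horner(indices, dims):
    """Horner's rule: one multiply and one add per dimension."""
    total = 0
    for index, d in zip(indices, dims):
        total = total * d + index
    return total

dims = [2, 3, 4, 5]    # w4, w3, w2, w1
idx = [1, 2, 3, 4]     # i, j, k, l
naive = offset_naive(idx, dims)
horner = offset_horner(idx, dims)
```

Both give 1*60 + 2*20 + 3*5 + 4 = 119, but the Horner form avoids recomputing the products of the trailing dimensions.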
Array Expressions
● In C, C++, Java, and so far, we have used row-major storage.
– All elements of a row are stored together.
● In Fortran, we use column-major storage format.
– each column is stored together.
[Figure: memory order of the elements of a 2-D array under row-major and column-major storage.]
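The difference between the two layouts is only in which index is scaled. A small sketch, with hypothetical function names and a 3 x 4 array chosen for the example:

```python
def row_major(i, j, rows, cols):
    return i * cols + j      # elements of one row are adjacent

def col_major(i, j, rows, cols):
    return j * rows + i      # elements of one column are adjacent

rows, cols = 3, 4
cells = [(i, j) for i in range(rows) for j in range(cols)]
row_order = sorted(cells, key=lambda p: row_major(p[0], p[1], rows, cols))
col_order = sorted(cells, key=lambda p: col_major(p[0], p[1], rows, cols))
```

Sorting the cells by offset shows the memory order: row-major starts (0,0), (0,1), (0,2), ..., while column-major starts (0,0), (1,0), (2,0), ....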
IR for Array Expressions
● L → id [E] | L [E] // maintain three attributes: type, addr and base.
– addr is syntax tree node, base is the array address.
– sizeof(L.type) may be part of nextwidth() or can be explicitly added.
– In a[i][j][k]: a[i] is handled by L → id [ E ], then L[j], then L[k] by L → L1 [ E ].

    L → id [ E ]    { L.type = id.type;
                      L.addr = new Temp();
                      // ignore [Link]()
                      gen(L.addr '=' E.addr '*' L.type.nextwidth()); }
    L → L1 [ E ]    { L.type = L1.type;
                      t = new Temp();
                      L.addr = new Temp();
                      gen(t '=' E.addr '*' L.type.nextwidth());
                      gen(L.addr '=' L1.addr '+' t); }
    E → id          { E.addr = id.addr; }
    E → L           { E.addr = new Temp();
                      gen(E.addr '=' L.base '[' L.addr ']'); }
    E → E1 + E2     { E.addr = new Temp();
                      gen(E.addr '=' E1.addr + E2.addr); }
    S → id = E      { gen(id.addr '=' E.addr); }
    S → L = E       { gen(L.base '[' L.addr ']' '=' E.addr); }
IR for c + a[i][j], with c an integer (4 bytes) and a declared as int [2][3]:
    t1 = i * 12    ; 3 * 4 bytes
    t2 = j * 4     ; 1 * 4 bytes
    t3 = t1 + t2   ; offset from a
    t4 = a[t3]     ; assuming base[offset] is present in IR.
    t5 = c + t4

Rules that generate it (addr is syntax tree node, base is the array address):
    L → id [ E ]    { L.type = id.type;
                      L.addr = new Temp();
                      gen(L.addr '=' E.addr '*' L.type.width); }
    L → L1 [ E ]    { L.type = L1.type;
                      t = new Temp();
                      L.addr = new Temp();
                      gen(t '=' E.addr '*' L.type.width);
                      gen(L.addr '=' L1.addr '+' t); }
    E → id          { E.addr = id.addr; }
    E → L           { E.addr = new Temp();
                      gen(E.addr '=' L.base '[' L.addr ']'); }
    E → E1 + E2     { E.addr = new Temp();
                      gen(E.addr '=' E1.addr + E2.addr); }
    S → id = E      { gen(id.addr '=' E.addr); }
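The following sketch mimics these rules for the single shape `scalar + array[i][j]...` and reproduces the five instructions for c + a[i][j]. The function `gen_array_sum` and its flat argument list are assumptions; a real translator would drive the same steps from the parse.

```python
import itertools

def gen_array_sum(scalar, array, indices, dims, elem_width):
    """Emit 3AC for `scalar + array[indices...]`; dims are the declared sizes."""
    temps = (f"t{n}" for n in itertools.count(1))
    code, addr = [], None
    for pos, index in enumerate(indices):
        width = elem_width                 # bytes skipped per unit of this index
        for d in dims[pos + 1:]:
            width *= d
        t = next(temps)
        code.append(f"{t} = {index} * {width}")
        if addr is None:                   # L -> id [ E ]
            addr = t
        else:                              # L -> L1 [ E ]
            total = next(temps)
            code.append(f"{total} = {addr} + {t}")
            addr = total
    value = next(temps)                    # E -> L
    code.append(f"{value} = {array}[{addr}]")
    result = next(temps)                   # E -> E1 + E2
    code.append(f"{result} = {scalar} + {value}")
    return code

ir = gen_array_sum("c", "a", ["i", "j"], [2, 3], 4)
```

The widths 12 and 4 are computed at translation time from the declared dimensions, so only the index arithmetic is left for run time.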
int a[1][2], b[3][4], c[5][6];
...
printMatrix(a);
printMatrix(b);
printMatrix(c);

int a[1][2], b[3][2], c[5][2];
...
printMatrix(a);
printMatrix(b);
printMatrix(c);

int a[1][2];
...
printMatrix(a);
Okay, the dimensions could be hard-coded by the programmer.
Type Qualifiers
● const: no assignment post initialization
– via pointers?
● static: can be within a function or outside
– Local: global lifetime, local scoping
– Global: local to a file
● register: frequent use hinted by user
– not recommended
● extern: defined in a different compilation unit
● volatile: disable memory optimizations
– useful in multi-threaded programs
Language Constructs to generate IR
● Declarations
– Types (int, int [], struct, int *)
– Storage qualifiers (array expressions, const, static)
● Assignments: LHS = RHS
● Conditionals, switch
● Loops
● Function calls, definitions
Control Flow
● Conditionals
– if, if-else, switch
● Loops
– for, while, do-while, repeat-until
● We need to worry about
– Boolean expressions
– Jumps (and labels)
Control-Flow – Boolean Expressions
● B → B || B | B && B | !B | (B) | E relop E | true | false
● relop → < | <= | > | >= | == | !=
● What is the associativity of ||?
● What is its precedence relative to &&?
● How to optimize evaluation of (B1 || B2) and (B3 && B4)?
– Short-circuiting: if (x < 10 && y < 20) ...
– Classwork: Write a C program to find out if C uses short-circuiting or not.
● while (p && p->next) …
● if (x || ++x) …
● x = (f() && g());
Control-Flow – Boolean Expressions
● Source code:
– if (x < 100 || x > 200 && x != y) x = 0;
● IR without short-circuit:
    b1 = x < 100
    b2 = x > 200
    b3 = x != y
    iftrue b1 goto L2
    iffalse b2 goto L3
    iffalse b3 goto L3
    L2:
    x = 0;
    L3:
    ...
● IR with short-circuit:
    b1 = x < 100
    iftrue b1 goto L2
    b2 = x > 200
    iffalse b2 goto L3
    b3 = x != y
    iffalse b3 goto L3
    L2:
    x = 0;
    L3:
    ...
3AC for Boolean Expressions
// attributes: true, false, code
    B → B1 || B2    B1.true = B.true;
                    B1.false = newLabel();
                    B2.true = B.true;
                    B2.false = B.false;
                    B.code = B1.code + label(B1.false) + B2.code;
    B → B1 && B2    B1.true = newLabel();
                    B1.false = B.false;
                    B2.true = B.true;
                    B2.false = B.false;
                    B.code = B1.code + label(B1.true) + B2.code;
3AC for Boolean Expressions
    B → !B1            B1.true = B.false;
                       B1.false = B.true;
                       B.code = B1.code;
    B → E1 relop E2    B.code = E1.code + E2.code +
                           gen('if' E1.addr relop E2.addr 'goto' B.true) +
                           gen('goto' B.false);
    B → true           B.code = gen('goto' B.true);
    B → false          B.code = gen('goto' B.false);
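The rules for Boolean expressions can be sketched as a recursive function that receives the true and false targets as arguments, playing the role of the inherited attributes. The class `Gen`, the tuple encoding, and the label names are assumptions; operands of a relop are taken to be plain names, so no code for E1 and E2 is emitted.

```python
import itertools

class Gen:
    def __init__(self):
        self.labels = (f"L{n}" for n in itertools.count(1))

    def new_label(self):
        return next(self.labels)

    def jumping(self, b, true, false):
        """Return code for b that jumps to `true` or `false`."""
        kind = b[0]
        if kind == "||":
            mid = self.new_label()                      # B1.false
            return (self.jumping(b[1], true, mid) + [f"{mid}:"]
                    + self.jumping(b[2], true, false))
        if kind == "&&":
            mid = self.new_label()                      # B1.true
            return (self.jumping(b[1], mid, false) + [f"{mid}:"]
                    + self.jumping(b[2], true, false))
        if kind == "!":
            return self.jumping(b[1], false, true)      # swap the targets
        if kind == "true":
            return [f"goto {true}"]
        if kind == "false":
            return [f"goto {false}"]
        left, right = b[1], b[2]                        # E1 relop E2
        return [f"if {left} {kind} {right} goto {true}", f"goto {false}"]

cond = ("||", ("<", "x", "100"), ("&&", (">", "x", "200"), ("!=", "x", "y")))
code = Gen().jumping(cond, "Ltrue", "Lfalse")
```

The output contains jumps to the immediately following label (for example `goto L1` right before `L1:`); these are the redundant gotos that a later cleanup can remove.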
SDD for while
S → while ( C ) S1
// S.next, S.code
// C.true, C.false, C.code
    L1 = newLabel();
    L2 = newLabel();
    S1.next = L1;
    C.false = S.next;
    C.true = L2;
    S.code = "label" + L1 +
             C.code +
             "label" + L2 +
             S1.code +
             gen('goto' L1);
3AC for if / if-else
    S → if (B) S1            B.true = newLabel();
                             B.false = S1.next = S.next;
                             S.code = B.code +
                                      label(B.true) +
                                      S1.code;
    S → if (B) S1 else S2    B.true = newLabel();
                             B.false = newLabel();
                             S1.next = S2.next = S.next;
                             S.code = B.code +
                                      label(B.true) + S1.code +
                                      gen('goto' S.next) +
                                      label(B.false) + S2.code;
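The statement rules for if, if-else and while can be sketched in the same style, with the label that follows the statement passed down as an argument. All names here (`stmt_code`, `cond_code`, the tuple encoding, `Lnext`) are assumptions, and conditions are limited to a single relational test to keep the example short.

```python
import itertools

labels = (f"L{n}" for n in itertools.count(1))

def cond_code(c, true, false):
    """Condition is a single (relop, left, right) test in this sketch."""
    op, left, right = c
    return [f"if {left} {op} {right} goto {true}", f"goto {false}"]

def stmt_code(s, nxt):
    """Return 3AC for s; control leaves s by falling through or jumping to nxt."""
    kind = s[0]
    if kind == "assign":
        return [f"{s[1]} = {s[2]}"]
    if kind == "if":                                    # S -> if (B) S1
        true = next(labels)
        return cond_code(s[1], true, nxt) + [f"{true}:"] + stmt_code(s[2], nxt)
    if kind == "ifelse":                                # S -> if (B) S1 else S2
        true, false = next(labels), next(labels)
        return (cond_code(s[1], true, false) + [f"{true}:"] + stmt_code(s[2], nxt)
                + [f"goto {nxt}", f"{false}:"] + stmt_code(s[3], nxt))
    if kind == "while":                                 # S -> while (C) S1
        begin, body = next(labels), next(labels)
        return ([f"{begin}:"] + cond_code(s[1], body, nxt) + [f"{body}:"]
                + stmt_code(s[2], begin) + [f"goto {begin}"])
    raise ValueError(f"unknown statement kind: {kind}")

loop = ("while", ("<", "i", "n"),
        ("ifelse", ("<", "x", "0"), ("assign", "x", "0"), ("assign", "i", "n")))
code = stmt_code(loop, "Lnext") + ["Lnext:"]
```

Inside the loop body the "next" label is the loop header, so the if-else jumps back to `L1` rather than past the loop.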
Control-Flow – Boolean Expressions
● Source code: if (x < 100 || x > 200 && x != y) x = 0;
● without optimization:
    b1 = x < 100
    b2 = x > 200
    b3 = x != y
    iftrue b1 goto L2
    goto L0
    L0:
    iftrue b2 goto L1
    goto L3
    L1:
    iftrue b3 goto L2
    goto L3
    L2:
    x = 0;
    L3:
    ...
● with short-circuit:
    b1 = x < 100
    iftrue b1 goto L2
    b2 = x > 200
    iffalse b2 goto L3
    b3 = x != y
    iffalse b3 goto L3
    L2:
    x = 0;
    L3:
    ...
Avoids redundant gotos.
Homework
● Write SDD to generate 3AC for for.
– for (S1; B; S2) S3
● Write SDD to generate 3AC for repeat-until.
– repeat S until B
Backpatching
● if (B) S required us to pass a label while evaluating B.
– This can be done by using inherited attributes.
● Alternatively, we could leave the label unspecified now...
– … and fill it in later.
● Backpatching is a general concept for one-pass code generation.

    B → true        B.code = gen('goto –');
    B → B1 || B2    backpatch(B1.false);
    ...
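A small sketch of the idea: jumps are emitted with an empty target, the indices of those incomplete jumps are kept in lists, and the targets are filled in once they are known. The class `Backpatcher`, its method names, and the use of instruction indices as jump targets are all assumptions for this example; only || is handled.

```python
class Backpatcher:
    def __init__(self):
        self.code = []                       # [text, target]; target starts as None

    def emit(self, text):
        self.code.append([text, None])
        return len(self.code) - 1            # index of the new instruction

    def backpatch(self, jumps, target):
        for j in jumps:
            self.code[j][1] = target

    def relop(self, left, op, right):
        """Emit two jumps with unknown targets; return (truelist, falselist)."""
        t = self.emit(f"if {left} {op} {right} goto")
        f = self.emit("goto")
        return [t], [f]

    def or_(self, b1, make_b2):
        truelist1, falselist1 = b1
        self.backpatch(falselist1, len(self.code))   # B2 starts right here
        truelist2, falselist2 = make_b2()
        return truelist1 + truelist2, falselist2

bp = Backpatcher()
b1 = bp.relop("x", "<", "100")
truelist, falselist = bp.or_(b1, lambda: bp.relop("x", ">", "200"))
then_start = bp.emit("x = 0")
bp.backpatch(truelist, then_start)
bp.backpatch(falselist, len(bp.code))        # first instruction after the if
listing = [f"{i}: {text}" + (f" {target}" if target is not None else "")
           for i, (text, target) in enumerate(bp.code)]
```

This translates `if (x < 100 || x > 200) x = 0;` in one pass: no label has to be known before the code that jumps to it is emitted.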
break and continue
● break and continue are disciplined / special gotos.
● Their IR needs
– the currently enclosing loop / switch.
– a goto to a label just outside / before the enclosing block.
● How to write the SDD to generate their 3AC?
– either pass on the enclosing block and label as an inherited attribute, or
– use backpatching to fill in the label of the goto.
– Need an additional restriction for continue.
● Classwork: How to support break label?
IR for switch
● Using nested if-else
● Using a table of pairs
– <Vi, Si>
● Using a hash-table
– when i is large (say, > 10)
● Special case when Vis are consecutive integrals.
– Indexed array is sufficient.

    switch(E) {
    case V1: S1
    case V2: S2
    …
    case Vn-1: Sn-1
    default: Sn
    }

Classwork: Write IR for switch (assume implicit break).
    switch(E) {
    case V1: S1
    case V2: S2
    …
    case Vn-1: Sn-1
    default: Sn
    }

[Figure: two IR layouts for this switch: (1) sequence of statements, sequence of values; (2) sequence of values and statements.]
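The choice between an indexed array and a hash table for the <Vi, Si> pairs can be sketched as below. The function `build_dispatch`, the label strings, and the representation of a statement by its label are assumptions for illustration; at least one case is assumed to be present.

```python
def build_dispatch(cases, default):
    """cases maps integer case values to statement labels."""
    values = sorted(cases)
    low, high = values[0], values[-1]
    if high - low + 1 == len(values):           # consecutive integers: indexed array
        table = [cases[v] for v in range(low, high + 1)]

        def lookup(e):
            return table[e - low] if low <= e <= high else default
        return "array", lookup

    def lookup(e):                              # otherwise: hash table of pairs
        return cases.get(e, default)
    return "hash", lookup

kind1, jump1 = build_dispatch({3: "L3", 4: "L4", 5: "L5"}, "Ldefault")
kind2, jump2 = build_dispatch({1: "La", 100: "Lb"}, "Ldefault")
```

With consecutive values the lookup is a bounds check plus one array access, independent of the number of cases.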
Functions
● Function definitions
– Type checking / symbol table entry
– Return type, argument types, void
– Stack offset for variables
– Stack offset for arguments
● Function calls
– Push parameters
– Switch scope / push environment
– Jump to label for the function
– Switch scope / pop environment
– Pop parameters
Summary
● IR forms
– 3AC, 2AC, 1AC
– SSA
● IR generation
– Types
– Declarations
– Assignments
– Conditionals
– Loops