Intermediate Code Generation Techniques

Module 5 covers intermediate code generation in compiler design, detailing techniques for translating high-level code into a simpler, machine-independent form. It discusses various representations such as postfix notation and three-address code, along with their advantages and disadvantages, including improved optimization and portability. The module also touches on basic code generation techniques and optimization methods like Directed Acyclic Graphs and Peephole Optimization.

Module 5

Code generation: Intermediate code and data structures for code generation,
Basic code generation techniques, code generation of data structure references,
Code generation of control statement and logical expressions, code generation
of procedure and functions calls, code generation in commercial compilers,
code optimization techniques and data flow equation.
Intermediate code and data structures for code generation

Intermediate Code Generation is a stage in the process of compiling a program, where the compiler translates the source code into an intermediate representation.

This representation is not machine code but is simpler than the original high-level code. Here’s how it works:

 Translation: The compiler takes the high-level code (like C or Java) and converts it into an intermediate form, which can be easier to analyze and manipulate.

 Portability: This intermediate code can often run on different types of machines without needing major changes, making it more versatile.

 Optimization: Before turning it into machine code, the compiler can optimize this intermediate code to make the final program run faster or use less memory.

If we generate machine code directly from source code, then for n target machines we need n optimizers and n code generators. With a machine-independent intermediate code, we need only one optimizer.

Intermediate code can be either language-specific (e.g., bytecode for Java) or language-independent (three-address code). The following are commonly used intermediate code representations:

Postfix Notation

 Also known as reverse Polish notation or suffix notation.

 In infix notation, the operator is placed between the operands, e.g., a + b. Postfix notation positions the operator at the right end, as in ab +.

 For any postfix expressions e1 and e2 and a binary operator (+), applying the operator yields e1e2+.

 Postfix notation eliminates the need for parentheses, as the operator’s position and arity allow unambiguous expression decoding.

 In postfix notation, the operator always follows its operands.

Example 1: The postfix representation of the expression (a + b) * c is: ab + c *

Example 2: The postfix representation of the expression (a – b) * (c + d) + (a – b) is: ab – cd + * ab – +
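
The two examples above can be reproduced with a small converter. What follows is a minimal sketch, not part of the original module: it assumes single-letter operands, only the four left-associative operators shown, and uses the ASCII `-` in place of the en dash. The function names `to_postfix` and `eval_postfix` are made up for illustration. The evaluator shows why no parentheses are needed: a single stack is enough to decode the expression.

```python
import operator

# Assumption of this sketch: four left-associative binary operators.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}
APPLY = {"+": operator.add, "-": operator.sub,
         "*": operator.mul, "/": operator.truediv}


def to_postfix(infix):
    """Convert an infix expression to postfix using an operator stack."""
    output = []  # postfix symbols, in final order
    stack = []   # pending operators and open parentheses
    for ch in infix.replace(" ", ""):
        if ch.isalnum():
            output.append(ch)
        elif ch == "(":
            stack.append(ch)
        elif ch == ")":
            while stack[-1] != "(":
                output.append(stack.pop())
            stack.pop()  # discard the matching "("
        else:
            # Emit pending operators of higher or equal precedence first.
            while (stack and stack[-1] != "("
                   and PRECEDENCE[stack[-1]] >= PRECEDENCE[ch]):
                output.append(stack.pop())
            stack.append(ch)
    while stack:
        output.append(stack.pop())
    return "".join(output)


def eval_postfix(postfix, env):
    """Evaluate a postfix string; env maps operand names to values."""
    stack = []
    for ch in postfix:
        if ch in APPLY:
            right = stack.pop()
            left = stack.pop()
            stack.append(APPLY[ch](left, right))
        else:
            stack.append(env[ch])
    return stack.pop()


print(to_postfix("(a+b)*c"))            # ab+c*
print(to_postfix("(a-b)*(c+d)+(a-b)"))  # ab-cd+*ab-+
print(eval_postfix("ab+c*", {"a": 1, "b": 2, "c": 3}))  # 9
```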

Three-address codes

Three-address code is a sequence of statements of the general form A := B op C, where A, B and C are programmer-defined names, constants or compiler-generated temporary names, and op stands for an operation applied to B and C. In simple words, code having at most three addresses in a line is called three-address code.
 Example: (a+b)*(a+b+c)

 The three-address code for the above expression is:

t1=a+b
t2=t1+c
t3=t1*t2
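
The translation in this example can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the module's own algorithm: it borrows Python's standard `ast` module as the parser, handles only names, constants and the four arithmetic operators, and reuses a temporary when the same subexpression appears again (which is why `a+b` is computed once). The function name `three_address_code` is hypothetical.

```python
import ast

# Operator node types from Python's ast module mapped to their symbols.
OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}


def three_address_code(expr):
    """Return three-address statements for one arithmetic expression."""
    code = []   # emitted statements, in order
    cache = {}  # (op, left, right) -> temporary that already holds the value

    def emit(node):
        # Returns the name (variable, constant or temporary) holding node.
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.Constant):
            return str(node.value)
        if isinstance(node, ast.BinOp):
            left = emit(node.left)
            right = emit(node.right)
            op = OPS[type(node.op)]
            key = (op, left, right)
            if key not in cache:
                cache[key] = f"t{len(cache) + 1}"
                code.append(f"{cache[key]}={left}{op}{right}")
            return cache[key]
        raise ValueError("unsupported expression")

    emit(ast.parse(expr, mode="eval").body)
    return code


for statement in three_address_code("(a+b)*(a+b+c)"):
    print(statement)
# t1=a+b
# t2=t1+c
# t3=t1*t2
```

Reusing a temporary is safe here only because a single expression contains no assignments between the two occurrences of `a+b`.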
Three-address code is one of the most widely used intermediate code representations in compiler design. There are many kinds of three-address statements, and the complex ones are generally a combination of simpler three-address statements.

These statements fall into the following seven categories, which can be seen as the building blocks for three-address statements:

Statement                       Meaning
X = Y op Z                      Binary Operation
X = op Z                        Unary Operation
X = Y                           Assignment
if X (rel op) Y goto L          Conditional Goto
goto L                          Unconditional Goto
A[i] = X, Y = A[i]              Array Indexing
P = addr X, Y = *P, *P = Z      Pointer Operations

 There are 3 ways to represent a Three-Address Code in compiler design:

i) Quadruples
ii) Triples
iii) Indirect Triples

1. Quadruple

A quadruple is a record structure with four fields:

 Operator: The operation to be performed (e.g., +, -, *, =, etc.).

 Argument 1: The first operand.

 Argument 2: The second operand.

 Result: The location (e.g., a temporary variable) where the result is stored.

2. Triples

A triple is a representation where no explicit field for the Result is provided. Instead, the results of intermediate calculations are referred to by their position (index) in the list of triples.

Advantages:

 Saves space since no separate field is needed for results.

 Results are indexed directly.

3. Indirect Triples

An indirect triple uses an additional array (or table) of pointers to refer to triples. This format allows instructions to be rearranged easily, as the array of pointers can be updated without modifying the triples themselves.

 Problem-01:

 Translate the following expression to quadruple, triple and indirect triple:
a + b x c / e ↑ f + b x a

Solution-
Three Address Code for the given expression is-

T1 = e ↑ f
T2 = b x c
T3 = T2 / T1
T4 = b x a
T5 = a + T3
T6 = T5 + T4

Now, we write the required representations-

Quadruple Representation-

Location   Op   Arg1   Arg2   Result
(0)        ↑    e      f      T1
(1)        x    b      c      T2
(2)        /    T2     T1     T3
(3)        x    b      a      T4
(4)        +    a      T3     T5
(5)        +    T5     T4     T6

Triple Representation-

Location   Op   Arg1   Arg2
(0)        ↑    e      f
(1)        x    b      c
(2)        /    (1)    (0)
(3)        x    b      a
(4)        +    a      (2)
(5)        +    (4)    (3)

Indirect Triple Representation-

Statement
35         (0)
36         (1)
37         (2)
38         (3)
39         (4)
40         (5)

Location   Op   Arg1   Arg2
(0)        ↑    e      f
(1)        x    b      c
(2)        /    (1)    (0)
(3)        x    b      a
(4)        +    a      (2)
(5)        +    (4)    (3)
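
The three representations of Problem-01 can also be derived mechanically from the three-address code. The sketch below is only an illustration: the tuple layout `(result, arg1, op, arg2)`, the function names, and the starting statement number 35 (taken from the table above) are assumptions of this example.

```python
# Three-address code of Problem-01 as (result, arg1, op, arg2).
TAC = [
    ("T1", "e", "↑", "f"),
    ("T2", "b", "x", "c"),
    ("T3", "T2", "/", "T1"),
    ("T4", "b", "x", "a"),
    ("T5", "a", "+", "T3"),
    ("T6", "T5", "+", "T4"),
]


def to_quadruples(tac):
    """Each quadruple is (op, arg1, arg2, result)."""
    return [(op, arg1, arg2, result) for result, arg1, op, arg2 in tac]


def to_triples(tac):
    """Each triple is (op, arg1, arg2); temporaries become positions."""
    position = {}  # temporary name -> index of the triple that computes it
    triples = []

    def ref(arg):
        # A temporary is replaced by a reference such as "(1)".
        return f"({position[arg]})" if arg in position else arg

    for result, arg1, op, arg2 in tac:
        triples.append((op, ref(arg1), ref(arg2)))
        position[result] = len(triples) - 1
    return triples


def to_indirect_triples(tac, base=35):
    """Return (statement table, triples); the table points into the triples."""
    triples = to_triples(tac)
    statements = {base + i: i for i in range(len(triples))}
    return statements, triples


for row in to_triples(TAC):
    print(row)
```

Because the statement table only stores positions, the execution order can be changed by editing that table alone, leaving every triple and every `(n)` reference untouched.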

Syntax Tree

 A syntax tree serves as a condensed representation of a parse tree.

 The operator and keyword nodes of the parse tree are moved into their respective parent nodes in the syntax tree, so that the internal nodes are operators and the child nodes are operands.

 Creating a syntax tree involves strategically placing parentheses within the expression. This technique contributes to a more intuitive representation, making it easier to discern the sequence in which operands should be processed.

The syntax tree not only condenses the parse tree but also offers an improved visual representation of the program’s syntactic structure.

Example: x = (a + b * c) / (a – b * c)
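
Since the figure for this example is not reproduced here, the tree can be shown as nested tuples instead. The following is a rough sketch that relies on Python's standard `ast` module to do the parsing and assumes an assignment whose right-hand side uses only names and the four arithmetic operators. The function name `syntax_tree` is hypothetical, and the ASCII `-` replaces the en dash of the example.

```python
import ast

# Operator node types from Python's ast module mapped to their symbols.
SYMBOL = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}


def syntax_tree(source):
    """Return the syntax tree of an assignment as (operator, left, right)."""
    def build(node):
        if isinstance(node, ast.Name):
            return node.id  # leaf: an operand
        if isinstance(node, ast.BinOp):
            # interior node: an operator with its two operand subtrees
            return (SYMBOL[type(node.op)], build(node.left), build(node.right))
        raise ValueError("unsupported node")

    statement = ast.parse(source).body[0]
    if not isinstance(statement, ast.Assign):
        raise ValueError("expected an assignment")
    return ("=", build(statement.targets[0]), build(statement.value))


print(syntax_tree("x = (a + b * c) / (a - b * c)"))
# ('=', 'x', ('/', ('+', 'a', ('*', 'b', 'c')), ('-', 'a', ('*', 'b', 'c'))))
```

The parentheses of the source expression do not appear in the result: their effect is already captured by the nesting of the tree.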

Advantages of Intermediate Code Generation

 Easier to Implement: Intermediate code generation can simplify the code generation process by reducing the complexity of the input code, making it easier to implement.

 Facilitates Code Optimization: Intermediate code generation can enable the use of various code optimization techniques, leading to improved performance and efficiency of the generated code.

 Platform Independence: Intermediate code is platform-independent, meaning that it can be translated into machine code or bytecode for any platform.

 Code Reuse: Intermediate code can be reused in the future to generate code for other platforms or languages.

 Easier Debugging: Intermediate code can be easier to debug than machine code or bytecode, as it is closer to the original source code.

Disadvantages of Intermediate Code Generation

 Increased Compilation Time: Intermediate code generation can significantly increase the compilation time, making it less suitable for real-time or time-critical applications.

 Additional Memory Usage: Intermediate code generation requires additional memory to store the intermediate representation, which can be a concern for memory-limited systems.

 Increased Complexity: Intermediate code generation can increase the complexity of the compiler design, making it harder to implement and maintain.

 Reduced Performance: The process of generating intermediate code can result in code that executes slower than code generated directly from the source code.

Conclusion

In conclusion, Intermediate Code Generation is an important step in compiler design that simplifies the translation of high-level programming languages into machine code. By creating an intermediate representation, compilers can analyze and optimize code more effectively, ensuring that programs run efficiently on various hardware. This approach enhances portability and allows for improvements in performance. Overall, Intermediate Code Generation plays a key role in making programming easier and more efficient for developers.

Basic code generation techniques

Code generation can be considered the final phase of compilation. Optimization can still be applied to the code after it has been generated, but that can be seen as part of the code generation phase itself.

Directed Acyclic Graph

A Directed Acyclic Graph (DAG) is a tool that depicts the structure of basic blocks, helps to see the flow of values among the basic blocks, and exposes opportunities for optimization. A DAG makes transformations on basic blocks easy. It can be understood as follows:

 Leaf nodes represent identifiers, names or constants.

 Interior nodes represent operators.

 Interior nodes also represent the results of expressions or the identifiers/name where
the values are to be stored or assigned.

 Example:

t0 = a + b
t1 = t0 + c
d = t0 + t1

[Figure: the DAG as it grows after each statement: t0 = a + b, then t1 = t0 + c, then d = t0 + t1]

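
The DAG for this example can be built one statement at a time. The code below is a minimal sketch under stated assumptions: each statement is given as a hypothetical tuple `(target, left, op, right)`, and the function name `build_dag` is invented for illustration. Identical nodes are shared, so a repeated subexpression is represented only once.

```python
def build_dag(block):
    """Build a DAG for one basic block.

    Returns (nodes, labels): nodes[i] is ("leaf", name) or
    (op, left_id, right_id); labels maps each assigned name to its node id.
    """
    nodes = []   # node id -> node description
    index = {}   # node description -> node id, so equal nodes are shared
    labels = {}  # name -> id of the node holding its current value

    def intern(description):
        # Reuse an existing identical node, otherwise create a new one.
        if description not in index:
            index[description] = len(nodes)
            nodes.append(description)
        return index[description]

    def node_for(name):
        # A name assigned earlier in the block refers to an interior node;
        # any other name is a leaf (identifier or constant).
        if name in labels:
            return labels[name]
        return intern(("leaf", name))

    for target, left, op, right in block:
        labels[target] = intern((op, node_for(left), node_for(right)))
    return nodes, labels


block = [
    ("t0", "a", "+", "b"),
    ("t1", "t0", "+", "c"),
    ("d", "t0", "+", "t1"),
]
nodes, labels = build_dag(block)
for node_id, node in enumerate(nodes):
    print(node_id, node)
# 0 ('leaf', 'a')
# 1 ('leaf', 'b')
# 2 ('+', 0, 1)
# 3 ('leaf', 'c')
# 4 ('+', 2, 3)
# 5 ('+', 2, 4)
```

Node 2 (the value of t0) has two parents, nodes 4 and 5, which is what distinguishes the DAG from a tree.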
Peephole Optimization
This optimization technique works locally on the source code to transform it into optimized code. By locally, we mean a small portion of the code block at hand. These methods can be applied to intermediate code as well as to target code. A group of statements is analyzed and checked for the following possible optimizations:

Redundant instruction elimination


At source code level, the following can be done by the user (each version simplifies the previous one):

int add_ten(int x)
{
    int y, z;
    y = 10;
    z = x + y;
    return z;
}

int add_ten(int x)
{
    int y;
    y = 10;
    y = x + y;
    return y;
}

int add_ten(int x)
{
    int y = 10;
    return x + y;
}

int add_ten(int x)
{
    return x + 10;
}
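
The same idea can be applied to intermediate or target code by sliding a small window over the instructions. The sketch below is one possible illustration, not a technique taken from this module: it assumes a made-up instruction format `("MOV", source, destination)` and removes a move that copies a value straight back to where it just came from. It also assumes the removed instruction is not a jump target.

```python
def eliminate_redundant_moves(code):
    """Drop a MOV that immediately undoes the copy made by the previous MOV."""
    result = []
    for instruction in code:
        if result:
            previous = result[-1]
            if (instruction[0] == "MOV" and previous[0] == "MOV"
                    and instruction[1] == previous[2]
                    and instruction[2] == previous[1]):
                # The destination already holds this value: skip the copy.
                continue
        result.append(instruction)
    return result


code = [
    ("MOV", "R0", "a"),   # store register R0 into a
    ("MOV", "a", "R0"),   # load a back into R0: redundant
    ("ADD", "R1", "R0"),
]
print(eliminate_redundant_moves(code))
# [('MOV', 'R0', 'a'), ('ADD', 'R1', 'R0')]
```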
