Logic Synthesis and Optimization Guide

Unit 3 covers logic synthesis, including synthesis overview, RTL synthesis, logic optimization, technology mapping, timing optimization, and low power synthesis. It discusses the transformation of HDL descriptions into logic gate networks, advantages of HDL synthesis, and various optimization techniques. The unit also highlights the importance of special element inferences and the synthesis procedure for combinational circuits.

Unit 3: Logic Synthesis

․Course contents
 Synthesis overview
 RTL synthesis
 Logic optimization
 Technology mapping
 Timing optimization
 Synthesis for low power
․Readings
 Chapter 11
 Giovanni De Micheli, “Synthesis and Optimization of
Digital Circuits”, McGraw-Hill, Inc., 1994.
 Related papers

Unit 3
1
Chang, Huang, Li, Lin, Liu

HDL Synthesis
․Logic synthesis programs transform Boolean
expressions or register-transfer level (RTL) descriptions
(in Verilog/VHDL/C) into logic gate networks (netlists)
in a particular library.
․Advantages
 Reduce time to generate netlists
 Easier to retarget designs from one technology to
another
 Reduce debugging effort
․Requirement
 Robust HDL synthesizers

Synthesis Procedure

Synthesis = Domain Translation + Optimization
(area, timing, power, ...)

-- VHDL
if (A = '1') then
  Y <= C + D;
elsif (B = '1') then
  Y <= C or D;
else
  Y <= C;
end if;

// Verilog
if (A == 1)
  Y = C + D;
else if (B == 1)
  Y = C | D;
else
  Y = C;

Behavioral domain → (RTL synthesis) → Structural domain

Domain Translation

Consistent with data


manipulation functions

x = y op z Combinational
Circuit
Generation Initial
Input HDL 3-address optimization
Structural
Description Code (area, timing …)
Netlist
Special Element
Inferences

Consistent with special semantics

Unit 3
4
Chang, Huang, Li, Lin, Liu
Optimization
․Technology-independent optimization: logic
optimization
 Work on Boolean expression equivalent
 Estimate size based on # of literals
 Use simple delay models
․Technology-dependent optimization: technology
mapping/library binding
 Map Boolean expressions into a particular cell library
 May perform some optimizations in addition to simple mapping
 Use more accurate delay models based on cell structures


Technology-Independent Logic Optimization


․Two-level: minimize the # of product terms.

․Multi-level: minimize the #'s of literals, variables.


 E.g., equations are optimized using a smaller number of literals.

․Methods/CAD tools: The Quine-McCluskey method


(exponential-time exact algorithm), Espresso (heuristics
for two-level logic), MIS (heuristics for multi-level logic),
Synopsys, etc.
Technology Mapping
․Goal: translation of a technology-independent
representation (e.g., Boolean networks) of a circuit into
a circuit in a given technology (e.g., standard cells) with
optimal cost
․Optimization criteria:
 Minimum area
 Minimum delay
 Meeting specified timing constraints
 Meeting specified timing constraints with minimum area
․Usage:
 Technology mapping after technology independent logic
optimization
 Technology translation

Standard Cells for Design Implementation

Timing Optimization
․There is always a trade-off between area and delay
․Optimize timing to meet delay spec. with minimum area

(Figure: area-delay trade-off curve. Starting from an area-optimized design, area grows as the delay spec tightens, until the output meets the delay spec.)

Outline
․Synthesis overview
․RTL synthesis
 Combinational circuit generation
 Special element inferences
․Logic optimization
 Two-level logic optimization
 Multi-level logic optimization
․Technology mapping
․Timing optimization
․Synthesis for low power

Typical Domain Translation Flow

․Translate original HDL code into 3-address format


․Conduct special element inferences before
combinational circuit generation
․Conduct special element inferences process by
process (local view)

3-address Code → Special Element Inferences → Combinational Circuit Generation → Initial Structure Netlist

Combinational Circuit Generation

․Functional unit allocation


 Straightforward mapping with 3-address code
․Interconnection binding
 Using control/data flow analysis

Functional Unit Allocation
․3-address code
 x = y op z in general form
 Functional unit op with inputs y and z and output x

Source code:
  x = c + d + e;
  if (a == b) x = e - f;
  y = x;

3-address code:
  t = c + d;
  x = t + e;
  s = (a == b);
  if (s) x = e - f;
  y = x;

(Each operation maps to a functional unit: adders for t = c + d and x = t + e, a comparator for s = (a == b), a subtractor for e - f, and an implicit multiplexer selecting between the two definitions of x under control of s.)
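The flattening of a compound right-hand side into x = y op z triples can be sketched as follows. This is a minimal illustration, not any particular tool's implementation; `to_three_address` and the temporary-naming scheme `t0, t1, ...` are hypothetical.

```python
def to_three_address(target, expr):
    """Flatten a nested binary expression into x = y op z triples.
    expr is either a variable name (str) or a tuple (op, lhs, rhs)."""
    code, counter = [], [0]

    def emit(e, dest=None):
        if isinstance(e, str):          # a simple operand needs no triple
            return e
        op, lhs, rhs = e
        l, r = emit(lhs), emit(rhs)     # recurse, allocating temporaries
        if dest is None:
            dest = "t%d" % counter[0]
            counter[0] += 1
        code.append((dest, l, op, r))
        return dest

    emit(expr, target)                  # top-level result goes to the target
    return code

# x = c + d + e  becomes  t0 = c + d ; x = t0 + e
code = to_three_address("x", ("+", ("+", "c", "d"), "e"))
# code == [('t0', 'c', '+', 'd'), ('x', 't0', '+', 'e')]
```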

Interconnection Binding

․Need the dependency information among


functional units
 Using control/data flow analysis
 A traditional technique used in compiler design for a
variety of code optimizations
 Statically analyze and compute the set of assignments
reaching a particular point in a program

Control/Data Flow Analysis
․Terminology
 A definition of a variable x
 An assignment assigns a value to the variable x
․A definition can only be affected by those definitions able to reach it
․Use a set of data flow equations to compute which assignments can reach a target assignment

/*d1*/ x = a;
if (s) begin
  /*d2*/ x = b;
  /*d3*/ y = x + a;
end
/*d4*/ y = x;

 d1 can reach d4 but cannot reach d3
 d1 is killed by d2 before reaching d3
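The data flow equations above can be sketched as a standard iterative reaching-definitions analysis. This is a block-level sketch under assumed representations (blocks, successor map, per-block definition lists are all hypothetical names); statement-level kills inside a block, such as d2 killing d1 before d3, are folded into the block's GEN/KILL sets.

```python
def reaching_definitions(blocks, succs, defs_of):
    """Solve IN[b] = union of OUT[p] over predecessors p,
    OUT[b] = GEN[b] | (IN[b] - KILL[b]) to a fixed point."""
    all_defs = {(d, v) for b in blocks for d, v in defs_of[b]}
    gen, kill = {}, {}
    for b in blocks:
        last = {}
        for d, v in defs_of[b]:
            last[v] = d                       # a later def of v kills earlier ones
        gen[b] = {(d, v) for v, d in last.items()}
        kill[b] = {(d, v) for d, v in all_defs if v in last and d != last[v]}
    preds = {b: [p for p in blocks if b in succs[p]] for b in blocks}
    IN = {b: set() for b in blocks}
    OUT = {b: set(gen[b]) for b in blocks}
    changed = True
    while changed:                            # iterate to a fixed point
        changed = False
        for b in blocks:
            new_in = set().union(*[OUT[p] for p in preds[b]]) if preds[b] else set()
            new_out = gen[b] | (new_in - kill[b])
            if (new_in, new_out) != (IN[b], OUT[b]):
                IN[b], OUT[b] = new_in, new_out
                changed = True
    return IN

# CFG of the slide's example: B1 = {d1}, B2 = {d2, d3} (the if-body), B3 = {d4}
blocks = ["B1", "B2", "B3"]
succs = {"B1": ["B2", "B3"], "B2": ["B3"], "B3": []}
defs_of = {"B1": [("d1", "x")], "B2": [("d2", "x"), ("d3", "y")], "B3": [("d4", "y")]}
IN = reaching_definitions(blocks, succs, defs_of)   # both d1 and d2 reach d4
```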

Combinational Circuit Generation: An Example

Input HDL:
always @ (x or a or b or c or d or s)
begin
  /*d1*/ x = a + b;
  /*d2*/ if ( s ) x = c - d;
  /*d3*/ else x = x;
  /*d4*/ y = x;
end

Modified 3-address code:
always @ (x or a or b or c or d or s)
begin
  /*d1*/ x = a + b;
  /*d2*/ if ( s ) x = c - d;
  /*d3*/ else x = x;
  /*d4*/ x = s mux x;
  /*d5*/ y = x;
end

Reaching definitions computed by control/data flow analysis:
In[d1] = {d4, d5}
In[d2] = {d1, d5}
In[d3] = {*d1, d5}
In[d4] = {*d2, *d3, d5}
In[d5] = {*d4, d5}

Functional unit allocation creates an adder (d1), a subtractor (d2), and a multiplexer (d4); interconnection binding then wires each unit's inputs to the definitions (marked *) that reach it, producing the final netlist driving output y.
Outline
․Synthesis overview
․RTL synthesis
 Combinational circuit generation
 Special element inferences
․Logic optimization
 Two-level logic optimization
 Multi-level logic optimization
․Technology mapping
․Timing optimization
․Synthesis for low power


Special Element Inferences

․Given HDL code at RTL, three special elements
need to be inferred to keep the special semantics
 Latch (D-type) inference
 Flip-flop (D-type) inference
 Tri-state buffer inference
․Some simple rules are used in typical approaches

reg Q;
always @ (D or en)
  if (en) Q = D;
// Latch inferred!!

reg Q;
always @ (posedge clk)
  Q = D;
// Flip-flop inferred!!

reg Q;
always @ (D or en)
  if (en) Q = D;
  else Q = 1'bz;
// Tri-state buffer inferred!!
Preliminaries

․Sequential section
 Edge-triggered always statement
․Combinational section
 All signals whose values are used in the always statement are
included in the sensitivity list

reg Q;
always @ (posedge clk)
  Q = D;
// Sequential section: conduct flip-flop inference

reg Q;
always @ (in or en)
  if (en) Q = in;
// Combinational section: conduct latch inference

Typical Latch Inference

․Conditional assignments are not completely specified
 Check if the else-clause exists
 Check if all case items exist
․Outputs conditionally assigned in an if-statement are
not assigned before entering or after leaving the
if-statement

always @ (D or S)
  if (S) Q = D;
// Infer latch for Q

always @ (S or A or B)
begin
  Q = A;
  if (S) Q = B;
end
// Do not infer latch for Q
Terminology (1/2)

․Conditional assignment
․Selector: S
․Input: D
․Output: Q

if (S)      // S is the selector
  Q = D;    // Q is the output, D is the input; the if-statement is the conditional assignment

Terminology (2/2)

․A variable Q has a branch for a value of selector s
 The variable Q is assigned a value in a path going through
the branch

if (s) Q = a;
// Q has no branch for the false value of the selector s

if (s) Q = a;
else Q = b;
// Q has a branch for the false value of the selector s
Rules of Latch Inference (1/2)

․Condition 1: There is no branch associated with the
output of a conditional assignment for a value of the
selector
 Output depends on its previous value implicitly

always @ (s or a)
  if (s) Q = a;
// equivalent to: if (s) Q = a; else Q = Q;
// Q depends on its previous value at the missing branch

Rules of Latch Inference (2/2)

․Condition 2: The output value of a conditional
assignment depends on its previous value explicitly

always @ (s or z or y or a)
begin
  z = y;
  if (s) y = a;
  else y = z;
end
// y depends on its previous value at the else branch
// via the assignment z = y;
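The two rules above can be sketched as a small check over a process's branches. This is an illustrative toy, not a synthesizer's actual rule engine: `branches` maps each selector value to the expression assigned (or None for a missing branch), and only direct dependence on the output's old value is caught (Condition 2 through an alias such as z = y would need value propagation first).

```python
def _support(expr):
    """Variables read by a tiny expression tree: str | (op, lhs, rhs)."""
    if isinstance(expr, str):
        return {expr}
    _, lhs, rhs = expr
    return _support(lhs) | _support(rhs)

def infers_latch(branches, output):
    """Condition 1: some selector value has no branch (implicit Q = Q).
    Condition 2: some branch assigns an expression reading the output."""
    for assigned in branches.values():
        if assigned is None:                 # missing branch
            return True
        if output in _support(assigned):     # explicit old-value dependence
            return True
    return False

# if (s) Q = a;              -> latch (no false branch)
infers_latch({"true": "a", "false": None}, "Q")    # True
# if (s) Q = a; else Q = b;  -> no latch
infers_latch({"true": "a", "false": "b"}, "Q")     # False
```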
Terminology

․Clocked statement: edge-triggered always statement


 Simple clocked statement
e.g., always @ (posedge clock)
 Complex clocked statement
e.g., always @ (posedge clock or posedge reset)
․Flip-flop inference must be conducted only when
synthesizing the clocked statements


Infer FF for Simple Clocked Statements (1/2)

․Infer a flip-flop for each variable being assigned in
the simple clocked statement

input a, b, s, clk;
output y, w;
reg x, w, y, z;
always @ (posedge clk)
begin
  /* d1 */ x = a;
  /* d2 */ if ( s ) y = x;
  /* d3 */ else y = z;
  /* d4 */ z = b;
  /* d5 */ w = 1'b1;
end

․x is used after it is defined (d2 reads d1's new value): connect to the input of x's flip-flop
․z is used before it is defined (d3 reads the previous value): connect to the output of z's flip-flop
․y is driven through a multiplexer selecting between x and z under s; w is tied to the constant 1
Infer FF for Simple Clocked Statements (2/2)
․Two post-processes
 Propagating constants
 Removing the flip-flops without fanouts

(Figure: the inferred network before and after post-processing; the constant 1'b1 driving w is propagated, and flip-flops whose outputs drive nothing are removed.)

Infer FF for Complex Clocked Statements

․The edge-triggered signal not used in the following
operations is chosen as the clock signal
․The usage of asynchronous control pins requires the
following syntactic template:
 An if-statement immediately follows the always statement
 Each variable in the event list except the clock signal must be a
selector signal of the if-statements
 Assignments in the blocks B1 and B2 must be constant
assignments (e.g., x = 1)

always @ (posedge clock or posedge reset or negedge set)
  if (reset) begin B1 end
  else if (!set) begin B2 end
  else begin B3 end
Typical Tri-State Buffer Inference (1/2)

․If a data object Q is assigned a high-impedance value
'Z' in a multi-way branch statement (if, case, ?:)
 Associate Q with a tri-state buffer
․If Q associated with a tri-state buffer also has a
memory attribute (latch, flip-flop)
 Hi-Z propagation problem: real hardware cannot propagate a Hi-Z value
 Requires two memory elements, for the control and the data
inputs of the tri-state buffer

reg Q;
always @ (En or D)
  if (En) Q = D;
  else Q = 1'bz;
// pure tri-state buffer

reg Q;
always @ (posedge clk)
  if (En) Q = D;
  else Q = 1'bz;
// tri-state buffer with memory attribute

Typical Tri-State Buffer Inference (2/2)

․It may suffer from mismatches between synthesis and
simulation
 Inference is done process by process
 May incur the Hi-Z propagation problem

reg QA, QB;

always @ (En or D)
  if (En) QA = D;
  else QA = 1'bz;

always @ (posedge clk)
  QB = QA;

// The assignment QB = QA can pass Hi-Z to QB in simulation,
// but the synthesized hardware cannot propagate Hi-Z through
// the flip-flop.
Outline
․Synthesis overview
․RTL synthesis
 Combinational circuit generation
 Special element inferences
․Logic optimization
 Two-level logic optimization
 Multi-level logic optimization
․Technology mapping
․Timing optimization
․Synthesis for low power


Two-Level Logic Optimization


․Two-level logic optimization
 Key technique in logic optimization
 Many efficient algorithms to find a near minimal representation
in a practical amount of time
 In commercial use for several years
 Minimization criteria: number of product terms
․Example: F = X'YZ + XYZ + XYZ'

F = XY + YZ

․Approaches to simplify logic functions:


 Karnaugh maps [Kar53]
 Quine-McCluskey [McC56]
Boolean Functions
․B = {0,1}, Y = {0,1,D}
․A Boolean function f: B^m → Y^n
 f = x1x2' + x1x3' + x2x3' + x1'x2 + x2'x3 + x1'x3
․Input variables: x1, x2, …
․The value of the output partitions B^m into three sets
 the on-set
 the off-set
 the dc-set (don't-care set)

Minterms and Cubes


․A minterm is a product of all input variables or their
negations.
 A minterm corresponds to a single point in B^n.
․A cube is a product of the input variables or their
negations.
 The fewer the number of variables in the product, the
bigger the space covered by the cube.
Implicant and Cover
․An implicant is a cube whose points are either in the
on-set or the dc-set.
․A prime implicant is an implicant that is not included in
any other implicant.
․A set of prime implicants that together cover all points
in the on-set (and some or all points of the dc-set) is
called a prime cover.
․A prime cover is irredundant when none of its prime
implicants can be removed from the cover.
․An irredundant prime cover is minimal when the cover
has the minimal number of prime implicants.


Cover Examples
․f = x1 x3 + x2 x3 + x1 x2
․f = x1 x2 + x2 x3 + x1 x3

The Quine-McCluskey Algorithm
․Theorem [Quine, McCluskey]: there exists a minimum
cover for F consisting only of prime implicants
 Need to look just at primes (reduces the search space)
․Classical methods: two-step process
1. Generation of all prime implicants (of the union of the on-set
and dc-set)
2. Extraction of a minimum cover (covering problem)
․Exponential-time exact algorithm, huge amounts of
memory!
․Other methods do not first enumerate all prime
implicants; they use an implicit representation by
means of ROBDDs.


Primary Implicant Generation (1/5)

Karnaugh map (rows cd, columns ab):

         ab=00  ab=01  ab=11  ab=10
cd=00      X      1      0      1
cd=01      0      1      1      1
cd=11      0      X      X      0
cd=10      0      1      0      1
Primary Implicant Generation (2/5)

Implication Table, Column I (minterms grouped by number of 1s):

zero "1":   0000
one "1":    0100, 1000
two "1":    0101, 0110, 1001, 1010
three "1":  0111, 1101
four "1":   1111

Primary Implicant Generation (3/5)

Implication Table (| marks cubes combined into Column II):

Column I     Column II
0000 |       0-00
0100 |       -000
1000 |       010-
             01-0
0101 |       100-
0110 |       10-0
1001 |
1010 |       01-1
             -101
0111 |       011-
1101 |       1-01
1111 |       -111
             11-1
Primary Implicant Generation (4/5)

Implication Table (* marks prime implicants; | marks cubes combined further):

Column I     Column II    Column III
0000 |       0-00 *       01-- *
0100 |       -000 *       -1-1 *
1000 |       010- |
             01-0 |
0101 |       100- *
0110 |       10-0 *
1001 |
1010 |       01-1 |
             -101 |
0111 |       011- |
1101 |       1-01 *
1111 |       -111 |
             11-1 |

Primary Implicant Generation (5/5)

Prime implicants (read off the Karnaugh map above):
0-00 = a'c'd'
100- = ab'c'
1-01 = ac'd
-1-1 = bd
-000 = b'c'd'
10-0 = ab'd'
01-- = a'b
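The merging step of the implication tables above can be sketched in a few lines: repeatedly combine cubes that differ in exactly one bit, writing '-' in that position; cubes never used in a merge are prime. This is a didactic sketch of the Quine-McCluskey generation phase, not an efficient implementation.

```python
from itertools import combinations

def prime_implicants(on_set, dc_set, nbits):
    """Generate all prime implicants of ON ∪ DC as cube strings like '1-01'."""
    cubes = {format(m, "0{}b".format(nbits)) for m in set(on_set) | set(dc_set)}
    primes = set()
    while cubes:
        merged, used = set(), set()
        for a, b in combinations(sorted(cubes), 2):
            diff = [i for i in range(nbits) if a[i] != b[i]]
            # combine only if the single differing position is 0 vs 1
            if len(diff) == 1 and "-" not in (a[diff[0]], b[diff[0]]):
                merged.add(a[:diff[0]] + "-" + a[diff[0] + 1:])
                used.update((a, b))
        primes |= cubes - used          # unmerged cubes are prime
        cubes = merged
    return primes

# The slide's example: ON = {4,5,6,8,9,10,13}, DC = {0,7,15}
primes = prime_implicants({4, 5, 6, 8, 9, 10, 13}, {0, 7, 15}, 4)
# primes == {'0-00', '100-', '1-01', '-1-1', '-000', '10-0', '01--'}
```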
Column Covering (1/4)

Rows = prime implicants, columns = ON-set elements; an "X" marks an
ON-set element covered by the prime implicant:

                    4   5   6   8   9  10  13
0,4       (0-00)    X
0,8       (-000)                X
8,9       (100-)                X   X
8,10      (10-0)                X       X
9,13      (1-01)                    X       X
4,5,6,7   (01--)    X   X   X
5,7,13,15 (-1-1)        X                   X

Column Covering (2/4)

If a column has a single X, then the implicant associated with that row
is essential: it must appear in the minimum cover. In the table above,
columns 6 and 10 each have a single X, so 01-- (a'b) and 10-0 (ab'd')
are essential.
Column Covering (3/4)

Eliminate all columns covered by the essential primes: 01-- removes
columns 4, 5, 6 and 10-0 removes columns 8, 10, leaving only columns
9 and 13.

Column Covering (4/4)

Find a minimum set of rows that covers the remaining columns 9 and 13;
the single prime 1-01 (ac'd) covers both.

f = ab'd' + ac'd + a'b
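The essential-prime extraction used in the covering steps above can be sketched as follows; the covering table is an assumed dict from prime to the ON-set minterms it covers.

```python
def essential_primes(cover, on_set):
    """Repeatedly pick rows that are the sole cover of some column,
    removing the columns they cover; returns (chosen primes, leftover columns)."""
    chosen, remaining = set(), set(on_set)
    progress = True
    while progress:
        progress = False
        for m in sorted(remaining):
            rows = [p for p, cov in cover.items() if m in cov]
            if len(rows) == 1:                 # a column with a single X
                chosen.add(rows[0])
                remaining -= cover[rows[0]]    # drop all columns it covers
                progress = True
                break
    return chosen, remaining

# Covering table of the running example
cover = {"0-00": {4}, "-000": {8}, "100-": {8, 9}, "10-0": {8, 10},
         "1-01": {9, 13}, "01--": {4, 5, 6}, "-1-1": {5, 13}}
chosen, remaining = essential_primes(cover, {4, 5, 6, 8, 9, 10, 13})
# chosen == {'01--', '10-0'}, remaining == {9, 13}
```

The leftover columns {9, 13} form the cyclic core that the exact methods (Petrick, branch and bound) or heuristics must still resolve.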
Petrick’s Method
․Solve the satisfiability problem of the following function:
 P = (P1+P6)(P6+P7)(P6)(P2+P3+P4)(P3+P5)(P4)(P5+P7) = 1
 where P1 = 0-00, P2 = -000, P3 = 100-, P4 = 10-0,
 P5 = 1-01, P6 = 01--, P7 = -1-1
․Each sum term represents a corresponding column
․Each column must be covered by at least one chosen prime
․All columns must be covered
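Petrick's product-of-sums can be solved by brute-force enumeration for small tables: pick one prime per column and keep the smallest resulting set. This is an exponential sketch for illustration, not how production tools solve covering.

```python
from itertools import product

def petrick(columns):
    """columns: one set of candidate primes per covering-table column.
    Returns a minimum-cardinality hitting set (one choice per column)."""
    best = None
    for choice in product(*columns):       # one prime chosen per column
        sol = frozenset(choice)
        if best is None or len(sol) < len(best):
            best = sol
    return set(best)

# P = (P1+P6)(P6+P7)(P6)(P2+P3+P4)(P3+P5)(P4)(P5+P7)
cols = [{"P1", "P6"}, {"P6", "P7"}, {"P6"}, {"P2", "P3", "P4"},
        {"P3", "P5"}, {"P4"}, {"P5", "P7"}]
petrick(cols)   # {'P4', 'P5', 'P6'}, i.e. f = ab'd' + ac'd + a'b
```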

ROBDDs and Satisfiability

․A Boolean function is satisfiable if an assignment to its
variables exists for which the function becomes ‘1’
․Any Boolean function whose ROBDD is unequal to ‘0’ is
satisfiable.
․Suppose that choosing a Boolean variable xi to be ‘1’
costs ci. Then, the minimum-cost satisfiability
problem asks to minimize

 Σi ci·φ(xi)

 where φ(xi) = 1 when xi = ‘1’ and φ(xi) = 0 when xi = ‘0’.
․Solving minimum-cost satisfiability amounts to
computing the shortest path in an ROBDD, which can
be solved in linear time.
 Weights: for a vertex v labeled xi, the then-edge gets weight ci
and the else-edge gets weight 0.
Brute Force Technique
․Brute force technique: consider all possible elements —
branch on whether each prime Pi is in or out of the cover

(Figure: binary branching tree over P1, P2, P3, ...; each level decides in/out for one prime.)

․The complete branching tree has 2^|P| leaves!!
 Need to prune it
․Complexity reduction
 Essential primes can be included right away
  If a column has a single "X", the corresponding row is essential
 Keep track of the best solution seen so far
 Classic branch and bound

Branch and Bound Algorithm

(Figure: two search trees branching on decisions a, b with leaf costs 5, 4, 9, 8; with bound = 4, the subtree that cannot beat the bound is killed.)
Heuristic Optimization
․Generation of all prime implicants is impractical
 The number of prime implicants for a function of n variables is on
the order of 3^n/n
․Finding an exact minimum cover is NP-hard
 Cannot be finished in polynomial time (unless P = NP)
․Heuristic method: avoid generation of all prime implicants
․Procedure
 A minterm of ON(f) is selected and expanded until it becomes a
prime implicant
 The prime implicant is put in the final cover, and all minterms
covered by this prime implicant are removed
 Iterate until all minterms of ON(f) are covered
․“ESPRESSO”, developed at UC Berkeley
 The kernel of synthesis tools
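The expand step of the heuristic procedure can be sketched as follows: raise one literal of a cube at a time to '-', keeping the raise only if the enlarged cube still lies entirely inside ON ∪ DC. This is a greatly simplified, order-dependent sketch of ESPRESSO's EXPAND, not the real heuristic.

```python
def cube_points(cube):
    """All minterms contained in a cube string such as '1-0-'."""
    pts = [0]
    for ch in cube:
        if ch == "-":
            pts = [p << 1 for p in pts] + [(p << 1) | 1 for p in pts]
        else:
            pts = [(p << 1) | int(ch) for p in pts]
    return pts

def expand_to_prime(minterm, on_dc, nbits):
    """Expand a single ON-set minterm into a prime implicant of ON ∪ DC."""
    cube = format(minterm, "0{}b".format(nbits))
    for i in range(nbits):
        trial = cube[:i] + "-" + cube[i + 1:]
        if all(p in on_dc for p in cube_points(trial)):
            cube = trial                     # keep the raise
    return cube

# Running example: ON ∪ DC = {0,4,5,6,7,8,9,10,13,15}
expand_to_prime(5, {0, 4, 5, 6, 7, 8, 9, 10, 13, 15}, 4)   # '-1-1' = bd
```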

ESPRESSO - Illustrated

The inner loop iterates REDUCE → EXPAND → IRREDUNDANT until the cover stops improving.
Outline
․Synthesis overview
․RTL synthesis
 Combinational circuit generation
 Special element inferences
․Logic optimization
 Two-level logic optimization
 Multi-level logic optimization
․Technology mapping
․Timing optimization
․Synthesis for low power


Multi-Level Logic Optimization


․Translate a combinational circuit to meet performance
or area constraints
 Two-level minimization
 Common factors or kernel extraction
 Common expression resubstitution
․In commercial use for several years
․Example:
 f1 = abcd + abce + ab'cd' + ab'c'd' + a'c + cdf + abc'd'e' + ab'c'df'
 f2 = bdg + b'dfg + b'd'g + bd'eg
 becomes
 f1 = c(a' + x) + ac'x'
 f2 = gx
 x = d(b + f) + d'(b' + e)
Multi-Level Logic
․Multi-level logic:
 A set of logic equations with no cyclic dependencies
․Example: Z = (AB + C)(D + E + FG) + H
 4 levels, 6 gates, 13 gate inputs

(Figure: gate network for Z. Level 4: AB and FG; level 3: AB + C and D + E + FG; level 2: their AND; level 1: the final OR with H.)

Boolean Network
․Directed acyclic graph (DAG)
․Each source node is a primary input
․Each sink node is a primary output
․Each internal node represents an equation
․Arcs represent variable dependencies
(Figure: a Boolean network DAG with primary inputs a, b, c, d, internal nodes x and y, and primary outputs F and G; the fanin of y is {a, b}, and the fanout of x is {F}.)
Boolean Network : An Example

(Figure: network with primary inputs x1..x6, internal nodes y1..y5, and primary outputs z1, z2.)

y1 = f1(x2, x3) = x2’ + x3’
y2 = f2(x4, x5) = x4’ + x5’
y3 = f3(x4, y1) = x4’y1’
y4 = f4(x1, y3) = x1 + y3’
y5 = f5(x6, y2, y3) = x6y2 + x6’y3’
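A Boolean network like the one above can be sketched as a DAG of named nodes with fanin lists, with fanout derived from fanin. This is a toy representation for illustration (class and method names are hypothetical), shown on two of the slide's node functions.

```python
class BooleanNetwork:
    """Boolean network as a DAG: node name -> (function, fanin list).
    Primary inputs have no entry; values for them come from the caller."""
    def __init__(self):
        self.nodes = {}

    def add(self, name, fanin, fn):
        self.nodes[name] = (fn, list(fanin))

    def fanout(self, name):
        """Nodes whose fanin lists contain `name`."""
        return [n for n, (_, fi) in self.nodes.items() if name in fi]

    def eval(self, values):
        """Evaluate every node given primary-input values (memoized)."""
        values = dict(values)
        def get(n):
            if n not in values:
                fn, fi = self.nodes[n]
                values[n] = fn(*[get(x) for x in fi])
            return values[n]
        return {n: get(n) for n in self.nodes}

net = BooleanNetwork()
net.add("y1", ["x2", "x3"], lambda a, b: (not a) or (not b))   # y1 = x2' + x3'
net.add("y3", ["x4", "y1"], lambda a, b: (not a) and (not b))  # y3 = x4'y1'
out = net.eval({"x2": True, "x3": True, "x4": False})
```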

Multi-Level vs. Two-Level

․Two-level:
 Often used in control logic design
 f1 = x1x2 + x1x3 + x1x4
 f2 = x1’x2 + x1’x3 + x1x4
 Only x1x4 is shared
 Sharing is restricted to common cubes
․Multi-level:
 Datapath or control logic design
 Can share x2 + x3 between the two expressions
 Can use complex gates:
 g1 = x2 + x3
 g2 = x1x4
 f1 = x1y1 + y2
 f2 = x1’y1 + y2
 (yi is the output of gate gi)
Multi-Level Logic Optimization

․Technology independent
․Decomposition/Restructuring
 Algebraic
 Functional
․Node optimization
 Two-level logic optimization techniques are used


Decomposition / Restructuring
․Goal : given initial network, find best network
․Two problems:
 Find good common subfunctions
 How to perform division
․Example:
 f1 = abcd + abce + ab’cd’ + ab’c’d’ + a’c + cdf + abc’d’e’ + ab’c’df’
 f2 = bdg + b’dfg + b’d’g + bd’eg
 minimize (in sum-of-products form):
 f1 = bcd + bce + b’d’ + b’f + a’c + abc’d’e’ + ab’c’df’
 f2 = bdg + dfg + b’d’g + d’eg
 decompose:
 f1 = c(a’ + x) + ac’x’
 x = d(b + f) + d’(b’ + e)
 f2 = gx
Basic Operations (1/2)

1. Decomposition (single function)
 f = abc + abd + a’c’d’ + b’c’d’
 becomes
 f = xy + x’y’
 x = ab
 y = c + d

2. Extraction (multiple functions)
 f = (az + bz’)cd + e
 g = (az + bz’)e’
 h = cde
 becomes
 f = xy + e
 g = xe’
 h = ye
 x = az + bz’
 y = cd

Basic Operations (2/2)

3. Factoring (series-parallel decomposition)
 f = ac + ad + bc + bd + e
 becomes
 f = (a + b)(c + d) + e

4. Substitution (with complement)
 g = a + b
 f = a + bc + b’c’
 becomes
 f = g(a + c) + g’c’

5. Elimination
 f = ga + g’b
 g = c + d
 becomes
 f = ac + ad + bc’d’
 g = c + d

“Division” plays a key role!!
Division

․Division: p is a Boolean divisor of f if q ≠ ∅ and r exist
such that f = pq + r
 p is said to be a factor of f if, in addition, r = ∅:
f = pq
 q is called the quotient
 r is called the remainder
 q and r are not unique
․Weak division: the unique algebraic division such that
r has as few cubes as possible
 The quotient q resulting from weak division is denoted by f/p
(it is unique)

Weak Division Algorithm (1/2)

Weak_div(f, p):
 U = set {uj} of cubes of f with the literals not in p deleted
 V = set {vj} of cubes of f with the literals in p deleted
 /* note that uj·vj is the j-th cube of f */
 V^i = {vj ∈ V : uj = pi}
 q = ∩i V^i
 r = f − pq
 return (q, r)
Weak Division Algorithm (2/2)

․Example
 f = acg + adg + ae + bc + bd + be + a’b
 p = ag + b   (common expressions)
 U = ag + ag + a + b + b + b + b
 V = c + d + e + c + d + e + a’
 V^ag = c + d
 V^b = c + d + e + a’
 q = c + d = f/p
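The weak-division procedure above can be sketched directly on cube lists, representing each cube as a set of literal strings (with "a'" for a complemented variable). The function names and representation are this sketch's own choices.

```python
def weak_div(f, p):
    """Weak (algebraic) division: returns (quotient q, remainder r)
    with f = p*q + r. Cubes are sets of literals."""
    f = [frozenset(c) for c in f]
    p = [frozenset(c) for c in p]
    p_lits = frozenset().union(*p)
    U = [c & p_lits for c in f]              # literals of p kept
    V = [c - p_lits for c in f]              # literals of p deleted
    q = None
    for pi in p:                             # q = intersection over cubes of p
        Vi = {v for u, v in zip(U, V) if u == pi}
        q = Vi if q is None else q & Vi
    q = q or set()
    p_set = set(p)
    # r = f - p*q: keep cubes not expressible as (cube of p)·(cube of q)
    r = [u | v for u, v in zip(U, V) if not (v in q and u in p_set)]
    return q, r

# The slide's example: q = c + d, r = ae + be + a'b
f = [{"a", "c", "g"}, {"a", "d", "g"}, {"a", "e"}, {"b", "c"},
     {"b", "d"}, {"b", "e"}, {"a'", "b"}]
p = [{"a", "g"}, {"b"}]
q, r = weak_div(f, p)
```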

Outline
․Synthesis overview
․RTL synthesis
 Combinational circuit generation
 Special element inferences
․Logic optimization
 Two-level logic optimization
 Multi-level logic optimization
․Technology mapping
․Timing optimization
․Synthesis for low power

Technology Mapping
․General approach:
 Choose base function set for canonical representation
 Ex: 2-input NAND and Inverter
 Represent optimized network using base functions
 Subject graph
 Represent library cells using base functions
 Pattern graph
 Each pattern associated with a cost which is dependent on the
optimization criteria
․Goal:
 Finding a minimal cost covering of a subject graph using
pattern graphs


Example Pattern Graph (1/3)


inv (1)
nor2 (2)
nand2 (1)

nand3 (3) nor3 (3)

nand4 (4) nor4 (4)

Example Pattern Graph (2/3)

nand4 (4) nor4 (4)

aoi21 (3) oai21 (3)

oai22 (4)
aoi22 (4)


Example Pattern Graph (3/3)

and2 (3) or2 (3)

xor (5) xnor (5)

Example Subject Graph

t1 = d + e;
t2 = b + h;
t3 = a·t2 + c;
t4 = t1·t3 + f·g·h;
F = t4’;

(Figure: subject graph for F built from 2-input NANDs and inverters over inputs a, b, c, d, e, f, g, h.)

Sample Covers (1/2)

(Figure: one cover of the subject graph using AND2, two OR2s, two NAND2s, an INV, and an AOI22.)

Area = 18
Sample Covers (2/2)

(Figure: a better cover using NAND3, AND2, two OAI21s, a NAND2, and an INV.)

Area = 15

DAGON Approach
․Partition a subject graph into trees
 Cut the graph at all multiple fanout points

․Optimally cover each tree using dynamic programming


approach
․Piece the tree-covers into a cover for the subject graph
Dynamic Programming for Minimum Area

․Principle of optimality: optimal cover for the tree


consists of a match at the root plus the optimal cover
for the sub-tree starting at each input of the match

(Figure: a match at the root with area m, exposing subtree inputs I1, I2, I3, I4.)

A(root) = m + A(I1) + A(I2) + A(I3) + A(I4)

The cost of a leaf is 0.
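The recurrence A(root) = m + Σ A(input subtree) can be sketched as a memoized dynamic program over a tree. The `children` and `patterns` callbacks are this sketch's assumed interface: `patterns(node)` yields (match area, list of subtree roots the match exposes), and leaves (primary inputs) cost 0.

```python
def min_area_cover(node, children, patterns, memo=None):
    """DAGON-style minimum-area tree covering by dynamic programming."""
    memo = {} if memo is None else memo
    if node not in memo:
        if not children(node):              # leaf: a primary input, cost 0
            memo[node] = 0
        else:
            memo[node] = min(
                area + sum(min_area_cover(i, children, patterns, memo)
                           for i in inputs)
                for area, inputs in patterns(node))
    return memo[node]

# Toy subject tree: AND2 as NAND2 + INV, with the library costs
# from the slides (INV = 2, NAND2 = 3, AND2 = 3).
kids = {"inv": ["nand"], "nand": ["a", "b"], "a": [], "b": []}
pats = {"inv": [(2, ["nand"]), (3, ["a", "b"])],   # INV over NAND2, or one AND2
        "nand": [(3, ["a", "b"])]}                 # NAND2
best = min_area_cover("inv", lambda n: kids[n], lambda n: pats[n])
# best == 3: the single AND2 pattern beats INV(2) + NAND2(3) = 5
```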

A Library Example

Library Element   Area   Canonical Form
INV                 2    a’
NAND2               3    (ab)’
NAND3               4    (abc)’
NAND4               5    (abcd)’
AOI21               4    (ab+c)’
AOI22               5    (ab+cd)’
DAGON in Action

(Figure: the subject graph annotated with the cheapest match at each node; cumulative areas such as NAND2(3), INV(5), NAND2(8), AOI21(9), NAND2(13), INV(15), NAND2(16), NAND3(17), NAND3(18), NAND4(19), NAND2(21), AOI21(22) show the dynamic program selecting the minimum-area cover node by node.)

Features of DAGON
․Pros of DAGON:
 Strong algorithmic foundation
 Linear time complexity
 Efficient approximation to the graph-covering problem
 Gives locally optimal matches in terms of both area and delay
cost functions
 Easily “portable” to new technologies
․Cons of DAGON:
 Has only a local (per-tree) notion of timing
  Taking load values into account can improve the results
 Can destroy structures of optimized networks
  Not desirable for well-structured circuits
 Inability to handle non-tree library elements (XOR/XNOR)
 Poor inverter allocation
Inverter Allocation

․Add a pair of inverters for each wire in the subject
graph
․Add a pattern of a wire that matches two inverters with
zero cost
․Effect: may further improve the solution

(Figure: a cover using 2 INVs and 2 NOR2s is improved to one using a single AOI21 after inverter-pair insertion.)

Outline
․Synthesis overview
․RTL synthesis
 Combinational circuit generation
 Special element inferences
․Logic optimization
 Two-level logic optimization
 Multi-level logic optimization
․Technology mapping
․Timing optimization
․Synthesis for low power

Delay Model at Logic Level

1. unit delay model


 Assign a delay of 1 to each gate
2. unit fanout delay model
 Incorporate an additional delay for each fanout
3. library delay model
 Use delay data in the library to provide more accurate delay value
 May use linear or non-linear (tabular) models


Linear Delay Model

Delay = Dslope + Dintrinsic + Dtransition + Dwire

 Ds (slope delay): delay at input A caused by the transition delay at B
 DI (intrinsic delay): incurred from cell input to cell output
 DT (transition delay): determined by output-pin loading and output-pin drive
 Dw (wire delay): time from the state transition at C to the state transition at D

(Figure: a driving cell with pins A and B feeding, through the wire from C to D, the next cell.)
Tabular Delay Model
․Delay values are obtained by a look-up table
 Two-dimensional table of delays (m by n), indexed by input
transition (m rows) and total output capacitance (n columns)
 One-dimensional table for output slope, indexed by total output
capacitance
 Each value in the table is obtained by real measurement

Cell delay (ps) vs. input transition (ns) and total output load (fF):

                  load 0.2   0.3   0.4   0.5
transition 0.0          3    4.5    6     7
transition 0.1          5    8    10.7   13

․Can be more precise than the linear delay model
 table size ↑ → accuracy ↑
․Requires more space to store the table
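A table look-up of this kind is typically interpolated between the characterized grid points. The sketch below does bilinear interpolation over a 2-D table; the function name and table values (loosely based on the slide's numbers) are illustrative, not any library's actual characterization data.

```python
from bisect import bisect_right

def table_delay(slews, loads, table, slew, load):
    """Bilinearly interpolate a 2-D delay table indexed by input
    transition (rows) and total output load (columns)."""
    def bracket(axis, x):
        # index of the lower grid point, clamped to the table edges
        i = min(max(bisect_right(axis, x) - 1, 0), len(axis) - 2)
        t = (x - axis[i]) / (axis[i + 1] - axis[i])
        return i, t
    i, ti = bracket(slews, slew)
    j, tj = bracket(loads, load)
    lo = table[i][j] * (1 - tj) + table[i][j + 1] * tj
    hi = table[i + 1][j] * (1 - tj) + table[i + 1][j + 1] * tj
    return lo * (1 - ti) + hi * ti

# Midpoint of the example table: slew 0.05 ns, load 0.35 fF
d = table_delay([0.0, 0.1], [0.2, 0.3, 0.4, 0.5],
                [[3, 4.5, 6, 7], [5, 8, 10.7, 13]], 0.05, 0.35)
# d ≈ 7.3 ps
```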

Arrival Time and Required Time

․Arrival time: calculated from inputs toward outputs
․Required time: calculated from outputs toward inputs
․slack = required time − arrival time

 A(j): arrival time of signal j
 R(k): required time of signal k
 S(k): slack of signal k
 D(j,k): delay of node j from input k

 A(j) = max_{k ∈ FI(j)} [A(k) + D(j,k)]
 r(j,k) = R(j) − D(j,k)
 R(k) = min_{j ∈ FO(k)} [r(j,k)]
 S(k) = R(k) − A(k)
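The arrival/required/slack equations can be sketched directly over a small DAG. This is an illustrative sketch under simplifying assumptions: a single per-gate delay (the `delay` callback), and the required time at every primary output set to the latest arrival, so the critical path gets zero slack.

```python
def sta(fanin, delay, topo):
    """Static timing: A(j) = max over fanins, R(k) = min over fanouts,
    S = R - A. fanin: node -> fanin list; topo: topological order."""
    A = {n: 0.0 for n in topo}
    for j in topo:                           # forward pass: arrival times
        if fanin[j]:
            A[j] = max(A[k] + delay(j, k) for k in fanin[j])
    outputs = [n for n in topo if not any(n in fanin[m] for m in topo)]
    T = max(A[n] for n in outputs)           # assumed required time at outputs
    R = {}
    for j in reversed(topo):                 # backward pass: required times
        fo = [m for m in topo if j in fanin[m]]
        R[j] = T if not fo else min(R[m] - delay(m, j) for m in fo)
    S = {n: R[n] - A[n] for n in topo}
    return A, R, S

# Tiny example: a -> b(delay 2) -> c(delay 3), plus a side branch a -> d(delay 1)
fanin = {"a": [], "b": ["a"], "c": ["b"], "d": ["a"]}
gate_delay = {"b": 2.0, "c": 3.0, "d": 1.0}
A, R, S = sta(fanin, lambda j, k: gate_delay[j], ["a", "b", "c", "d"])
# Critical path a-b-c has slack 0; the side branch d has slack 4.
```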
Delay Graph
․Replace logic gates with delay blocks
․Add start (S) and end (E) blocks
․Indicate signal flow with directed arcs

(Figure: the circuit redrawn as a delay graph: gates become delay blocks, start (S) and end (E) blocks are added, and directed arcs indicate signal flow.)

Longest and Shortest Path

․If we visit vertices in precedence (topological) order, the following
relaxation needs executing only once for each u

Update Successors[u]:
1 for each vertex v ∈ Adj[u] do
2   if A[v] < A[u] + Δ[u]          // longest
3     then A[v] ← A[u] + Δ[u]
4          LP[v] ← u fi
5   if a[v] > a[u] + Δ[u]          // shortest
6     then a[v] ← a[u] + Δ[u]
7          SP[v] ← u fi
Delay Graph and Topological Sort

1 2 3 4

S 5 6 7 8
E

9 10

S 1 5 9 2 6 3 7 10 4 8 E


Delay Calculation

(Figure: the delay graph annotated per node with A = longest-path arrival time, a = shortest-path arrival time, and the gate delay; the longest path from S to E has delay 13. The longest and shortest delay of each gate are assumed to be the same.)
Restructuring Algorithm

while (circuit timing improves) do
  select regions to transform
  collapse the selected region
  resynthesize for better timing
done

․Which regions to restructure?
․How to resynthesize to minimize delay?

Restructuring Regions
․All nodes with slack within ε of the most critical signal
belong to the ε-network
․To improve circuit delay, it is necessary and sufficient to
improve delay at nodes on a cut-set of the ε-network

(Figure: an example ε-network with slack values annotated on nodes a through n.)
Find the Cutset
․The weight of each node is W = Wxt +  * Wxa
 Wxt is potential for speedup
 Wxa is area penalty for duplication of logic
  is decided by various area/delay tradeoff
․Apply the maxflow-mincut algorithm to generate the
cutset of the -network
․: Specify the size of the -network
 Large  might waste area without much reduction in critical
delay
 Small  might slow down the algorithm
․: Control the tradeoff between area and speed
 Large  avoids the duplication of logic
  = 0 implies a speedup irrespective of the increase in area


Timing Optimization Techniques (1/8)


․Fanout optimization
 Buffer insertion
 Split
․Timing-driven restructuring
 Critical path collapsing
 Timing decomposition
․Misc
 De Morgan
 Repower
 Down power
․Most of them will increase area to improve timing
 Have to make a good trade-off between them

Timing Optimization Techniques (2/8)
․Buffer insertion: divide the fanouts of a gate into
critical and non-critical parts and drive the non-critical
fanouts with a buffer

[Figure omitted: the more critical fanouts stay directly driven, while the
less critical fanouts go through an inserted buffer; timing is improved on
the critical fanouts due to less loading]


Timing Optimization Techniques (3/8)


․Split: split the fanouts of a gate into several parts. Each
part is driven with a copy of the original gate.

Timing Optimization Techniques (4/8)
․Critical path collapsing: reduce the depth of logic
networks

[Figure omitted: the gates along the critical path from late input A are
collapsed into a single gate, so A passes through fewer logic levels;
B is an earlier, non-critical input]


Timing Optimization Techniques (5/8)

․Timing decomposition: restructure the logic
network to minimize the arrival time
․Three implementations of the same function, with input
arrival times a = 0.0, b = 0.0, c = 1.0, d = 2.0:
 f = abcd (one 4-input gate, D = 4.5): A(f) = 6.5
 e = ab; f = ecd (D = 1.5 and D = 3.0): A(f) = 5.0
 e = ab; g = ce; f = dg (2-input gates, D = 1.5 each): A(f) = 4.5
Timing Optimization Techniques (6/8)
․De Morgan: replace a gate with its dual, and reverse
the polarity of inputs and output
 NAND gate is typically faster than NOR gate


Timing Optimization Techniques (7/8)

․Repower: replace a gate with another gate in its logic
class that has a higher driving capability

Timing Optimization Techniques (8/8)
․Down power: reduce the gate size of a non-critical fanout
hanging off the critical path, lowering the load on the critical net

[Figure omitted: a high-drive gate H on a non-critical branch of a
critical net is downsized]


Outline
․Synthesis overview
․RTL synthesis
 Combinational circuit generation
 Special element inferences
․Logic optimization
 Two-level logic optimization
 Multi-level logic optimization
․Technology mapping
․Timing optimization
․Synthesis for low power

Power Dissipation
․Leakage power
 Static dissipation due to leakage current
 Typically a smaller value compared to other power dissipation
 Getting larger and larger in deep-submicron processes
․Short-circuit power
 Due to the short-circuit current when both PMOS and NMOS
are on during a transition
 Typically a smaller value compared to dynamic power
․Dynamic power
 Charge and discharge of a load capacitor
 Usually the major part of total power consumption

[Figure omitted: a CMOS inverter between VDD and GND, with input Vin
and output Vout]

Power Dissipation Model

    P = ½ · C · Vdd² · D

․Typically, dynamic power is used to represent total
power dissipation
 P: the power dissipation of a gate
 C: the load capacitance
 Vdd: the supply voltage
 D: the transition density
․To obtain the power dissipation of the circuit, we need
 The node capacitance of each node (obtained from layout)
 The transition density of each node (obtained by computation)
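A minimal sketch of summing the per-gate model over a circuit; the node names, capacitance values, and densities below are hypothetical:

```python
def dynamic_power(nodes, vdd):
    """Total dynamic power: sum of 1/2 * C * Vdd^2 * D over all nodes.

    `nodes` maps a node name to (load capacitance C, transition density D).
    """
    return sum(0.5 * c * vdd ** 2 * d for c, d in nodes.values())

# Hypothetical node data: capacitance in farads, density in transitions/cycle.
nodes = {
    "n1": (10e-15, 0.25),
    "n2": (15e-15, 0.50),
}
p = dynamic_power(nodes, vdd=1.2)
```

In a real flow the capacitances would come from layout extraction and the densities from the probabilistic computation on the following slides.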
The Signal Probability

․Definition: The signal probability of a signal x(t),
denoted by P1x, is defined as

    P1x = lim_{T→∞} (1/T) ∫_{−T/2}^{+T/2} x(t) dt

where T is the length of the observation period.
․P0x is defined as the probability of the logic signal x(t)
being equal to 0.
․P0x = 1 − P1x
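As the definition suggests, P1x can be estimated by time-averaging samples of the waveform; a minimal sketch for a hypothetical 50%-duty-cycle square wave:

```python
def signal_probability(samples):
    """Estimate P1x as the fraction of sample points where the signal is 1."""
    return sum(samples) / len(samples)

# A 50%-duty-cycle square wave sampled 8 times per period, over 4 periods.
period = [1, 1, 1, 1, 0, 0, 0, 0]
wave = period * 4
p1 = signal_probability(wave)   # fraction of time at logic 1
p0 = 1 - p1                     # P0x = 1 - P1x
```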


Transition Density

․Definition: The transition density Dx of a logic signal
x(t), t ∈ (−∞, +∞), is defined as

    Dx = lim_{T→∞} nx(T) / (T · fc)

where nx(T) is the number of transitions of x(t) over the
interval T, and fc is the clock rate or frequency of operation.
․Dx is the expected number of transitions in a
clock period.
․For a circuit with a 20 MHz clock rate and 5M transitions
per second on a node, the transition density of this node is
5M / 20M = 0.25

Signal Probability and Transition Density

[Figure omitted: a clock and four example waveforms a–d]

 Signal a: Pa = 0.5,  Da = 1
 Signal b: Pb = 0.5,  Db = 0.5
 Signal c: Pc = 0.5,  Dc = 0.25
 Signal d: Pd = 0.25, Dd = 0.25


The Calculation of Signal Probability

․A BDD-based approach is one of the popular ways
․Definition
 p(F): the fraction of variable assignments for which F = 1
․Recursive formulation
 p(F) = [ p(F[x=1]) + p(F[x=0]) ] / 2
․Computation
 Compute bottom-up, starting at the leaves
 At each node, average the values of its children
․Ex: F = d2'(d1+d0)a1a0 + d2(d1'+d0')a1a0'
        + d2d1d0a1'a0
    p(F) = 7/32 = 0.21875

[Figure omitted: the BDD of F ordered d2, d1, d0, a1, a0; averaging the
leaf values 0/1 upward yields node values such as 1/2, 1/4, 1/8, 3/16,
and finally 7/32 at the root]
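The recursion averages to the same value as direct enumeration over all 2⁵ assignments; a brute-force check of the slide's example:

```python
from itertools import product

def p(F, nvars):
    """Fraction of variable assignments for which F evaluates to 1."""
    sat = sum(F(*bits) for bits in product((0, 1), repeat=nvars))
    return sat / 2 ** nvars

# F = d2'(d1+d0)a1a0 + d2(d1'+d0')a1a0' + d2d1d0a1'a0, from the slide.
def F(d2, d1, d0, a1, a0):
    return ((not d2) and (d1 or d0) and a1 and a0) \
        or (d2 and ((not d1) or (not d0)) and a1 and (not a0)) \
        or (d2 and d1 and d0 and (not a1) and a0)

print(p(F, 5))  # 0.21875 = 7/32
```

Enumeration is exponential in the number of inputs, which is exactly why the BDD-based bottom-up averaging is preferred in practice.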
The Calculation of Transition Density
․Transition density of a cube
 f = ab
 Df = Da·Pb + Db·Pa − ½·Da·Db
 Da·Pb accounts for the output changing when b = 1 and a changes
 ½·Da·Db removes the part counted twice when both a and b change
․n-input AND:
 modeled as a network of 2-input AND gates under the zero-delay model
 3-input AND gate: f = ab, g = fc
 Dg = Df·Pc + Dc·Pf − ½·Df·Dc
․Inaccuracies of this simple model:
 Temporal relations
 Spatial relations
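A sketch of cascading the cube formula through the 3-input AND; the input values reuse the earlier waveform slide (Pa = Pb = Pc = 0.5, Da = 1, Db = 0.5, Dc = 0.25), and the inputs are assumed independent:

```python
def and2(pa, da, pb, db):
    """Signal probability and transition density of f = a AND b,
    assuming independent inputs and the zero-delay model."""
    pf = pa * pb                                # P(f = 1) for independent a, b
    df = da * pb + db * pa - 0.5 * da * db      # cube formula from the slide
    return pf, df

# 3-input AND built from two 2-input ANDs: f = ab, then g = fc.
pf, df = and2(0.5, 1.0, 0.5, 0.5)
pg, dg = and2(pf, df, 0.5, 0.25)
print(pf, df, pg, dg)
```

The second call simply feeds the first gate's (Pf, Df) pair in as one of its inputs, which is how the zero-delay model extends to any n-input AND tree.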


The Problem of Temporal Relations

[Figure omitted: three waveform scenarios for the same gate]
(1) Without considering the gate delay and inertial delay
(2) Without considering inertial delay
(3) Practical condition


The Problem of Spatial Correlation

․Example: a signal x (P = 0.5) and its complement x'
(P = 1 − 0.5 = 0.5) feed the same AND gate
 (a) Without considering spatial correlation:
     P = 0.5 × 0.5 = 0.25
 (b) Practical condition: P = 0
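A minimal sketch of this gap, assuming the gate is an AND fed by a signal x and its own complement (the classic reconvergent-fanout case):

```python
p_x = 0.5                       # signal probability of x
p_xbar = 1.0 - p_x              # its complement, through an inverter

# (a) Ignoring spatial correlation: treat the two inputs as independent.
p_no_corr = p_x * p_xbar        # overestimates the probability

# (b) Practical condition: x AND (NOT x) is identically 0.
p_exact = sum(x & (1 - x) for x in (0, 1)) / 2.0
```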


Logic Minimization for Low Power (1/2)

․Consider an example:

[Figure omitted: two gate-level implementations of the same function
f(a, b, c), each shown with its Karnaugh-map cover]

 (a) f = a'b' + ac' + bc   P = 108.7 μW
 (b) f = b'c' + a'c + ab   P = 115.5 μW

․Different choices of the covers may result in different
power consumption
Logic Minimization for Low Power (2/2)
․Typically, the objective of logic minimization is to
minimize
 NPT: the number of product terms of the cover
 NLI: the number of literals in the input parts of the cover
 NLO: the number of literals in the output parts of the cover
․For low-power synthesis, the power dissipation has to
be added into the cost function to select the best covers

[Figure omitted: a timing–area–power triangle labeled "tradeoff !!"]

Technology Mapping for Low Power (1/3)

[Figure omitted: (a) the circuit to be mapped, with gates G1 (inputs a, b, c)
and G2 (inputs d, e, f) feeding G3, which drives out; nets are annotated
with transition probabilities Pt = 0.109 and Pt = 0.179]

(b) Characteristics of the library:

  Gate Type   Area   Intrinsic Cap.   Input Load
  INV          928       0.1029          0.0514
  NAND2       1392       0.1421          0.0747
  NAND3       1856       0.1768          0.0868
  AOI33       3248       0.3526          0.1063
Technology Mapping for Low Power (2/3)

[Figure omitted: the circuit mapped onto a single AOI33 followed by an INV]

(a) Minimum-Area Mapping
    Area Cost: 4176
    Power Cost: 0.0907


Technology Mapping for Low Power (3/3)

[Figure omitted: the circuit mapped onto two NAND3s, a NAND2, and a wire]

(b) Minimum-Power Mapping
    Area Cost: 5104
    Power Cost: 0.0803
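The area costs follow directly from the library table on the (1/3) slide; a quick check, with the gate lists taken from the two mappings and the wire assumed to cost no area:

```python
# Cell areas from the library characteristics table.
AREA = {"INV": 928, "NAND2": 1392, "NAND3": 1856, "AOI33": 3248, "WIRE": 0}

def area_cost(gates):
    """Total area of a mapping: sum of the areas of its library cells."""
    return sum(AREA[g] for g in gates)

min_area_mapping = ["AOI33", "INV"]
min_power_mapping = ["NAND3", "NAND3", "NAND2", "WIRE"]

print(area_cost(min_area_mapping))   # 4176
print(area_cost(min_power_mapping))  # 5104
```

The power costs (0.0907 vs. 0.0803) would come from weighting each net's capacitance by its transition probability, so the larger mapping can still dissipate less power.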
