VLSI CAD Flow: Logic Synthesis,
Placement and Routing
6.375 Lecture 5
Guest Lecture by Srini Devadas
RTL Design Flow
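[Figure: HDL goes through RTL synthesis (or manual design) to a netlist; logic optimization, drawing on library/module generators, produces an optimized netlist; physical design produces the layout. The example netlist is a 2:1 multiplexer (inputs a, b, select s) feeding a flip-flop (d, q, clk).]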
Two-Level Logic Minimization
Can realize an arbitrary logic function in
sum-of-products or two-level form
F1 = A B + A B D + A B C D
+ABCD+AB+ABD
F1 = B + D + A C + A C
Of great interest to find a minimum sum-of-products representation
– Solved problem even for functions with hundreds of inputs (variants of Quine-McCluskey)
Two-Level versus Multilevel
2-Level:
f1 = AB + AC + AD
f2 = AB + AC + AE
6 product terms which cannot be shared.
24 transistors in static CMOS
Multi-level:
Note that B + C is a common term in f1 and f2
K = B + C
f1 = AK + AD
f2 = AK + AE
3 levels; 20 transistors in static CMOS, not counting inverters
Technologies
“Closed book”: gate-array
standard-cell
“Open book”: CMOS Domino,
complex gate static CMOS
LOGIC EQUATIONS
→ TECHNOLOGY-INDEPENDENT OPTIMIZATION (factoring, commonality extraction)
→ TECHNOLOGY-DEPENDENT OPTIMIZATION (mapping, timing), using the LIBRARY
→ OPTIMIZED LOGIC NETWORK
Tech.-Independent Optimization
Involves:
Minimizing two-level logic functions.
Finding common subexpressions.
Substituting one expression into another.
Factoring single functions.
Factored versus Disjunctive forms
f = ac + ad + bc + bd + a e
sum-of-products or disjunctive form
f = ( a + b )( c + d ) + a e
factored form
multi-level or complex gate
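As a rough illustration of why the factored form is cheaper, the sketch below counts literals in the two forms of f and confirms by exhaustive evaluation that they compute the same function. The string encoding of cubes and the helper names are assumptions made for this example, and the last term is treated as the two-literal product exactly as written on the slide.

```python
from itertools import product

# Sum-of-products form: each cube is a string of single-letter literals.
SOP = ["ac", "ad", "bc", "bd", "ae"]

# Literals in (a + b)(c + d) + ae, counted by hand: a, b, c, d, a, e.
FACTORED_LITERALS = 6

def eval_sop(cubes, env):
    """OR of ANDs under the variable assignment env."""
    return any(all(env[v] for v in cube) for cube in cubes)

def eval_factored(env):
    """f = (a + b)(c + d) + ae, written out directly."""
    return ((env["a"] or env["b"]) and (env["c"] or env["d"])) or (env["a"] and env["e"])

def literal_count_sop(cubes):
    """Number of literal occurrences in a sum-of-products form."""
    return sum(len(cube) for cube in cubes)

def equivalent():
    """Compare both forms on all 32 input combinations."""
    for bits in product([False, True], repeat=5):
        env = dict(zip("abcde", bits))
        if eval_sop(SOP, env) != eval_factored(env):
            return False
    return True

print(literal_count_sop(SOP), FACTORED_LITERALS, equivalent())
```

The literal count is only a crude stand-in for transistor count, but it shows the direction of the saving.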
Optimizations
F:
  f1 = AB + AC + AD + AE + A'B'C'D'E'
  f2 = AB + AC + AD + AF + A'B'C'D'F'
Factor F:
  f1 = A(B + C + D + E) + A'B'C'D'E'
  f2 = A(B + C + D + F) + A'B'C'D'F'
Extract common expression:
  g1 = B + C + D
  f1 = A(g1 + E) + A'E'g1'
  f2 = A(g1 + F) + A'F'g1'
(x' denotes the complement of x)
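The three stages above can be checked against each other by brute force. This is a minimal sketch, assuming the five-literal term in each function is fully complemented (A'B'C'D'E' and A'B'C'D'F'), which is the reading consistent with the extracted form A'E'g1'; the function names are made up for the example.

```python
from itertools import product

def original(A, B, C, D, E, F):
    """Sum-of-products form of (f1, f2)."""
    f1 = (A and B) or (A and C) or (A and D) or (A and E) or \
         (not A and not B and not C and not D and not E)
    f2 = (A and B) or (A and C) or (A and D) or (A and F) or \
         (not A and not B and not C and not D and not F)
    return bool(f1), bool(f2)

def factored(A, B, C, D, E, F):
    """After factoring A out of the first four cubes."""
    rest = not A and not B and not C and not D
    f1 = (A and (B or C or D or E)) or (rest and not E)
    f2 = (A and (B or C or D or F)) or (rest and not F)
    return bool(f1), bool(f2)

def extracted(A, B, C, D, E, F):
    """After extracting the shared node g1 = B + C + D."""
    g1 = B or C or D
    f1 = (A and (g1 or E)) or (not A and not E and not g1)
    f2 = (A and (g1 or F)) or (not A and not F and not g1)
    return bool(f1), bool(f2)

def all_agree():
    """True if the three forms match on all 64 input combinations."""
    return all(
        original(*v) == factored(*v) == extracted(*v)
        for v in product([False, True], repeat=6)
    )

print(all_agree())
```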
What Does “Best” Mean?
Transistor count    → AREA
Number of circuits  → POWER
Number of levels    → DELAY (speed)
Need quick estimators of area, delay and power
that are also accurate
Algebraic vs. Boolean Methods
Algebraic techniques view equations as
polynomials and attempt to factor equations or
“divide” them
Do not exploit Boolean identities, e.g., a·a' = 0 (a' is the complement of a)
In algebraic substitution (or division) if a function
f = f(a, b, c) is divided by g = g(a, b), a and b
will not appear in f / g
Algebraic division: O(n log n) time
Boolean division: 2-level minimization required
Comparison
f = ab' + ac' + ba' + bc' + ca' + cb'
Algebraic factorization produces
f = a(b' + c') + a'(b + c) + bc' + cb'
Boolean factorization produces
f = (a + b + c)(a' + b' + c')
(x' denotes the complement of x)
l = ( b f + b f ) ( a + e ) + ae ( b f + bf )
r = ( b f + b f ) ( a + e ) + ae ( b f + bf )
Algebraic substitution of l into r fails
Boolean substitution
r = a ( e l + el ) + a ( el + e l )
l = a ( er + e r ) + a ( er + e r )
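The difference between the two views shows up when the Boolean factorization of the first example is multiplied out. The sketch below assumes the complement placement f = ab' + ac' + ba' + bc' + ca' + cb' (the overbars are not visible in the extracted slide text) and represents each cube as a set of literal strings; both helper names are invented for this example. Treated as a polynomial, the product has nine cubes; only the Boolean identity x·x' = 0 reduces it to the six cubes of f.

```python
def multiply(p, q):
    """Algebraic product of two sums of cubes; a and a' are unrelated symbols."""
    return {c | d for c in p for d in q}

def drop_contradictions(cubes):
    """Apply the Boolean identity x x' = 0 by discarding cubes with x and x'."""
    return {c for c in cubes if not any(lit + "'" in c for lit in c)}

def cubes(*terms):
    """Build a set of cubes from tuples of literal strings."""
    return {frozenset(t) for t in terms}

F = cubes(("a", "b'"), ("a", "c'"), ("b", "a'"),
          ("b", "c'"), ("c", "a'"), ("c", "b'"))

left = cubes(("a",), ("b",), ("c",))        # a + b + c
right = cubes(("a'",), ("b'",), ("c'",))    # a' + b' + c'

expanded = multiply(left, right)
print(len(expanded), drop_contradictions(expanded) == F)
```

An algebraic divider never sees the three vanishing cubes, which is why it cannot find this factorization.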
Strong (or Boolean) Division
Given a function f to be strong divided by g:
Add an extra input G to f, corresponding to g,
and obtain a function h as follows
hDC = G g' + G' g     (x' denotes the complement of x)
hON = fON − hDC
Minimize h using two-level minimizer
Strong Division Example
f = a bc + a bc + a b c + a b c
g = a b +a b
hDC = G (a b + a b) + G (a b + a b)
hON = fON − hDC
Ga
bc 00 01 11 10
00 x 1 x
01 1 x x Function h
11 x 1 x
10 x x 1
Minimization gives h = G c + G c
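The mechanics of strong division can be sketched without a real two-level minimizer. The complement bars in the example above did not survive extraction, so the specific functions used here (f as three-input odd parity and g as a XOR b) are an assumption chosen to match the shape of the example, not a transcription of it. The code builds h with don't-cares wherever G disagrees with g, then searches all 16 functions of (G, c) for those that agree with h on its care set, standing in for the minimization step.

```python
from itertools import product

def f(a, b, c):
    return a ^ b ^ c      # assumed dividend: odd parity of a, b, c

def g(a, b):
    return a ^ b          # assumed divisor

def build_h():
    """Map (G, a, b, c) to 0, 1, or None (don't care, where G != g)."""
    h = {}
    for G, a, b, c in product((0, 1), repeat=4):
        h[(G, a, b, c)] = f(a, b, c) if G == g(a, b) else None
    return h

def functions_of_G_and_c(h):
    """All truth tables over (G, c) that agree with h wherever h is specified."""
    found = []
    for bits in product((0, 1), repeat=4):
        table = dict(zip(product((0, 1), repeat=2), bits))
        if all(v is None or table[(G, c)] == v
               for (G, a, b, c), v in h.items()):
            found.append(table)
    return found

h = build_h()
solutions = functions_of_G_and_c(h)
print(solutions)
```

Under these assumptions exactly one function of G and c fits, namely G XOR c, so a and b drop out of the quotient.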
Weak (or Algebraic) Division
Definition: the support of f, sup( f ), is the set of all
variables v that occur in f as v or v' (its complement)
Example: f=AB+C
sup( f ) = { A, B, C }
Definition: we say that f is orthogonal to g,
f ⊥ g, if sup( f ) ∩ sup( g ) = φ
Example: f=A+B g=C+D
∴ f ⊥ g since { A, B } ∩ { C, D } = φ
Weak Division - 2
We say that g divides f weakly if there exist h, r
such that f = gh + r where h ≠ φ and g ⊥ h
Example: f = ab + ac + d
g=b+c
f = a(b + c) + d h=a r=d
We say that g divides f evenly if r = φ
The quotient f / g is the largest h such that
f = gh + r i.e., f = ( f / g )g + r
Weak Division Example
f = abc + abde + abh + bcd
g = c + de + h
Theorem: f / g = f / c ∩ f / de ∩ f / h
f / c = ab + bd
f / de = ab
f / h = ab
f / g = (ab + bd) ∩ ab ∩ ab = ab
f = ab(c + de + h) + bcd
Time complexity: O( | f | | g | )
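The theorem above translates almost directly into code. This is a minimal sketch that handles only uncomplemented single-letter variables; the parser `expr` and the function names are assumptions for the example. The quotient by a multi-cube divisor is the intersection of the quotients by each of its cubes, and the remainder is whatever the product g·(f/g) does not cover.

```python
def expr(text):
    """Parse 'abc + abde' into a set of cubes (frozensets of variable letters)."""
    return {frozenset(term.strip()) for term in text.split("+")}

def cube_quotient(f, d):
    """f / d for a single cube d: cubes of f containing d, with d removed."""
    return {c - d for c in f if d <= c}

def weak_divide(f, g):
    """Return (quotient, remainder) such that f = g * quotient + remainder."""
    quotient = None
    for d in g:
        q = cube_quotient(f, d)
        quotient = q if quotient is None else quotient & q
    quotient = quotient or set()
    covered = {d | h for d in g for h in quotient}
    return quotient, f - covered

f = expr("abc + abde + abh + bcd")
g = expr("c + de + h")
quotient, remainder = weak_divide(f, g)
print(quotient, remainder)
```

Each cube of f is tested against each cube of g, which is where the O(|f| |g|) bound comes from.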
How to Find Good Divisors?
$64K question
Strong division: Use existing nodes in the
multilevel network to simplify other nodes
Weak division: Generate good algebraic
divisors using algorithms based on “kernels”
of an algebraic expression
Tech.-Dependent Optimization
OPTIMIZED LOGIC EQUATIONS, LIBRARY and TIMING CONSTRAINTS
→ TECHNOLOGY MAPPING
→ GATE NETLIST
Area, delay and power dissipation cost functions
“Closed Book” Technologies
A standard cell technology or library is
typically restricted to a few tens of gates
e.g., MSU library: 31 cells
Gates may be NAND, NOR, NOT, AOIs.
[Figure: example library gates with inputs A, B, C, including a complex gate built around AB + C]
Mapping via DAG Covering
Represent network in canonical form
⇒ subject DAG
Represent each library gate with canonical
forms for the logic function
⇒ primitive DAGs
Each primitive DAG has a cost
Goal: Find a minimum cost covering of the
subject DAG by the primitive DAGs
Canonical form: 2-input NAND gates and
inverters
Sample Library
INVERTER 2
NAND2 3
NAND3 4
NAND4 5
Sample Library - 2
AOI21 4
AOI22 5
Trivial Covering
subject DAG
7 NAND2 = 21
5 INV   = 10
Total   = 31
Covering #1
2 INV   = 4
2 NAND2 = 6
1 NAND3 = 4
1 NAND4 = 5
Total   = 19
Covering #2
1 INV   = 2
1 NAND2 = 3
2 NAND3 = 8
1 AOI21 = 4
Total   = 17
DAG Covering
Sound Algorithmic approach
NP-hard optimization problem
multiple fanout
Tree covering heuristic: If subject and primitive
DAGs are trees, efficient algorithm can find
optimum cover in linear time
⇒ dynamic programming formulation
Partitioning a Graph
Resulting Trees
Break at multiple fanout points
Dynamic Programming
Principle of optimality: Optimal cover for a tree
consists of a match at the root of the tree
plus the optimal cover for the sub-trees
starting at each input of the match
[Figure: two candidate matches at the root of a tree. The best cover using the first match uses the best covers for x, y, z; the best cover using the second match uses the best covers for p, z.]
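The principle of optimality can be sketched as a short memoized recursion. The costs below are taken from the sample library, but the NAND2/INV decomposition chosen for each gate, the tuple encoding of trees, and the function names are all assumptions made for this example; a real mapper would carry several decompositions per gate.

```python
from functools import lru_cache

# Trees use the canonical form: ("IN", name), ("INV", child), ("NAND", left, right).
# In a pattern, the string "x" marks a leaf that matches any subject subtree.
LIBRARY = [
    ("INV",   2, ("INV", "x")),
    ("NAND2", 3, ("NAND", "x", "x")),
    ("NAND3", 4, ("NAND", ("INV", ("NAND", "x", "x")), "x")),
    ("NAND4", 5, ("NAND", ("INV", ("NAND", "x", "x")), ("INV", ("NAND", "x", "x")))),
    ("AOI21", 4, ("INV", ("NAND", ("NAND", "x", "x"), ("INV", "x")))),
]

def matches(pat, node):
    """Yield, for every way pat matches at node, the subtrees at pat's leaves."""
    if pat == "x":
        yield [node]
    elif pat[0] == node[0] == "INV":
        yield from matches(pat[1], node[1])
    elif pat[0] == node[0] == "NAND":
        # NAND is commutative, so try both input orders.
        for left, right in ((node[1], node[2]), (node[2], node[1])):
            for ml in matches(pat[1], left):
                for mr in matches(pat[2], right):
                    yield ml + mr

@lru_cache(maxsize=None)
def best_cost(node):
    """Minimum cover cost: a match at the root plus best covers of its inputs."""
    if node[0] == "IN":
        return 0
    return min(
        cost + sum(best_cost(leaf) for leaf in leaves)
        for _, cost, pat in LIBRARY
        for leaves in matches(pat, node)
    )

a, b, c, d = (("IN", n) for n in "abcd")
aoi_tree = ("INV", ("NAND", ("NAND", a, b), ("INV", c)))
nand4_tree = ("NAND", ("INV", ("NAND", a, b)), ("INV", ("NAND", c, d)))
print(best_cost(aoi_tree), best_cost(nand4_tree))
```

Because INV and NAND2 match every internal node, the `min` always has at least one candidate, and memoization means each subtree is solved once.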
Optimum Tree Covering
[Figure: subject tree annotated with the best cost at each node: INV 2; NAND2 3; NAND2 3; NAND2 3 + 3 = 6; NAND2 2 + 6 + 3 = 11; then INV 11 + 2 = 13 versus AOI21 4 + 3 = 7]
Physical Design: Overall Conceptual Flow
Input: Read Netlist
Floorplanning and placement: Initial Placement, Placement Improvement, Cost Estimation
Routing: Routing Region Definition, Routing Region Ordering, Global Routing, Detailed Routing, Routing Improvement, Cost Estimation, Compaction/clean-up
Output: Write Layout Database
Results of Placement
[Figure: a bad placement and a good placement, side by side]
What’s good about a good placement?
What’s bad about a bad placement?
A. Kahng 3
Kurt Keutzer
Results of Placement
Bad placement causes routing congestion resulting in:
• Increases in circuit area (cost) and wiring
• Longer wires → more capacitance
  – Longer delay
  – Higher dynamic power dissipation
Good placement:
• Circuit area (cost) and wiring decrease
• Shorter wires → less capacitance
  – Shorter delay
  – Less dynamic power dissipation
Gordian Placement Flow
[Figure: data flow in the placement procedure GORDIAN. Global optimization (minimization of wire length) yields module coordinates; partitioning of the module set and dissection of the placement region yields position constraints for the next global optimization. Once regions have ≤ k modules, final placement (adoption of style-dependent constraints) takes the module coordinates and produces the final placement, for standard cell or macro-cell & SOG designs.]
Complexity
space: O(m)   time: Θ(m^1.5 log^2 m)
Gordian: A Quadratic Placement Approach
• Global optimization:
solves a sequence of quadratic
programming problems
• Partitioning:
enforces the non-overlap constraints
Intuitive formulation
Given a series of points x1, x2, x3, … xn
and a connectivity matrix C describing the connections
between them
(If cij = 1 there is a connection between xi and xj)
Find a location for each xj that minimizes the total sum of
all spring tensions between each pair <xi, xj>
[Figure: a spring connecting xi and xj]
Problem has an obvious (trivial) solution – what is it?
Improving the intuitive formulation
To avoid the trivial solution add constraints: Hx = b
– These may be very natural, e.g. endpoints (pads)
[Figure: chain of springs with endpoints x1 and xn]
To integrate the notion of “critical nets”
– Add weights wij to nets
– Some springs have more tension (larger wij) and should pull the associated vertices closer
Modeling the Net’s Wire Length
[Figure: module u at (x_u, y_u) with a pin at offset (ξ_uv, η_uv), connected by net v to the net node at (x_v, y_v); further connections lead to other modules]
The length L_v of a net v is measured by the squared distances from its
points to the net’s center
$$L_v = \sum_{u \in M_v} \left[ (x_{uv} - x_v)^2 + (y_{uv} - y_v)^2 \right]$$
$$x_{uv} = x_u + \xi_{uv}, \qquad y_{uv} = y_u + \eta_{uv}$$
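The net-length measure is simple to evaluate once the pin positions are known. The sketch below assumes the net's center is the mean of its pin coordinates, which the slide does not state explicitly; the function name and the tuple layout of a pin are also choices made for the example.

```python
def net_length(pins):
    """Squared-distance length of one net.

    pins: list of (x_u, y_u, xi_uv, eta_uv), i.e. module position plus the
    pin's offset within the module. The net center (x_v, y_v) is assumed to
    be the mean of the resulting pin coordinates.
    """
    points = [(xu + xi, yu + eta) for xu, yu, xi, eta in pins]
    xv = sum(x for x, _ in points) / len(points)
    yv = sum(y for _, y in points) / len(points)
    return sum((x - xv) ** 2 + (y - yv) ** 2 for x, y in points)

# Three pins ending up at (0, 0), (2, 0) and (1, 3); the center is (1, 1).
example = [(0, 0, 0, 0), (1, 0, 1, 0), (1, 2, 0, 1)]
print(net_length(example))
```

Because every term is a square, the total is a quadratic function of the module coordinates, which is what makes the global optimization a quadratic program.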
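Toy Example: two movable cells x1 and x2 between fixed points at x = 100 and x = 200
$$\mathrm{Cost} = (x_1 - 100)^2 + (x_1 - x_2)^2 + (x_2 - 200)^2$$
$$\frac{\partial\,\mathrm{Cost}}{\partial x_1} = 2(x_1 - 100) + 2(x_1 - x_2)$$
$$\frac{\partial\,\mathrm{Cost}}{\partial x_2} = -2(x_1 - x_2) + 2(x_2 - 200)$$
setting the partial derivatives = 0 we solve for the minimum Cost:
Ax + B = 0
$$\begin{bmatrix} 4 & -2 \\ -2 & 4 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} + \begin{bmatrix} -200 \\ -400 \end{bmatrix} = 0$$
$$\begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} + \begin{bmatrix} -100 \\ -200 \end{bmatrix} = 0$$
x1 = 400/3   x2 = 500/3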
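The toy example can be reproduced with exact arithmetic. This is a small sketch using Cramer's rule on the 2×2 system Ax = −B; the helper names are invented for the example, and a real placer would use a sparse solver instead.

```python
from fractions import Fraction

def solve_2x2(a11, a12, a21, a22, b1, b2):
    """Solve [[a11, a12], [a21, a22]] x = [b1, b2] by Cramer's rule."""
    det = Fraction(a11 * a22 - a12 * a21)
    x1 = Fraction(b1 * a22 - a12 * b2) / det
    x2 = Fraction(a11 * b2 - a21 * b1) / det
    return x1, x2

def cost(x1, x2):
    """Sum of squared spring lengths with fixed ends at 100 and 200."""
    return (x1 - 100) ** 2 + (x1 - x2) ** 2 + (x2 - 200) ** 2

# Ax + B = 0 with A = [[4, -2], [-2, 4]] and B = [-200, -400], so Ax = [200, 400].
x1, x2 = solve_2x2(4, -2, -2, 4, 200, 400)
print(x1, x2, cost(x1, x2))
```

The two cells end up evenly spaced between the fixed points, as expected for three equal springs in a row.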
Kurt Keutzer D. Pan
Quadratic Optimization Problem
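[Figure: placement area divided into regions ρ and ρ' with centers (u_ρ, v_ρ) and (u_ρ', v_ρ'), containing modules A to G. The constraint matrix A^(l) has one row per region, with nonzero entries (*) in the columns of the modules assigned to that region and 0 elsewhere.]
Linearly constrained quadratic programming problem
$$\min_{x \in \mathbb{R}^m} \{\, \Phi(x) = x^T C x + d^T x \,\} \quad \text{s.t.} \quad A^{(l)} x = u^{(l)}$$
– x^T C x: wire length for movable modules
– d^T x: accounts for fixed modules
– A^(l) x = u^(l): center-of-gravity constraints
Problem is computationally tractable, and well behaved
Commercial solvers available: MOSEK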
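One standard way to solve an equality-constrained quadratic program of this shape is through its KKT system, [[2C, Aᵀ], [A, 0]]·[x; λ] = [−d; u]. The sketch below is an assumption about how one might do this on a tiny instance, not GORDIAN's actual solver: it uses dense Gauss-Jordan elimination with exact fractions, and it reuses the two-cell chain between fixed points at 100 and 200 with a made-up center-of-gravity target of 160.

```python
from fractions import Fraction

def solve(M, rhs):
    """Gauss-Jordan elimination with exact fractions (small dense systems only)."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(b)] for row, b in zip(M, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]

def constrained_qp(C, d, A, u):
    """Minimize x^T C x + d^T x subject to A x = u, with C symmetric."""
    m, k = len(C), len(A)
    K = [[2 * C[i][j] for j in range(m)] + [A[r][i] for r in range(k)]
         for i in range(m)]
    K += [list(A[r]) + [0] * k for r in range(k)]
    rhs = [-di for di in d] + list(u)
    return solve(K, rhs)[:m]

# Cost = (x1-100)^2 + (x1-x2)^2 + (x2-200)^2, dropping the constant term.
C = [[2, -1], [-1, 2]]
d = [-200, -400]
half = Fraction(1, 2)
A = [[half, half]]   # center of gravity of the two cells
u = [160]            # hypothetical target for the region center
x = constrained_qp(C, d, A, u)
print(x)
```

The constraint shifts both cells by 10 relative to the unconstrained optimum while keeping their spacing, which is the behavior one wants from a center-of-gravity constraint.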
Global Optimization Using Quadratic
Placement
Quadratic placement clumps cells in center
Partitioning divides cells into two regions
– Placement region is also divided into two regions
New center-of-gravity constraints are added to the constraint matrix to be used on the next level of global optimization
– Global connectivity is still conserved
Setting up Global Optimization
Layout After Global Optimization
Partitioning
In GORDIAN, partitioning is used to constrain the movement of
modules rather than reduce problem size
By performing partitioning, we can iteratively impose a new
set of constraints on the global optimization problem
– Assign modules to a particular block
Partitioning is determined by
– Results of global placement: initial starting point
– Spatial (x,y) distribution of modules
– Partitioning cost
– Want a min-cut partition
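The min-cut objective itself is easy to state in code. The sketch below only illustrates the cost being minimized: it searches every balanced split of a six-module example exhaustively, which is feasible only for tiny inputs and is not how GORDIAN chooses a partition (the slide says the global placement result provides the starting point). The netlist and function names are made up for the example.

```python
from itertools import combinations

def cut_size(nets, left):
    """Number of nets with modules on both sides of the partition."""
    return sum(1 for net in nets if 0 < len(net & left) < len(net))

def best_bipartition(modules, nets):
    """Exhaustively find a balanced split with the fewest cut nets."""
    half = len(modules) // 2
    best = min(
        combinations(sorted(modules), half),
        key=lambda side: cut_size(nets, set(side)),
    )
    return set(best), cut_size(nets, set(best))

# Two triangles of modules joined by a single net between c and d.
modules = set("abcdef")
nets = [set("ab"), set("bc"), set("ac"),
        set("de"), set("ef"), set("df"),
        set("cd")]
side, cut = best_bipartition(modules, nets)
print(sorted(side), cut)
```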
Layout after Min-cut
Now the global placement problem will be solved again
with two additional center-of-gravity constraints
Adding Positioning Constraints
• Partitioning gives us two new “center of gravity” constraints
• Simply update constraint matrix
• Still a single global optimization problem
• Partitioning is not “absolute”
  – modules can migrate back during optimization
  – may need to re-partition
Continue to Iterate
First Iteration
Second Iteration
Third Iteration
Fourth Iteration
Final Placement
Final Placement - 1
Earlier steps have broken down the problem into a manageable
number of objects
Two approaches:
– Final placement for standard cells/gate array: row assignment
– Final placement for large, irregularly sized macro-blocks: slicing (not covered in this lecture)
Final Placement – Standard Cell Designs
This process continues until there are only a few cells in each group (≈ 6)
– group: smallest partition; each group has ≤ 6 cells
Assign cells in each group close together, in the same row or in nearly adjacent rows
A. E. Dunlop, B. W. Kernighan, "A procedure for placement of standard-cell VLSI circuits," IEEE Trans. on CAD, Vol. CAD-4, Jan. 1985, pp. 92-98
Final Placement – Creating Rows
[Figure: row-based standard cell design in which each group is labeled with its row assignment: 1; 1,2; 2; 2,3; 3; 3,4; 4; 4,5; 5]
Partitioning of circuit into 32 groups. Each group is
either assigned to a single row or divided into 2 rows
Standard Cell Layout
Another Series of Gordian
(a) Global placement with 1 region (b) Global placement with 4 regions (c) Final placements
D. Pan – U of Texas
Physical Design Flow
Input: Read Netlist
Floorplanning and placement: Initial Placement, Placement Improvement, Cost Estimation
Routing: Routing Region Definition, Routing Region Ordering, Global Routing, Detailed Routing, Routing Improvement, Cost Estimation, Compaction/clean-up
Output: Write Layout Database
Courtesy K. Keutzer et al. UCB
Kahng/Keutzer/Newton
ECE 260B – CSE 241A /UCB EECS 244
Imagine …
You have to plan transportation (i.e. roads and highways)
for a new city the size of Chicago
Many dwellings need direct roads that can’t be used by
anyone else
You can affect the layout of houses and neighborhoods
but the architects and planners will complain
And … you’re told that the time along any path can’t be
longer than a fixed amount
What are some of your considerations?
What are some of your considerations?
How many levels do my roads need to go? Remember:
Higher is more expensive.
How do I avoid congestion?
What basic structure do I want for my roads?
– Manhattan?
– Chicago?
– Boston?
Automated routing tools have to solve problems of
comparable complexity on every leading-edge chip
Routing Applications
[Figure: routing applications: mixed cell and block, cell-based, block-based]
Routing Algorithms
Hard to tackle high-level issues like congestion
and wire-planning and low-level details of pin-connection
at the same time
Global routing
– Identify routing resources to be used
– Identify layers (and tracks) to be used
– Assign particular nets to these resources
– Also used in floorplanning and placement
Detail routing
– Actually define pin-to-pin connections
– Must understand most or all design rules
– May use a compactor to optimize result
– Necessary in all applications
Basic Rules of Routing - 1
Wiring/routing is performed in layers, 5-9 (up to 11), typically only in “Manhattan” N/S and E/W directions
– E.g. layer 1: N/S
– Layer 2: E/W
A segment cannot cross another segment on the same wiring layer
Wire segments can cross wires on other layers
Power and ground may have their own layers
Photo courtesy: Jan M. Rabaey, Anantha Chandrakasan, Borivoje Nikolic
Basic Rules of Routing – Part 2
Routing can be on a fixed grid
Case 1: Detailed routing only in channels
– Wiring can only go over a row of cells when there is a free track; it can be inserted with a “feedthrough”
– Design may use metal-1, metal-2
– Cells must bring signals (i.e. inputs, outputs) out to the channel through “ports” or “pins”
Basic Rules of Routing – Part 3
Routing can be on a fixed grid or gridless (aka area routing)
Case 2: Detailed routing over cells
– Wiring can go over cells
– Design of cells must try to minimize obstacles to routing, i.e. minimize use of metal-1, metal-2
– Cells do not need to bring signals (i.e. inputs, outputs) out to the channel: the route will come to them
Taxonomy of VLSI Routers
Routers
– Global: Graph Search, Steiner, Iterative, Hierarchical
– Detailed
  – Restricted: River, Switchbox, Channel (Greedy, Left-Edge)
  – General Purpose: Maze, Line Probe, Line Expansion
– Specialized: Power & Ground, Clock
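As one concrete entry from the taxonomy, a maze router can be sketched as a breadth-first wavefront expansion over a routing grid (often called Lee's algorithm). The single-layer grid, the obstacle encoding, and the function name below are assumptions made for this example; production routers handle multiple layers, design rules and many nets at once.

```python
from collections import deque

def maze_route(grid, src, dst):
    """Shortest Manhattan path on one layer; grid[r][c] == 1 marks a blocked cell.

    Returns the list of cells from src to dst, or None if dst is unreachable.
    """
    rows, cols = len(grid), len(grid[0])
    prev = {src: None}
    queue = deque([src])
    while queue:
        cell = queue.popleft()
        if cell == dst:
            path = []
            while cell is not None:      # walk back along predecessors
                path.append(cell)
                cell = prev[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and (nr, nc) not in prev):
                prev[(nr, nc)] = cell
                queue.append((nr, nc))
    return None

# A wall in the middle row forces the wire around the right-hand side.
grid = [
    [0, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
]
path = maze_route(grid, (0, 0), (2, 0))
print(path)
```

Marking the cells of a finished route as blocked before routing the next net is one simple way to respect the rule that segments on the same layer cannot cross.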
Today’s high-perf logical/physical flow
[Figure: flow with inputs library, netlist, user constraints and tech files; logic optimization/timing verification with a delay model generator; placement; routing; layout extraction, which feeds cell/wire RC delays back as SDF]
1) optimize using estimated or extracted capacitances
2) re-place and re-route
3) if design fails to meet constraints due to poor estimation, repeat 1 + 2
Top-down problems in the flow
[Figure: the logical/physical flow (library, netlist, user constraints, tech files; logic optimization/timing verification; delay model generator; placement; routing; layout extraction; SDF cell/wire RC delays), annotated with:]
– initial capacitance estimates inaccurate
– inability to take top-down timing constraints
– inaccurate internal timing model
Iteration problems in the flow
[Figure: the logical/physical flow (library, netlist, user constraints, tech files; logic optimization/timing verification; delay model generator; placement; routing; layout extraction; SDF cell/wire RC delays), annotated with:]
– updated capacitances cause significant changes in optimization
– limited incremental capability
– resulting iteration may not bring closer to convergence