
ASIC Physical Design in 12nm Technology

The document is a project report on the physical design of an ASIC block using 12nm technology, submitted for a Master's degree in VLSI Engineering. It covers the VLSI design process, emphasizing the challenges and complexities of physical design as technology nodes shrink, and discusses various stages of Place and Route (PnR) along with optimization techniques. The report includes acknowledgments, an abstract, and detailed sections on synthesis, timing analysis, and results, showcasing the author's learning and experiences during an internship at Open-Silicon Research Pvt. Ltd.



“Physical design of ASIC block in 12nm technology”

Project Report

Submitted in Partial Fulfillment of the


Requirements for the Degree of

MASTER OF TECHNOLOGY
IN
ELECTRONICS AND COMMUNICATION (VLSI) ENGINEERING

By
Pooja Kevat
(19MECV09)

Department of Electronics and Communication Engineering,


Institute of Technology,
Nirma University,
Ahmedabad 382 481

May 2021

Acknowledgement
Experience shapes a person, and a good initial experience in any field strengthens the base of a career.
Working with a talented team is an exciting opportunity as well as a challenge. In this internship I learned
various concepts of physical design and VLSI from training, experience and colleagues.
I am immensely grateful for the internship opportunity provided by Open-Silicon Research Pvt. Ltd., a
SiFive company. It was a pleasure to be associated with the company. I thank the company for providing me
with amazing opportunities, which increased my knowledge and boosted my confidence. I would like to
thank Mr. Mayur Deshpande (Staff Engineer II and manager) for providing both technical and
non-technical support and for being there when needed. His guidance and training are valuable. I would also like
to thank Mr. Anand J. Bariya for selecting me for the internship and providing good learning opportunities.
I thank every staff member who helped me in my learning process; it is indeed a long list. Special
thanks also go to the IT department, which maintained the systems throughout and made working from home really
easy.
I would also like to thank my internal guide, Prof. Jayesh Kumar Patel, for supporting and guiding me
during the internship. His support and guidance are vital. I would also like to extend my thanks to all the faculty
members who taught me. This helped me gain the fundamental knowledge required for the internship.

Pooja Kevat

ABSTRACT

VLSI is the branch of engineering concerned with the design, verification and physical implementation of
integrated circuits. It has been growing since the invention of the first integrated circuit. As technology
advanced, it became possible to reduce device dimensions, so more and more devices could be fabricated
on a single IC.
Driven by demand from communication and computing devices, IC research has continued to
improve performance. Today ICs have dimensions in nanometers, and research on better devices is
ongoing. Scaling has improved area efficiency but increased design complexity. As component area
shrank, the routing required per unit area increased, which limits the performance of the
IC. Design therefore requires a tradeoff among three parameters: power, performance and area. An ASIC is an
application-specific chip, designed and optimized for a given technology node and the requirements
of its architecture.
The RISC-V architecture is developed and maintained as an open architecture, and many people contribute to
this effort. The architecture is designed with all the performance parameters in mind, especially
power consumption. Physical design engineers convert the architecture/logic design into the final IC. As
technology shrinks, physical design is becoming increasingly challenging. Design Rule Manuals are
used to implement the design optimally, and these rules become more complex as the technology node shrinks.
Improving all performance parameters (power, timing and area) is challenging. Many experiments are
run on a single block at a single node before a violation-free output is obtained.
Reducing the runtime of PnR processes is one of the issues to be resolved. A single error can waste many
working hours. Using the tool smartly is one of the skills an engineer can develop. A deep
understanding of the input timing constraints is required, because wrong input produces flawed output even with a
correct process. Understanding the design is necessary, and understanding its critical paths is half the solution to
the issues faced. Approaching the design from a generic point of view helps a lot. If we understand every process and
know how undetected issues in one process can affect another, it is easier to resolve
issues.
In this report, the PnR stages are explained along with optimization techniques. At each stage, various
experiments are done to reach the final optimum solution. Results for different area utilizations are
compared and discussed.

INDEX

Chapter No. Title
Acknowledgement
Abstract
Index
List of Figures
Nomenclature
1 Introduction
1.1 Introduction
2 Synthesis
2.1 Introduction
2.2 Synthesis flow
2.3 Synthesis strategies
3 Place and route
3.1 Introduction
3.2 Stages of PnR
4 STA
4.1 Introduction
4.2 SDC constraints
4.3 MMMC analysis
4.4 Timing optimizations
5 IR Analysis
5.1 Introduction
5.2 Static IR
5.3 Dynamic IR
5.4 Power analysis
6 Results
6.1 Synthesis
6.2 PnR
6.3 IR Analysis
7 Conclusion
8 References

LIST OF FIGURES

Figure No. Title
1.1 VLSI design flow
2.1 Synthesis flow
2.2 Single cell clock gating example
2.3 Integrated clock gating cell
2.4 ICG setup and hold check
3.1 Floorplanning
3.2 Various blockages examples
3.3 Bump planning
3.4 Power planning
3.5 Physical cell placement
3.6 Tap cell
3.7 Placement of cell with flipped/ original orientation
3.8 Clock delay
3.9 Clock skew
3.10 Clock tree debugger
4.1 Path groups
5.1 Voltage drop in rails
5.2 Voltage drop and ground bounce in rails
5.3 Static current
5.4 Dynamic current vs static current
5.5 Internal power table in .lib
5.6 Leakage power table
5.7 Switching power
6.1 Design checks
6.2 Timing check summary
6.3 Area report without don’t use cell constraint
6.4 Area report after implementing don’t use cell constraint

LIST OF TABLES

Table No. Title
6.1 PnR results comparison
6.2 IR Analysis and Power Analysis

NOMENCLATURE

Abbreviations

VLSI Very Large Scale Integration

RTL Register Transfer Level

EDA Electronic Design Automation

HDL Hardware Description Language

UVM Universal Verification Methodology

OVM Open Verification Methodology

GDCAP Gate array Decoupling Capacitance

ECO Engineering Change Order

CTS Clock Tree Synthesis

DEF Design Exchange Format

SDC Synopsys Design Constraints

QoR Quality of Result



Chapter 1
Introduction
1.1 Introduction
VLSI is the branch of engineering that deals with the large-scale integration of semiconductor devices
designed for various purposes. After the invention of the first IC, continuous research was done with the goal
of decreasing device size and getting maximum utilization out of the IC. As research went on, it
became cheaper and easier to manufacture consumer electronics. Computers, mobile phones and
other electronic devices became an integral part of life.

Figure 1.1 VLSI design flow


The figure above describes the VLSI design cycle. This report focuses on physical design, whose
process is described in detail here.
Physical design converts the given input netlist into layout (GDSII) form. The VLSI design process is
generally divided into two broad categories: front end and back end. The front end deals with logic
design, while the back end converts the given logic design into a GDSII file that can be
given as input to the foundry.
I. Front end
• RTL design
RTL design is code describing the required functionality in terms of the logical elements
used in digital design. This code is written in an HDL such as VHDL or Verilog.
• Functional verification
Functional verification of the HDL description has become more challenging than the design
itself as technology has improved. UVM and OVM are general methodologies developed to test
digital designs efficiently. Generally, SystemVerilog is used to write the testbench.
• Synthesis of RTL netlist
The generic design described in HDL is converted into a technology-specific gate-level netlist.
II. Back end
• Physical design
Physical design is the process that converts the gate-level netlist into a manufacturable GDS. It
is a complex process consisting of many stages, such as floorplanning, power planning,
placement, clock tree synthesis, routing, logical equivalence checks, physical verification and
mask data generation.

As process nodes shrink, the design process becomes more challenging. This report describes the standard
design techniques used in the physical design domain.

Chapter 2
Synthesis
2.1 Introduction
Synthesis is the process of converting a given RTL netlist into a technology-specific gate-level netlist. The required
inputs for logical synthesis are library information, the RTL netlist file and the SDC constraint file. For
physical synthesis, a DEF file is also required for placement information.
2.2 Synthesis flow
The logical synthesis flow is discussed here. It involves several intermediate
stages: library reading, netlist reading, SDC
reading, elaboration, initial synthesis, incremental synthesis, report generation, SDC generation and
netlist generation.
Extra constraints may be added to the given inputs as required.

Figure 2.1 Synthesis flow


1) Library reading
We need information about the technology- and foundry-specific libraries in order to map the logical
functionality described in RTL onto the logical elements defined in the technology-specific library.
The technology library gives information about the timing, area and power consumption of the cells used in the design.
Primarily, .lib and .lef files are given as basic inputs. Some advanced synthesis flows can also
require a DEF file.
i. .lib (Liberty file format)
Liberty files contain information about timing, power (internal and leakage), operating
conditions, cell area and functionality. The units for timing, power etc. are
defined in the .lib files.
Liberty files can be based on various timing models such as CCS (Composite Current
Source) and NLDM (Non-Linear Delay Model). A timing model consists of driver, net
and receiver models. Driver and receiver models are characterized with circuit simulators.
The net model, however, can be either estimated or extracted. The NLDM
model is based on a voltage source, while the CCS model is based on a current source.
ii. LEF (Library Exchange Format)
LEF files are of two types: technology LEF and cell/macro LEF. Technology LEF files contain
information about the metal layers, their DRC rules, and layer and via sizes,
while cell/macro LEF files contain information about the geometry of each cell or macro.
iii. DEF (Design Exchange Format)
A DEF file contains floorplan information such as the placement of macros, the standard cells used in the design,
I/O pin locations and physical cell placement. In detail, it contains the die area, tracks, macro
placement, I/O pin placement, nets, blockages, halos,
scan chain placement, vias, slots, fills, regions, rows and metal layer
information.
• Don’t use cell list generation
This is an optional step. Synthesis only converts an RTL input file into a netlist. The synthesis
tool has no information about cell placement, net lengths (RC information for
nets), block area/shape, pin placement etc.
When STA is run on the synthesized netlist, the timing delays may therefore be too
optimistic. This leads to less optimization and can create problems in later PnR stages.
Some examples are:
▪ The synthesized netlist contains too many low drive strength cells. Because cell drive
strength depends on cell area, the estimated area is lower than expected. In the
PnR stage, once placement and routing information is added, the tool might optimize
these cells by replacing them with higher drive strength cells. This
increases area, and if the area budget is tight, congestion occurs. To
avoid this, very low drive strength cells are avoided in synthesis by adding them
to the don’t use list.
▪ Combinational cells with very high pin density are used (e.g. AOI cells). During PnR,
placing these cells causes routing congestion because of the high pin density. If the
tool uses many high pin density cells, we might have to
decrease area utilization to avoid routing congestion. So these cells can also be added to the don’t use list.
Some cells might be defined as don’t use cells in the library itself. We can set the don’t
use attribute to false for these cells if required.
2) Netlist reading
The HDL netlist is read and analysed.
3) SDC reading
The SDC provides the timing information specific to the design. It contains timing
constraints such as clock definitions and the maximum and minimum signal transition allowed. It also
defines path-specific constraints such as false paths, minimum and maximum delays for paths
where clocks are not defined, multicycle paths and generated clocks.
The timing report is generated from the information given in the SDC, so it is very important to define
the SDC constraints properly.
4) Elaboration
Elaboration is the heart of synthesis: the HDL description of the circuit is analysed and converted
into a generic gate-level netlist. Various algorithms convert the HDL-coded
design into a Boolean data structure, which can then be mapped onto the corresponding
logic components. This information is used to define a generic netlist describing the connections
between the components.
5) Pre-mapping optimization
At this stage we have a generic netlist describing the logic present in the design. However, it is not yet
fully optimized, and some redundant logic might still be present. This
redundancy is removed, and the generic netlist is optimized wherever possible.
6) Technology mapping
Here, gates from the technology library are chosen in place of the generic gates. Particular cells are chosen
so that they provide enough drive strength with the minimum area.
7) Post-mapping optimization
In post-mapping optimization, gate sizes and drive strengths are chosen so that timing is
improved. After overall optimization, instances are uniquified. When logic is
instantiated multiple times, each instantiation must be defined uniquely. This
identifies each instance at its correct logic connection, so that the PnR tool can place it as required.
8) Results and reports
The synthesis outputs are a technology-mapped netlist and an SDC generated for the next stage. The most
important synthesis reports are the check design report, the check timing report, QoR and the pathwise
timing summary.
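
The don’t use list and the technology mapping step in the flow above can be sketched in a few lines of Python. This is only an illustrative sketch, not the algorithm used by any synthesis tool: the cell names, drive strengths, areas, pin counts and thresholds are all hypothetical, and a real flow takes these values from the .lib and LEF files.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    name: str       # hypothetical library cell name
    function: str   # logic function, e.g. "INV" or "AOI22"
    drive: float    # relative drive strength (X0.5, X1, X2, ...)
    area: float     # cell area in arbitrary units
    pins: int       # number of signal pins


# Hypothetical library contents, for illustration only.
LIBRARY = [
    Cell("INV_X0P5", "INV", 0.5, 0.4, 2),
    Cell("INV_X1", "INV", 1.0, 0.6, 2),
    Cell("INV_X2", "INV", 2.0, 0.9, 2),
    Cell("INV_X4", "INV", 4.0, 1.5, 2),
    Cell("AOI22_X1", "AOI22", 1.0, 1.2, 5),
    Cell("AOI222_X1", "AOI222", 1.0, 1.8, 7),
]


def build_dont_use(cells, min_drive, max_pins):
    """Names of cells that are too weak or too pin-dense to use in synthesis."""
    return {c.name for c in cells if c.drive < min_drive or c.pins > max_pins}


def map_gate(cells, function, required_drive, dont_use):
    """Pick the smallest-area allowed cell that meets the drive requirement."""
    candidates = [
        c for c in cells
        if c.function == function
        and c.drive >= required_drive
        and c.name not in dont_use
    ]
    if not candidates:
        raise ValueError(f"no usable {function} cell with drive >= {required_drive}")
    return min(candidates, key=lambda c: c.area)


# Assumed thresholds: forbid cells weaker than X1 or with more than 6 pins.
dont_use = build_dont_use(LIBRARY, min_drive=1.0, max_pins=6)
chosen = map_gate(LIBRARY, "INV", required_drive=0.5, dont_use=dont_use)

print(sorted(dont_use))  # ['AOI222_X1', 'INV_X0P5']
print(chosen.name)       # INV_X1
```

Without the don’t use list, the mapper in this sketch would pick the X0.5 inverter because it has the smallest area. With the list in place it falls back to the X1 cell, which is the effect described for low drive strength cells above.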

2.3 Synthesis strategies

Various techniques/add-ons can be used during synthesis to improve the performance of the netlist. Some of
them are discussed here.
1) Clock gating cell insertion
The clock is a major contributor to the total power consumption of a design, because the activity of the clock is 1
unless a constraint is applied to it. In flip-flops, data is
captured when a clock edge is detected. When signal activity at the register input is very low, or part of the
logic is going to be idle for some time, keeping the clock running consumes unnecessary power. Clock gating
is used to reduce this dynamic power consumption.
Consider a flop whose input is fed through a mux controlled by an enable signal EN. When EN = 1, D = Din,
and new data appears at Dout after every posedge of the clock. When
EN = 0, the flop and mux repeatedly reload the same data at every posedge of the clock. This
consumes power every time the same data is loaded to the data output and fed back to the data input.
The idea of a clock gating cell is to disable the clock signal when there is no signal transition at the input
of a sequential element. A clock gating cell can be implemented with any combinational cell such as an
“and” or “or” gate. The clock is one input and the other is the controlling signal. If the controlling signal is
at the gate’s dominant value, the clock is blocked. If it is at the non-dominant value, the clock is propagated.

Figure 2.2 Single cell clock gating example (labels: dominant input 0, non-dominant input 1, clock, output)

This kind of logic is avoided in clock gating because it can cause spikes, which can make the
circuit fail. To remove glitches, integrated clock gating (ICG) cells are used, as shown in Figure 2.3. Clock gating cells are
defined in standard cell libraries. However, clock gating can also be inserted through RTL coding.
In synthesis, clock gating attributes are defined separately. The clock gating cell from the library is
defined with “set_attribute lp_clock_gating_cell {cell_name}”. Multiple clock
gating cells can be chosen for this. We also need to specify the minimum fanout of an ICG, since it is not
advisable to insert an ICG for each individual flop separately. This can be defined with the
“set_attribute lp_clock_gating_min_flops {number}” command.

Figure 2.3 Integrated clock gating cell

Here, the clock signal is an input to the AND gate as well as to a negative-triggered latch. This way, the enable
signal is synchronized with the clock, which removes the possibility of a glitch on the output clock waveform.
When clock gating is applied, the clock waveform becomes non-uniform. To deal with this, a different
kind of setup and hold check is used. The example above is an “AND” gate based integrated clock
gating cell. For an “AND” gate, “0” is the controlling input value: when enable is at logic low, the clock is not
propagated. Similarly, for an “OR” gate, “1” is the controlling input value.

Figure 2.4 ICG setup and hold check

As shown in the figure, the setup check is done at the clock edge where the clock goes from the controlling state to
the non-controlling state. A setup failure can cause a glitch at the leading edge of the clock or clipping of the clock signal.
The hold check is done at the clock edge where the clock goes from the non-controlling state to the controlling state.
A hold failure can cause a glitch at the trailing edge or clipping of the clock signal.
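
The difference between plain AND-gate gating and a latch-based ICG can be illustrated with a small discrete-time simulation. This is a behavioural sketch under simplifying assumptions (ideal gates, no delays, one sample per half clock phase); the waveforms are made up for illustration, and the latch is modelled as transparent while the clock is low.

```python
def and_gate_clock(clk, en):
    """Naive gating: the enable is ANDed directly with the clock."""
    return [c & e for c, e in zip(clk, en)]


def icg_clock(clk, en):
    """Latch-based gating: the enable passes through a latch that is
    transparent only while the clock is low, then is ANDed with the clock."""
    latched = 0
    out = []
    for c, e in zip(clk, en):
        if c == 0:
            latched = e  # latch is transparent during the low phase
        out.append(c & latched)
    return out


def pulse_widths(wave):
    """Widths (in samples) of every high pulse in a waveform."""
    widths, run = [], 0
    for v in wave:
        if v:
            run += 1
        elif run:
            widths.append(run)
            run = 0
    if run:
        widths.append(run)
    return widths


# Clock with a high phase of 2 samples; the enable changes while the clock is high.
clk = [0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1]
en = [0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0]

print(pulse_widths(and_gate_clock(clk, en)))  # [1, 1] -> two clipped pulses
print(pulse_widths(icg_clock(clk, en)))       # [2]    -> one full-width pulse
```

With the plain AND gate, an enable that changes during the high phase produces pulses narrower than the clock's own high phase, which corresponds to the glitch and clipping described above. In the latch-based version the enable can only change the gate input during the low phase, so every output pulse keeps its full width.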

Chapter 3
Place and Route
3.1 Introduction
Place and route is the process of converting a synthesized netlist into a detailed physical layout in the form of a GDS
file. Its inputs are the netlist, SDC, timing analysis view definition file and various library files (LEF, lib,
qrcTech and layer map files).
Area estimation, uncertainty values and additional constraints are defined based on the synthesis output and
reports.

3.2 Stages of PnR


3.2.1 Design Import
All inputs are defined at this stage. The setup is completed and the design is saved.
3.2.2 Floorplan
In floorplanning, the area, aspect ratio, core shape, placement of IPs and memories, pin placement and
blockages are defined. Area is estimated from the synthesis reports for the required utilization. The shape of the
block depends on factors such as requirements from the top-level block, pin assignment requirements, and
connectivity of cells and blocks.
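
The area estimate from a target utilization can be sketched as follows. This is a minimal sketch assuming a rectangular core, with aspect ratio defined here as height divided by width; the cell area value is hypothetical and would normally come from the synthesis area report.

```python
import math


def core_dimensions(cell_area_um2, utilization, aspect_ratio):
    """Return (core_area, width, height) for a rectangular core.

    utilization  : fraction of the core occupied by cells (0 < u <= 1)
    aspect_ratio : height / width (an assumption of this sketch)
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    if aspect_ratio <= 0:
        raise ValueError("aspect_ratio must be positive")
    core_area = cell_area_um2 / utilization
    width = math.sqrt(core_area / aspect_ratio)
    height = aspect_ratio * width
    return core_area, width, height


# Hypothetical numbers: 70,000 um^2 of cells at 70% utilization, 2:1 core.
core_area, width, height = core_dimensions(70_000.0, 0.70, 2.0)
print(f"core area = {core_area:.0f} um^2, width = {width:.1f} um, height = {height:.1f} um")
```

Lowering the utilization target for the same cell area gives a larger core, which is the lever used later in the flow to trade area against congestion.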
• Pin assignment
Pin assignment is a crucial part of the floorplan. It affects timing and routing congestion. For
a standard processor IP, pin assignment is based on its connectivity with
other blocks and on internal connections.
Pins belonging to a common bus are placed together. Generally, clock pins are kept in the middle of
an edge so that the clock tree propagates uniformly through the design.
• Macro placement
Various memories and IPs can be part of a block. Some general guidelines are
followed when placing these cells:
▪ Macros are placed near edges or in corners. Combinational/sequential logic cells
can be connected to multiple macros/pins on different edges, so it is desirable
to place standard cells near the center, where they can be routed without
demanding excessive routing resources.
▪ Macros need proper power connections and proper connections to physical cells. To ensure a
proper power connection, the minimum channel width is defined so that at least one
pair of power and ground stripes can be routed through the channel.
▪ The channel width should accommodate all the necessary routing. It should be
wide enough to let every macro pin route through the channel. Any
combinational logic that is tightly connected to a macro should be
constrained to be placed near that macro.
▪ Macros are placed according to the design hierarchy. Macros defined in the same module/
design hierarchy are placed together, as required by their connections with each
other and with ports.

Figure 3.1 Floorplanning (labels: placement/routing blockage, pin, macro channel width, macro)

▪ Macros can be stacked on sides that have no pin connections. This improves area
utilization. However, it is not advised to stack more than two macros together, as it may
result in increased standard cell placement in channels, which can cause congestion issues.
▪ Physical cells are added at macro boundaries. End cap cells are added around the macro, and other
physical cells are added in the channels available around the macro.
• Placement of blockages
▪ There are two kinds of blockage: placement blockage and routing blockage. Placement
blockages come in several types. A hard placement blockage blocks every cell from being
placed in the area. A soft blockage allows only buffer and inverter type cells, which are used
to maintain signal strength. A partial blockage allows cells to be placed in the given area up to
a certain utilization, beyond which no more cells are allowed.
▪ We can define regions, which constrain the tool to place certain cells in a defined area. This is
also used in low power design techniques: when cells with multiple supply voltages are used, cells with
the same operating voltage are kept in the same region.
Figure 3.2 Various blockages examples (labels: routing blockage, fence/region, partial blockage, placement blockage)

▪ The blockage requirements are decided by the floorplan. Hard placement blockages are
kept at the boundary to avoid congestion and signal integrity issues. To keep block IO signals
robust, IO buffers are placed near the boundary.
▪ Soft blockages can be applied in the channel between macros if we do not want
combinational logic in that area. Generally, partial blockages are used to control area
utilization near macros.
▪ Routing blockages are applied over macros to reduce congestion. They can be created as
required. We can create a routing blockage for signal nets only and let special nets
(power, clock) be routed over some regions if required.
• Power plan
▪ In IC design, power nets must be routed to every standard cell and macro.
We need to ensure that the power plan is robust, with low IR drop. Good power planning
is required for the overall performance of the IC.
▪ If IR drop is high, cell timing performance degrades and static power consumption
increases. This causes low battery life and poor timing performance.
▪ Nowadays, bumps are generally used to provide signals to the design. Bumps are created
in the highest metal layer available. From these bumps, power is routed to the lower metal
layers with stripes and vias.

Figure 3.3 Bump planning


Figure 3.4 Power planning

▪ As shown in the figure, the blue stripes are metal 1, and the rest of the metal layers are connected to these
stripes. Optimum power routing is required for best performance. If the power plan is too
dense and uses too many routing resources, it creates problems in signal routing.
▪ A combination of stapling, stripes and via-only connections is used for power planning.
• Physical cell placement

Figure 3.5 Physical cell placement


▪ Various physical issues occur on an IC that can cause functional failure, poor
performance or a reduced life span. Latch-up, a drop in operating voltage
due to high current demand, discontinuity in the n-well or p-well, and non-uniform cell
distribution across the IC are some examples.
▪ Well tap cells
Latch-up causes direct current flow between VDD and ground and can cause circuit
failure. To avoid it, the N-well is connected to VDD and the P-well is connected to ground using
well tap cells. Tap cells also provide continuity of
wells. Well tap cells are placed in rows, and the distance between rows is decided using
the design rule manual.

Figure 3.6 Tap cell

▪ Endcap cells
These cells are placed at the start and end of each row; in effect they define the row start and end points.
Endcap cells protect cells from damage during fabrication, where
the probability of poly variation is high at the edges.
▪ Decap cells
These are MOSFET-based capacitor cells, added to keep the operating voltage constant when
instantaneous current demand is high. When switching activity is high, a large
instantaneous current is drawn from the power rails. This can force cells into a metastable
state, which is undesirable. To avoid this, decap cells are added; they store charge and
supply it when required. They are added at regular intervals in rows. Decap cells
contribute to leakage power consumption.
▪ Filler cells
A block/IC region is never filled entirely with functional cells; some gaps are
always present. These gaps can cause DRC violations at base layers and well discontinuity.
To avoid these issues, filler cells are added. These are small cells with no functionality.
Filler cells are generally added after the routing stage.
▪ Tie cells
In ASIC design, some cell inputs might be assigned a constant value, so they must be
connected to either power or ground. Cells cannot be connected directly to the power/ground
rails because of the risk of ground bounce/voltage drop. Instead, they are connected to
power/ground through tie cells.
▪ Spare cells
These cells are added so that an ECO (Engineering Change Order) can be applied if required. A clean
GDSII file is sent to the foundry, but there is still a chance of bugs in the design. To
allow for this, spare cells are placed across the
design. The metal layer masks can then be changed to match the new spare cell
connections.
3.2.3 Placement
• After macro placement, power planning and physical cell placement, the standard cells are placed. To
reduce area, power rails are shared: cells in alternate rows are flipped.
• Timing-driven placement is done to reduce timing violations. It can be either net based or timing
path based.
▪ In net-based placement, nets are given weights based on the logic connected to the net and the timing
criticality of the net. Cell placement is based on the net weights.
▪ In critical-path-based placement, cells are placed according to the critical timing paths.
Timing is checked after each iteration of refined placement.
• Placement constraints
▪ Various placement constraints are applied to avoid violations. It is desirable to have
uniform area utilization across the IC. To avoid congestion, we can define a maximum
cell density for the block.
▪ Guide
A guide assists the tool in placing cells. It is not a hard constraint: cells assigned to a
guide can also be placed outside it, and cells not assigned to the guide are
allowed to be placed inside.
▪ Fence
A fence is a hard constraint for placement. Cells assigned to a fence are strictly placed inside
it, and cells not assigned to it are strictly placed outside. Fencing is used
in low power design techniques when cells with multiple power supplies are used, since
cells with a given operating voltage must be placed in an area containing the appropriate power plan.
▪ Region
A region prohibits the cells assigned to it from being placed outside the region boundary, but cells
not assigned to the region can be placed inside it.
• During placement, congestion is observed in addition to timing. Congestion describes the extra
routing resources required in either the horizontal or vertical direction. Congestion hotspots,
if any, need to be removed.
We can reduce congestion by adding partial blockages, increasing macro channel width, and avoiding
high pin density standard cells.

Figure 3.7 Placement of cell with flipped/ original orientation


• During placement, the tool performs timing-based optimization to reduce timing violations.
▪ To increase drive strength, the original standard cell is replaced by a larger standard cell.
This reduces slew.
▪ The output of a standard cell can be buffered to retain signal strength. The path can also be
split into buffered branches if required.
▪ Cells can be cloned, with the outputs divided between the clones to improve timing.
▪ Pin swapping is done where possible.
• Area utilization is kept around 65-70% to avoid DRC violations and timing- and congestion-related
issues.
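
One common way to quantify the congestion discussed in this section is routing overflow: how many routing tracks a region needs beyond what it has. The sketch below assumes the block is divided into a grid of bins with a known track demand and capacity per bin. The grid, the numbers and the function names are hypothetical and are not taken from any PnR tool report.

```python
def overflow(demand, capacity):
    """Per-bin overflow: tracks needed beyond those available (never negative)."""
    return [
        [max(0, d - c) for d, c in zip(d_row, c_row)]
        for d_row, c_row in zip(demand, capacity)
    ]


def hotspots(demand, capacity):
    """List of (row, column, overflow) for every bin that is over capacity."""
    return [
        (r, c, v)
        for r, row in enumerate(overflow(demand, capacity))
        for c, v in enumerate(row)
        if v > 0
    ]


# Hypothetical horizontal-direction track demand and capacity for a 2x3 grid.
h_demand = [[8, 12, 9],
            [10, 15, 7]]
h_capacity = [[10, 10, 10],
              [10, 10, 10]]

print(overflow(h_demand, h_capacity))  # [[0, 2, 0], [0, 5, 0]]
print(hotspots(h_demand, h_capacity))  # [(0, 1, 2), (1, 1, 5)]
```

In this made-up example both hotspots sit in the middle column, which is where a partial blockage or a wider macro channel would be tried first. The same calculation would be repeated for the vertical direction.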

3.2.4 Clock tree synthesis


After the standard cells are placed, the clock network is created. All sequential elements require a clock input.
Clock routing must use minimum routing resources and provide
minimum latency and skew.
The required inputs for CTS are the placed netlist, clock definitions and SDC constraints.
• CTS terminologies
▪ Clock tree
▪ The distribution of the clock across the block in a tree configuration. The clock tree branches from the root
node, which is the clock port.
▪ Clock tree leaves are the clock sinks at the end points (registers, memories and other sequential
elements).
▪ A clock domain is the group of registers controlled by the same edge of a given clock.
▪ Clock latency
Clock latency is the sum of source latency and network latency. Source latency is the clock signal
delay from the PLL/clock generator to the clock port. Network latency is the delay from the clock port to
the register clock pin.
▪ Clock delay
Clock delay is the difference between the arrival of the active clock edge at the chip port and its arrival
at the flops.
Figure 3.8 Clock delay

▪ Clock skew
Clock skew is the difference between the arrival times of the clock at different registers.

Figure 3.9 Clock skew

• The goals of clock tree synthesis are to minimize skew, power (due to high switching activity), noise on
clock signals, slew rate etc.
• The clock is a strong aggressor because of its high switching activity. Nets are shielded and NDRs
(non-default rules) are applied to retain signal integrity.
• Clock nets can be shielded with nets carrying either power or ground. Generally,
metal routing follows the default 1w1s (single width, single spacing) rule. However, clock signals can cause signal
integrity issues under the default rules, so non-default rules are defined to avoid this. Examples of
NDRs are 1w2s, 2w1s and 3w2s.
• After the clock tree is built, both setup- and hold-based optimization is done to improve timing.
• For trunk nets, higher metal layers are used, while for leaf connections lower metal layers can be
sufficient.
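
The latency and skew definitions given earlier in this section reduce to simple arithmetic. The sketch below uses made-up delays in picoseconds and hypothetical register names; it computes the clock arrival time at each register as source latency plus network latency, and the skew as the spread between the latest and earliest arrivals.

```python
def clock_latency(source_latency_ps, network_latency_ps):
    """Total clock latency = source latency + network latency."""
    return source_latency_ps + network_latency_ps


def clock_skew(arrival_times_ps):
    """Skew across a set of sinks: latest arrival minus earliest arrival."""
    values = list(arrival_times_ps.values())
    return max(values) - min(values)


# Hypothetical delays in picoseconds.
source_latency = 200                     # PLL/clock generator to clock port
network_latency = {"reg_a": 450,         # clock port to each register clock pin
                   "reg_b": 500,
                   "reg_c": 420}

arrivals = {name: clock_latency(source_latency, delay)
            for name, delay in network_latency.items()}
skew = clock_skew(arrivals)

print(arrivals)  # {'reg_a': 650, 'reg_b': 700, 'reg_c': 620}
print(skew)      # 80
```

Because the source latency is common to every sink, it cancels out of the skew. Only the differences in network latency, which CTS controls, contribute to it.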

Figure 3.10 Clock tree debugger

3.2.5 Routing
After CTS optimization, routing is performed. All the signal nets are routed with the goal of minimizing DRC and
timing violations. After routing, filler cells are added to the netlist. The output DEF and netlist are generated. SPEF is
generated with QRC extraction.
3.2.6 Outputs
After routing is finished, metal fill is added to prepare the GDSII output file. In this process, metal polygons
are added where the metal routing density is low and there are large empty spaces. This increases yield and
reduces the probability of physical damage. Bumps are added at the top-level block for connectivity.

Chapter 4
STA
4.1 Introduction
Static timing analysis is a technique to validate the timing performance of a given design. It splits the input
design into timing paths and calculates the signal propagation delay along each path, except for
defined exceptions. Path elements are defined as: begin point, combinational network, end point.
Paths are grouped by the type of start point and end point encountered. The most common path groups are:
in2reg, reg2out, reg2reg, in2out.

Figure 11 Pathgroups

Path group : Begin point → End point
in2reg : input port → register input
reg2reg : register clock pin at launch clock → register input pin at capture clock
reg2out : register clock pin → output port
in2out : input port → output port
Registers are sequential elements and therefore timing for registers is calculated with respect to the clock
period. For IO ports, input/output delays are assumed and defined in the SDC with respect to a virtual clock.
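
The path group naming above can be sketched as a small lookup in Python. The point-type strings ("input_port", "register", "output_port") are names assumed for this illustration and are not identifiers from any EDA tool.

```python
def path_group(begin_point: str, end_point: str) -> str:
    """Return the path group name for a begin/end point type pair.

    The point-type strings are hypothetical labels used only in this sketch.
    """
    starts = {"input_port": "in", "register": "reg"}
    ends = {"register": "reg", "output_port": "out"}
    if begin_point not in starts:
        raise ValueError(f"not a valid begin point: {begin_point}")
    if end_point not in ends:
        raise ValueError(f"not a valid end point: {end_point}")
    return f"{starts[begin_point]}2{ends[end_point]}"


for begin, end in [
    ("input_port", "register"),
    ("register", "register"),
    ("register", "output_port"),
    ("input_port", "output_port"),
]:
    print(begin, "->", end, ":", path_group(begin, end))
```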

4.2 SDC constraints


An SDC file contains the timing constraints for a given design. It guides EDA tools in timing path
optimization. Delays are calculated with respect to the clock definitions. SDC constraints include clock
definitions, port delay information and timing exceptions. Some of the commands are discussed
here.
 create_clock
Defines the clock period, duty cycle, clock name and the port connected to the clock. Virtual clocks can be
defined with the same command by omitting the port.
 set_input_delay
Defines delays at input ports. Generally the delay is specified as some percentage of the
virtual clock period.
 set_output_delay
Defines delays at output ports. The delay is defined with respect to the virtual clock.
 set_max_transition
Defines the maximum allowable slew value for a signal. If the transition time of any signal exceeds this
value, it is reported as a violation.
 set_false_path
Generally a false path is defined between logically exclusive components of the design. It can
also be used for physically exclusive and asynchronous paths.
 set_clock_groups
If more than one clock is defined in the design, clock groups can be defined
according to their relation.
 Physically exclusive
Physically exclusive clocks are never present in the design at the same time. An example of
this is a test clock and a functional clock; these two are never used simultaneously in
the design.
 Logically exclusive
The logic is defined in such a way that no signal propagation is possible between these
clocks. These are logically exclusive groups.
 Asynchronous clocks
No timing analysis is to be done for paths having a combination of asynchronous clocks as
launch clock and capture clock.
 set_dont_touch_network
This command is used when no optimization should be done on the defined network. It is generally
applied in synthesis so that the tool doesn’t modify the clock buffer network.
 set_clock_uncertainty
Defines the uncertainty for a particular clock. Until the clock tree is synthesized, clock delay
information is unknown. If the clock is considered ideal and pessimism is not added at the initial
stages of design, then at later stages there may be too many timing violations. Uncertainty
typically covers clock jitter and the estimated skew, along with extra margin.
 set_multicycle_path
Generally setup and hold checks are done with respect to a single clock cycle.
However, there can be paths whose delay is intentionally longer than one cycle or cannot be reduced. These paths are
constrained as multicycle paths.
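
To show how clock uncertainty and a multicycle constraint enter a setup check, the following Python sketch computes setup slack for a reg2reg path. It is a simplified model under stated assumptions: the function name, parameters and all numbers are hypothetical, the hold check is not modeled, and real STA tools account for many more effects (derating, CPPR, etc.).

```python
def setup_slack_ns(period_ns, data_arrival_ns, setup_time_ns,
                   uncertainty_ns=0.0, launch_latency_ns=0.0,
                   capture_latency_ns=0.0, setup_cycles=1):
    """Simplified setup slack = required time - arrival time.

    data_arrival_ns is clock-to-Q delay plus combinational delay.
    setup_cycles > 1 models a multicycle setup constraint.
    """
    arrival = launch_latency_ns + data_arrival_ns
    required = (setup_cycles * period_ns + capture_latency_ns
                - setup_time_ns - uncertainty_ns)
    return required - arrival


# Assumed values: 2 ns clock, 2.3 ns data path, 50 ps setup, 100 ps uncertainty
single = setup_slack_ns(2.0, 2.3, 0.05, uncertainty_ns=0.10)
multi = setup_slack_ns(2.0, 2.3, 0.05, uncertainty_ns=0.10, setup_cycles=2)
print(f"single-cycle slack: {single:+.2f} ns")
print(f"two-cycle slack:    {multi:+.2f} ns")
```

With these assumed numbers the path violates setup as a single-cycle path but meets timing when it is constrained as a two-cycle path, which is the situation a multicycle constraint is meant to describe.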
4.3 MMMC analysis
Multi-mode multi-corner analysis performs STA for different process parameters and functional
modes at the same time. Modes define the type of design functionality. Examples of modes are: functional
mode, at-speed mode, BIST mode. A corner defines a combination of process (interconnect as well
as device process), voltage and temperature conditions. For each unique process-voltage-temperature
combination, a unique timing library (.lib) is defined.
Various environmental conditions can affect design behaviour. The design should be
tested properly for each probable functional condition. This makes the design robust against failure in
adverse conditions. For each mode, various timing corners are defined. Each mode has its own SDC.
Different corners use different library sets. Also, STA for setup and hold
checks is done separately.
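
The mode and corner combinations described above can be enumerated with a short Python sketch. The mode names, corner names, SDC file names and library file names below are all hypothetical placeholders; in practice not every combination is necessarily analysed.

```python
from itertools import product

# Hypothetical modes (one SDC per mode) and corners (one library set per corner)
modes = {"functional": "func.sdc", "at_speed": "atspeed.sdc", "bist": "bist.sdc"}
corners = {"ss_0p72v_125c": "slow.lib", "ff_0p88v_m40c": "fast.lib"}
checks = ("setup", "hold")


def analysis_views(modes, corners, checks):
    """Build one analysis view per (mode, corner, check) combination."""
    return [
        {"mode": m, "sdc": sdc, "corner": c, "lib": lib, "check": chk}
        for (m, sdc), (c, lib), chk in product(modes.items(), corners.items(), checks)
    ]


views = analysis_views(modes, corners, checks)
for v in views:
    print(v["mode"], v["corner"], v["check"], v["sdc"], v["lib"])
```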
4.4 Timing optimizations
STA is performed by every design tool in order to optimize the design for timing. Various
optimization techniques are used to improve the timing of violating paths. Some of them are discussed here.
 Resizing of cells
If a timing path violation is due to high cell delay, the tool replaces the cell with a higher drive strength cell
having larger area and the same functionality. This improves cell delay.
 Buffering of output nets
When cell fanout is very high or a net is long, slew increases and causes timing violations. To
reduce slew and therefore delay, buffers or inverters (in even numbers) are added on that net.
 Duplicating cell
If the fanout of a cell is very high, the cell can be duplicated and the output elements divided into
different branches.
 Dividing output net path
For a cell with very high fanout, the output path can be divided into branches and each branch can
be buffered separately to retain signal strength.
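
The effect of resizing and of splitting fanout can be illustrated with a first-order delay model, delay = intrinsic delay + drive resistance × load capacitance. This is only a sketch: the linear model, the function names and all values are assumptions, and real tools use the table-based delay data from the timing library.

```python
def load_ff(fanout, c_pin_ff, c_wire_ff=0.0):
    """Total load in fF: fanout input pins plus wire capacitance."""
    return fanout * c_pin_ff + c_wire_ff


def stage_delay_ps(intrinsic_ps, r_drive_kohm, c_load_ff):
    """First-order stage delay; kOhm * fF gives ps."""
    return intrinsic_ps + r_drive_kohm * c_load_ff


# Assumed values: a weak (X1-like) cell with 4 kOhm drive and a
# stronger (X4-like) cell with 1 kOhm drive, each input pin 1 fF
base = stage_delay_ps(10.0, 4.0, load_ff(32, 1.0))     # weak cell, fanout 32
upsized = stage_delay_ps(10.0, 1.0, load_ff(32, 1.0))  # resized cell, fanout 32
split = stage_delay_ps(10.0, 4.0, load_ff(16, 1.0))    # duplicated cell, fanout 16 each

print(f"base: {base} ps, upsized: {upsized} ps, split: {split} ps")
```

The sketch ignores the side effects of each fix: a larger or duplicated cell presents more input capacitance to the previous stage, so the gain on one stage can cost delay on the stage before it.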

Chapter 5
IR analysis
5.1 Introduction
A good power delivery network is highly important in IC design. Every cell should get the proper
operating voltage. If the operating voltage is too low, signal strength is lower and
the circuit might enter a metastable state.
To detect a weak power delivery network, IR analysis is done. IR analysis reveals the voltage drop
between the power source and the sink (standard cell/macro power pin). Metal
layers have parasitic capacitance and resistance. This causes voltage drop in the rails,
which results in a lower operating voltage at the power pins of the cell/macro.
In the figure below, V2 = V1 - I.R

Figure 5.1 Voltage drop in rails

IR drop and ground bounce are defined as the variations in voltage caused by current flowing through
rails having distributed parasitics.
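
The relation V2 = V1 - I.R can be extended to a rail with several cells tapping current along it. The Python sketch below is a purely resistive illustration with hypothetical values: each rail segment has the same resistance and carries the current of every tap at or beyond it.

```python
def rail_voltages(v_source, seg_res_ohm, tap_currents_a):
    """Voltage seen at each tap along a single resistive rail.

    tap_currents_a lists the current drawn at each tap, ordered from
    the source outwards. All values are assumptions for illustration.
    """
    voltages = []
    v = v_source
    remaining = sum(tap_currents_a)
    for i_tap in tap_currents_a:
        # This segment carries the current of all taps at or beyond it
        v -= remaining * seg_res_ohm
        voltages.append(v)
        remaining -= i_tap
    return voltages


# Assumed: 0.8 V supply, 0.5 ohm per segment, three cells drawing 10 mA each
volts = rail_voltages(0.8, 0.5, [0.01, 0.01, 0.01])
worst_drop_pct = (0.8 - volts[-1]) / 0.8 * 100
print([round(v, 4) for v in volts], f"worst drop = {worst_drop_pct:.2f}%")
```

The cell farthest from the source sees the largest drop, which is why power grid weakness tends to show up far from the supply connections.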
Ground Bounce:
Ground bounce is an increase in voltage at VSS/GND, which should ideally always be zero. It occurs on
the ground networks (VSS or GND) of the IC. Due to the combination of current flowing in the ground
network and parasitic effects in the network, a rise in voltage is observed at ground nodes
around the chip.

Power voltage drop:


Power voltage drop is a decline in voltage that arises on the power supply networks (VDD) of integrated
circuits. On lower metal layers, metal width decreases, causing an increase in power-grid resistance. This
increases localized voltage drops within the power grid.

Figure 5.2 voltage drop and ground bounce in power rails

5.2 Static IR
Static IR drop is the average voltage drop in the design. It depends on the parasitic resistance
of the power grid connecting the power supply to the respective standard cells. The average current
is the total charge drawn over a clock period divided by that period, unlike the instantaneous current. Static IR drop highlights power grid
weaknesses of the design.

Figure 12 Static current



5.3 Dynamic IR
Dynamic IR drop is the voltage drop due to high switching activity of transistors. It occurs when
the demand for current from the power supply rises due to high switching activity in the chip.
Dynamic IR drop depends on the switching time of the logic and is less dependent on the clock period, since
the clock period determines the average current.
Dynamic IR drop takes into account the instantaneous current drawn from the power grid during
a switching event. The average current depends on the clock period, as it is the total charge
drawn during a cycle divided by the period, while dynamic IR drop depends on the instantaneous current, which is higher
while the cell is switching. Dynamic IR analysis evaluates the IR drop caused when large amounts of
circuitry switch simultaneously, causing peak current demand.

Figure 13 Dynamic current vs static current
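
The difference between average and instantaneous current can be shown with a small Python sketch. It assumes a purely resistive grid and a current waveform sampled at uniform intervals over one clock period; the sample values and the grid resistance are hypothetical, and a real dynamic analysis also includes capacitive and inductive effects.

```python
def ir_drop_summary(current_samples_a, grid_res_ohm):
    """Static drop from the average current, dynamic drop from the peak.

    current_samples_a: uniformly spaced samples over one clock period.
    """
    avg_i = sum(current_samples_a) / len(current_samples_a)
    peak_i = max(current_samples_a)
    return {
        "static_drop_v": avg_i * grid_res_ohm,
        "dynamic_drop_v": peak_i * grid_res_ohm,
    }


# Assumed waveform: a short current spike right after the clock edge
samples = [0.00, 0.08, 0.02, 0.00]
summary = ir_drop_summary(samples, grid_res_ohm=0.5)
print(summary)
```

Stretching the clock period adds more near-zero samples, which lowers the average (static) drop but leaves the peak (dynamic) drop unchanged.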

5.4 Power analysis


Power dissipation is a major issue when designing an ASIC. The trade-off between power and timing
performance is challenging. Power dissipation is divided into two major parts: static power
dissipation and dynamic power dissipation. Each is calculated differently. In the Voltus tool, total
power is divided into internal power, leakage power and switching power. The three are
derived as follows.
5.4.1 Internal Power
Internal power is calculated from the .lib files. It needs load capacitance, slew, frequency and toggle rate.
Load capacitance comes from the SPEF files, and slew and frequency from the timing files. Slew is the
transition time of a signal, i.e. the time it takes to change between logic levels.

Figure 14 Internal power table in .lib
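
A library table indexed by slew and load is typically evaluated by interpolating between its grid points. The Python sketch below shows bilinear interpolation on a hypothetical 2x2 table; the axis values, table values and units are assumptions, not data from the library used in this work.

```python
from bisect import bisect_right


def _bracket(axis, x):
    """Index i of the axis interval [axis[i], axis[i+1]] used for x (clamped)."""
    i = bisect_right(axis, x) - 1
    return max(0, min(i, len(axis) - 2))


def lookup(slews, loads, table, slew, load):
    """Bilinear interpolation in table[slew_index][load_index].

    Points outside the axes are linearly extrapolated from the edge interval.
    """
    i = _bracket(slews, slew)
    j = _bracket(loads, load)
    ts = (slew - slews[i]) / (slews[i + 1] - slews[i])
    tl = (load - loads[j]) / (loads[j + 1] - loads[j])
    low = table[i][j] * (1 - tl) + table[i][j + 1] * tl
    high = table[i + 1][j] * (1 - tl) + table[i + 1][j + 1] * tl
    return low * (1 - ts) + high * ts


# Hypothetical axes (slew in ns, load in fF) and table values
slews = [0.01, 0.05]
loads = [1.0, 5.0]
table = [[0.10, 0.20],
         [0.30, 0.40]]

value = lookup(slews, loads, table, slew=0.03, load=3.0)
print(f"interpolated value = {value:.3f}")
```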

5.4.2 Leakage power

Figure 15 Leakage power Table



Leakage power is also defined in the .lib. For a given cell, leakage power is defined for each input
combination with respect to the power and ground supplies.
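
Since leakage is given per input combination, an average leakage value for a cell can be estimated by weighting each state with the probability of the cell being in that state. The Python sketch below illustrates this idea for a two-input cell; the leakage values and the state probabilities are hypothetical.

```python
def average_leakage_nw(leakage_by_state, state_probability):
    """Probability-weighted leakage over all input states (values in nW)."""
    total_p = sum(state_probability.values())
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError("state probabilities must sum to 1")
    return sum(leakage_by_state[s] * p for s, p in state_probability.items())


# Hypothetical two-input cell: keys are the input values "AB"
leakage_by_state = {"00": 1.0, "01": 2.0, "10": 2.0, "11": 4.0}
state_probability = {"00": 0.25, "01": 0.25, "10": 0.25, "11": 0.25}

avg_leak = average_leakage_nw(leakage_by_state, state_probability)
print(f"average leakage = {avg_leak} nW")
```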
5.4.3 Switching power
Switching power is due to the charging and discharging of the total load, which includes the output
capacitance and other parasitic capacitances.

$P_{switch} = \alpha \cdot V_{dd}^{2} \cdot C_L \cdot f$
Figure 5.7 Switching power

α = Activity Factor
Vdd = supply voltage
CL = total load capacitance
f = frequency of operation
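
As a worked example of the switching power relation (activity factor × supply voltage squared × load capacitance × frequency), the Python sketch below evaluates it for hypothetical values. The numbers are assumptions chosen for easy arithmetic, not measurements from this design.

```python
def switching_power_w(alpha, vdd_v, c_load_f, freq_hz):
    """Switching power in watts: alpha * Vdd^2 * CL * f."""
    return alpha * vdd_v ** 2 * c_load_f * freq_hz


# Assumed: activity factor 0.1, 0.8 V supply, 1 nF total switched
# capacitance, 1 GHz clock
p_nominal = switching_power_w(0.1, 0.8, 1e-9, 1e9)
# Same design with the supply lowered by 10%
p_low_vdd = switching_power_w(0.1, 0.72, 1e-9, 1e9)

print(f"nominal: {p_nominal * 1e3:.1f} mW")
print(f"at 0.72 V: {p_low_vdd * 1e3:.2f} mW ({p_low_vdd / p_nominal:.2f}x)")
```

Because the supply voltage enters squared, a 10% reduction in Vdd lowers switching power by about 19% in this model, with all other terms held constant.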

Chapter 6
Results
6.1 Synthesis
In synthesis, design sanity checks should be clean. Calculated power and area are reported. Timing is
checked and violating timing paths are analysed.

Figure 6.1 Design checks

This report describes design-related issues. There shouldn’t be any unresolved references or
empty modules in the design. There shouldn’t be any “assign” statements in the netlist. Assign statements are
not mapped to library cells, and this kind of syntax should be converted into tie logic. Library sets should be
defined properly. Every .lib should have a corresponding LEF. A missing LEF can cause design failure. Generally,
no physical cells are inserted in synthesis.

Figure 16 Timing check summary

There shouldn’t be any illegal timing issues in the final netlist. Clocks should be defined properly with
the correct port name. There shouldn’t be any unconnected/logic-driven nets except generated clocks which
are defined in the SDC. Sequential data pins shouldn’t be driven by a clock signal. Every register should be
defined with a unique clock per mode for timing. There shouldn’t be any ambiguity about it.

 QoR report
This report shows statistics for the various types of cells and gives a summary of timing,
area and power. Here, the change in the area report is shown, before and after applying the don’t use cell
constraint.

Figure 6.3 Area report before implementing don’t use cell constraint

Figure 6.4 Area report after implementing don’t use cell constraint

After applying the don’t use constraint, cell area increased by 16.73%, which is significant.

6.2 PnR
For each stage in PnR, timing violations are observed. Placement, connectivity and physical cell related
issues are also resolved. For different area utilizations, the difference in PnR results is listed here.

Area utilization 70% VS 75%


Init stage
Increase in violating path number (setup): 568
Degradation in worst negative slack (setup): 50 ps
Total negative slack increased by (setup): 32.539 ns
Buffer cell number difference (decreased in 75% util): -336
Inverter cell difference (increased in 75% util): +2377

Placeopt stage
Increase in violating path number (setup): 16614
Degradation in worst negative slack (setup): 116 ps
Total negative slack increased by (setup): 1809.98 ns
Buffer cell number difference (decreased in 75% util): -1526
Inverter cell difference (increased in 75% util): +3215
Routing overflow in 70% util: 0.24% H and 1.04% V
Routing overflow in 75% util: 0.24% H and 0.67% V

CTS stage
Increase in violating path number (setup): 10702
Degradation in worst negative slack (setup): 179 ps
Total negative slack increased by (setup): 856.326 ns
Decrease in violating path number (hold): 80373
Improvement in worst negative slack (hold): 3 ps
Total negative slack decreased by (hold): 2951.125 ns
Buffer cell number difference (decreased in 75% util): -888
Inverter cell difference (increased in 75% util): +6061
Routing overflow in 70% util: 1.93% H and 3.46% V
Routing overflow in 75% util: 3.70% H and 5.27% V
Increase in total number of cells in clock path: +2510

Route stage
Increase in violating path number (setup): 7669
Degradation in worst negative slack (setup): 109 ps
Total negative slack increased by (setup): 406.649 ns
Increase in violating path number (hold): 896
Degradation in worst negative slack (hold): 93 ps
Total negative slack increased by (hold): 10.373 ns
Buffer cell number difference (increased in 75% util): +3646
Inverter cell difference (increased in 75% util): +6066
Routing overflow in 70% util: 0.14% H + 0.23% V
Routing overflow in 75% util: 0.18% H + 0.30% V
Increase in DRC violations: No change, both DRC clean
Table 6.1 PnR results comparison

As per the given data, overall timing violations increase as area utilization increases. However, both
utilizations are manageable for closing the design.

6.3 IR analysis
IR analysis in 70% VS 75% utilization
Static IR: 1.38% (70% util), 1.33% (75% util)
Dynamic IR: 5.9% (70% util), 9.45% (75% util)

Power analysis in 70% VS 75% utilization


Power
Increment in leakage power: -66.35 mW
Increment in dynamic power: 14.8 mW
Increment in internal power: 8.6 mW
Table 5.2 IR analysis and Power results

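As area utilization increased, dynamic IR drop increased while static IR drop stayed almost unchanged. However, power consumption decreased, because the reduction in leakage power outweighed the increase in dynamic and internal power.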



Conclusion
Here, results for different area utilizations of the same netlist are compared. For each stage, the
difference in results is noted. These results show that 75% area utilization is manageable to close
with good performance parameters. However, lower area utilization would be easier to close.
This also depends on logical design complexity.
When PnR with 80% initial area utilization is tested, results degrade significantly. Machine
run time per stage increases and routing is not even possible.

References
[1] J. Bhasker, R. Chadha, “Static timing analysis for Nanometer Designs”, Springer
[2] A. Teman, “Logic Synthesis”, [Link]
[3] [Link]
[4] Luca Amaru, Patrick Vuillod, Jiong Luo, Janet Olson, “Logic Optimization
and Synthesis: Trends and Directions in Industry”
[5] K. Bernstein, R. K. Cavin, W. Porod, A. Seabaugh, J. Welser, “ Device
and architecture outlook for beyond CMOS switches”
[6] Gary Yeap, “practical low power Digital vlsi design”, Springer
[7] Shen Lin and Norman Chang. (2001) “Challenges in power-ground integrity”,
International Conference on Computer Aided Design, Pages: 651 - 654
[8] Yanling Zhou, Yunyao Yan, Wei Yan, 2017, “A method to speed up VLSI hierarchical
physical design in Floorplanning”, IEEE
[9] Neil H. E. Weste and David Money Harris, 2009, ”CMOS VLSI DESIGN A CIRCUIT
AND SYSTEM PERSPECTIVE”, fourth edition, ISBN 10: 0-321-54774-8
[10] A. B. Kahng et al., 2011, “VLSI Physical Design: From Graph Partitioning to Timing
Closure”, Springer Science+Business Media B.V. 2011
