ASIC Physical Design in 12nm Technology
Project Report
MASTER OF TECHNOLOGY
IN
ELECTRONICS AND COMMUNICATION (VLSI) ENGINEERING
By
Pooja Kevat
(19MECV09)
May 2021
Acknowledgement
Experience shapes a person, and a good initial experience in any field strengthens the base of a career.
Working with a talented team is an exciting opportunity as well as a challenge. In this internship I learned
various concepts of physical design and VLSI from training, experience and colleagues.
I am immensely grateful for the internship opportunity provided by Open-Silicon Research [Link], A
SiFive Company. It was a pleasure to be associated with the company. I thank the company for providing me
with amazing opportunities which increased my knowledge and boosted my confidence. I would like to
thank Mr. Mayur Deshpande (Staff Engineer II and manager) for providing both technical and
non-technical support and for being there when needed. His guidance and training are valuable. I would also like
to thank Mr. Anand J. Bariya for selecting me for the internship and providing good learning opportunities.
I thank every staff member who helped me in my learning process; it is indeed a long list. Special
thanks also go to the IT department, which maintained the systems throughout and made working from home really
easy.
I would also like to thank my internal guide, Prof. Jayesh Kumar Patel, for supporting and guiding me
during the internship. His support and guidance were vital. I would also like to extend my thanks to all the faculty
members who taught me. This helped me gain the fundamental knowledge required for the internship.
Pooja Kevat
ABSTRACT
VLSI is a branch of engineering concerned with the design, verification and physical implementation of integrated
circuits. It has been growing since the invention of the first integrated circuit. As technology advanced, it became
possible to reduce device dimensions, so more and more devices could be fabricated
on a single IC.
Driven by the requirements of communication and computing devices, research on ICs has continued to
improve performance. Today IC devices have dimensions in nanometers, and research on better devices is ongoing.
This has improved area efficiency but increased design complexity. As the area of components
shrank, the routing requirement per unit area increased. This tradeoff limits the performance of
the IC. A design must balance three parameters: power, area and timing performance. An ASIC is
an application-specific chip that is designed and optimized for a given technology node and the requirements
of the architecture.
RISC-V is developed and maintained as an open architecture, and many people contribute to
this effort. The architecture is designed with all the performance parameters in mind, especially
power consumption. Physical design engineers convert the architecture/logic design into the final IC. As
technology shrinks, physical design becomes increasingly challenging. Design Rule Manuals are
used to implement the design optimally, and these rules become more complex as the technology node shrinks.
Improving all performance parameters (power, timing and area) together is challenging. Many experiments are
run on a single block at a single node before a violation-free output is obtained.
Reducing the runtime of PnR processes is one of the issues to be resolved. A single error can waste many
human working hours. Using the tool smartly is one of the skills an engineer can possess. A deep
understanding of the input timing constraints is required: wrong input causes flawed output even with a correct
process. Understanding the design is necessary, and understanding the critical paths is half the solution to the
issues faced. Approaching the design from a generic point of view helps a lot. If we understand every process and
know how undetected issues in one process can affect another, it is easier to resolve
issues.
In this report, the various PnR stages are explained along with optimization techniques. At each stage, various
experiments are done to reach the final optimum solution. The results for different area utilizations are
compared and discussed.
INDEX
3.1 Introduction
3.2 Stages of PnR
4 STA
4.1 Introduction
4.2 SDC constraints
4.3 MMMC analysis
4.4 Timing optimizations
5 IR Analysis
5.1 Introduction
5.2 Static IR
5.3 Dynamic IR
5.4 Power analysis
6 Results
6.1 Synthesis
6.2 PnR
6.3 IR Analysis
7 Conclusion
8 References
Chapter 1
Introduction
1.1 Introduction
VLSI is the branch of engineering that deals with large-scale integration of semiconductor devices
designed for various purposes. After the invention of the first IC, continuous research was done with the goal
of decreasing the size of devices and getting maximum utilization out of an IC. As research went on, it
became cheaper and easier to manufacture consumer electronics. Computers, mobile phones and
other electronic devices became an integral part of life.
The figure above describes the VLSI design cycle. This report is based on physical design, for which
the detailed process is described here.
Physical design converts a given input netlist into layout (GDSII) form. The VLSI design process is
generally divided into two broad categories: front end and back end. The front end deals with logic
design, while the back end deals with converting the given logic design into a GDSII file which can be
given as input to the foundry.
I. Front end
RTL designing
RTL design is basically code describing the required functionality using the logical elements
of digital design. This code is written in an HDL such as VHDL or Verilog.
Functional verification
As technology improved, functional verification of an HDL description has become more challenging than the
design itself. UVM and OVM are general methodologies developed to test
digital designs efficiently. Generally SystemVerilog is used to program the testbench.
Synthesis of RTL netlist
The generic design described in HDL is converted into a technology-specific gate-level netlist.
II. Back end
Physical design
Physical design is the process which converts the gate-level netlist into a manufacturable GDS. It
is a complex process and consists of many stages such as floorplanning, power planning,
placement, clock tree synthesis, routing, logical equivalence checks, physical verification and
mask data generation.
As process nodes shrink, the design process becomes more challenging. Here, regular design
techniques in the physical design domain are described.
Chapter 2
Synthesis
2.1 Introduction
Synthesis is the process of converting a given RTL netlist into a technology-specific gate-level netlist. The required
inputs for logical synthesis are library information, the RTL netlist file and the SDC constraint file. For
physical synthesis, a DEF file is also required for placement information.
2.2 Synthesis flow
Here, the logical synthesis flow is discussed. The intermediate stages involved in this
process are: library reading, netlist reading, SDC
reading, elaboration, initial synthesis, incremental synthesis, report generation, SDC generation and
netlist generation.
With the given inputs, we might add extra constraints as per requirement.
1) Library reading
We need information about the technology- and foundry-specific libraries in order to map the logical
functionality described in RTL onto the logical elements defined in the technology-specific library.
The technology library gives information about the timing, area and power consumption of the cells in the design.
Primarily, .lib and .lef files are given as basic input. Some advanced synthesis flows may also require a
DEF file.
i. .lib (Liberty file format)
Liberty files contain information about timing, power (internal and leakage), operating
conditions, cell area and functionality. The units for timing, power etc. are
defined in the .lib files.
Liberty files can be based on various timing models such as CCS (Composite Current
Source) and NLDM (Non-Linear Delay Model). A timing model consists of driver, net
and receiver models. Driver and receiver models are characterized with circuit simulators.
The net model, however, can be either estimated or extracted. The NLDM
model is based on a voltage source, while the CCS model is based on a current source.
ii. LEF (Library Exchange Format)
LEF files are of two types: technology LEF and cell/macro LEF. Technology LEF files contain
information about the metal layers, their DRC rules, and layer and via sizes,
while the cell/macro LEF contains information about the geometry of each cell/macro.
iii. DEF (Design Exchange Format)
It contains floorplan information such as the placement of macros, the standard cells used in the design,
I/O pin locations and physical cell placement. In detail, it contains the die area, tracks, macro
placement, I/O pin placement, nets, blockages, halos,
scan chain placement, vias, slots, fills, regions, rows and metal layer
information.
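To make the NLDM model from the .lib item above more concrete: an NLDM table stores cell delay on a grid of input slew and output load values, and values between grid points are interpolated. The following is a minimal sketch of such a lookup using bilinear interpolation. The function name, the table values and the units are hypothetical and are not taken from any real library.

```python
from bisect import bisect_right

def interp_nldm(slews, loads, table, slew, load):
    """Bilinear interpolation in a 2-D NLDM-style lookup table.

    table[i][j] is the delay at input slew slews[i] and output load loads[j].
    Points outside the grid are linearly extrapolated from the nearest cell.
    """
    def lower_index(axis, x):
        # Index of the grid point at or below x, clamped so index+1 is valid.
        i = bisect_right(axis, x) - 1
        return max(0, min(i, len(axis) - 2))

    i = lower_index(slews, slew)
    j = lower_index(loads, load)
    ts = (slew - slews[i]) / (slews[i + 1] - slews[i])
    tl = (load - loads[j]) / (loads[j + 1] - loads[j])

    # Interpolate along the load axis first, then along the slew axis.
    low = table[i][j] + (table[i][j + 1] - table[i][j]) * tl
    high = table[i + 1][j] + (table[i + 1][j + 1] - table[i + 1][j]) * tl
    return low + (high - low) * ts

# Hypothetical characterization data (slew in ns, load in pF, delay in ns).
SLEWS = [0.01, 0.05, 0.20]
LOADS = [0.001, 0.010, 0.050]
DELAY = [
    [0.020, 0.040, 0.120],
    [0.030, 0.050, 0.130],
    [0.060, 0.080, 0.160],
]

if __name__ == "__main__":
    print(interp_nldm(SLEWS, LOADS, DELAY, slew=0.03, load=0.0055))
```

At a grid point the function returns the table entry itself; between grid points it blends the four surrounding entries.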
Don’t use cell list generation
This is an optional process. Synthesis just converts an RTL input file into a netlist. The synthesis
tool doesn’t have information about cell placement, net lengths (RC information for
nets), the area/shape of the block, pin placement etc.
When STA is done on the synthesized netlist, the timing delay information might be too
optimistic. This causes under-optimization and can create problems in later PnR stages.
At this stage we have a generic netlist describing the logic present in the design. However, it is not
yet fully optimized. Some redundant logic might still be present at the generic stage. This
redundancy is removed, and the generic logic netlist is optimized wherever possible.
6) Technology mapping
Here, gates from the technology library are chosen in place of generic gates. Particular cells are chosen
in such a way that they provide enough drive strength with minimum area.
7) Post mapping optimization
In post-mapping optimization, gate sizes and drive strengths are chosen in such a way that timing is
improved. After overall optimization, uniquification of instances is done. When logic is
instantiated multiple times, it is important to define each instantiation uniquely. This is done to
identify each instantiation at its correct logic connection, so the PnR tool can place it as required.
8) Results and reports
Synthesis outputs are a technology-mapped netlist and an SDC generated for the next stage. The most
important synthesis reports are the check design report, check timing report, QoR report and path-wise
timing summary.
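The uniquification step mentioned under post-mapping optimization can be illustrated with a small sketch. It assumes a flat list of (instance, module) pairs and a hypothetical naming scheme that appends an index to the module name; real synthesis tools use their own naming conventions.

```python
from collections import Counter

def uniquify(instances):
    """Give every instance of a multiply-instantiated module its own module name.

    instances: list of (instance_name, module_name) pairs.
    Returns a list of (instance_name, unique_module_name) pairs.
    Modules instantiated only once keep their original name.
    """
    counts = Counter(module for _, module in instances)
    seen = Counter()
    result = []
    for inst, module in instances:
        if counts[module] == 1:
            result.append((inst, module))
        else:
            # Hypothetical naming scheme: <module>_<index>
            result.append((inst, f"{module}_{seen[module]}"))
            seen[module] += 1
    return result

if __name__ == "__main__":
    design = [("u0", "adder"), ("u1", "adder"), ("u2", "mux")]
    print(uniquify(design))
```

After this step each instance refers to its own copy of the module, so it can be optimized and placed independently of the others.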
The idea of a clock gating cell is to disable the clock signal when there is no signal transition at the input
of a sequential element. We can implement clock gating with any combinational cell such as an
“and” or “or” gate. The clock is one input and the other is the controlling signal. If the controlling signal is at
its dominant value, then the clock is blocked. If it is at its non-dominant value, the clock is propagated.
Figure 2.2 Single cell clock gating example (labels: clock, output; dominant input = 0, non-dominant input = 1)
This kind of logic is avoided in clock gating as it can cause glitches, which can cause failure of the
circuit. To remove glitches, integrated clock gating (ICG) cells are used. Clock gating cells are
defined in standard cell libraries. However, clock gating can also be inserted through RTL coding.
In synthesis, clock gating attributes are defined separately. The clock gating cell from the library is
defined with “set_attribute lp_clock_gating_cell {cell_name}”. We can choose multiple clock
gating cells for this. We also need to specify the minimum fanout of an ICG, as it is not
advisable to insert an ICG for each individual flop separately. This can be defined with the
“set_attribute lp_clock_gating_min_flops {number}” command.
Here, the clock signal is an input to the AND gate as well as to the negative level-triggered latch. This way, the
enable signal is synchronized with the clock, which removes the possibility of a glitch on the output clock waveform.
When clock gating is applied, the clock waveform becomes non-uniform. To deal with this, different
kinds of setup and hold checks are used. The above example is an “AND” gate based integrated clock
gating cell. For an “AND” gate, “0” is the controlling input value: when enable is at logic low, the clock is not
propagated. Similarly, for an “OR” gate, “1” is the controlling input value.
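The difference between plain AND-gate gating and the latch-based integrated clock gating cell described above can be shown with a small discrete-time simulation. This is only an illustrative sketch: the waveforms are hypothetical sample lists (one value per time step), and the latch is modelled as ideal, transparent while the clock is low and holding while the clock is high.

```python
def and_gate_gating(clk, en):
    """Gate the clock with a plain AND gate: output = clk AND enable."""
    return [c & e for c, e in zip(clk, en)]

def latch_icg(clk, en):
    """AND-type ICG: enable passes through a latch that is transparent
    while clk is low and holds its value while clk is high."""
    out = []
    latched = 0
    for c, e in zip(clk, en):
        if c == 0:
            latched = e          # latch transparent during the low phase
        out.append(c & latched)  # AND gate sees only the latched enable
    return out

def high_pulse_widths(wave):
    """Return the width (in samples) of every high pulse in the waveform."""
    widths, run = [], 0
    for v in wave:
        if v:
            run += 1
        elif run:
            widths.append(run)
            run = 0
    if run:
        widths.append(run)
    return widths

# Three clock cycles: 4 samples high, 4 samples low.
CLK = ([1] * 4 + [0] * 4) * 3
# Enable changes in the middle of a high phase (at samples 2 and 18).
EN = [0] * 2 + [1] * 16 + [0] * 6

if __name__ == "__main__":
    print(high_pulse_widths(and_gate_gating(CLK, EN)))  # [2, 4, 2]
    print(high_pulse_widths(latch_icg(CLK, EN)))        # [4, 4]
```

With the plain AND gate, the enable changing during the high phase produces clipped pulses of width 2. With the latch, every pulse on the gated clock has the full width of 4 samples.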
As shown in the figure, the setup check is done at the clock edge where the clock goes from the controlling state into
the non-controlling state. A setup failure can cause a glitch at the leading edge of the clock or clipping of the clock
signal. The hold check is done at the clock edge where the clock goes from the non-controlling state into the controlling
state. A hold failure can cause a glitch at the trailing edge or clipping of the clock signal.
Chapter 3
Place and Route
3.1 Introduction
Place and route is the process of converting the synthesized netlist into a detailed physical layout in the form of a GDS
file. Its inputs are: netlist, SDC, timing analysis view definition file and various library files (lef, lib,
qrcTech, layermap files).
Based on information from the synthesis output and reports, the area estimate, uncertainty value and additional
constraints are defined.
Macro cells require proper power connections and connections to physical cells. To provide
proper power connection, the minimum channel width is defined in such a way that at least one
pair of power and ground stripes can be routed through the channel.
The channel width should accommodate all the necessary routing. It should be
wide enough to let every macro pin be routed through the channel. If any
combinational logic is tightly connected to a macro, then it should be
constrained to be placed near that macro.
Placement of macros follows the design hierarchy. Macros defined in the same module/
design hierarchy are placed together as required by their connections with each
other and with ports.
result in increased standard cell placement in channels which can cause congestion issues.
Physical cells are added at the macro boundary. End cap cells are added around the macro; other
physical cells are added in the channels available around the macro.
Placement of blockages
There are various types of blockages: placement blockages and routing blockages. Placement
blockages can be of various types. A hard placement blockage blocks every cell from being
placed in the area. A soft blockage allows only buffer and inverter type cells, which are used
to maintain signal strength. A partial blockage allows placement of cells in the given area up to
a certain area utilization; beyond that it does not allow placement of cells.
We can define regions which constrain the tool to place certain cells in a defined area. This is
also used in low power design techniques. In multi-supply-voltage designs, cells using the
same operating voltage are kept in the same region.
Figure: blockage types (routing blockage, fence/region, partial blockage, placement blockage)
The blockage requirement is decided as per the floorplan. Hard placement blockages are
kept at the boundary to avoid congestion and signal integrity issues. To keep block IO signals
robust, IO buffers are placed near the boundary.
Soft blockages can be applied in the channel between macros if we don’t want to put
combinational logic in that area. Generally, partial blockages are used to control area
utilization near macros.
Routing blockages are applied over macros to reduce congestion. They can be created as
per requirement. We can create a routing blockage for signal nets and still let special nets
(power, clock) be routed over some regions if required.
Power plan
In IC design, power nets must be routed to each and every standard cell and macro.
We need to ensure that the power plan is robust, with low IR drop. Good power planning
is required for the overall performance of the IC.
If IR drop is high, then cell timing performance degrades and static power consumption
also increases. This causes low battery life and poor timing performance.
Generally, bumps are now used to provide signals to the design. Bumps are created
in the highest metal layer available. From these bumps, power is routed to lower metal
layers with stripes and vias.
As shown in the figure, the blue stripes are metal 1, and the rest of the metal layers are connected to these
stripes. Optimum power routing is required for best performance. If the power plan is too
dense and uses too many routing resources, it creates problems in signal routing.
A combination of stapling, stripes and via-only connections is used for power planning.
Physical cell placement
Various physical issues occur on an IC which can cause functional failure, poor
performance or a reduced life span. Latch-up, a drop in operating voltage
due to high current demand, discontinuity in the n-well or p-well, and non-uniform cell
distribution across the IC are some examples.
Well TAP cells
Latch-up causes direct current flow between VDD and ground, which can cause circuit
failure. To avoid it, the N-well is connected to VDD and the P-well is connected to ground using
well TAP cells. These cells prevent latch-up. TAP cells also provide continuity of
wells. Well tap cells are placed in rows, and the distance between them is decided using the
design rule manual.
Endcap cells
These cells are placed at the start and end of each row. Basically, they define the row start and end points.
Endcap cells are placed to avoid cell damage during fabrication. During the fabrication
process, the probability of poly variation is high at edges.
Decap cells
These are MOSFET-based capacitor cells added to keep the operating voltage constant when
instantaneous current demand is high. When switching activity is high, a large amount of
instantaneous current is drawn from the power rails. This can force cells into a metastable
state, which is undesirable. To avoid this, decap cells are added; they store charge and
supply it when required. They are added at regular intervals in rows. Decap cells
contribute to leakage power consumption.
Filler cells
The block/IC region is never filled entirely with functional cells. Some gaps are
always present. These can cause DRC violations at the base layers and well discontinuity.
To avoid these issues, filler cells are added. These are small cells with no functionality.
Generally, filler cells are added after the routing stage.
Tie cells
During ASIC design, some cell inputs might be assigned a constant value. They can be
connected to either power or ground. We can’t connect these inputs directly to the power/ground
rails because of the probability of ground bounce/voltage drop. Instead, they are connected to
power/ground via tie cells.
Spare cells
These cells are added so that an ECO (Engineering Change Order) can be applied if required. A clean
GDSII file is sent to the foundry, but there is still a chance of bugs in the design. To
accommodate this situation, spare cells are placed across the
design. We can change the metal layer masks according to the change in spare cell
connections.
3.2.3 Placement
After macro placement, power planning and physical cell placement, standard cells are placed. To
reduce the area requirement, power rails are shared: cells are flipped in alternate rows.
Timing-driven placement is done to reduce timing violations. It can be either net based or timing
path based.
In net-based placement, nets are given weights based on the logic connected to the net and the timing
criticality of the net. Cell placement is based on the net weights.
In critical-path-based placement, cell placement is based on the critical timing paths.
Timing is checked after each iteration of refined placement.
Placement constraints
Various placement constraints are applied in order to avoid violations. It is desirable to have
consistent area utilization across the IC. To avoid congestion issues, we can define a maximum
cell density for the block.
Guide
Guides assist the tool in placing cells. A guide is not a hard constraint. Cells assigned to a
guide can also be placed outside the guide. Similarly, cells not assigned to the guide are
allowed to be placed inside it.
Fence
A fence is a hard constraint for placement. Cells assigned to a fence are strictly placed inside
the fence. Cells not assigned to the fence are strictly placed outside it. Fencing is used
with low power design techniques when cells with multiple power supplies are used. It is required
to place cells with a given operating voltage in the area containing the appropriate power plan.
Region
A region definition prohibits the cells assigned to the region from being placed outside the region boundary.
But cells not assigned to the region can be placed inside it.
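The three constraint types above differ only in which combinations of "cell is a member" and "cell is inside the area" they allow. The sketch below encodes those rules as a small legality check. The `Rect` class, the function name and the string labels are hypothetical helpers for illustration and do not correspond to any PnR tool command.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle describing the constraint area."""
    x1: float
    y1: float
    x2: float
    y2: float

    def contains(self, x, y):
        return self.x1 <= x <= self.x2 and self.y1 <= y <= self.y2

def placement_is_legal(kind, area, is_member, x, y):
    """Check a cell location against a guide, region or fence.

    kind:      "guide", "region" or "fence"
    is_member: True if the cell is assigned to the constraint
    """
    inside = area.contains(x, y)
    if kind == "guide":
        return True                      # soft constraint: anything is legal
    if kind == "region":
        return inside or not is_member   # members must stay inside
    if kind == "fence":
        return inside == is_member       # members inside, non-members outside
    raise ValueError(f"unknown constraint kind: {kind}")

if __name__ == "__main__":
    area = Rect(0, 0, 10, 10)
    print(placement_is_legal("fence", area, is_member=False, x=5, y=5))
    print(placement_is_legal("region", area, is_member=False, x=5, y=5))
```

The only case where fence and region disagree is a non-member cell inside the area: the region allows it, the fence does not.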
For placement, congestion is observed in addition to timing. Congestion describes the number of extra routing
tracks required in either the horizontal or vertical direction. If there are congestion hotspots,
then we need to remove them.
We can reduce congestion by adding partial blockages, increasing the macro channel width and avoiding
high pin density standard cells.
During placement, timing-based optimization is done by the tool to reduce timing violations.
To increase drive strength, the original standard cell is replaced by a larger standard cell.
This reduces slew.
The output of a standard cell can be buffered to retain signal strength. The path can also be
split into buffered branches if required.
Cells can be cloned, with the outputs divided between the clones, to improve timing.
Pin swapping is done if possible.
Area utilization is kept around 65-70% in order to avoid DRC violations and timing and congestion
related issues.
Clock delay
Clock skew
It is the difference between the arrival times of the clock at different registers.
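As a small worked example of this definition, the sketch below computes the skew between a launch and a capture register and the worst-case (global) skew over all registers. The register names and arrival times (in picoseconds) are hypothetical.

```python
def local_skew(arrivals, launch, capture):
    """Skew between two registers: capture clock arrival minus launch clock arrival."""
    return arrivals[capture] - arrivals[launch]

def global_skew(arrivals):
    """Largest difference in clock arrival time between any two registers."""
    return max(arrivals.values()) - min(arrivals.values())

# Hypothetical clock arrival times at register clock pins, in picoseconds.
ARRIVALS_PS = {"ff_a": 520, "ff_b": 470, "ff_c": 550}

if __name__ == "__main__":
    print(local_skew(ARRIVALS_PS, "ff_a", "ff_b"))  # -50
    print(global_skew(ARRIVALS_PS))                 # 80
```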
The goals of clock tree synthesis are to minimize skew, power (due to high switching activity), noise in
clock signals, slew etc.
The clock is a strong aggressor due to its high switching activity. Clock nets are shielded and NDRs
(non-default rules) are applied to retain signal integrity.
Clock nets can be shielded by either power or ground nets. Generally,
metal DRC has a 1w1s (single width, single spacing) rule. However, clock signals can cause signal
integrity issues with the default rules. Non-default rules are defined to avoid this. Examples of
NDRs are the 1w2s, 2w1s and 3w2s rules.
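The rule names above encode multiples of the default width and spacing. The sketch below parses such a name and computes the routing pitch one wire consumes, to show why NDR routing costs more routing resources than 1w1s. The default width and spacing values are hypothetical and are not taken from any design rule manual.

```python
import re

def parse_ndr(rule):
    """Split a rule name such as '2w1s' into (width multiple, spacing multiple)."""
    match = re.fullmatch(r"(\d+)w(\d+)s", rule)
    if match is None:
        raise ValueError(f"not an NwMs rule name: {rule}")
    return int(match.group(1)), int(match.group(2))

def routing_pitch(rule, default_width, default_spacing):
    """Pitch used by one wire: its width plus the spacing to the next wire."""
    w_mult, s_mult = parse_ndr(rule)
    return w_mult * default_width + s_mult * default_spacing

# Hypothetical default rule values in nanometres.
DEFAULT_WIDTH_NM = 32
DEFAULT_SPACING_NM = 32

if __name__ == "__main__":
    for rule in ("1w1s", "1w2s", "2w1s", "3w2s"):
        print(rule, routing_pitch(rule, DEFAULT_WIDTH_NM, DEFAULT_SPACING_NM))
```

With these assumed values a 3w2s clock wire uses 160 nm of pitch against 64 nm for a default 1w1s wire, which is why NDRs are usually applied only to sensitive nets such as clocks.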
After the clock tree is built, both setup- and hold-based optimization is done to improve timing.
For trunk nets, higher metal layers are used, while for leaf connections lower metal layers can be
sufficient.
3.2.5 Routing
After optimizing CTS, routing is performed. All the signal nets are routed with the goal of minimizing DRC and
timing violations. After routing, filler cells are added to the netlist. The output DEF and netlist are generated.
SPEF is generated with QRC extraction.
3.2.6 Outputs
After routing is finished, metal filling is done to prepare the GDSII output file. In this process, metal polygons
are added where metal routing density is low and there are large empty spaces. This increases yield and
reduces the probability of physical damage. Bumps are added at the top-level block for connectivity.
Chapter 4
STA
4.1 Introduction
Static timing analysis is a technique to validate the timing performance of a given design. It splits the input
design into timing paths and calculates the signal propagation delay along each path, except for
defined exceptions. Path elements are defined as: start point, combinational network and end point.
Paths are grouped by the type of start point and end point encountered. The most common path groups are:
in2reg, reg2out, reg2reg, in2out.
Figure 11 Pathgroups
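The four path groups follow directly from the kind of start point and end point of a path. A minimal sketch of that classification is given below; the string labels for the start and end point kinds are hypothetical.

```python
# Start points are input ports or registers; end points are registers or output ports.
START_NAMES = {"input_port": "in", "register": "reg"}
END_NAMES = {"register": "reg", "output_port": "out"}

def path_group(start_kind, end_kind):
    """Return the path group name for a timing path."""
    if start_kind not in START_NAMES:
        raise ValueError(f"invalid start point kind: {start_kind}")
    if end_kind not in END_NAMES:
        raise ValueError(f"invalid end point kind: {end_kind}")
    return f"{START_NAMES[start_kind]}2{END_NAMES[end_kind]}"

if __name__ == "__main__":
    print(path_group("input_port", "register"))   # in2reg
    print(path_group("register", "output_port"))  # reg2out
```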
Chapter 5
IR analysis
5.1 Introduction
A good power delivery network is highly important in IC design. Every cell should get the proper
operating voltage. If the operating voltage itself is low, signal strength is lower and the
circuit might go into a metastable state.
To prevent a weak power delivery network, IR analysis is done. IR analysis reveals the voltage drop
between the power source and the sink (standard cell/macro power pin). Due to parasitic effects, metal
layers have a certain amount of parasitic capacitance and resistance. This causes a voltage drop in the rails,
which results in a lower operating voltage at the power pins of the cell/macro.
In the figure below, V2 = V1 - I·R
IR drop and ground bounce are defined as the variations in voltage caused by current flowing through
rails having distributed parasitics.
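The relation V2 = V1 - I·R can be extended to a rail with distributed resistance. The sketch below assumes a simple one-dimensional rail fed from one end, with equal segment resistance between consecutive cell taps. Each segment carries the current of every cell downstream of it, so cells far from the source see the largest drop. All values are hypothetical, and the model ignores capacitance and the ground network.

```python
def rail_voltages(v_source, seg_resistance, tap_currents):
    """Voltage seen at each cell tap along a rail fed from one end.

    v_source:       supply voltage at the start of the rail (V)
    seg_resistance: resistance of each rail segment between taps (ohm)
    tap_currents:   current drawn by the cell at each tap, nearest first (A)
    """
    voltages = []
    v = v_source
    remaining = sum(tap_currents)    # current flowing through the first segment
    for i_tap in tap_currents:
        v -= remaining * seg_resistance   # V2 = V1 - I*R for this segment
        voltages.append(v)
        remaining -= i_tap                # this cell's current leaves the rail
    return voltages

if __name__ == "__main__":
    # Hypothetical 0.8 V rail, 0.5 ohm per segment, three cells drawing 10 mA each.
    print(rail_voltages(0.8, 0.5, [0.01, 0.01, 0.01]))
```

With a single tap the function reduces to V2 = V1 - I·R.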
Ground Bounce:
Ground bounce is an increase in voltage at VSS/GND, which should ideally always be zero. It occurs on
ground networks (VSS or GND) in the IC. Due to the combination of a small amount of current in the ground
network and parasitic effects in the network, a rise in voltage is observed in the ground voltages
around the chip.
5.2 Static IR
Static IR drop is the average voltage drop that occurs in the design. It depends on the parasitic RC value
of the power grid connecting the power supply to the respective standard cells. The average current
depends entirely on the time period, unlike the instantaneous current. Static IR drop highlights power grid
weaknesses of the design.
5.3 Dynamic IR
Dynamic IR drop is a voltage drop due to the high switching activity of transistors. It occurs when there
is an increased demand for current from the power supply due to high switching activity in the chip.
Dynamic IR drop depends on the switching time of the logic and is less dependent on the clock period, since
the clock period defines the average current.
Dynamic IR drop takes into account the instantaneous current drawn from the power grid during a
switching event. The average current depends entirely on the time period, as it is the current
averaged over a cycle, while the dynamic IR drop depends on the instantaneous current, which is higher
while the cell is switching. Dynamic IR drop evaluates the IR drop caused when large amounts of
circuitry switch simultaneously, causing peak current demand.
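The contrast between the average current used for static IR drop and the peak current used for dynamic IR drop can be seen with a sampled current waveform. The sketch below is a simplification that treats the grid as a single resistance and ignores decoupling capacitance and inductance. The waveform and resistance values are hypothetical.

```python
def static_and_dynamic_drop(current_samples, resistance):
    """Return (static drop, dynamic drop) in volts for one clock period.

    current_samples: supply current sampled uniformly over the period (A)
    resistance:      effective grid resistance to the cell (ohm)
    """
    average_current = sum(current_samples) / len(current_samples)
    peak_current = max(current_samples)
    return average_current * resistance, peak_current * resistance

# Hypothetical waveform: a short switching burst inside an otherwise idle cycle.
ONE_CYCLE = [0.0, 0.0, 0.08, 0.02, 0.0, 0.0, 0.0, 0.0]

if __name__ == "__main__":
    print(static_and_dynamic_drop(ONE_CYCLE, 1.0))
    # Doubling the clock period (same burst, more idle time) halves the
    # average current but leaves the peak current unchanged.
    print(static_and_dynamic_drop(ONE_CYCLE + [0.0] * 8, 1.0))
```

This matches the statement above: the static value scales with the clock period, while the dynamic value is set by the switching event itself.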
Power is divided into internal power, leakage power and switching power. The three are
derived as given below.
5.4.1 Internal Power
Internal power is calculated from the .lib files. It needs load capacitance, slew, frequency and toggle rate.
We get load capacitance from SPEFs, and slew and frequency from timing files. Slew rate is defined as the
maximum rate of output voltage change per unit time.
5.4.2 Leakage power
Leakage power is also defined in the .lib. For a given cell, leakage power is defined for each input
combination with respect to the power and ground sources.
5.4.3 Switching power
Switching Power is due to the charging and discharging of total load, which includes the output
capacitors and other parasitic capacitors.
$P_{switch} = \alpha \cdot V_{dd}^{2} \cdot C_{L} \cdot f$
Figure 5.7 Switching power
α = Activity Factor
Vdd = supply voltage
CL = total load capacitance
f = frequency of operation
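As a worked example of the switching power relation, the sketch below evaluates the usual form α · Vdd² · CL · f using the symbols defined above. The numeric values are hypothetical and only illustrate the quadratic dependence on supply voltage.

```python
def switching_power(alpha, vdd, c_load, freq):
    """Switching power in watts: alpha * Vdd^2 * CL * f.

    alpha:  activity factor (fraction of cycles in which the node toggles)
    vdd:    supply voltage (V)
    c_load: total load capacitance (F)
    freq:   frequency of operation (Hz)
    """
    return alpha * vdd ** 2 * c_load * freq

if __name__ == "__main__":
    # Hypothetical block: 10% activity, 0.8 V supply, 1 nF total load, 1 GHz.
    p_nominal = switching_power(0.1, 0.8, 1e-9, 1e9)
    p_half_vdd = switching_power(0.1, 0.4, 1e-9, 1e9)
    print(p_nominal, p_half_vdd, p_nominal / p_half_vdd)
```

Halving the supply voltage reduces switching power by a factor of four, which is why voltage scaling is such an effective low power technique.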
Chapter 6
Results
6.1 Synthesis
In synthesis, design sanity checks should be clean. The calculated power and area are reported. Timing is
checked and violating timing paths are analysed.
In this report, design-related issues are described. There shouldn’t be any unresolved references or
empty modules in the design. There shouldn’t be any “assign” statements in the netlist. Assign statements are
not suitable for a gate-level netlist, and this kind of syntax should be converted into tie logic. Library sets
should be defined properly. Every .lib should have a corresponding LEF; a missing LEF can cause design failure.
Generally, no physical cells are inserted during synthesis.
There shouldn’t be any illegal timing issues in the final netlist. Clocks should be defined properly with the
correct port names. There shouldn’t be any unconnected or logic-driven nets, except generated clocks which
are defined in the SDC. Sequential data pins shouldn’t be driven by a clock signal. Every register should be
defined with a unique clock per mode for timing. There shouldn’t be any ambiguity about it.
QoR report
This report shows statistics for the various types of cells. It gives information about the timing summary,
area and power. Here, the change in the area report is shown, before and after applying the don’t use cell
constraint.
Figure 6.3 Area report before implementing don’t use cell constraint
Figure 6.4 Area report after implementing don’t use cell constraint
After applying the don’t use constraint, cell area increased by 16.73%, which is significant.
6.2 PnR
For each stage in PnR, timing violations are observed. Placement, connectivity and physical cell related
issues are also resolved. The differences in PnR results for different area utilizations are listed here.
As per the given data, overall timing violations increase as area utilization increases. However, both
cases are manageable for design closure.
6.3 IR analysis
IR analysis in 70% VS 75% utilization
IR analysis     70% utilization     75% utilization
Static IR       1.38%               1.33%
Dynamic IR      5.9%                9.45%
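To put the two utilization points in perspective, the relative change of each IR figure can be computed. The sketch below assumes that the first value in each row belongs to 70% utilization and the second to 75%, following the order in the table title.

```python
def relative_change_percent(before, after):
    """Relative change from 'before' to 'after', in percent."""
    return (after - before) / before * 100.0

# IR drop values in percent. Column order (70%, then 75%) is an assumption.
STATIC_IR = (1.38, 1.33)
DYNAMIC_IR = (5.9, 9.45)

if __name__ == "__main__":
    print(round(relative_change_percent(*STATIC_IR), 2))   # -3.62
    print(round(relative_change_percent(*DYNAMIC_IR), 2))  # 60.17
```

Under that assumption, static IR drop is almost unchanged between the two runs, while dynamic IR drop grows by roughly 60% in relative terms.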
Conclusion
Here, results for different area utilizations of the same netlist are compared. For each stage, the
difference in results is noted. These results show that 75% area utilization is manageable to close
with good performance parameters. However, lower area utilization would be easier to close.
This also depends on logical design complexity.
When PnR with 80% initial area utilization is tested, results degrade significantly. Machine
run time per stage increases and routing is not even possible.