

**ICCAD 2017 Tutorial** 

# Standard Cell Design and Optimization Methodology for ASAP7 PDK

Xiaoqing Xu, Nishi Shah, Andrew Evans, Saurabh Sinha, Brian Cline, and Greg Yeric (Arm Inc.)

xiaoqing.xu@arm.com

10/15/2017

### **Outline**

**ASAP7 PDK** 

Standard Cell Library Design and Optimization

Design Synthesis and Exploration

How to Download and Use

Summary

### **ASAP7 PDK**

Predictive 7nm Process Design Kit – Arm and ASU: <a href="http://asap.asu.edu/asap/">http://asap.asu.edu/asap/</a>

FinFET with discrete transistor sizing

Transistor geometries

- 20/54nm gate length/pitch, 27nm fin pitch

Key design rules

- 18/36nm metal-1 width/pitch (two-dimensional layout with EUV)
- Metal minimum tip-to-tip: 31nm, metal minimum tip-to-side: 25nm
- Minimum horizontal distance between diff-net active areas: 92nm



- Diffusion break
- Horizontal metal routing
- Vertical metal routing
- Gate contact
- Gate cut usage


# Standard Cell Library Design and Optimization



### **Standard Cell Architecture**

#### 9-Track and 7.5-Track



| SC architecture                        | 9-track | 7.5-track |
|----------------------------------------|---------|-----------|
| Total # of fins                        | 12      | 10        |
| # of fins per transistor               | 4       | 3         |
| # of metal-1 tracks for signal routing | 8       | 5.5       |
| # of metal-2 tracks for signal routing | 8       | 6         |
| metal-2 and metal-1 track offset (nm)  | 0       | 9         |
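The fin counts and track counts in the table can be cross-checked against the pitches quoted earlier. The sketch below assumes that the cell height equals the total fin count times the 27nm fin pitch, and that one track corresponds to the 36nm metal-1 pitch; both are assumptions made for illustration, and the dictionary and function names are hypothetical.

```python
# Pitches quoted in the ASAP7 PDK section of this tutorial.
FIN_PITCH_NM = 27
TRACK_PITCH_NM = 36  # assumption: one track equals the 36nm metal-1 pitch

# Values copied from the standard cell architecture table.
ARCHITECTURES = {
    "9-track": {"tracks": 9.0, "total_fins": 12},
    "7.5-track": {"tracks": 7.5, "total_fins": 10},
}


def height_from_fins(arch: str) -> int:
    """Cell height in nm, assuming the fins span the full cell height."""
    return ARCHITECTURES[arch]["total_fins"] * FIN_PITCH_NM


def height_from_tracks(arch: str) -> float:
    """Cell height in nm, assuming track count times track pitch."""
    return ARCHITECTURES[arch]["tracks"] * TRACK_PITCH_NM


if __name__ == "__main__":
    for name in ARCHITECTURES:
        print(name, height_from_fins(name), height_from_tracks(name))
```

Under these assumptions the two estimates agree for each architecture, which suggests the fin grid and the track grid are commensurate.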

### **Exhaustive Transistor Sizing**

NAND2\_X1N under 7.5-track architecture

- Exhaustive timing simulations to choose the balanced rising and falling slew/delay
- NAND2\_X1R and NAND2\_X1F, rising/falling dominated cells
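Because fin counts are discrete, the sizing search space is small enough to enumerate. The following is a minimal sketch of that idea, not the flow used for the library: a toy RC delay model stands in for the timing simulations, and every constant in it (`r_p`, `r_n`, `c_load`, `c_fin`) is hypothetical. Only the limit of 3 fins per transistor comes from the 7.5-track architecture table.

```python
from itertools import product

MAX_FINS = 3  # fins per transistor in the 7.5-track architecture


def toy_nand2_delays(p_fins, n_fins, r_p=2.0, r_n=1.0, c_load=4.0, c_fin=0.5):
    """Hypothetical RC stand-in for a timing simulation (arbitrary units)."""
    c_total = c_load + c_fin * (p_fins + n_fins)  # load plus self-loading
    rise = (r_p / p_fins) * c_total               # a single PMOS pulls up
    fall = (2 * r_n / n_fins) * c_total           # two series NMOS pull down
    return rise, fall


def exhaustive_balanced_sizing():
    """Try every (PMOS fins, NMOS fins) pair and keep the most balanced one.

    Ties on rise/fall mismatch are broken by the smaller total delay.
    """
    candidates = []
    for p_fins, n_fins in product(range(1, MAX_FINS + 1), repeat=2):
        rise, fall = toy_nand2_delays(p_fins, n_fins)
        candidates.append(((p_fins, n_fins), rise, fall))
    return min(candidates, key=lambda c: (abs(c[1] - c[2]), c[1] + c[2]))


if __name__ == "__main__":
    sizing, rise, fall = exhaustive_balanced_sizing()
    print(f"balanced sizing (P fins, N fins) = {sizing}")
```

Changing the selection key to minimize only `rise` or only `fall` would pick rising- or falling-favoured variants in the same spirit as the X1R and X1F cells, again under the toy model.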









### **Transistor Placement**



AOI31\_X2N: consistent Euler path for pull-up and pull-down logic



[Uehara+, DAC'1979] [Maziasz+, DAC'1987]
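A consistent Euler path can be read as one gate ordering that walks every transistor exactly once in both the pull-up and the pull-down multigraph, with consecutive transistors sharing a diffusion node. The sketch below checks that property for an assumed AOI31 netlist, Y = NOT(A1·A2·A3 + B); the pin and node names are hypothetical and are not taken from the library.

```python
def is_euler_trail(edges, order):
    """Check that `order` visits every edge once and forms a connected trail.

    `edges` maps a gate name to the pair of diffusion nodes it connects.
    """
    if sorted(order) != sorted(edges):
        return False  # every transistor must appear exactly once
    # After the first transistor we may be standing on either of its nodes.
    ends = set(edges[order[0]])
    for gate in order[1:]:
        a, b = edges[gate]
        nxt = set()
        if a in ends:
            nxt.add(b)
        if b in ends:
            nxt.add(a)
        if not nxt:
            return False  # no shared diffusion: a break would be needed
        ends = nxt
    return True


# Assumed AOI31 pull-down: A1, A2, A3 in series, in parallel with B.
NMOS = {"A1": ("Y", "n1"), "A2": ("n1", "n2"), "A3": ("n2", "GND"), "B": ("Y", "GND")}
# Assumed AOI31 pull-up: A1, A2, A3 in parallel, in series with B.
PMOS = {"A1": ("VDD", "p"), "A2": ("VDD", "p"), "A3": ("VDD", "p"), "B": ("p", "Y")}


def is_consistent(order):
    """True if the same gate ordering is an Euler trail in both graphs."""
    return is_euler_trail(NMOS, order) and is_euler_trail(PMOS, order)


if __name__ == "__main__":
    print(is_consistent(["A1", "A2", "A3", "B"]))
    print(is_consistent(["A1", "B", "A2", "A3"]))
```

For this assumed netlist the ordering (A1, A2, A3, B) is accepted, while (A1, B, A2, A3) is rejected because the pull-down trail breaks between B and A2.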

MXT2\_X1N: pass-gate-based multiplexer

Multigraphs for PMOS/NMOS are no longer dual





(S0, A, S0, ns0, B, ny)



(S0, A, ns0, S0, B, ny)

Area-compact placement leads to routability issues: pin A is blocked









A different area-compact placement solution: pin A is accessible









## **Cell Layout Comparisons**

### LATNQ\_X1N









- 7.5-track: 13 poly-pitch wide, normalized area: 97.5, single gate diffusion
- 9-track: 11 poly-pitch wide, normalized area: 99, gate cut usage
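The normalized areas quoted for LATNQ\_X1N are consistent with multiplying the cell width in poly pitches by the track height. The sketch below assumes that definition; the function name is hypothetical.

```python
def normalized_area(width_poly_pitches: int, track_height: float) -> float:
    """Normalized cell area, assumed to be width (poly pitches) x track height."""
    return width_poly_pitches * track_height


if __name__ == "__main__":
    area_7p5 = normalized_area(13, 7.5)  # 7.5-track LATNQ_X1N
    area_9 = normalized_area(11, 9.0)    # 9-track LATNQ_X1N
    print(area_7p5, area_9)
```

Under this definition the wider 7.5-track cell still comes out slightly smaller than the narrower 9-track cell.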

### **FO4 Comparisons**

Fan-out-4 (FO4) for basic logic cells

9-track cells achieve smaller delay at the cost of higher power and area





# Design Synthesis and Exploration



### **Design Synthesis Flow**

Arm<sup>®</sup> Cortex<sup>®</sup>-M0 processor from Arm DesignStart<sup>™</sup> portal

7.5-track/9-track minimum/alpha SC library

Cadence® Genus™ Synthesis Solution, v15.12 & Innovus™ Implementation System, v15.10



Cadence Reference Flow: up to the post-routing stage

Evaluation metrics:

- Frequency, Power, Leakage, WNS
- TNS, Utilization, gate count and area
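Of these metrics, WNS and TNS are both derived from per-endpoint timing slacks. The sketch below uses a common convention, assumed here, in which WNS is the most negative endpoint slack (clipped at zero when timing is met) and TNS is the sum of all negative slacks. The slack values are made up for illustration and are not results from the Cortex-M0 runs.

```python
def worst_negative_slack(slacks_ps):
    """Most negative endpoint slack in ps, or 0.0 if every endpoint meets timing."""
    return min(0.0, min(slacks_ps))


def total_negative_slack(slacks_ps):
    """Sum of all negative endpoint slacks in ps (0.0 if none are negative)."""
    return sum(s for s in slacks_ps if s < 0.0)


if __name__ == "__main__":
    # Hypothetical endpoint slacks in picoseconds.
    slacks = [-12.0, 3.5, -4.0, 0.0]
    print("WNS (ps):", worst_negative_slack(slacks))
    print("TNS (ps):", total_negative_slack(slacks))
```

A design that meets timing at every endpoint reports zero for both metrics under this convention.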

### **Explore Standard Cell Architecture**

Total negative slack (TNS) and worst negative slack (WNS): 9-track lib pushes the frequency





### **Explore Library Richness: 9-track Libraries**

Total negative slack (TNS) and worst negative slack (WNS): alpha lib pushes the frequency





# **How to Download**



### **Arm DesignStart Portal**

#### Arm DesignStart – University Program

https://developer.arm.com/products/designstart/university-program



Coming soon!

# Suggested Research Topics with the ASAP7 Standard Cell Library



## **Sizing with One-Fin Transistor**



Current libraries are designed with a minimum of 2 fins per transistor

One-fin transistors raise variation concerns but benefit cell timing/power



Resize the SDFFQ\_X1N with one-fin transistors

- Setup time: 11.8ps → 9.6ps (18.6%), Clock-to-Q delay: 42.1ps → 40.0ps (5%)
- Energy delay product (EDP): 8.15 → 7.13 (×10<sup>-17</sup> J·s) (12.5%)
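The quoted percentages follow from the relative reduction of each metric, (before − after) / before. A short check, with a hypothetical helper name and the values copied from the bullets above:

```python
def improvement_pct(before: float, after: float) -> float:
    """Relative reduction in percent, rounded to one decimal place."""
    return round((before - after) / before * 100.0, 1)


if __name__ == "__main__":
    print("Setup time:", improvement_pct(11.8, 9.6), "%")
    print("Clock-to-Q:", improvement_pct(42.1, 40.0), "%")
    print("EDP:", improvement_pct(8.15, 7.13), "%")
```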

## Research Topics for Standard Cell Design Methodology

### Transistor sizing

- How to avoid brute-force efforts for transistor sizing?
- What is the library-level advantage of enabling one-fin transistor?

### Squeeze the track height

- How far can you reduce the track height?
- 5-track cells: IMEC at IEDM 2016

### Multi-row height cells – design and design automation

- How to place and route transistors across multiple rows?
- What set of cells (not just flops) should be designed across multiple rows?

### **Broader Research Topics**

### **Automatic Cell Synthesis**

- the multigraph is not always Eulerian
- the "best" transistor placement is not always routable
- the "best" solution could be technology/architecture-dependent
- Automatic cell synthesis to beat our "alpha" quality in terms of PPA?

Technology-independent stick diagram generation

- placement and routing are co-optimized under lexical cost formulation
- generate more-than-one solution to break technology/architecture dependence

Design-technology co-optimization, reliability, hardware security and accelerator designs

### A Successful and Published Example for Aging Research

Layout-dependent aging behaviors



Aging mitigation for critical-path timing

- Aging models with ASAP7 PDK (Peking University)
- Aging optimization with detailed placement (UTDA)
- Che-Lun Hsu et al., "Layout-Dependent Aging Mitigation for Critical-Path Timing," ASP-DAC 2018

[Ren+, IEDM'2015]

SA – Length between gate and edge of diffusion

ODS – Active to active spacing

SPM – Poly extension from active

| Layout parameter | Aging effect       |
|------------------|--------------------|
| SA ↓             | NBTI, HCI & PBTI ↑ |
| ODS ↓            | NBTI, HCI & PBTI ↑ |
| SPM ↓            | NBTI ↓             |

### **Summary**

Standard cell library design and optimization methodology

- Transistor sizing, placement and routing
- Front-end and back-end views built, tested and freely available for academic use

| Vt options         | Track heights        | PVT corners                                                                                                                    | Cell views                                                                         |
|--------------------|----------------------|--------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------|
| RVT<br>LVT<br>SLVT | 7.5-track<br>9-track | ff_typical_max_0p77v_25c ff_typical_max_0p77v_m40c ss_typical_max_0p63v_125c ss_typical_max_0p63v_25c tt_typical_max_0p70v_25c | cdl, db, db-ccs-tn<br>gds2, gds2-ascii, LEF,<br>lib, lib-ccs-tn, spice,<br>verilog |

**Design Synthesis and Exploration** 

Library architecture and richness explorations

How to Download and Use

Arm DesignStart portal – university program

| Freq.<br>(GHz) | SC arch.  | TNS<br>(ps) | Power<br>(mW) | Gate area<br>(um²) |
|----------------|-----------|-------------|---------------|--------------------|
| 1.0            | 7.5-track | -893        | 2.26          | 1537.9             |
|                | 9-track   | 0           | 2.21          | 1646.9             |
| 0.7            | 7.5-track | 0           | 1.29          | 1306.9             |
|                | 9-track   | 0           | 1.41          | 1463.5             |

Multiple research topics of interest and a successful, published research study

Thank You! Danke! Merci! 谢谢! ありがとう! **Gracias!** Kiitos! 감사합니다 धन्यवाद

