

# Electronic Design Automation (EDA) Tools for Superconducting Electronics (SCE) ARM Summit

Jamil Kawa, Synopsys Fellow 9/17/2018



Taking SCE from "Hand Crafted" Circuits to the VLSI Era

A Comprehensive EDA Flow

Main EDA Tools Considerations & Challenges

Fast-forwarding Time to Results: DTCO

Final Thoughts

Acknowledgements



### **Motivation of SCE's VLSI Target**

**CPU Clock Speed & Power Consumption** 

- CPU clock speeds have stagnated at 3 GHz and below since 2001
- Energy for world computing is projected to exceed production capacity by 2040
- Many CMOS innovations (parallelism, low-power techniques, etc.) have moved us forward...
- How much more runway?



### Taking SCE from "Hand Crafted" Circuits to the VLSI Era: The IARPA Challenge

#### The IARPA Challenge:

- To provide SCE EDA software supporting 1M gates / 10M JJs
- To demonstrate a working processor, such as ARC, RISC 32/64, OpenSparc, or Leon3 (SparcV8), at 100 GHz

#### **CRITERIA**:

- Support one or more SCE logic families; enable scalability, quality of results (QoR), tool speed, flexibility to handle standards, and multiple clocks



### **Overview: VLSI Flow**



### A Simplified CMOS ASIC / SoC Design Flow: SCE vs. CMOS



- The SCE flow "rhymes" well with the CMOS flow
- Sub-flows will have SCE-specific variations dictated by unique features of SCE technology
  - JJ 1-D and 3-D modeling
  - The JJ is a two-terminal device with no voltage gain; fan-in = fan-out = 1
    - Need for splitters and confluence gates
  - Power delivery: high currents at very low voltages
    - Calls for special power delivery strategies
  - Interconnect is a JTL / PTL rather than a "wire"
- There is a lot we can learn from CMOS:
  - DTCO, DFM techniques (OPC, etc.)



### **SCE Simplified EDA Flow**




# **DTCO: Design Technology Co-Optimization**

DTCO is a famous acronym now, but you need a complete flow to realize its full benefits.



### **Layout vs Schematic**



DRC = Design Rule Check; LVS = Layout vs. Schematic



#### LVS Process:

- Extract a hierarchical netlist from the layout based on primitive device definitions
- Compare the extracted netlist to the design schematic

#### LVS applied to an SCE test case:

- LVS maps devices extracted from the layout to devices in the schematic
- ind1 parasitics in the schematic are filtered (shorted) before comparison
- Ports are defined at D, Q, and CK
- Shunt resistors in the layout are not extracted as devices; they are used to calculate the effective Vc across a JJ
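
The filter-then-compare step can be illustrated with a toy example. This is only a sketch under stated assumptions: the device tuples, net names, and the helper names `short_parasitics`, `signature`, and `lvs_match` are all hypothetical, and the fingerprint comparison is a necessary but not sufficient check, not the graph matching a production LVS tool performs.

```python
from collections import Counter

# Each device is (instance_name, kind, nets). All names below are made up
# for illustration; they do not come from any real PDK or LVS runset.

def short_parasitics(devices, parasitic_kind="ind1"):
    """Drop two-terminal parasitic devices and merge the nets they join."""
    parent = {}

    def find(net):
        parent.setdefault(net, net)
        while parent[net] != net:
            parent[net] = parent[parent[net]]  # path halving
            net = parent[net]
        return net

    kept = []
    for name, kind, nets in devices:
        if kind == parasitic_kind:
            a, b = nets
            parent[find(a)] = find(b)  # short the parasitic element
        else:
            kept.append((name, kind, nets))
    # Rename nets only after all merges are known.
    return [(n, k, tuple(find(x) for x in nets)) for n, k, nets in kept]


def signature(devices):
    """Name-independent fingerprint: device kind plus degrees of its nets."""
    degree = Counter(net for _, _, nets in devices for net in nets)
    return Counter(
        (kind, tuple(sorted(degree[net] for net in nets)))
        for _, kind, nets in devices
    )


def lvs_match(schematic, layout):
    """Necessary (not sufficient) equivalence check after filtering."""
    return signature(short_parasitics(schematic)) == signature(layout)


schematic = [
    ("L1", "ind", ("D", "a")),
    ("LP1", "ind1", ("a", "a2")),  # parasitic, shorted before comparison
    ("J1", "jj", ("a", "gnd")),
    ("J2", "jj", ("a2", "gnd")),
]
layout = [
    ("X1", "ind", ("D", "n1")),
    ("X2", "jj", ("n1", "0")),
    ("X3", "jj", ("n1", "0")),
]
print(lvs_match(schematic, layout))
```

Shorting the parasitic before comparison is what lets the two junctions in the schematic land on the same net as they do in the layout.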

# TCAD

- TCAD simulation
  - Predictive simulation of microscopic physics
  - Technology path finding
  - Detailed simulation of one (or few) devices
  - Acts as calibration tool for higher-level tools such as compact models
  - Generate data for model parameter extraction prior to fabrication
  - DTCO: select between technology options
- EDA tools for superconductivity are lacking
  - TCAD tools are virtually nonexistent
  - Need for industry-level superconductivity-enabled TCAD tools
- Study of viable theories of superconductivity
  - The London equations
  - Ginzburg-Landau equation
  - BCS theory
  - Bogoliubov-de Gennes equation
  - Gor'kov equation

#### **TARGET**

- Formulate the details of the physics-based model to simulate the steady-state current versus voltage/phase behavior of an ideal one-dimensional Josephson junction
- Expand the model to cover the 3D behavior of the Josephson junction
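
For orientation, the ideal lumped-junction behavior that such a model would be expected to reproduce in the appropriate limit is usually summarized by the two Josephson relations. The symbols here are standard textbook notation rather than notation from the slides: $I_c$ is the critical current, $\varphi$ the phase difference across the junction, and $\Phi_0$ the magnetic flux quantum.

$$
I = I_c \sin\varphi, \qquad
V = \frac{\Phi_0}{2\pi}\,\frac{d\varphi}{dt}, \qquad
\Phi_0 = \frac{h}{2e}
$$

In the steady state with $V = 0$, the phase is constant and the junction carries any supercurrent with $|I| \le I_c$.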

### **Post Layout Parasitic Extraction**

### **Cell level extraction flow:**

- EDA tool extracts inductance and capacitance using field solvers
- Highest accuracy, for cells and small designs



### **Front-end EDA Tools Integration**

- In addition to TCAD, we covered the following front-end tools:
  - Schematic capture
  - Layout
  - DRC
  - LVS
  - Extraction
  - Simulation

A well-designed EDA tool allows the designer to invoke and run all the tools needed for the front-end implementation of the design from the same GUI. This capability shortens the design debug cycle tremendously.

### **Design in an Integrated EDA Platform (Custom Compiler Example)**

- 256-bit shift register design and simulation in the Custom Compiler environment
  - Schematic, netlisting, DRC, LVS, HSPICE simulation launch, waveform display (with WaveView)



### Synthesis Challenge: Extend Traditional Synthesis Methods to SCE

#### **SCE Technology Implications for Synthesis**

- Gate input signals arrive at the same time/clock
  - Wave pipelining: insert clocked buffers
- Fan-out restriction
  - Insertion of splitter trees
- Favors new efficient logic primitives
  - Boolean extraction and native algebras
- Empower traditional multi-level synthesis algorithms with this information:
  - Area optimization aims at maximizing logic sharing
    - But this creates high fanout gates -> splitter cost
  - Depth (logic levels) minimization as main timing goal
    - Correlates with latency of computation in gate-clocked scenario
  - XOR/MAJ extraction and manipulation
    - XOR methods and MAJ methods in synthesis
  - Balancing levels through all paths
    - Minimize buffer insertion
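
The interaction between logic sharing, splitter cost, and level balancing can be made concrete on a small netlist. The following is a minimal sketch, assuming 1-to-2 splitters (so a fan-out of k costs k - 1 splitters), one buffer per skipped level on each edge with no sharing of buffer chains, and splitters that do not themselves add a level. The netlist and the function names `levelize` and `balancing_cost` are hypothetical.

```python
from collections import Counter

def levelize(fanins):
    """Logic level of every node; primary inputs are level 0."""
    level = {}

    def visit(node):
        if node not in level:
            preds = fanins.get(node, [])
            level[node] = 0 if not preds else 1 + max(visit(p) for p in preds)
        return level[node]

    for node in fanins:
        visit(node)
    return level


def balancing_cost(fanins):
    """Return (levels, buffers, splitters) for a gate-level DAG."""
    level = levelize(fanins)
    # An edge that skips levels needs one clocked buffer per skipped level.
    buffers = sum(
        level[gate] - level[pred] - 1
        for gate, preds in fanins.items()
        for pred in preds
    )
    # A node driving k sinks needs k - 1 one-to-two splitters.
    fanout = Counter(pred for preds in fanins.values() for pred in preds)
    splitters = sum(k - 1 for k in fanout.values())
    return level, buffers, splitters


# Hypothetical netlist: gate -> list of fan-in nodes (a, b, c are inputs).
netlist = {
    "g1": ["a", "b"],
    "g2": ["g1", "c"],
    "g3": ["g1", "g2"],
}
level, buffers, splitters = balancing_cost(netlist)
print(level["g3"], buffers, splitters)
```

Here sharing `g1` between `g2` and `g3` saves a gate but costs one splitter and one balancing buffer, which is the trade-off an SCE-aware cost function would need to weigh.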








### **Digital Platform Timing Signoff Flow**





# SCE Static Timing Analysis & Issues (SFQ Example)

- Delay dependence on cell parameters
  - Parameter and bias variations
- Best/worst-case corner design vs. parameter margins
- Clocked logic gates (setup/hold time)
  - Clocked register and combinatorial logic data path vs. clocked data path
- Data-dependent gate delays
- Interconnect delays and delay uncertainty
  - Jitter in JTL
  - Parameter variations in JTL and PTL driver/receiver
- Power estimation
  - Similar to CMOS



### **Timing Closure Challenges cont'd**

- Margin / variation handling
  - We can analyze variation with the POCV (Parametric On-Chip Variation) method.
  - Modeling with a single random variation may not be sufficient.
- A deep clock distribution network, coupled with large variation and a small clock cycle (compared to gate delay), can make timing closure difficult.
  - CRPR (clock reconvergence pessimism removal) can reduce, but not eliminate, the problem.
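
To show why a statistical treatment of variation is less pessimistic than stacking worst-case corners, here is a simplified sketch. It assumes each stage delay is an independent Gaussian described by a (mean, sigma) pair, so sigmas add in root-sum-square fashion along a path. This illustrates the general idea only and is not a description of how any particular timing tool implements POCV; all numbers and function names are hypothetical.

```python
import math

def path_stats(stages):
    """Mean and sigma of a path built from independent (mean, sigma) stages."""
    mean = sum(m for m, _ in stages)
    sigma = math.sqrt(sum(s * s for _, s in stages))
    return mean, sigma


def statistical_slack(period_ps, setup_ps, stages, n_sigma=3.0):
    """Setup slack using the path-level n-sigma delay."""
    mean, sigma = path_stats(stages)
    return period_ps - setup_ps - (mean + n_sigma * sigma)


def corner_slack(period_ps, setup_ps, stages, n_sigma=3.0):
    """Setup slack when every stage sits at its own n-sigma worst case."""
    worst = sum(m + n_sigma * s for m, s in stages)
    return period_ps - setup_ps - worst


# Hypothetical two-stage path, delays in picoseconds.
stages = [(5.0, 0.3), (5.0, 0.4)]
print(statistical_slack(25.0, 10.0, stages))
print(corner_slack(25.0, 10.0, stages))
```

The statistical slack is larger because independent variations rarely all hit their worst case together. Correlated variation, such as a shared bias shift, would need an additional term that this sketch leaves out.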

### **Power Delivery & VLSI Realization for SCE**



# **Signal Routing for SFQ Circuits**

- Composition of interconnect
  - Two different types available
    - Josephson transmission lines (JTL)
    - Passive transmission lines (PTL)
- Optimal length and width
- Repeater/buffer insertion
  - Number and placement of buffers
  - Similar to repeater insertion in CMOS





# PTL vs. JTL Wiring

- Two types of interconnect in SFQ circuits
  - Josephson transmission lines (JTL)
    - Non-storage inductance between JJs
    - Delay depends on number of stages, JJ sizes, and bias
  - Passive transmission lines (PTL)
    - Stripline, driver, and receiver circuits
    - Delay depends on length of line and driver/receiver delay
- PTLs require driver/receiver overhead
  - Delay, area
  - CMOS-like routing
- JTLs need to be abutted



### Placement Considerations – on PTL & JTL

- If we use only PTL routing, the placer and legalizer flows should work as-is; this flow is compatible with CMOS.
- General JTL or abutted routing is not compatible with current CMOS flows, because the tools will move cells around with no concept of preserving space for the JTL routing.
- JTL or abutted cells can still be used, but the cell groups must be treated as macros by the placer and legalizer.
- A possible hybrid flow could introduce macros automatically after one pass of coarse placement, so that short-net JTL or abutment connections can be inferred and then locked down as macros for subsequent placer passes and for the legalizer.
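
One way to picture the macro-inference step of such a hybrid flow is as clustering: after a coarse placement pass, cells joined by sufficiently short two-pin nets are merged into groups that later passes treat as fixed macros. This is a speculative sketch, not a description of an existing tool feature. The Manhattan-distance threshold, the data layout, and the name `infer_macros` are all assumptions.

```python
def infer_macros(positions, nets, max_len):
    """Group cells linked by two-pin nets no longer than max_len."""
    parent = {cell: cell for cell in positions}

    def find(cell):
        while parent[cell] != cell:
            parent[cell] = parent[parent[cell]]  # path halving
            cell = parent[cell]
        return cell

    for a, b in nets:
        (xa, ya), (xb, yb) = positions[a], positions[b]
        if abs(xa - xb) + abs(ya - yb) <= max_len:
            parent[find(a)] = find(b)  # candidate for JTL or abutment

    groups = {}
    for cell in positions:
        groups.setdefault(find(cell), set()).add(cell)
    # Single cells stay ordinary standard cells; only groups become macros.
    return [group for group in groups.values() if len(group) > 1]


# Hypothetical coarse placement (arbitrary units) and two-pin nets.
positions = {"A": (0, 0), "B": (1, 0), "C": (2, 0), "D": (10, 10)}
nets = [("A", "B"), ("B", "C"), ("C", "D")]
macros = infer_macros(positions, nets, max_len=2)
print(sorted(sorted(group) for group in macros))
```

The long net from C to D is left to PTL routing, while A, B, and C would be locked together for the following placer and legalizer passes.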

### **Bias Network for RSFQ/ERSFQ**

- RSFQ is biased by a resistive tree
- ERSFQ is biased by an inductive tree with current-controlling JJs
  - Large inductors required for small current variations
  - Large feeding JTL (FJTL) provides an average voltage source
    - Connected to the clock line
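
A back-of-the-envelope calculation shows why resistive biasing becomes a concern at VLSI scale. The sketch assumes each bias branch is a resistor fed from a common DC rail, neglects the average voltage across the junction itself, and uses placeholder values for rail voltage, per-branch current, and branch count. None of these numbers comes from the slides or from a specific process.

```python
def bias_resistor_ohm(v_rail_v, i_branch_a):
    """Resistor that sets one branch current from the bias rail."""
    return v_rail_v / i_branch_a


def total_bias_current_a(i_branch_a, n_branches):
    """Current the supply must deliver when branches are fed in parallel."""
    return i_branch_a * n_branches


def static_power_w(v_rail_v, i_branch_a, n_branches):
    """Static dissipation in the resistive bias network (P = V * I)."""
    return v_rail_v * total_bias_current_a(i_branch_a, n_branches)


# Placeholder values: 2.6 mV rail, 100 uA per branch, 10 million branches.
V_RAIL = 2.6e-3
I_BRANCH = 100e-6
N_BRANCHES = 10_000_000

print(bias_resistor_ohm(V_RAIL, I_BRANCH))
print(total_bias_current_a(I_BRANCH, N_BRANCHES))
print(static_power_w(V_RAIL, I_BRANCH, N_BRANCHES))
```

With these placeholders, the supply has to deliver on the order of a thousand amperes at millivolt levels, and the bias resistors dissipate power continuously. Inductive biasing of the ERSFQ kind is aimed at removing that static dissipation.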



D. Kirichenko, S. Sarwana, and A. Kirichenko, "Zero Static Power Dissipation Biasing of RSFQ Circuits," *IEEE Transactions on Applied Superconductivity*, vol. 21, no. 3, pp. 776–779, January 2011.



### Standard Cell Libraries: ERSFQ and AQFP

Posters: Wed Aug 29, 13:15-16:30



# **ERSFQ D Flip-Flop**





### Example Characterization of Library Cell - XOR





| Parameter | Min %  | Max % |
|-----------|--------|-------|
| XI        | -44.94 | 39.75 |
| XJ        | -26.79 | 38.89 |
| XL        | -49.26 | 70.00 |

The individual elements' margins are > ±30%.

| Sr. No | Corners          | Pass Rate |
|--------|------------------|-----------|
| 1      | Nominal          | 100%      |
| 2      | Fast             | 100%      |
| 3      | FastFast         | 100%      |
| 4      | Slow             | 100%      |
| 5      | SlowSlow         | 100%      |
| 6      | Slow_dR_0        | 99%       |
| 7      | Slow_dR_1        | 98%       |
| 8      | Fast_dR_0        | 100%      |
| 9      | Fast_dR_1        | 98%       |
| 10     | XI_25_0 (XI=0.75) | 99%       |
| 11     | XI_25_1 (XI=1.25) | 100%      |

| Parameter | Description              | Time (ps, nominal) |
|-----------|--------------------------|--------------------|
| t_A_CK    | Minimum clock after data | 10                 |
| t_CK_A    | Minimum data after clock | 11                 |
| t_CK_Q    | Clock-to-output delay    | 7.55               |
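
The two minimum-separation numbers in the timing table can be turned into a legal arrival window for the data pulse within one clock period. This sketch assumes a particular reading of the table: the data pulse must arrive at least t_CK_A after the previous clock pulse and at least t_A_CK before the next one. That reading, and the helper names, are assumptions rather than statements from the characterization itself.

```python
T_A_CK_PS = 10.0  # minimum clock after data (table, nominal)
T_CK_A_PS = 11.0  # minimum data after clock (table, nominal)


def arrival_window(period_ps, t_a_ck=T_A_CK_PS, t_ck_a=T_CK_A_PS):
    """Legal data arrival times, measured from the previous clock pulse."""
    earliest = t_ck_a
    latest = period_ps - t_a_ck
    return (earliest, latest) if earliest <= latest else None


def is_legal(arrival_ps, period_ps):
    """True if a data pulse at arrival_ps fits the window for this period."""
    window = arrival_window(period_ps)
    return window is not None and window[0] <= arrival_ps <= window[1]


print(arrival_window(25.0))
print(is_legal(12.0, 25.0), is_legal(5.0, 25.0))
print(arrival_window(20.0))
```

Under this reading the window closes completely once the clock period drops below t_A_CK + t_CK_A = 21 ps, so every picosecond of path imbalance or jitter eats directly into a window that is only a few picoseconds wide at high clock rates.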

# **Summary and Conclusions**

- SCE is embarking on its own Moore's Law, aided by strong support from EDA tools
- The SCE flow "rhymes" well with the CMOS flow
  - Can re-use a significant amount of the CMOS EDA infrastructure
  - Sub-flows will have SCE-specific variations dictated by unique features of SCE technology
  - DTCO can seriously speed up SCE process maturity
- Arriving at true VLSI scale (even LSI) poses significant challenges
  - Power delivery: serial vs. parallel, delivering "huge" currents at very low voltage
  - Narrow "pulse arrival window" -> path balancing is a challenge
  - Lack of large-scale memory
  - Flux trapping
- Currently available processes are confined to < 10 layers of metal
  - Need to seriously address and evaluate the needs of VLSI
- SCE is a "new old" technology with vast potential
- Let's bring VLSI to SCE

### Acknowledgements

This work is partially supported by IARPA through a five-year program to create EDA tools for SCE in support of VLSI automation capable of 1M gates / 10M JJ designs.

We would also like to acknowledge our partners:

- HYPRES
- University of Rochester (Prof. Eby Friedman and team)
- YNU (Prof. Nobuyuki Yoshikawa and team)
- Stony Brook (Prof. Dimitri Averin and team)

We would also like to acknowledge our colleagues on the **COLDFLUX** team, who are focused on this area and are also supported by IARPA.






# Thank You
