Implementation of the first generation AsAP processor
Zhiyi Yu and Tinoosh Mohsenin
VCL Laboratory, UC Davis

Outline
- Overview of standard cell-based design
- Overview of AsAP
- Implementation of the first generation AsAP

Standard cell-based IC vs. custom design IC
- Standard cell-based IC: design using standard cells
  - Standard cells come from a library provider
  - Many different choices for cell size, delay, and leakage power
  - Many EDA tools automate this flow
  - Shorter design time
- Custom design IC: design everything yourself
  - Higher performance

Standard cell-based VLSI design flow
- Front end
  - System specification and architecture
  - HDL coding & behavioral simulation
  - Synthesis & gate-level simulation
- Back end
  - Placement and routing
  - DRC (design rule check), LVS (layout vs. schematic)
  - Dynamic simulation and static analysis

AsAP (Asynchronous Array of Simple Processors)
- A processing chip containing multiple uniform, simple processor elements
- Each processor has its own local clock generator
- Each processor communicates with its neighboring processors through dual-clock FIFOs
- More information: http://www.ece.ucdavis.edu/vcl/asap/

Diagram of a 3x3 AsAP
[Diagram. Blocks inside each processor: Inst Mem, Data Mem, ALU, MAC, Control, Clock, In-FIFO0, In-FIFO1, Output]

Simple diagram of the front-end design flow
- System specification -> RTL coding -> Synthesis -> Gate-level code
- Example: c = !a & b
- Gate-level code:
    INV u1 (.in (a), .out (a_inv));
    AND u2 (.in1 (a_inv), .in2 (b), .out (c));

Simple diagram of the back-end design flow
- Input: gate-level Verilog from synthesis
- Place & route
- Outputs: final layout (goes to fabrication), gate-level Verilog, timing information
- Checks:
  - DRC: design rule check
  - LVS: layout vs. schematic
  - Gate-level dynamic and/or static analysis

Back-end design of AsAP
- Technology: TSMC 0.18 µm CMOS
- Standard cell library: Artisan
- Tools
  - Synthesis: Synopsys Design Compiler
  - Placement & routing: Cadence Encounter
  - DRC & LVS: Calibre
  - Static timing analysis: PrimeTime

Flow of placement and routing
- Import needed files
- Floorplan
- Placement & in-place optimization
- Clock tree generation
- Routing

Import needed files
- Gate-level Verilog (.v)
    INV u1 (.in (a), .out (a_inv));
    AND u2 (.in1 (a_inv), .in2 (b), .out (c));
- Geometry information (.lef)
    INV: 1 µm width; AND: 2 µm width
- Timing information (.lib)
    INV: 1 ns delay; AND: 2 ns delay
- [Diagram: a -> INV -> AND -> c, with b as the second AND input]
  Delay (a -> c): 1 ns + 2 ns = 3 ns

Floorplan
- Size of the chip
- Location of pins
- Location of main blocks
- Power supply: give enough power to each gate
- [Diagram: a 1.8 V power supply drives current along the VDD metal rail past Gate 1, Gate 2, Gate 3, and Gate 4, returning through VSS. The voltage along the rail falls to 1.75 V, then 1.7 V (another power connection is needed), then 1.65 V]
- Voltage drop equation: V2 = V1 - I * R

Floorplan of a single processor
[Figure. Blocks: Inst Mem, Data Mem, ALU, MAC, Control, Clock, InFIFO 0, InFIFO 1]

Placement & in-place optimization
- Placement: place the gates
- In-place optimization
  - Why: timing differs between synthesis and layout because of wire delay
  - How: change gate sizes, insert buffers
  - Must not change the circuit function!

Placement of a single processor
[Figure]

Clock tree
- Main parameters: skew, delay, transition time

Clock tree of a single processor
[Figure]

Routing
- Connect the gates using wires
- Two steps
  - Connect the global signals (power)
  - Connect the other signals
- [Figure labels: Metal layer topology; Routing]

Layout of a single processor
- Area: 0.8 mm x 0.8 mm
- Estimated speed: 450 MHz

Layout of the first generation 6x6 AsAP
- Area: 30 mm^2 in 180 nm CMOS
- 36 processors
- 114 pads
- [Figure label: One processor]

Verification after layout
- DRC (design rule check)
- LVS (layout vs. schematic)
  - .GDS vs. (Verilog + SPICE module)
- Gate-level Verilog dynamic simulation
  - Mainly checks the function
  - Results may differ from the synthesis result

Useful tools
- Dynamic simulation: ModelSim (Mentor), NC-Verilog (Cadence), Active-HDL
- Synthesis: Design Compiler, Design Analyzer (Synopsys)
- Placement & routing
  - Encounter & icfb (Cadence)
  - Astro (Synopsys)
- DRC & LVS
  - Calibre (Mentor)
  - Dracula (Cadence)
- Static analysis: PrimeTime (Synopsys)
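
Worked example: path delay from per-cell timing

The "Import needed files" slide adds the INV and AND delays to get 3 ns from a to c. The sketch below repeats that arithmetic for a small netlist. It is only an illustration: the instance names (u1, u2), the dictionary layout, and the function name are assumptions made here, not part of any tool's format, and wire delay is ignored, which is exactly the difference between synthesis and layout timing that in-place optimization has to deal with.

```python
# Toy per-cell delays in ns, using the illustrative .lib numbers from the slide.
CELL_DELAY_NS = {"INV": 1.0, "AND": 2.0}

# Hypothetical netlist layout: instance name -> (cell type, input nets, output net).
NETLIST = {
    "u1": ("INV", ["a"], "a_inv"),
    "u2": ("AND", ["a_inv", "b"], "c"),
}


def arrival_times(netlist, cell_delay, primary_inputs):
    """Return the latest arrival time (ns) at every net, ignoring wire delay."""
    arrival = {net: 0.0 for net in primary_inputs}
    pending = dict(netlist)
    while pending:
        progressed = False
        for name, (cell, inputs, output) in list(pending.items()):
            # A gate can be evaluated once all of its input nets have a time.
            if all(net in arrival for net in inputs):
                latest_input = max(arrival[net] for net in inputs)
                arrival[output] = latest_input + cell_delay[cell]
                del pending[name]
                progressed = True
        if not progressed:
            raise ValueError("netlist has a combinational loop or an undriven net")
    return arrival


times = arrival_times(NETLIST, CELL_DELAY_NS, ["a", "b"])
print(times["c"])  # 3.0, matching Delay (a -> c) = 1 ns + 2 ns
```

A real static timing tool also accounts for wire delay, input transition time, and output load, so the numbers after layout generally differ from this cell-only sum.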
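
Worked example: voltage drop along a power rail

The "Floorplan" slide uses V2 = V1 - I * R to show why gates far from the supply may need another power connection. The sketch below applies that equation segment by segment along one rail fed from one end. All numbers (per-gate current, segment resistance, minimum voltage) and both function names are hypothetical values chosen for illustration; they are not taken from the AsAP design and do not reproduce the voltages drawn on the slide.

```python
def rail_voltages(v_supply, gate_currents_a, segment_resistance_ohm):
    """Voltage at each gate tap along a rail fed from one end.

    Assumes equal resistance for every rail segment between taps.
    """
    voltages = []
    v = v_supply
    remaining = sum(gate_currents_a)
    for current in gate_currents_a:
        # The segment feeding this tap carries the current of this gate
        # and of every gate further down the rail.
        v = v - remaining * segment_resistance_ohm  # V2 = V1 - I * R
        voltages.append(v)
        remaining -= current
    return voltages


def first_gate_below(voltages, v_min):
    """Index of the first gate whose supply is below v_min, or None."""
    for index, v in enumerate(voltages):
        if v < v_min:
            return index
    return None


# Hypothetical example: four gates drawing 10 mA each, 1 ohm per rail segment.
taps = rail_voltages(1.8, [0.01, 0.01, 0.01, 0.01], 1.0)
print([round(v, 3) for v in taps])
print(first_gate_below(taps, 1.72))
```

With these assumed numbers the third gate (index 2) is the first to fall below the 1.72 V limit, which is the kind of result that would prompt adding another power strap during floorplanning.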