# ECE 574 Advanced Digital System Design What is an ASIC? Why do we need it? Patrick Schaumont Electrical and Computer Engineering pschaumont@wpi.edu # A few cryptographic chips ## Why Is All That Hardware Needed? Figure 1. Energy efficiency of the Advanced Encryption Standard on five platforms. The energy efficiency, which spans seven orders of magnitude, is expressed as the number of gigabits the system can encrypt per joule (www.ee. For the same functionality, hardware is more energy efficient than software P. Schaumont and I. Verbauwhede, "Domain-specific codesign for embedded security," in Computer, vol. 36, no. 4, pp. 68-74, **April 2003**, doi: 10.1109/MC.2003.1193231. # **Need for Efficiency Became A Call to Arms!** **Table 1. Speedups from performance engineering a program that multiplies two 4096-by-4096 matrices.** Each version represents a successive refinement of the original Python code. "Running time" is the running time of the version. "GFLOPS" is the billions of 64-bit floating-point operations per second that the version executes. "Absolute speedup" is time relative to Python, and "relative speedup," which we show with an additional digit of precision, is time relative to the preceding line. "Fraction of peak" is GFLOPS relative to the computer's peak 835 GFLOPS. See Methods for more details. | Version | Implementation | Running time (s) | GFLOPS | Absolute speedup | Relative speedup | Fraction<br>of peak (%) | |---------|-----------------------------|------------------|---------|------------------|------------------|-------------------------| | 1 | Python | 25,552.48 | 0.005 | 1 | _ | 0.00 | | 2 | Java | 2,372.68 | 0.058 | 11 | 10.8 | 0.01 | | 3 | С | 542.67 | 0.253 | 47 | 4.4 | 0.03 | | 4 | Parallel loops | 69.80 | 1.969 | 366 | 7.8 | 0.24 | | 5 | Parallel divide and conquer | 3.80 | 36.180 | 6,727 | 18.4 | 4.33 | | 6 | plus vectorization | 1.10 | 124.914 | 23,224 | 3.5 | 14.96 | | 7 | plus AVX intrinsics | 0.41 | 337.812 | 62,806 | 2.7 | 40.45 | C. E. Leiserson, N. C. Thompson, J. S. Emer, B. C. Kuszmaul, B. W. LHampson, D. Sanchez, et al., "There's plenty of room at the Top: What will drive computer performance after Moore's law?" in Science, American Association for the Advancement of Science, vol. 368, no. 6495, **2020**. https://doi.org/10.1126/science.aam9744 # **Need for Efficiency Became A Call to Arms!** Original data up to the year 2010 collected and plotted by M. Horowitz, F. Labonte, O. Shacham, K. Olukotun, L. Hammond, and C. Batten New plot and data collected for 2010-2017 by K. Rupp - In the future, performance gains will come from better utilization of transistors - "Plenty of room at the top" by - algorithmic innovation - 2. better software (better mapping to hardware) - better hardware (domain specialization) https://www.karlrupp.net/2018/02/42-years-of-microprocessor-trend-data/ ### The Hardware Renaisscance Is Now - Agile Hardware Development - Small teams and quick iterations achieve better results, faster than large teams in waterfall development cycles - Open Source Hardware Development - Based on Reuse and Improvement - Tremendous Opportunity from Open Access - Affects every Aspect of Hardware Design - Open Intellectual Property - Open Design and Verification Tools - Open Technology Agile and Open-Source Hardware IEEE Micro July/August 2020 All right, let's look beyond software optimization and build better hardware! How do you design a chip, really? #### **Cross Section of a traditional 2D ASIC** [Teshima 2014] p-substrate [Weste & Harris 2016] Substrate Tap; n+ Source nMOS Transistor pMOS Transistor n-well Well Tap ### **Cross Section of an Inverter** # Diagram and GDS View of an Inverter **Standard Cell** ## **Standard cells** **AND (6 transistors)** ## **Standard Cells** Flip-Flop (28 transistors) ## **Standard Cells** **NAND** with drive 4 (16 transistors) **NAND** with drive 1 (4 transistors) # Standard Cells Laid Out Using A Floorplan ## **Standard Cell Placement** ### **Utilization** Utilization (%) Standard Cell Area Core Area Figure has 50% utilization Typical value is 70-80% Excessive utilization makes routing hard (and sometimes impossible) #### Interconnect - Filler cells - Routing congestion ## **Padcells** ## **ASIC** Design Flow # **Front-end Design** ## **Back-end Design**