.. ECE 574

.. attention::

   This document was last updated |today|

.. _06designflow:

Design Flow
===========

.. contents::

.. important::

   The purpose of this lecture is as follows.

   * To enumerate the major steps in a digital design flow to go from RTL code to a layout
   * To describe a sample implementation of this flow by the example of OpenLane
   * To define the practical realization of a digital design flow for 574
   * To illustrate the role of gate-level simulation in this design flow
   * To provide implementation guidance such as Makefile coding and coding guidelines
   * To describe the FuseSoC tool for design flow construction
   * To illustrate a combination of FuseSoC and the standard 574 flow.

.. attention::

   The following references are relevant background to this lecture.

   * Andrew Kahng et al., "VLSI Physical Design: From Graph Partitioning to Timing Closure," `Chapter 1 (Introduction) `_, Springer Publishers
   * M. Shalan, T. Edwards, "`Building OpenLANE: A 130nm OpenROAD-based Tapeout-Proven Flow `_," Proc. ICCAD 2020

.. attention::

   Examples for this lecture are available under https://github.com/wpi-ece574-f24/ex-flow

Generic IC Design Flow
----------------------

In this lecture, we take a step back and look at the big picture in
Digital IC Design. The modern IC design flow is a marvel of efficiency
and pragmatism. The number of hard optimization problems that are
addressed by automatic tools in IC design is truly remarkable. The
majority of design automation problems, such as placing standard
cells, routing a clock tree, or deciding on the proper power
distribution network, can only be solved heuristically. Yet, without
the design flow, there would be no complex chips, and no need for
Moore's law. For standard-cell-based design (our focus in this class),
different IC design flows share many common ideas and steps.
We will therefore start with a discussion of a generic design flow,
and afterwards refine that example into a concrete implementation
based on `OpenLANE `_, an open-source IC design flow.

We start with the generic flow. There are two major phases in a design
flow, generally called the *front-end* design flow and the *back-end*
design flow. The front-end converts an HDL implementation into a
netlist, i.e. a network of technology-specific logic gates. The
back-end converts the netlist into a structure of placed standard
cells, interconnected by wires. Complexity-wise, the back-end is more
intricate and involved than the front-end. This may seem surprising,
given that the main user input, the *RTL*, is provided at the
front-end. The reality, of course, is that the design space of the
back-end has many more dimensions and trade-offs than the design space
of the front-end. Besides the RTL, you'll find that design flows
require a large number of *constraints*, *technology libraries* and
*design scripts* (not shown in the figure) to guide the process of
converting RTL into a netlist and afterwards into a layout.

.. figure:: img/flow_l5_1.png
   :figwidth: 500px
   :align: center

The technology library is a crucial component to support the design
flow. At a high level, the technology library is a description of the
standard cells and low-level technology components required to
complete the chip layout. A technology library provides several
different *views* which describe different aspects of each standard
cell in that library.

- A *timing view* and a *power view* allow tools to evaluate the speed
  and power consumption of cells after their integration in a netlist.
- A *functional view* allows tools to simulate the functionality of
  the cells after their integration in a netlist.
- A *layout view* allows tools to know the physical outline of the
  cell, including the location where connections should be made.
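The view concept can be summarized in a small sketch that maps each
view to the file format that typically carries it. This is an
illustration only, not any tool's API; the format assignments
(Liberty for timing/power, Verilog for function, LEF for layout)
follow the library organization described in this section.

```python
# Each standard-cell view maps to a different file format.
# Illustrative only: design tools consume these views directly.
CELL_VIEWS = {
    "timing":   "Liberty (.lib)",  # delay and slew tables
    "power":    "Liberty (.lib)",  # switching and leakage power
    "function": "Verilog (.v)",    # simulation model
    "layout":   "LEF (.lef)",      # physical outline and pin locations
}

def format_for(view):
    """Return the file format that carries a given cell view."""
    return CELL_VIEWS[view]
```

A simulator would read the Verilog view, a placer the LEF view, and a
timing analyzer the Liberty view of the very same cell.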
As an example, let's take one cell from the SkyWater 130nm library, a
two-input NAND gate with drive strength 1. The technology library for
the SkyWater 130nm process is located at
``/opt/skywater/libraries/sky130_fd_sc_hd`` on the design server. The
timing and power characteristics of the two-input NAND gate are
captured in a *lib* file, for example
``timing/sky130_fd_sc_hd__tt_025C_1v80.lib``. The functional behavior
of the two-input NAND gate is captured in a Verilog file such as
``cells/nand2/sky130_fd_sc_hd__nand2_1.v``. The layout view of the
two-input NAND gate is captured in a LEF file such as
``cells/nand2/sky130_fd_sc_hd__nand2_1.lef``. We will discuss some of
these formats in further detail in future lectures. The main point,
however, is to see that a 'standard cell' is not an atomic entity.
Depending on what you want to achieve in the design flow (simulation,
timing evaluation, layout construction, ...), different file formats
come into play that each highlight specific aspects of the standard
cell. In that sense, a 'standard cell library' is very different from
a traditional software library.

Design Steps in the front-end
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. figure:: img/flow_l5_2.png
   :figwidth: 500px
   :align: center

The first task in front-end design is to verify that the design is
correct. This requires the design of a testbench and a simulation
tool. Typically, a testbench will contain one or more tests to
validate that the output of the design under test matches the expected
output. For in-depth debugging purposes, a *Value Change Dump* (VCD)
file can be produced that records every signal change over the course
of the simulation. However, generating VCDs is time- and
disk-space-hungry, so it is rarely done in an exhaustive manner.

A correct RTL design is next mapped to a technology netlist by a
*logic synthesis* tool.
The logic synthesis tool requires a technology library containing a
list of target cells, a set of synthesis constraints, and (not shown
in the figure) a synthesis script. Typical synthesis constraints
specify the desired clock period of the design, or the constraint to
apply specific synthesis techniques (e.g. encoding finite state
machine states with one-hot codes). Synthesis constraints play an
important role in the quality of the logic synthesis output.

The netlist (or *gate-level netlist*) can be simulated with the same
testbench as the RTL design, and of course one would expect an
equivalent output compared to the RTL simulation. There may be subtle
differences, however. For example, the reset behavior of a gate-level
netlist may not be identical to that of the RTL design. Also, a
gate-level simulation is able to express proper technology delays
(propagation delay through the gates, for example), and
timing-dependent effects such as glitches may become visible at the
gate level. At the RTL, every computation step is defined by a clock
cycle. At the gate level, a computation step can be as small as a
single gate transition.

Another result from the logic synthesis is a constraints file that
captures the delays of the netlist (SDF). These delays are based on
the actual cells used in the circuit, in combination with the fanout
and the specific wireload model adopted during synthesis. This SDF
file will enable accurate timing simulation of the gate-level
circuit. The SDF can also be used by static timing analysis (STA)
tools.

Because the timing properties of a netlist are different from the
(essentially untimed) RTL, the netlist can also be verified using
timing analysis -- which we will discuss in detail during a future
lecture. The most important outcome of the timing analysis is the
*slack*, which is the margin between the design's projected clock
period (clock frequency) and the actual delay through the logic.
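The slack computation itself is simple arithmetic; the hard part,
which the STA tool does for us, is finding the worst-case path delay.
A minimal sketch, assuming a single-cycle path (the function name,
the setup-time parameter, and the example numbers are illustrative):

```python
def slack(clock_period_ns, path_delay_ns, setup_time_ns=0.0):
    """Timing slack of a single-cycle path: the margin between the
    available time (one clock period) and the time the data actually
    needs (logic delay plus the flip-flop setup time)."""
    return clock_period_ns - (path_delay_ns + setup_time_ns)

# Positive slack: the path meets timing at this clock period.
# Negative slack: a timing violation that must be repaired.
```

For example, a 4 ns clock with a 3.2 ns logic path and a 0.2 ns setup
time leaves 0.6 ns of slack; a 4.5 ns path at the same clock would
violate timing.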
A positive slack means that the logic is fast enough to finish a
single-cycle computation within a clock period. A negative slack,
however, means that the design experiences a *timing violation* and
requires performance updates. Such updates can imply adjusting the
synthesis constraints, the synthesis script, or even improving the
RTL.

Design Steps in the back-end
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. figure:: img/flow_l5_3.png
   :figwidth: 500px
   :align: center

The back-end starts with a design netlist that meets the clock period
constraint, along with other synthesis constraints. A sequence of
tools will now convert the netlist into a physical structure. Each
tool handles a specific aspect in the definition of that structure.

* A *floorplanner* defines the major outline characteristics of the
  chip, such as the regions where standard cells can be placed, where
  hard macros (such as memory blocks) can be placed, and how
  input/output cells are implemented. Furthermore, the floorplanner
  defines the power grid infrastructure on the chip. For a chip with
  hundreds of thousands of cells, the power grid becomes a
  hierarchical network which must ensure that power is evenly
  distributed across the chip, such that each individual cell can
  operate at nominal capacity.
* Next, a *placement* tool will decide where to place the standard
  cells, making sure that cells with lots of local interconnections
  are placed together, and making sure that there is enough room
  between the cells to implement the interconnections.
* Next, the clock signal network is created by *clock tree synthesis*.
  The clock is the single most important global net of the chip (or
  block). The key challenge is to make sure that the clock signal
  arrives at each cell (flip-flop) at the same time, and this requires
  the definition of a hierarchical network called a clock tree.
* Next, the signal interconnections between individual gate and macro
  pins are implemented in the *routing* step.
  Routing is done in two phases: *global routing* decides roughly on
  the path taken by each wire, by allocating each wire to a track of
  grid cells. Next, *detailed routing* decides on the detailed
  implementation of individual wires, such as the metal layers and via
  interconnections used.
* The final step, *layout finishing*, makes sure that the layout meets
  all the *design rules* in place for the technology. Design rules
  specify the density and spacing of low-level chip elements such as
  metal wires and polysilicon regions.

The final output of the layout finishing phase is a GDS (graphic
design system) file, a structural specification of a chip design.
Another file that is produced as an outcome is a new delay estimation
file (SDF), in addition to a layout parasitics file (SPEF). The SDF
has the same meaning as the earlier SDF produced after synthesis,
except that the wireload model is now replaced by the actual
interconnect delay estimates. The SDF file is used for timing analysis
and post-layout gate-level simulation. The SPEF file reflects
capacitance and resistance values of the layout interconnections. SPEF
files are used for checks of signal integrity of the layout, as well
as low-level transistor simulation.

Clearly, there are a tremendous number of parameters that play a role
in moving a netlist into a layout. Multiple technology libraries
support the implementation of standard cells, I/O pad cells and
macros. Layout constraints craft the structure into a given desired
shape, which must meet the design rules for the selected technology.

OpenLane: An Open Source Design Flow
------------------------------------

.. attention::

   This course does not express a preference on which approach is
   better: open-source or (commercial) closed-source. Both approaches
   have advantages and disadvantages, and there are good reasons for
   either approach.

Perhaps remarkably, all of the above tools nowadays are available as
open-source components, up to and including the technology libraries.
This is a very recent evolution that is having a profound impact on
the hardware design process, certainly in an educational context. We
will discuss one example, the *OpenLane Flow*, which is the same
design flow used for the TinyTapeout projects.

`OpenLane `_ is an extension of another open-source project called
`OpenROAD `_. The latter was initiated as a DARPA project with the
objective of creating a tool chain that is able to complete an
RTL-to-layout flow within 24 hours. A full day may sound like a lot,
if you are used to compiling C or FPGA code. But, given the complexity
of this task, plus the fact that this task is entirely supported using
open-source technology, this is a truly remarkable achievement which
was near impossible just a few years ago. Furthermore, most smaller
designs (such as the *tiles* in TinyTapeout projects) take far less
time to implement.

The following table shows that for each major design step in a chip
design flow, there is now an open-source alternative.

+------------------+-----------------------+------------------------+
| Step             | Cadence               | OpenLane               |
+==================+=======================+========================+
| Logic Synthesis  | Genus                 | Yosys, ABC             |
+------------------+-----------------------+------------------------+
| ATPG             | Genus                 | Fault                  |
+------------------+-----------------------+------------------------+
| Placement        | Innovus               | RePlAce, OpenDP        |
+------------------+-----------------------+------------------------+
| Routing          | Innovus               | FastRoute, TritonRoute |
+------------------+-----------------------+------------------------+
| CTS              | Innovus               | TritonCTS              |
+------------------+-----------------------+------------------------+
| Timing Analysis  | Tempus                | OpenSTA                |
+------------------+-----------------------+------------------------+
| LVS, DRC         | Calibre (Siemens)     | Netgen, Magic          |
+------------------+-----------------------+------------------------+

The standard OpenLane flow is overall similar to
the generic flow presented earlier.

.. figure:: img/openlane2.png
   :figwidth: 300px
   :align: center

.. attention::

   The following instructions walk you through an example using
   OpenLane. You need a good Ubuntu box or WSL2 on your laptop to be
   able to run this.

Installing OpenLane2
^^^^^^^^^^^^^^^^^^^^

The following instructions describe an installation of OpenLane2 on
docker. If you don't have a docker installation on your laptop, follow
the instructions in the `OpenLane2 documentation `_

.. code::

   # Remove old installations
   sudo apt-get remove docker docker-engine docker.io containerd runc

   # Installation of requirements
   sudo apt-get update
   sudo apt-get install \
       ca-certificates \
       curl \
       gnupg \
       lsb-release

   # Add the keyrings of docker
   sudo mkdir -p /etc/apt/keyrings
   curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg

   # Add the package repository
   echo \
     "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
     $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

   # Update the package repository
   sudo apt-get update

   # Install Docker
   sudo apt-get install docker-ce docker-ce-cli containerd.io docker-compose-plugin

   # Check for installation
   sudo docker run hello-world

If all goes well, you will see a 'Hello from Docker!' message printed
by the last command. Many of the OpenLane commands require the user to
be a member of the docker group (with sudo privileges!). The following
commands make this happen.

.. code::

   sudo groupadd docker
   sudo usermod -aG docker $USER
   sudo reboot # REBOOT!

Once you have docker running, you can now download and install the
openlane container:

.. code::

   python3 -m pip install openlane

Followed by a simple test:

.. code::

   python3 -m openlane --dockerized --smoke-test

Running OpenLane2
^^^^^^^^^^^^^^^^^

.. attention::

   This example code is also available on the repository
   https://github.com/wpi-ece574-f24/ex-flow.git

The following is a multiply-accumulate module that we will implement
using the OpenLane2 flow. The design accepts two variables ``x1`` and
``x2`` and computes the product as well as the accumulated product.
The output of the design is 10 bits, and the product is internally
rescaled so that only the 10 most significant bits are kept.

.. code::

   module mac(input  [7:0]     x1,
              input  [7:0]     x2,
              output reg [9:0] y,
              output reg [9:0] m,
              input            reset,
              input            clk);

      reg [9:0]  y_next;
      reg [15:0] m16;

      always @(posedge clk)
        if (reset)
          y <= 10'b0;
        else
          y <= y_next;

      always @(*)
        begin
           m16    = x1 * x2;
           m      = m16[15:6];
           y_next = y + m;
        end

   endmodule

To build a chip for this Verilog file, we must provide a configuration
file to tell OpenLane about the physical characteristics of the chip.
The overall flow is set up such that it will run the entire front-end
and back-end in one iteration. Furthermore, this implementation does
not include simulation/validation of the design, which would have to
be completed separately.

The following is an example configuration file for the
multiply-accumulate chip.

.. code::

   {
     "DESIGN_NAME": "mac",
     "VERILOG_FILES": "dir::src/mac.v",
     "CLOCK_PORT": "clk",
     "CLOCK_PERIOD": 100,
     "pdk::sky130A": {
       "MAX_FANOUT_CONSTRAINT": 6,
       "FP_CORE_UTIL": 40,
       "PL_TARGET_DENSITY_PCT": "expr::($FP_CORE_UTIL + 10.0)",
       "scl::sky130_fd_sc_hd": {
         "CLOCK_PERIOD": 15
       }
     }
   }

In a nutshell, this configuration file indicates the source code files
that make up the design, and the selection of the target technology
along with physical constraints for the design. The two most important
variables are the clock period and the floorplan target utilization
(the ratio of the standard cell active area to the total core area).
There are a great number of additional configuration variables
available that help you handle many other aspects of the design.
They are described in detail in the `online documentation `_.

To implement the design, first start the openlane docker container:

.. code::

   python3 -m openlane --dockerized

And next, run the design flow:

.. code::

   openlane config.json

The flow runs through a great number of individual steps, but every
step along the way is documented in a log. These logs can be consulted
in the ``runs`` subdirectory after the flow completes.

.. code::

   OpenLane Container (2.1.8):/home/pschaumont/ex-flow/openlane2/runs/RUN_2024-09-26_14-45-48% ls
   01-verilator-lint                    27-openroad-globalplacement        53-openroad-irdropreport
   02-checker-linttimingconstructs      28-odb-writeverilogheader          54-magic-streamout
   03-checker-linterrors                29-checker-powergridviolations     55-klayout-streamout
   04-checker-lintwarnings              30-openroad-stamidpnr              56-magic-writelef
   05-yosys-jsonheader                  31-openroad-repairdesignpostgpl    57-odb-checkdesignantennaproperties
   06-yosys-synthesis                   32-openroad-detailedplacement      58-klayout-xor
   07-checker-yosysunmappedcells        33-openroad-cts                    59-checker-xor
   08-checker-yosyssynthchecks          34-openroad-stamidpnr-1            60-magic-drc
   09-checker-netlistassignstatements   35-openroad-resizertimingpostcts   61-klayout-drc
   10-openroad-checksdcfiles            36-openroad-stamidpnr-2            62-checker-magicdrc
   11-openroad-checkmacroinstances      37-openroad-globalrouting          63-checker-klayoutdrc
   12-openroad-staprepnr                38-openroad-checkantennas          64-magic-spiceextraction
   13-openroad-floorplan                39-odb-diodesonports               65-checker-illegaloverlap
   14-odb-checkmacroantennaproperties   40-openroad-repairantennas         66-netgen-lvs
   15-odb-setpowerconnections           41-openroad-stamidpnr-3            67-checker-lvs
   16-odb-manualmacroplacement          42-openroad-detailedrouting        68-checker-setupviolations
   17-openroad-cutrows                  43-odb-removeroutingobstructions   69-checker-holdviolations
   18-openroad-tapendcapinsertion       44-openroad-checkantennas-1        70-checker-maxslewviolations
   19-odb-addpdnobstructions            45-checker-trdrc                   71-checker-maxcapviolations
   20-openroad-generatepdn              46-odb-reportdisconnectedpins      72-misc-reportmanufacturability
   21-odb-removepdnobstructions         47-checker-disconnectedpins        error.log
   22-odb-addroutingobstructions        48-odb-reportwirelength            final
   23-openroad-globalplacementskipio    49-checker-wirelength              flow.log
   24-openroad-ioplacement              50-openroad-fillinsertion          resolved.json
   25-odb-customioplacement             51-openroad-rcx                    tmp
   26-odb-applydeftemplate              52-openroad-stapostpnr             warning.log

In the ``final`` subdirectory, the layout file is available as
``final/gds/mac.gds``. You can open this file in the ``klayout``
viewer which is part of openlane. Once inside KLayout, you can also
load a *layer properties file* which replaces the abstract numbered
layers with names. For SKY130, such a layer properties file is
included in the example code.

The following shows the layout in klayout, when all metal layers
(drawing) are enabled, as well as local interconnect, polysilicon,
and the cell outline. It's easy to spot large empty areas in the
middle with cells that are not connected to any net. Such cells are
*filler* cells; they are there to ensure that the power rails on
metal1 are connected across the entire row of standard cells.

.. code::

   klayout final/gds/mac.gds

.. figure:: img/maclayout_ol2.png

Because the view offered by klayout is purely structural, it is not
easy to connect the layout to the higher-level properties of the
design (such as net names, or names of registers). On the other hand,
klayout allows you to inspect the layout of individual standard
cells. You can do this by selecting one of the standard cells in the
design hierarchy and setting it as the new design top in klayout. For
example, the following shows a flip-flop layout.

.. figure:: img/floplayout.png

Standard 574 Design Flow
------------------------

In this course, we are building a flow from Cadence tools. Also, we
are studying individual aspects of the design flow separately, and
this reflects how we organize the data files.
In the following example, we demonstrate the directory structure for
the front-end.

.. code::

   .
   ├── constraints
   │   └── constraints_clk.sdc
   ├── glsim
   │   ├── Makefile
   │   └── tb.sv
   ├── rtl
   │   └── mac.sv
   ├── sim
   │   ├── Makefile
   │   └── tb.sv
   ├── sta
   │   ├── Makefile
   │   └── tempus.tcl
   └── syn
       ├── genus_script.tcl
       └── Makefile

The source code of the design is stored in ``rtl``. The code can be
simulated with the testbench ``tb.sv`` under ``sim``.

RTL Simulation
^^^^^^^^^^^^^^

The following is the testbench ``tb.sv``. Note the following aspects.
The testbench records a value change dump (VCD) that allows inspection
of the simulation using a waveform viewer such as ``gtkwave``. The
testbench applies pseudorandom stimuli, but also verifies the
correctness of the results generated.

.. code::

   module tb;
      logic [7:0] x1;
      logic [7:0] x2;
      logic [9:0] m;
      logic [9:0] y;
      logic       reset;
      logic       clk;

      mac dut(.x1(x1), .x2(x2), .m(m), .y(y), .reset(reset), .clk(clk));

      always
        begin
           clk = 1'b0;
           #5 clk = 1'b1;
           #5;
        end

      logic [15:0] test_m;
      logic [9:0]  test_y;

      initial
        begin
           $dumpfile("trace.vcd");
           $dumpvars(0, tb);
           x1     = 8'b0;
           x2     = 8'b0;
           test_m = 10'b0;
           test_y = 10'b0;
           reset  = 1'b1;
           repeat(3) @(posedge clk);
           #1;
           reset  = 1'b0;
           $display("m %d y %d", m, y);
           repeat(30)
             begin
                x1 = $random;
                x2 = $random;
                @(posedge clk);
                #1;
                test_m = (x1 * x2) >> 6;
                test_y = test_y + test_m;
                $display("x1 %d x2 %d m %d y %d exp_m %d exp_y %d ERR %d",
                         x1, x2, m, y, test_m, test_y,
                         ~((test_m == m) && (test_y == y)));
                #1;
             end // repeat (30)
           $finish;
        end

   endmodule

To understand the simulation command, inspect the ``Makefile``. The
general principle of the ECE574 flow is to drive all implementation
steps from a Makefile which calls the simulator with the proper
command line parameters. This particular makefile has two targets:
``sim`` and ``simg``. For batch-mode simulation, use ``make sim``. To
access the simulator GUI, use ``make simg``.
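The expected-value computation in the testbench can also be written
down as a small golden model. The following sketch mirrors the
testbench arithmetic: the 16-bit product is rescaled with
``(x1 * x2) >> 6`` to keep its 10 most significant bits, and the
accumulator wraps around at 10 bits. The function is purely
illustrative and not part of the flow.

```python
def mac_step(x1, x2, y):
    """One clock cycle of the mac design: rescaled product and
    10-bit accumulation. x1, x2 are 8-bit unsigned inputs; y is the
    current 10-bit accumulator value."""
    m = (x1 * x2) >> 6        # keep the 10 MSBs of the 16-bit product
    y = (y + m) & 0x3FF       # 10-bit accumulator wraps around
    return m, y
```

For example, the maximum product 255 * 255 = 65025 rescales to 1016,
and accumulating 1016 twice wraps the 10-bit register from 2032 down
to 1008.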
RTL Synthesis
^^^^^^^^^^^^^

Building a gate-level netlist for the multiply-accumulate requires
synthesis constraints as well as a synthesis script. The synthesis
constraints are stored under ``constraints/constraints_clk.sdc``.
Constraints include the clock period, as well as the input delay and
output delay for ports. The constraints file is written in such a way
that the user can provide a clock period value through an environment
variable ``CLOCKPERIOD``.

.. code::

   if {![info exists ::env(CLOCKPERIOD)] } {
       set clockPeriod 20
   } else {
       set clockPeriod [getenv CLOCKPERIOD]
   }

   create_clock -name clk -period $clockPeriod [get_ports "clk"]
   set_input_delay  0 -clock clk [all_inputs -no_clocks]
   set_output_delay 0 -clock clk [all_outputs]

The use of environment variables allows you to develop scripts that
can serve multiple designs and multiple purposes. In a Makefile we
frequently rely on environment variables as well, to define tool
parameters and options. Consider the ``Makefile`` in the ``syn``
subdirectory.

.. code::

   all: syn

   syn:
   	BASENAME=mac \
   	CLOCKPERIOD=4 \
   	TIMINGPATH=/opt/cadence/libraries/gsclib045_all_v4.7/gsclib045/timing \
   	TIMINGLIB=slow_vdd1v0_basicCells.lib \
   	VERILOG='../rtl/mac.sv' \
   	genus -f genus_script.tcl

   clean:
   	rm -rf outputs reports genus.log* genus.cmd* *~ fv

The command ``make syn`` runs ``genus``. Note how the makefile sets
several environment variables, including ``BASENAME``,
``CLOCKPERIOD``, ``TIMINGPATH``, ``TIMINGLIB`` and ``VERILOG``. These
environment variables are used inside of the genus script, as well as
by the design constraints. The use of environment variables allows
you to separate the variable parts from the fixed parts (e.g., the
scripts) in the design flow. Take a look at the example genus script
below. The environment variables are accessed from within the Tcl
environment.

.. code::

   if {![info exists ::env(TIMINGPATH)] } {
       puts "Error: missing TIMINGPATH"
       exit 1
   }
   if {![info exists ::env(TIMINGLIB)] } {
       puts "Error: missing TIMINGLIB"
       exit 1
   }

   set_db init_lib_search_path [getenv TIMINGPATH]
   read_libs [getenv TIMINGLIB]

   if {![info exists ::env(VERILOG)] } {
       puts "Error: missing VERILOG"
       exit 1
   }

   set_db init_hdl_search_path ../rtl/
   read_hdl -language sv [getenv VERILOG]
   elaborate

   read_sdc ../constraints/constraints_clk.sdc

   set_db syn_generic_effort high
   set_db syn_map_effort high
   set_db syn_opt_effort high

   syn_generic
   syn_map
   syn_opt

   if {![info exists ::env(BASENAME)] } {
       set basename "default"
   } else {
       set basename [getenv BASENAME]
   }

   # reports
   report_timing > reports/${basename}_report_timing.rpt
   report_power  > reports/${basename}_report_power.rpt
   report_area   > reports/${basename}_report_area.rpt
   report_qor    > reports/${basename}_report_qor.rpt

   set outputnetlist     outputs/${basename}_netlist.v
   set outputconstraints outputs/${basename}_constraints.sdc
   set outputdelays      outputs/${basename}_delays.sdf

   write_hdl > $outputnetlist
   write_sdc > $outputconstraints
   write_sdf -timescale ns \
             -nonegchecks \
             -recrem split \
             -edges check_edge \
             -setuphold split > $outputdelays

   exit

Static Timing Analysis
^^^^^^^^^^^^^^^^^^^^^^

The ``sta`` directory is used to run Static Timing Analysis after
gate-level synthesis. Similar to synthesis, Static Timing Analysis
scripts can be developed such that all the variable parts are stored
outside of the scripts as environment variables. This leads to the
following ``Makefile``.

.. code::

   sta:
   	BASENAME=mac \
   	TIMINGPATH=/opt/cadence/libraries/gsclib045_all_v4.7/gsclib045/timing \
   	TIMINGLIB=slow_vdd1v0_basicCells.lib \
   	tempus -files tempus.tcl

   clean:
   	rm -f *.rpt *.slk tempus.cmd* tempus.rpt* tempus.log* *~

As well as the following tempus script:

.. code::

   if {![info exists ::env(TIMINGLIB)] } {
       puts "Error: missing TIMINGLIB"
       exit 1
   }
   if {![info exists ::env(TIMINGPATH)] } {
       puts "Error: missing TIMINGPATH"
       exit 1
   }
   if {![info exists ::env(BASENAME)] } {
       puts "Error: missing BASENAME"
       exit 1
   }

   read_lib [getenv TIMINGPATH]/[getenv TIMINGLIB]

   set basename [getenv BASENAME]

   read_verilog ../syn/outputs/${basename}_netlist.v
   set_top_module ${basename}
   read_sdc ../syn/outputs/${basename}_constraints.sdc
   read_sdf ../syn/outputs/${basename}_delays.sdf

   report_timing -late  -max_paths 10 > late.rpt
   report_timing -early -max_paths 10 > early.rpt

   report_timing -from [all_inputs] \
                 -to [all_outputs] \
                 -max_paths 10 \
                 -path_type summary > allpaths.rpt
   report_timing -from [all_inputs] \
                 -to [all_registers] \
                 -max_paths 10 \
                 -path_type summary >> allpaths.rpt
   report_timing -from [all_registers] \
                 -to [all_registers] \
                 -max_paths 10 \
                 -path_type summary >> allpaths.rpt
   report_timing -from [all_registers] \
                 -to [all_outputs] \
                 -max_paths 10 \
                 -path_type summary >> allpaths.rpt

   exit

Gate Level Simulation
^^^^^^^^^^^^^^^^^^^^^

After RTL synthesis and static timing analysis, you can also simulate
the gate-level netlist of your hardware design. If your design is
created following proper synchronous design practice, you will be
able to reuse the same testbench as used for the RTL simulation.
However, since this is a gate-level simulation, there are a few
things that deserve attention.

1. A gate-level netlist uses cells from a cell library, and
   functional views for those cells have to be included in the
   simulation.
2. A gate-level netlist, as an outcome of RTL synthesis, will adopt
   specific delays in terms of an SDF file. The SDF file has to be
   added to the testbench to ensure proper gate delays are simulated.

We'll address the second problem first. Adding an SDF file to a
simulation can be done using ``$sdf_annotate`` in Verilog. Thus, in
the testbench, the following block is added.
This block is conditional on the definition of a macro ``USE_SDF``.
The idea is that we can simulate the gate-level netlist with or
without timing back-annotation. When the timing back-annotation is
not included (i.e., the ``USE_SDF`` macro is not defined), we get a
purely functional simulation of the gate netlist. When timing
back-annotation is included, gate delays are included and we will
observe glitching effects as well as (possible) timing faults.

.. code::

   `ifdef USE_SDF
     initial
       begin
          $sdf_annotate("../syn/outputs/mac_delays.sdf",tb.dut,,"sdf.log","MAXIMUM");
       end
   `endif

The makefile command specifies the other aspect of gate-level
simulation: the simulation now includes a functional view of the
standard cell library used to produce the gate-level netlist. The
simulation makefile below is used for gate-level simulation of the
*post-synthesis* netlist. We can also perform the gate-level
simulation after the layout is complete, and that would be a
*post-layout* simulation.

.. code::

   sim-postsyn:
   	xrun -timescale 1ns/1ps \
   	tb.sv \
   	../syn/outputs/mac_netlist.v \
   	/opt/cadence/libraries/gsclib045_all_v4.7/gsclib045/verilog/slow_vdd1v0_basicCells.v \
   	-access +rwc \
   	-define USE_SDF \
   	-top tb

   sim-postsyn-gui:
   	xrun -timescale 1ns/1ps \
   	tb.sv \
   	../syn/outputs/mac_netlist.v \
   	/opt/cadence/libraries/gsclib045_all_v4.7/gsclib045/verilog/slow_vdd1v0_basicCells.v \
   	-access +rwc \
   	-define USE_SDF \
   	-top tb \
   	+gui

   clean:
   	rm -rf trace.vcd xcelium.d xrun.history xrun.log xrun.key *~

In this overview of the ECE574 design flow, we do not include the
back-end portion of the flow yet. That back-end portion will be added
after we discuss floorplanning and place-and-route using Cadence
Innovus.

FuseSoC
-------

The final flow which we will discuss is `FuseSoC `_. FuseSoC is not a
complete flow, but rather a package manager that can configure (and
run) flows for complex SoC designs. FuseSoC supports the description
of dependencies between *cores*.
A core is captured in a ``.core`` file. We have already used
``FuseSoC`` in the SoC lecture. The IBEX processor, for example, is
described as follows in a ``ibex_core.core`` file. Only the first few
lines are included. It defines a core with the name
``lowrisc:ibex:ibex_core``, followed by a definition of the fileset
called ``files_rtl``. There are dependencies as well as specific
files. The dependencies (such as ``lowrisc:prim:lfsr``) point to
other cores, while the files point to specific files.

.. code::

   name: "lowrisc:ibex:ibex_core:0.1"
   description: "Ibex CPU Core Components"

   filesets:
     files_rtl:
       depend:
         - lowrisc:prim:assert
         - lowrisc:prim:clock_gating
         - lowrisc:prim:lfsr
         - lowrisc:prim:mubi
         - lowrisc:ibex:ibex_pkg
         - lowrisc:ibex:ibex_icache
         - lowrisc:dv:dv_fcov_macros
       files:
         - rtl/ibex_alu.sv
         - rtl/ibex_branch_predict.sv
         - rtl/ibex_compressed_decoder.sv
         - rtl/ibex_controller.sv
         - rtl/ibex_cs_registers.sv
         - rtl/ibex_csr.sv
         - rtl/ibex_counter.sv
         - rtl/ibex_decoder.sv
         - rtl/ibex_ex_block.sv
         - rtl/ibex_fetch_fifo.sv
         - rtl/ibex_id_stage.sv
         - rtl/ibex_if_stage.sv
         - rtl/ibex_load_store_unit.sv
         - rtl/ibex_multdiv_fast.sv
         - rtl/ibex_multdiv_slow.sv
         - rtl/ibex_prefetch_buffer.sv
         - rtl/ibex_pmp.sv
         - rtl/ibex_wb_stage.sv
         - rtl/ibex_dummy_instr.sv
         - rtl/ibex_core.sv
         - rtl/ibex_pmp_reset_default.svh: {is_include_file: true}
       file_type: systemVerilogSource

The core dependencies will expand into additional core files. For
example, the dependency ``lowrisc:prim:lfsr`` refers to a file called
``./vendor/lowrisc_ip/ip/prim/prim_lfsr.core``.

.. code::

   name: "lowrisc:prim:lfsr:0.1"
   description: "A Linear-Feedback Shift Register (LFSR) primitive"
   filesets:
     files_rtl:
       depend:
         - lowrisc:prim:assert
         - lowrisc:prim:cipher_pkg
       files:
         - rtl/prim_lfsr.sv
       file_type: systemVerilogSource

FuseSoC can also define one or more *targets*. These are the targets
of the design flow, and can include simulation as well as concrete
hardware synthesis (with a tool such as yosys or vivado).
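The dependency mechanism can be illustrated with a toy sketch. The
core names and file lists below are abridged from the examples above;
the flattening function is a hypothetical miniature of what a package
manager does when it expands dependencies, not FuseSoC's actual
implementation.

```python
# Abridged core database: each core lists its dependencies and files.
CORES = {
    "lowrisc:prim:lfsr": {
        "depend": [],
        "files": ["rtl/prim_lfsr.sv"],
    },
    "lowrisc:ibex:ibex_core": {
        "depend": ["lowrisc:prim:lfsr"],
        "files": ["rtl/ibex_alu.sv", "rtl/ibex_core.sv"],
    },
}

def flatten(core, seen=None):
    """Depth-first dependency expansion: emit each core's files after
    the files of its dependencies, visiting every core only once."""
    seen = set() if seen is None else seen
    if core in seen:
        return []
    seen.add(core)
    files = []
    for dep in CORES[core]["depend"]:
        files += flatten(dep, seen)
    return files + CORES[core]["files"]
```

In this miniature, flattening ``lowrisc:ibex:ibex_core`` yields the
LFSR source before the Ibex sources, mirroring how a compile order is
derived from the dependency tree.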
A target defines a set of concrete parameters and file sets that should be used for a tool. When FuseSoC is called, one of three actions can be specified:

1. In ``setup``, FuseSoC will assemble all file sets into a dependency tree, such that a flattened view of the design is created.

2. In ``build``, FuseSoC will call the tool specified in the tool flow. Tools include, for example, verilator, vivado, or yosys.

3. In ``run``, FuseSoC will call the output of the tool flow, with an action specific to the flow. For example, a verilator flow runs the simulation, while a vivado flow can configure an FPGA.

We will illustrate FuseSoC on a github repository that uses the IBEX core in a demonstration system. The repository is ``https://github.com/lowRISC/ibex-demo-system``. The IBEX demo system is a small demonstration design containing a RAM, a UART, a GPIO, and an SPI. It is used to implement IBEX demo designs on an FPGA platform.

.. code::

   git clone https://github.com/lowRISC/ibex-demo-system

As the class server does not contain the vivado flow, we will demonstrate the design verification flow. First, make sure you enable the fusesoc environment and the updated RedHat development toolset.

.. code::

   pyenv activate fusesoc
   scl enable devtoolset-9 bash

You can then build a simulation as follows.

.. code::

   fusesoc --cores-root=. run --target=sim --tool=verilator --setup --build lowrisc:ibex:demo_system

The resulting simulator is built into ``build/lowrisc_ibex_demo_system_0/sim-verilator/``, but to run it we first need a software application. The application software for the system is held under ``sw``. To compile the C applications:

.. code::

   cd sw/c
   # if build does not exist, create it
   mkdir build
   cd build
   cmake ..
   make

Finally, to run the simulator, provide a compiled application as argument, for example:

.. code::

   # run from the main demo directory
   ./build/lowrisc_ibex_demo_system_0/sim-verilator/Vtop_verilator \
      -t sim.fst \
      --meminit=ram,./sw/c/build/demo/hello_world/demo

This simulation runs forever, so you have to stop it using Ctrl-C. As with the ibex example discussed earlier, the ``-t`` parameter can be used to generate a waveform trace (FST) file.

.. code::

   ./build/lowrisc_ibex_demo_system_0/sim-verilator/Vtop_verilator \
      -t sim.fst \
      --meminit=ram,./sw/c/build/demo/hello_world/demo
   Simulation of Ibex Demo System
   ==============================

   Tracing can be toggled by sending SIGUSR1 to this process:
   $ kill -USR1 7511

   UART: Created /dev/pts/11 for uart0. Connect to it with any terminal program, e.g.
   $ screen /dev/pts/11
   UART: Additionally writing all UART output to 'uart0.log'.

   Simulation running, end by pressing CTRL-c.
   Tracing enabled.
   Writing simulation traces to sim.fst
   ^CReceived stop request, shutting down simulation.

   Simulation statistics
   =====================
   Executed cycles:  9256002
   Wallclock time:   59.906 s
   Simulation speed: 154509 cycles/s (154.509 kHz)
   Trace file size:  170321473 B

   You can view the simulation traces by calling
   $ gtkwave sim.fst

   Performance Counters
   ====================
   Cycles:                     6465
   Instructions Retired:       3677
   LSU Busy:                   1624
   Fetch Wait:                 285
   Loads:                      987
   Stores:                     637
   Jumps:                      308
   Conditional Branches:       220
   Taken Conditional Branches: 127

To build the FPGA bitstream, you run the following command. The command ends in an error message on the class server, because vivado is not available. However, we will inspect the files that have been created.

.. code::

   fusesoc --cores-root=. run --target=synth --setup --build lowrisc:ibex:demo_system

Note that the target, in this case, says ``synth`` instead of ``sim``. The ``ibex_demo_system.core`` specifies what happens for this target:

.. code::

   targets:
     default: &default_target
       filesets:
         - files_rtl
     synth:
       <<: *default_target
       default_tool: vivado
       filesets_append:
         - files_xilinx
         - files_constraints
       toplevel: top_artya7
       tools:
         vivado:
           part: "xc7a35tcsg324-1" # Default to Arty A7-35
       parameters:
         - SRAMInitFile
         - PRIM_DEFAULT_IMPL=prim_pkg::ImplXilinx
       flags:
         use_bscane_tap: true

After the command terminates, inspect the directory structure under ``build``. The ``src`` directory is where the RTL source code tree is stored, ``sim-verilator`` is where the simulation is compiled, and ``synth-vivado`` is where bitstream generation is completed.

.. code::

   .
   ├── sim-verilator
   ├── src
   └── synth-vivado

Inside of ``synth-vivado``, you'll find synthesis scripts and constraint files for this implementation. In particular, take a look at the ``*.tcl`` files.

If you have a laptop configured with WSL2, you can configure the entire implementation including vivado. My setup includes vivado 2024.1, fusesoc, and ibex. Follow the configuration commands under https://github.com/lowRISC/ibex for setup and configuration of a RISC-V toolchain and the fusesoc setup. Then, run the fpga synthesis:

.. code::

   fusesoc --cores-root=. run --target=synth --run lowrisc:ibex:demo_system

which yields the following synthesis results:

+-----------+--------------+
| Primitive | Count        |
+-----------+--------------+
| FLOP      | 5209         |
+-----------+--------------+
| LUT       | 6110         |
+-----------+--------------+
| BMEM      | 16           |
+-----------+--------------+
| DMEM      | 88           |
+-----------+--------------+

The following is an FPGA floorplan of the IBEX demo system in an Artix7 FPGA (default target).

.. figure:: img/ibex-demo-system-fpga.png
   :figwidth: 500px
   :align: center
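The ``synth`` target in ``ibex_demo_system.core`` is assembled from the ``default`` target through a YAML merge key (``<<: *default_target``), after which ``filesets_append`` extends the inherited fileset list. The following Python sketch (plain dictionaries, not FuseSoC internals) mimics that combination:

```python
# A sketch of target expansion (not FuseSoC code): dictionaries stand in
# for the parsed YAML of ibex_demo_system.core.
default_target = {"filesets": ["files_rtl"]}

synth_overlay = {
    "default_tool": "vivado",
    "filesets_append": ["files_xilinx", "files_constraints"],
    "toplevel": "top_artya7",
}

def expand_target(base, overlay):
    # The YAML merge key copies all keys of the base target, then the
    # overlay's own keys are applied on top of it.
    target = {**base, **{k: v for k, v in overlay.items() if k != "filesets_append"}}
    # filesets_append extends the inherited fileset list instead of replacing it.
    target["filesets"] = base["filesets"] + overlay.get("filesets_append", [])
    return target

synth = expand_target(default_target, synth_overlay)
print(synth["filesets"])
# the generic RTL fileset, followed by the Xilinx-specific filesets
```

The net effect is that the synthesis target sees the same RTL as the simulation target, plus the FPGA-specific wrapper and constraint files.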