Attention

This document was last updated Nov 25, 2024 at 21:59

Design Flow

Important

The purpose of this lecture is as follows.

  • To enumerate the major steps in a digital design flow to go from RTL code to a layout

  • To describe a sample implementation of this flow using the example of OpenLane

  • To define the practical realization of a digital design flow for 574

  • To illustrate the role of gate-level simulation in this design flow

  • To provide implementation guidance such as Makefile coding and coding guidelines

  • To describe the FuseSoC tool for design flow construction

  • To illustrate a combination of FuseSoC and the standard 574 flow.

Attention

Examples for this lecture are available under https://github.com/wpi-ece574-f24/ex-flow

Generic IC Design Flow

In this lecture, we take a step back and look at the big picture in Digital IC Design. The modern IC design flow is a marvel of efficiency and pragmatism. The number of hard optimization problems that are addressed by automatic tools in IC design is truly remarkable. The majority of design automation problems, such as placing standard cells, routing a clock tree, or deciding on the proper power distribution network, can only be solved heuristically. Yet, without the design flow, there would be no complex chips, and no need for Moore’s law.

For standard-cell-based design (our focus in this class), different IC design flows share many common ideas and steps. We will therefore start with a discussion of a generic design flow, and afterwards refine that example into a concrete implementation based on OpenLane, an open-source IC design flow. There are two major phases in a design flow, generally called the front-end design flow and the back-end design flow. The front-end converts an HDL implementation into a netlist, i.e. a network of technology-specific logic gates. The back-end converts the netlist into a structure of placed standard cells, interconnected by wires. Complexity-wise, the back-end is more intricate and involved than the front-end. This may seem surprising, given that the main user input, the RTL, is provided at the front-end. The reality, of course, is that the design space of the back-end has many more dimensions and trade-offs than the design space of the front-end. Besides the RTL, you’ll find that design flows require a large number of constraints, technology libraries and design scripts (not shown in the figure) to guide the process of converting RTL into a netlist and afterwards into a layout.

_images/flow_l5_1.png

The technology library is a crucial component to support the design flow. At a high level, the technology library is a description of the standard cells and low-level technology components required to complete the chip layout. A technology library provides several different views, which describe different aspects of each standard cell in that library.

  • A timing view and a power view allow tools to evaluate the speed and power consumption of cells after their integration in a netlist.

  • A functional view allows tools to simulate the functionality of the cells after their integration in a netlist.

  • A layout view allows tools to know the physical outline of the cell, including the location where connections should be made.

As an example, let’s take one cell from the SkyWater 130 nm library: a two-input NAND gate with drive strength 1. The technology library for SkyWater 130 nm is located at /opt/skywater/libraries/sky130_fd_sc_hd on the design server. The timing and power characteristics of the two-input NAND gate are captured in a lib file, for example timing/sky130_fd_sc_hd__tt_025C_1v80.lib. The functional behavior of the two-input NAND gate is captured in a Verilog file such as cells/nand2/sky130_fd_sc_hd__nand2_1.v. The layout view of the two-input NAND gate is captured in a LEF file such as cells/nand2/sky130_fd_sc_hd__nand2_1.lef. We will discuss some of these formats in further detail in future lectures. The main point, however, is to see that a ‘standard cell’ is not an atomic entity. Depending on what you want to achieve in the design flow (simulation, timing evaluation, layout construction, …), different file formats come into play that each highlight specific aspects of the standard cell. In that sense, a ‘standard cell library’ is very different from a traditional software library.
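
For example, on the design server you can list the different views of this cell directly; the paths below are the ones quoted above.

# functional (Verilog) and layout (LEF) views of the nand2 cell
ls /opt/skywater/libraries/sky130_fd_sc_hd/cells/nand2/sky130_fd_sc_hd__nand2_1.v
ls /opt/skywater/libraries/sky130_fd_sc_hd/cells/nand2/sky130_fd_sc_hd__nand2_1.lef

# the timing/power view covers the whole library in a single .lib file;
# search it for the nand2 cell entry
grep -n 'sky130_fd_sc_hd__nand2_1' \
    /opt/skywater/libraries/sky130_fd_sc_hd/timing/sky130_fd_sc_hd__tt_025C_1v80.lib | head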

Design Steps in the front-end

_images/flow_l5_2.png

The first task in front-end design is to verify that the design is correct. This requires the design of a testbench and the use of a simulation tool. Typically, a testbench will contain one or more tests to validate that the output of the design under test matches the expected output. For in-depth debugging purposes, a Value Change Dump (VCD) file can be produced that records every signal change over the course of the simulation. However, generating VCDs is time- and disk-space-hungry, so it is rarely done in an exhaustive manner.
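
As a minimal sketch (the same two system tasks appear in the testbench later in this lecture), recording a VCD in a Verilog testbench requires only a $dumpfile and a $dumpvars call:

// record all signals below module tb into trace.vcd
initial
begin
   $dumpfile("trace.vcd");  // name of the VCD output file
   $dumpvars(0, tb);        // depth 0: dump the entire hierarchy under tb
end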

A correct RTL design is next mapped to a technology netlist by a logic synthesis tool. The logic synthesis tool requires a technology library containing a list of target cells, a set of synthesis constraints, and (not shown in the figure) a synthesis script. Typical synthesis constraints specify the desired clock period of the design, or direct the tool to apply specific synthesis techniques (e.g., encoding finite state machine states with one-hot codes). Synthesis constraints play an important role in the quality of the logic synthesis output.
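
As a minimal sketch in SDC (Synopsys Design Constraints) format, assuming a 10 ns clock on port clk; the full constraints file used in this course appears later in this lecture.

create_clock -name clk -period 10 [get_ports "clk"]    ;# target clock: 10 ns period
set_input_delay  0 -clock clk [all_inputs -no_clocks]  ;# arrival time of external inputs
set_output_delay 0 -clock clk [all_outputs]            ;# required time of external outputs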

The netlist (or gate-level netlist) can be simulated with the same testbench as the RTL design, and of course one would expect an output equivalent to the RTL simulation. There may be subtle differences, however. For example, the reset behavior of a gate-level netlist may not be identical to the RTL design. Also, a gate-level simulation is able to express proper technology delays (propagation delay through the gates, for example), and timing-dependent effects such as glitches may become visible at the gate level. At the RTL, every computation step is defined by a clock cycle. At the gate level, a computation step can be as small as a single gate transition.

Another result from logic synthesis is a Standard Delay Format (SDF) file that captures the delays of the netlist. These delays are based on the actual cells used in the circuit, in combination with the fanout and the specific wireload model adopted during synthesis. This SDF file enables accurate timing simulation of the gate-level circuit. The SDF can also be used by static timing analysis (STA) tools.
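
For illustration, the fragment below sketches what an SDF annotation looks like; the cell name, instance name, and delay values are invented for the example. An IOPATH entry lists (min:typ:max) rise and fall delays from an input pin to an output pin of one cell instance.

(DELAYFILE
  (SDFVERSION "3.0")
  (TIMESCALE 1ns)
  (CELL (CELLTYPE "NAND2X1")           // hypothetical two-input NAND cell
    (INSTANCE u_core.u1)               // one instance in the netlist
    (DELAY (ABSOLUTE
      (IOPATH A ZN (0.12:0.15:0.19) (0.10:0.13:0.17))  // rise and fall delays A -> ZN
    ))
  )
)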

Because the timing properties of a netlist are different from the (essentially untimed) RTL, the netlist can also be verified using timing analysis – which we will discuss in detail during a future lecture. The most important outcome of the timing analysis is the slack, which is the margin between the design’s target clock period and the actual delay of the logic. A positive slack means that the logic is fast enough to finish a single-cycle computation within a clock period. A negative slack, however, means that the design experiences a timing violation and requires performance updates. Such updates can imply adjusting the synthesis constraints, the synthesis script, or even improving the RTL.
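
As a worked example with illustrative numbers (a 10 ns clock; the delay values are invented): for a register-to-register path, the setup slack is the clock period minus the sum of the launching flip-flop’s clock-to-output delay, the combinational logic delay, and the capturing flip-flop’s setup time.

slack = T_clk - (t_clk2q + t_logic + t_setup)
      = 10.0 - (0.2 + 8.3 + 0.5) =  1.0 ns   (positive slack: timing met)
      = 10.0 - (0.2 + 9.8 + 0.5) = -0.5 ns   (negative slack: timing violation)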

Design Steps in the back-end

_images/flow_l5_3.png

The back-end starts with a design netlist that meets the clock period constraint, along with other synthesis constraints. A sequence of tools now converts the netlist into a layout. Each tool handles a specific aspect in the definition of that layout.

  • A floorplanner defines the major outline characteristics of the chip, such as the regions where standard cells can be placed, where hard macros (such as memory blocks) can be placed, and how input/output cells are implemented. Furthermore, the floorplanner defines the power grid infrastructure on the chip. For a chip with hundreds of thousands of cells, the power grid becomes a hierarchical network which must ensure that power is evenly distributed across the chip, such that each individual cell can operate at nominal capacity.

  • Next, a placement tool decides where to place the standard cells, making sure that cells with many local interconnections are placed close together, and that there is enough room between the cells to implement the interconnections.

  • Next, the clock signal network is created by clock tree synthesis. The clock is the single most important global net of the chip (or block). The key challenge is to make sure that the clock signal arrives at each cell (flip-flop) at the same time, which requires the definition of a hierarchical network called a clock tree.

  • Next, the signal interconnections between individual gate and macro pins are implemented in the routing step. Routing is done in two phases: Global routing decides roughly on the path taken by each wire, by allocating each wire to a track of grid cells. Next, detailed routing decides on the detailed implementation of individual wires, such as the metal layers and via interconnections used.

  • The final step, layout finishing, makes sure that the layout meets all the design rules in place for the technology. Design rules specify the density and spacing of low-level chip elements such as metal wires and polysilicon regions. The final output of the layout finishing phase is a GDS (Graphic Design System) file, a structural specification of the chip design. Layout finishing also produces a new delay estimation file (SDF) in addition to a layout parasitics file (SPEF). The SDF has the same meaning as the earlier SDF produced after synthesis, except that the wireload model is now replaced by actual interconnect delay estimates. The SDF file is used for timing analysis and post-layout gate-level simulation. The SPEF file reflects the capacitance and resistance values of the layout interconnections. SPEF files are used for signal-integrity checks of the layout, as well as for low-level transistor simulation.

Clearly, a tremendous number of parameters play a role in moving a netlist into a layout. Multiple technology libraries support the implementation of standard cells, I/O pad cells and macros. Layout constraints craft the structure into a desired shape, which must meet the design rules for the selected technology.

OpenLane: An Open Source Design Flow

Attention

This course does not express a preference on which approach is better: open-source or (commercial) closed-source. Both approaches have advantages and disadvantages, and there are good reasons for either approach.

Perhaps remarkably, all of the above tools are nowadays available as open-source components, up to and including the technology libraries. This is a very recent evolution that is having a profound impact on the hardware design process, certainly in an educational context. We will discuss one example, the OpenLane flow, which is the same design flow used for the TinyTapeout projects.

OpenLane is an extension of another open-source project called OpenROAD. The latter was initiated as a DARPA project with the objective of creating a tool chain that is able to complete an RTL-to-layout flow within 24 hours. A full day may sound like a lot if you are used to compiling C or FPGA code. But, given the complexity of this task, plus the fact that this task is entirely supported using open-source technology, this is a truly remarkable achievement that was nearly impossible just a few years ago. Furthermore, most smaller designs (such as the tiles in TinyTapeout projects) take far less time to implement. The following table shows that for each major design step in a chip design flow, there is now an open-source alternative.

Step               Cadence              OpenLane
Logic Synthesis    Genus                Yosys, ABC
ATPG               Genus                Fault
Placement          Innovus              RePlAce, OpenDP
Routing            Innovus              FastRoute, TritonRoute
CTS                Innovus              TritonCTS
Timing Analysis    Tempus               OpenSTA
LVS, DRC           Calibre (Siemens)    Netgen, Magic

Overall, the standard OpenLane flow is similar to the generic flow presented earlier.

_images/openlane2.png

Attention

The following instructions walk you through an example using OpenLane. You need a good Ubuntu box or WSL2 on your laptop to be able to run this.

Installing OpenLane2

The following instructions describe an installation of OpenLane2 on Docker. If you don’t have a Docker installation on your laptop, follow the instructions in the OpenLane2 documentation.

# Remove old installations
sudo apt-get remove docker docker-engine docker.io containerd runc

# Installation of requirements
sudo apt-get update
sudo apt-get install \
   ca-certificates \
   curl \
   gnupg \
   lsb-release
# Add the keyrings of docker
sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
# Add the package repository
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
# Update the package repository
sudo apt-get update

# Install Docker
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-compose-plugin

# Check for installation
sudo docker run hello-world

If all goes well, you will see a ‘Hello from Docker!’ message printed by the last command.

Many of the OpenLane commands require the user to be a member of the docker group. The following commands make this happen (they require sudo privileges).

sudo groupadd docker
sudo usermod -aG docker $USER
sudo reboot # REBOOT!

Once you have Docker running, you can download and install OpenLane:

python3 -m pip install openlane

Followed by a simple test:

python3 -m openlane --dockerized --smoke-test

Running OpenLane2

Attention

This example code is also available on the repository https://github.com/wpi-ece574-f24/ex-flow.git

The following is a multiply-accumulate module that we will implement using the OpenLane2 flow. The design accepts two inputs x1 and x2 and computes their product as well as the accumulated product. The outputs of the design are 10 bits wide, and the product is internally rescaled so that only the 10 most significant bits are kept.

module mac(
  input [7:0] x1,
  input [7:0] x2,
  output reg [9:0] y,
  output reg [9:0] m,
  input  reset,
  input  clk
);

 reg [9:0] y_next;
 reg [15:0] m16;

 always @(posedge clk)
     if (reset)
       y <= 10'b0;
     else
       y <= y_next;

 always @(*)
   begin
      m16 = x1 * x2;
      m = m16[15:6];
      y_next = y + m;
   end

endmodule

To build a chip for this Verilog file, we must provide a configuration file that tells OpenLane about the physical characteristics of the chip. The overall flow is set up such that it runs the entire front-end and back-end in one pass. Note that this flow does not include simulation/validation of the design, which has to be completed separately.

The following is an example configuration file for the multiply-accumulate chip.

{
 "DESIGN_NAME": "mac",
 "VERILOG_FILES": "dir::src/mac.v",
 "CLOCK_PORT": "clk",
 "CLOCK_PERIOD": 100,
 "pdk::sky130A": {
   "MAX_FANOUT_CONSTRAINT": 6,
   "FP_CORE_UTIL": 40,
   "PL_TARGET_DENSITY_PCT": "expr::($FP_CORE_UTIL + 10.0)",
   "scl::sky130_fd_sc_hd": {
     "CLOCK_PERIOD": 15
    }
  }
}

In a nutshell, this configuration file indicates the source code files that make up the design, and the selection of the target technology along with physical constraints for the design. The two most important variables are the clock period and the floorplan target utilization (the ratio of the standard-cell active area to the total core area). A great number of additional configuration variables are available to handle many other aspects of the design. They are described in detail in the online documentation.

To implement the design, first start the openlane docker container:

python3 -m openlane --dockerized

And next, run the design flow.

openlane config.json

The flow runs through a great number of individual steps, but every step along the way is documented in a log. These logs can be consulted in the runs subdirectory after the flow completes.

OpenLane Container (2.1.8):/home/pschaumont/ex-flow/openlane2/runs/RUN_2024-09-26_14-45-48% ls
01-verilator-lint                   27-openroad-globalplacement       53-openroad-irdropreport
02-checker-linttimingconstructs     28-odb-writeverilogheader         54-magic-streamout
03-checker-linterrors               29-checker-powergridviolations    55-klayout-streamout
04-checker-lintwarnings             30-openroad-stamidpnr             56-magic-writelef
05-yosys-jsonheader                 31-openroad-repairdesignpostgpl   57-odb-checkdesignantennaproperties
06-yosys-synthesis                  32-openroad-detailedplacement     58-klayout-xor
07-checker-yosysunmappedcells       33-openroad-cts                   59-checker-xor
08-checker-yosyssynthchecks         34-openroad-stamidpnr-1           60-magic-drc
09-checker-netlistassignstatements  35-openroad-resizertimingpostcts  61-klayout-drc
10-openroad-checksdcfiles           36-openroad-stamidpnr-2           62-checker-magicdrc
11-openroad-checkmacroinstances     37-openroad-globalrouting         63-checker-klayoutdrc
12-openroad-staprepnr               38-openroad-checkantennas         64-magic-spiceextraction
13-openroad-floorplan               39-odb-diodesonports              65-checker-illegaloverlap
14-odb-checkmacroantennaproperties  40-openroad-repairantennas        66-netgen-lvs
15-odb-setpowerconnections          41-openroad-stamidpnr-3           67-checker-lvs
16-odb-manualmacroplacement         42-openroad-detailedrouting       68-checker-setupviolations
17-openroad-cutrows                 43-odb-removeroutingobstructions  69-checker-holdviolations
18-openroad-tapendcapinsertion      44-openroad-checkantennas-1       70-checker-maxslewviolations
19-odb-addpdnobstructions           45-checker-trdrc                  71-checker-maxcapviolations
20-openroad-generatepdn             46-odb-reportdisconnectedpins     72-misc-reportmanufacturability
21-odb-removepdnobstructions        47-checker-disconnectedpins       error.log
22-odb-addroutingobstructions       48-odb-reportwirelength           final
23-openroad-globalplacementskipio   49-checker-wirelength             flow.log
24-openroad-ioplacement             50-openroad-fillinsertion         resolved.json
25-odb-customioplacement            51-openroad-rcx                   tmp
26-odb-applydeftemplate             52-openroad-stapostpnr            warning.log

In the final subdirectory, the layout file is available as final/gds/mac.gds. You can open this file in the KLayout viewer, which is part of OpenLane. Once inside KLayout, you can also load a layer properties file which replaces the abstract numbered layers with names. For SKY130, such a layer properties file is included in the example code. The following shows the layout in KLayout, with all metal layers (drawing) enabled, as well as local interconnect, polysilicon, and the cell outline. It’s easy to spot large empty areas in the middle with cells that are not connected to any net. Such cells are filler cells; they are there to ensure that the power rails on metal1 are connected across the entire row of standard cells.

klayout final/gds/mac.gds
_images/maclayout_ol2.png

Because the view offered by KLayout is purely structural, it is not easy to connect the layout to the higher-level properties of the design (such as net names, or names of registers). On the other hand, KLayout allows you to inspect the layout of individual standard cells. You can do this by selecting one of the standard cells in the design hierarchy and setting it as the new design top in KLayout. For example, the following shows a flip-flop layout.

_images/floplayout.png

Standard 574 Design Flow

In this course, we build a flow from Cadence tools. We also study individual aspects of the design flow separately, and this is reflected in how we organize the data files.

In the following example, we demonstrate the directory structure for the frontend.

.
├── constraints
│   └── constraints_clk.sdc
├── glsim
│   ├── Makefile
│   └── tb.sv
├── rtl
│   └── mac.sv
├── sim
│   ├── Makefile
│   └── tb.sv
├── sta
│   ├── Makefile
│   └── tempus.tcl
└── syn
    ├── genus_script.tcl
    └── Makefile

The source code of the design is stored in rtl. The code can be simulated with the testbench tb.sv under sim.

RTL Simulation

The following is the testbench tb.sv. Note the following aspects. The testbench records a value change dump (VCD) that allows inspection of the simulation using a waveform viewer such as gtkwave. The testbench applies pseudorandom stimuli, but also verifies the correctness of the generated results.

module tb;
logic [7:0] x1;
logic [7:0] x2;
logic [9:0] m;
logic [9:0] y;
logic reset;
logic clk;

mac dut(.x1(x1),
        .x2(x2),
        .m(m),
        .y(y),
        .reset(reset),
        .clk(clk));

always
begin
   clk = 1'b0;
   #5 clk = 1'b1;
   #5;
end

logic [15:0] test_m;
logic [9:0] test_y;

initial
begin
   $dumpfile("trace.vcd");
   $dumpvars(0, tb);
   x1 = 8'b0;
   x2 = 8'b0;
   test_m = 10'b0;
   test_y = 10'b0;

   reset = 1'b1;
   repeat(3)
      @(posedge clk);
   #1;
   reset = 1'b0;

   $display("m %d y %d", m, y);

   repeat(30)
     begin
        x1 = $random;
        x2 = $random;
        @(posedge clk);
        #1;

        test_m = (x1 * x2) >> 6;
        test_y = test_y + test_m;
        $display("x1 %d x2 %d m %d y %d exp_m %d exp_y %d ERR %d",
                 x1, x2, m, y, test_m, test_y, ~((test_m == m) && (test_y == y)));
        #1;
     end // repeat (30)

   $finish;
end

endmodule

To understand the simulation command, inspect the Makefile. The general principle of the ECE574 flow is to drive all implementation steps from a Makefile which calls the simulator with the proper command-line parameters.

This particular Makefile has two targets: sim and simg. For batch-mode simulation, use make sim. To access the simulator GUI, use make simg. A minimal sketch of such a Makefile is shown below.
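
This sketch assumes the Cadence xrun simulator with the same command-line options used later in this lecture for gate-level simulation; the actual Makefile in the repository may differ in details.

sim:
     xrun -timescale 1ns/1ps \
     tb.sv \
     ../rtl/mac.sv \
     -access +rwc \
     -top tb

simg:
     xrun -timescale 1ns/1ps \
     tb.sv \
     ../rtl/mac.sv \
     -access +rwc \
     -top tb \
     +gui

clean:
     rm -rf trace.vcd xcelium.d xrun.history xrun.log xrun.key *~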

RTL Synthesis

Building a gate-level netlist for the multiply-accumulate requires synthesis constraints as well as a synthesis script.

The synthesis constraints are stored under constraints/constraints_clk.sdc. Constraints include the clock period, as well as the input delay and output delay for ports. The constraints file is written in such a way that the user can provide a clock period value through an environment variable CLOCKPERIOD.

if {![info exists ::env(CLOCKPERIOD)] } {
   set clockPeriod 20
} else {
   set clockPeriod [getenv CLOCKPERIOD]
}

create_clock -name clk -period $clockPeriod [get_ports "clk"]

set_input_delay  0 -clock clk [all_inputs -no_clocks]
set_output_delay 0 -clock clk [all_outputs]

The use of environment variables allows you to develop scripts that can serve multiple designs and multiple purposes. In a Makefile we frequently rely on environment variables as well, to define tool parameters and options.

Consider the Makefile in the syn subdirectory.

all: syn

syn:
     BASENAME=mac \
     CLOCKPERIOD=4 \
     TIMINGPATH=/opt/cadence/libraries/gsclib045_all_v4.7/gsclib045/timing \
     TIMINGLIB=slow_vdd1v0_basicCells.lib \
     VERILOG='../rtl/mac.sv' \
     genus -f genus_script.tcl

clean:
     rm -rf outputs reports genus.log* genus.cmd* *~ fv

The command make syn runs Genus. Note how the Makefile sets several environment variables, including BASENAME, CLOCKPERIOD, TIMINGPATH, TIMINGLIB and VERILOG. These environment variables are used inside the Genus script, as well as by the design constraints. The use of environment variables allows you to separate the variable parts from the fixed parts (e.g., the scripts) in the design flow. Take a look at the example Genus script below. The environment variables are accessed from within the TCL environment.

if {![info exists ::env(TIMINGPATH)] } {
   puts "Error: missing TIMINGPATH"
   exit 1
}

if {![info exists ::env(TIMINGLIB)] } {
   puts "Error: missing TIMINGLIB"
   exit 1
}

set_db init_lib_search_path [getenv TIMINGPATH]
read_libs [getenv TIMINGLIB]

if {![info exists ::env(VERILOG)] } {
   puts "Error: missing VERILOG"
   exit 1
}

set_db init_hdl_search_path ../rtl/
read_hdl -language sv [getenv VERILOG]

elaborate
read_sdc ../constraints/constraints_clk.sdc

set_db syn_generic_effort high
set_db syn_map_effort high
set_db syn_opt_effort high

syn_generic
syn_map
syn_opt

if {![info exists ::env(BASENAME)] } {
   set basename "default"
} else {
   set basename [getenv BASENAME]
}

#reports
report_timing > reports/${basename}_report_timing.rpt
report_power  > reports/${basename}_report_power.rpt
report_area   > reports/${basename}_report_area.rpt
report_qor    > reports/${basename}_report_qor.rpt

set outputnetlist     outputs/${basename}_netlist.v
set outputconstraints outputs/${basename}_constraints.sdc
set outputdelays      outputs/${basename}_delays.sdf

write_hdl > $outputnetlist
write_sdc > $outputconstraints
write_sdf -timescale ns \
          -nonegchecks \
          -recrem split \
          -edges check_edge  \
          -setuphold split > $outputdelays

exit

Static Timing Analysis

The sta directory is used to run static timing analysis after gate-level synthesis. Similar to synthesis, the static timing analysis scripts can be developed such that all the variable parts are stored outside of the scripts as environment variables. This leads to the following Makefile.

sta:
     BASENAME=mac \
     TIMINGPATH=/opt/cadence/libraries/gsclib045_all_v4.7/gsclib045/timing \
     TIMINGLIB=slow_vdd1v0_basicCells.lib \
        tempus -files tempus.tcl

clean:
     rm -f *.rpt *.slk tempus.cmd* tempus.rpt* tempus.log* *~

As well as the following tempus script:

if {![info exists ::env(TIMINGLIB)] } {
       puts "Error: missing TIMINGLIB"
       exit 1
}

if {![info exists ::env(TIMINGPATH)] } {
       puts "Error: missing TIMINGPATH"
       exit 1
}

if {![info exists ::env(BASENAME)] } {
       puts "Error: missing BASENAME"
       exit 1
}

read_lib [getenv TIMINGPATH]/[getenv TIMINGLIB]

set basename [getenv BASENAME]

read_verilog ../syn/outputs/${basename}_netlist.v

set_top_module ${basename}

read_sdc ../syn/outputs/${basename}_constraints.sdc
read_sdf ../syn/outputs/${basename}_delays.sdf

report_timing -late -max_paths 10 > late.rpt
report_timing -early -max_paths 10 > early.rpt

report_timing  -from [all_inputs] \
               -to [all_outputs] \
               -max_paths 10 \
               -path_type summary  > allpaths.rpt
report_timing  -from [all_inputs] \
               -to [all_registers] \
               -max_paths 10 \
               -path_type summary  >> allpaths.rpt
report_timing  -from [all_registers] \
               -to [all_registers] \
               -max_paths 10 \
               -path_type summary >> allpaths.rpt
report_timing  -from [all_registers] \
               -to [all_outputs] \
               -max_paths 10 \
               -path_type summary >> allpaths.rpt
exit

Gate Level Simulation

After RTL synthesis and static timing analysis, you can also simulate the gate-level netlist of your hardware design. If your design is created following proper synchronous design practice, you will be able to reuse the same testbench as used for the RTL simulation.

However, since this is a gate-level simulation, there are a few things that deserve attention.

  1. A gate-level netlist uses cells from a cell library, and functional views for those cells have to be included in the simulation.

  2. A gate-level netlist, as an outcome of RTL synthesis, will adopt specific delays captured in an SDF file. The SDF file has to be applied in the testbench to ensure proper gate delays are simulated.

We’ll address the second problem first. Adding an SDF file to a simulation can be done using $sdf_annotate in Verilog. Thus, the following block is added to the testbench. This block is conditional on the definition of a Verilog macro USE_SDF. The idea is that we can simulate the gate-level netlist with or without timing back-annotation. When the timing back-annotation is not included (i.e., the USE_SDF macro is not defined), we get a purely functional simulation of the gate netlist. When timing back-annotation is included, gate delays are included and we will observe glitching effects as well as (possible) timing faults.

`ifdef USE_SDF
initial
  begin
     $sdf_annotate("../syn/outputs/mac_delays.sdf",tb.dut,,"sdf.log","MAXIMUM");

  end
`endif

The Makefile command addresses the first aspect of gate-level simulation: the simulation now includes a functional view of the standard cell library that was used to produce the gate-level netlist. The simulation Makefile below is used for gate-level simulation of the post-synthesis netlist. We can also perform the gate-level simulation after the layout is complete; that would be a post-layout simulation.

sim-postsyn:
     xrun -timescale 1ns/1ps \
     tb.sv \
     ../syn/outputs/mac_netlist.v \
     /opt/cadence/libraries/gsclib045_all_v4.7/gsclib045/verilog/slow_vdd1v0_basicCells.v \
     -access +rwc \
     -define USE_SDF \
     -top tb

sim-postsyn-gui:
     xrun -timescale 1ns/1ps \
     tb.sv \
     ../syn/outputs/mac_netlist.v \
     /opt/cadence/libraries/gsclib045_all_v4.7/gsclib045/verilog/slow_vdd1v0_basicCells.v \
     -access +rwc \
     -define USE_SDF \
     -top tb \
     +gui

clean:
     rm -rf trace.vcd  xcelium.d  xrun.history  xrun.log xrun.key *~

In this overview of the ECE574 design flow, we have not yet included the back-end portion of the flow. That back-end portion will be added after we discuss floorplanning and place-and-route using Cadence Innovus.

FuseSoC

The final flow which we will discuss is FuseSoC. FuseSoC is not a complete flow, but rather a package manager that can configure (and run) flows for complex SoC designs. FuseSoC supports the description of dependencies between cores. A core is captured in a .core file.

We have already used FuseSoC in the SoC lecture. The IBEX processor, for example, is described as follows in an ibex_core.core file. Only the first few lines are included. It defines a core with the name lowrisc:ibex:ibex_core, followed by a definition of the fileset called files_rtl. There are dependencies as well as specific files. The dependencies (such as lowrisc:prim:lfsr) point to other cores, while the files point to specific files.

name: "lowrisc:ibex:ibex_core:0.1"
description: "Ibex CPU Core Components"

filesets:
  files_rtl:
    depend:
      - lowrisc:prim:assert
      - lowrisc:prim:clock_gating
      - lowrisc:prim:lfsr
      - lowrisc:prim:mubi
      - lowrisc:ibex:ibex_pkg
      - lowrisc:ibex:ibex_icache
      - lowrisc:dv:dv_fcov_macros
    files:
      - rtl/ibex_alu.sv
      - rtl/ibex_branch_predict.sv
      - rtl/ibex_compressed_decoder.sv
      - rtl/ibex_controller.sv
      - rtl/ibex_cs_registers.sv
      - rtl/ibex_csr.sv
      - rtl/ibex_counter.sv
      - rtl/ibex_decoder.sv
      - rtl/ibex_ex_block.sv
      - rtl/ibex_fetch_fifo.sv
      - rtl/ibex_id_stage.sv
      - rtl/ibex_if_stage.sv
      - rtl/ibex_load_store_unit.sv
      - rtl/ibex_multdiv_fast.sv
      - rtl/ibex_multdiv_slow.sv
      - rtl/ibex_prefetch_buffer.sv
      - rtl/ibex_pmp.sv
      - rtl/ibex_wb_stage.sv
      - rtl/ibex_dummy_instr.sv
      - rtl/ibex_core.sv
      - rtl/ibex_pmp_reset_default.svh: {is_include_file: true}
    file_type: systemVerilogSource

The core dependencies will expand into additional core files. For example, the dependency lowrisc:prim:lfsr refers to a file called ./vendor/lowrisc_ip/ip/prim/prim_lfsr.core.

name: "lowrisc:prim:lfsr:0.1"
description: "A Linear-Feedback Shift Register (LFSR) primitive"
filesets:
  files_rtl:
    depend:
      - lowrisc:prim:assert
      - lowrisc:prim:cipher_pkg
    files:
      - rtl/prim_lfsr.sv
    file_type: systemVerilogSource

FuseSoC can also define one or more targets. These are the targets of the design flow, and can include simulation as well as concrete hardware synthesis (with tools such as yosys or vivado). A target defines a set of concrete parameters and filesets that should be used by a tool. A minimal sketch of a simulation target is shown below; a real synthesis target from the IBEX demo system appears later in this section.
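
The following sketch shows what a simulation target could look like in a .core file; the files_tb fileset and tb toplevel names are illustrative and do not come from a real core file.

targets:
  sim:                       # invoked with --target=sim
    default_tool: verilator  # tool selected when none is given on the command line
    filesets:
      - files_rtl            # reuse the RTL fileset
      - files_tb             # hypothetical testbench fileset
    toplevel: tb             # top module handed to the tool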

When FuseSoC is called, one of three actions can be specified; example invocations are shown after the list.

  1. In setup, FuseSoC will assemble all file sets into a dependency tree such that a flattened view of the design is created.

  2. In build, FuseSoC will call the tool specified in the tool flow. Tools can include e.g., verilator, vivado or yosys.

  3. In run, FuseSoC will call the output of the tool flow, with an action specific to the flow. For example, a verilator flow runs the simulation, while a vivado flow can configure an FPGA.
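
As a sketch based on the command forms used later in this lecture, the three actions map onto stage-selection flags of the fusesoc run command; the exact stage behavior may vary with the FuseSoC version.

# setup only: assemble the dependency tree and generate project files
fusesoc --cores-root=. run --target=sim --tool=verilator --setup lowrisc:ibex:demo_system

# setup + build: also compile the verilator simulation model
fusesoc --cores-root=. run --target=sim --tool=verilator --setup --build lowrisc:ibex:demo_system

# no stage flags: setup, build, and run in one invocation
fusesoc --cores-root=. run --target=sim --tool=verilator lowrisc:ibex:demo_system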

We will illustrate FuseSoC on a GitHub repository that uses IBEX in a demonstration system. The repository is at https://github.com/lowRISC/ibex-demo-system.

The IBEX demo system is a small demonstration system containing a RAM, a UART, a GPIO and an SPI peripheral. It is used to implement IBEX demo designs on an FPGA platform.

git clone https://github.com/lowRISC/ibex-demo-system

As the class server does not contain the vivado flow, we will demonstrate the design verification flow.

First, make sure you enable the fusesoc environment and the updated RedHat development toolset.

pyenv activate fusesoc
scl enable devtoolset-9 bash

You can then build a simulation as follows.

fusesoc --cores-root=. run --target=sim --tool=verilator --setup --build lowrisc:ibex:demo_system

The resulting simulator is built into build/lowrisc_ibex_demo_system_0/sim-verilator/, but to run it we first need a software application. The application software for the system is held under sw. To compile the C applications:

cd sw/c

# if build does not exist, create it
mkdir build
cd build

cmake ..
make

Finally, to run the simulator, provide a compiled application as an argument, for example:

# run from the main demo directory

./build/lowrisc_ibex_demo_system_0/sim-verilator/Vtop_verilator \
-t sim.fst \
--meminit=ram,./sw/c/build/demo/hello_world/demo

This simulation runs forever, so you have to stop it using Ctrl-C. As with the ibex example discussed earlier, the ‘-t’ parameter can be used to generate a trace (FST) file.

./build/lowrisc_ibex_demo_system_0/sim-verilator/Vtop_verilator \
         -t sim.fst \
         --meminit=ram,./sw/c/build/demo/hello_world/demo

Simulation of Ibex Demo System
==============================

Tracing can be toggled by sending SIGUSR1 to this process:
$ kill -USR1 7511

UART: Created /dev/pts/11 for uart0. Connect to it with any terminal program, e.g.
$ screen /dev/pts/11
UART: Additionally writing all UART output to 'uart0.log'.

Simulation running, end by pressing CTRL-c.
Tracing enabled.
Writing simulation traces to sim.fst
^CReceived stop request, shutting down simulation.

Simulation statistics
=====================
Executed cycles:  9256002
Wallclock time:   59.906 s
Simulation speed: 154509 cycles/s (154.509 kHz)
Trace file size:  170321473 B

You can view the simulation traces by calling
$ gtkwave sim.fst

Performance Counters
====================
Cycles:                     6465
Instructions Retired:       3677
LSU Busy:                   1624
Fetch Wait:                 285
Loads:                      987
Stores:                     637
Jumps:                      308
Conditional Branches:       220
Taken Conditional Branches: 127

To build the FPGA bitstream, you run the following command. The command ends in an error message on the class server, because vivado is not available. However, we will inspect the files that have been created.

fusesoc --cores-root=. run --target=synth --setup --build lowrisc:ibex:demo_system

Note that the target, in this case, says synth instead of sim. The ibex_demo_system.core specifies what happens for this target:

targets:
  default: &default_target
    filesets:
      - files_rtl
  synth:
    <<: *default_target
    default_tool: vivado
    filesets_append:
      - files_xilinx
      - files_constraints
    toplevel: top_artya7
    tools:
      vivado:
        part: "xc7a35tcsg324-1"  # Default to Arty A7-35
    parameters:
      - SRAMInitFile
      - PRIM_DEFAULT_IMPL=prim_pkg::ImplXilinx
    flags:
      use_bscane_tap: true

After the command terminates, inspect the directory structure under build. src is where the RTL source code tree is stored, sim-verilator is where the simulation is compiled, and synth-vivado is where bitstream generation is completed.

.
├── sim-verilator
├── src
└── synth-vivado

Inside of synth-vivado, you’ll find synthesis scripts and constraint files for this implementation. In particular, take a look at the *.tcl files.

If you have a laptop configured with WSL2, you can set up the entire implementation, including vivado. My setup includes vivado 2024.1, fusesoc, and ibex. Follow the configuration commands under https://github.com/lowRISC/ibex for setup and configuration of a RISC-V toolchain and the fusesoc setup. Then, run the FPGA synthesis:

fusesoc --cores-root=. run --target=synth --run lowrisc:ibex:demo_system

which yields the following synthesis results:

Primitive   Count
FLOP        5209
LUT         6110
BMEM        16
DMEM        88

The following is an FPGA floorplan of the IBEX demo system on an Artix-7 FPGA (the default target).

_images/ibex-demo-system-fpga.png