Lecture 1/29

Finite State Machines

  1/ Define iterative computation
  2/ Moore and Mealy automata
  3/ Mapping to Verilog
     - Safe FSM
     - Assign all outputs
  4/ Synthesis

5:00 Iterative Computations

  There are two different styles of implementing computations
  in digital hardware:

  1. Combinational

     - Reads all inputs and produces all outputs at once
       (conceptually, instantly):

           z = f(x)

     - The function f( ) can be implemented using combinational
       logic gates only: AND, OR, NOT

     - The complexity of f( ) defines the number of gates that are
       needed to implement the function. The number of gates can
       grow linearly with the word length (such as with addition),
       but it can also grow quadratically (such as with
       multiplication) or faster (such as with exponentiation)

     - Purely combinational computation is limited in practice to
       basic operations.

  2. Sequential

     - Reads the inputs using iterated operations:

           # input is split
           (x0, x1, x2, x3, ..) = x

           # iterated computation
           (z0, y1) = g(x0, y0)
           (z1, y2) = g(x1, y1)
           (z2, y3) = g(x2, y2)
           (z3, y4) = g(x3, y3)
           ...

           # output is recombined
           z = (z0, z1, z2, ...)

     - The function g( ) can still be implemented using combinational
       logic gates. g( ) is expected to have a lower complexity than
       f( ) because it has to do less work per step.

       Example: if f( ) represents a 32-bit addition, then g( ) could
       be a 1-bit addition, and g( ) could compute the result of f( )
       in 32 steps

     - A new intermediate variable y carries the condition from one
       step to the next. The initial condition of each step is the
       final condition of the previous step. g( ) therefore has TWO
       inputs (xi and yi) and TWO outputs (zi and yi+1)

     - Iterated computation can be implemented as combinational logic
       (by chaining copies of g( ), as in the equations above), but it
       can also be implemented as sequential logic.
       This requires state (register):

                    +-----------+
                    |           |
             xi ----+           +--> zi
                    |     g     |
               +----+           +-->-- yi+1 --+
               |    |           |             |
               |    +-----------+             |
               |    +===========+             |
               +----<  MEMORY   <-------------+
                    +===========+

     - There are two standard techniques to implement g:
       Moore automata and Mealy automata

5:10 Mealy Automata

  A Mealy automaton is a structure consisting of 5 elements:

      MealyFSM = (I, Q, S, G, H)

      - set of input values I
      - set of output values Q
      - set of states S
      - A state transition function G: S x I -> S
      - An output function H: S x I -> Q

  Example: a two-bit enabled counter

      module twobitenabled(input  clk,
                           input  reset,
                           input  enable,
                           output [1:0] q);

  q will increment each time enable is 1.
  When q=11, it will wrap around to 00.

      inputs I:  enable signal {0, 1}
      output Q:  q signal {00, 01, 10, 11}
      states S:  S = {S0, S1, S2, S3}
                 The counting activity has to distinguish four
                 different states to allow four different outputs.

      state transition function G: S x I -> S

          G0 = G when enable signal is 0
             = {(S0,S0), (S1,S1), (S2,S2), (S3,S3)}

             (S0,S0) reads as: if the start state is S0,
             then the target state is S0

          G1 = G when enable signal is 1
             = {(S0,S1), (S1,S2), (S2,S3), (S3,S0)}

      output function H: S x I -> Q

          H0 = H when enable signal is 0
             = {(S0, 00), (S1, 01), (S2, 10), (S3, 11)}

          H1 = H when enable signal is 1
             = {(S0, 00), (S1, 01), (S2, 10), (S3, 11)}

          Note that H0 = H1 here: for this particular counter the
          output depends only on the state.

  STATE TRANSITION TABLE (each entry: next state, output q):

      state\ enable      0            1
      -----+------------------------------
      S0   |          S0, 00       S1, 00
      S1   |          S1, 01       S2, 01
      S2   |          S2, 10       S3, 10
      S3   |          S3, 11       S0, 11

  STATE TRANSITION GRAPH

      (Refer to in-class overhead)

5:20 Moore Automata

  Similar to Mealy. Only the output function is simplified:

      MooreFSM = (I, Q, S, G, H)

      - set of input values I
      - set of output values Q
      - set of states S
      - A state transition function G: S x I -> S
      - An output function H: S -> Q

  For Moore machines, we can write a similar state transition table,
  with the difference that the output is tied to the state, not to
  the state transition.
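The difference between the two output functions can be sketched in Verilog. The fragment below is an illustration only and is not one of the lecture's examples: the module name outputstyles, the one-bit state, and the signal names x, q_moore and q_mealy are all hypothetical. q_moore is computed from the state alone (H: S -> Q), while q_mealy also reads the input (H: S x I -> Q).

```verilog
// Hypothetical sketch: one state bit, observed through a Moore-style
// output and a Mealy-style output. All names here are assumptions.
module outputstyles(input clk, input reset, input x,
                    output q_moore, output q_mealy);

  reg state, statenext;

  // state register (synchronous reset assumed)
  always @(posedge clk) begin
    if (reset) state <= 1'b0;
    else       state <= statenext;
  end

  // state transition function G: toggle the state when x is 1
  always @(*) begin
    statenext = state;             // default: hold
    if (x) statenext = ~state;
  end

  // Moore output: function of the state only
  assign q_moore = state;

  // Mealy output: function of the state AND the current input
  assign q_mealy = state & x;

endmodule
```

In this sketch q_moore can only change after a clock edge, whereas q_mealy can change as soon as x changes, because x feeds the output logic directly.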
  Example: a bit-serial adder

      module bitserialadd(input  clk,
                          input  reset,
                          input  a,
                          input  b,
                          output q);

  After reset, a and b are digital words, processed from LSB to MSB.
  q is the sum bit, produced from LSB to MSB.

      a        1011
      b        1001
          ------------
      q      1|0100

  Bit by bit (each sum is written as carry + a + b):

      Iteration 0        1 + 1  => q = 0  carry = 1
                                                  |
                      +---------------------------+
                      |
      Iteration 1     1 + 1 + 0  => q = 0  carry = 1
                                                  |
                      +---------------------------+
                      |
      Iteration 2     1 + 0 + 0  => q = 1  carry = 0
                                                  |
                      +---------------------------+
                      |
      Iteration 3     0 + 1 + 1  => q = 0  carry = 1

      Inputs I:   ab = {00, 01, 10, 11}
      Outputs Q:  q = {0, 1}
      States S:   We need to remember the carry bit from one
                  iteration (bit position) to the next, as well as
                  the output bit. So we need four states.

                  S = {S0, S1, S2, S3}

                  S0    q=0, carry=0
                  S1    q=1, carry=0
                  S2    q=0, carry=1
                  S3    q=1, carry=1

      State transition function G: S x I -> S

          G00  {(S0,S0), (S1,S0), (S2,S1), (S3,S1)}
          G01  {(S0,S1), (S1,S1), (S2,S2), (S3,S2)}
          G10  {(S0,S1), (S1,S1), (S2,S2), (S3,S2)}
          G11  {(S0,S2), (S1,S2), (S2,S3), (S3,S3)}

      Output function H: S -> Q

          H = {(S0, 0), (S1, 1), (S2, 0), (S3, 1)}

  STATE TRANSITION TABLE:

      state\ ab    00    01    10    11   |   q
      -----+------------------------------+-------
      S0   |       S0    S1    S1    S2   |   0
      S1   |       S0    S1    S1    S2   |   1
      S2   |       S1    S2    S2    S3   |   0
      S3   |       S1    S2    S2    S3   |   1

  STATE TRANSITION GRAPH

      (Refer to in-class overhead)

  * It's possible to show that for every Mealy machine, there's a
    Moore machine that has the same behavior, i.e., that produces
    the same sequence of outputs for the same sequence of inputs.

  * It's possible to demonstrate that a given Moore/Mealy machine is
    minimal, i.e., that every state is unique and that there are no
    redundant or 'equivalent' states.

5:30 Mapping to Verilog

  In Hardware Design we use FINITE STATE MACHINES to implement
  Moore and Mealy automata. The term 'automaton' is slightly more
  general than FSM. Every FSM is an automaton, but not every
  automaton is an FSM. For the rest of this lecture, we will focus
  on FSM.

  To discuss the mapping of an FSM to Verilog, it's essential to
  keep the structure of the FSM in mind.

          +----------------------------------------+
          |                                        |
          |    +---------------+      +---------+  |     +------------+
          +--->+  Next State   +----->+ Register+--+---->+   Output   |
      --+----->+  Function     |      |         |    +-->+  Function  +--->
        |      +---------------+      +---------+    |   +------------+
        |                                            |
        +-------------( for Mealy FSM )--------------+

  So we need to write Verilog code

      - to capture the next state function
      - to capture the state register
      - to capture the output function

  The design process of a Finite State Machine is as follows:

      1. Create a state transition graph
      2. Choose a state encoding
      3. Extract a state transition table
      4. Create the next state logic and output logic

  Depending on how you write your Verilog, HDL synthesis tools will
  handle some of these steps for you:

      Method I:
         - Step 1, 2, 3 by hand
         - Create a STRUCTURAL Verilog Model
         - Tools synthesize logic for step 4.

      Method II:
         - Step 1, 2 by hand
         - Create a BEHAVIORAL Verilog Model with explicit
           state encoding
         - Tools synthesize logic for step 3, 4.

      Method III:
         - Step 1 by hand
         - Create a BEHAVIORAL Verilog Model with automatic
           state encoding
         - Tools synthesize logic for step 2, 3, 4.

  Method III is less work than Method II, which is less work than
  Method I. We will focus on writing Verilog following Method III.
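As a minimal sketch of how the three parts of the FSM structure (next-state function, register, output function) could look in behavioral Verilog, here is the two-bit enabled counter from the Mealy section. The module header follows that example; declaring q as 'output reg' and using a synchronous reset are assumptions made for this sketch, not something the lecture prescribes.

```verilog
// Sketch only: two-bit enabled counter written as three behavioral blocks.
// Assumptions (not in the lecture): q is 'output reg', reset is synchronous.
module twobitenabled(input clk, input reset, input enable,
                     output reg [1:0] q);

  localparam S0 = 0, S1 = 1, S2 = 2, S3 = 3;
  reg [1:0] state, statenext;

  // register
  always @(posedge clk) begin
    if (reset) state <= S0;
    else       state <= statenext;
  end

  // next-state function G: advance only when enable is 1
  always @(*) begin
    statenext = state;               // default: hold the state (G0)
    if (enable) begin
      case (state)                   // G1
        S0:      statenext = S1;
        S1:      statenext = S2;
        S2:      statenext = S3;
        S3:      statenext = S0;
        default: statenext = S0;
      endcase
    end
  end

  // output function H: the count value associated with each state
  always @(*) begin
    case (state)
      S0:      q = 2'b00;
      S1:      q = 2'b01;
      S2:      q = 2'b10;
      S3:      q = 2'b11;
      default: q = 2'b00;
    endcase
  end

endmodule
```

The state names are symbolic constants, so the choice of encoding (step 2 of the design process) can be left to the synthesis tool.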
  Mapping of bit-serial adder using Method III

  - Using three different behavioral blocks

      Next State Function:   always @(*)
      Register:              always @(posedge clk)
      Output Function:       always @(*)

      module bitserialadd(input  clk,
                          input  reset,
                          input  a,
                          input  b,
                          output q);

        localparam S0 = 0, S1 = 1, S2 = 2, S3 = 3;
        reg [1:0] state, statenext;

        // register
        always @(posedge clk)
          begin
            if (reset)               // synchronous reset
              state <= S0;           // initial state
            else
              state <= statenext;
          end

        // next-state logic
        always @(*)
          begin
            statenext = state;       // default assignment
            case (state)
              S0,S1:                 // carry = 0
                begin
                  if (a & b)
                    statenext = S2;
                  else if (a | b)
                    statenext = S1;
                  else
                    statenext = S0;
                end
              S2,S3:                 // carry = 1
                begin
                  if (a & b)
                    statenext = S3;
                  else if (a | b)
                    statenext = S2;
                  else
                    statenext = S1;
                end
              default:
                statenext = S0;      // safe
            endcase
          end

        // output function: q = 1 in S1 and S3 (H = {0, 1, 0, 1})
        assign q = ((state == S1) || (state == S3)) ? 1'b1 : 1'b0;

      endmodule

  Remarks:

  1/ The state is a 'symbolic assignment'. That means that we can
     ask the tool to optimize the encoding of the four states, even
     using more bits than strictly needed for it (cfr. one-hot
     encoding, later).

        localparam S0 = 0, S1 = 1, S2 = 2, S3 = 3;
        reg [1:0] state, statenext;

  2/ CODING STYLE: Every register has two 'reg' variables in
     Verilog. These variables correspond to the input and the output
     of the register. statenext is the input; state is the output.

        In the always @(posedge clk), state is assigned
        In the always @(*), statenext is assigned

  3/ CODING STYLE: In every always @(*), the output variables are
     ALWAYS assigned by default.

        always @(*)
          begin
            statenext = state;       // default assignment
            ..

     This is true for every reg variable that you assign in an
     @(*) block. If you do an assignment on it, you MUST introduce a
     default assignment. Without it, any path through the block that
     does not assign the variable makes the tool infer a latch.

  4/ CODING STYLE: Every case statement ALWAYS has a default
     assignment.

        default:
          statenext = S0;            // safe

     For FSM, such a default assignment has the effect of creating
     a safe FSM, that is, an FSM that will return to a known state
     if it ever ends up in an unknown state.
     For example, assume there are 3 bits in the state register,
     and you have only five states. That means that out of the
     2^3 = 8 possible state encodings, you're only using 5. So there
     are three unknown or 'illegal' states. The next-state logic
     MUST specify what should happen in those illegal states. Hence,
     you MUST have a default assignment.

5:40 Synthesis

  We demonstrate the synthesis of the FSM by encapsulating the FSM
  as follows:

      module top(input        clock,
                 input  [3:0] key,
                 output [9:0] led);

        // port order of bitserialadd: clk, reset, a, b, q
        bitserialadd dut(clock, key[0], key[1], key[2], led[0]);

      endmodule

  Here, top is a module that maps onto the DE1SoC. In other words,
  we have tied the inputs to buttons and the output to an LED. That
  enables us to study the synthesis results of the tool.

  We synthesize this as before with

  ================================================================
  quartus_sh --flow compile top
  ================================================================

  This generates several report files:

      top.flow.rpt    Overall flow results & settings
      top.map.rpt     Analyze Verilog, generate FPGA netlist    ****
      top.fit.rpt     Map FPGA netlist into FPGA fabric
      top.asm.rpt     Generate bitstream
      top.sta.rpt     Static Timing Analysis

  The output of interest, which describes finite state machine
  compilation, is in top.map.rpt.
  It says:

  Encoding Type:  One-Hot
  +------------------------------------------------------+
  ; State Machine - |top|bitserialadd:dut|state          ;
  +----------+----------+----------+----------+----------+
  ; Name     ; state.S3 ; state.S2 ; state.S1 ; state.S0 ;
  +----------+----------+----------+----------+----------+
  ; state.S0 ; 0        ; 0        ; 0        ; 0        ;
  ; state.S1 ; 0        ; 0        ; 1        ; 1        ;
  ; state.S2 ; 0        ; 1        ; 0        ; 1        ;
  ; state.S3 ; 1        ; 0        ; 0        ; 1        ;
  +----------+----------+----------+----------+----------+

  The report also says:

  +------------------------------------------------------------+
  ; Registers Removed During Synthesis                         ;
  +---------------------------------------+--------------------+
  ; Register name                         ; Reason for Removal ;
  +---------------------------------------+--------------------+
  ; bitserialadd:dut|state.S2             ; Lost fanout        ;
  ; bitserialadd:dut|state~2              ; Lost fanout        ;
  ; bitserialadd:dut|state~3              ; Lost fanout        ;
  ; Total Number of Removed Registers = 3 ;                    ;
  +---------------------------------------+--------------------+

  +------------------------------------------------------+
  ; General Register Statistics                          ;
  +----------------------------------------------+-------+
  ; Statistic                                    ; Value ;
  +----------------------------------------------+-------+
  ; Total registers                              ; 3     ;
  ; Number of registers using Synchronous Clear  ; 0     ;
  ; Number of registers using Synchronous Load   ; 0     ;
  ; Number of registers using Asynchronous Clear ; 0     ;
  ; Number of registers using Asynchronous Load  ; 0     ;
  ; Number of registers using Clock Enable       ; 0     ;
  ; Number of registers using Preset             ; 0     ;
  +----------------------------------------------+-------+

  So what happened?

  1/ The synthesis tools did not follow our suggested encoding:

        localparam S0 = 0, S1 = 1, S2 = 2, S3 = 3;
        reg [1:0] state, statenext;

     but instead used something called 'one-hot' encoding.
     The idea of one-hot encoding is to allocate one register per
     state. So for four states, we would have 4 registers.
  2/ The synthesis tools further optimized the design and removed
     the register used for state S2. That state is implied: if
     three registers record S0, S1, and S3, then none of them being
     active implies that the FSM is in state S2.

  Check for yourself:

     Run this design in Quartus and double-click on Compile Design.
     Then, go to Analysis and Synthesis - Netlist Viewers and click
     on RTL Viewer, and after that on Technology Map Viewer.

     (See slides)

5:50 State Encoding

  Finite State Machines can use different kinds of state encoding.
  Quartus Prime provides the following options:

  "default"      Use an encoding based on the number of enumeration
                 literals in the Enumeration Type. If the number of
                 literals is less than five, use the "sequential"
                 encoding. If the number of literals is more than
                 five, but fewer than 50, use a "one-hot" encoding.
                 Otherwise, use a "gray" encoding.

  "sequential"   Use a binary encoding in which the first
                 enumeration literal in the Enumeration Type has
                 encoding 0 and the second 1.

  "gray"         Use an encoding in which the encodings for adjacent
                 enumeration literals differ by exactly one bit. An
                 N-bit gray code can represent 2^N values.

  "johnson"      Use an encoding similar to a gray code. An N-bit
                 Johnson code can represent at most 2N states, but
                 requires less logic than a gray encoding.

  "one-hot"      The default encoding style requiring N bits, in
                 which N is the number of enumeration literals in
                 the Enumeration Type.

  "compact"      Use an encoding with the fewest bits.

  "user"         Encode each state using its value in the Verilog
                 source. By changing the values of your state
                 constants, you can change the encoding of your
                 state machine.

  You can specify this in the Verilog using a SYNTHESIS ATTRIBUTE,
  or else using a tool option.
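A minimal sketch of the synthesis-attribute route is shown below. It assumes that the Quartus attribute is named syn_encoding and that it is attached to the state register declaration using Verilog-2001 attribute syntax; check the Quartus documentation for your tool version before relying on it. The module encodingdemo and its ports are hypothetical and serve only to give the attribute a four-state FSM to act on.

```verilog
// Sketch only: requesting a gray encoding for one state register.
// Assumption: 'syn_encoding' is the attribute name understood by Quartus.
// The module name and ports are hypothetical.
module encodingdemo(input clk, input reset, input enable, output last);

  localparam S0 = 0, S1 = 1, S2 = 2, S3 = 3;

  // the attribute applies to the declaration that follows it
  (* syn_encoding = "gray" *) reg [1:0] state;
  reg [1:0] statenext;

  // register
  always @(posedge clk) begin
    if (reset) state <= S0;
    else       state <= statenext;
  end

  // next-state logic: step through S0..S3 while enable is 1
  always @(*) begin
    statenext = state;               // default assignment
    if (enable) begin
      case (state)
        S0:      statenext = S1;
        S1:      statenext = S2;
        S2:      statenext = S3;
        S3:      statenext = S0;
        default: statenext = S0;     // safe
      endcase
    end
  end

  // output function: flag the last state of the sequence
  assign last = (state == S3);

endmodule
```

Replacing "gray" with one of the other option names listed above would request that encoding instead. Whether the request was honored can be checked in the State Machine section of top.map.rpt.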
  Coding examples -

      Gray Code

           0    0000
           1    0001
           2    0011
           3    0010
           4    0110
           5    0111
           6    0101
           7    0100
           8    1100
           9    1101
          10    1111
          11    1110
          12    1010
          13    1011
          14    1001
          15    1000

      Johnson Code

           0    0000
           1    1000
           2    1100
           3    1110
           4    1111
           5    0111
           6    0011
           7    0001

6:00 Conclusion

  - Iterative Computation
  - Mealy FSM
  - Moore FSM
  - Mapping to Verilog
  - State Assignment
  - Synthesis

  - Coming up:
     - Decomposition of Control problems into multiple FSM
     - Handshaking between FSM

  - Next lecture:
     - FSM continued
     - Introduction to Homework 2
     - Simulation internals