Memory Modelling

Inference & instantiation

Inference: letting the synthesis tool recognize memory patterns in your code and implement them using built-in memory blocks (BRAM, etc.)
Pros: portable code that compiles on different FPGA architectures and works with different simulation/synthesis tools
Cons: less control over memory implementation, may not be optimized for specific use cases
Instantiation: explicitly instantiating memory blocks provided by the FPGA vendor
Pros: more control over memory implementation, can be optimized for specific use cases
Cons: more complex code, NOT portable across different FPGA architectures, requires the vendor's simulation/synthesis libraries to simulate

Memory simulation

we typically use reg [W-1:0] d_mem [2**S-1:0]; to model a memory of 2**S words, each W bits wide (S-bit address space)
to access memory, we use d_mem[addr], where addr is an S-bit address
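A minimal simulation-only sketch of the declaration and indexing above. The module name mem_model_demo and the values W = 8, S = 4 are arbitrary choices for illustration. It also shows that, in 4-state simulation, a word that has never been written reads back as X.

```systemverilog
// Simulation-only sketch: a memory of 2**S words, each W bits wide.
// mem_model_demo, W = 8 and S = 4 are illustrative choices.
module mem_model_demo;
  localparam int W = 8;                 // word width in bits
  localparam int S = 4;                 // address width: 2**S = 16 words
  reg [W-1:0] d_mem [2**S-1:0];         // unpacked array of W-bit words
  reg [S-1:0] addr;                     // S-bit address selects one word
  reg [W-1:0] word;

  initial begin
    addr = 4'd3;
    word = d_mem[addr];                 // never written yet: reads as X
    $display("before write: d_mem[%0d] = %h", addr, word);
    d_mem[addr] = 8'hA5;                // write one word by index
    word = d_mem[addr];                 // read it back with the same index
    $display("after write:  d_mem[%0d] = %h", addr, word);
  end
endmodule
```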

Memory elements: register files (RF)

the simplest memory element is a register file, which is an array of registers that can be read and written (reg [W-1:0] d_mem [2**S-1:0];)
typically used for small memories (e.g., CPU register files)
can be inferred by synthesis tools from simple read/write logic
write operations:
1. the write address is sent to the decoder (array index) to locate the specific register
2. data is written to the register at the specified address on the rising edge of the clock if write enable is high
read operations:
1. the read address is sent to the decoder to locate the specific register
2. data is read from the register at the specified address (can be combinational or registered output)
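The write and read steps above can be sketched as a CPU-style register file with one synchronous write port and two combinational read ports. This is an illustrative sketch: the module name regfile_2r1w, the port names and the 32 x 32 default size are assumptions, not taken from the notes.

```systemverilog
// Sketch of a CPU-style register file: 1 synchronous write port and
// 2 asynchronous (combinational) read ports. Names and sizes are illustrative.
module regfile_2r1w #(
  parameter int W = 32,                 // register width
  parameter int S = 5                   // address width: 2**S registers
)(
  input  logic         clk,
  input  logic         we,              // write enable
  input  logic [S-1:0] waddr,           // write address (decoded by the array index)
  input  logic [W-1:0] wdata,
  input  logic [S-1:0] raddr1,          // two independent read addresses
  input  logic [S-1:0] raddr2,
  output logic [W-1:0] rdata1,
  output logic [W-1:0] rdata2
);
  logic [W-1:0] regs [2**S-1:0];

  // write: happens on the rising edge, only when we is high
  always_ff @(posedge clk)
    if (we) regs[waddr] <= wdata;

  // read: combinational, data follows the read address without a clock edge
  assign rdata1 = regs[raddr1];
  assign rdata2 = regs[raddr2];
endmodule
```

Making the reads combinational lets a CPU read two source operands in the same cycle; a registered read would instead add one cycle of latency.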

BRAM (Block RAM)

able to transfer data between different clock domains (each port can run on its own clock) and across FPGA targets
stores large data sets on the FPGA more efficiently than lookup tables (LUTs) or flip-flops (FFs) can
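The clock-domain point can be sketched as a simple dual-port RAM whose write port and read port run on separate clocks. This follows the common dual-clock inference pattern, but the module and port names are illustrative, and whether a given tool actually maps it to block RAM should be confirmed in the synthesis report.

```systemverilog
// Sketch: simple dual-port RAM with independent write and read clocks.
// Names and default sizes are illustrative, not a vendor template.
module dual_clock_ram #(
  parameter int DATA_WIDTH = 8,
  parameter int ADDR_WIDTH = 6
)(
  // write side (clock domain A)
  input  logic                  wr_clk,
  input  logic                  wr_en,
  input  logic [ADDR_WIDTH-1:0] wr_addr,
  input  logic [DATA_WIDTH-1:0] wr_data,
  // read side (clock domain B)
  input  logic                  rd_clk,
  input  logic [ADDR_WIDTH-1:0] rd_addr,
  output logic [DATA_WIDTH-1:0] rd_data
);
  logic [DATA_WIDTH-1:0] mem [2**ADDR_WIDTH-1:0];

  // writes are clocked by wr_clk only
  always_ff @(posedge wr_clk)
    if (wr_en) mem[wr_addr] <= wr_data;

  // synchronous (registered) read, clocked by rd_clk only
  always_ff @(posedge rd_clk)
    rd_data <= mem[rd_addr];
endmodule
```

The RAM only provides the shared storage. Telling the read side when a word is valid (for example, the Gray-coded pointers of an asynchronous FIFO) still needs its own clock-domain-crossing logic.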

Single port configuration
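A single-port RAM shares one address port between reads and writes. The sketch below (module name and sizes are illustrative) registers the data output; because dout samples mem[addr] on the same edge that may write it, a read during a write returns the old contents in this particular coding style.

```systemverilog
// Sketch: single-port RAM with a registered data output.
// One address port is shared by the read and the write.
module single_port_ram_sync_out #(
  parameter int DATA_WIDTH = 8,
  parameter int ADDR_WIDTH = 4
)(
  input  logic                  clk,
  input  logic                  we,     // write enable
  input  logic [ADDR_WIDTH-1:0] addr,   // shared read/write address
  input  logic [DATA_WIDTH-1:0] din,
  output logic [DATA_WIDTH-1:0] dout
);
  logic [DATA_WIDTH-1:0] mem [2**ADDR_WIDTH-1:0];

  always_ff @(posedge clk) begin
    if (we) mem[addr] <= din;   // synchronous write
    dout <= mem[addr];          // synchronous read: value from before this edge's write
  end
endmodule
```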


Dual port configuration

a memory array with two read/write ports, supporting two simultaneous read or write operations (each port with its own address and data) per clock cycle
DOES NOT offer priority resolution: if both ports access the same location in the same cycle (in particular, two writes), the data may be corrupted unless the design prevents such collisions
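A true dual-port RAM can be sketched with two clocked blocks that each own one port. Plain always blocks are used because SystemVerilog does not allow two always_ff blocks to write the same variable. Module and port names are illustrative; note that nothing arbitrates between the ports.

```systemverilog
// Sketch: true dual-port RAM, two independent read/write ports on one clock.
// Names and sizes are illustrative, not a vendor template.
module true_dual_port_ram #(
  parameter int DATA_WIDTH = 8,
  parameter int ADDR_WIDTH = 4
)(
  input  logic                  clk,
  // port A
  input  logic                  we_a,
  input  logic [ADDR_WIDTH-1:0] addr_a,
  input  logic [DATA_WIDTH-1:0] din_a,
  output logic [DATA_WIDTH-1:0] dout_a,
  // port B
  input  logic                  we_b,
  input  logic [ADDR_WIDTH-1:0] addr_b,
  input  logic [DATA_WIDTH-1:0] din_b,
  output logic [DATA_WIDTH-1:0] dout_b
);
  logic [DATA_WIDTH-1:0] mem [2**ADDR_WIDTH-1:0];

  // port A: write and registered read
  always @(posedge clk) begin
    if (we_a) mem[addr_a] <= din_a;
    dout_a <= mem[addr_a];
  end

  // port B: no priority over port A. If both ports write the same address in
  // one cycle, the result is undefined in hardware and depends on simulator
  // scheduling in this model, so the surrounding design must avoid it.
  always @(posedge clk) begin
    if (we_b) mem[addr_b] <= din_b;
    dout_b <= mem[addr_b];
  end
endmodule
```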


Distributed RAM

best for very small memory sizes (e.g., less than 1-2 KB, regfiles)
supports async reads and sync writes
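however, block RAM needs a synchronous (registered) read, so Quartus does not map this asynchronous-read model to block RAM; it builds the memory out of general logic resources instead (for the model below: 36 ALUTs and 64 registers, i.e. one register per stored bit, 16 x 4)
the read data appears combinationally, in less than one clock cycle after the address is applied (no read latency)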

module single_port_distributed_ram_model1 #(
  parameter DATA_WIDTH=4, // width of data bus
  parameter ADDR_WIDTH=4  // width of address bus
)(
  input  logic                  clk,        // clock
  input  logic                  wr_en,      // '1' indicates write, '0' indicates read (read_data is driven continuously)
  input  logic [DATA_WIDTH-1:0] write_data, // data to be written to memory
  input  logic [ADDR_WIDTH-1:0] addr,       // address for write or read operation
  output logic [DATA_WIDTH-1:0] read_data   // read data from memory
);
  // two-dimensional memory array: 2**ADDR_WIDTH words of DATA_WIDTH bits
  logic [DATA_WIDTH-1:0] mem [2**ADDR_WIDTH-1:0];

  // synchronous write
  always_ff @(posedge clk) begin
    if (wr_en) mem[addr] <= write_data;
  end

  // asynchronous read
  // since the address is NOT registered for the read, the synthesizer will not map this
  // memory model to internal block RAM and creates a distributed RAM from ALUTs instead
  assign read_data = mem[addr];
endmodule

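this model falls back to logic because the block RAM requires a registered read address, which the model above does not provide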

Block/embedded RAM

dedicated memory blocks available in FPGA fabric
best for moderate to large memory sizes (e.g., 2MB)
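requires sync read and sync write
the megafunction wizard in Quartus can be used to instantiate block RAMs with the desired configuration (e.g., single-port, dual-port, width, depth, etc.); when the model below is inferred instead, Quartus implements it as an inferred RAM megafunction that uses 0 ALUTs and 0 registers, since the storage sits entirely in the block RAM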

module single_port_bram_model1 #(
  parameter DATA_WIDTH=4, // width of data bus
  parameter ADDR_WIDTH=4  // width of address bus
)(
  input  logic                  clk,        // clock
  input  logic                  wr_en,      // '1' indicates write and '0' indicates read
  input  logic [DATA_WIDTH-1:0] write_data, // data to be written
  input  logic [ADDR_WIDTH-1:0] addr,       // address for write or read operation
  output logic [DATA_WIDTH-1:0] read_data   // read data from memory
);
  // two-dimensional memory array
  logic [DATA_WIDTH-1:0] mem [2**ADDR_WIDTH-1:0];
  logic [ADDR_WIDTH-1:0] read_addr_t;       // registered read address

  // synchronous write; the read address is registered on the same edge
  always_ff @(posedge clk) begin
    if (wr_en) mem[addr] <= write_data;
    read_addr_t <= addr; // nonblocking; since the address is registered for read, the synthesizer maps the memory model to internal embedded block RAM
  end

  // read through the registered address: read_data reflects the address sampled at
  // the previous clock edge (one cycle of read latency), i.e. a synchronous read
  assign read_data = mem[read_addr_t];
endmodule

example

Question

Consider the following two SystemVerilog memory models:

module ram_A ( 
input logic clk, 
input logic wr_en, 
input logic [3:0] addr, 
input logic [7:0] data_in, 
output logic [7:0] data_out); 
  logic [7:0] mem [15:0]; 
  // write 
  always_ff @(posedge clk) begin 
    if (wr_en)  mem[addr] <= data_in; 
  end 
  assign data_out = mem[addr]; 
endmodule 


module ram_B ( 
input logic clk, 
input logic wr_en, 
input logic [3:0] addr, 
input logic [7:0] data_in, 
output logic [7:0] data_out); 
  logic [7:0] mem [15:0]; 
  always_ff @(posedge clk) begin 
    if (wr_en) 
      mem[addr] <= data_in; 
    else 
      data_out <= mem[addr]; 
  end 
endmodule

Assume you target an Intel/Altera FPGA with RAM inference set to “auto”. Which statement is most accurate about how the synthesizer will implement these two modules?

A. Both ram_A and ram_B will infer Block RAMs, because both use a two-dimensional array.
B. Both ram_A and ram_B will infer Distributed RAM (LUT-based), because neither model registers the address explicitly.
C. ram_A will infer Distributed RAM, while ram_B will infer a single-port Block RAM with synchronous read.
D. ram_A will infer a True Dual-Port Block RAM and ram_B will infer a Simple Dual-Port Block RAM, since both have one write and one read.

Answer

C.

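distributed RAM is async read, sync write: ram_A drives data_out with a continuous assignment, so its read is combinational

block RAM is sync read, sync write: ram_B updates data_out inside always_ff on the clock edge, so its read is registered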
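The timing difference behind the answer can be seen in simulation with a small demo that reads one shared memory both ways. This is a simulation-only illustration with hypothetical names (read_style_demo, q_async, q_sync); because it has a combinational read port, this combined module would itself not map to block RAM.

```systemverilog
// Simulation-only sketch contrasting the two read styles on one 16 x 8 memory.
// q_async behaves like ram_A's read, q_sync like ram_B's registered read.
module read_style_demo (
  input  logic       clk,
  input  logic       wr_en,
  input  logic [3:0] addr,
  input  logic [7:0] data_in,
  output logic [7:0] q_async,   // combinational read (distributed RAM style)
  output logic [7:0] q_sync     // registered read (block RAM style)
);
  logic [7:0] mem [15:0];

  always_ff @(posedge clk) begin
    if (wr_en) mem[addr] <= data_in;
    q_sync <= mem[addr];          // changes only at the next rising edge
  end

  assign q_async = mem[addr];     // follows addr within the same cycle
endmodule
```

When addr changes mid-cycle, q_async shows the new word immediately, while q_sync keeps the previous word until the next rising edge.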