Memory Modelling
Inference & instantiation
- Inference: letting the synthesis tool recognize memory patterns in your code and implement them using built-in memory blocks (BRAM, etc.)
  - Pros: portable code that can be compiled for different FPGA architectures and by different simulation/synthesis tools
  - Cons: less control over the memory implementation; may not be optimal for specific use cases
- Instantiation: explicitly instantiating memory blocks provided by the FPGA vendor
  - Pros: more control over the memory implementation; can be optimized for specific use cases
  - Cons: more complex code; NOT portable across different FPGA architectures; requires the vendor's simulation/synthesis libraries to simulate
Memory simulation
- we typically use `reg [W-1:0] d_mem [2**S-1:0];` to model a memory of 2**S words, each W bits wide (an S-bit address space)
- to access the memory, we use `d_mem[addr]`, where `addr` is an S-bit address
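A minimal simulation-only sketch of the declaration and access pattern above, assuming W = 8 and S = 4 (16 words of 8 bits). The module name `mem_sim_demo` and the fill pattern are illustrative only. Note that in a 4-state simulator, words that were never written read as X, which is why the sketch initialises the whole array first.

```systemverilog
// Simulation-only sketch; mem_sim_demo is a hypothetical name.
module mem_sim_demo;
  localparam int W = 8;            // word width in bits
  localparam int S = 4;            // address width in bits -> 2**S = 16 words

  reg [W-1:0] d_mem [2**S-1:0];    // 16 words x 8 bits
  reg [S-1:0] addr;                // S-bit address

  initial begin
    // initialise every word; unwritten words would otherwise read as X
    for (int i = 0; i < 2**S; i++)
      d_mem[i] = i * 3;            // truncated to W bits on assignment
    addr = 5;
    $display("d_mem[%0d] = %0d", addr, d_mem[addr]);  // prints d_mem[5] = 15
    d_mem[addr] = 8'hAA;                                // overwrite word 5
    $display("d_mem[%0d] = %h", addr, d_mem[addr]);     // prints d_mem[5] = aa
  end
endmodule
```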
Memory elements: register files (RF)
- the simplest memory element is a register file: an array of registers that can be read and written (`reg [W-1:0] d_mem [2**S-1:0];`)
- typically used for small memories (e.g., CPU register files)
- can be inferred by synthesis tools from simple read/write logic
- write operations:
  - the write address is sent to a decoder (the array index) to select the specific register
  - data is written to the register at that address on the rising edge of the clock if write enable is high
- read operations:
  - the read address selects the specific register (in hardware, through a multiplexer)
  - data is read from the register at that address (the output can be combinational or registered)
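The write/read behaviour described above can be sketched as a register file with one synchronous write port and two combinational read ports, the shape a CPU register file typically has. The module and port names (`regfile_2r1w`, `we`, `waddr`, `raddr1`, ...) and the 2-read/1-write port count are illustrative assumptions, not taken from the notes.

```systemverilog
// Sketch of a 2-read / 1-write register file (hypothetical module name).
// W = register width, S = address width, matching the notation above.
module regfile_2r1w #(
  parameter int W = 32,              // width of each register
  parameter int S = 5                // address width -> 2**S registers
)(
  input  logic         clk,
  input  logic         we,           // write enable
  input  logic [S-1:0] waddr,        // write address (drives the write decoder)
  input  logic [W-1:0] wdata,        // data to be written
  input  logic [S-1:0] raddr1,       // read address, port 1 (selects via a mux)
  input  logic [S-1:0] raddr2,       // read address, port 2
  output logic [W-1:0] rdata1,       // combinational read data, port 1
  output logic [W-1:0] rdata2        // combinational read data, port 2
);
  logic [W-1:0] regs [2**S-1:0];     // 2**S registers, W bits each

  // synchronous write on the rising edge when write enable is high
  always_ff @(posedge clk)
    if (we) regs[waddr] <= wdata;

  // combinational (asynchronous) reads
  assign rdata1 = regs[raddr1];
  assign rdata2 = regs[raddr2];
endmodule
```

Because both reads are combinational, synthesis will usually build this from registers plus read multiplexers (or distributed RAM) rather than block RAM.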
BRAM (Block RAM)
- can transfer data between multiple clock domains on the FPGA target (e.g., each port of a dual-port BRAM can be clocked independently)
- stores large data sets on the FPGA more efficiently than lookup tables (LUTs) or flip-flops (FFs)
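Single port configuration
- a single port (one address, one write-data input, one read-data output) shared by reads and writes, so only one address can be accessed per clock cycle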
Dual port configuration
- a memory array with two independent read/write ports, each with its own address and data, supporting two simultaneous read/write operations per clock cycle
- DOES NOT offer priority resolution: if both ports access the same location in the same cycle (e.g., two writes, or a write on one port and a read on the other), the stored or read data may be corrupted, so the design must prevent or arbitrate such collisions
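A minimal sketch of a single-clock true dual-port RAM, written in the generic style synthesis tools commonly recognize; the module and port names are hypothetical. Because `mem` is written from two processes, plain `always` is used instead of `always_ff` (SystemVerilog does not allow a variable to be written by more than one `always_ff` block). Nothing in the model resolves a same-address collision, matching the "no priority resolution" point above.

```systemverilog
// Single-clock true dual-port RAM sketch (hypothetical module name).
module true_dual_port_ram #(
  parameter DATA_WIDTH = 8,          // width of each word
  parameter ADDR_WIDTH = 6           // address width -> 2**ADDR_WIDTH words
)(
  input  logic                  clk,
  // port A
  input  logic                  we_a,
  input  logic [ADDR_WIDTH-1:0] addr_a,
  input  logic [DATA_WIDTH-1:0] data_a,
  output logic [DATA_WIDTH-1:0] q_a,
  // port B
  input  logic                  we_b,
  input  logic [ADDR_WIDTH-1:0] addr_b,
  input  logic [DATA_WIDTH-1:0] data_b,
  output logic [DATA_WIDTH-1:0] q_b
);
  logic [DATA_WIDTH-1:0] mem [2**ADDR_WIDTH-1:0];

  // port A: synchronous write, synchronous (registered) read
  always @(posedge clk) begin
    if (we_a) begin
      mem[addr_a] <= data_a;
      q_a         <= data_a;         // on a write, forward the new data
    end else begin
      q_a         <= mem[addr_a];
    end
  end

  // port B: same structure, with its own address/data/enable
  always @(posedge clk) begin
    if (we_b) begin
      mem[addr_b] <= data_b;
      q_b         <= data_b;
    end else begin
      q_b         <= mem[addr_b];
    end
  end

  // No collision handling: if we_a && we_b && addr_a == addr_b, the stored
  // value is undefined in hardware; in simulation it depends on process order.
endmodule
```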

Distributed RAM
- best for very small memory sizes (e.g., less than 1-2 KB, register files)
- supports async reads and sync writes
- however, on the target used here Quartus did not infer a distributed RAM, and block RAM cannot perform an asynchronous read, so the memory is built out of general logic resources instead: 36 ALUTs and 64 registers for the 16-word x 4-bit model below (16 x 4 = 64 storage bits, one register each)
- the read data is available within the same clock cycle as the address (latency of less than 1 clock cycle), because the read is combinational
```systemverilog
module single_port_distributed_ram_model1 #(
  parameter DATA_WIDTH = 4, // width of data bus
  parameter ADDR_WIDTH = 4  // width of address bus
)(
  input  logic                  clk,        // clock
  input  logic                  wr_en,      // '1' indicates write, '0' indicates read (the read path is always active)
  input  logic [DATA_WIDTH-1:0] write_data, // data to be written to memory
  input  logic [ADDR_WIDTH-1:0] addr,       // address for write or read operation
  output logic [DATA_WIDTH-1:0] read_data   // read data from memory
);
  // two-dimensional memory array: 2**ADDR_WIDTH words of DATA_WIDTH bits
  logic [DATA_WIDTH-1:0] mem [2**ADDR_WIDTH-1:0];

  // synchronous write
  always_ff @(posedge clk) begin
    if (wr_en) mem[addr] <= write_data;
  end

  // asynchronous read
  // since the address is NOT registered for the read, the synthesizer will not map
  // this model to the internal block RAM IP; it builds the memory out of ALUTs instead
  assign read_data = mem[addr];
endmodule
```
- because block RAM requires a synchronous (registered) read, this asynchronous-read model cannot be mapped to it
Block/embedded RAM
- dedicated memory blocks available in FPGA fabric
- best for moderate to large memory sizes (e.g., 2MB)
- requires a sync read and a sync write
- the megafunction wizard in Quartus can be used to instantiate block RAMs with the desired configuration (e.g., single-port, dual-port, width, depth); the model below is instead inferred as a RAM megafunction by synthesis, using 0 ALUTs and 0 registers
```systemverilog
module single_port_bram_model1 #(
  parameter DATA_WIDTH = 4, // width of data bus
  parameter ADDR_WIDTH = 4  // width of address bus
)(
  input  logic                  clk,        // clock
  input  logic                  wr_en,      // '1' indicates write and '0' indicates read
  input  logic [DATA_WIDTH-1:0] write_data, // data to be written
  input  logic [ADDR_WIDTH-1:0] addr,       // address for write or read operation
  output logic [DATA_WIDTH-1:0] read_data   // read data from memory
);
  // two-dimensional memory array
  logic [DATA_WIDTH-1:0] mem [2**ADDR_WIDTH-1:0];
  logic [ADDR_WIDTH-1:0] read_addr_t;       // registered read address

  // synchronous write; the read address is registered on the same edge
  always_ff @(posedge clk) begin
    if (wr_en) mem[addr] <= write_data;
    // since the address is registered for the read, the synthesizer will map
    // the memory model to the internal embedded block RAM IP
    read_addr_t <= addr;                    // nonblocking, as required in always_ff
  end

  // synchronous read: the output follows the registered address,
  // so new read data appears one clock edge after addr is presented
  assign read_data = mem[read_addr_t];
endmodule
```
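Registering the address is one way to get the synchronous read that block RAM needs; another commonly used pattern registers the read data itself (the same style as ram_B in the example that follows). The sketch below assumes the same parameters and ports as `single_port_bram_model1`; the name `single_port_bram_model2` is hypothetical. One visible difference: on a write cycle this version returns the *old* contents of `mem[addr]` after the edge, while `single_port_bram_model1` returns the newly written data.

```systemverilog
// Single-port RAM with a registered read-data output (hypothetical name).
module single_port_bram_model2 #(
  parameter DATA_WIDTH = 4, // width of data bus
  parameter ADDR_WIDTH = 4  // width of address bus
)(
  input  logic                  clk,
  input  logic                  wr_en,      // '1' indicates write
  input  logic [DATA_WIDTH-1:0] write_data,
  input  logic [ADDR_WIDTH-1:0] addr,
  output logic [DATA_WIDTH-1:0] read_data
);
  logic [DATA_WIDTH-1:0] mem [2**ADDR_WIDTH-1:0];

  always_ff @(posedge clk) begin
    if (wr_en) mem[addr] <= write_data;     // synchronous write
    read_data <= mem[addr];                 // synchronous read; sees the old
                                            // contents on read-during-write
  end
endmodule
```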
Example
Consider the following two SystemVerilog memory models:
```systemverilog
module ram_A (
  input  logic       clk,
  input  logic       wr_en,
  input  logic [3:0] addr,
  input  logic [7:0] data_in,
  output logic [7:0] data_out);
  logic [7:0] mem [15:0];
  // write
  always_ff @(posedge clk) begin
    if (wr_en) mem[addr] <= data_in;
  end
  assign data_out = mem[addr];
endmodule

module ram_B (
  input  logic       clk,
  input  logic       wr_en,
  input  logic [3:0] addr,
  input  logic [7:0] data_in,
  output logic [7:0] data_out);
  logic [7:0] mem [15:0];
  always_ff @(posedge clk) begin
    if (wr_en)
      mem[addr] <= data_in;
    else
      data_out <= mem[addr];
  end
endmodule
```
Assume you target an Intel/Altera FPGA with RAM inference set to “auto”. Which statement is most accurate about how the synthesizer will implement these two modules?
A. Both ram_A and ram_B will infer Block RAMs, because both use a two-dimensional array.
B. Both ram_A and ram_B will infer Distributed RAM (LUT-based), because neither model registers the address explicitly.
C. ram_A will infer Distributed RAM, while ram_B will infer a single-port Block RAM with synchronous read.
D. ram_A will infer a True Dual-Port Block RAM and ram_B will infer a Simple Dual-Port Block RAM, since both have one write and one read.
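Answer: C.
- ram_A reads asynchronously (`assign data_out = mem[addr];`) and writes synchronously; distributed RAM is async read, sync write, so ram_A cannot use block RAM
- ram_B registers the read data on the clock edge; block RAM is sync read, sync write, so ram_B maps to a single-port block RAM
- D is wrong because each module has a single shared address, i.e. both are single-port memories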