This repository contains the RTL implementation and verification environment for an optimized FIR filter architecture, targeting low-power and efficient VLSI designs. The project focuses on optimizing a classic FIR filter using techniques such as adder-tree based accumulation, pipelining, and area-aware design decisions.
For efficient resource utilization, the FIR accumulates its partial products with an adder tree. The tree shortens the critical path, which increases the margin between data arrival time and data required time (timing slack). An example of the architecture is shown in the following figure.
- Balanced Adder Tree: Reduces critical path by summing partial products in a log-depth tree.
- Inferred Clock Gating: Uses clock-enable signals to reduce switching activity and dynamic power (synthesis friendly).
- Optimized Latency: 10-cycle latency (down from 16) with full pipelining.
- Verilog RTL: Compatible with Icarus Verilog, Verilator, and standard synthesis flows.
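The log-depth claim for the balanced adder tree can be illustrated with a small behavioral model. This is only a sketch in Python, not the repository's RTL; the function name `adder_tree_sum` and the 16 example values are hypothetical.

```python
def adder_tree_sum(values):
    """Sum values pairwise, level by level; return (total, depth)."""
    level = list(values)
    depth = 0
    while len(level) > 1:
        # Add neighbouring pairs; each pass models one adder stage.
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # odd element passes through to the next level
        level = nxt
        depth += 1
    return (level[0] if level else 0), depth


# 16 hypothetical partial products (one per tap in this example)
products = list(range(1, 17))
total, depth = adder_tree_sum(products)
print(total, depth)
```

For 16 inputs the tree needs 4 adder stages, whereas a linear accumulation chain would place 15 adders in series on the same path.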
sudo apt update
sudo apt install -y iverilog gtkwave make
Ensure you have Poetry installed. Then, run the following in the project root:
# Install dependencies into a virtual environment
poetry install
# Activate the virtual environment
poetry shell
The FIR coefficients are generated offline using Python.
To regenerate coefficients:
cd src
python3 filter_design.py
Notes:
- The script designs a band-pass FIR filter with Q15 fixed-point coefficients
- The number of taps and cutoff frequencies are configurable (default: 5–50 Hz)
- If parameters are unchanged, regeneration is optional
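As a rough illustration of what such a design step involves, the sketch below builds a Hamming-windowed sinc band-pass filter and quantizes it to Q15 using only the standard library. It is not the contents of filter_design.py: the function name `design_bandpass_q15`, the tap count of 31, and the 1000 Hz sampling rate are assumptions chosen for the example; only the 5–50 Hz band and the Q15 format come from the notes above.

```python
import math


def design_bandpass_q15(num_taps, f_lo, f_hi, fs):
    """Hamming-windowed sinc band-pass, quantized to Q15 integers."""
    mid = (num_taps - 1) / 2
    coeffs = []
    for n in range(num_taps):
        k = n - mid
        if k == 0:
            # Limit of the ideal band-pass impulse response at its centre
            ideal = 2 * (f_hi - f_lo) / fs
        else:
            ideal = (math.sin(2 * math.pi * f_hi * k / fs)
                     - math.sin(2 * math.pi * f_lo * k / fs)) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        coeffs.append(ideal * window)
    # Q15: scale by 2**15, round, saturate to the signed 16-bit range
    return [max(-32768, min(32767, round(c * 32768))) for c in coeffs]


# Assumed parameters: 31 taps, 1 kHz sampling rate (not stated in this README)
taps = design_bandpass_q15(num_taps=31, f_lo=5.0, f_hi=50.0, fs=1000.0)
print(taps)
```

The resulting integers are symmetric around the centre tap (linear phase), so they could be loaded directly as 16-bit signed coefficients.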
The Verilog testbench provides cycle-accurate simulation and waveform visualization. The testbench is located at tb/verilog/tb_fir.v.
# Run simulation and generate waveforms
make test-verilog
# View waveforms in GTKWave (auto-loads saved signal configuration)
make view-verilog
Workflow:
- make test-verilog compiles the RTL and testbench with Icarus Verilog and runs the simulation
- The simulation generates the wave.vcd waveform file
- make view-verilog opens GTKWave with pre-configured signals (wave.gtkw)
- GTKWave automatically displays key signals including m_axis_fir_tdata and s_axis_fir_tdata
Verification Scope:
- Clock and reset behavior
- FIR coefficient initialization
- Input/output data alignment across pipeline stages
- Critical path timing and latency
- Visual inspection of signal waveforms (e.g., sine wave output)
Manual Commands (Alternative):
# Compile and simulate
iverilog -g2005-sv -o sim_rtl.out src/fir.v tb/verilog/tb_fir.v
vvp sim_rtl.out
# View waveforms
gtkwave wave.gtkw
The cocotb testbench automates functional verification against a Python reference model. The testbench is located at tb/cocotb/test_fir.py and tests the design with various signal types.
# Run functional tests with Verilator
make test-cocotb
# View generated FST waveforms (with auto-loaded signals via TCL script)
make view-cocotb
Workflow:
- make test-cocotb runs the Verilator simulation with cocotb Python tests
- Drives Q15 fixed-point test vectors to the FIR module
- Captures RTL outputs and compares against Python golden reference
- Generates dump.fst waveform, HTML reports, plots, and detailed logs
- make view-cocotb opens GTKWave with signals auto-loaded via tcl_scripts/signals.tcl
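The golden-reference comparison in this workflow can be sketched in plain Python as follows. This is a minimal model under stated assumptions, not the code in tb/cocotb/test_fir.py: the helper names (`to_q15`, `fir_reference`, `compare`) are hypothetical, and the truncating shift back to Q15 is an assumption, since the RTL's rounding and saturation behavior is not described here. The 10-cycle latency is the figure quoted in the feature list.

```python
def to_q15(x):
    """Convert a float in [-1, 1) to a saturated signed 16-bit Q15 integer."""
    return max(-32768, min(32767, round(x * 32768)))


def fir_reference(samples, coeffs):
    """Direct-form FIR on Q15 integers with a full-precision accumulator."""
    history = [0] * len(coeffs)
    out = []
    for s in samples:
        history = [s] + history[:-1]  # shift the delay line
        acc = sum(c * h for c, h in zip(coeffs, history))  # Q30 products
        out.append(acc >> 15)  # assumed: truncate back to Q15
    return out


def compare(rtl_out, ref_out, latency):
    """Return indices where the latency-aligned RTL output differs from the reference."""
    aligned = rtl_out[latency:]
    return [i for i, (r, g) in enumerate(zip(aligned, ref_out)) if r != g]


coeffs = [to_q15(0.5), to_q15(0.25)]
stimulus = [to_q15(0.5), 0, 0]
ref = fir_reference(stimulus, coeffs)
fake_rtl = [0] * 10 + ref  # stand-in for captured RTL samples
print(compare(fake_rtl, ref, latency=10))
```

Aligning by the pipeline latency before comparing keeps the check exact (bit-for-bit) rather than tolerance-based; if the RTL rounds instead of truncating, the shift in `fir_reference` would need to change to match.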
Verification Scope:
- Numerical accuracy verification against reference model
- Filter behavior validation across multiple test datasets
- Automated regression testing
- Performance and correctness metrics
Provided datasets:
- Real neural LFP recordings: test_files/lfp
- Synthetic signals and noise: test_files/synthetic_noise
Input files can be switched in the testbench configuration.
The design is fully compatible with the OpenLane ASIC flow, targeting the SkyWater 130nm PDK.
To run the complete RTL-to-GDSII flow using LibreLane (OpenLane wrapper):
# Run full ASIC flow
make librelane
To visualize the final layout in OpenROAD:
# Open the synthesized design in OpenROAD GUI
make view-openroad
- Technology: SkyWater 130nm (sky130_fd_sc_hd)
- Area: ~1.13 mm² (107k cells)
- Performance: >50 MHz
- Utilization: 40%
The FIR architecture is optimized for:
- Reduced critical path using balanced adder trees
- Lower switching activity through pipelining and gating
- Area-efficient implementation suitable for neuromorphic signal processing
Both Verilog and cocotb testbenches are supported for flexible verification.

