
perilus, Part Four

Bradley Gannon

2026-02-01

TL;DR: perilus can now run simple programs. I fixed the issue that was breaking our beq test—which turned out to be related to the memory module—and added further tests for several R-type and I-type instructions. That gave me enough functionality to write a short program that computes the nth Fibonacci number.


Memory Address Bug

Last time around, Brady and I tried to add a test for the beq instruction, but it kept failing and we couldn’t figure out why. I assumed it was due to some error in my transcription of the Harris design into Chisel, but the root cause was more subtle.

When I wrote the memory module, I made the element size equal to the word width (32 bits in our case). This decision makes it easy to store and load complete words without extra hardware to handle concatenating individual bytes. The problem was that I hadn’t accounted for the translation between address space (bytes) and memory space (words), which are related by a factor of four. We hadn’t hit this issue yet because all of our tests were single instructions at address 0x0. The beq test was the first one to actually run more than one instruction, so it uncovered the bug.

The fix is simple: within the memory module, shift the address bits to the right by two before indexing into the inner memory.[1] This is easy in hardware and gets us the division by four that we need. It also forces alignment by always rounding unaligned addresses down to the nearest word boundary.[2] This might bite us later, but it’s the simplest solution so we’ll roll with it for now.
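To make the byte-to-word translation concrete, here is a minimal software model in Python. It is not the Chisel module from perilus: the class name, method names, and memory size are made up for illustration. It only shows how dropping the two low address bits both divides by four and rounds unaligned addresses down.

```python
WORD_BYTES = 4  # 32-bit words, as in the post
# Equivalent of log2Ceil(WORD_BYTES): 2 for four-byte words.
ADDR_SHIFT = (WORD_BYTES - 1).bit_length()


class WordMemory:
    """Hypothetical model of a memory whose elements are whole words."""

    def __init__(self, num_words):
        self.words = [0] * num_words

    def index(self, byte_addr):
        # Byte address -> word index. The shift discards the low bits,
        # so 0x4, 0x5, 0x6 and 0x7 all map to word 1.
        return byte_addr >> ADDR_SHIFT

    def load(self, byte_addr):
        return self.words[self.index(byte_addr)]

    def store(self, byte_addr, value):
        # Keep only the low 32 bits, like a 32-bit wide memory would.
        self.words[self.index(byte_addr)] = value & 0xFFFFFFFF


mem = WordMemory(num_words=4)
mem.store(0x4, 0xDEADBEEF)   # second instruction slot
aligned = mem.load(0x4)
unaligned = mem.load(0x7)    # rounds down to the same word
```

In this model, a test whose only instruction sits at address 0x0 never exercises the shift, since 0 >> 2 is still 0. That matches why the bug stayed hidden until a test fetched from 0x4.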

Dynamically Writing Memory Images for Tests

I wrote in my previous post that I was interested in making a helper function to write the required memory and register initialization files in each test, rather than keeping them in the repo. Well, I did that. It was kind of annoying to deal with Scala’s numeric types (which apparently don’t include an unsigned 32-bit integer?), but in the end it worked fine. I’m happy to remove the asset files that had been accumulating, because now all test logic can live in one place.
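As a rough illustration of what such a helper has to do, here is a sketch in Python rather than Scala. The file format is an assumption (one eight-digit hex word per line, as commonly used for hex memory images), and the function names are hypothetical. The masking step stands in for the unsigned 32-bit handling mentioned above.

```python
def format_image(words):
    """Render words as text, one 8-digit hex word per line.

    Masking with 0xFFFFFFFF maps negative values onto their 32-bit
    two's-complement bit pattern, so -1 becomes ffffffff.
    """
    return "".join(f"{w & 0xFFFFFFFF:08x}\n" for w in words)


def write_image(path, words):
    """Write the image to `path` so a test can create it on the fly."""
    with open(path, "w") as f:
        f.write(format_image(words))


image_text = format_image([0x00500093, -1])
```

A test could call `write_image` on a temporary file just before starting the simulation, so no image files need to be checked in.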

I also wrote some other helpers for testing R-type and I-type instructions. The helpers each take a few parameters and use them to assemble the bytes in the test instruction on the fly. Then they step through the instruction and verify the result by comparing it against the output of a given closure. I was maybe too proud of this little trick, which wasn’t really necessary but feels nice.
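The instruction assembly step can be sketched as follows. This is a Python illustration of the standard RV32I R-type and I-type bit layouts, not the actual Scala helpers from the repo. The function names, default opcodes, and the example closure are assumptions made for the sketch.

```python
def encode_r_type(funct7, rs2, rs1, funct3, rd, opcode=0b0110011):
    """R-type layout: funct7 | rs2 | rs1 | funct3 | rd | opcode."""
    return ((funct7 << 25) | (rs2 << 20) | (rs1 << 15)
            | (funct3 << 12) | (rd << 7) | opcode)


def encode_i_type(imm, rs1, funct3, rd, opcode=0b0010011):
    """I-type layout: imm[11:0] | rs1 | funct3 | rd | opcode."""
    # Mask the immediate to 12 bits so negative values encode correctly.
    return (((imm & 0xFFF) << 20) | (rs1 << 15)
            | (funct3 << 12) | (rd << 7) | opcode)


# add x3, x1, x2
add_word = encode_r_type(funct7=0, rs2=2, rs1=1, funct3=0, rd=3)
# addi x1, x0, 5
addi_word = encode_i_type(imm=5, rs1=0, funct3=0, rd=1)
# addi x1, x0, -1
addi_neg_word = encode_i_type(imm=-1, rs1=0, funct3=0, rd=1)

# A reference closure of the kind a test helper might compare against:
# the expected result of `add`, wrapped to 32 bits.
expected_add = lambda a, b: (a + b) & 0xFFFFFFFF
```

A helper built on these would preload the source registers, step the core through the encoded word, and compare the destination register with the closure's result for the same operands.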

Viewing Tests in GTKWave

Screenshot of GTKWave with the Fibonacci program’s VCD file open

While debugging the beq test failure, I figured out how to get the ChiselSim test harness to write Value Change Dump (VCD) files for each test. These are helpful because I can open them in GTKWave, a GUI application that displays all the test signals over time. It’s much better than printf debugging, which I’d been doing until then, and quickly led me to the problem.

The ChiselSim explanation page mentions the existence of an emitVcd option for the tests, but my ignorance of the Scala ecosystem and sbt left me without any idea where to actually put the option. Eventually I figured out that the proper invocation is, for example:

sbt 'testOnly com.rinthyAi.perilus.test.main.PerilusTests -- -DemitVcd=1'

This runs PerilusTests and writes the resulting VCDs to paths matching $PROJECT/build/chiselsim/**/workdir-verilator/trace.vcd.

Verbatim Assembly Script

I also wrote a little script to assemble files without any kind of relocation or other fancy assembler features. All it does is a one-to-one translation from assembly mnemonics to machine code in ASCII hex. I knew I could do this with GNU tools and started messing with them, but then I found llvm-mc and just went with that. This script has already helped me a few times while writing tests. It’s no surprise that it’s faster and more reliable than hand assembly.
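For a sense of how little such a script needs to do, here is a hedged Python sketch. It is not the script from the repo. It assumes llvm-mc is run with `--triple=riscv32 --show-encoding`, which annotates each instruction with a comment such as `# encoding: [0x13,0x05,0x10,0x00]`, and it turns those byte lists into little-endian hex words. The function names are hypothetical.

```python
import re
import subprocess

ENCODING_RE = re.compile(r"encoding:\s*\[([^\]]*)\]")


def encodings_to_hex(listing):
    """Extract one hex word per `encoding: [...]` comment in the listing."""
    words = []
    for match in ENCODING_RE.finditer(listing):
        raw = bytes(int(b, 16) for b in match.group(1).split(","))
        # RISC-V instructions are stored little-endian.
        word = int.from_bytes(raw, "little")
        words.append(f"{word:0{2 * len(raw)}x}")
    return words


def assemble(path):
    """Run llvm-mc on an assembly file (assumes llvm-mc is on PATH)."""
    result = subprocess.run(
        ["llvm-mc", "--triple=riscv32", "--show-encoding", path],
        capture_output=True, text=True, check=True,
    )
    return encodings_to_hex(result.stdout)


sample = "\t.text\n\taddi\ta0, zero, 1    # encoding: [0x13,0x05,0x10,0x00]\n"
sample_words = encodings_to_hex(sample)
```

Only `encodings_to_hex` is exercised here, on a hand-written sample line. The exact whitespace in real llvm-mc output may differ, which is why the regular expression keys on the `encoding:` comment alone.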


perilus has support for 12 of the 40 instructions in the RV32I set and is nearing parity with the Harris design. Next time, I’ll add a test for the jal instruction, which seems to be the only one that it’s missing compared to Harris. I’d also like to finish the R-type and I-type instructions and maybe get a few more B-types.


  1. More generally, the shift is log2Ceil(width.get / 8) for the class parameter width: Width.

  2. Brady pointed out that one of the safety requirements for Rust’s std::ptr::read is that the pointer must be aligned, which he said is due to this exact “forced alignment” behavior on some targets, particularly microcontrollers.