perilus, Part Four
2026-02-01
TL;DR: perilus can now run simple
programs. I fixed the issue that was breaking our beq
test—which turned out to be related to the memory module—and added
further tests for several R-type and I-type instructions. That gave me
enough functionality to write a
short program that computes the nth Fibonacci
number.
Last time around, Brady and I tried to add a test for the
beq instruction, but it kept failing and we couldn’t figure
out why. I assumed it was due to some error in my transcription of the
Harris design into Chisel, but the root cause was more subtle.
When I wrote the memory module, I made the element size equal to the
word width (32 bits in our case). This decision makes it easy to store
and load complete words without extra hardware to handle concatenating
individual bytes. The problem was that I hadn’t accounted for the
translation between address space (bytes) and memory space (words),
which are related by a factor of four. We hadn’t hit this issue yet
because all of our tests were single instructions at address
0x0. The beq test was the first one to
actually run more than one instruction, so it uncovered the bug.
The fix is simple: within the memory module, shift the address bits to the right by two before indexing into the inner memory.[1] This is easy in hardware and gets us the division by four that we need. It also forces alignment by always rounding down to the nearest word for unaligned addresses.[2] This might bite us later, but it’s the simplest solution so we’ll roll with it for now.
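To make the byte-to-word translation concrete, here is a minimal software model in plain Scala. It is not the Chisel source: the object and method names are made up for illustration, and the local `log2Ceil` only stands in for the Chisel utility of the same name.

```scala
// Software model of the address translation; not the actual Chisel module.
// All names here are hypothetical.
object WordAddressing {
  // Ceiling of log2(n) for n >= 1, standing in for chisel3.util.log2Ceil.
  def log2Ceil(n: Int): Int = {
    require(n >= 1, "n must be positive")
    32 - Integer.numberOfLeadingZeros(n - 1)
  }

  // Number of low address bits that select a byte within one word.
  def byteOffsetBits(wordWidthBits: Int): Int = log2Ceil(wordWidthBits / 8)

  // Map a byte address to an index into a memory whose elements are whole words.
  // The unsigned shift drops the byte-offset bits, so an unaligned address
  // rounds down to the word that contains it.
  def wordIndex(byteAddr: Long, wordWidthBits: Int = 32): Long =
    byteAddr >>> byteOffsetBits(wordWidthBits)
}
```

With 32-bit words, `WordAddressing.wordIndex(0x4L)` gives 1 and `WordAddressing.wordIndex(0x7L)` also gives 1. If the byte address were used directly as the index, the second instruction at 0x4 would be fetched from element 4 instead of element 1, which fits a bug that only appears once a test runs more than one instruction.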
I wrote in my previous post that I was interested in making a helper function to write the required memory and register initialization files in each test, rather than keeping them in the repo. Well, I did that. It was kind of annoying to deal with Scala’s numeric types—which apparently don’t include an unsigned 32-bit integer?—but in the end it worked fine. I’m happy to remove the asset files which had started accumulating up to that point because now all test logic can live in one place.
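A rough sketch of what such a helper could look like follows. The file format (one eight-digit hex word per line) and every name in it are assumptions for illustration, not the actual perilus helper. It sidesteps the missing unsigned type by carrying each word in a `Long` and range-checking it.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

// Hypothetical helper; the hex-per-line format is an assumption.
object InitFiles {
  // Format one 32-bit word as eight hex digits. The value is a Long because
  // an Int would turn negative for anything above 0x7FFFFFFF.
  def hexWord(word: Long): String = {
    require(word >= 0L && word <= 0xFFFFFFFFL, s"not a 32-bit value: $word")
    f"$word%08x"
  }

  // Write the words to a temporary file, one per line, and return its path.
  // The file is marked for deletion when the JVM exits.
  def writeHexFile(words: Seq[Long]): Path = {
    val path = Files.createTempFile("perilus-init-", ".hex")
    val text = words.map(hexWord).mkString("", "\n", "\n")
    Files.write(path, text.getBytes(StandardCharsets.UTF_8))
    path.toFile.deleteOnExit()
    path
  }
}
```

A test could then call something like `InitFiles.writeHexFile(Seq(0x00500093L))` and pass the returned path to the design under test, so nothing has to be checked into the repo.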
I also wrote some other helpers for testing R-type and I-type instructions. The helpers each take a few parameters and use them to assemble the bytes in the test instruction on the fly. Then they step through the instruction and verify the result by comparing it against the output of a given closure. I was maybe too proud of this little trick, which wasn’t really necessary but feels nice.
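As an illustration of the idea, the sketch below packs R-type and I-type fields into a 32-bit word following the standard RV32I layouts, and computes an expected value from a closure. The object, method names, and signatures are hypothetical, and the real helpers also drive the simulation, which is left out here.

```scala
// Hypothetical encoders; field positions follow the RV32I R-type and I-type formats.
object Rv32iEncode {
  // Keep the low `bits` bits of `value` and move them to bit position `shift`.
  private def field(value: Int, bits: Int, shift: Int): Long =
    (value.toLong & ((1L << bits) - 1)) << shift

  // R-type: funct7[31:25] rs2[24:20] rs1[19:15] funct3[14:12] rd[11:7] opcode[6:0]
  def rType(funct7: Int, rs2: Int, rs1: Int, funct3: Int, rd: Int, opcode: Int): Long =
    field(funct7, 7, 25) | field(rs2, 5, 20) | field(rs1, 5, 15) |
      field(funct3, 3, 12) | field(rd, 5, 7) | field(opcode, 7, 0)

  // I-type: imm[31:20] rs1[19:15] funct3[14:12] rd[11:7] opcode[6:0]
  // A negative immediate is truncated to its 12-bit two's complement form.
  def iType(imm: Int, rs1: Int, funct3: Int, rd: Int, opcode: Int): Long =
    field(imm, 12, 20) | field(rs1, 5, 15) |
      field(funct3, 3, 12) | field(rd, 5, 7) | field(opcode, 7, 0)

  // Expected register value as an unsigned 32-bit quantity, computed by
  // applying the caller's closure to the two operands.
  def expected(op: (Int, Int) => Int, a: Int, b: Int): Long =
    op(a, b).toLong & 0xFFFFFFFFL
}
```

For example, `Rv32iEncode.rType(0x00, 2, 1, 0x0, 3, 0x33)` is `add x3, x1, x2` and yields 0x002081B3, while `Rv32iEncode.iType(5, 0, 0x0, 1, 0x13)` is `addi x1, x0, 5` and yields 0x00500093. A test for `add` would pass `_ + _` as the closure and compare the register contents against `expected`.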

While debugging the beq test failure, I figured out how
to get the ChiselSim test harness to write Value Change
Dump (VCD) files for each test. These are helpful because I can open
them in GTKWave, a GUI
application that displays all the test signals over time. It’s much
better than printf debugging, which I’d been doing until
then, and quickly led me to the problem.
The ChiselSim
explanation page mentions the existence of an emitVcd
option for the tests, but my ignorance of the Scala ecosystem and
sbt left me without any idea where to actually put the
option. Eventually I figured out that the proper invocation is, for
example:
sbt 'testOnly com.rinthyAi.perilus.test.main.PerilusTests -- -DemitVcd=1'
This runs the PerilusTests and writes the resulting VCDs
to
$PROJECT/build/chiselsim/**/workdir-verilator/trace.vcd
I also wrote a
little script to assemble files without any kind of relocation or
other fancy assembler features. All it does is a one-to-one translation
from assembly mnemonics to machine code in ASCII hex. I knew I could do
this with GNU tools and started messing with them, but then I found llvm-mc
and just went with that. This script has already helped me a few times
while writing tests. It’s no surprise that it’s faster and more reliable
than hand assembly.
perilus has support for 12 of the 40 instructions in the
RV32I set and is nearing parity with the Harris design. Next time, I’ll
add a test for the jal instruction, which seems to be the
only one that it’s missing compared to Harris. I’d also like to finish
the R-type and I-type instructions and maybe get a few more B-types.
[1] More generally, the shift is
log2Ceil(width.get / 8) for the class parameter
width: Width.
[2] Brady pointed out that one of the safety requirements
for Rust’s std::ptr::read
is that the pointer must be aligned, which he said is due to this exact
“forced alignment” behavior on some targets, particularly
microcontrollers.