perilus, Part Six

Bradley Gannon

2026-04-04

TL;DR: I added support for all remaining instructions in the RV32I set. It’s likely that there are bugs that my tests just haven’t found yet, but at least the tests I do have are passing. I also created a high-level system diagram. Along the way, I fixed an interesting bug in the system’s reset logic.


RV32I “Done”

I say “done” because I only have moderate confidence that the instructions are all correct. I have tests for each instruction, but it’s possible that the tests are wrong or don’t catch edge cases with certain inputs. I guess there’s no way to be absolutely certain that the processor is correct outside of formal verification, but eventually I plan to at least run it through the official unit test suite.

The remaining instructions were jalr, lui, auipc, and all the branches except beq. I was struggling to visualize the design because it was starting to diverge from the diagrams in Harris, and this was making it difficult to decide how to proceed. I spent a few hours creating a system diagram that shows all connections between top-level modules—basically a graphical description of main.scala. I bounced off a few tools before settling on Typst with the fletcher package. Aesthetic automatic routing seems to be intractable for this problem, so I laid everything out by hand while creating a few abstractions in Typst’s scripting environment.

[Figure: Relatively complicated node-and-edge diagram showing the major modules in perilus and their connections to each other]

Once the diagram was done, it was easy for me to see that I could add branch support by introducing a lessThan signal as an output from the ALU. This allows the control unit to decide whether or not to take a branch based on the ALU result and the branch flavor currently being executed. As for the upper immediates, I added U-type support to the extend unit and two new control unit states to write the immediates to rd. jalr required similar control FSM changes.
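To make the branch and upper-immediate logic concrete, here is a minimal behavioral sketch in Python. It is not the Chisel code from perilus: the function names are hypothetical, and it assumes that the ALU picks a signed or unsigned comparison depending on the branch flavor, which the post doesn't spell out. The funct3 encodings and the U-type immediate layout come from the RV32I specification.

```python
# Behavioral sketch only. All names here are illustrative, not from perilus.
MASK32 = 0xFFFFFFFF

def to_signed(x):
    """Interpret a 32-bit value as a two's complement signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def alu_compare(a, b, unsigned):
    """Return the (zero, less_than) flags an ALU might expose for a vs. b."""
    a &= MASK32
    b &= MASK32
    zero = a == b
    less_than = a < b if unsigned else to_signed(a) < to_signed(b)
    return zero, less_than

# funct3 encodings for the RV32I branch instructions.
BEQ, BNE, BLT, BGE, BLTU, BGEU = 0b000, 0b001, 0b100, 0b101, 0b110, 0b111

def branch_taken(funct3, a, b):
    """Control-unit decision: combine the ALU flags with the branch flavor."""
    # Assumption: the unsigned flavors ask the ALU for an unsigned comparison.
    zero, less_than = alu_compare(a, b, unsigned=funct3 in (BLTU, BGEU))
    if funct3 == BEQ:
        return zero
    if funct3 == BNE:
        return not zero
    if funct3 in (BLT, BLTU):
        return less_than
    if funct3 in (BGE, BGEU):
        return not less_than
    raise ValueError(f"not a branch funct3: {funct3:#05b}")

def extend_u_type(instr):
    """U-type immediate: instruction bits 31:12, with the low 12 bits zeroed."""
    return instr & 0xFFFFF000

def lui_result(instr):
    """Value that lui writes to rd."""
    return extend_u_type(instr)

def auipc_result(instr, pc):
    """Value that auipc writes to rd: pc plus the U-type immediate."""
    return (pc + extend_u_type(instr)) & MASK32
```

The point of the sketch is that a single lessThan flag (plus a zero flag) is enough for all six branch flavors: bge and bgeu are just the negations of blt and bltu.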

I had hoped to get to the UART and standalone simulation stuff I talked about in my last post, but in retrospect it was probably too ambitious to finish the instruction set implementation and make good progress on that in one week. Next time around I’ll be in a better position to get some I/O going.

The Interesting Bug

After adding implementations and tests for the upper immediate instructions, I ran the whole test suite to make sure nothing had broken. I’m glad I did, because to my surprise three tests had broken: xor, lb, and the Fibonacci test program. The changes I’d made didn’t have any obvious connection to the failures, but HEAD was still passing, so my changes had to be causing the issue somehow.

It took me a long time to figure this out, and I chased several ideas that turned out not to be the root cause. I won’t describe them all in detail here, but I’ll give a few more strange bits of evidence. First, all three failing tests depended on the value stored in the x10 register immediately after reset. Changing the tests to use a register other than x10 avoided the failures. Also, x10 was always corrupted with the same value, 0xfffff9ef. I stared at quite a few VCD traces and even ran git diff on the before and after outputs of the Chisel-to-Verilog and Verilog-to-C++ conversions that support the tests. For a short time I even started to think I’d found a bug in the toolchain.

The root cause of the bug was that the deterministic random values that the simulator assigned to the system’s inputs pre-reset were set up just right to write a garbage value to x10 on the first clock pulse. Here’s a sample trace to show what I mean:

[Figure: GTKWave trace of a failing xor instruction test. The randomized inputs are about to set the x10 register to a garbage value.]

Here’s the beginning of the trace for the failing xor instruction test. The problem arises on the first clock cycle, which is marked with the red line. The randomized pre-reset state is such that the value of x10 is overwritten before the test can even start. Ignoring io_writeEnable3 during reset solves this problem.

The write port on the register file acts like a D flip-flop with an enable input. There’s a 5-bit address bus, a 32-bit data input, and an enable line. When the enable line is high and the clock pulses, the register file reads the value on the input and writes it to the register at the provided address. The ChiselSim environment is configured to start the simulation for each test with a deterministic random set of values everywhere in the circuit, which ensures that the reset logic can handle the transition from any possible garbage state to a known initial configuration. The trouble is that in the figure above, the garbage state has three important values: the address bus set to 0xa (meaning x10), the mysterious value on the data bus, and writeEnable3 set high. When the clock pulses during reset, the garbage value gets written to x10, and the rest of the test is invalid.
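The failure mode can be reproduced with a tiny behavioral model of the write port described above. This is a Python sketch under stated assumptions, not the Chisel implementation: the class and method names are hypothetical, and the registers start at zero purely for illustration.

```python
class BuggyRegisterFile:
    """Hypothetical model: 32 x 32-bit registers with one clocked write port."""

    def __init__(self):
        # Assumption for illustration: registers start at zero.
        self.regs = [0] * 32

    def clock(self, reset, write_enable, addr, data):
        """Model one clock pulse. Note that reset is never consulted."""
        if write_enable:
            self.regs[addr & 0x1F] = data & 0xFFFFFFFF

rf = BuggyRegisterFile()

# Pre-reset garbage from the post: enable high, address 0xa, data 0xfffff9ef.
# Reset is asserted, but the write port commits the value anyway.
rf.clock(reset=True, write_enable=True, addr=0xA, data=0xFFFFF9EF)

x10_after_reset = rf.regs[10]
```

Any test that reads x10 before writing it would see the garbage value rather than the state it expected, which matches the pattern in the three failing tests.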

A simple and effective solution is to ignore the enable line when reset is asserted. The memory module is also subject to this bug, so this all amounts to a two-line change. This bug had been hiding in the system ever since I wrote the register file and memory implementations months ago. I’m pleased that my test suite alerted me to it, even though it took me a while to fix it. I haven’t yet worked out whether all of the other sequential elements in the system (i.e., small non-architectural registers) need the same arrangement.
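As a rough illustration of the fix, here is a self-contained behavioral model in Python where the write is gated on reset. As before, this is a sketch with hypothetical names and zero-initialized registers, not the actual two-line Chisel change.

```python
class RegisterFile:
    """Hypothetical model: the write enable is ignored while reset is asserted."""

    def __init__(self):
        # Assumption for illustration: registers start at zero.
        self.regs = [0] * 32

    def clock(self, reset, write_enable, addr, data):
        """Model one clock pulse of the write port."""
        # The fix: a write only commits when enable is high AND reset is low.
        if write_enable and not reset:
            self.regs[addr & 0x1F] = data & 0xFFFFFFFF

rf_fixed = RegisterFile()

# Same garbage inputs as the failing trace, with reset asserted: nothing is written.
rf_fixed.clock(reset=True, write_enable=True, addr=0xA, data=0xFFFFF9EF)
x10_during_reset = rf_fixed.regs[10]

# Once reset is released, ordinary writes behave as before.
rf_fixed.clock(reset=False, write_enable=True, addr=0xA, data=0x2A)
x10_after_write = rf_fixed.regs[10]
```

The same gating would apply to the memory module's write port, since the post notes it was subject to the same bug.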