perilus, Part Six
2026-04-04
TL;DR: I added support for all remaining instructions in the RV32I set. It’s likely that there are bugs that my tests just haven’t found yet, but at least the tests I do have are passing. I also created a high-level system diagram. Along the way, I fixed an interesting bug in the system’s reset logic.
I say “done” because I only have moderate confidence that the instructions are all correct. I have tests for each instruction, but it’s possible that the tests are wrong or don’t catch edge cases with certain inputs. I guess there’s no way to be absolutely certain that the processor is correct outside of formal verification, but eventually I plan to at least run it through the official unit test suite.
The remaining instructions were jalr, lui,
auipc, and all the branches except beq. I was
struggling to visualize the design because it was starting to diverge
from the diagrams in Harris, and this was making it difficult to decide
how to proceed. I spent a few hours creating a system diagram that shows
all connections between top-level modules—basically a graphical
description of main.scala. I bounced off a few tools before
settling on Typst with the
fletcher
package. Aesthetic automatic routing seems to be intractable for this
problem, so I laid everything out by hand while creating a few
abstractions in Typst’s scripting environment.

Once the diagram was done, it was easy for me to see that I could add
branch support by introducing a lessThan signal as an
output from the ALU. This allows the control unit to decide whether or
not to take a branch based on the ALU result and the branch flavor
currently being executed. As for the upper immediates, I added U-type
support to the extend unit and two new control unit states to write the
immediates to rd. jalr required similar
control FSM changes.
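The branch decision and the new immediate handling can be sketched as a small behavioral model. This is plain Scala rather than Chisel, so it runs without the hardware toolchain. Every name in it (`BranchSketch`, `takeBranch`, `uTypeImm`, and so on) is made up for illustration and is not taken from perilus. It also assumes the ALU is told whether to compare as signed or unsigned, which the description above doesn't spell out. The `funct3` encodings and the instruction semantics follow the RV32I base ISA.

```scala
// Behavioral sketch in plain Scala (not Chisel). All names are hypothetical.
object BranchSketch {
  // funct3 encodings of the RV32I branch instructions.
  val BEQ = 0; val BNE = 1; val BLT = 4; val BGE = 5; val BLTU = 6; val BGEU = 7

  // What the ALU reports back to the control unit after a comparison.
  final case class AluFlags(zero: Boolean, lessThan: Boolean)

  // Model of the ALU comparison. Assumption: signedness is selected by
  // the branch flavor being executed.
  def compare(a: Int, b: Int, unsigned: Boolean): AluFlags = {
    val lt = if (unsigned) Integer.compareUnsigned(a, b) < 0 else a < b
    AluFlags(zero = a == b, lessThan = lt)
  }

  // Control-unit decision: combine the ALU flags with the branch flavor.
  def takeBranch(funct3: Int, a: Int, b: Int): Boolean = {
    val flags = compare(a, b, unsigned = funct3 == BLTU || funct3 == BGEU)
    funct3 match {
      case BEQ        => flags.zero
      case BNE        => !flags.zero
      case BLT | BLTU => flags.lessThan
      case BGE | BGEU => !flags.lessThan
      case _          => false // funct3 values 2 and 3 are not branches
    }
  }

  // U-type extend: instruction bits 31:12 become the upper 20 bits of the
  // immediate, and the low 12 bits are zero.
  def uTypeImm(instr: Int): Int = instr & 0xfffff000

  // Values written to rd by the upper-immediate instructions.
  def lui(instr: Int): Int = uTypeImm(instr)
  def auipc(pc: Int, instr: Int): Int = pc + uTypeImm(instr)

  // jalr: the target is rs1 + imm with bit 0 cleared; rd receives pc + 4.
  // Returns (target, link).
  def jalr(pc: Int, rs1: Int, imm: Int): (Int, Int) =
    ((rs1 + imm) & ~1, pc + 4)
}
```

For example, `BranchSketch.takeBranch(BranchSketch.BLT, -1, 1)` is true while `BranchSketch.takeBranch(BranchSketch.BLTU, -1, 1)` is false, because `-1` is `0xffffffff` when treated as unsigned. A single `lessThan` flag plus the existing zero flag is enough to cover all six branch flavors, as long as the comparison's signedness follows the instruction.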
I had hoped to get to the UART and standalone simulation stuff I talked about in my last post, but in retrospect it was probably too ambitious to finish the instruction set implementation and make good progress on that in one week. Next time around I’ll be in a better position to get some I/O going.
After adding implementations and tests for the upper immediate
instructions, I ran the whole test suite to make sure nothing had
broken. I’m glad I did, because to my surprise a few tests
were failing: xor, lb,
and the Fibonacci test program. The changes I’d made didn’t have any
obvious connection to the failures, but HEAD was still
passing, so my changes had to be causing the issue somehow.
It took me a long time to figure this out, and I chased several
ideas that turned out not to be the root cause. I won’t describe them
all in detail here, but I’ll give a few more strange bits of evidence.
First of all, it was notable that all three of the failing tests somehow
depended on the values stored in the x10 register
immediately after reset. Changing the tests to use a register other than
x10 avoided the failures. Also, x10 was always
corrupted with the same value, 0xfffff9ef. I stared at
quite a few VCD traces and even ran git diff against the
before and after outputs from the Chisel-to-Verilog and Verilog-to-C++
conversions that support the tests. For a short time I even started to
think I’d found a bug in the toolchain.
The root cause of the bug was that the deterministic random values
that the simulator assigned to the system’s inputs pre-reset were set up
just right to write a garbage value to x10 on the
first clock pulse. Here’s a sample trace to show what I mean:

Figure: trace from the xor instruction test. The problem arises on the first clock cycle, which is marked with the red line. The randomized pre-reset state is such that the value of x10 is overwritten before the test can even start. Ignoring io_writeEnable3 during reset solves this problem.
The write port on the register file acts like a D flip-flop with an enable. There’s a
5-bit address bus, a 32-bit data input, and an enable line. When the
enable line is high and the clock pulses, the register file reads the
value on the input and writes it to the register at the provided
address. The ChiselSim environment is configured to start the simulation
for each test with a deterministic random set of values everywhere in
the circuit, which ensures that the reset logic can handle the
transition from any possible garbage state to a known initial
configuration. The trouble is that in the figure above, the garbage
state has three important values: the address bus set to
0xa (meaning x10), the mysterious value on the
data bus, and writeEnable3 set high. When the clock pulses
during reset, the garbage value gets written to x10, and
the rest of the test is invalid.
A simple and effective solution is to ignore the enable line while reset is asserted. The memory module is subject to the same bug, so the whole fix amounts to a two-line change. This bug had been hiding in the system ever since I wrote the register file and memory implementations months ago. I’m pleased that my test suite alerted me to it, even though it took me a while to track down. I still haven’t worked out whether the other sequential elements in the system (i.e., small non-architectural registers) need the same treatment.
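To make the failure mode concrete, here is a minimal behavioral model of the write port, again in plain Scala rather than Chisel. The class and method names are hypothetical. It also assumes that registers start at zero and that x0 is hardwired to zero, neither of which is confirmed by anything above. The only values taken from the actual bug are the address `0xa` and the data value `0xfffff9ef`.

```scala
// Behavioral model in plain Scala (not Chisel). All names are hypothetical.
final class RegFileModel(gateWriteOnReset: Boolean) {
  // Assumption: registers start at zero in this model.
  private val regs = Array.fill(32)(0)

  // Assumption: x0 always reads as zero.
  def read(addr: Int): Int = if (addr == 0) 0 else regs(addr)

  // One rising clock edge on the write port.
  def clock(reset: Boolean, writeEnable: Boolean, addr: Int, data: Int): Unit = {
    // The fix: ignore the enable line while reset is asserted.
    val enable = if (gateWriteOnReset) writeEnable && !reset else writeEnable
    if (enable && addr != 0) regs(addr) = data
  }
}

object ResetBugDemo {
  // Replays the garbage pre-reset state: enable high, address 0xa,
  // data 0xfffff9ef, with reset asserted on the first clock edge.
  def x10AfterReset(gated: Boolean): Int = {
    val rf = new RegFileModel(gateWriteOnReset = gated)
    rf.clock(reset = true, writeEnable = true, addr = 0xa, data = 0xfffff9ef)
    rf.read(10)
  }
}
```

With `gated = false`, `ResetBugDemo.x10AfterReset` returns `0xfffff9ef`, which is the corruption the failing tests saw. With `gated = true`, x10 keeps its initial value.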