perilus, Part Three
2026-01-03
TL;DR: perilus can now execute the
lw and sw instructions. To get to this point,
I needed to add a lot of new functionality, most notably the control
unit FSM and the ability to set and inspect memory and register contents
in simulation. Now we can start adding support for more instructions in
the RV32I set.
My first task this week was to finish the first-pass implementation of the multicycle processor in Harris. The only major remaining element was the control unit, which I’d stubbed out but still needed to fill in. The control unit is the part of the processor that tells all the other parts what to do and when to do it, and it accomplishes this by encoding the necessary operations as a finite state machine (FSM). Chisel makes FSMs pretty easy to implement, which I suppose is intentional because they show up all over the hardware design world.
The Harris processor doesn’t support the entire RV32I instruction
set, but the subset that it does support is a good starting
point for adding the rest. (The RV32I set itself is also clearly
designed in a way that eases hardware implementation.) For now, I’ve
built the control unit to exactly match the details of the FSM given in
the book, but later on, once we can show that perilus
matches that design, we’ll depart from the book and add support for the
rest of the set.
At the moment, the FSM begins in the fetch state, which
reads the memory word at the address held in the program counter and
moves it into the instruction register. It also adds four to the program
counter to set up for the next instruction, assuming it doesn’t get
overridden by the current one. The fetch state unconditionally
transitions to the decode state, which reads the
opcode field to determine the next state. I won’t describe
the remaining states for brevity, but essentially they form mostly
unconditional chains of operations to execute the various supported
instructions. I suspect that adding support for the remaining
instructions will mostly be a matter of adding more conditions to each
of these existing chains.
I should mention that I replaced all instances of unlabeled
bitstrings in the design with ChiselEnums. This has made
the code much clearer because the enum variants carry a lot more
semantic information locally than just raw binary numbers, which I have
to look up in a table to decode. This is a good example of how Chisel
improves on SystemVerilog by applying concepts from typical programming
languages.
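To make the FSM and enum discussion concrete, here is a minimal sketch of what the load/store portion of such a control FSM could look like. It is not the actual perilus control unit: the module name ControlFsmSketch, the state names, the port names, and the fall-back to fetch on an unsupported opcode are all illustrative assumptions, and it only models state transitions, not the control signals each state would drive. It also assumes a recent Chisel release in which ChiselEnum is exported from the chisel3 package.

```scala
import chisel3._
import chisel3.util._

// Hypothetical state names for the fetch/decode/load/store chains.
object State extends ChiselEnum {
  val Fetch, Decode, MemAdr, MemRead, MemWB, MemWrite = Value
}

// RV32I opcodes for loads and stores, named instead of left as raw bitstrings.
object Opcode extends ChiselEnum {
  val Load  = Value("b0000011".U(7.W))
  val Store = Value("b0100011".U(7.W))
}

class ControlFsmSketch extends Module {
  val io = IO(new Bundle {
    val opcode = Input(UInt(7.W)) // assumed to come from the instruction register
    val state  = Output(State())
  })

  val state = RegInit(State.Fetch)

  // Small helper so the comparisons below read in terms of enum variants.
  def opIs(op: Opcode.Type): Bool = io.opcode === op.asUInt

  switch(state) {
    is(State.Fetch) { state := State.Decode } // unconditional
    is(State.Decode) {
      when(opIs(Opcode.Load) || opIs(Opcode.Store)) {
        state := State.MemAdr
      }.otherwise {
        state := State.Fetch // assumption: ignore unsupported opcodes
      }
    }
    is(State.MemAdr) {
      when(opIs(Opcode.Load)) { state := State.MemRead }
        .otherwise { state := State.MemWrite }
    }
    is(State.MemRead)  { state := State.MemWB }
    is(State.MemWB)    { state := State.Fetch }
    is(State.MemWrite) { state := State.Fetch }
  }

  io.state := state
}
```

In a sketch like this, supporting another instruction mostly means adding a condition in the decode state and, where needed, a new chain of states.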
Testing the control unit (and later the entire processor) proved
somewhat challenging at first because I wasn’t sure how to examine the
internal FSM state using the ChiselSim API. A solution to the
problem—although probably not the only one—is to create an extra Chisel
IO port on the module under test. The trick, though, is to
wrap the port in an Option and only return the
Some variant when a boolean parameter
withDebug is true. The default value of
withDebug in the module constructor is false,
so normally the debug port is None and does
not get translated into SystemVerilog. However, during testing we can
explicitly pass withDebug = true and then run
foo.io.debug.get to retrieve the debug port. We can put
anything we want in the debug port as long as we write some extra lines
in the module to keep the debug port updated.
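The optional-port pattern can be sketched roughly as follows. The module here is a stand-in, not anything from perilus: TickCounter, its wrapped output, and the 8-bit counter are hypothetical, chosen only so there is some internal state worth exposing.

```scala
import chisel3._

class TickCounter(withDebug: Boolean = false) extends Module {
  val io = IO(new Bundle {
    val wrapped = Output(Bool())
    // The port only exists when withDebug is true; otherwise the field is
    // None and no hardware is generated for it.
    val debug = if (withDebug) Some(Output(UInt(8.W))) else None
  })

  private val count = RegInit(0.U(8.W))
  count := count + 1.U
  io.wrapped := count === 255.U

  // The "extra lines" that keep the debug port updated. foreach does nothing
  // when the port is None, so the module body is the same either way.
  io.debug.foreach(_ := count)
}
```

A test would construct the module with withDebug = true and read the port through io.debug.get, while production code keeps the default and never sees the port.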
This is admittedly a little cumbersome and seems to be a direct
result of Chisel’s design decision to only allow module interactions
through explicit IO ports. To be fair, this is how actual hardware
really works, at least for integrated circuits. Brady pointed out that
it may be possible to annotate an IO port to exclude it from synthesis
but keep it available for simulation, similar to how serde
allows fields to be skipped. I think this would basically
be a first-party implementation of the same feature as the one described
above, which would be really nice to have.
When the time came to write the first processor test, I had to nest
the debug port definitions by setting one up in the Perilus
class that “re-exports”
the debug ports from the memory and register file modules. This took
some minor acrobatics with the Options to make sure all
ports were properly driven in all possible branches.
Notably, the debug ports don’t give unrestricted access to the
module’s internals. Each one is still a Chisel IO port, so it has
to play by the same rules. For this purpose, that means that accessing,
say, a particular memory location involves poking a debug address
register with the right value and then peeking a different debug
register to get the value. The debug ports only support reading data at
the moment, but I don’t think there’s any reason why they
couldn’t support writing as well. There just hasn’t been any reason for
it so far.
The processor won’t do much without a program, which means we need a
way to set the initial state of the machine. In a real system, we can
assume that the program counter is reset to a fixed value (the reset
vector) and the memory contains at least a minimal program that sets up
everything else (the bootloader). These dependencies lie beyond the
scope of the processor itself and may be satisfied by, for example,
flashing the memory using an external circuit. In the virtual
environment of ChiselSim, we still must satisfy these dependencies, and
perilus accomplishes this by reading memory and register
file images from the filesystem. Chisel provides an experimental
feature for this purpose, which it translates into corresponding
SystemVerilog code.
I found this to be a little annoying for two reasons. One is that it
would be nice to instead provide, for example, an
ArrayBuffer of the proper type and have Chisel handle the
details of getting them into SystemVerilog. I intend to explore creating
a helper function for this in the future so we don’t have to keep an
assets/ directory in the repo with files for every test.
The other reason is that there isn’t any warning output when the
specified path doesn’t exist. This again can be handled in user code,
but it would be nice if it existed by default.
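Both annoyances could be addressed in user code along these lines. This is only a sketch of one possible helper, not something that exists in perilus: the MemImage object and its function names are hypothetical, and it assumes the image format is plain text with one 32-bit word per line written as eight hex digits.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

object MemImage {
  /** Write each word as 8 hex digits on its own line and return the path
    * of the temporary file. collection.Seq accepts an ArrayBuffer too. */
  def writeHex(words: scala.collection.Seq[Long], prefix: String = "mem"): Path = {
    require(words.forall(w => w >= 0 && w <= 0xFFFFFFFFL), "words must fit in 32 bits")
    val path = Files.createTempFile(prefix, ".hex")
    val text = words.map(w => f"$w%08x").mkString("", "\n", "\n")
    Files.write(path, text.getBytes(StandardCharsets.UTF_8))
    path
  }

  /** Fail loudly if an image path is missing, instead of silently
    * simulating with whatever the memory happens to contain. */
  def requireReadable(path: Path): Path = {
    require(Files.isReadable(path), s"memory image not found or unreadable: $path")
    path
  }
}
```

A test could then build its memory contents in Scala, call writeHex, and pass the resulting path to the module under test, with requireReadable guarding any images that still live on disk.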
It was interesting to find that Chisel (or more probably the underlying simulator backend) initializes the memory with deterministic random values if no other initializer is provided. This is a good representation of actual hardware in that it discourages the assumption of a particular value on reset.
With all these prerequisites in place, the first actual instruction
test was easy enough. The reset vector is 0x0, so I
manually assembled a lw instruction and placed it in a new
memory file at that location, with the rest being random values. I also
created a register file containing all zeros except for the register
containing the base pointer for the load (that is, the register
specified by the instruction’s rs1 field). The test
computes the memory location to be loaded and retrieves that value.
Then, it steps the clock several times to fully execute the instruction.
Finally, it verifies that the expected value has been copied from memory
into the specified rd register. If all the
expects pass, then we can be confident that we’ve executed
the load. The test technically doesn’t check all possible side effects,
and maybe it should, but it at least checks the source and destination
registers and the specified memory location.
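For reference, the manual assembly step can be reproduced in a few lines of plain Scala. The field layout is the standard RV32I I-type format (imm[11:0], rs1, funct3, rd, opcode) with funct3 = 010 and opcode = 0000011 for lw; the LwEncoding object and its function names are hypothetical, and the register numbers in the usage note are just an example, not the ones used in the actual test.

```scala
object LwEncoding {
  /** Encode `lw rd, imm(rs1)` as a 32-bit word (held in a Long to avoid sign issues). */
  def encodeLw(rd: Int, rs1: Int, imm: Int): Long = {
    require(rd >= 0 && rd < 32 && rs1 >= 0 && rs1 < 32, "register index out of range")
    require(imm >= -2048 && imm <= 2047, "immediate must fit in 12 signed bits")
    val imm12 = (imm & 0xFFF).toLong // two's-complement 12-bit immediate
    (imm12 << 20) | (rs1.toLong << 15) | (0x2L << 12) | (rd.toLong << 7) | 0x03L
  }

  /** Address the load reads from: base register value plus the sign-extended
    * immediate, wrapped to 32 bits. */
  def effectiveAddress(base: Long, imm: Int): Long = (base + imm) & 0xFFFFFFFFL
}
```

For example, LwEncoding.encodeLw(rd = 6, rs1 = 9, imm = -4) yields 0xffc4a303, which is lw x6, -4(x9). The effectiveAddress helper mirrors the part of the test that computes which memory location should end up in rd.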
Yesterday afternoon, Brady and I had a lovely time adding a test for
the sw instruction as well. We mostly got this instruction
for free in the Harris design. We also tried to add beq,
but we ran into an issue with how the program counter is managed.
I probably wrote a bug somewhere, so we’ll just need to track it down
before continuing.
Tests for the remaining RV32I instructions are likely to take the same form as the ones we’ve written so far. In fact, there will probably be a lot of opportunities for factoring out common test logic into helper functions. I’d like to end up with individual tests for each instruction, which will be tedious but probably quite helpful as we make improvements over time. As we go along, I’m sure we’ll add tests for simple programs too. In the longer term, I’m interested in decoupling the system from ChiselSim for more complicated tests, with the eventual goal of adding memory-mapped IO peripherals and getting closer to something like a real computer.