2024-10-15
TL;DR: I used the solenoid design from part one to build a player piano module for eleven keys, or one eighth of the full keyboard. I also redesigned the pulse width modulation (PWM) system around an Atmega328P, which gives good performance and takes pressure off the main controller. The system accepts MIDI input over USB. This medium-scale integration helped me learn more about the system before committing to the full keyboard. Code is available here.
A fundamental problem in this project is the high demand for fast GPIO pins to do PWM on the solenoid drivers. Most microcontrollers have only a few pins allocated specifically for hardware PWM output, and practically none have enough pins of any kind to control 88 outputs simultaneously. I struggled with this problem for a long time because I thought it might be possible to use multiplexing or some other trick to get away with just one controller. The latching register solution that I presented in my previous post does work, but eventually I realized that if I was going to bother to have dedicated ICs for buffering outputs, then I might as well use actual microcontrollers to do even more work in the same package. In other words, I was already implicitly accepting the electrical and physical complexity of external ICs, so why not get the most benefit from that concession?
Let me state clearly what we need. For each of the 88 keys, there has to be some way to set a seven-bit value that encodes the duty cycle of a PWM output. We need seven bits because that’s the resolution of a MIDI velocity value. The minimum acceptable PWM frequency is 20 kHz; any lower, and the switching noise might audibly interfere with the music. That all means that whatever system we build needs to be able to switch arbitrary outputs with a time resolution of $\frac{1}{20\,\text{kHz} \times 128} \approx 390.6\,\text{ns}$. I call this figure a “velocity period”.
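The velocity period is the PWM period divided into 128 steps. A few lines of Python make the arithmetic explicit. The constant names are illustrative only.

```python
# Velocity-period arithmetic for the PWM requirements above.
PWM_FREQUENCY_HZ = 20_000          # lowest PWM frequency that stays above hearing
VELOCITY_BITS = 7                  # resolution of a MIDI velocity value
STEPS = 2 ** VELOCITY_BITS         # 128 duty-cycle steps per PWM cycle

pwm_period_s = 1 / PWM_FREQUENCY_HZ        # one full PWM cycle: 50 us
velocity_period_s = pwm_period_s / STEPS   # one duty-cycle step

print(f"PWM period:      {pwm_period_s * 1e6:.1f} us")
print(f"velocity period: {velocity_period_s * 1e9:.1f} ns")
```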
The Atmega328P is a common and widely available microcontroller. It’s in a bunch of Arduino designs and is inexpensive and reliable. It also comes in a 28-pin DIP package, so it’s easy to solder. We’d ideally like to use 22 of those pins for PWM outputs (one quarter of the keyboard), but some of the 28 pins are used for power, ground, and the external clock crystal. Also, we need to talk to the chip somehow, and if we use SPI, that takes up another four pins. After all that, it makes more sense to use 11 outputs, splitting the keyboard evenly into eight sections.
That’s all moot if the chip isn’t fast enough. The Atmega328P typically runs at 16 MHz (which requires the external crystal mentioned above), so we’re looking at a maximum of 6 whole cycles per velocity period: $16\,\text{MHz} \times \frac{1}{20\,\text{kHz} \times 128} = 6.25$. You might be aware that digitalWrite, the pin-setting function in the Arduino library, is kind of slow (~5 μs). Fortunately, we can get much greater speeds by writing AVR assembly by hand.
Here’s the core PWM loop from the program I wrote:
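ld  r16, X+      ; r16 <- next port C pattern, then X++ (2 cycles)
out portc, r16   ; write the pattern to port C (1 cycle)
ld  r16, Y+      ; r16 <- next port D pattern, then Y++ (2 cycles)
out portd, r16   ; write the pattern to port D (1 cycle)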
These four instructions take six cycles. “Load Indirect and Post-Increment” takes two, and “Out to I/O Location” takes one. (See Table 4-4 in the instruction set manual.) We have to do both of those operations for portc and portd, which are collections of pins on the chip, giving a total of six cycles for fourteen outputs.[1] That’s just fast enough to satisfy our requirements.
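To check that six cycles really fits, here is the same budget worked out in Python. The cycle counts are the ones given above. The ~5 μs digitalWrite figure is the rough number quoted earlier, so that comparison only shows the order of magnitude.

```python
# Cycle budget per velocity period at 16 MHz, compared with the cost of
# the loop body. Cycle counts follow the AVR instruction set manual:
# LD Rd, X+ and LD Rd, Y+ take 2 cycles, OUT takes 1.
CLOCK_HZ = 16_000_000
VELOCITY_PERIOD_S = 1 / (20_000 * 128)      # 20 kHz PWM, 7-bit duty cycle

budget_cycles = CLOCK_HZ * VELOCITY_PERIOD_S  # 6.25 cycles available

LOOP_BODY = [
    ("ld r16, X+", 2),
    ("out portc, r16", 1),
    ("ld r16, Y+", 2),
    ("out portd, r16", 1),
]
body_cycles = sum(cycles for _, cycles in LOOP_BODY)

# For comparison: one digitalWrite call at the ~5 us quoted earlier.
DIGITAL_WRITE_S = 5e-6
digital_write_cycles = CLOCK_HZ * DIGITAL_WRITE_S

print(f"budget: {budget_cycles:.2f} cycles; loop body: {body_cycles} cycles")
print(f"one digitalWrite: about {digital_write_cycles:.0f} cycles")
```

A single digitalWrite call would use up roughly thirteen velocity periods. Even a loop counter and branch would not fit in the 0.25 cycles of slack.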
Here’s a pseudocode translation of the assembly. Conveniently, each line in this pseudocode corresponds to one cycle in the real code.
load a byte from memory at the pointer X into the register r16
increment the X pointer
copy the byte in register r16 to portc
load a byte from memory at the pointer Y into the register r16
increment the Y pointer
copy the byte in register r16 to portd
To set up for this loop to work, X and Y, which are 16-bit pointer registers in the AVR architecture (formed from the register pairs r27:r26 and r29:r28), must point to the first elements of 128-byte arrays. The contents of the arrays encode when each bit should change state during the PWM cycle.[2] Ultimately, the contents of the arrays are set by the main controller via SPI.
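Here’s a rough Python model of what one pass through the loop does to the ports. It is not the firmware itself. Plain integers x and y stand in for the X and Y pointers, and two 128-byte tables stand in for the arrays they point to.

```python
STEPS = 128  # velocity periods per PWM cycle

def run_pwm_cycle(portc_table: bytes, portd_table: bytes) -> list[tuple[int, int]]:
    """Model one pass through the unrolled loop.

    Returns the (PORTC, PORTD) values written in each velocity period.
    The real loop writes port D three cycles after port C; this model
    ignores that skew.
    """
    assert len(portc_table) == STEPS and len(portd_table) == STEPS
    x = 0  # stands in for the X pointer (offset into portc_table)
    y = 0  # stands in for the Y pointer (offset into portd_table)
    trace = []
    for _ in range(STEPS):
        r16 = portc_table[x]  # ld r16, X+ : load...
        x += 1                #             ...then post-increment X
        portc = r16           # out portc, r16
        r16 = portd_table[y]  # ld r16, Y+
        y += 1
        portd = r16           # out portd, r16
        trace.append((portc, portd))
    return trace

# Example: PC0 switches on halfway through the cycle (50% duty); port D stays low.
half_on = bytes(0 if t < 64 else 0b0000_0001 for t in range(STEPS))
trace = run_pwm_cycle(half_on, bytes(STEPS))
on_periods = sum(c & 1 for c, _ in trace)
```

The loop itself makes no decisions. All of the timing lives in the table contents, which is what lets the main controller change velocities just by sending new bytes.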
I originally wrote this loop body with a counter and branch to handle the jump back to the top, but we have zero extra time in this loop, so we can’t spend any on loop maintenance. For that reason, I wrote a simple loop unrolling script that copies this body 128 times. This isn’t a big deal because we have much more flash space on the chip than we’ll ever need. At the beginning and end of the 128 iterations, there’s still some overhead to handle receiving new data via SPI and resetting the pointers, but that’s acceptable because it amounts to a few microseconds during every PWM cycle, instead of a few microseconds during every velocity period.
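The unrolling script isn’t shown here, but the idea fits in a few lines. This is a hypothetical Python version. The label name pwm_unrolled is made up, and a real script would also have to emit the setup and SPI-handling code around the unrolled block.

```python
# Sketch of a loop-unrolling generator: repeat the four-instruction body
# 128 times so the PWM loop needs no counter or branch.
BODY = (
    "    ld r16, X+\n"
    "    out portc, r16\n"
    "    ld r16, Y+\n"
    "    out portd, r16\n"
)

def unroll(body: str, count: int = 128, label: str = "pwm_unrolled") -> str:
    """Return assembly source with `body` repeated `count` times after `label`."""
    parts = [f"{label}:\n"]
    for step in range(count):
        parts.append(f"    ; velocity period {step}\n")
        parts.append(body)
    return "".join(parts)

if __name__ == "__main__":
    print(unroll(BODY), end="")
```

Each of the four instructions is a single 16-bit word, so the unrolled block costs 128 × 8 = 1024 bytes of the ATmega328P’s 32 KB of flash.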
The changes I described in the previous section significantly reduced the demand on the main controller, a Teensy 4.1. Under this new system, its job is to handle a MIDI connection with the USB host (e.g., a laptop) on one side and the eight SPI connections to the subcontrollers on the other side. When a new MIDI message comes in, the main controller needs to convert it into appropriate signals to be transmitted to the affected subcontrollers. I’ll describe my approach to each of these steps in turn.
I’ve been writing the main controller code in Rust. So far, this has made development a lot less painful because of the strong type system, reliable development tools, and a solid package library. I got especially lucky when I found usbd-midi, an implementation of the USB MIDI standard for embedded devices. While it’s not strictly complete, the implementation is mature enough for my use case and hasn’t given me any trouble. My usage of the library amounts to a polling loop and a match statement over NoteOn and NoteOff events. When one of those messages comes in, the code branches to a handler that updates the internal state and queues an update to be sent after all messages are processed.
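The dispatch logic can be sketched without the library. The following Python model works on raw three-byte MIDI messages rather than usbd-midi’s event types, so none of its names come from that crate. It does three things:

- records the velocities of sounding notes;
- marks the affected subcontroller as dirty (11 keys per subcontroller, starting at A0 = MIDI note 21);
- returns the set of dirty subcontrollers once the whole batch is processed.

It also treats a NoteOn with velocity 0 as a NoteOff, which the MIDI specification permits senders to use.

```python
from dataclasses import dataclass, field

NOTE_OFF = 0x80
NOTE_ON = 0x90
LOWEST_NOTE = 21     # A0, the bottom key of an 88-key piano
HIGHEST_NOTE = 108   # C8, the top key
KEYS_PER_MODULE = 11

@dataclass
class MainState:
    velocities: dict[int, int] = field(default_factory=dict)  # note -> velocity
    dirty_modules: set[int] = field(default_factory=set)      # subcontrollers to refresh

    def handle(self, message: bytes) -> None:
        """Update state for one 3-byte channel message; ignore everything else."""
        if len(message) != 3:
            return
        kind, note, velocity = message[0] & 0xF0, message[1], message[2]
        if not LOWEST_NOTE <= note <= HIGHEST_NOTE:
            return
        if kind == NOTE_ON and velocity > 0:
            self.velocities[note] = velocity
        elif kind == NOTE_OFF or kind == NOTE_ON:  # NoteOn with velocity 0 means NoteOff
            self.velocities.pop(note, None)
        else:
            return  # other message types are ignored
        self.dirty_modules.add((note - LOWEST_NOTE) // KEYS_PER_MODULE)

    def poll(self, messages: list[bytes]) -> set[int]:
        """Process a batch of messages, then return the modules to refresh."""
        for message in messages:
            self.handle(message)
        updated, self.dirty_modules = self.dirty_modules, set()
        return updated
```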
I’ll also mention here that I added basic logging over serial, which was annoying to set up but proved really useful for debugging. The reason it was annoying had nothing to do with the library or anything like that. The problem was that even though I was presenting the correct USB descriptors to the host, the Linux kernel wasn’t assigning the correct driver, so no serial device appeared. I eventually figured out that the root cause was my choice of vendor ID (VID) and product ID (PID). I had chosen 0x1234 and 0x5678, respectively, for testing because I didn’t think they mattered. Turns out they do, and the kernel was branching its driver assignment logic based on them. When I changed the VID and PID to 0xffff and 0x0001, the problem went away and everything worked as I expected.
When a NoteOn message comes in, it has a velocity associated with it. It’s not trivial to translate that into a PWM signal for the corresponding solenoid, for a few reasons. One is that the key only needs to be pressed with the given velocity for a short time (that is, until the hammer strikes the string), after which we should apply a lower holding force to conserve power and limit coil heating.[3] Another reason is that when we want to press the same key quickly, we have to release it, allow it to begin returning to the up position, and then strike it again. These two requirements add necessary complexity to the main controller’s translation logic. I’ve tried to capture that complexity in a finite state machine based on a simplified motion model for the key mechanism. I doubt this model is optimal, and I expect to improve on it in the future, but it’s good enough for now. The correctness of this model is especially important because there is no feedback from the key regarding its position.[4]
There are five key states: OFF, PRESSING, HOLDING, RELEASING, and REPEATING. We can receive a NoteOn or NoteOff message in each of these states, so both of those transitions are defined for all states. Some states can also time out, which means that they automatically transition to another state after a fixed period unless another transition happens first.
I’ll describe the “happy path”, or the cycle that most keys go through. Keys begin in the OFF state and stay there until a NoteOn message appears, after which they transition to the PRESSING state. In this state, the key is accelerating towards the bottom position with a final speed defined by the MIDI velocity in the NoteOn message. The timeout for this state is short (80–100 ms), after which the key transitions to the HOLDING state with a fixed lower PWM duty cycle. The key is now playing. Later, a NoteOff message appears for the key, so it transitions to the RELEASING state, a short interval during which the PWM duty cycle is zero. This allows the key to almost return to the up position. Finally, the key returns to the OFF state and settles in the up position.
The case of rapidly repeating notes on the same key takes a different path. I realized that it’s possible for the MIDI host to send multiple NoteOn messages for the same key without a NoteOff message in between. Also, the host could send NoteOn, NoteOff, NoteOn in quick succession, not leaving enough time for the key to recover. The MIDI standard seems to leave this kind of thing as an implementation detail for the synthesizer, which makes sense because the host can’t and shouldn’t know about the limitations of the synthesizer in general. For these edge cases, we can only do our best, which means reacting as quickly as we can and doing whatever makes sense given the state we’re currently in. Suppose we receive a NoteOn message while we’re already in the PRESSING state. We can’t press any faster than we already are, so we might as well keep pressing. That translates to a self-transition from PRESSING back to PRESSING. Similarly, if we receive a NoteOn message in the RELEASING state, then we need to keep on releasing until the key is ready and then immediately press it again. That’s exactly what the REPEATING state is for.
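Here’s a rough Python sketch of that state machine, with time passed in as milliseconds so the logic can be exercised without hardware. The transitions described above are grouped separately from the rest of the table. The remaining transitions, the RELEASING and REPEATING timeouts and the holding duty cycle are all placeholders chosen for illustration, not measured values.

```python
from enum import Enum, auto
from typing import Optional

class KeyState(Enum):
    OFF = auto()
    PRESSING = auto()
    HOLDING = auto()
    RELEASING = auto()
    REPEATING = auto()

class Event(Enum):
    NOTE_ON = auto()
    NOTE_OFF = auto()
    TIMEOUT = auto()

# Timeouts in ms. PRESSING uses the 80-100 ms range from the text;
# the RELEASING and REPEATING values are placeholders.
TIMEOUT_MS = {KeyState.PRESSING: 90, KeyState.RELEASING: 60, KeyState.REPEATING: 60}
HOLD_DUTY = 40  # hypothetical holding duty cycle, out of 127

S, E = KeyState, Event
TRANSITIONS = {
    # Described in the text:
    (S.OFF, E.NOTE_ON): S.PRESSING,
    (S.PRESSING, E.TIMEOUT): S.HOLDING,
    (S.HOLDING, E.NOTE_OFF): S.RELEASING,
    (S.RELEASING, E.TIMEOUT): S.OFF,
    (S.PRESSING, E.NOTE_ON): S.PRESSING,
    (S.RELEASING, E.NOTE_ON): S.REPEATING,
    (S.REPEATING, E.TIMEOUT): S.PRESSING,
    # Assumed here to fill out the table:
    (S.OFF, E.NOTE_OFF): S.OFF,
    (S.PRESSING, E.NOTE_OFF): S.RELEASING,
    (S.HOLDING, E.NOTE_ON): S.REPEATING,
    (S.RELEASING, E.NOTE_OFF): S.RELEASING,
    (S.REPEATING, E.NOTE_ON): S.REPEATING,
    (S.REPEATING, E.NOTE_OFF): S.RELEASING,
}

class Key:
    def __init__(self) -> None:
        self.state = S.OFF
        self.velocity = 0
        self.deadline_ms: Optional[int] = None

    def note_on(self, velocity: int, now_ms: int) -> None:
        self.velocity = velocity  # remembered for the current or next press
        self._apply(E.NOTE_ON, now_ms)

    def note_off(self, now_ms: int) -> None:
        self._apply(E.NOTE_OFF, now_ms)

    def tick(self, now_ms: int) -> None:
        """Fire the current state's timeout if its deadline has passed."""
        if self.deadline_ms is not None and now_ms >= self.deadline_ms:
            self._apply(E.TIMEOUT, now_ms)

    def _apply(self, event: Event, now_ms: int) -> None:
        new_state = TRANSITIONS.get((self.state, event))
        if new_state is None or new_state is self.state:
            return  # no transition, or a self-transition that keeps the running timer
        self.state = new_state
        timeout = TIMEOUT_MS.get(new_state)
        self.deadline_ms = None if timeout is None else now_ms + timeout

    def duty(self) -> int:
        """PWM duty (0-127) for this key's solenoid in the current state."""
        if self.state is S.PRESSING:
            return self.velocity
        if self.state is S.HOLDING:
            return HOLD_DUTY
        return 0
```

A self-transition (for example, NoteOn while PRESSING) leaves the running timer alone. As a result, a burst of repeated NoteOn messages can’t stretch a press indefinitely.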
As I admitted above, I don’t think this is optimal, but it’s not a bad first pass. In testing at high speeds, this approach can only manage about six notes per second on a single key, or eighth-note triplets at 120 bpm. It seems like the human record is 13 notes per second, which I suppose is near the limit of what the typical key mechanism allows. I think I can probably improve this system and get closer to ten notes per second, which should be good enough for any reasonable music. I’ve seen other projects that take much greater care in matching the PWM states with the natural resonance of the key mechanism, which I think is a good idea that I’ll steal later.
At the end of the USB polling loop, after the controller has processed all of the new MIDI messages from the host, it’s time to serialize the new PWM values and send the resulting bytes to the appropriate subcontrollers. This amounts to a lot of careful indexing and bit slicing, followed by iterating through the subcontrollers and sending their new bytes via SPI. A good way to think about the PWM data bytes is as a table. Time runs down the rows, with one row per velocity period. There are 128 velocity periods per PWM cycle and one byte per port in each row, so with two ports the table holds 256 bytes. The columns are the bits in each byte, and they encode the output state of the corresponding pin / key. By design, the bits in any column never decrease over time. That is, the column contains all ones, all zeros, or a run of zeros followed by a run of ones.
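Here is one way the table could be built, as a Python sketch. The following are all assumptions for illustration:

- the pin mapping: keys 0 to 5 on PC0 to PC5, and keys 6 to 10 on PD0 to PD4;
- the payload order: the port C table first, then the port D table;
- the duty scale, which runs from 0 (off) to 128 (fully on).

Each column is generated as zeros followed by ones, which produces the monotonic property described above.

```python
STEPS = 128  # velocity periods (rows) per PWM cycle

# Hypothetical pin mapping for one 11-key subcontroller:
# keys 0-5 drive PC0-PC5, keys 6-10 drive PD0-PD4.
PORTC_KEYS = range(0, 6)
PORTD_KEYS = range(6, 11)

def column(duty: int) -> list[int]:
    """One pin's bits over a PWM cycle: zeros, then `duty` ones at the end."""
    if not 0 <= duty <= STEPS:
        raise ValueError(f"duty must be in 0..{STEPS}")
    return [1 if row >= STEPS - duty else 0 for row in range(STEPS)]

def pack(keys: range, duties: list[int]) -> bytes:
    """Pack the columns for `keys` into one byte per row (bit i = i-th key)."""
    table = bytearray(STEPS)
    for bit, key in enumerate(keys):
        for row, on in enumerate(column(duties[key])):
            table[row] |= on << bit
    return bytes(table)

def build_tables(duties: list[int]) -> bytes:
    """Serialize 11 duties into 256 bytes: the PORTC table, then the PORTD table."""
    if len(duties) != 11:
        raise ValueError("expected one duty per key in the module")
    return pack(PORTC_KEYS, duties) + pack(PORTD_KEYS, duties)
```

With a 7-bit velocity used directly as the duty, the maximum on-time is 127/128 of the cycle. Any other velocity-to-duty mapping is a separate decision.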
To save time on the SPI bus, I implemented some branching logic that skips subcontrollers that haven’t changed since their last update. On the receiving end, the subcontrollers read each byte via the SPI peripheral and store it at an incrementing pointer, replacing the previous PWM data with the new version.
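A minimal Python model of both ends of that exchange follows. The sender remembers the last table it sent to each subcontroller and skips identical ones, and the receiver writes each byte at an incrementing pointer. Two details are assumptions of this sketch: the pointer wraps back to zero after 256 bytes, and the inner for loop stands in for the actual SPI transfer with chip select.

```python
from typing import Optional

TABLE_BYTES = 256  # one 128-byte PORTC table followed by one 128-byte PORTD table

class Subcontroller:
    """Receiving side: each SPI byte is stored at an incrementing pointer."""

    def __init__(self) -> None:
        self.table = bytearray(TABLE_BYTES)
        self.ptr = 0

    def on_spi_byte(self, value: int) -> None:
        self.table[self.ptr] = value
        self.ptr = (self.ptr + 1) % TABLE_BYTES  # wrap around for the next update


class MainController:
    """Sending side: skip subcontrollers whose table has not changed."""

    def __init__(self, subs: list[Subcontroller]) -> None:
        self.subs = subs
        self.last_sent: list[Optional[bytes]] = [None] * len(subs)

    def send_updates(self, tables: list[bytes]) -> list[int]:
        """Transmit changed tables only; return the indices that were sent."""
        sent = []
        for i, table in enumerate(tables):
            if table == self.last_sent[i]:
                continue  # unchanged since the last update: no SPI traffic
            for value in table:  # stands in for selecting CS i and clocking out bytes
                self.subs[i].on_spi_byte(value)
            self.last_sent[i] = table
            sent.append(i)
        return sent
```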
Transmission errors could cause problems between the main controller and subcontrollers in two ways. First, a bit error in the PWM data would result in an incorrect key velocity. Second, an entire byte may be dropped due to a missed clock transition, in which case the data table pointers would become permanently out of sync. I haven’t noticed any problems like this so far, but I might add some kind of sentinel value that encodes the boundary between two data tables. This would give the subcontroller a way to resynchronize with the main controller even after transmission errors occur.
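One way the sentinel idea could work is sketched below in Python. The sender reserves a byte value that PWM data can never take and sends it before every table. The receiver restarts at row 0 whenever it sees that value.

This relies on an assumption about the pin mapping. If the outputs are PC0 to PC5 and PD0 to PD4, bit 7 is never set in any data byte, so 0x80 is free to use as the sentinel. If all eight port D pins carried data, a different framing scheme (COBS, for example) would be needed.

The receiver here also double-buffers and commits only complete tables. That is a choice made for this sketch, not something the text describes.

```python
from typing import Optional

TABLE_BYTES = 256
SENTINEL = 0x80  # assumes bit 7 is never used by PWM data (hypothetical pin mapping)

def frame(table: bytes) -> bytes:
    """Prefix a 256-byte table with the sentinel that marks a table boundary."""
    if len(table) != TABLE_BYTES:
        raise ValueError("table must be 256 bytes")
    if any(b & SENTINEL for b in table):
        raise ValueError("a data byte collides with the sentinel bit")
    return bytes([SENTINEL]) + table


class SyncingReceiver:
    """Subcontroller model that resynchronizes on every sentinel byte."""

    def __init__(self) -> None:
        self.buffer = bytearray(TABLE_BYTES)  # table currently being received
        self.table = bytes(TABLE_BYTES)       # last complete table (what the PWM loop reads)
        self.ptr: Optional[int] = None        # None until the first sentinel arrives

    def on_spi_byte(self, value: int) -> None:
        if value == SENTINEL:
            self.ptr = 0  # boundary: the next byte is row 0 of a new table
            return
        if self.ptr is None or self.ptr >= TABLE_BYTES:
            return        # not yet synchronized, or surplus bytes: ignore
        self.buffer[self.ptr] = value
        self.ptr += 1
        if self.ptr == TABLE_BYTES:
            self.table = bytes(self.buffer)  # commit only complete tables
```

A frame with a dropped byte never reaches 256 data bytes, so it is never committed. The next sentinel then puts the receiver back in step.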
Before settling on the above approach, I also spent a few weeks working on a solution using an FPGA. I learned a lot about this unfamiliar area of digital electronics and wrote enough Verilog to get a working solution in simulation and on the bench. However, during testing I accidentally connected 24 V to an I/O pin, which fried the chip on my dev board. That was a frustrating mistake, but it prompted me to evaluate my chosen solution and realize that it was too complicated. I liked the idea of an FPGA due to its flexibility and high speed, but its nontrivial cost and higher manufacturing complexity made it difficult to work with. Also, I was planning to use every pin on my chosen chip with no room to spare, so the design was pretty constrained in that dimension. My mistake with the dev board was somewhat costly and certainly annoying, but I think it resulted in a better outcome for this project in the long term.
At the end of the previous post in this series, I had a reasonable solenoid design for this application. I’ve tweaked that design slightly by reducing the number of turns to 250 and being much more careful about winding them tightly. This improves efficiency because the density of turns near the plunger is higher. It requires more care, of course, but it’s worth it.
As I began building the solenoids, I found that it was difficult to meet the tolerance I’d set between the outside of the solenoid windings and the inside walls of the steel supports. I was also dreading having to cut and drill so much steel. For those reasons, I didn’t include the steel in this iteration of the multi-key module. This reduced efficiency but also reduced assembly complexity and overall cost. Plus, it revealed the red magnet wire, which I think looks nice against the black PLA support structure.
The net result of these changes seems to be that the holding power is about the same or slightly lower. It’s not that important to me to reduce the power any further right now, although less is always better in this case. In the future, I might try adding steel bars across the fronts and backs of both rows of solenoids, which would incur minimal additional complexity while potentially giving even better efficiency. I’m not sure whether this would actually work because I don’t know enough about how magnetic field lines are affected by ferromagnetic materials, so I guess it’ll probably be easiest to just run the experiment. I think it’s possible that there could be unwanted inter-solenoid effects when using bars that run across multiple keys, but I just don’t know yet.
My previous solenoid tests suffered from audible clicking due to the mechanical interfaces between the key surface, the plunger extension, and the plunger. I eliminated the clicking in the key/extension interface by sticking on some fuzzy circles meant for chair and table legs. For the extension/plunger interface, I cut a strip of printer paper, taped it onto the extension, rolled it up, and then dropped the plunger in with a bit of craft glue. The paper roll helps to keep the plunger and extension coaxial while the glue dries, after which the roll can be removed. This interface is weak to shear stress, but it works well enough for the low tensile stresses it sees under normal use.
Also, I sanded down the inside surfaces of the printed solenoid supports, which addressed an issue where the layer lines were causing the plunger to get stuck. I also explored printing the supports in a different orientation, but that introduced new problems that I wasn’t willing to try to overcome. It might be good to switch to PTFE (Teflon) tubing or some other material that’s smooth and slides well against steel, but this is probably good enough for now or maybe forever.
I’d assumed for most of this project that one way or another I would eventually lay out a PCB and get it fabricated. This became a source of stress as the weeks fell away for all the reasons I’ve discussed previously. My janky at-home process would not be sufficient for any of the approaches I’ve considered, so I would be forced to order boards and hope they were right. Then I remembered that Perma-Proto boards exist. At $7, they’re much less costly than custom boards of similar size,[5] and since they’re meant to be permanent breadboards, I can transfer my design from a normal full-size breadboard to one of these without much additional effort. For the moment, my one complete circuit is on an actual breadboard, but as I scale up further I intend to build the whole project out of these Perma-Protos.
I switched from FreeCAD to CadQuery for all of my CAD models. I was tired of fighting with FreeCAD and dealing with broken model files. It seems like plenty of people are doing cool stuff with it, so maybe I’m just using it wrong, but right now it’s not for me. I much prefer code-first CAD. CadQuery has its own problems—or, again, I have my own problems with it—but it suits me better. I never worry about broken models, and it doesn’t assume that the universe ends beyond its borders. And I get to use Python.
After transitioning my few existing models to CadQuery, I went on to design the main platform for the eleven-key module. This turned out to be pretty easy once I got my code organized. I realized I could put all of the dimensions for different relevant items in data classes, compute additional dimensions based on those independent values, and then pull them all in wherever I needed them. Combined with a Makefile, this made quick tweaks to dimensions much safer and quicker, since they’d propagate throughout all affected models.
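Here is a minimal version of that pattern in plain Python. The CadQuery calls are left out so it runs on its own. The class names and every number except the 2 mm plate thickness are hypothetical. In a real model, the derived values would be passed into the CadQuery functions that build each part.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class KeyDims:
    white_key_pitch: float = 23.5  # mm, hypothetical center-to-center spacing

@dataclass(frozen=True)
class PlateDims:
    thickness: float = 2.0   # mm
    margin: float = 5.0      # mm, hypothetical extra width on each side

@dataclass(frozen=True)
class PlatformDims:
    key: KeyDims
    plate: PlateDims
    white_keys: int  # white keys spanned by this platform

    @property
    def width(self) -> float:
        """Derived dimension: the spanned keys plus a margin on each side."""
        return self.white_keys * self.key.white_key_pitch + 2 * self.plate.margin

# Independent values live in one place; derived ones follow automatically.
platform = PlatformDims(KeyDims(), PlateDims(), white_keys=7)
wider = PlatformDims(KeyDims(white_key_pitch=24.0), PlateDims(), white_keys=7)
```

Making the classes frozen means a model can’t accidentally change a shared dimension. A tweak has to be made where the value is defined, and it then flows to every part that uses it.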
I’ve noticed that some DIY player piano designs suffer from flexure. When the solenoids are firing quickly, “over-the-keyboard” designs that span the entire piano tend to flex at least somewhat, even when supported with aluminum extrusion. I don’t want that to happen to my design, so I decided to build my platforms with plenty of support along the way across the keyboard, both in front of the keys and behind. This is possible on my piano because the wall behind the keys is angled upward, and there’s a ~12 mm ledge in front of the keys as well. I designed the platform to sit on silicone feet on the near side and just rest against the angled wall on the far side. This results in a surprisingly solid foundation that doesn’t seem to slide at all. I haven’t fully tested this under the dynamic loading situations I’ve described, but anecdotally it seems quite stable. In the future, I plan to somehow connect adjacent platforms to further improve their stability.
My first-pass design for the key platform was just a plate supported on opposite ends, but the weight of the solenoids alone was enough to make the plate flex, this time in the direction perpendicular to the keyboard. This wasn’t too surprising for 2 mm of PLA, but it would only get worse under dynamic load. To address this, I added ribs to the plate profile in positions that don’t interfere with the motion of the plunger extensions. The ribs extend down from the underside of the plate and significantly increase its second moment of area (often called the area moment of inertia), which increases its rigidity under this kind of load. I haven’t measured the increase in stiffness, but it’s obvious from holding the two models that the new version is much stiffer. I’m pleased with this small change because it seems to show that PLA is a suitable material for this purpose, with all of its benefits intact, provided that it’s applied efficiently in a design.
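To give a sense of scale, here is a rough comparison of a flat 2 mm strip and the same strip with one rib, using the parallel axis theorem. The 20 mm strip width and the 2 × 8 mm rib are hypothetical dimensions, not the platform’s actual profile.

```python
def rect(width: float, height: float, y_bottom: float) -> tuple[float, float, float]:
    """(area, centroid height, second moment about own centroid) of a rectangle."""
    return width * height, y_bottom + height / 2, width * height**3 / 12

def second_moment(parts: list[tuple[float, float, float]]) -> float:
    """Second moment of area of a composite section about its neutral axis."""
    area = sum(a for a, _, _ in parts)
    y_bar = sum(a * y for a, y, _ in parts) / area
    # Parallel axis theorem: each part contributes I_own + A * d^2.
    return sum(i + a * (y - y_bar) ** 2 for a, y, i in parts)

# Hypothetical 20 mm wide strip of the platform (all dimensions in mm).
flat = second_moment([rect(20, 2, 0)])
ribbed = second_moment([
    rect(2, 8, 0),   # rib hanging below the plate
    rect(20, 2, 8),  # the 2 mm plate on top of it
])
print(f"flat: {flat:.1f} mm^4, ribbed: {ribbed:.1f} mm^4, ratio: {ribbed / flat:.0f}x")
```

For a beam of a given material and span, deflection under load scales as 1/(EI). With these made-up dimensions, the ribbed strip would therefore flex roughly 29 times less than the flat one.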
In its current form, the key platform model will not cover the keyboard evenly. When I started designing the platform, I thought I would make seven identical platforms, all starting at C in their respective octaves and covering twelve keys. (That’s the model you see in this post and in the repo at the moment.) This would leave three notes at the bottom and one note at the top of the keyboard to be handled with one-off platform models. I’ve decided to ditch that approach and have every module cover exactly eleven keys, which means there will be eight platforms with no keys left over.
The cost of this change is that every platform needs to be different, since the platform boundaries no longer align with the octave boundaries. In fact, the new platform boundaries will often land on a black key, which means that some platforms will need to expand or contract in order to cover their keys but not those of their neighbors. I’m not certain that this will be easy to handle, but I’m convinced enough for now that I’m willing to try it. The main benefit would be that all platforms would be electrically identical (except for their SPI CS connections) and independent, while the main cost would be that they wouldn’t be mechanically identical. Fortunately, the cost of that is low because my 3D printer doesn’t care.
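Assigning keys to modules under the eleven-key scheme is simple index arithmetic, sketched here in Python. It assumes the standard 88-key range from A0 (MIDI note 21) to C8 (MIDI note 108) and numbers modules from the bottom. The function names are illustrative. Printing each module’s range also shows which boundaries land on black keys.

```python
LOWEST_NOTE = 21        # A0
KEYS = 88
KEYS_PER_MODULE = 11
NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
BLACK = {1, 3, 6, 8, 10}  # pitch classes of the black keys

def note_name(note: int) -> str:
    return f"{NAMES[note % 12]}{note // 12 - 1}"

def is_black(note: int) -> bool:
    return note % 12 in BLACK

def locate(note: int) -> tuple[int, int]:
    """Map a MIDI note to (module index, output index within that module)."""
    offset = note - LOWEST_NOTE
    if not 0 <= offset < KEYS:
        raise ValueError(f"note {note} is outside the 88-key range")
    return divmod(offset, KEYS_PER_MODULE)

def modules() -> list[tuple[int, int, int]]:
    """(index, first note, last note) for each eleven-key module."""
    return [
        (m, LOWEST_NOTE + m * KEYS_PER_MODULE, LOWEST_NOTE + (m + 1) * KEYS_PER_MODULE - 1)
        for m in range(KEYS // KEYS_PER_MODULE)
    ]

if __name__ == "__main__":
    for m, first, last in modules():
        edge = " (black-key boundary)" if is_black(first) or is_black(last) else ""
        print(f"module {m}: {note_name(first)}-{note_name(last)}{edge}")
```

With this split, four of the eight modules (1, 3, 4, and 6) start or end on a black key, consistent with the expectation that boundaries will often land there.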
[1] Port C only has seven pins, and one of them is reserved for the reset signal, leaving six available for GPIO. Port D has eight pins, and all of them are available for GPIO, giving a total of 14. Luckily, port B has both the crystal inputs and the SPI pins all together, so we don’t have to spread the outputs across three ports, which would be too slow. See tables 13-3, 13-6, and 13-9 in the chip datasheet.
[2] Note that this means it should be possible to change a pin’s value multiple times in a cycle. I haven’t tried this.
[3] This is why it’s important to have an ultrasonic PWM frequency. Holding the key with a low force means that the PWM duty cycle is strictly between off and fully on, so there’s going to be a switching waveform. If the frequency is audible, then the solenoid will buzz, so your only choice is to hold with full force. I’m proud of my PWM solution because it combines the best of both existing solutions. It’s relatively easy to implement due to its use of widely available through-hole components, and it retains the efficiency of using a lower holding force.
[4] I expect that in the future I may have to have motion model parameters for each key, or at least for subsets of them. On my piano (and I imagine most others?), keys in the top third or so are easier to press and generally have a different action than the others.
[5] Gotta love economies of scale. I think DKRed quoted me about $14 per board for an early iteration of the eleven-key driver circuit that was decently optimized for space. Half the price and maybe one tenth the risk is hard for me to ignore. I guess I could have used one of the overseas companies for much less, but I always get a weird feeling from them that their prices are low because their labor standards are, too. But that’s just speculation.