
Self-Playing Piano, Part Two

Bradley Gannon

2024-10-15

TL;DR: I used the solenoid design from part one to build a player piano module for eleven keys, or one eighth of the full keyboard. I also redesigned the pulse width modulation (PWM) system around an Atmega328P, which gives good performance and takes pressure off the main controller. The system accepts MIDI input over USB. This medium-scale integration helped me learn more about the system before committing to the full keyboard. Code is available here.


Improving the PWM Driver

Switching to an Atmega328P

A fundamental problem in this project is the high demand for fast GPIO pins to do PWM on the solenoid drivers. Most microcontrollers have only a few pins allocated specifically for hardware PWM output, and practically none have enough pins of any kind to control 88 outputs simultaneously. I struggled with this problem for a long time because I thought it might be possible to use multiplexing or some other trick to get away with just one controller. The latching register solution that I presented in my previous post does work, but eventually I realized that if I was going to bother to have dedicated ICs for buffering outputs, then I might as well use actual microcontrollers to do even more work in the same package. In other words, I was already implicitly accepting the electrical and physical complexity of external ICs, so why not get the most benefit from that concession?

Let me state clearly what we need. For each of the 88 keys, there has to be some way to set a seven-bit value that encodes the duty cycle of a PWM output. We need seven bits because that’s the resolution of a MIDI velocity value. The minimum acceptable PWM frequency is 20 kHz; any lower, and the switching noise might audibly interfere with the music. That all means that whatever system we build to do this needs to be able to switch arbitrary outputs with a time resolution of $(20\,\text{kHz} \times 2^7)^{-1} \approx 0.39\,\mu\text{s}$. I call this figure a “velocity period”.

The Atmega328P is a common and widely available microcontroller. It’s in a bunch of Arduino designs and is inexpensive and reliable. It also comes in a 28-pin DIP package, so it’s easy to solder. We’d ideally like to use 22 of those pins for PWM outputs (one quarter of the keyboard), but some of the 28 pins are used for power, ground, and the external clock crystal. Also, we need to talk to the chip somehow, and if we use SPI, that takes up another four pins. After all that, it makes more sense to use 11 outputs, splitting the keyboard evenly into eight sections.
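A quick tally makes the pin arithmetic concrete. This is a sketch based on the ATmega328P’s 28-pin DIP pinout. The grouping into categories and the “largest even divisor” rule are this sketch’s own framing, not code from the project.

```python
# Pin budget for the ATmega328P in its 28-pin DIP package.
TOTAL_PINS = 28
KEYS = 88

reserved = {
    "power, ground, reference": ["VCC", "GND", "AVCC", "AREF", "GND"],
    "crystal": ["PB6/XTAL1", "PB7/XTAL2"],
    "reset": ["PC6/RESET"],
    "SPI": ["PB2/SS", "PB3/MOSI", "PB4/MISO", "PB5/SCK"],
}

n_reserved = sum(len(pins) for pins in reserved.values())
free_gpio = TOTAL_PINS - n_reserved  # PB0, PB1, PC0-PC5, PD0-PD7

# Largest per-chip key count that fits in the free pins and also
# divides the 88 keys evenly across identical boards.
per_chip = max(n for n in range(1, free_gpio + 1) if KEYS % n == 0)

print(f"reserved: {n_reserved}, free GPIO: {free_gpio}")
print(f"keys per chip: {per_chip}, chips: {KEYS // per_chip}")
```

Sixteen free pins rule out the 22-key split. The PWM loop only writes ports C and D (14 pins, per the first footnote), but that still leaves room for 11 keys.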

That’s all moot if the chip isn’t fast enough. The Atmega328P typically runs at 16 MHz (which requires the external crystal mentioned above), so each velocity period lasts $16\,\text{MHz} \times (20\,\text{kHz} \times 2^7)^{-1} = 6.25$ clock cycles, or at most six whole cycles. You might be aware that digitalWrite, the pin-setting function in the Arduino library, is kind of slow (~5 μs). Fortunately, we can get much greater speeds by writing AVR assembly by hand.

Here’s the core PWM loop from the program I wrote:

    ld r16, X+
    out portc, r16
    ld r16, Y+
    out portd, r16

These four instructions take six cycles. “Load Indirect and Post-Increment” takes two, and “Out to I/O Location” takes one. (See Table 4-4 in the instruction set manual.) We have to do both of those operations for portc and portd, the output registers for two groups of pins on the chip. That gives a total of six cycles for fourteen outputs.[1] It uses the entire six-cycle budget, so it’s just fast enough to satisfy our requirements.
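The cycle budget is easy to check with a few lines of Python. All of the figures come from this post. The only addition is the ceiling on the PWM frequency that a six-cycle body implies, which ignores the per-cycle SPI and pointer-reset overhead.

```python
# Timing budget for the hand-written PWM loop.
PWM_FREQ_HZ = 20_000   # minimum inaudible PWM frequency
STEPS = 2 ** 7         # one step per MIDI velocity value
CPU_HZ = 16_000_000    # 16 MHz external crystal

velocity_period_us = 1e6 / (PWM_FREQ_HZ * STEPS)
cycles_per_step = CPU_HZ / (PWM_FREQ_HZ * STEPS)

# ld r16, X+ (2) + out (1) + ld r16, Y+ (2) + out (1)
loop_body_cycles = 2 + 1 + 2 + 1

# Upper bound on the PWM frequency, ignoring per-cycle overhead.
max_pwm_hz = CPU_HZ / (loop_body_cycles * STEPS)

print(f"velocity period: {velocity_period_us:.4f} us")
print(f"cycles per step: {cycles_per_step} (loop body uses {loop_body_cycles})")
print(f"PWM frequency ceiling: {max_pwm_hz:.0f} Hz")
```

The ceiling is only slightly above 20 kHz. This shows how little headroom is left for the work done between PWM cycles.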

Here’s a pseudocode translation of the assembly. Conveniently, each line in this pseudocode corresponds to one cycle in the real code.

    load a byte from memory at the pointer X into the register r16
    increment the X pointer
    copy the byte in register r16 to portc
    load a byte from memory at the pointer Y into the register r16
    increment the Y pointer
    copy the byte in register r16 to portd

For this loop to work, X and Y, which are 16-bit pointer registers in the AVR architecture (the register pairs r27:r26 and r29:r28), must point to the first elements of two 128-byte arrays. The contents of the arrays encode when each bit should change state during the PWM cycle.[2] Ultimately, the contents of the arrays are set by the main controller via SPI.
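To make the encoding concrete, here is a small Python model of one pass through the loop. It is not the firmware; it only mimics the loop’s effect. During each velocity period the next table byte is written to the port, so a pin’s duty cycle is the fraction of the 128 bytes in which its bit is set. The example table is made up.

```python
STEPS = 128  # velocity periods per PWM cycle


def duty_cycles(table, n_bits=8):
    """Fraction of one PWM cycle that each port bit spends high.

    Models the unrolled loop: during each velocity period, the next byte
    of the 128-byte table is written to the port with `out`.
    """
    if len(table) != STEPS:
        raise ValueError("expected a 128-byte table")
    high_steps = [0] * n_bits
    for port_value in table:
        for bit in range(n_bits):
            if (port_value >> bit) & 1:
                high_steps[bit] += 1
    return [h / STEPS for h in high_steps]


# Made-up example: bit 0 goes high for the last 32 steps, bit 1 is always high.
example = [0b10 | (1 if step >= 96 else 0) for step in range(STEPS)]
print(duty_cycles(example)[:3])  # [0.25, 1.0, 0.0]
```

The model only counts high steps. It therefore also accepts columns that toggle more than once per cycle, which the second footnote says should be possible.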

I originally wrote this loop body with a counter and branch to handle the jump back to the top, but we have zero extra time in this loop, so we can’t spend any on loop maintenance. For that reason, I wrote a simple loop unrolling script that copies this body 128 times. This isn’t a big deal because we have much more flash space on the chip than we’ll ever need. At the beginning and end of the 128 iterations, there’s still some overhead to handle receiving new data via SPI and resetting the pointers, but that’s acceptable because it amounts to a few microseconds during every PWM cycle, instead of a few microseconds during every velocity period.
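The post doesn’t show the unrolling script, so this is a hypothetical Python version of the idea. The `pwm_cycle` label and the per-step comments are invented. Only the four-instruction body is copied from the loop above.

```python
# The four-instruction loop body from the post.
BODY = """\
    ld r16, X+
    out portc, r16
    ld r16, Y+
    out portd, r16
"""
STEPS = 128


def unroll(body=BODY, steps=STEPS):
    """Repeat the body `steps` times so no counter or branch is needed."""
    lines = ["pwm_cycle:"]  # hypothetical label for the top of the cycle
    for step in range(steps):
        lines.append(f"    ; velocity period {step}")
        lines.append(body.rstrip("\n"))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(unroll(), end="")
```

Both `ld` with post-increment and `out` are single-word (16-bit) instructions. The 512 unrolled instructions therefore occupy about 1 KiB of the ATmega328P’s 32 KiB of flash.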

Updating the Main Controller

The changes I described in the previous section significantly reduced the demand on the main controller, a Teensy 4.1. Under this new system, its job is to handle a MIDI connection with the USB host (e.g., a laptop) on one side and the eight SPI connections to the subcontrollers on the other side. When a new MIDI message comes in, the main controller needs to convert it into appropriate signals to be transmitted to the affected subcontrollers. I’ll describe my approach to each of these steps in turn.

MIDI Message Handling

I’ve been writing the main controller code in Rust. So far, this has made development a lot less painful because of the strong type system, reliable development tools, and a solid package library. I got especially lucky when I found usbd-midi, an implementation of the USB MIDI standard for embedded devices. While it’s not strictly complete, the implementation is mature enough for my use case and hasn’t given me any trouble. My usage of the library amounts to a polling loop and a match statement over NoteOn and NoteOff events. When one of those messages comes in, the code branches to a handler that updates the internal state and queues an update to be sent after all messages are processed.
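The main controller is written in Rust against usbd-midi, but the shape of the dispatch is easy to sketch in Python. `NoteOn`, `NoteOff`, and `process_messages` below are hypothetical stand-ins, not the library’s types. The sketch also stores raw velocities, whereas the real handler updates richer internal state. Keys are numbered from A0 (MIDI note 21), and each group of eleven maps to one subcontroller.

```python
from dataclasses import dataclass

LOWEST_NOTE = 21     # A0, the lowest key on an 88-key piano
KEYS_PER_BOARD = 11


@dataclass
class NoteOn:        # hypothetical stand-in for the library's message type
    note: int
    velocity: int


@dataclass
class NoteOff:       # hypothetical stand-in
    note: int


def process_messages(messages, velocities):
    """Apply one poll's worth of messages; return boards needing an update."""
    dirty = set()
    for msg in messages:
        key = msg.note - LOWEST_NOTE
        if not 0 <= key < len(velocities):
            continue                       # outside the piano's range
        if isinstance(msg, NoteOn):
            velocities[key] = msg.velocity
        elif isinstance(msg, NoteOff):
            velocities[key] = 0
        else:
            continue
        dirty.add(key // KEYS_PER_BOARD)   # defer the SPI write until later
    return dirty
```

Collecting a set of dirty boards means a burst of messages for the same board results in one SPI update after the poll, not one per message.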

I’ll also mention here that I added basic logging over serial, which was annoying to set up but proved really useful for debugging. The reason it was annoying had nothing to do with the library or anything like that. The problem was that even though I was presenting the correct USB descriptors to the host, the Linux kernel wasn’t binding the expected driver, so I never got a serial device. I eventually figured out that the root cause was my choice of vendor ID (VID) and product ID (PID). I had chosen 0x1234 and 0x5678, respectively, for testing because I didn’t think they mattered. Turns out they do: the kernel was choosing a driver based on them. When I changed the VID and PID to 0xffff and 0x0001, the problem went away and everything worked as I expected.

Key Motion Model and State Machine

When a NoteOn message comes in, it has a velocity associated with it. Translating that into a PWM signal for the corresponding solenoid is not trivial, for a few reasons. One is that the key only needs to be pressed with the given velocity for a short time (that is, until it strikes the string), after which we should apply a lower holding force to conserve power and limit coil heating.[3] Another reason is that when we want to press the same key quickly, we have to release it, allow it to begin returning to the up position, and then strike it again. These two requirements add necessary complexity to the main controller’s translation logic. I’ve tried to capture that complexity in a finite state machine based on a simplified motion model for the key mechanism. I doubt this model is optimal, and I expect to improve on it in the future, but it’s good enough for now. The correctness of this model is especially important because there is no feedback from the key regarding its position.[4]

Graphviz directed graph showing the transitions between the available key states

There are five key states: OFF, PRESSING, HOLDING, RELEASING, and REPEATING. We can receive a NoteOn or NoteOff message in each of these states, so both of those transitions are defined for all states. Some states can also time out, which means that they automatically transition to another state after a fixed period unless another transition happens first.

I’ll describe the “happy path”, or the cycle that most keys go through. Keys begin in the OFF state and stay there until a NoteOn message appears, after which they transition to the PRESSING state. In this state, the key is accelerating towards the bottom position with a final speed defined by the MIDI velocity in the NoteOn message. The timeout for this state is short (80–100 ms), after which the key transitions to the HOLDING state with a fixed lower PWM duty cycle. The key is now playing. Later, a NoteOff message appears for the key, so it transitions to the RELEASING state, which is a short duration when the PWM duty cycle is zero. This allows the key to almost return to the up position. Finally, the key returns to the OFF state and settles in the up position.

The case of rapidly repeating notes on the same key takes a different path. I realized that it’s possible for the MIDI host to send multiple NoteOn messages for the same key without a NoteOff message in between. Also, the host could send NoteOn, NoteOff, NoteOn in quick succession, not leaving enough time for the key to recover. The MIDI standard seems to leave this kind of thing as an implementation detail for the synthesizer, which makes sense because the host can’t and shouldn’t know about the limitations of the synthesizer in general. For these edge cases, we can only do our best, which means reacting as quickly as we can and doing whatever makes sense given the state we’re currently in. Suppose we receive a NoteOn message while we’re already in the PRESSING state. We can’t press any faster than we already are, so we might as well keep pressing. That translates to a self-transition, as shown. Similarly, if we receive a NoteOn message in the RELEASING state, then we need to keep on releasing until the key is ready and then immediately press it again. That’s exactly what the REPEATING state is for.
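Here is a Python sketch of the state machine as described. The post spells out these transitions:

- OFF to PRESSING on NoteOn
- PRESSING timing out to HOLDING
- HOLDING to RELEASING on NoteOff
- RELEASING timing out to OFF
- the PRESSING self-transition
- RELEASING to REPEATING, then on to PRESSING

All other transitions are assumptions, marked in comments. The same goes for every timeout except PRESSING’s and for the holding duty cycle.

```python
from enum import Enum, auto


class State(Enum):
    OFF = auto()
    PRESSING = auto()
    HOLDING = auto()
    RELEASING = auto()
    REPEATING = auto()


ON_NOTE_ON = {
    State.OFF: State.PRESSING,
    State.PRESSING: State.PRESSING,    # can't press faster: keep pressing
    State.HOLDING: State.REPEATING,    # assumption: release, then re-press
    State.RELEASING: State.REPEATING,  # finish releasing, then press again
    State.REPEATING: State.REPEATING,  # assumption: re-press already queued
}
ON_NOTE_OFF = {
    State.OFF: State.OFF,
    State.PRESSING: State.RELEASING,   # assumption
    State.HOLDING: State.RELEASING,
    State.RELEASING: State.RELEASING,
    State.REPEATING: State.RELEASING,  # assumption: cancel the re-press
}
ON_TIMEOUT = {
    State.PRESSING: State.HOLDING,
    State.RELEASING: State.OFF,
    State.REPEATING: State.PRESSING,
}
# PRESSING uses the 80-100 ms range from the post; the others are placeholders.
TIMEOUT_MS = {State.PRESSING: 90, State.RELEASING: 60, State.REPEATING: 60}
HOLD_DUTY = 0.3  # placeholder for the fixed, lower holding duty cycle


class Key:
    def __init__(self):
        self.state = State.OFF
        self.since_ms = 0   # when the current state was entered
        self.velocity = 0

    def _go(self, new_state, now_ms):
        if new_state is not self.state:  # self-transitions keep their timer
            self.state, self.since_ms = new_state, now_ms

    def note_on(self, velocity, now_ms):
        self.velocity = velocity
        self._go(ON_NOTE_ON[self.state], now_ms)

    def note_off(self, now_ms):
        self._go(ON_NOTE_OFF[self.state], now_ms)

    def tick(self, now_ms):
        """Apply a timeout transition if the current state has expired."""
        limit = TIMEOUT_MS.get(self.state)
        if limit is not None and now_ms - self.since_ms >= limit:
            self._go(ON_TIMEOUT[self.state], now_ms)

    def duty(self):
        """PWM duty cycle (0.0 to 1.0) to send for this key."""
        if self.state is State.PRESSING:
            return self.velocity / 127  # MIDI velocity is 0-127
        if self.state is State.HOLDING:
            return HOLD_DUTY
        return 0.0
```

Keeping the transitions in three tables makes every (state, event) pair explicit. Changing a policy, such as what a NoteOn does during HOLDING, then means editing one entry.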

As I admitted above, I don’t think this is optimal, but it’s not a bad first pass. In testing at high speeds, this approach can only make about six notes per second, or eighth-note triplets at 120 bpm. It seems like the human record is 13 notes per second, which I suppose is near the limit of what the typical key mechanism allows. I think I can probably improve this system and get closer to ten notes per second, which should be good enough for any reasonable music. I’ve seen other projects that take much greater care in matching the PWM states with the natural resonance of the key mechanism, which I think is a good idea that I’ll steal later.

Sending Subcontroller Updates

At the end of the USB polling loop, after the controller has processed all of the new MIDI messages from the host, the time has come to serialize the new PWM values and send the resulting bytes to the appropriate controllers. This amounts to a lot of careful indexing and bit slicing, followed by iterating through the subcontrollers and sending their new bytes via SPI. A good way to think about the PWM data bytes is to view them as a table. Time extends along the rows, with one row per velocity period. There are 128 velocity periods in total for each port, so the total number of bytes in the table is 256. The columns are the bits in each byte, and they encode the output state of the corresponding pin / key. The bits in any column will always monotonically increase by design—that is, the column will either contain all ones, all zeros, or a group of zeros followed by a group of ones.
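Here is one way to build that table, sketched in Python. Three details are assumptions for illustration: the pin assignment (six keys on PC0-PC5, five on PD0-PD4), sending port C’s table before port D’s, and expressing each key’s on-time directly as a step count from 0 to 128.

```python
STEPS = 128  # velocity periods per PWM cycle

# Hypothetical wiring for one 11-key board: keys 0-5 on PC0-PC5, keys 6-10 on PD0-PD4.
PIN_MAP = [("C", bit) for bit in range(6)] + [("D", bit) for bit in range(5)]


def encode_tables(on_steps):
    """Serialize per-key on-times (0-128 velocity periods) into 256 bytes.

    Each key's column is a run of zeros followed by a run of ones, so a key
    with on-time n is low for the first STEPS - n rows and high for the rest.
    """
    if len(on_steps) != len(PIN_MAP):
        raise ValueError("expected one on-time per key")
    port_c = [0] * STEPS
    port_d = [0] * STEPS
    for n, (port, bit) in zip(on_steps, PIN_MAP):
        if not 0 <= n <= STEPS:
            raise ValueError(f"on-time {n} out of range")
        table = port_c if port == "C" else port_d
        for row in range(STEPS - n, STEPS):
            table[row] |= 1 << bit
    return bytes(port_c + port_d)  # assumption: port C's table is sent first
```

A duty fraction from the state machine would become an on-time through something like `round(duty * 128)`. That conversion is also an assumption.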

To save time on the SPI bus, I implemented some branching logic that skips subcontrollers that haven’t changed since their last update. On the receiving end, the subcontrollers read each byte via the SPI peripheral and store it at an incrementing pointer, replacing the previous PWM data with the new version.
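One way to implement the skip is to remember the last bytes sent to each board and compare against them. This is a sketch. `send` is a hypothetical callback standing in for an SPI transfer with that board’s chip select asserted, and the real code may track dirty flags instead.

```python
class BoardLink:
    """Sends per-board tables, skipping boards whose bytes haven't changed."""

    def __init__(self, n_boards, send):
        self.send = send                    # hypothetical SPI write: send(board, data)
        self.last_sent = [None] * n_boards  # None: nothing sent yet

    def flush(self, tables):
        """Send each changed table once; return the boards that were sent."""
        sent = []
        for board, data in enumerate(tables):
            if data != self.last_sent[board]:
                self.send(board, data)
                self.last_sent[board] = data
                sent.append(board)
        return sent
```

Comparing 256 bytes per board costs far less than clocking them out over SPI. The comparison also catches the case where a burst of messages leaves a board’s table unchanged.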

It’s possible for the main controller and subcontrollers to get out of sync in two ways. First, a bit error in the PWM data would result in incorrect key velocity. Second, an entire byte may be dropped due to missing a clock transition, in which case the data table pointers would become permanently out of sync. I haven’t noticed any problems like this so far, but I might add some kind of sentinel value that encodes the boundary between two data tables. This would give the subcontroller a way to stay in sync with the main controller even when transmission errors occur.
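Here is a sketch of the sentinel idea, under two assumptions. First, a frame is one 0xFF byte followed by the 256 table bytes. Second, 0xFF never appears as data. The second assumption holds if, as with a hypothetical pin map of six keys on PC0-PC5 and five on PD0-PD4, the top bits of every port byte are always clear.

The receiver is a Python model of the subcontroller’s logic, not AVR code. A sentinel always resets the write position, so a dropped byte only spoils the frame it belongs to, and that frame never completes. It fills a separate buffer and swaps it in only when a frame is complete. That double buffering is this sketch’s choice; the post says the subcontroller currently writes straight over its previous table. None of this helps with single-bit errors.

```python
FRAME_LEN = 256  # two 128-byte port tables
SENTINEL = 0xFF  # assumption: never appears as a data byte


def frame(data):
    """Prefix one 256-byte update with the sentinel."""
    if len(data) != FRAME_LEN or SENTINEL in data:
        raise ValueError("data must be 256 bytes with no sentinel value")
    return bytes([SENTINEL]) + bytes(data)


class Receiver:
    """Model of a subcontroller that resynchronizes on every sentinel."""

    def __init__(self):
        self.buf = bytearray(FRAME_LEN)
        self.pos = None      # None: waiting for the next sentinel
        self.active = None   # last complete table, if any

    def feed(self, byte):
        if byte == SENTINEL:
            self.pos = 0     # a new frame starts here, whatever came before
        elif self.pos is not None:
            self.buf[self.pos] = byte
            self.pos += 1
            if self.pos == FRAME_LEN:  # complete: swap it in
                self.active = bytes(self.buf)
                self.pos = None
```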

{T,F}rying an FPGA

Laptop, power supply, breadboard, FPGA dev board, and a mess of wires all on top of a piano bench

Before settling on the above approach, I also spent a few weeks working on a solution using an FPGA. I learned a lot about this unfamiliar area of digital electronics and wrote enough Verilog to get a working solution in simulation and on the bench. However, during testing I accidentally connected 24 V to an I/O pin, which fried the chip on my dev board. That was a frustrating mistake, but it prompted me to evaluate my chosen solution and realize that it was too complicated. I liked the idea of an FPGA due to its flexibility and high speed, but its nontrivial cost and higher manufacturing complexity made it difficult to work with. Also, I was planning to use every pin on my chosen chip with no room to spare, so the design was pretty constrained in that dimension. My mistake with the dev board was somewhat costly and certainly annoying, but I think it resulted in a better outcome for this project in the long term.

Scaling Up to Eleven Keys

Top view of the key platform, solenoids, plungers, and breadboarded driver circuit all installed on the piano

Solenoid Improvements

At the end of the previous post in this series, I had a reasonable solenoid design for this application. I’ve tweaked that design slightly by reducing the number of turns to 250 and being much more careful about winding them tightly. This improves efficiency because the density of turns near the plunger is higher. It requires more care, of course, but it’s worth it.

As I began building the solenoids, I found that it was difficult to meet the tolerance I’d set between the outside of the solenoid windings and the inside walls of the steel supports. I was also dreading having to cut and drill so much steel. For those reasons, I didn’t include the steel in this iteration of the multi-key module. This reduced efficiency but also reduced assembly complexity and overall cost. Plus, it revealed the red magnet wire, which I think looks nice against the black PLA support structure.

The net result of these changes seems to be that the holding power is about the same or slightly lower. It’s not that important to me to reduce the power any further right now, although less is always better in this case. In the future, I might try adding steel bars across the fronts and backs of both rows of solenoids, which would incur minimal additional complexity while potentially giving even better efficiency. I’m not sure whether this would actually work because I don’t know enough about how magnetic field lines are affected by ferromagnetic materials, so I guess it’ll probably be easiest to just run the experiment. I think it’s possible that there could be unwanted inter-solenoid effects when using bars that run across multiple keys, but I just don’t know yet.

Five plungers and their extensions sitting on a table with paper rolls stabilizing them while their glue dries
Solenoid plungers and their extensions during the glue drying phase. The middle three are shorter because they press black keys, which are closer to the solenoid platform.

My previous solenoid tests suffered from audible clicking due to the mechanical interfaces between the key surface, the plunger extension, and the plunger. I eliminated the clicking in the key/extension interface by sticking on some fuzzy circles meant for chair and table legs. For the extension/plunger interface, I cut a strip of printer paper, taped it onto the extension, rolled it up, and then dropped the plunger in with a bit of craft glue. The paper roll helps to keep the plunger and extension coaxial while the glue dries, after which the roll can be removed. This interface is weak to shear stress, but it works well enough for the low tensile stresses it sees under normal use.

Also, I sanded down the inside surfaces of the printed solenoid supports, which addressed an issue where the layer lines were causing the plunger to get stuck. I also explored printing the supports in a different orientation, but that introduced new problems that I wasn’t willing to try to overcome. It might be good to switch to PTFE (Teflon) tubing or some other material that’s smooth and slides well against steel, but this is probably good enough for now or maybe forever.

Avoiding PCBs

I’d assumed for most of this project that one way or another I would eventually lay out a PCB and get it fabricated. This became a source of stress as the weeks fell away for all the reasons I’ve discussed previously. My janky at-home process would not be sufficient for any of the approaches I’ve considered, so I would be forced to order boards and hope they were right. Then I remembered that Perma-Proto boards exist. At $7, they’re much less costly than custom boards of similar size,[5] and since they’re meant to be permanent breadboards, I can transfer my design from a normal full-size breadboard to one of these without much additional effort. For the moment, my one complete circuit is on an actual breadboard, but as I scale up further I intend to build the whole project out of these Perma-Protos.

Mechanical Support

I switched from FreeCAD to CadQuery for all of my CAD models. I was tired of fighting with FreeCAD and dealing with broken model files. It seems like plenty of people are doing cool stuff with it, so maybe I’m just using it wrong, but right now it’s not for me. I much prefer code-first CAD. CadQuery has its own problems—or, again, I have my own problems with it—but it suits me better. I never worry about broken models, and it doesn’t assume that the universe ends beyond its borders. And I get to use Python.

After transitioning my few existing models to CadQuery, I went on to design the main platform for the eleven-key module. This turned out to be pretty easy once I got my code organized. I realized I could put all of the dimensions for different relevant items in data classes, compute additional dimensions based on those independent values, and then pull them in wherever I needed them. Combined with a Makefile, this made tweaks to dimensions much safer and faster, since they’d propagate throughout all affected models.
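The pattern looks roughly like this. Every number here is a placeholder except the 2 mm plate thickness, and the CadQuery model code that would consume these objects is omitted. Independent values are dataclass fields and derived values are properties, so changing one field updates everything computed from it.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class Solenoid:
    # Independent dimensions in mm (placeholder values).
    winding_od: float = 22.0
    clearance: float = 0.5
    support_wall: float = 1.5

    @property
    def support_id(self) -> float:
        # Derived: the support's inner wall clears the winding on both sides.
        return self.winding_od + 2 * self.clearance

    @property
    def support_od(self) -> float:
        return self.support_id + 2 * self.support_wall


@dataclass(frozen=True)
class Platform:
    plate_thickness: float = 2.0   # the 2 mm PLA plate from the post
    rib_depth: float = 8.0         # placeholder
    solenoid: Solenoid = field(default_factory=Solenoid)

    @property
    def total_depth(self) -> float:
        return self.plate_thickness + self.rib_depth


DIMS = Platform()

# Trying a variant: a fatter winding, with every derived size following along.
fatter = replace(DIMS, solenoid=replace(DIMS.solenoid, winding_od=24.0))
print(DIMS.solenoid.support_od, fatter.solenoid.support_od)
```

Because the dataclasses are frozen, `dataclasses.replace` is the way to try a variant without disturbing the shared defaults.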

I’ve noticed that some DIY player piano designs suffer from flexure. When the solenoids are firing quickly, “over-the-keyboard” designs that span the entire piano tend to flex at least somewhat, even when supported with aluminum extrusion. I don’t want that to happen to my design, so I decided to support my platforms along the whole width of the keyboard, both in front of the keys and behind. This is possible on my piano because the wall behind the keys is angled upward, and there’s a ~12 mm ledge in front of the keys as well. I designed the platform to sit on silicone feet on the near side and just rest against the angled wall on the far side. This results in a surprisingly solid foundation that doesn’t seem to slide at all. I haven’t fully tested this under the dynamic loading situations I’ve described, but anecdotally it seems quite stable. In the future, I plan to somehow connect adjacent platforms to further improve their stability.

Render of the key platform with a view of the bottom side showing the strengthening ribs
Render of the platform CAD model. Notice the three ribs running from front to back, which give this model significantly more strength with essentially no additional material.

My first-pass design for the key platform was just a plate supported on opposite ends, but the weight of the solenoids alone was enough to make the plate flex, this time in the direction perpendicular to the keyboard. This wasn’t too surprising for 2 mm of PLA, but it would only get worse under dynamic load. To address this, I added ribs to the plate profile in positions that don’t interfere with the motion of the plunger extensions. The ribs extend down from the underside of the plate and significantly increase its second moment of area, which increases the plate’s rigidity under this kind of load. I haven’t measured the increase in strength, but it’s obvious from holding the two models that the new version is much stronger. I’m pleased with this small change because it seems to show that PLA is a suitable material for this purpose, with all of its benefits intact, provided that it’s applied efficiently in a design.
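To put rough numbers on the rib effect, here is a Python calculation of the second moment of area, I, for the plate cross-section with and without ribs, using the parallel axis theorem. Only the 2 mm plate thickness comes from the post. The 150 mm width and the three 1.2 mm by 6 mm ribs are placeholder dimensions. Bending stiffness is proportional to E·I, and E is the same PLA in both cases, so the ratio of the two I values is the ratio of stiffnesses.

```python
def composite_second_moment(rects):
    """Second moment of area about the section's horizontal centroidal axis.

    `rects` is a list of (width, height, y_center) rectangles in mm.
    I = sum(b*h^3/12 + A*d^2), where d is each piece's offset from the centroid.
    """
    area = sum(b * h for b, h, _ in rects)
    y_bar = sum(b * h * y for b, h, y in rects) / area
    return sum(b * h**3 / 12 + b * h * (y - y_bar) ** 2 for b, h, y in rects)


PLATE_W, PLATE_T = 150.0, 2.0        # width is a placeholder; 2 mm is from the post
RIB_W, RIB_D, N_RIBS = 1.2, 6.0, 3   # placeholder rib dimensions

plate = [(PLATE_W, PLATE_T, PLATE_T / 2)]
ribs = [(RIB_W, RIB_D, -RIB_D / 2)] * N_RIBS  # hanging below the plate

i_flat = composite_second_moment(plate)
i_ribbed = composite_second_moment(plate + ribs)
extra_material = (N_RIBS * RIB_W * RIB_D) / (PLATE_W * PLATE_T)

print(f"I flat: {i_flat:.0f} mm^4, I ribbed: {i_ribbed:.0f} mm^4")
print(f"stiffness ratio: {i_ribbed / i_flat:.1f}x for {extra_material:.1%} more material")
```

With these placeholders, the ribs add about 7% more cross-sectional material and increase I by roughly a factor of five. Depth is what matters: I grows with the cube of a rib’s height and with the square of its distance from the centroid.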

In its current form, the key platform model will not cover the keyboard evenly. When I started designing the platform, I thought I would make seven identical platforms, all starting at C in their respective octaves and covering twelve keys. (That’s the model you see in this post and in the repo at the moment.) This would leave three notes at the bottom and one note at the top of the keyboard to be handled with one-off platform models. I’ve decided to ditch that approach and have every module cover exactly eleven keys, which means there will be eight platforms with no keys left over.

The cost of this change is that every platform needs to be different, since the platform boundaries no longer align with the octave boundaries. In fact, the new platform boundaries will often land on a black key, which means that some platforms will need to expand or contract in order to cover their keys but not those of their neighbors. I’m not certain that this will be easy to handle, but I’m convinced enough for now that I’m willing to try it. The main benefit would be that all platforms would be electrically identical (except for their SPI CS connections) and independent, while the main cost would be that they wouldn’t be mechanically identical. Fortunately, the cost of that is low because my 3D printer doesn’t care.
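The boundary problem is easy to enumerate. This Python sketch splits the 88 keys (MIDI notes 21 to 108, A0 to C8) into eight runs of eleven. It then lists the internal boundaries that have a black key on either side. Treating such a boundary as the awkward case is my simplification of the geometry.

```python
LOWEST_NOTE, KEYS, PER_PLATFORM = 21, 88, 11   # A0 is MIDI note 21
NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
BLACK = {1, 3, 6, 8, 10}                        # pitch classes of black keys


def note_name(midi):
    return f"{NAMES[midi % 12]}{midi // 12 - 1}"


def platform_ranges():
    """MIDI (first, last) note for each of the eight 11-key platforms."""
    starts = range(LOWEST_NOTE, LOWEST_NOTE + KEYS, PER_PLATFORM)
    return [(s, s + PER_PLATFORM - 1) for s in starts]


def black_boundaries():
    """Internal boundaries where the key on either side is black."""
    ranges = platform_ranges()
    hits = []
    for (_, left_end), (right_start, _) in zip(ranges, ranges[1:]):
        if left_end % 12 in BLACK or right_start % 12 in BLACK:
            hits.append((note_name(left_end), note_name(right_start)))
    return hits


for first, last in platform_ranges():
    print(f"{note_name(first)}-{note_name(last)}")
print("boundaries next to a black key:", black_boundaries())
```

Six of the seven internal boundaries involve a black key. Only the E4/F4 split falls between two white keys.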

Future Work


  1. Port C only has seven pins, and one of them is reserved for the reset signal, leaving six available for GPIO. Port D has eight pins, and all of them are available for GPIO, giving a total of 14. Luckily, port B has both the crystal inputs and the SPI pins all together, so we don’t have to spread across three ports, which would be too slow. See tables 13-3, 13-6, and 13-9 in the chip datasheet.

  2. Note that this means that it should be possible to change the pin value multiple times in a cycle. I haven’t tried this.

  3. This is why it’s important to have an ultrasonic PWM frequency. Holding the key with a low force means that the PWM duty cycle is strictly between 0% and 100%, so there’s going to be a switching waveform. If the frequency is audible, then the solenoid will buzz, so your only choice is to hold with full force. I’m proud of my PWM solution because it combines the best of both existing solutions. It’s relatively easy to implement due to its use of widely available through-hole components, and it retains the efficiency of using a lower holding force.

  4. I expect that in the future I may need motion model parameters for each key, or at least for subsets of them. On my piano (and, I imagine, most others), keys in the top third or so are easier to press and generally have a different action than the others.

  5. Gotta love economies of scale. I think DKRed quoted me about $14 per board for an early iteration of the eleven-key driver circuit that was decently optimized for space. Half the price and maybe one tenth the risk is hard for me to ignore. I guess I could have used one of the overseas companies for much less, but I always get a weird feeling from them that their prices are low because their labor standards are, too. But that’s just speculation.