Maybe two years ago, I encountered something that made me think. (I mean, that
wasn’t the only time, I’m just… look, never mind.) I was committing something
to some git
repository, and the truncated commit hash (e.g. f151b8e
) was all
numbers. Not one of the possible letters a
through f
appeared in it. Of
course, the full hash (e.g. f151b8e71a17fd7c7e074bc4c1500f2dc05da742
) had
plenty of letters, but by chance those first 7 nibbles were all less than a
.
Hmm, I thought. How likely is that? Well, that’s a pretty easy problem.
Let’s assume all hashes are equally likely. That means we can consider each
character of the hash independently. Each character can be one of sixteen
possible values (0
through f
), but we want the cases where each character is
a member of the subset 0
through 9
. That means that the probability of
getting a digit for a given character in the hash is given by the following.
Simple enough, but that’s just for one character. To get all seven characters as digits, we need to raise this value to the power 7.
So about 1 out of every 27 commits or so should contain nothing but digits in the first seven characters. To my mind, that’s infrequently enough to be interesting while still frequently enough to be fun.
Other Interesting Parameters
What other neat constraints can we put on?
Well, an obvious one is to expand our digitsonly hash from 7 characters to the full 40. In that case, the probability is considerably smaller.
That’s about 1 in 146 million, so we can probably say that there might be a commit out there that only has digits. I bet a focused effort could even find it, but I leave that as an exercise for the reader (or maybe me in the future).
What about a short hash that’s in ascending order? We’ll include letters again,
so it could be something like 24569bf
. Notice that each character’s value is
less than the one after it.^{1}
How could we go about solving this? Well, somebody told me once that probability
is just fancy counting, so let’s count. We know there are sixteen possible
values and seven spots for them to go in. When we pick the first character, it
can be any character up to 9
, inclusive, since anything more wouldn’t leave us
enough room to fit strictlyincreasing characters.
That means that the only case for 9
looks like this: 9abcdef
. For 8
, there
can be one “jump” in the sequence, and that jump can be anywhere, so 8abcdef
and 89abcde
are both valid. That jump can be in any of seven positions (after
the first, after the second, etc.), so for 8
there are seven cases.
7
is where it starts to get more difficult. Here, there can be two jumps of
value 1 or one jump of value 2. We know from 8
that the jump of value 2 can
be in any of seven positions (e.g. 789cdef
). A better way to count these along
with the jumps of value 1 might be to think of the jumps as additive. That is,
jumps are always of value 1, but if more than one happen to occupy the same
position, they “stack up” and become a jump of a larger value. This makes the
counting easier, because now we can consider the jumps individually.
Let’s try it. Each jump can occupy any of seven positions, and for 7
there are
two jumps. That’s a total of $7^{2} = 49$ cases. For 6
, there are three
possible jumps, so that’s $7^{3} = 343$. In general, we can sum everything up
using the following expression.
As you might expect, most of the cases come from the last term of the sum, when $n=9$.
Now, all we need to do is count up the number of possible arrangements of any sort and divide the two values to come up with the probability. Since there are sixteen possible values for seven different positions, finding the total number of possibilities is pretty straightforward.
And so is dividing the two values.
This value is surprisingly high to me. I wouldn’t have guessed that nearly 1 in 5 truncated commit hashes meet my strictlyincreasing criterion, but apparently they do.
We could continue on like this forever.
There are endless criteria we could place on commit hashes (or other nearlymeaningless strings). I encourage the reader to set one or more and try to work out the probability. It’s good for the soul, or whatever.

Now, some of you may be smugly thinking that you know the term for this property—as I did mere moments before writing these words—but you’re (probably) wrong. The parameter I’ve arbitrarily stated explicitly disallows repeating the same value, but a monotonic function can have plenty of repeated values if it wants, as long as the values never start going back the way they came. ↩