Did you solve it? Are you brainy at binary?

Here are the questions I set today, repeated with the solutions, and a discussion at the bottom.

Both puzzles are about binary codes, which are a way of encoding information using only the binary digits 0 and 1, called “bits”.

1. ABACAB

Here is a code for the letters A, B and C, in which each letter is represented by two bits.

A: 00
B: 01
C: 10

This code would translate the six-letter message ABACAB into 12-bits: 000100100001

Devise a binary code that translates ABACAB into a string consisting of only 9 bits.

To clarify: in this code, A, B and C each have their own unique representation in binary.

Also, the code must not be ambiguous. A code is ambiguous if the encoding of two different messages produce the same bit string. For example, a code in which A is 0, B is 1, and C is 01, is not permitted because this would mean that the encodings of both AB and C are “01” . Given any bit string, it must be unambiguously decodable into a message of As, Bs and Cs.

[UPDATE: combinations of letters, such as AB, BC, or AC, are not allowed their own codes. The only codes are for A, B and C.]

Solution

Here’s a code that works:

A: 0
B: 10
C: 11

Thus the message ABACAB is 010011010

When we read it from left to right, each letter is unambiguous. The first digit, 0, has to be A, thus we have A10011010. The next available digit, 1, is not the code of a letter, but the next two are, 10, which is B, so we have AB011010. And so on.

How might you have deduced this code? First, we must have a letter that is coded by a single bit, if we are to have a shorter message than the two-bit code above. Also, we can’t have two letters represented by single bits, as stated in the question. So only one letter is a single bit.

Which letter should be a single bit? It makes sense that it is the most frequent letter in the message, which is A. So let A be 0.

B and C must both be two bits, since if they are the total length of ABACAB will be 9 bits, as stated in the question.

There are four possible two bit codes: 00, 01, 10 and 11.

We can eliminate 00, since this could also be decoded as AA, which means the code would be ambiguous.

We can also eliminate 01 and 10 together, since that would mean 010 would be ambiguous – either 01 and 0, or 0 and 10.

We get an unambiguous code if B and C are 10 and 11 (or vice versa), and this coding is easy to decode reading from left to right.

We also get an unambiguous code if B and C are 01 and 11 (or vice versa). This is the mirror image of the previous solution, i.e 10 and 11, and would be the choice if you wanted to decode from right to left. The decoding is a bit awkward when processing from left to right.

Alternate solutions are the ones above, but in which all 0s are 1s and all 1s are 0s.

2. ABACADABA

Here is a code for the letters A, B, C and D, in which each letter is represented by two bits.

A: 00
B: 01
C: 10
D: 11

This code would translate the message ABACADABA, which has nine letters, into a string consisting of 18 bits: 000100100011000100.

Devise a binary code that translates ABACADABA into a string consisting of only 15 bits.

Same rules as above. Every letter has its own unique code. And the code is not ambiguous.

Solution.

Here’s one code that works:

A: 0
B: 10
C: 110
D: 111

We deduce this code using the same logic as above.

The most common letter, A, is a single digit 0.

We can eliminate 00 as a two-bit code, and also 01 and 10 being used together, for the same reasons as above.

What about 01 and 11 together? Let’s say A: 0, B:01 and C: 11. If we list all eight possibilities for D, they all produce ambiguities:

000 (AAA), 001 (AB), 010(BA), 011(AC), 100 (0100 AD or BAA), 101 (0101 BB or AD), 110 (CA), 111 (0111 AD or BC)

What about 10 and 11 together. Let’s say A: 0, B:10 and C: 11. Again, the eight options for D all produce ambiguities.

000 (AAA) ,001 (0010 DA or AAB), 010 (AB), 011 (0011 AAC or AD), 100 (BA), 101 (1010 DA or BB), 110 (CA), 111 (1110 DA or CB)

We conclude, therefore that only one of the letters has a two bit code. Let’s chose B, which is the second most frequent letter.

Since we want to decode the string easily from left to right, we select for B a code that does not start with 0. Thus, let B = 10. For the same reason we select for C and D codes that do not start with 0 or 10. Assigning C: 110, and D: 111, gives an unambiguous code.

This type of code is also called a prefix code because any substring at the start of a letter code is not a letter code. A prefix code can also be represented as a “tree”, as we will discover below.

Conclusion:

The puzzle solution we gave follows a top-down approach to solving a simple version of the general problem posed by Robert Fano in 1951:

Given a collection of letters, numbers, or symbols, find the most efficient way to represent them using a binary code.

The problem is hard – and may explain why Fano and his colleague Claude Shannon, who also followed such an approach, struggled with it. Fano’s student David Huffman solved the problem using a bottom up approach – realising the underlying structure that creates the most efficient code.

He devised what is now called a “Huffman tree”, illustrated below. There is one node, which is the “root” of the tree. All other nodes have one predecessor.

If you want to read more on this, here’s the Wikipedia page, and here is an excellent video that explains it clearly and in more depth.

I hope you enjoyed this puzzle. I’ll be back in two weeks.

Thanks to Pierre Chardaire, a retired computer scientist, for help with this puzzle.

I set a puzzle here every two weeks on a Monday. I’m always on the look-out for great puzzles. If you would like to suggest one, email me.

I give school talks about maths and puzzles (online and in person). If your school is interested please get in touch.