Sunday, June 29, 2008

My favorite cipher

There is something gloriously manual about a hand cipher, a cipher using pencil, paper, and possibly a small collection of keytables or such. There is intimate connection with the message, and a comprehensible dynamic that is impossible for the oily "kerchunk" of a rotor machine or the silent algorithmic intensity of yet more modern techniques.

So, in my not-remotely-technical way, I propose a quick review of three interesting wartime hand ciphers. One from the First World War, two from the Second, all three German. I'll take a look at these three from the standpoint of a cipher clerk enciphering a message in the field and then, with a perspective as artistic as it is mathematical, at the inner workings that render reversible incomprehensibility. It'll take time, so bear with me. First one, then the rest later on. If I get into it, there might even be more than these three..

ADFGVX -- and the heroic efforts that lead to its breaking -- is at least conceptually familiar to students of 1918. In the final year of that static conflict, the French cryptographic victory helped the allies withstand the final German push to Paris (it should be noted that there are those who question this widely held view of the history).

The German designers of ADFGVX knew well that a successful cipher needed to be simple to use. This lesson, and the corollary that a complex cipher may, through laziness or simple haste, may prove self-compromising in the field, was one that would be lost between the wars, much to the relief of allied cryptanalysts who knew they could always count on sloppy German practice.

But ADFGVX itself embodied some very interesting ideas. For starters, it enciphered messages into six easily transmitted letters: A, D, F, G, V, and X. In Morse, these letters are simple to send and difficult to confuse on reception. Clear transmission (and therefore minimum repetition) improves security.

Let us suppose you have the message "It was the best of times" to encrypt. The first ingredient is a 6x6 substitution square (note that original versions of this used the letters ADFGX in a 5x5 grid and therefore could not directly encipher numbers). In the field, this would have been provided by headquarters and, probably, changed on a daily basis.

  A D F G V X
A Q 1 W E R 2
D 3 T Y U 4 I
F O P A 5 S 6
G D F 7 G 8 H
V J 9 K 0 L Z

(Note that I generated this square by typing on that irritating anachronism, the QWERTY keyboard. I don't pretend to cryptographic validity here.)

We look up each letter in the plain text in the grid and then write down the two ciphertext letters that define its coordinates (row first, column second, so "R" would be represented as AV).

Our message encrypts as:


Note that the length is now doubled -- but these are the six easiest letters to transmit (radio was effectively all Morse back then!).

We're not done, because this is just a simple monophonic substitution. That is, every letter in the plaintext corresponds to a single symbol in the ciphertext (even if that symbol consists of two letters). And a monophonic substitution is about as easy to solve as it gets.

Now we write our columns out according to a code word that we have been given. This might also change from day to day, or depending on recipient, in an effort to reduce the amount of traffic a given key setup would produce. Our code word is EDMONDS -- but we omit the second instance of any repeated letter for EDMONS. Write the intermediate encipherment out underneath this word, lining each letter up underneath the ones above it.


I got lucky and my message lined up directly underneath the key word, making things look tidy and in fact more cryptographically secure.  The Germans left any blank columns unfilled but, as the tale will show, would have been better off adding random nulls at the end to fill all columns fully and equally.

Now take the letters off vertically by columns, in order of the alphabetical position of the letters in the key word. In this case, take of the D column first, then the E column, the M column, the N, the O, and finally the S. Per traditionally proper technique, we will do so in five letter groups.


This message would then be transmitted.

Analytically, ADFGVX embodies degrees of both transposition (portions of the message are moved around -- think word scramble) and substitution (think, well, substitution). Interestingly, however, it relies on a third component for its security kicker: fractionation. This results from the step where each plaintext letter's two-part symbol is written down horizontally and then taken off vertically. Each plain text symbol is, therefor, divided (or fractionated) into two parts which are located in different portions of the ciphertext. Modern cryptography places quite a bit of emphasis on this idea -- confusion and diffusion being two sought after goals.

At first blush ADFGVX might seem extremely secure. But the reality is that there are cracks a cryptanalyst can pursue. Realize that after our first step -- the plaintext's initial transformation by substitution into the double-lenth intermediate message of digraphic symbols -- all we had was a simple monophonic substitution. Funny looking, yes, but ultimately almost entirely devoid of security. Given any message with a normal distribution of characters and statistics will quickly yield the digraph pairs that correspond to each plaintext letter. Our sample message is statistically skewed, but AG's representation of E would quickly stand out in any normal frequency count. From there, the rest follows and the message is wide open.

A cagy cryptanalyst can, then try to break back from the fractionation component of the second step. Trialing different lengths and sequences of the columnar transposition guided by the keyword in the cipher's second step is a brute force possibility. This would involve taking an intercepted message "backward" to the intermediate stage by various different keywords, then analyzing the resultant intermediate text for normal statistical distributions of digraph pairs.

Hold on, in case I lost you, think of it this way...and remember that once we reach the intermediate stage of the in-order double-lenght message, the break is effectively trivial.

Our wily cryptanalyst starts by assuming the message's keyword was four letters long. It really doesn't matter what those letters were because a four letter word can only yield 24 possible ways of taking off the columns to produce the final cipher:

1234, 1243, 1324, 1243, 1423, 1432, 2134, 2143, 2314, 2341, 2413, 2431, 3124, 3142, 3214, 3241, 3412, 3421, 4123, 4132, 4213, 4231, 4312, 4321.

The cryptanalyst now tests the intercept by transcribing the message backwards into each of these columnar orders. This, incidentally is exactly what the friendly cipherclerk on the receiving end would do, but he would know they keyword and therefore the order in which to fill in the received code. Each of these 24 possibilities is then take off (going sideways this time) to produce a candidate intermediate text. Do a statistical count on this intermediate text and see if it looks valid. If close, try to decrypt. If gibberish, try again.

Sounds easy but time consuming -- but when nations are on the line resources are usually available, and nothing so far can't be taught to any moderately apt college student. But now let's say your foe used a five letter keyword for the 2nd step. That would have 120 combinations you' have to examine. Six letters? 720. Eight letters? 5,760. Nine? 51,840. You get the idea, right? And we forgot to account for the paltry six possibilities offered by a three character keyword. The Germans, at least in the first intercepts to be broken, used 20 columns...

In the end, the French depended upon some even more clever analysis. Cryptographic genius George Painvin spent months (and lost 20 pounds) breaking this cipher just ahead of the threat of a German offensive aimed at Paris. He guessed the structure of the cipher and saw that the essential security lay in the columnar transposition. To avoid brute force, he used two messages with the identical beginnings so helpfully common to military communication to determine the number of transposition columns used (20). The identical openings resulted in the first twenty characters of the intercept being identical. After that first column, the diffusion element of the cipher intermingled the traces of the identical openings with the divergent bodies.

Knowing that there were 20 columns, he could reconstruct them. Some were longer and some shorter by one letter. Logically, the longer columns would be to the left because of the order in which they were filled in. This sorted the columns into two groups -- the long left and the short right, thinning the field a little but still leaving far too many options to try. Assuming that messages with identical beginnings might have identical endings, Painvin was able to apply his opening logic in reverse and create a few groups of pairs that he knew must be next to each other, but not which was to the left and which was to the right. But with two columns, he could afford to roll the dice and try to find statistical distributions that matched written German.

That worked, and allowed him to find the digraph pairs for a few common letters. Using this break, the rest of the columnar arrangement and the substitution square filled themselves in by turns. Statistical guesses at the substitution arrangement could guide the arrangement of the columns (arrangements that produced greater frequencies of digraphs corresponding to common letters were more likely to be correct). As more columns were arranged, more digraphs emerged, and so on, until the columnar arrangement and the substitution table were both fully filled out and all that day's traffic could be read.

It was laborious and depended on a sufficient volume of traffic. The French never read ADVGVX reliably -- only on days with sufficient traffic to generate the sort of kick start needed. The initial breaks took weeks to analyze -- rendering the effort useless from a tactical standpoint. But as experience (and traffic and poor communications practice) grew, solutions were achieved sometimes the same day the messages were sent.

All this said, I like the beauty of this cipher. It embodies both substitution and transposition, and yet is relatively wieldy. If you thought that trial example was tough, just wait until parts two and three...

There are also a few steps that could be taken to improve security. A classic step is to substitute codewords for frequently named people and places. Certain suspected patterns can help speed up the breaking of the final stage, and if an army is fighting in Verdun, for example, a cryptanalyst looking at a signal to or from that army might look for that word. Substitute FIBLA for that word (and then encrypt it) and you have an improvement. Even better, have FIBLA, GIZWA, NILMO, and QAGTL all as options for the cipher clerk to pick at random.

The second stage keyword should be long, to maximize the number of choices facing the cryptanalyst. Chose a phrase like "YOURE MY WONDER WALL" (I've had that song stuck in my head today) instead of just a few words. Take columns headed by duplicate letters off in order instead of ignoring repeats and you have a 17 letter keyword: 355,687,428,096,000 possible combinations, more than enough to prevent trial-and-error cryptanalysis as I suggested. Better yet, don't use a keyword at all (and I believe this might be how the Germans actually implemented the cipher) but a numeric sequence (11 4 7 8 3 8 6 10 1 5 2 9 12) that couldn't correspond to some word or phrase a codebreaker might recognize.

An implementation that somehow changed a portion of the key (say the keyphrase) at every message would increase security. A little recognized fact is that message length can have a lot to do with the ability of a foe to compromise a cipher system. ADFGVX is vulnerable, but only with a sufficient amount of identically keyed traffic. So embedding an encrypted version of an ever changing key phrase or selecting one from a daily sequence rather than using the same one all day long would help. One could easily see a book that contained the substitution square on the left side and a table of 50 different transposition keys on the right, each identified by an in indicator to be sent along with the message or else some sort of scheme for sequencing to govern their selection. This was a common feature of some of the better ciphers of the 2nd world war and shortly thereafter -- the amazing American SIGABA machine for one, the postwar Swiss NEMA for another -- an indicator group that specified an easily changed portion of the machine's settings was transmitted (enciphered in some manner) as part of the message header and footer. The great Soviet spy ciphers (Vic and its kin) used similarly variable portions for every message.

I think that the idea of saying "I transmit part of my cipher key IN my message?" is shocking to most. Understandably! But the countermeasure is that done properly, the indicator group is not easy to read, and the effect of reducing message traffic for any given setup more than outweighs the risks.

Naturally, all the standard weaknesses of classical error cryptography (and their potential countermeasures) are present as well. Sloppy work resulting in identical messages being sent with different keys. Sending the same message on a secure and a non-secure channel resulting in the compromise of all the message on the secure channel for that keying period. Stereotyped beginnings or entire messages -- this was the bane of actual German cryptography, with a national penchant for patriotic phrasing or "nothing to report" reports...

Interestingly, while it would considerably weaken the security of the thing, it is possible to do a ADFGVX variation with no prepared materials, using a keyphrase of sufficient length to form up the initial substitution table. Take the letters of "The sky above the port was the color of television, tuned to a dead channel" and use them to form the initial grid, alphabetically filling in the rest of the spaces.

  A D F G V X
V X Z 0 1 2 3
X 4 5 6 7 8 9

Of course, we'd need to come up with some scheme for the numbers -- or else use the 5x5 version. Its weaker, since an analyst could guess the arrangement once they started to get a break. But for a spy or someone forced to work with no materials and sending a very small amount of traffic, it has possibilities.

I find this cipher charming for some reason. Its balance of complexity and comprehensibility works for me. It has its vulnerabilities, but then again so does anything of this period or level of sophistication.

The accessibility of it means I can play with it in my mind, try to understand it the way I picture George Painvin did.

No comments: