Japanese cryptology from the 1500s to Meiji

The cipher system that the Uesugi are said to have used is a simple substitution usually known as a Polybius square or "checkerboard." The i-ro-ha alphabet contains forty-eight letters,[1] so a seven-by-seven square is used, with one of the cells left blank. The rows and columns are labeled with numbers or letters. In the table below, the numbers start in the top left, as does the i-ro-ha alphabet. In practice, they could start in any corner.

i-ro-ha Alphabet, 1-7 Checkerboard Cipher
    1    2    3    4    5    6    7
1   i    ro   ha   ni   ho   he   to
2   chi  ri   nu   ru   wo   wa   ka
3   yo   ta   re   so   tsu  ne   na
4   ra   mu   u    wi   no   o    ku
5   ya   ma   ke   fu   ko   e    te
6   a    sa   ki   yu   me   mi   shi
7   we   hi   mo   se   su   n

To encipher, find the plaintext letter in the square and replace it with the number of that row and column. So using the square above, kougeki becomes 55 43 53 63 or 55 34 35 36 if the correspondents decided ahead of time on column-row order. The problem of what to do in the case of letters such as "ga," "de," and "pe" that do not appear in the i-ro-ha alphabet is avoided by using the base form of the letter instead – as above where "kougeki" becomes koukeki.[2] Technically, this is a serious flaw because some messages may have two or more equally valid decipherments. To avoid this the encipherer may have had to rephrase messages.
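The enciphering rule can be sketched in a few lines of Python. This is an illustration only. It assumes the square is laid out exactly as in the table above, with rows and columns numbered 1 to 7 from the top left. The function names are hypothetical.

```python
# Sketch: the 7x7 i-ro-ha checkerboard from the table above, numbered 1-7.
# The last cell of row 7 is left blank, so row 7 has only six entries.
IROHA_ROWS = [
    ["i", "ro", "ha", "ni", "ho", "he", "to"],
    ["chi", "ri", "nu", "ru", "wo", "wa", "ka"],
    ["yo", "ta", "re", "so", "tsu", "ne", "na"],
    ["ra", "mu", "u", "wi", "no", "o", "ku"],
    ["ya", "ma", "ke", "fu", "ko", "e", "te"],
    ["a", "sa", "ki", "yu", "me", "mi", "shi"],
    ["we", "hi", "mo", "se", "su", "n"],
]

# Look-up table from each kana to its (row, column), both counted from 1.
POSITION = {
    kana: (r, c)
    for r, row in enumerate(IROHA_ROWS, start=1)
    for c, kana in enumerate(row, start=1)
}


def encipher(kana_list, column_first=False):
    """Replace each kana with a two-digit group (row-column unless column_first)."""
    groups = []
    for kana in kana_list:
        r, c = POSITION[kana]
        groups.append(f"{c}{r}" if column_first else f"{r}{c}")
    return " ".join(groups)


def decipher(text, column_first=False):
    """Turn two-digit groups back into kana by reading them off the square."""
    kana_list = []
    for group in text.split():
        first, second = int(group[0]), int(group[1])
        r, c = (second, first) if column_first else (first, second)
        kana_list.append(IROHA_ROWS[r - 1][c - 1])
    return kana_list


# "kougeki" goes into the square as its unvoiced base form ko-u-ke-ki.
word = ["ko", "u", "ke", "ki"]
print(encipher(word))                     # 55 43 53 63
print(encipher(word, column_first=True))  # 55 34 35 36
print(decipher("55 43 53 63"))            # ['ko', 'u', 'ke', 'ki']
```

Because ke and ge share a cell, `decipher` can only return ke. The reader has to restore the voicing from context, which is the ambiguity described above.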

The column and row headers do not have to be numbers. One common variation is to use letters. This was common in European cryptography and is found in the Uesugi cipher as well. However, the Japanese cipher had a twist that never seems to have been used in the West: using the last 14 letters of the Iroha poem to fill in the row and column headers. The table shown below[3] gives an example of this, using "tsurenakumieshiakinoyufukure".

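Checkerboard Cipher Using Iroha (the alphabet runs down each column, starting from the right-hand column; column headers across the top, row headers down the right side)
re   ku   fu   yu   no   ki   a
we   a    ya   ra   yo   chi  i    tsu
hi   sa   ma   mu   ta   ri   ro   re
mo   ki   ke   u    re   nu   ha   na
se   yu   fu   wi   so   ru   ni   ku
su   me   ko   no   tsu  wo   ho   mi
n    mi   e    o    ne   wa   he   e
     shi  te   ku   na   ka   to   shi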
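The lettered version can be sketched the same way. The sketch reads the layout off the table above. The i-ro-ha alphabet fills each column from top to bottom, starting with the right-hand column as in vertical Japanese writing. The first seven header syllables run down the right edge, and the last seven run across the top, reading right to left. The order in which the two header syllables are written (row first here) is an assumption, and so are all the names.

```python
# Sketch of the lettered-header checkerboard in the table above.
IROHA = (
    "i ro ha ni ho he to chi ri nu ru wo wa ka yo ta re so tsu ne na "
    "ra mu u wi no o ku ya ma ke fu ko e te a sa ki yu me mi shi "
    "we hi mo se su n"
).split()

# The 14 header syllables of "tsurenakumieshiakinoyufukure".
KEY = "tsu re na ku mi e shi a ki no yu fu ku re".split()
ROW_HEADERS = KEY[:7]        # down the right-hand edge, top to bottom
COL_HEADERS = KEY[7:][::-1]  # across the top; read right to left: a ki no yu fu ku re

# Fill the grid column by column starting at the right, top to bottom,
# as in vertical Japanese writing. The bottom-left cell stays empty (None).
GRID = [[None] * 7 for _ in range(7)]
for k, kana in enumerate(IROHA):
    GRID[k % 7][6 - k // 7] = kana

# Each kana becomes (row header, column header); this order is an assumption.
ENCODE = {
    GRID[r][c]: (ROW_HEADERS[r], COL_HEADERS[c])
    for r in range(7)
    for c in range(7)
    if GRID[r][c] is not None
}
DECODE = {pair: kana for kana, pair in ENCODE.items()}


def encipher(kana_list):
    """Replace each kana with its pair of header syllables."""
    return [ENCODE[k] for k in kana_list]


def decipher(pairs):
    """Look each header pair back up in the grid."""
    return [DECODE[p] for p in pairs]


print(encipher(["ko", "u", "ke", "ki"]))
# [('mi', 'fu'), ('na', 'yu'), ('na', 'fu'), ('na', 'ku')]
```

The syllables re and ku appear on both edges. Each pair still decodes to exactly one kana, because no syllable repeats within the row headers or within the column headers.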

This system of using a "checkerboard" to convert an alphabet into numbers or letters was described by Polybius over 2000 years ago. There are three main advantages to this system. First, converting letters into numbers allows for various mathematical transformations that are impossible, or harder, with letters – super-enciphering, for example. Second, the checkerboard system reduces the total number of characters. Whether converting to numbers or letters, the Polybius square reduces 25 English letters[4] to five characters. Uesugi's square reduces 48 letters to seven. This reduction makes cryptanalysis slightly more difficult than it is for a simple one-to-one substitution. The reduction also lowers the chance of error in communicating the message. The letters of the German ADFGX system in World War I were chosen because they are quite distinct in Morse code, so an error in transmission was unlikely to turn one letter into another. This would have been important for a Sengoku daimyō if, for instance, he experimented with sending coded messages over long distances by torches, flags, poles, or a similar system.

Finally, although the checkerboard system doubles the length of messages, breaking each plaintext letter into two ciphertext letters allows for separate transformations on each of the halves. However, this does not seem to have been used much in American or European cryptology and Japanese cryptologists apparently did not use it at all.
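The sketch below shows one way to apply separate transformations to the two halves. It writes out all the row digits first, then all the column digits, and regroups the combined stream in pairs. The idea is borrowed from Félix Delastelle's bifid cipher and is offered only as an illustration. It is not a documented Japanese method. The output stays as digits because the pair 77 corresponds to the empty cell. The function names are hypothetical.

```python
# Same i-ro-ha square as in the first table (last cell of row 7 blank).
IROHA_ROWS = [row.split() for row in (
    "i ro ha ni ho he to",
    "chi ri nu ru wo wa ka",
    "yo ta re so tsu ne na",
    "ra mu u wi no o ku",
    "ya ma ke fu ko e te",
    "a sa ki yu me mi shi",
    "we hi mo se su n",
)]
POSITION = {k: (r, c) for r, row in enumerate(IROHA_ROWS, 1) for c, k in enumerate(row, 1)}


def fractionate(kana_list):
    """Write every row digit, then every column digit, and regroup in pairs.

    Splitting each letter's two halves apart like this follows the idea of
    Delastelle's bifid cipher; it is an illustration, not a Japanese system."""
    rows = [POSITION[k][0] for k in kana_list]
    cols = [POSITION[k][1] for k in kana_list]
    digits = rows + cols
    return " ".join(f"{digits[i]}{digits[i + 1]}" for i in range(0, len(digits), 2))


def defractionate(text):
    """Split the digit stream back into its row half and column half."""
    digits = [int(d) for d in text.replace(" ", "")]
    half = len(digits) // 2
    return [IROHA_ROWS[r - 1][c - 1] for r, c in zip(digits[:half], digits[half:])]


print(fractionate(["ko", "u", "ke", "ki"]))  # 54 56 53 33
print(defractionate("54 56 53 33"))          # ['ko', 'u', 'ke', 'ki']
```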

It is not known how, or even whether, Uesugi actually used the seven-by-seven checkerboard system. The scarcity of evidence makes it impossible to draw any firm conclusions, but tentatively it seems that Sengoku period daimyō did not have much use for cryptology. Of course, it is possible that they did have their "black chambers" and that those chambers were shrouded in such secrecy that no hint of their existence escaped. This seems unlikely, however. Several daimyō compiled codes of conduct or books of advice on governing for their offspring. Had cryptology been an important factor in the success of such men, they might be expected to pass that advantage along to their successors. The fact that they did not do so, in writing at least, does not prove anything. In light of the other evidence, and the lack of it, it does make the existence of black chambers of the European sort seem unlikely.

The history of cryptology in Japan shows two things. First, the fact that substitution ciphers existed makes the failure of the Japanese to improve on the substitution cipher or to invent the transposition cipher much harder to explain. Second, the lack of a strong cryptographic tradition suggests – almost requires – a correspondingly weak cryptanalytic tradition. In fact there seems to be no cryptanalysis in Japanese history before the late 19th century.

World War I as turning point

David Kahn identifies World War I as a major turning point for institutional cryptology. Before the war, breaking codes was an individual endeavor – one person wrestling with the messages until one of them broke. After the war, successful cryptology against major nation-states required large-scale organization.

Japanese cryptology does not seem to have been affected at all by World War I. The government continued using insecure codes of the sort it had been using since the Meiji Restoration. As a result, at the 1921 Washington Naval Conference Japanese diplomacy was unable to gain its preferred result. Japan had to settle for the least favorable terms it was willing to accept. Weak codes were the primary cause of that result, as the American delegation had the Japanese secret communications available.

The American "Black Chamber" and the two-letter code

The American "Black Chamber" under Herbert O. Yardley broke Japanese diplomatic codes in 1919 – less than a year after starting operations – and the Black Chamber cryptanalysts were still reading Japanese diplomatic traffic in 1921 when the Washington Naval Conference took place. Thanks to Yardley's book The American Black Chamber, the failure of Japanese cryptography at the Conference is well known. Yardley's book gives a valuable look into the quality of the codes employed by the Japanese government in the years leading up to, and during, the Conference and thus is worth looking at in some detail.

Judging from Yardley's description of the codes he and his cryptanalysts broke, Japanese codes in 1919 were weak and barely deserved to be called "codes". He might have exaggerated the difficulty of breaking the Japanese codes – British codebreakers thought Japanese codes at that time were so weak you almost didn't need a cryptanalyst.[5]

Analysis of the two-letter code

The two-letter code Japanese diplomats were using in 1919 consisted of groups of two English letters. This allows for a maximum of 676 (26 × 26) groups, which is far too small for a diplomatic code in 1819, much less 1919. Worse, the Japanese cryptographers did not use all of the available groups. Yardley says that the groups were either vowel-consonant or consonant-vowel, with "y" counting as both. If Yardley is correct about this, the Japanese cryptographers limited themselves to only 252 of the 676 possible groups.[6] After using anywhere from 54 to 100 groups for the kana and ten groups for the numbers zero to nine, there were at most 188 unassigned code groups remaining.
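The group counts above can be checked with a short calculation. The sketch takes Yardley's rule at face value. It treats a, e, i, o, u, and y as vowels and the other 21 letters, y included, as consonants. It also uses the smallest kana allocation, 54 groups.

```python
import string

VOWELS = set("aeiouy")                                   # "y" counts as a vowel...
CONSONANTS = set(string.ascii_lowercase) - set("aeiou")  # ...and as a consonant

vowel_consonant = [v + c for v in sorted(VOWELS) for c in sorted(CONSONANTS)]
consonant_vowel = [c + v for c in sorted(CONSONANTS) for v in sorted(VOWELS)]

possible = 26 * 26
# 6 * 21 + 21 * 6 = 252. "yy" fits both patterns and is counted twice here,
# as in the 252 figure; as distinct strings there are 251.
usable = len(vowel_consonant) + len(consonant_vowel)
left_for_vocabulary = usable - 54 - 10  # best case: 54 kana groups, 10 digit groups

print(possible, usable, left_for_vocabulary)  # 676 252 188
```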

Yardley made his original break into the code by realizing that wi ub po mo il re re os ok bo was a i ru ra n do do ku ri tsu (airurando dokuritsu, "Irish independence").[7] The doubled re re suggests the doubled do do of airurando dokuritsu. This guess is confirmed when he discovers that the recovered groups re ub bo work elsewhere for do i tsu (Doitsu, Germany).

The initial break into the code is further confirmed when as fy ok makes sense as o wa ri (stop). This is exactly how one breaks a simple substitution cipher – letter frequencies and repetitions in the text suggest possible plaintext letters. The cryptanalyst plugs in those letters and sees what yields meaningful text and what does not. Meaningful text suggests new letters to try and the cryptanalyst starts the cycle over again.
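The guess-and-check cycle described above can be sketched as a substitution table that grows as guesses are confirmed. The code groups and readings are the ones quoted from Yardley. The helper functions are hypothetical and do not reconstruct the Black Chamber's actual working method.

```python
def add_crib(book, code_groups, kana):
    """Record a guessed plaintext ("crib") for a run of code groups.

    Raises ValueError if the guess contradicts a group already identified."""
    for group, syllable in zip(code_groups.split(), kana.split()):
        if book.get(group, syllable) != syllable:
            raise ValueError(f"{group} is already read as {book[group]}")
        book[group] = syllable
    return book


def decode(code_groups, book):
    """Substitute every recovered group; leave unknown groups in brackets."""
    return " ".join(book.get(g, f"[{g}]") for g in code_groups.split())


book = add_crib({}, "wi ub po mo il re re os ok bo", "a i ru ra n do do ku ri tsu")
print(decode("re ub bo", book))  # do i tsu       (Doitsu: the guess holds)
print(decode("as fy ok", book))  # [as] [fy] ri   (suggests o wa ri)
add_crib(book, "as fy ok", "o wa ri")
print(decode("as fy ok", book))  # o wa ri
```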

As can be seen from the description of Yardley's original break into the code, groups were assigned to kana like "do" and "bo" which in Japanese are not part of the regular alphabet but are created from other kana by adding pronunciation marks. Providing for these non-alphabet kana would require at least another 25 and possibly as many as 60 more code groups – hence the range given above for code groups for the kana – leaving only about 150 groups for words, phrases, and names. French cryptanalysts were making and breaking bigger, better codes in the 18th century. One suspects the Japanese language gave Yardley more trouble than the code itself did.

Thus the Japanese diplomatic code in use in 1919 was extremely weak and fundamentally flawed: a diplomatic code that does not contain code groups for common geopolitical names and phrases but requires them to be spelled out cannot be considered strong. Spelling out "stop" is further evidence that the code was not well designed. Even if the Japanese cryptographers devoted their 188 groups to the 188 most common phrases, the fact that they only had 188 groups to work with meant that most of their encoded messages would actually be simple-substitution enciphered messages of the sort that people had been solving for hundreds of years.

Code improvements in the 1920s and 1930s

According to Yardley, the Japanese codes his Black Chamber broke in 1919 were improved by a Polish cipher expert about a year later. His exact words are [italics in original]:[8]

Now the Japanese had no intention of permitting us to rest on our laurels, for from 1919 until the spring of 1920 they introduced eleven different codes.

We learned that they had employed a Polish cipher expert to revise their code and cipher systems. It took all our skill to break the new codes that this man produced, but by now we had developed a technique for the solution of Japanese codes that could read anything. Theoretically the Japanese codes were now more scientifically constructed; practically they were easier to solve than the first code, although some of them contained as many as twenty-five thousand kana, syllables and words.

The Polish cryptographer seemed to specialize on army codes, for the Japanese Military Attaché's codes suddenly became more difficult than those of any other branch of the Japanese Government.

Yardley was right about a Polish expert visiting Japan but he was mistaken about the timing. The Japanese army did bring in a Polish expert, Jan Kowalefsky, but he did not arrive in Japan until September 1924. If Japanese codes improved significantly between 1919 and 1924, as Yardley claims, the improvements were the work of Japanese cryptologists.

A possibility that is ripe for further research is that Japanese cryptologists studied one or more of the books on codes and ciphers that were occasionally published in Europe and America. For example, Parker Hitt's 1916 book Manual for the Solution of Military Ciphers was hugely popular, selling around 16,000 copies in America. Also, Japanese military attachés might have been aware that Winston Churchill, in his 1923 The World Crisis, admitted that Britain had read German naval messages during World War I.

It is possible that Yardley is simply wrong and Japanese codes did not improve significantly between 1919 and 1924. Kahn found that one improvement Yardley mentions, three-letter code groups mixed in with two-letter groups, was not actually present in the Japanese telegram in which Yardley claimed it appeared.[9]

Japanese cryptographers supposedly improved their codes through sectioning – breaking the message into parts and rearranging them prior to encoding. This buries stereotypical openings and closings, which makes it harder for cryptanalysts to make initial breaks into a code by guessing at probable words. The technique is known as bisecting, Russian copulation, trisecting, tetrasecting, etc. depending on how many pieces the text is broken into. Sectioning was not a new or revolutionary technique in the 1910s.
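Sectioning can be sketched as cutting a message into nearly equal runs of words and sending them in a different order. The sketch assumes the receiver already knows the section order and does not model how the order was communicated. The names and the sample message are hypothetical.

```python
def split_sections(words, n):
    """Cut a list of words into n consecutive sections of near-equal length."""
    size, extra = divmod(len(words), n)
    sections, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        sections.append(words[start:end])
        start = end
    return sections


def section(words, order):
    """Send the sections in the given order: order[i] is the original section sent i-th."""
    parts = split_sections(words, len(order))
    return [w for idx in order for w in parts[idx]]


def unsection(words, order):
    """Undo section(); section lengths depend only on the total length."""
    sizes = [len(p) for p in split_sections(words, len(order))]
    parts, start = [None] * len(order), 0
    for idx in order:
        parts[idx] = words[start:start + sizes[idx]]
        start += sizes[idx]
    return [w for part in parts for w in part]


message = "TO AMBASSADOR REGARDING NAVAL RATIO PROPOSAL STOP SIGNED MINISTER".split()
sent = section(message, [1, 0])  # bisecting: swap the two halves
print(" ".join(sent))
# PROPOSAL STOP SIGNED MINISTER TO AMBASSADOR REGARDING NAVAL RATIO
```

The stereotyped opening "TO AMBASSADOR" now sits in the middle of the text, where a cryptanalyst looking for probable words at the start will not find it.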

If, as Yardley claims, some Japanese codes did have as many as 25,000 code groups at the time of the Washington Naval Conference, it would indicate a healthy appreciation of cryptological realities. Cryptographers have long known that bigger codes are better. All else being equal, a 25,000-group code is stronger than a 2,500-group code. In fact, many commercial codebooks as far back as the 1850s had 50,000 groups, but governments were often reluctant to pay for the production of large codebooks. This limited the size, and thus the strength, of government and military codes for many years. To be fair, the secure production, storage, and distribution of codebooks is neither easy nor cheap.

However, it seems unlikely that the Japanese government was using codebooks with 25,000 groups in the early 1920s. A jump from the weak code used for the Washington Naval Conference to a codebook of 25,000 groups in just a few years seems too fast, especially without some external indication that their codes had been compromised. Further, as shown below, even in 1926 the Army's top cryptologist was developing a cipher system that had only about 2,500 groups, and those were actually just 10 charts of about 250 groups each.

Thus, the situation between the Washington Naval Conference and the mid-1920s was not that of a Polish officer helping to make Japanese codes much more secure. Rather, Japanese cryptographers were working to bring their codes up to the level of other major governments.

The Polish cipher expert, Jan Kowalefsky, might not have helped improve Japanese codes before the Washington Naval Conference but he did have a strong influence on Japanese cryptography between the conference and World War II. He trained what seems to be the first generation of professional Japanese cryptographers.

Japanese authors have identified two events that influenced the Japanese army's decision to invite a foreigner to improve their cryptology.

The first was an incident during the Siberian Intervention. The Japanese army came into possession of some Soviet diplomatic correspondence, but its cryptanalysts were unable to decipher the messages. Someone suggested asking the Polish military to try cryptanalyzing them. It took the Poles less than a week to break the code and read the messages.[10]

The second event also involved a failure to decipher intercepts. Starting in 1923, the Army began intercepting European and American diplomatic radio communications. Interception was difficult but the task of deciphering intercepted messages proved too much for the Army cryptanalysts.[11]

These two failures convinced the leaders of the Japanese army that they needed some outside help and for geopolitical reasons, they decided to turn to the Polish military. Poland had fought the Soviet Union in 1920 and the Japanese believed the Poles would be receptive to the idea of teaching someone on the Soviet Union's opposite flank how to read Soviet codes.

Learning from Warsaw and then in Warsaw

The Japanese Army could not have asked for more distinguished teachers. Polish cryptanalysts would later break early versions of the German Enigma machine in 1932 and their work jump-started the French and British efforts to break later, more complicated, Enigma machines. In the 1920s and 1930s it is accurate to say that Polish cryptanalysts were some of the best in the world.

The arrangements were made and on 7 September 1924, Captain Jan Kowalefsky arrived in Yokohama.[12] Kowalefsky taught a three-month joint Army-Navy course[13] to at least seven officers: four from the Army and three from the Navy.[14]

When the course finished, someone suggested that the novice cryptologists get some practical experience working with the Polish cryptologists in Poland.[15] The Japanese students would go to Poland with their teacher. Arrangements were made and a study-abroad program of sorts was started. Five officers left for Poland with Kowalefsky late in 1924 (Taishō 13).[16] They spent a year working in the Polish Army's Bureau of Ciphers before returning to Japan and taking up positions in the Japanese Army Cipher Department.[17]

Takagawa and Hiyama both assert that each year for about the next fourteen years (until Shōwa 14, that is, 1939), two Japanese Army officers traveled to Warsaw for a year of cryptological training.[16] Neither Smith nor Budiansky mentions Kowalefsky or anything about Japanese officers studying in Poland. Yardley mentions the "Polish expert" working for the Army but gets the timing wrong. In English, only Kahn actually gives this expert a name and provides some more details.

Discrepancies

Kahn writes that Kowalefsky had been in Japan from about 1920, when he was supposedly helping improve Japanese codes, and was still there in 1925 to teach at a new Navy code school. That is, Kahn has Kowalefsky working for the Navy, not the Army. Japanese sources make it clear that both Army and Navy officers attended Kowalefsky's three-month course, so some confusion is possible. However, Yardley wrote, correctly, that Kowalefsky worked for the Army but was wrong about the year since he claimed that the Polish expert had arrived in 1920. Yardley's error might explain why Kahn had Kowalefsky arriving in the wrong year but nothing in Yardley suggests that Kowalefsky ever worked for the Navy.

Although both mention Kowalefsky (Yardley not by name), neither Kahn nor Yardley mentions anything about Japanese cryptologists training in Poland, or even about Kowalefsky returning home. Thus, what are probably the most widely read English books on cryptological history may be missing a large and important part of the development of professional cryptology in Japan – if the Japanese sources are correct. If the Japanese sources for this history can be confirmed, it would be an important addition to the understanding of Japanese cryptology leading up to World War II. Polish cryptanalysts were very good. If they tutored the Japanese for almost fifteen years, the Japanese failure to break most of the Allied codes during the war becomes much more puzzling.

The two-letter, ten-chart code

Hyakutake Harukichi was among the first group of Japanese officers to study in Poland and on his return was made the chief of the code section of the third department of the army general staff. This was in 1926. Naturally enough, one of his first concerns was strengthening Army codes. He started by designing a new system to replace a four-letter code used by military attachés that had been in use since around 1918. The replacement was the two-letter, ten-chart code that Yardley mentions but mistakenly attributes to Kowalefsky in about 1920.[18] Yardley gives the following description of Hyakutake's new system and its effectiveness:[8]

This new system was elaborate and required ten different codes. The Japanese would first encode a few words of their message in one code, then by the use of an "indicator" jump to another code and encode a few words, then to still another code, until all ten had been used in the encoding of a single message.

Messages encoded in this manner produced a most puzzling problem, but after several months of careful analysis, I discovered the fact that the messages were encoded in ten different systems. Having made this discovery, I quickly identified all the "indicators." From this point on it was not difficult to arrive at a solution.

Yardley also describes the Japanese system of sectioning their messages but does not make it clear if this applies to the two-letter, ten-chart code. Takagawa's description of Hyakutake's code does not mention any sectioning but otherwise closely matches Yardley's account.[19] It is possible then that sectioning was not a part of Hyakutake's new system. Which code systems involved sectioning and when the systems were used is not clear. Michael Smith mentions in The Emperor's Codes that British codebreakers were surprised by the appearance of sectioning in Japanese codes around 1937.[20] The British had been reading some Japanese codes since at least as far back as the Washington Naval Conference. If they did not see sectioning in Army codes until 1937, in which code did Yardley see sectioning during his time at America's Black Chamber? Further research is necessary to answer that question.

It is clear from Yardley's description that Hyakutake's new system was not very effective. The system used 10 charts, each with 26 rows and 26 columns labeled a to z. This gives 676 two-letter code groups per chart. Most words and phrases will not be in the code and must be spelled out in kana. In this respect it is similar to, but larger than, the first Japanese code that Yardley broke in 1919. The difference is that this time there were ten codes instead of just one.

Basically, Hyakutake created a poly-code system in which the code changes every few words. This is just a code version of a polyalphabetic substitution cipher. Polyalphabetic ciphers use several different enciphering alphabets and change between them at some interval, usually after every letter. The strength of a polyalphabetic cipher comes from how many alphabets it uses, how often it switches between them, and how it switches between them (at random or following some pattern, for example). The Vigenère is probably the most famous example of a polyalphabetic substitution cipher.[21] The famous cipher machines of World War II were also polyalphabetic. Their strength came from the enormous number of well-mixed alphabets they used and the fairly random way they switched between them.

With a bit of luck, experienced cryptanalysts have been able to break polyalphabetic ciphers for centuries. From the late 19th century they did not even need luck – Auguste Kerckhoffs published a general solution for polyalphabetic ciphers in 1883 in his book La Cryptographie militaire.[22]

So although Hyakutake's new code system was original,[23] the fundamental idea underlying the system was well known, as were its weaknesses. With only 676 code groups, it is more cipher than code. As mentioned above, the ten different code charts just make it a polyalphabetic cipher with only ten "alphabets." Methods like Kerckhoffs' superimposition[24] can convert several polyalphabetically encoded messages into ten chunks of monoalphabetically encoded text, and those chunks are very easily solved. It is not surprising that the members of Yardley's Black Chamber broke the code in a few months.
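The superimposition idea can be sketched in simplified form. The sketch makes two assumptions. First, there are ten hypothetical charts, each giving the same small vocabulary different two-letter groups. Second, the charts are used in a fixed cycle of ten, so the chart behind each position is already known. Real superimposition must first work out that alignment. Once the groups encoded with the same chart are collected into a column, each column is an ordinary single-chart problem.

```python
import random
from collections import Counter
from itertools import product

# All 676 two-letter groups, "aa" through "zz".
GROUPS = ["".join(p) for p in product("abcdefghijklmnopqrstuvwxyz", repeat=2)]
# Hypothetical vocabulary: a few kana and whole words.
VOCAB = ["a", "i", "u", "ka", "shi", "n", "no", "owari", "kaigun", "taishi"]


def make_charts(n_charts=10, seed=1926):
    """Hypothetical charts: each maps the same vocabulary to different groups."""
    rng = random.Random(seed)
    return [dict(zip(VOCAB, rng.sample(GROUPS, len(VOCAB)))) for _ in range(n_charts)]


def encode(words, charts):
    """Assumption: the i-th word is encoded with chart i mod 10, a fixed cycle."""
    return [charts[i % len(charts)][w] for i, w in enumerate(words)]


def superimpose(groups, period=10):
    """Write the stream in rows of `period` groups and read down each column.

    Every group in column k came from chart k, so each column can be attacked
    with ordinary frequency counts, like a single-chart code."""
    return [groups[k::period] for k in range(period)]


charts = make_charts()
message = random.Random(7).choices(VOCAB, k=60)
columns = superimpose(encode(message, charts))
print(Counter(columns[0]).most_common(3))  # the commonest groups made with chart 0
```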

The use of ten charts may have been an illusory complication. Rather than improving the security of the code, it probably made the code weaker. Suppose that, instead of ten different code groups for each of 676 terms, Hyakutake had used the ten charts (with slight modification to make each group unique) to provide code groups for closer to seven thousand terms (10 × 676 = 6,760). The code would then have been much stronger.

Including more terms means that fewer have to be spelled out in kana – which is the whole point of using a code. Further, the reduction in duplication allows more flexibility in assigning homophones. Instead of ten groups for each letter, word, or phrase, each could receive homophones based on its frequency of occurrence. For example, the cryptographer can assign an appropriately large number of homophones to high-frequency letters and words like "n," "shi," and "owari" and only one or two code groups to lower frequency elements.

Likewise, if code groups were used to indicate a switch to a new chart, this could also have weakened the code unnecessarily. In fact, Yardley specifically mentions it as making the codes easier to cryptanalyze. Generally speaking, substitution systems switch alphabets as often as possible because that provides the best security. Their strength lies in how many alphabets they use and how randomly they switch between them.

So switching charts after every couple of words is not as secure as switching after every word. How the cryptographer switches between the charts also matters for security. If Hyakutake's system required the code clerk to switch code charts pseudo-randomly, that would provide more security than a set sequence of changes. This matters even more if the charts are derived from one another in some predictable manner. For example, suppose the plaintext "battle engaged" is aa on chart 1, ab on chart 2, and ac on chart 3. Switching between the charts in order will then pose much less difficulty for the cryptanalyst than using them in a more random order.

Regular polyalphabetic substitution ciphers often rely on keywords to determine alphabet changes, with each letter of the keyword referencing a different alphabet. With the ten charts of Hyakutake's system, a code number would be easy to use for pseudo-random changes. For example, "301934859762" means encoding the first word or phrase with the third table, the second word or phrase with the tenth (zeroth) table, and so on. The thirteenth word or phrase would be encoded with the third table again. Of course, for maximum security this code number needs to be changed frequently.
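The code-number scheme proposed above can be written as a single function. It sketches the hypothetical scheme in the preceding paragraph, not any documented Japanese procedure. The key repeats after its twelfth digit, and the digit 0 selects the tenth chart.

```python
def chart_for(position, key="301934859762"):
    """Chart number (1-10) for the position-th word or phrase, counting from 1.

    The key repeats after its last digit; the digit 0 stands for the tenth chart."""
    digit = int(key[(position - 1) % len(key)])
    return 10 if digit == 0 else digit


print([chart_for(p) for p in range(1, 14)])
# [3, 10, 1, 9, 3, 4, 8, 5, 9, 7, 6, 2, 3]
```

Changing the key only means distributing a new twelve-digit number. The charts themselves stay the same.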

Unfortunately, there is no information on how tables were changed except for Yardley's vague "until all ten had been used in the encoding of a single message," quoted above.[8] This says nothing about the order in which the charts were used.

Hara Hisashi's pseudo-random number code

Hara Hisashi became head of the code section of the Seventh Division sometime after 1932 and was later transferred to the Third Section of the Army General Staff.[25] Sometime between then and 1940, Hara devised a system that used a pseudo-random number additive to superencipher the three-number code the Army already had in service.

Neither Takagawa nor Hiyama provides details about when this three-number code system was adopted for Army communications. A three-number code has a maximum of 10³, or 1,000, groups. That is still too small for a strategic code and a far cry from the 25,000 that Yardley claims some Japanese codes had in the 1920s. However, it was a two-part code, which was an important improvement.

Two-part codes

Code books contain two lists – one of code groups and one of plaintext letters, words, and phrases. Someone encoding a message looks up the words in the plaintext list and substitutes the corresponding code group. Obviously it is important for that person's sanity that the plaintext be in some sort of order so words can be looked up easily. Since the system is similar for decoding – look up the code group and substitute the plaintext – it is equally important to have the code groups in order as well. With a one-part code, both lists are in alphabetical (or numerical) order. This means that you can encode and decode using the same book.

It also makes it easier for the enemy to break the code. Once they realize they are dealing with a one-part code, they can use known groups to draw conclusions about unknown groups. For example, suppose the enemy knows that aabbc is "Antwerp" and aabbz is "available." Then aabbm must fall alphabetically between those two words, so they know it cannot be "Tokyo."
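The deduction in the Antwerp example amounts to a binary search over the known entries. The two known entries in the sketch come from the example, and the function names are hypothetical. The sketch assumes a strict one-part code, in which code groups and plaintext sort in the same order.

```python
import bisect

# Known entries of a hypothetical one-part code, sorted by code group.
# In a one-part code the plaintext is then in alphabetical order as well.
KNOWN = [("aabbc", "antwerp"), ("aabbz", "available")]


def bounds(group, known=KNOWN):
    """Known plaintexts immediately before and after an unknown code group."""
    groups = [g for g, _ in known]
    i = bisect.bisect_left(groups, group)
    lower = known[i - 1][1] if i > 0 else None
    upper = known[i][1] if i < len(known) else None
    return lower, upper


def could_be(group, word, known=KNOWN):
    """Could `word` be the meaning of the unknown `group`, given the known entries?"""
    lower, upper = bounds(group, known)
    return (lower is None or lower < word) and (upper is None or word < upper)


print(bounds("aabbm"))                 # ('antwerp', 'available')
print(could_be("aabbm", "tokyo"))      # False
print(could_be("aabbm", "armistice"))  # True
```

Every newly identified group narrows the range for its neighbors. A two-part code is designed to deny the enemy exactly this leverage.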

A two-part code assigns the code groups to the plaintext in mixed order. This makes the code stronger by avoiding the problem described above. The drawback is that you now need two books. One, for encoding, has the plaintext in order to make encoding easy. The other, for decoding, has the code groups in order. Hence the name "two-part" code. The increase in security usually outweighs the increase in size and the extra security concerns. Antoine Rossignol invented the two-part code around 1650.[26] The idea could hardly be considered new or secret by the 20th century, so again it is surprising to see Japanese cryptographers taking so long to begin using a common cryptographic method.
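The difference between the two kinds of codebook can be sketched as follows. The sketch assumes a hypothetical six-word vocabulary and three-digit groups, and the function names are illustrative. In the one-part code, sorting either list also sorts the other. In the two-part code the groups are assigned at random, so two separately sorted books are needed.

```python
import random

VOCAB = ["antwerp", "attack", "available", "navy", "stop", "tokyo"]  # hypothetical
GROUPS = [f"{n:03d}" for n in range(1000)]                           # three-digit groups


def one_part_code(vocab, groups):
    """Groups assigned in the same order as the plaintext: one book does both jobs."""
    return list(zip(sorted(vocab), sorted(groups)))


def two_part_code(vocab, groups, seed=0):
    """Groups assigned at random, so two books are needed.

    Returns (encoding book sorted by plaintext, decoding book sorted by group)."""
    rng = random.Random(seed)
    pairs = list(zip(sorted(vocab), rng.sample(groups, len(vocab))))
    encoding_book = sorted(pairs)
    decoding_book = sorted((group, word) for word, group in pairs)
    return encoding_book, decoding_book


one_part = one_part_code(VOCAB, GROUPS)
encoding_book, decoding_book = two_part_code(VOCAB, GROUPS)
print(one_part[:3])  # [('antwerp', '000'), ('attack', '001'), ('available', '002')]
print(decoding_book[:3])
```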

Random numbers

The "one-time pad" system is the only cipher system that is totally secure. It uses random numbers to encipher the plaintext. If the numbers are truly random and the encipherer never reuses them, the enciphered message cannot be broken. Fortunately for cryptanalysts, truly random numbers are very difficult to come up with. Creating, distributing, and managing pads for more than a handful of correspondents is beyond the capabilities of most governments.

Using random numbers for cryptography was first done around 1917 for securing teleprinter communications. It proved impractical for the reasons mentioned above. By the mid-1920s, however, the German government was using one-time pads for diplomatic correspondence.[27] The Germans had learned their lesson from World War I and were determined not to have their diplomatic messages read again.

Hara devised a system that used random numbers to superencipher Japanese army codes. Possibly because of the logistical difficulties inherent in the one-time pad system, Hara's system used tables of pseudo-random numbers. The encipherer showed where in the table he (or, much less likely at the time, she) had started by hiding the table's row and column headers in the message.

This system is not new. Diplomats and armies started superenciphering with additives sometime during or soon after the First World War and by the 1920s it was common. German diplomats in Paris were using, shortly after the First World War, a codebook of 100,000 groups superenciphered twice from a book of 60,000 additive groups![28] It would be very surprising if after five to ten years of training with the Poles, Japanese Army cryptologists were not already familiar with superenciphering with additive tables.
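Superencipherment with an additive table can be sketched as digit-by-digit addition without carrying. The table below is generated with Python's seeded random module purely for illustration. The starting index stands in for the indicator, and the sketch does not reproduce Hara's actual tables or how he concealed the starting point. If the table were truly random and no stretch of it were ever reused, this would be a one-time pad.

```python
import random


def add_digits(a, b):
    """Digit-by-digit addition modulo 10, without carrying."""
    return "".join(str((int(x) + int(y)) % 10) for x, y in zip(a, b))


def subtract_digits(a, b):
    """Digit-by-digit subtraction modulo 10 (the inverse of add_digits)."""
    return "".join(str((int(x) - int(y)) % 10) for x, y in zip(a, b))


# Hypothetical additive table of 100 pseudo-random three-digit groups.
rng = random.Random(1937)
ADDITIVE_TABLE = [f"{rng.randrange(1000):03d}" for _ in range(100)]


def superencipher(code_groups, start):
    """Add consecutive table entries, beginning at index `start`, to the code groups."""
    n = len(ADDITIVE_TABLE)
    return [add_digits(g, ADDITIVE_TABLE[(start + i) % n]) for i, g in enumerate(code_groups)]


def strip_additive(cipher_groups, start):
    """Receiver's side: subtract the same table entries to recover the code groups."""
    n = len(ADDITIVE_TABLE)
    return [subtract_digits(g, ADDITIVE_TABLE[(start + i) % n]) for i, g in enumerate(cipher_groups)]


code_groups = ["381", "507", "046"]     # hypothetical three-digit code groups
sent = superencipher(code_groups, start=42)
print(strip_additive(sent, start=42))   # ['381', '507', '046']
```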

Superencipherment is fairly strong. It can be, and was, broken, but it is very hard to do. With the exception of the one-time pad, which will keep its secrets until the end of time, any code or cipher can be broken. All that is required is sufficient material. All that can be expected of a code or cipher system is that by the time the enemy breaks it, the information in the message is no longer useful. This is just a cryptographic fact of life.

Hara's pseudo-random code system, like every additive system other than the one-time pad, can be broken. Eventually someone, somewhere will use overlapping parts of the additive charts. The first thing the cryptanalyst does is identify where in the message the starting point of the chart (the "indicator") is hidden – this allows the messages that are enciphered with the same sections of the number charts to be lined up and the additives stripped off.[29]
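Why reuse is fatal can be shown with two hypothetical messages enciphered from the same stretch of additive. Subtracting the aligned ciphertexts cancels the additive entirely. One correct guess at a code group then exposes the additive at that position. All values are invented for illustration.

```python
def add_digits(a, b):
    """Digit-by-digit addition modulo 10, without carrying."""
    return "".join(str((int(x) + int(y)) % 10) for x, y in zip(a, b))


def subtract_digits(a, b):
    """Digit-by-digit subtraction modulo 10."""
    return "".join(str((int(x) - int(y)) % 10) for x, y in zip(a, b))


# Hypothetical: two messages were enciphered from the same stretch of the table.
additive = ["402", "915", "377"]   # unknown to the cryptanalyst
message_1 = ["123", "456", "789"]  # hypothetical code groups
message_2 = ["123", "400", "700"]
cipher_1 = [add_digits(p, k) for p, k in zip(message_1, additive)]
cipher_2 = [add_digits(p, k) for p, k in zip(message_2, additive)]

# Once the indicators show the messages overlap, subtracting aligned groups
# cancels the additive and leaves only the difference of the code groups.
difference = [subtract_digits(x, y) for x, y in zip(cipher_1, cipher_2)]

# A correct guess at one code group (a stereotyped opening, say) exposes the
# additive at that position, which then reads the other message there too.
guessed = "123"
recovered_key = subtract_digits(cipher_1[0], guessed)
print(recovered_key, subtract_digits(cipher_2[0], recovered_key))  # 402 123
```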

Hara's pseudo-random number generator

Perhaps realizing the gap between theory and practice, Hara devised a small system for generating pseudo-random numbers that could be used by units whose charts were outdated and which could not be supplied with new ones. This suggests that the cryptographers had real world experience with cryptology under battlefield conditions.

The system is simple, as it no doubt was intended to be. It requires a small chart of random numbers. Instead of using the numbers as additives, the encipherer uses two or more of them to create a much longer number. That number is then used to superencipher the message. The figure below shows how this is done.[30]

Creating a Pseudo-Random Number from Two Other Numbers
831728 8 3 1 7 2 8 8 3 1 7 2 8 8 3 1
96837 9 6 8 3 7 9 6 8 3 7 9 6 8 3 7
Result 7 9 9 0 9 7 4 1 4 4 1 4 6 6 8

When the numbers are added, any tens digit is dropped, so the addition is modulo 10 with no carrying. Thus 8 + 9 = 7. If the encipherer uses a six-digit number and a five-digit number, the resulting pseudo-random number will repeat after 30 digits, the least common multiple of 6 and 5. Hiyama gives an example of this system using a seven-digit and a five-digit number, which repeats after 35 digits.[31]
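The table above can be reproduced with a short generator. The function names are hypothetical. The period check searches directly for the shortest repeat instead of relying on the least-common-multiple argument.

```python
from itertools import cycle, islice


def pseudo_random_digits(a, b, count):
    """Add two repeating digit strings column by column, dropping any tens digit."""
    return "".join(
        str((int(x) + int(y)) % 10)
        for x, y in islice(zip(cycle(a), cycle(b)), count)
    )


def period(a, b):
    """Shortest length after which the combined stream repeats (brute-force search)."""
    stream = pseudo_random_digits(a, b, 2 * len(a) * len(b))
    for p in range(1, len(a) * len(b) + 1):
        if all(stream[i] == stream[i + p] for i in range(len(stream) - p)):
            return p


print(pseudo_random_digits("831728", "96837", 15))  # 799097414414668
print(period("831728", "96837"))                    # 30
```

The resulting stream can then serve as the additive in place of a printed table. This is also its weakness: only eleven digits of key material stand behind thirty digits of additive.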

This pseudo-random number system is much weaker than the usual system of superencipherment but as an emergency backup system it would have been adequate and certainly better than using a transposition or simple substitution cipher. Like any other cipher system, breaking a pseudo-random number system just requires a sufficient amount of intercepted ciphertext.

The state of Japanese Army cryptology around 1941

Hyakutake's two-letter, ten-chart system was exceedingly weak. It might have made a decent tactical field code – it is simple to use, requires only the paper charts and a pencil, and is easily changed. As a code for military attachés around the globe, however, Hyakutake's system was much too weak. It was basically a slightly improved version of the Foreign Ministry's two-letter code that Yardley broke in 1919 and possibly not as strong as the four-letter code it replaced.

Kahn, Smith, and Budiansky all make it clear that superenciphering and using pseudo-random additives were nothing new even in the 1920s – Kahn says that enciphered code was "the customary method for diplomatic communications."[32] A system using random numbers to superencipher messages was not revolutionary in the 1930s.

Thus, Hara's system was not new and does not seem to have been any better than similar systems long in use in other countries. Nevertheless, devising and implementing the Army's system was an important accomplishment, and it is possible that Hara was responsible for it. A topic for further research would be why this system was chosen instead of machine ciphers. Was the random number system chosen for non-cryptological reasons? Were the Army cryptanalysts good enough to understand that random numbers were more secure, when used correctly, than cipher machines?

Several books available at the time hinted at ways to break cipher machines. William Friedman's The Index of Coincidence and Its Applications in Cryptography was revolutionary; the addition of advanced mathematical, especially statistical, methods to the cryptological toolkit made traditional cryptographic systems obsolete and machine systems breakable.[33] So it is possible that the Japanese cryptanalysts knew that cipher machines were, in theory at least, breakable.

The Polish military realized early on that machine encipherment would change the science of cryptology, and from 1929 it employed mathematicians to work on cryptanalysis. However, since the goal of Japanese-Polish cryptographic cooperation was to train the Japanese side to break Russian codes, the Polish cryptologists had no need to reveal methods of breaking machines the Russians were not using. Teaching the Japanese the latest methods would have been of no use against Russian codes and would only have risked the Germans finding out and changing their codes. The Poles thus had a strong incentive to teach the Japanese only as much as they needed to know.

The Japanese Army was aware of machine systems; at The Hague in 1926, a Japanese military attaché saw a demonstration of the Model B1 cipher machine from Aktiebolaget Cryptograph.[34] In fact, in the early 1930s both the Japanese Navy and the Foreign Ministry switched to machine systems for their most secret messages. That those systems seem to have been developed in Japan suggests that there were knowledgeable cryptographers in the country, which in turn suggests that there may have been other, non-cryptographic reasons why the Army continued to use chart- and book-based systems. Further research into the cultural and institutional aspects of inter-war cryptology in Japan could perhaps uncover those reasons.

Conclusions

Several curious facts stand out in this cursory overview of Japanese cryptological history. One is that the Japanese government did not bring in an outside expert to help with its codes until 1924. Considering all the other oyatoi gaikokujin (hired foreigners) brought in to assist with "modernization" in the Meiji period, it is striking that such an important field as cryptology was ignored.

This suggests that the Japanese government in the first decades of the 20th century did not really understand the importance of cryptology for protecting communications. Such an attitude would hardly have been limited to Japan in the 1910s or 1920s: despite the success of American codebreakers at the Washington Naval Conference, and Yardley's later public chastisement, American codes remained weak right up to the early 1940s. However, even America, thanks to its ties to Europe, had a cryptological history and a reserve of talented people who understood the problems and the solutions. Japan does not seem to have had anyone like Yardley, much less a William Friedman.

The Japanese Army cryptologists, despite training with the Polish military for over ten years, originally developed substandard codes. Hara's system shows significant improvement and demonstrates an understanding of cryptography at least at the level practiced by other major world powers in the early 1940s.

Notes and References

  1. Ravi, Sujith; Knight, Kevin (2009). "Probabilistic Methods for a Japanese Syllable Cipher". In Li, Wenjie; Mollá-Aliod, Diego (eds.). Computer Processing of Oriental Languages. Language Technology for the Knowledge-based Economy. Lecture Notes in Computer Science, vol. 5459. Berlin, Heidelberg: Springer. pp. 270–281. doi:10.1007/978-3-642-00831-3_25. ISBN 978-3-642-00831-3.
  2. Takagawa (2003)
  3. Takagawa, p. 162
  4. To fit the English alphabet into a five-by-five square, the encoder either drops one letter or puts two in one square.
  5. Smith, p. 17
  6. 6 vowels (including "y") times 21 consonants (also including "y" and assuming they used all English consonants and not just the romaji consonants) times 2 (because the reverse - "ed" and "de" - is also okay) gives 252 total groups.
  7. Yardley, p. 176
  8. Yardley, p. 184
  9. Kahn, p. 1053 (endnote to p. 358), says that there were no three-letter groups in the telegram. Yardley makes the claim on pp. 289–290.
  10. Takagawa, p. 177
  11. Hiyama, p. 29
  12. Hiyama, p. 9
  13. Hiyama, p. 34
  14. Hiyama, p. 31
  15. Hiyama, pp. 35–36
  16. Hiyama, p. 36
  17. Hiyama, pp. 39–40
  18. Takagawa, p. 179; Yardley, p. 184
  19. Takagawa, pp. 178–180
  20. Smith, p. 55
  21. Kahn, pp. 146–149
  22. Kahn, p. 233
  23. I cannot find any references to any other system of this nature.
  24. Kahn, pp. 236–238
  25. Takagawa, p. 180
  26. Kahn, pp. 160–161
  27. Kahn, pp. 402–403
  28. Budiansky, p. 55
  29. Budiansky, pp. 78–81, has an example of the process.
  30. The numbers are taken from Takagawa, p. 181.
  31. Hiyama, p. 242
  32. Kahn, p. 402
  33. Kahn, p. 376
  34. Kahn, p. 425