The Breakthrough of Frequency Analysis

For almost a thousand years, from 500 to 1400, the cryptology of Western civilization stagnated. The systems in use were extremely simple, mostly variants of substitution ciphers and steganography. In the 9th century CE, however, the Arabs became the first to recognise the importance of cryptanalysis. Until then only cryptography had existed; there was no science of cryptanalysis at all. Cryptanalysis is the science of recovering a message without knowledge of the key, and it rests on finding weaknesses in encryption methods in order to break them.

The Arabs had the best conditions for inventing cryptanalysis because they had reached a high level in several disciplines, including mathematics, statistics and linguistics. Theological schools were established in which the contents of the Koran were studied in detail. Theologians tried to establish the chronological order of the numerous revelations, and to do so they counted the frequency of specific words in each revelation, because some words had arisen earlier than others. They went on to examine the scriptures phonetically and at the level of single letters, and found that some letters occur much more frequently than others and that certain letters do or do not occur together. They identified the rarest and the most common letters in Arabic: 'a' and 'l' are the most common, whereas 'j' appears only a tenth as frequently. This apparently innocuous observation would lead to the first great breakthrough in cryptanalysis, namely frequency analysis.

It is not known who first realised that the variation in letter frequencies could be exploited to break ciphers, but the earliest known description comes from the 9th-century scientist Abū-Yūsuf Ya’qūb ibn Ishāq al-Kindī. Al-Kindi wrote about 290 books on medicine, astronomy, mathematics, linguistics and music. He is also the author of 'A Manuscript on Deciphering Cryptographic Messages'. It contains detailed discussions of statistics, Arabic phonetics and Arabic syntax, and describes his system of cryptanalysis in two short paragraphs:

One way to solve an encrypted message, if we know its language, is to find a different plaintext of the same language long enough to fill one sheet or so, and then we count the occurrences of each letter. We call the most frequently occurring letter the 'first', the next most occurring letter the 'second', the following most occurring letter the 'third', and so on, until we account for all the different letters in the plaintext sample. Then we look at the cipher text we want to solve and we also classify its symbols. We find the most occurring symbol and change it to the form of the 'first' letter of the plaintext sample, the next most common symbol is changed to the form of the 'second' letter, and the following most common symbol is changed to the form of the 'third' letter, and so on, until we account for all symbols of the cryptogram we want to solve.
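
The procedure al-Kindi describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names rank_symbols and rank_substitution_guess are invented here, the text is assumed to use alphabetic characters that Python's str.isalpha recognises, and ties in frequency are broken alphabetically, a detail the manuscript does not specify.

```python
from collections import Counter


def rank_symbols(text):
    """Return the distinct letters of `text`, most frequent first.

    Assumption: ties are broken alphabetically so the result is deterministic.
    """
    counts = Counter(ch for ch in text.lower() if ch.isalpha())
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [symbol for symbol, _ in ordered]


def rank_substitution_guess(ciphertext, reference_text):
    """Swap the n-th most common cipher symbol for the n-th most common reference letter."""
    cipher_ranked = rank_symbols(ciphertext)        # 'first', 'second', ... cipher symbols
    reference_ranked = rank_symbols(reference_text)  # 'first', 'second', ... plaintext letters
    # zip stops at the shorter ranking; symbols without a partner stay unchanged.
    mapping = dict(zip(cipher_ranked, reference_ranked))
    return "".join(mapping.get(ch, ch) for ch in ciphertext.lower())


if __name__ == "__main__":
    # Toy data: the cipher symbols x, y, z, w have the same frequency ranks
    # as the letters a, b, c, d in the reference sample.
    print(rank_substitution_guess("xxxx yyy zz w", "aaaabbbccd"))
```

On real messages such a rank-for-rank substitution should be treated as a first guess only: a short ciphertext rarely matches the average letter frequencies of its language exactly, so some of the guessed letters usually have to be corrected by hand.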

First page of al-Kindi's manuscript "On Deciphering Cryptographic Messages"