Episode 28 — History & Mathematics
From Julius Caesar's shifting alphabet to Alan Turing's electromechanical Bombe — the remarkable 2,000-year journey of secret writing, its eventual defeat, and what it teaches us about pattern recognition and probabilistic reasoning.
The Basics
A cryptogram is one of the oldest puzzles in recorded history and one of the most intellectually satisfying — because it puts you in the shoes of a real spy, a wartime intelligence analyst, or a Renaissance scholar deciphering a rival monarch's dispatches.
In its purest recreational form, a cryptogram presents a famous quote or phrase in which every letter of the original has been replaced by a different letter according to a substitution key. The key is consistent throughout: if the letter E has been replaced by Q, then every E in the original text appears as Q in the cryptogram. The solver's job is to reverse-engineer that key using logic, language knowledge, and statistical reasoning.
The technical name for this technique is a monoalphabetic substitution cipher: "monoalphabetic" because a single fixed alphabet performs all the substitutions, and "substitution" because each plaintext letter is replaced by exactly one ciphertext letter. There are 26! (approximately 4 × 10²⁶) possible monoalphabetic substitution alphabets for English. That sounds astronomical, but a solver never has to enumerate them, because frequency analysis lets a practiced human crack a typical puzzle in minutes.
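To make the idea of one fixed, consistent key concrete, here is a minimal Python sketch. The function names (`make_key`, `encipher`, `decipher`) are illustrative, not from any library, and the no-fixed-point rule follows the recreational convention described above rather than any requirement of the cipher itself.

```python
import math
import random
import string

ALPHABET = string.ascii_uppercase

def make_key(seed: int = 0) -> dict[str, str]:
    """Random monoalphabetic key in which no letter maps to itself.

    Reshuffles until the permutation is a derangement; about 37% of
    shuffles qualify, so the loop ends quickly.
    """
    rng = random.Random(seed)
    letters = list(ALPHABET)
    while True:
        rng.shuffle(letters)
        if all(p != c for p, c in zip(ALPHABET, letters)):
            return dict(zip(ALPHABET, letters))

def encipher(text: str, key: dict[str, str]) -> str:
    # Letters are substituted; spaces and punctuation pass through unchanged.
    return "".join(key.get(ch, ch) for ch in text.upper())

def decipher(text: str, key: dict[str, str]) -> str:
    inverse = {c: p for p, c in key.items()}
    return "".join(inverse.get(ch, ch) for ch in text.upper())

key = make_key(seed=42)
secret = encipher("MEET ME AT DAWN", key)
print(secret, "->", decipher(secret, key))
print(f"possible keys: {math.factorial(26):.2e}")  # possible keys: 4.03e+26
```

Whatever the seed, every E in the message enciphers to the same letter, and that regularity is exactly what frequency analysis exploits.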
Understanding the difference between ciphers and codes clarifies a common confusion. A cipher transforms individual characters or small groups of characters. A code replaces whole words, phrases, or semantic units with other symbols — think of naval flag signals, where different flag combinations represent complete messages like "I require a pilot" or "my vessel is on fire." Codes require codebooks; ciphers follow algorithmic rules. Recreational cryptograms are ciphers.
The Roman general and statesman Julius Caesar used a simple substitution cipher for private communications, shifting every letter of the alphabet forward by three positions. A becomes D, B becomes E, C becomes F, and so on. The message MEET ME AT DAWN becomes PHHW PH DW GDZQ. Suetonius documented this practice around 121 CE, making the Caesar cipher one of the earliest ciphers whose use is historically documented.
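A shift cipher is a one-line rule in code. This minimal Python sketch (the `caesar` name is just illustrative) reproduces the MEET ME AT DAWN example and shows that decryption is the same operation with the shift negated.

```python
import string

ALPHABET = string.ascii_uppercase

def caesar(text: str, shift: int = 3) -> str:
    """Shift each letter `shift` places forward, wrapping Z back round to A."""
    out = []
    for ch in text.upper():
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
        else:
            out.append(ch)  # leave spaces and punctuation alone
    return "".join(out)

print(caesar("MEET ME AT DAWN"))      # PHHW PH DW GDZQ
print(caesar("PHHW PH DW GDZQ", -3))  # MEET ME AT DAWN
```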
Caesar's ROT-3 cipher shifts every letter three positions forward. Only 26 possible variants exist — a five-year-old with enough patience could brute-force every one.
The Caesar cipher is trivially breakable today: apart from the do-nothing shift of zero, there are only 25 possible shift values (ROT-1 through ROT-25), so brute force works in seconds. Its security in Caesar's day came from context rather than strength: most of his enemies were probably illiterate, and secrecy rested on that scarcity of readers rather than on the cipher itself. Once adversaries began applying systematic analysis, simple shift ciphers fell immediately.
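Brute force really is this short. The Python sketch below (function names are illustrative) tries every non-trivial shift on the ciphertext from the example above; a human, or a scoring function, then picks the line that reads as English.

```python
import string

ALPHABET = string.ascii_uppercase

def shift_back(text: str, k: int) -> str:
    """Undo a Caesar shift of k places; non-letters are left untouched."""
    return "".join(
        ALPHABET[(ALPHABET.index(ch) - k) % 26] if ch in ALPHABET else ch
        for ch in text.upper()
    )

def brute_force(ciphertext: str) -> dict[int, str]:
    """Every candidate plaintext, keyed by the shift (1-25) that would have produced it."""
    return {k: shift_back(ciphertext, k) for k in range(1, 26)}

for k, guess in brute_force("PHHW PH DW GDZQ").items():
    print(f"ROT-{k:<2} {guess}")  # the ROT-3 line reads MEET ME AT DAWN
```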
The Math
Around 850 CE, the Arab polymath Abu Yusuf Yaqub ibn Ishaq Al-Kindi wrote the first known systematic treatise on breaking substitution ciphers. His insight was elegant: language is not random. Every natural language leaves a statistical fingerprint in its letter frequencies.
In English, the distribution of letters in typical prose follows a remarkably consistent pattern. The most frequent letters — ETAOIN SHRDLU — appear so consistently across diverse texts that experienced cryptogram solvers have committed this sequence to memory as a mnemonic. Here are the top ten most frequent English letters with their approximate frequencies:
| Letter | Approx. frequency | Solver's hint |
|---|---|---|
| E | 12.7% | Most common letter; the default first guess for the most frequent cipher letter |
| T | 9.1% | Common in THE, THAT, TO, THIS, THERE |
| A | 8.2% | One of only two single-letter words (the other is I) |
| O | 7.5% | Often follows T; common in TO, OF, ON, OR |
| I | 7.0% | The other single-letter word candidate |
| N | 6.7% | Common at word ends: -ING, -ION, -AN, -EN, -IN |
| S | 6.3% | Common word-starter and plural/possessive marker |
| H | 6.1% | TH is the most common bigram in English |
| R | 6.0% | Common in ER, RE, AR; frequent in the -ER ending |
| D | 4.3% | Common word-ender: -ED past tense |
Frequency analysis works as a cascading inference engine. You start with probabilities (the most frequent cipher symbol is probably E), and each confirmed mapping constrains all subsequent guesses. Once you know that cipher-Q maps to plaintext-E, every word containing Q reveals partial structure. A three-letter word ending in Q fits the pattern __E, and THE, the most common word in English, is the natural first guess, though SHE, ARE, and ONE fit too. Each confirmation narrows the search space dramatically.
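The first step of that cascade, counting cipher letters and pairing them with English frequency ranks, can be sketched in a few lines of Python. `ENGLISH_ORDER` extends the table above to all 26 letters using commonly cited approximate frequencies (near-ties vary between sources), and the helper names are illustrative. The pairing is only a starting hypothesis.

```python
from collections import Counter

# Approximate English order, most to least frequent.
ENGLISH_ORDER = "ETAOINSHRDLCUMWFGYPBVKJXQZ"

def rank_cipher_letters(ciphertext: str) -> list[tuple[str, int]]:
    """Count each cipher letter and sort from most to least frequent."""
    counts = Counter(ch for ch in ciphertext.upper() if ch.isalpha())
    return counts.most_common()

def first_guess(ciphertext: str) -> dict[str, str]:
    """Pair the i-th most frequent cipher letter with the i-th most frequent English letter.

    Only a starting hypothesis: word patterns then confirm or overturn each pairing.
    """
    ranked = [letter for letter, _ in rank_cipher_letters(ciphertext)]
    return dict(zip(ranked, ENGLISH_ORDER))

print(rank_cipher_letters("PHHW PH DW GDZQ")[:3])  # [('H', 3), ('P', 2), ('W', 2)]
print(first_guess("PHHW PH DW GDZQ")["H"])         # E
```

On this tiny Caesar example the top pairing happens to be right (H really is E), while the next ones (P with T, W with A) are wrong, which is why the later steps matter.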
The technique has direct analogues in data science and machine learning. Any time you characterize a dataset by its statistical distribution and compare that fingerprint against a library of known patterns — classifying spam emails, identifying authorship of anonymous texts, detecting anomalies in network traffic — you're applying the core insight Al-Kindi formalized 1,200 years ago.
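One standard way to put a number on "comparing a fingerprint" is Pearson's chi-squared statistic. The Python sketch below is an illustration, not Al-Kindi's own procedure: it scores each of the 26 Caesar decryptions against approximate English frequencies (the figures from the table above, extended to all letters) and keeps the most English-like one. It assumes a reasonable amount of text; on a dozen letters the statistic is noisy.

```python
import string
from collections import Counter

ALPHABET = string.ascii_uppercase
# Approximate English letter frequencies in percent, in A..Z order.
ENGLISH_FREQ = [8.2, 1.5, 2.8, 4.3, 12.7, 2.2, 2.0, 6.1, 7.0, 0.15, 0.77, 4.0, 2.4,
                6.7, 7.5, 1.9, 0.095, 6.0, 6.3, 9.1, 2.8, 0.98, 2.4, 0.15, 2.0, 0.074]

def chi_squared(text: str) -> float:
    """Pearson chi-squared distance from English; lower means more English-like."""
    letters = [ch for ch in text.upper() if ch in ALPHABET]
    n = len(letters)
    if n == 0:
        return float("inf")
    counts = Counter(letters)
    total = 0.0
    for ch, pct in zip(ALPHABET, ENGLISH_FREQ):
        expected = n * pct / 100
        total += (counts[ch] - expected) ** 2 / expected
    return total

def shift_back(text: str, k: int) -> str:
    return "".join(
        ALPHABET[(ALPHABET.index(ch) - k) % 26] if ch in ALPHABET else ch
        for ch in text.upper()
    )

def best_caesar_shift(ciphertext: str) -> int:
    """The shift whose decryption has the most English-like letter distribution."""
    return min(range(26), key=lambda k: chi_squared(shift_back(ciphertext, k)))

plain = ("FREQUENCY ANALYSIS WORKS BECAUSE EVERY NATURAL LANGUAGE "
         "LEAVES A STATISTICAL FINGERPRINT IN ITS LETTER COUNTS")
cipher = shift_back(plain, -7)  # shifting back by -7 is encrypting with ROT-7
print(best_caesar_shift(cipher))
```

The same pattern (profile the data, measure its distance from reference profiles, pick the closest) is what the classification examples above rely on, with richer features in place of letter counts.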
Sixteenth-century cryptographers recognized that frequency analysis made any monoalphabetic cipher vulnerable. Their solution was the polyalphabetic cipher: using multiple substitution alphabets in rotation to flatten the frequency distribution. The Vigenère cipher, attributed to Blaise de Vigenère though first described by Giovan Battista Bellaso in 1553, became the dominant polyalphabetic method.
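Each plaintext letter is shifted by the alphabet position of the key letter above it, counting from A=0, and the key repeats for the length of the message; with the key KEY, the shifts cycle through K=10, E=4, Y=24. The same plaintext letter therefore encodes differently depending on its position: an A under K becomes K, an A under E becomes E, and an A under Y becomes Y. Identical cipher letters arise only when the same plaintext letter lines up with the same key letter, and the longer the key, the rarer those alignments become, which frustrates frequency analysis.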
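A minimal Python sketch of the scheme, using the A=0 convention from the KEY example. Passing non-letters through without consuming key letters is one common convention and an assumption here; the second line of output uses the widely reproduced ATTACKATDAWN / LEMON example.

```python
import string

ALPHABET = string.ascii_uppercase

def vigenere(text: str, key: str, decrypt: bool = False) -> str:
    """Shift each letter by the A=0 position of the key letter above it; the key repeats."""
    sign = -1 if decrypt else 1
    out = []
    i = 0  # counts letters only, so spaces don't consume key letters
    for ch in text.upper():
        if ch not in ALPHABET:
            out.append(ch)
            continue
        shift = ALPHABET.index(key[i % len(key)].upper())
        out.append(ALPHABET[(ALPHABET.index(ch) + sign * shift) % 26])
        i += 1
    return "".join(out)

print(vigenere("AAA", "KEY"))                   # KEY: one plaintext letter, three cipher letters
print(vigenere("ATTACKATDAWN", "LEMON"))        # LXFOPVEFRNHR
print(vigenere("LXFOPVEFRNHR", "LEMON", True))  # ATTACKATDAWN
```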
The Vigenère cipher earned the nickname "le chiffre indéchiffrable" (the indecipherable cipher) and maintained its reputation for roughly 300 years, until Charles Babbage cracked it around 1854 (he never published his method, reportedly for reasons of wartime secrecy) and Friedrich Kasiski independently published a solution in 1863.
The Kasiski examination identifies the key length by finding repeated sequences of three or more characters in the ciphertext. These repetitions arise when the same plaintext segment aligns with the same key segment, which happens at intervals that are multiples of the key length, so the key length is likely a common factor of the distances between repeats. Once the key length k is known, the analyst splits the ciphertext into k interleaved groups (every k-th letter), each enciphered with a single Caesar shift, and applies frequency analysis to each group separately, restoring Al-Kindi's 9th-century technique to full effectiveness. Provided each group contains enough letters, the key length determines only how many separate frequency analyses are required, not whether frequency analysis works.
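A rough Python sketch of the two Kasiski steps. Taking the greatest common divisor of all repeat distances is a simplification assumed here: real analysts tally the factors of each distance, because a single accidental repeat can drag the GCD down to 1. The helper names are illustrative, and the sample string is synthetic, chosen so that its only repeat is obvious.

```python
from collections import defaultdict
from functools import reduce
from math import gcd

def letters_only(text: str) -> str:
    return "".join(ch for ch in text.upper() if ch.isalpha())

def repeat_distances(ciphertext: str, n: int = 3) -> list[int]:
    """Gaps between successive occurrences of every repeated n-letter sequence."""
    text = letters_only(ciphertext)
    positions = defaultdict(list)
    for i in range(len(text) - n + 1):
        positions[text[i:i + n]].append(i)
    distances = []
    for occ in positions.values():
        distances.extend(b - a for a, b in zip(occ, occ[1:]))
    return distances

def likely_key_length(ciphertext: str) -> int:
    """Crude estimate: the GCD of all repeat distances (0 if nothing repeats)."""
    d = repeat_distances(ciphertext)
    return reduce(gcd, d) if d else 0

def split_columns(ciphertext: str, k: int) -> list[str]:
    """Column i holds letters i, i+k, i+2k, ...: each was shifted by the same key letter."""
    text = letters_only(ciphertext)
    return [text[i::k] for i in range(k)]

print(repeat_distances("ABCXYZABCQRSABC"))   # [6, 6]
print(likely_key_length("ABCXYZABCQRSABC"))  # 6
print(split_columns("ABCDEFG", 3))           # ['ADG', 'BE', 'CF']
```

Each column from `split_columns` can then be fed to an ordinary single-shift frequency attack.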
World War II
The Enigma machine, developed in Germany as a commercial product in the early 1920s and adopted by the German navy and army in the late 1920s, represented a qualitative leap beyond pencil-and-paper ciphers. Its electromechanical complexity created a cryptographic challenge that human frequency analysis could not directly solve, requiring instead a new kind of thinking: computational constraint satisfaction.
The crucial innovation was the stepped rotation mechanism: each time an operator pressed a key, the right-hand rotor advanced by one position, periodically causing the middle and left rotors to advance in a gear-like cascade. This meant the substitution alphabet changed with every single keypress, so typing the same letter A twice in a row would usually produce two different cipher letters. Simple frequency analysis became useless because no single consistent substitution mapping existed across the message.
The plugboard (Steckerbrett) added a further layer of confusion before and after the rotor stage, swapping ten pairs of letters. The combined system offered approximately 10²³ starting configurations — enough that testing them manually at a rate of one per second would take longer than the age of the universe.
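The 10²³ figure can be reproduced with a few lines of arithmetic. This Python sketch assumes the three-rotor army and air force machine with rotors chosen from a box of five, ten plugboard cables, and one common counting convention for ring settings (only two of the three rings are counted, since the left ring merely duplicates the effect of the left rotor's starting position). Other conventions give somewhat different totals.

```python
from math import factorial, perm

def plugboard_settings(pairs: int = 10, letters: int = 26) -> int:
    """Ways to wire `pairs` cables: 26! / ((26 - 2p)! * p! * 2**p)."""
    return factorial(letters) // (
        factorial(letters - 2 * pairs) * factorial(pairs) * 2 ** pairs
    )

rotor_orders = perm(5, 3)   # 3 of 5 rotors, order matters: 60
start_positions = 26 ** 3   # each rotor can start at any of 26 letters
ring_settings = 26 ** 2     # counting convention: the left ring adds nothing new
total = rotor_orders * start_positions * ring_settings * plugboard_settings()

seconds_per_year = 365.25 * 24 * 3600
print(f"plugboard alone: {plugboard_settings():,}")                # 150,738,274,937,250
print(f"total settings:  {total:.2e}")                              # 1.07e+23
print(f"years at one per second: {total / seconds_per_year:.1e}")  # 3.4e+15
```

Roughly 3.4 × 10¹⁵ years dwarfs the universe's age of about 1.4 × 10¹⁰ years, which is the comparison the paragraph above makes.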
The team at Bletchley Park, led intellectually by mathematicians Alan Turing and Gordon Welchman, realized that brute-force enumeration was impossible but constrained search was feasible. The key insight came from a fundamental limitation of Enigma: because of the reflector, no letter could ever encrypt to itself. This single constraint let analysts instantly discard any alignment of a guessed plaintext fragment against the ciphertext in which some letter would have had to encrypt to itself.
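To see both the stepping and the reflector's no-self-encryption property, here is a deliberately stripped-down toy in Python: one rotor instead of three, no plugboard, no ring setting. The wiring strings are the commonly published ones for Enigma rotor I and reflector B; everything else, including the `toy_enigma` name, is a simplifying assumption, so this is not a faithful Enigma simulator.

```python
import string

ALPHABET = string.ascii_uppercase
ROTOR_I = "EKMFLGDQVZNTOWYHXUSPAIBRCJ"      # published wiring of Enigma rotor I
REFLECTOR_B = "YRUHQSLDPXNGOKMIEBFZCWVJAT"  # published wiring of reflector B

def toy_enigma(text: str, start: int = 0) -> str:
    """One stepping rotor plus a reflector; letters A-Z only (no plugboard, no rings)."""
    pos = start
    out = []
    for ch in text.upper():
        pos = (pos + 1) % 26  # the rotor steps before each letter is enciphered
        c = ALPHABET.index(ch)
        # Forward through the rotor, allowing for its current rotation.
        c = (ALPHABET.index(ROTOR_I[(c + pos) % 26]) - pos) % 26
        # The reflector swaps the letter with its partner (never with itself).
        c = ALPHABET.index(REFLECTOR_B[c])
        # Back through the same rotor using the inverse wiring.
        c = (ROTOR_I.index(ALPHABET[(c + pos) % 26]) - pos) % 26
        out.append(ALPHABET[c])
    return "".join(out)

print(toy_enigma("AA"))  # NR: the same key pressed twice, two different letters
```

Because the reflector pairs letters without ever pairing a letter with itself, and the signal passes through the same rotor forward and back, the mapping at any rotor position is its own inverse and has no fixed points. That is why the same settings both encrypt and decrypt, and why A can never come out as A.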
Combined with cribs (known or guessed plaintext fragments, such as mandatory weather report headers with "WETTER" appearing in a fixed position, stereotyped greetings, or commanders who reliably signed their names), the Bombe could test a single crib hypothesis against every rotor position for one rotor order in approximately 20 minutes. A wrong setting quickly led to a logical contradiction in the plugboard pairings it implied and was rejected; only settings consistent with the crib stopped the machine for operators to check by hand. Bletchley eventually operated over 200 Bombes, breaking German naval, air force, and army communications, often in near-real time, during the crucial mid-war period.
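The crib-placement trick that follows from the no-self-encryption rule is easy to sketch: slide the crib along the ciphertext and discard every offset where some crib letter sits on top of the identical cipher letter. This Python sketch covers only that filtering step, not the Bombe's search; the function name and the sample ciphertext are made up for illustration.

```python
def possible_crib_positions(ciphertext: str, crib: str) -> list[int]:
    """Offsets where `crib` could align with `ciphertext` under Enigma's rule.

    Enigma never enciphers a letter to itself, so any offset where a crib letter
    lines up with the same cipher letter is impossible and is dropped.
    """
    width = len(crib)
    return [
        i
        for i in range(len(ciphertext) - width + 1)
        if all(c != p for c, p in zip(ciphertext[i:i + width], crib))
    ]

print(possible_crib_positions("QWEZTRETXQ", "WETTER"))  # [3]
```

Four of the five possible offsets are eliminated without any machine work, which is why a long, reliable crib was so valuable.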
Some historians, notably Harry Hinsley, have estimated that the intelligence obtained at Bletchley Park (codenamed ULTRA) shortened World War II by two to four years. The story of Enigma's defeat also illustrates what cryptographers call Kerckhoffs's principle: a cryptographic system should remain secure even if everything about the system except the key is public knowledge. The Allies did learn how the machine was built, but that alone need not have been fatal; Enigma failed because of structural weaknesses such as the no-self-encryption property and because operators introduced known-plaintext vulnerabilities. The algorithm's theoretical security could not compensate for human procedural weakness.
Technique Catalog
Each of the historical cipher systems covered here eventually succumbed to a specific analytical technique. Understanding these techniques illuminates both the history of cryptography and the underlying mathematical structures of language.
Learning Framework
Experienced cryptogram solvers have internalized this process until it feels intuitive — but it rests on a systematic logical framework that any learner can adopt explicitly and then gradually internalize with practice.
Tally every cipher letter and rank them from most to least frequent. Your most frequent cipher letter is probably E. For a message of 120 or more characters, the top five cipher letters will usually stand for letters drawn from E, T, A, O, I, and N, though the exact ranking varies by text.
A single-letter word is almost always A or I. Common two-letter words include IN, IT, IS, AN, OF, AT, TO, AS, BE, BY, DO, GO, HE, ME, MY, OR, UP, US, and WE. THE is by far the most common three-letter word in English. Map these patterns against your frequency counts.
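A small Python helper can flag these anchor words automatically. It is a sketch under simple assumptions: words are separated by spaces, punctuation is stripped, and only the one-letter rule and the most-frequent-three-letter-word rule are applied. `short_word_hints` is an illustrative name.

```python
from collections import Counter

def short_word_hints(ciphertext: str) -> dict[str, list[str]]:
    """Candidate plaintexts for the short cipher words that make the best anchors."""
    words = ["".join(ch for ch in w if ch.isalpha()) for w in ciphertext.upper().split()]
    hints: dict[str, list[str]] = {}
    for w in words:
        if len(w) == 1:
            hints[w] = ["A", "I"]  # the only common one-letter words
    three_letter = Counter(w for w in words if len(w) == 3)
    if three_letter:
        top, _ = three_letter.most_common(1)[0]
        hints[top] = ["THE"]  # the most frequent three-letter word is a THE candidate
    return hints

# "I SAW THE CAT AND THE DOG" under a Caesar shift of 4:
print(short_word_hints("M WEA XLI GEX ERH XLI HSK"))
# {'M': ['A', 'I'], 'XLI': ['THE']}
```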
The suffix -ING can attach to almost any verb base, and -TION, -NESS, -MENT, -LESS, -ABLE, and -ED are also common. If you've confirmed the cipher letters for I, N, and G in earlier steps, look for longer words ending in that three-letter cipher sequence: stripping the suffix exposes a verb base, which cascades into confirming more letters rapidly.
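Once the cipher letters for I, N, and G are known, finding -ING candidates is a simple filter. In this Python sketch, `key` is assumed to map cipher letters to confirmed plaintext letters, `words_with_suffix` is a hypothetical helper name, and the sample ciphertext is artificial.

```python
def words_with_suffix(ciphertext: str, key: dict[str, str], suffix: str = "ING") -> list[str]:
    """Cipher words longer than `suffix` that end in the cipher letters standing for it.

    Returns an empty list until every letter of the suffix has been confirmed.
    """
    inverse = {plain: cipher for cipher, plain in key.items()}
    if not all(p in inverse for p in suffix):
        return []
    cipher_suffix = "".join(inverse[p] for p in suffix)
    words = {w.strip(".,;:!?") for w in ciphertext.upper().split()}
    return sorted(w for w in words if len(w) > len(suffix) and w.endswith(cipher_suffix))

key = {"Q": "I", "R": "N", "S": "G"}  # cipher Q, R, S = plaintext I, N, G
print(words_with_suffix("ABQRS XQRS QRS ZZZ ABQRS.", key))  # ['ABQRS', 'XQRS']
```

The bare three-letter word QRS is skipped because it has no verb base to expose.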
Every time you confirm a cipher-to-plaintext mapping, mark it everywhere in the puzzle. A confirmed letter might immediately resolve a word you weren't yet examining. Confirmed substitutions propagate across the entire message at once; this is constraint propagation, the same idea at the heart of many Sudoku solvers.
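Propagation is mechanical enough to automate. This Python sketch rewrites the whole message with every confirmed mapping at once, following the classical convention of lowercase for plaintext and uppercase for unsolved cipher letters; the function name is an illustrative assumption.

```python
def apply_partial_key(ciphertext: str, key: dict[str, str]) -> str:
    """Substitute every confirmed cipher letter throughout the message in one pass.

    Solved letters appear as lowercase plaintext; unsolved cipher letters stay
    uppercase, and spaces and punctuation are kept as they are.
    """
    return "".join(key[ch].lower() if ch in key else ch for ch in ciphertext.upper())

print(apply_partial_key("XLI GEX WEX SR XLI QEX", {"X": "T", "L": "H", "I": "E"}))
# the GEt WEt SR the QEt
```

Three confirmed letters already expose the repeated pattern _ _ t, and confirming cipher E would fill in every one of those words at once.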
With several confirmed letters, partially decoded words narrow to a small set of candidates. If you see _E_T with E and T confirmed, the candidates include LEFT, BEST, NEXT, KEPT, and MELT. Test each candidate by checking whether the letters it introduces are consistent with every other word they appear in; any inconsistency eliminates the guess.
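The consistency test can be written down precisely. This Python sketch (the `consistent` helper, the sample key, and the word list are illustrative assumptions) accepts a candidate only if it agrees with every confirmed letter, keeps repeated cipher letters paired with repeated plaintext letters, and never assigns one plaintext letter to two different cipher letters.

```python
def consistent(cipher_word: str, candidate: str, key: dict[str, str]) -> bool:
    """Could `candidate` be the plaintext of `cipher_word`, given confirmed `key`?"""
    if len(cipher_word) != len(candidate):
        return False
    taken = {plain: cipher for cipher, plain in key.items()}  # plaintext letters already claimed
    local: dict[str, str] = {}
    for c, p in zip(cipher_word, candidate):
        if c in key and key[c] != p:
            return False  # contradicts a confirmed letter
        if p in taken and taken[p] != c:
            return False  # plaintext letter already belongs to another cipher letter
        if local.setdefault(c, p) != p:
            return False  # one cipher letter cannot stand for two plaintext letters
        taken[p] = c
    return True

key = {"B": "E", "K": "T"}  # cipher B = plaintext E, cipher K = plaintext T
words = ["LEFT", "BEST", "TEST", "BEET", "NEXT", "KEPT", "EXIT"]
print([w for w in words if consistent("RBMK", w, key)])
# ['LEFT', 'BEST', 'NEXT', 'KEPT']
```

TEST and BEET fail because T and E are already spoken for by cipher K and B; applying the same check across every word in the puzzle is what eliminates the remaining guesses.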
Fully solved cryptograms from quality publishers will decode to grammatically correct, meaningful text — often a famous quotation. If your decode produces near-English gibberish, you likely have one or two letter swaps wrong. Read the text aloud; your language intuition will often flag errors your analytical eye missed.
Cognitive Benefits
Recreational cryptogram solving is not merely entertaining — it develops a cluster of cognitive skills with genuine practical applications in data analysis, formal reasoning, and language processing.
Probabilistic reasoning. Every step of the solving process involves updating your confidence in hypotheses based on new evidence. When you decide the most frequent cipher letter is probably E, then test that hypothesis by seeing whether the resulting letter assignments produce plausible words, you are performing informal Bayesian inference — the same mathematical framework that underlies machine learning classifiers, medical diagnostic reasoning, and scientific hypothesis testing.
Pattern recognition in noisy data. Short cryptograms are genuinely ambiguous: there are too few characters for the statistical signals to become reliable. Experienced solvers develop the ability to identify meaningful patterns while resisting the cognitive trap of seeing patterns that aren't there (apophenia). This calibration between signal detection and false-positive resistance is exactly what data analysts need when exploring noisy datasets.
Constraint propagation. Each confirmed letter immediately constrains all words containing that cipher symbol. The best solvers don't apply substitutions one at a time — they mentally propagate each confirmed letter through every word in the puzzle simultaneously, catching cascading implications instantly. This is the same reasoning style required for formal logic proofs, programming with type constraints, and solving systems of equations.
Vocabulary and morphology. Recognizing that a partially decoded word ending in -ING is probably a verb form, or that one ending in -TION is probably a noun, requires tacit knowledge of English morphology that deepens with practice. Studies of adult cryptogram enthusiasts show measurably faster word-recall speed and better spelling accuracy compared to matched controls, suggesting that the pattern-recognition practice generalizes to language processing more broadly.
Working memory and cognitive flexibility. Holding tentative letter mappings in mind while testing them across multiple words simultaneously exercises working memory in a highly specific, demanding way. The cognitive flexibility required to abandon a promising hypothesis when it produces an impossible word is directly related to the executive-function skills measured by neuropsychological assessments.