Episode 28 — History & Mathematics

Cryptograms and Code-Breaking History

From Julius Caesar's shifting alphabet to Alan Turing's electromechanical Bombe — the remarkable 2,000-year journey of secret writing, its eventual defeat, and what it teaches us about pattern recognition and probabilistic reasoning.

850 CE Al-Kindi's breakthrough
10²³ Enigma configurations
12.7% Letter E frequency

What Is a Cryptogram?

A cryptogram is one of the oldest puzzles in recorded history and one of the most intellectually satisfying — because it puts you in the shoes of a real spy, a wartime intelligence analyst, or a Renaissance scholar deciphering a rival monarch's dispatches.

In its purest recreational form, a cryptogram presents a famous quote or phrase in which every letter of the original has been replaced by a different letter according to a substitution key. The key is consistent throughout: if the letter E has been replaced by Q, then every E in the original text appears as Q in the cryptogram. The solver's job is to reverse-engineer that key using logic, language knowledge, and statistical reasoning.

The technical name for this technique is a monoalphabetic substitution cipher: "monoalphabetic" because a single fixed alphabet performs all the substitutions, and "substitution" because each plaintext character is replaced by exactly one ciphertext character. There are 26! (approximately 4 × 10²⁶) possible monoalphabetic substitution alphabets for English, which sounds astronomical, but frequency analysis lets a practiced human solver break a typical puzzle in minutes.
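To make the idea of a substitution key concrete, here is a minimal Python sketch. The function names (make_key, substitute, invert) and the seed value are illustrative choices, not part of any standard tool. Note that a random shuffle may leave a letter mapped to itself, which published puzzles usually avoid.

```python
import math
import random
import string

ALPHABET = string.ascii_uppercase

def make_key(seed=None):
    """Return a dict mapping each plaintext letter to one cipher letter."""
    rng = random.Random(seed)
    shuffled = list(ALPHABET)
    rng.shuffle(shuffled)
    return dict(zip(ALPHABET, shuffled))

def substitute(text, key):
    """Apply a letter mapping; spaces and punctuation pass through unchanged."""
    return "".join(key.get(ch, ch) for ch in text.upper())

def invert(key):
    """Swap the direction of a key so it decrypts instead of encrypts."""
    return {cipher: plain for plain, cipher in key.items()}

key = make_key(seed=28)          # hypothetical seed, any value works
ciphertext = substitute("MEET ME AT DAWN", key)
recovered = substitute(ciphertext, invert(key))

key_count = math.factorial(26)   # number of possible substitution alphabets
print(ciphertext, recovered, f"{key_count:.2e}")
```

Because the key is a one-to-one mapping, inverting it always recovers the original text, which is exactly what a solver reconstructs piece by piece.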

Understanding the difference between ciphers and codes clarifies a common confusion. A cipher transforms individual characters or small groups of characters. A code replaces whole words, phrases, or semantic units with other symbols — think of naval flag signals, where different flag combinations represent complete messages like "I require a pilot" or "my vessel is on fire." Codes require codebooks; ciphers follow algorithmic rules. Recreational cryptograms are ciphers.

The Caesar Cipher: Where It All Began

The Roman general and statesman Julius Caesar used a simple substitution cipher for private communications — shifting every letter of the alphabet forward by three positions. A becomes D, B becomes E, C becomes F, and so on. The message MEET ME AT DAWN becomes PHHW PH DW GDZQ. Suetonius documented this practice around 121 CE, making the Caesar cipher the earliest named cipher in recorded history.

Plaintext:  MEET ME AT DAWN
Shift:      +3
Ciphertext: PHHW PH DW GDZQ

Caesar's ROT-3 cipher shifts every letter three positions forward. Only 25 distinct shifts exist besides the trivial shift of zero, so a five-year-old with enough patience could brute-force every one.

The Caesar cipher is trivially breakable today: there are only 25 possible shift values (ROT-1 through ROT-25), so brute force works in seconds. In Caesar's time its protection came from circumstance rather than cryptographic strength: most adversaries who might intercept a message were illiterate, and few would have thought to look for a systematic letter shift. Once adversaries began applying systematic analysis, simple shift ciphers fell immediately.
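The brute-force attack described above fits in a few lines of Python. This is a sketch under simple assumptions: uppercase English letters only, with the helper name caesar chosen for illustration.

```python
import string

ALPHABET = string.ascii_uppercase

def caesar(text, shift):
    """Shift each letter by `shift` positions; leave other characters alone."""
    out = []
    for ch in text.upper():
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
        else:
            out.append(ch)
    return "".join(out)

ciphertext = caesar("MEET ME AT DAWN", 3)

# Brute force: undo every possible shift and inspect the 25 candidates.
candidates = {shift: caesar(ciphertext, -shift) for shift in range(1, 26)}
for shift, text in candidates.items():
    print(shift, text)
```

Only one of the 25 printed lines reads as English, which is why a human (or a dictionary check) can pick out the answer at a glance.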

Frequency Analysis: Language as a Statistical Fingerprint

Around 850 CE, the Arab polymath Abu Yusuf Yaqub ibn Ishaq Al-Kindi wrote the first known systematic treatise on breaking substitution ciphers. His insight was elegant: language is not random. Every natural language leaves a statistical fingerprint in its letter frequencies.

In English, the distribution of letters in typical prose follows a remarkably consistent pattern. The most frequent letters — ETAOIN SHRDLU — appear so consistently across diverse texts that experienced cryptogram solvers have committed this sequence to memory as a mnemonic. Here are the top ten most frequent English letters with their approximate frequencies:

Letter  Frequency  Solver's hint
E       12.7%      Most common; always the first guess for the top cipher letter
T       9.1%       Common in THE, THAT, TO, THIS, THERE
A       8.2%       A single-letter word must be A or I
O       7.5%       Often follows T; common in TO, OF, ON, OR
I       7.0%       The other single-letter word candidate
N       6.7%       Common at word ends: -ING, -ION, -AN, -EN, -IN
S       6.3%       Common word-starter and plural/possessive marker
H       6.1%       TH is the most common digraph in English
R       6.0%       Common in ER, RE, AR, and -ER endings
D       4.3%       Common word-ender: -ED past tense

Frequency analysis works as a cascading inference engine. You start with probabilities (the most frequent cipher symbol is probably E), and each confirmed mapping constrains all subsequent guesses. Once you know that cipher-Q maps to plaintext-E, every word containing Q reveals partial structure. A three-letter word ending in Q whose first letter is a likely T has the form T_E: THE, TOE, or TIE. Each confirmation sharply narrows the search space.
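The first pass of frequency analysis can be sketched as follows. The names rank_cipher_letters and first_guess are illustrative, and the guess is only a starting hypothesis that word patterns must confirm or overturn.

```python
from collections import Counter

ENGLISH_ORDER = "ETAOINSHRDLU"  # most frequent English letters, in order

def rank_cipher_letters(ciphertext):
    """Return (letter, count) pairs, most frequent first."""
    counts = Counter(ch for ch in ciphertext.upper() if ch.isalpha())
    return counts.most_common()

def first_guess(ciphertext):
    """Pair the ranked cipher letters with ETAOIN... as a starting hypothesis."""
    ranked = [letter for letter, _ in rank_cipher_letters(ciphertext)]
    return dict(zip(ranked, ENGLISH_ORDER))

ranking = rank_cipher_letters("PHHW PH DW GDZQ")
guess = first_guess("PHHW PH DW GDZQ")
print(ranking[0], guess["H"])
```

On this tiny Caesar example the top guess (H stands for E) happens to be correct, but the lower-ranked guesses are not: with only twelve letters the statistics have not stabilized, which is why short messages need word-pattern evidence as well.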

The technique has direct analogues in data science and machine learning. Any time you characterize a dataset by its statistical distribution and compare that fingerprint against a library of known patterns — classifying spam emails, identifying authorship of anonymous texts, detecting anomalies in network traffic — you're applying the core insight Al-Kindi formalized 1,200 years ago.

The Vigenère Cipher: Defeating Frequency Analysis for 300 Years

Sixteenth-century cryptographers recognized that frequency analysis made any monoalphabetic cipher vulnerable. Their solution was the polyalphabetic cipher — using multiple substitution alphabets in rotation to flatten the frequency distribution. The Vigenère cipher, attributed to Blaise de Vigenère though developed by earlier authors, became the dominant polyalphabetic method.

Vigenère Encryption Example — Key: "KEY"
Plaintext A T T A C K A T D A W N
Key (repeat) K E Y K E Y K E Y K E Y
Ciphertext K X R K G I K X B K A L

Each plaintext letter is shifted by the numeric value of the corresponding key letter, counting from A=0 (K=10, E=4, Y=24). The same plaintext letter can encode differently depending on its position: T appears as X when it aligns with key letter E and as R when it aligns with Y. The letter A, by contrast, appears as K all four times, but only because it happens to align with the key letter K every time. With a longer key, such repetitions become rarer, frustrating frequency analysis.
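The table above can be reproduced with a short Python sketch. It assumes uppercase English letters and A=0 numbering; the function name vigenere and its decrypt flag are illustrative.

```python
import string
from itertools import cycle

ALPHABET = string.ascii_uppercase

def vigenere(text, key, decrypt=False):
    """Shift each letter by the matching key letter (A=0), repeating the key."""
    sign = -1 if decrypt else 1
    key_shifts = cycle([ALPHABET.index(k) for k in key.upper()])
    out = []
    for ch in text.upper():
        if ch in ALPHABET:
            shift = sign * next(key_shifts)
            out.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
        else:
            out.append(ch)  # non-letters do not consume a key letter
    return "".join(out)

ciphertext = vigenere("ATTACKATDAWN", "KEY")
plaintext = vigenere(ciphertext, "KEY", decrypt=True)
print(ciphertext, plaintext)
```

Running it gives KXRKGIKXBKAL, matching the ciphertext row in the example.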

The Vigenère cipher earned the nickname "le chiffre indéchiffrable" (the indecipherable cipher) and maintained its reputation for approximately 300 years — until Charles Babbage cracked it around 1854 (but kept the method secret for wartime purposes) and Friedrich Kasiski independently published the solution in 1863.

The Kasiski examination identifies the key length by finding repeated sequences of three or more characters in the ciphertext — these repetitions arise when the same plaintext segment aligns with the same key segment, which happens at intervals that are multiples of the key length. Once the key length is known, the analyst splits the ciphertext into groups separated by that period and applies frequency analysis to each group separately, restoring Al-Kindi's 9th-century technique to full effectiveness. The key length determines only how many separate frequency analyses are required, not whether frequency analysis works.
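A rough Python sketch of the Kasiski idea follows. The helper names are illustrative, and the demonstration uses repeated pairs rather than triples only because the twelve-letter example ciphertext from the KEY table is too short to contain a repeated triple.

```python
from collections import defaultdict
from functools import reduce
from math import gcd

def repeated_sequences(ciphertext, length=3):
    """Map each repeated substring of the given length to its start positions."""
    positions = defaultdict(list)
    for i in range(len(ciphertext) - length + 1):
        positions[ciphertext[i:i + length]].append(i)
    return {seq: pos for seq, pos in positions.items() if len(pos) > 1}

def kasiski_distances(ciphertext, length=3):
    """Distances between consecutive occurrences of every repeated substring."""
    distances = []
    for pos in repeated_sequences(ciphertext, length).values():
        distances.extend(b - a for a, b in zip(pos, pos[1:]))
    return distances

def split_by_period(ciphertext, period):
    """Group letters that were enciphered with the same key letter."""
    return [ciphertext[i::period] for i in range(period)]

distances = kasiski_distances("KXRKGIKXBKAL", length=2)
common = reduce(gcd, distances)          # the key length should divide this
columns = split_by_period("KXRKGIKXBKAL", 3)
print(distances, common, columns)
```

The repeated pair KX sits six positions apart, so the key length must divide 6; trying period 3 yields three columns, each of which is a plain Caesar cipher that ordinary frequency analysis can attack on a longer message.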

Enigma: The Machine That Changed History

The Enigma machine, developed in Germany from 1918 and adopted by the German military from the late 1920s, represented a qualitative leap beyond pencil-and-paper ciphers. Its electromechanical complexity created a cryptographic challenge that human frequency analysis could not directly solve; it required a new kind of thinking: computational constraint satisfaction.

Signal path: Keyboard (A–Z input) → Plugboard (10 letter pairs) → Rotor I (26 positions) → Rotor II (26 positions) → Rotor III (26 positions) → Reflector (fixed wiring) → Lampboard (ciphertext letter)

The crucial innovation was the stepped rotation mechanism: each time an operator pressed a key, the right-hand rotor advanced by one position, periodically causing the middle and left rotors to advance in a gear-like cascade. This meant the substitution alphabet changed with every single keypress — encrypting the same letter A twice in a row would produce two completely different cipher letters. Simple frequency analysis became useless because no single consistent substitution mapping existed across the message.
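The effect of stepping can be seen in a deliberately simplified toy: a single rotor with no reflector, plugboard, or additional rotors. This is not an Enigma simulator. The wiring string is the one commonly cited for the historical rotor I, used here simply as a sample permutation, and the function name is hypothetical.

```python
import string

ALPHABET = string.ascii_uppercase
# Sample permutation (commonly cited as Enigma rotor I's wiring).
WIRING = "EKMFLGDQVZNTOWYHXUSPAIBRCJ"

def toy_rotor_encrypt(text, start=0):
    """Encrypt uppercase letters with one rotor that steps on every keypress."""
    position = start
    out = []
    for ch in text:
        position = (position + 1) % 26           # rotor advances before encrypting
        entry = (ALPHABET.index(ch) + position) % 26
        exit_ = (ALPHABET.index(WIRING[entry]) - position) % 26
        out.append(ALPHABET[exit_])
    return "".join(out)

result = toy_rotor_encrypt("AAA")
print(result)
```

Even in this toy, pressing A three times yields three different letters (JKC), so counting letter frequencies over the whole message no longer exposes a single substitution table.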

The plugboard (Steckerbrett) added a further layer of confusion before and after the rotor stage, swapping ten pairs of letters. The combined system offered approximately 10²³ starting configurations — enough that testing them manually at a rate of one per second would take longer than the age of the universe.
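The 10²³ figure can be reproduced with a little arithmetic. The sketch below assumes one common accounting, which is not stated in the text above: three rotors chosen in order from a set of five, 26 starting positions per rotor, two effective ring settings, and ten plugboard cables. Other accountings give different totals.

```python
from math import factorial, perm

rotor_orders = perm(5, 3)        # assumption: 3 of 5 rotors, order matters
rotor_positions = 26 ** 3        # starting position of each of the 3 rotors
ring_settings = 26 ** 2          # assumption: only two ring settings matter

plug_pairs = 10
# Ways to pick 10 unordered pairs from 26 letters, leaving 6 unplugged.
plugboard = factorial(26) // (
    factorial(26 - 2 * plug_pairs) * factorial(plug_pairs) * 2 ** plug_pairs
)

without_rings = rotor_orders * rotor_positions * plugboard
with_rings = without_rings * ring_settings
print(f"{plugboard:,}", f"{without_rings:.3e}", f"{with_rings:.3e}")
```

Under these assumptions the plugboard alone contributes about 1.5 × 10¹⁴ possibilities, the total without ring settings is about 1.6 × 10²⁰, and including ring settings brings it to roughly 10²³.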

Alan Turing and the Bombe

The team at Bletchley Park, led intellectually by mathematicians Alan Turing and Gordon Welchman, realized that brute-force enumeration was impossible but constrained search was feasible. The key insight came from Enigma's fundamental limitation: the reflector guaranteed that no letter could encrypt to itself. This single constraint, "no letter maps to itself," allowed analysts to prune enormous fractions of the configuration space instantly.

Combined with cribs (known or guessed plaintext fragments, such as mandatory weather report headers with "WETTER" appearing in a fixed position, stereotyped greetings, or commanders who reliably signed their names), the Bombe could test a single crib hypothesis against every rotor position for a given rotor order in approximately 20 minutes. A correct crib would produce a consistent solution; a crib placement that paired any letter with itself could be ruled out before the machine even ran, and settings that led to contradictions were rejected rapidly. Bletchley eventually operated over 200 Bombes simultaneously, breaking German naval, air force, and army communications in near-real time during the crucial mid-war period.
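The no-self-encryption rule makes crib placement easy to sketch in Python. The ciphertext below is made up for illustration (it is not real Enigma traffic), and the function name is hypothetical.

```python
def possible_crib_positions(ciphertext, crib):
    """Return start offsets where no crib letter lines up with itself."""
    positions = []
    for start in range(len(ciphertext) - len(crib) + 1):
        window = ciphertext[start:start + len(crib)]
        # Enigma never encrypts a letter to itself, so any match is a clash.
        if all(c != p for c, p in zip(window, crib)):
            positions.append(start)
    return positions

# Hypothetical intercepted fragment, invented for this example.
positions = possible_crib_positions("XWETAERTTQ", "WETTER")
print(positions)
```

Of the five possible alignments, three are eliminated by a letter clash and only offsets 2 and 3 survive. On real traffic this filtering was the first step; the surviving alignments were what the Bombe was then set up to test.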

Historians estimate that the intelligence obtained at Bletchley Park, codenamed ULTRA, shortened World War II by approximately two to four years. The story of Enigma's defeat illustrates what cryptographers call Kerckhoffs's principle: a cryptographic system should remain secure even if everything about the system except the key is public knowledge. The Allies understood the machine's design, so its security rested entirely on the key settings, and operators undermined those with predictable messages that handed analysts known plaintext. The algorithm's theoretical security could not compensate for human procedural weakness.

Two Thousand Years of Secret Writing

~50 BCE
Julius Caesar's ROT-3 Cipher
Caesar uses a shift-3 cipher for military correspondence, a practice later documented by Suetonius. The simplest named cipher in recorded history; effective only because most adversaries were illiterate.
~850 CE
Al-Kindi Invents Frequency Analysis
Arab polymath Abu Yusuf Al-Kindi writes "A Manuscript on Deciphering Cryptographic Messages" — the first systematic description of statistical cryptanalysis. Monoalphabetic ciphers are now theoretically breakable by anyone with the technique.
1467
Leon Battista Alberti's Cipher Disk
Italian architect Leon Battista Alberti invents the first polyalphabetic cipher device — two concentric rotating disks that can use multiple substitution alphabets within a single message. The direct ancestor of the Vigenère cipher.
1553
Vigenère Cipher Introduced
Giovan Battista Bellaso publishes the repeating-key polyalphabetic cipher later misattributed to Blaise de Vigenère. It earns a reputation as "le chiffre indéchiffrable" (the indecipherable cipher) for the next three centuries.
1863
Kasiski's Attack Breaks Vigenère
Friedrich Kasiski publishes a method for determining the key length of a repeating-key polyalphabetic cipher by analyzing repeated sequences. Al-Kindi's frequency analysis, applied period-by-period, finishes the job. The era of secure hand ciphers effectively ends.
1918
Enigma Machine Patented
German engineer Arthur Scherbius patents the Enigma electromechanical cipher machine. Marketed commercially from 1923; adopted by the German military in the late 1920s, with the plugboard added to military models shortly afterward.
1940
Turing's Bombe Becomes Operational
Alan Turing and Gordon Welchman's electromechanical Bombe begins attacking German Enigma traffic at Bletchley Park. By 1945 over 200 Bombes are running 24 hours a day. The intelligence (ULTRA) contributes decisively to Allied victory.
1976–1977
DES and the Public-Key Revolution
Diffie and Hellman publish public-key cryptography in 1976, a mathematical revolution that separates encryption from secure key exchange and enables modern internet security. In 1977 the US government adopts the Data Encryption Standard (DES).
Today
Cryptograms as Recreational Puzzles
The American Cryptogram Association publishes bimonthly issues of The Cryptogram. Daily newspapers carry aristocrat puzzles. Dozens of apps offer timed cryptogram challenges. The ancient art of frequency analysis survives as pure mental sport.

Key Cryptanalytic Techniques

Every historical cipher system eventually succumbed to a specific analytical technique. Understanding these techniques illuminates both the history of cryptography and the underlying mathematical structures of language.

Frequency Analysis
Al-Kindi, ~850 CE
Count symbol occurrences in ciphertext; map to known language frequency distribution (ETAOIN SHRDLU for English). The foundational attack against all monoalphabetic substitution ciphers. Still the first step in every recreational cryptogram solve.
Index of Coincidence
William Friedman, 1920
Measures how "clumped" a text's letter distribution is. Natural English has IoC ≈ 0.065; random text has IoC ≈ 0.038. Polyalphabetic ciphertext falls between the two, and how far it sits from the English value gives an estimate of the key length: a statistical complement to Kasiski's examination.
Kasiski Examination
Friedrich Kasiski, 1863
Finds the key length of a polyalphabetic cipher by locating repeated sequences of three or more characters and computing the GCD of the distances between them. The key length divides all or most of these distances. Once key length is known, standard frequency analysis applies to each sub-cipher.
Known-Plaintext Attack
Bletchley Park, 1940s
Uses a known or guessed fragment of plaintext (a "crib") to immediately test and discard cipher configurations. Turing's Bombe used cribs (expected weather report headers, standardized message formats) to guide efficient constraint-satisfaction search through Enigma's enormous configuration space.
Bigram & Trigram Analysis
Classical, formalized 19th c.
Beyond single-letter frequencies, common two-letter sequences (TH=3.5%, HE=3.1%, IN=2.4%, ER=2.0%, AN=2.0%) and three-letter sequences (THE=1.8%, AND=0.7%, ING=0.7%) provide additional constraints that accelerate decipherment even for short messages where single-letter frequencies haven't stabilized.
Word Pattern Matching
Recreational cryptanalysis
Single-letter words must be A or I. Three-letter repeated-pattern words (XYX) suggest few candidates (DAD, MOM, POP, EYE, AHA, WOW). Common three-letter words with known letters (T_E = THE, TIE, TOE) narrow possibilities rapidly. Solvers use pattern dictionaries for efficiency.
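Of the techniques listed above, the Index of Coincidence is the easiest to express as a formula: the chance that two letters drawn at random from the text are the same. The Python sketch below uses illustrative function names and toy inputs; the 0.065 and 0.038 reference values only emerge on realistically long texts.

```python
from collections import Counter

def index_of_coincidence(text):
    """Probability that two distinct positions in the text hold the same letter."""
    letters = [ch for ch in text.upper() if ch.isalpha()]
    n = len(letters)
    if n < 2:
        return 0.0
    counts = Counter(letters)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))

def average_ioc_for_period(ciphertext, period):
    """Average IoC of the columns obtained by taking every `period`-th letter."""
    letters = "".join(ch for ch in ciphertext.upper() if ch.isalpha())
    columns = [letters[i::period] for i in range(period)]
    return sum(index_of_coincidence(col) for col in columns) / period

print(index_of_coincidence("AABB"))            # clumped toy text
print(average_ioc_for_period("ABABAB", 2))     # each column is uniform
```

To estimate a key length, one would compute average_ioc_for_period for several candidate periods on a long ciphertext and favor the period whose columns look most like English.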

How to Solve a Cryptogram: Step-by-Step

Experienced cryptogram solvers have internalized this process until it feels intuitive — but it rests on a systematic logical framework that any learner can adopt explicitly and then gradually internalize with practice.

1

Count and rank cipher letter frequencies

Tally every cipher letter and rank them from most to least frequent. Your most-frequent cipher letter is probably E. For a message of 120+ characters, the top five cipher letters probably map to E, T, A, O, I in some order — though the specific ranking varies by text.

2

Identify short words first

A single-letter word is A or I. Common two-letter words include IN, IT, IS, AN, OF, AT, TO, AS, BE, BY, DO, GO, HE, ME, MY, OR, UP, US, and WE. THE is by far the most common three-letter word in English. Map these patterns against your frequency counts.

3

Look for word endings and common suffixes

The suffix -ING can attach to virtually any verb. -TION, -NESS, -MENT, -LESS, -ABLE, and -ED are also common. If you've confirmed the letters I, N, and G in earlier steps, look for words ending in those three letters to find verb bases; this cascades into confirming more letters rapidly.

4

Apply confirmed letters across the entire ciphertext

Every time you confirm a cipher-to-plaintext mapping, mark it everywhere in the puzzle. A confirmed letter might immediately resolve a word you weren't yet examining. Substitutions propagate across the entire message simultaneously: this is constraint propagation, the same idea used by Sudoku solvers.

5

Guess high-frequency words and test consistency

With several confirmed letters, partially-decoded words narrow to a small set of candidates. If you see _E_T with confirmed E and T, the candidates are LEFT, BEST, NEXT, KEPT, MELT, etc. Test the candidate by seeing whether the letters it introduces are consistent with every other word they appear in; inconsistencies eliminate the guess.

6

Read the solved text contextually to confirm

Fully solved cryptograms from quality publishers will decode to grammatically correct, meaningful text — often a famous quotation. If your decode produces near-English gibberish, you likely have one or two letter swaps wrong. Read the text aloud; your language intuition will often flag errors your analytical eye missed.
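The candidate-testing logic in the steps above can be sketched in Python. The word list, the cipher word ZHAW, and the function names are all hypothetical; a real solver would use a full pattern dictionary.

```python
def letter_pattern(word):
    """Encode repetition structure: MEET and PHHW both give (0, 1, 1, 2)."""
    seen = {}
    return tuple(seen.setdefault(ch, len(seen)) for ch in word)

def candidates(cipher_word, key, words):
    """Words consistent with the confirmed cipher-to-plain mappings in `key`."""
    confirmed_plain = set(key.values())
    result = []
    for word in words:
        if letter_pattern(word) != letter_pattern(cipher_word):
            continue  # different length or different repeated-letter structure
        ok = True
        for c, p in zip(cipher_word, word):
            if c in key:
                ok = ok and key[c] == p           # must agree with known letters
            else:
                ok = ok and p not in confirmed_plain  # unknown slot cannot reuse one
        if ok:
            result.append(word)
    return result

# Hypothetical mini-dictionary and a cipher word that decodes to _E_T so far.
WORDS = ["LEFT", "BEST", "NEXT", "KEPT", "MELT", "TENT", "THAT", "EAST"]
key = {"H": "E", "W": "T"}
found = candidates("ZHAW", key, WORDS)
print(found)
```

TENT and THAT are rejected because their repeated letters do not match the cipher word's structure, and EAST is rejected because it would put the already-confirmed E in an unsolved slot. Each surviving candidate proposes new mappings that can then be checked against every other word in the puzzle.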

What Cryptogram Solving Teaches Your Brain

Recreational cryptogram solving is not merely entertaining — it develops a cluster of cognitive skills with genuine practical applications in data analysis, formal reasoning, and language processing.

Probabilistic reasoning. Every step of the solving process involves updating your confidence in hypotheses based on new evidence. When you decide the most frequent cipher letter is probably E, then test that hypothesis by seeing whether the resulting letter assignments produce plausible words, you are performing informal Bayesian inference — the same mathematical framework that underlies machine learning classifiers, medical diagnostic reasoning, and scientific hypothesis testing.

Pattern recognition in noisy data. Short cryptograms are genuinely ambiguous — the statistical signals haven't had enough characters to become reliable. Experienced solvers develop the ability to identify meaningful patterns while resisting the cognitive trap of seeing patterns that aren't there (pareidolia). This calibration between signal detection and false-positive resistance is exactly what data analysts need when exploring noisy datasets.

Constraint propagation. Each confirmed letter immediately constrains all words containing that cipher symbol. The best solvers don't apply substitutions one at a time — they mentally propagate each confirmed letter through every word in the puzzle simultaneously, catching cascading implications instantly. This is the same reasoning style required for formal logic proofs, programming with type constraints, and solving systems of equations.

Vocabulary and morphology. Recognizing that a partially decoded word ending in _-I-N-G is a verb in the progressive aspect, or that _-T-I-O-N is a noun, requires tacit knowledge of English morphology that deepens with practice. Studies of adult cryptogram enthusiasts show measurably faster word-recall speed and better spelling accuracy compared to matched controls — suggesting that the pattern-recognition practice generalizes to language processing more broadly.

Working memory and cognitive flexibility. Holding tentative letter mappings in mind while testing them across multiple words simultaneously exercises working memory in a highly specific, demanding way. The cognitive flexibility required to abandon a promising hypothesis when it produces an impossible word is directly related to the executive-function skills measured by neuropsychological assessments.


Your Questions Answered

What is a cryptogram and how does it differ from a code?
A cryptogram is a puzzle where every letter of the original message has been substituted with a different letter following a consistent rule — each letter maps to exactly one other letter throughout the message. A code, by contrast, replaces whole words or phrases with code-words or numbers and requires a codebook to decipher. The distinction matters because the attack strategies differ: frequency analysis works powerfully against cryptograms (monoalphabetic substitution) but is less effective against codebooks where whole words are the unit of replacement. Recreational newspaper cryptograms are always ciphers, not codes — meaning every solver is unknowingly practicing Al-Kindi's 9th-century frequency analysis technique.
How does frequency analysis break a substitution cipher?
Every natural language has a characteristic letter frequency distribution. In English, E appears roughly 12.7% of the time, T about 9.1%, A about 8.2%, and O about 7.5%. When a message is long enough, typically 100 characters or more, the encrypted letter that appears most frequently is very likely E, the second-most-frequent is likely T or A, and so on. A solver counts occurrences of each cipher letter, ranks them, then maps them to the expected English frequency table. The first few mappings are educated guesses; confirming them via word patterns creates a cascading chain of deductions. Single-letter words must be A or I. Common two-letter sequences (TH, HE, IN) provide additional anchors. Each confirmed letter constrains every word containing that cipher symbol, and this constraint propagation speeds up the solve dramatically. The Vigenère cipher defeated simple frequency analysis by rotating multiple substitution alphabets, but the Kasiski examination (1863) restored frequency analysis by revealing the key length.
What was the Enigma machine and why was it considered unbreakable?
The Enigma machine was an electromechanical cipher device used by Nazi Germany during World War II. Its innovation was a set of rotating wheels (rotors) that changed the substitution alphabet with every single keypress, so the same plaintext letter encrypted differently each time it appeared. With three rotors (26 positions each), a reflector, and a plugboard swapping ten letter pairs, the number of possible starting configurations was approximately 10²³. Brute-force testing at one configuration per second would have taken longer than the age of the universe. What made Enigma vulnerable was human behavior: operators reused message keys, sent predictable text that gave analysts cribs (weather reports always started with the same format), and made procedural errors. Alan Turing and Gordon Welchman at Bletchley Park exploited Enigma's mathematical property (the reflector guaranteed no letter could ever encrypt to itself) along with known-plaintext cribs to design the Bombe, an electromechanical machine that could test a crib against the rotor positions in roughly 20 minutes using informed constraint satisfaction rather than brute force.
What cognitive skills does solving cryptograms develop?
Cryptogram solving develops a cluster of cognitive skills with genuine practical applications. Probabilistic reasoning: every mapping decision involves updating confidence based on evidence — informal Bayesian inference, the same framework underlying machine learning classifiers and medical diagnosis. Pattern recognition in noisy data: short cryptograms are genuinely ambiguous; experienced solvers learn to identify real signals without seeing false patterns. Constraint propagation: confirmed letters constrain all words containing that cipher symbol simultaneously — the same reasoning style used in formal logic proofs and Sudoku algorithms. Vocabulary and morphology: recognizing suffix patterns (-ING, -TION, -NESS) deepens tacit knowledge of English word structure, with measurable improvements in spelling accuracy documented in adult learners. Working memory: holding tentative substitutions while testing them against multiple words simultaneously exercises precisely the cognitive flexibility measured by neuropsychological working-memory assessments. These skills generalize beyond puzzle-solving to data analysis, programming with type constraints, and any domain requiring inference from incomplete information.
Who was Al-Kindi and why is he important to cryptography history?
Abu Yusuf Yaqub ibn Ishaq Al-Kindi (c. 801–873 CE) was an Arab polymath working in Baghdad during the Islamic Golden Age. Around 850 CE he wrote "A Manuscript on Deciphering Cryptographic Messages," the earliest known systematic description of frequency analysis as a method for breaking substitution ciphers. He observed that every language has characteristic letter frequencies and that comparing ciphertext symbol counts to the known frequencies of the target language reveals the substitution mapping. This insight predated the first European descriptions of frequency analysis by approximately 600 years. Al-Kindi's technique remained the dominant cipher-breaking method until the Vigenère polyalphabetic cipher (16th century) defeated it by rotating multiple substitution alphabets. Even then, it was only defeated for three centuries, until the Kasiski examination (1863) and the Index of Coincidence (William Friedman, 1920) restored frequency analysis to its primacy by first revealing the cipher period. Modern cryptography moved entirely beyond substitution ciphers, but Al-Kindi's contribution established the principle that mathematical analysis of text statistics is a more powerful tool than any hand cipher can resist.
How do publishers calibrate cryptogram difficulty?
Recreational cryptogram publishers calibrate difficulty along several axes simultaneously. Message length is the most basic: short messages (under 60 characters) resist frequency analysis because patterns haven't stabilized statistically, while long messages (150+ characters) give ample leverage. Quote familiarity provides a different axis: a well-known proverb or Shakespeare quotation gives experienced solvers a crib, since they can guess the source and test expected words in likely positions. Letter frequency skew affects difficulty independently of length: a message with unusually even distribution is harder than one whose profile closely matches standard English. Some publishers distinguish aristocrats (word spaces preserved) from patristocrats (spaces removed); eliminating word boundaries removes the most useful structural cue. Digital apps add time pressure as a further dimension: a 120-character aristocrat with no time limit and optional hints is a learning tool; the same puzzle with a 90-second clock requires internalized pattern recognition. The American Cryptogram Association's formal rating system from Level 1 (very easy) to Level 5 (expert) is calibrated against hundreds of test solves to ensure consistent perceived difficulty across different message styles and source types.
