What is the index of coincidence used for?

The index of coincidence is the probability that two letters picked at random from a text are the same. English plaintext and monoalphabetic ciphertext have an IC near 0.066, while polyalphabetic ciphertext flattens toward the uniform-random value of about 0.038 (1/26), which helps distinguish cipher families.

How does Kasiski examination break Vigenère?

Repeated plaintext fragments encrypted with the same key alignment produce repeated ciphertext. The distances between repeats are multiples of the keyword length, so their common factors reveal likely lengths, after which each column can be solved as a Caesar cipher.

Why does frequency analysis work on substitution ciphers?

Substitution preserves language statistics: common English letters like E and T remain common, only relabeled. Comparing ciphertext frequencies to English reveals likely letter mappings.

Laboratory 02: Cryptanalysis (Interactive Laboratory)

Cryptanalysis Laboratory

Learn how real cryptanalysts break classical ciphers using frequency analysis, pattern recognition, and statistical attacks. Every tool on this page runs in your browser so you can experiment safely with ciphertext samples.


Tools in this lab

IC (coincidence test); Kasiski (key length); N-grams (pattern match); ID (cipher guess)

Prerequisite intuition

Classical ciphers hide meaning but rarely hide structure. When ciphertext still behaves statistically like language, an attacker gains leverage long before guessing the key by hand.

Frequency Analysis Tool

Paste ciphertext and watch letter frequencies emerge. The animated histogram compares your text against standard English, highlights likely substitutions, and lists the top ten symbols by count.


Tall recurring bars are your first attack surface. In a monoalphabetic cipher, the rank order of ciphertext letter frequencies should roughly mirror the English ETAOIN order, even though every letter has been renamed.

[Chart: letter distribution, ciphertext vs English reference, with highlighted peaks]

How to read this

Arab scholars in the ninth century noticed that Arabic letters are unevenly distributed in normal writing. When European cryptanalysts applied the same counting discipline to Latin alphabets, substitution ciphers lost their main defense: obscurity. Frequency analysis does not magically reveal the key. It narrows the search. You compare ranks, test digraphs like TH and HE, and refine mappings until words appear. For a deeper Caesar-focused walkthrough, see the Frequency Analysis Lab.

Index of Coincidence

The index of coincidence (IC) measures how often letters repeat within a text. It is one of the fastest ways to distinguish monoalphabetic ciphertext from polyalphabetic ciphertext.


Formula

IC = [Σ n(n−1)] / [N(N−1)]

For each letter, count how many times it appears (n), sum n(n−1) across the alphabet, and divide by N(N−1) where N is total letters.
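The formula translates almost line for line into code. The following is a minimal sketch, not the implementation behind this page; the function name index_of_coincidence is hypothetical, and the sketch assumes that only the letters A to Z count toward N.

```python
from collections import Counter


def index_of_coincidence(text: str) -> float:
    """Return sum(n * (n - 1)) / (N * (N - 1)) over the letters A-Z."""
    # Assumption: anything that is not an ASCII letter is ignored.
    letters = [c for c in text.upper() if "A" <= c <= "Z"]
    total = len(letters)
    if total < 2:
        return 0.0  # the ratio is undefined for fewer than two letters
    counts = Counter(letters)
    return sum(n * (n - 1) for n in counts.values()) / (total * (total - 1))


if __name__ == "__main__":
    sample = "DEFEND THE EAST WALL OF THE CASTLE"
    print(f"{index_of_coincidence(sample):.4f}")
```

A short sample like the one above can land far from 0.066, which is why longer ciphertext gives a more trustworthy reading.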

Result


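~0.066: likely monoalphabetic (Caesar, substitution)
~0.038 to 0.050: likely polyalphabetic (Vigenère); longer keys push IC toward the uniform-random value 1/26 ≈ 0.0385
well below 0.038: usually a sign that the sample is too short for a stable estimate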

William Friedman formalized IC for American codebreaking workflows in the 1920s. Cryptanalysts still sweep candidate Vigenère keyword lengths and compute IC on each column: when the length is correct, every column is a Caesar-shifted sample of English, so its IC rises toward 0.066.

Kasiski Examination

Repeated plaintext under the same keyword alignment produces repeated ciphertext fragments. Measuring distances between repeats exposes likely keyword lengths for Vigenère-style ciphers.


Historical note

In 1863 Friedrich Kasiski published a method for attacking repeating-key ciphers by cataloguing repeated ciphertext sequences and factoring the spacings between them. Charles Babbage had discovered the same idea earlier but never published it. Once keyword length is known, split the ciphertext into columns; each column is a Caesar cipher solvable by frequency analysis. Combine this tool with the Vigenère cracker guide for a full attack workflow.
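The cataloguing and factoring steps can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the code behind this tool: the names kasiski_spacings and factor_votes are hypothetical, fragments are fixed at three letters by default, and candidate key lengths are capped at 20.

```python
from collections import Counter, defaultdict


def kasiski_spacings(ciphertext: str, size: int = 3) -> list[int]:
    """Distances between successive occurrences of each repeated fragment."""
    letters = "".join(c for c in ciphertext.upper() if "A" <= c <= "Z")
    positions = defaultdict(list)
    for i in range(len(letters) - size + 1):
        positions[letters[i:i + size]].append(i)
    spacings = []
    for starts in positions.values():
        # A fragment seen only once contributes no spacing.
        spacings.extend(b - a for a, b in zip(starts, starts[1:]))
    return spacings


def factor_votes(spacings: list[int], max_len: int = 20) -> list[tuple[int, int]]:
    """Count how many spacings each candidate key length divides."""
    votes = Counter()
    for gap in spacings:
        for k in range(2, max_len + 1):
            if gap % k == 0:
                votes[k] += 1
    return votes.most_common()


if __name__ == "__main__":
    toy = "ABCQWERTYABC"  # toy string: ABC repeats nine positions apart
    print(kasiski_spacings(toy))
    print(factor_votes(kasiski_spacings(toy)))
```

Small factors such as 2 and 3 divide many spacings by chance, so treat the vote count as a ranking of candidates to test rather than a verdict.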

N-Gram Analysis

Bigrams and trigrams expose the skeleton of a language that single-letter counts miss. Compare the ranking of ciphertext n-grams with the most common English fragments.


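English favors TH, HE, IN at the digraph level and THE, ING, AND at the trigram level. When ciphertext bigram counts look nothing like English rankings, suspect polyalphabetic encryption or transposition; transposition scrambles adjacent pairs but leaves single-letter frequencies English-like, which helps tell the two apart. When a handful of bigrams and trigrams still dominate the counts, only under unfamiliar letters, monoalphabetic substitution is likely.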
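Counting n-grams is a sliding-window tally. The sketch below is one way to do it, with a hypothetical function name top_ngrams; it assumes spaces and punctuation are stripped first, so windows can span word boundaries.

```python
from collections import Counter


def top_ngrams(text: str, n: int, limit: int = 5) -> list[tuple[str, int]]:
    """Return the most frequent overlapping n-letter fragments."""
    # Assumption: non-letters are dropped, so fragments may cross word breaks.
    letters = "".join(c for c in text.upper() if "A" <= c <= "Z")
    grams = Counter(letters[i:i + n] for i in range(len(letters) - n + 1))
    return grams.most_common(limit)


if __name__ == "__main__":
    print(top_ngrams("THE THEME", 2, 2))
    print(top_ngrams("THE THEME", 3, 1))
```

Run it once with n=2 and once with n=3, then line the results up against TH, HE, IN and THE, ING, AND.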

Cipher Identification Assistant

Experimental educational classifier. It combines IC, Caesar shift scoring, and rail-fence trials to suggest which classical family fits your ciphertext.


Treat confidence scores as teaching aids. Short messages, jargon, or mixed languages can fool heuristics. Always verify with domain knowledge and additional tests from the cipher identification guide.
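The classifier's internals are not shown on this page. As an illustration of what a rail-fence trial could look like, the sketch below undoes the zigzag for each candidate rail count and leaves the judging to you or to an n-gram score. All three function names are hypothetical, and the sketch assumes the common convention of writing down the rails from top to bottom with no offset.

```python
def rail_pattern(length: int, rails: int) -> list[int]:
    """Rail index visited by each plaintext position in the zigzag."""
    if rails < 2:
        return [0] * length
    pattern, row, step = [], 0, 1
    for _ in range(length):
        pattern.append(row)
        if row == 0:
            step = 1            # bounce off the top rail
        elif row == rails - 1:
            step = -1           # bounce off the bottom rail
        row += step
    return pattern


def rail_fence_decrypt(ciphertext: str, rails: int) -> str:
    """Undo a rail-fence transposition that was read off top rail first."""
    pattern = rail_pattern(len(ciphertext), rails)
    # Ciphertext is read rail by rail, so sort plaintext positions by rail.
    # sorted() is stable, which keeps positions in order within each rail.
    order = sorted(range(len(ciphertext)), key=lambda i: pattern[i])
    plain = [""] * len(ciphertext)
    for ch, pos in zip(ciphertext, order):
        plain[pos] = ch
    return "".join(plain)


def rail_fence_trials(ciphertext: str, max_rails: int = 6) -> dict[int, str]:
    """Candidate plaintexts for 2..max_rails rails."""
    return {r: rail_fence_decrypt(ciphertext, r) for r in range(2, max_rails + 1)}


if __name__ == "__main__":
    for rails, guess in rail_fence_trials("WECRLTEERDSOEEFEAOCAIVDEN").items():
        print(rails, guess)
```

A rail-fence trial is worth running when IC is close to 0.066 but the bigram rankings look wrong, since transposition keeps letter counts and breaks adjacency.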

Why Classical Ciphers Fail

A practical cryptanalysis curriculum in plain language — the statistical story behind every tool above.

1. Collect ciphertext. Longer samples stabilize frequencies and IC.

2. Measure statistics. IC, histograms, n-grams, repeats.

3. Hypothesize cipher. Monoalphabetic, polyalphabetic, or transposition.

4. Recover key / plaintext. Brute-force Caesar, map substitution, split Vigenère columns.
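The Caesar part of the recovery step is small enough to automate completely: try all 26 shifts and keep the one whose letter counts fit English best. The sketch below scores candidates with a chi-squared statistic. The ENGLISH table holds approximate, commonly cited percentages and is an assumption, as are the function names; this is not the scoring used by the tools on this page.

```python
import string

# Approximate English letter frequencies in percent, A to Z (assumption).
ENGLISH = dict(zip(string.ascii_uppercase, [
    8.2, 1.5, 2.8, 4.3, 12.7, 2.2, 2.0, 6.1, 7.0, 0.15, 0.8, 4.0, 2.4,
    6.7, 7.5, 1.9, 0.10, 6.0, 6.3, 9.1, 2.8, 1.0, 2.4, 0.15, 2.0, 0.07,
]))


def shift_text(text: str, shift: int) -> str:
    """Move every letter back by `shift` places; other characters pass through."""
    out = []
    for c in text.upper():
        if "A" <= c <= "Z":
            out.append(chr((ord(c) - 65 - shift) % 26 + 65))
        else:
            out.append(c)
    return "".join(out)


def chi_squared(text: str) -> float:
    """Lower means the letter counts look more like English."""
    letters = [c for c in text if "A" <= c <= "Z"]
    total = len(letters)
    if total == 0:
        return float("inf")
    score = 0.0
    for letter, pct in ENGLISH.items():
        expected = total * pct / 100
        score += (letters.count(letter) - expected) ** 2 / expected
    return score


def brute_force_caesar(ciphertext: str) -> tuple[float, int, str]:
    """Return (score, shift, candidate plaintext) for the best of 26 shifts."""
    candidates = []
    for shift in range(26):
        guess = shift_text(ciphertext, shift)
        candidates.append((chi_squared(guess), shift, guess))
    return min(candidates)


if __name__ == "__main__":
    secret = shift_text("MEET ME AT THE STATION AFTER THE EVENING TRAIN ARRIVES", -3)
    print(brute_force_caesar(secret))
```

On very short messages the best score can belong to the wrong shift, so print the top few candidates when a result reads like nonsense.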

Frequency analysis in depth

Human languages are redundant. English uses E for roughly twelve percent of its letters and Z for far less than one percent. Any cipher that applies one fixed substitution alphabet across the entire message preserves those proportions. Cryptanalysts therefore start with histograms, not hunches. They ask: which ciphertext symbol appears about as often as E should? Which pairs repeat like LL or EE? Once a few anchors land, partial words constrain the rest of the mapping.
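Those first questions, which symbol plays the role of E and which doubled letters appear, can be answered with two small tallies. The sketch below is illustrative only. The function names are hypothetical, and ENGLISH_ORDER is an approximate rank order that varies between corpora, so the mapping it proposes is a starting guess that will usually be partly wrong.

```python
from collections import Counter

# Approximate English rank order, most common first (assumption).
ENGLISH_ORDER = "ETAOINSHRDLU"


def only_letters(text: str) -> list[str]:
    return [c for c in text.upper() if "A" <= c <= "Z"]


def letter_frequencies(text: str) -> list[tuple[str, int, float]]:
    """(letter, count, percent) sorted from most to least common."""
    letters = only_letters(text)
    counts = Counter(letters)
    return [(l, n, 100 * n / len(letters)) for l, n in counts.most_common()]


def doubled_letters(text: str) -> Counter:
    """Tally pairs such as LL or EE (spaces are ignored, so pairs may span words)."""
    letters = only_letters(text)
    return Counter(a + b for a, b in zip(letters, letters[1:]) if a == b)


def first_guess_mapping(text: str) -> dict[str, str]:
    """Pair ciphertext letters with English letters purely by rank."""
    ranked = [l for l, _, _ in letter_frequencies(text)]
    return dict(zip(ranked, ENGLISH_ORDER))


if __name__ == "__main__":
    sample = "XLI UYMGO FVSAR JSB"
    print(letter_frequencies(sample)[:3])
    print(first_guess_mapping(sample))
```

Use the rank-based mapping only to seed the search, then correct it as partial words start to appear.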

The technique scales from classroom Caesar exercises to historical battlefield traffic, provided enough ciphertext exists. It fails when the cipher changes substitution too quickly (Enigma), when the plaintext is not natural language, or when the sample is too short for stable counts.

Kasiski and Friedman tests

Vigenère was once called “le chiffre indéchiffrable” because a repeating keyword applies different Caesar shifts to different positions. Single-letter frequencies look much flatter, discouraging monoalphabetic attacks. Kasiski examination recovers structure by hunting repeats. If THE appears twice in plaintext and both occurrences line up with the same key phase, the ciphertext shows matching fragments separated by a multiple of the keyword length.
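A toy encryption makes the key-phase effect visible. In the sketch below, vigenere_encrypt is a hypothetical helper that assumes an A to Z key and drops non-letters; the padding with X is only there to control the distance between the two copies of THE.

```python
def vigenere_encrypt(plaintext: str, key: str) -> str:
    """Shift each letter by the matching key letter (A=0 ... Z=25)."""
    out = []
    i = 0  # counts letters only, so spaces do not advance the key
    for c in plaintext.upper():
        if "A" <= c <= "Z":
            shift = ord(key[i % len(key)].upper()) - 65
            out.append(chr((ord(c) - 65 + shift) % 26 + 65))
            i += 1
    return "".join(out)


if __name__ == "__main__":
    # Second THE starts 6 letters after the first: a multiple of len("KEY").
    print(vigenere_encrypt("THEXXXTHE", "KEY"))
    # Second THE starts 5 letters after the first: the key phase differs.
    print(vigenere_encrypt("THEXXTHE", "KEY"))
```

In the first line both copies of THE encrypt to the same three letters; in the second they do not, which is why only some plaintext repeats survive into the ciphertext.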

Friedman’s index of coincidence automates the same intuition: split the text into columns for each guessed length and measure IC per column. Wrong lengths leave the columns looking close to random; the correct length, or a multiple of it, pushes every column toward the English value of about 0.066. Together, Kasiski and IC reduce a daunting keyword search to a manageable column-solving problem.
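The column sweep can be sketched as follows. The function names are hypothetical and the upper bound of ten on the key length is an arbitrary assumption; a candidate length stands out when its average column IC is clearly higher than its neighbors.

```python
from collections import Counter


def index_of_coincidence(letters: str) -> float:
    total = len(letters)
    if total < 2:
        return 0.0
    counts = Counter(letters)
    return sum(n * (n - 1) for n in counts.values()) / (total * (total - 1))


def average_column_ic(ciphertext: str, key_length: int) -> float:
    """Split into key_length columns and average their IC values."""
    letters = "".join(c for c in ciphertext.upper() if "A" <= c <= "Z")
    # Column i holds every key_length-th letter starting at position i.
    columns = [letters[i::key_length] for i in range(key_length)]
    return sum(index_of_coincidence(col) for col in columns) / key_length


def sweep_key_lengths(ciphertext: str, max_length: int = 10) -> list[tuple[int, float]]:
    """(candidate length, average column IC) for lengths 1..max_length."""
    return [(k, average_column_ic(ciphertext, k)) for k in range(1, max_length + 1)]


if __name__ == "__main__":
    for length, ic in sweep_key_lengths("ABABABABABAB", 4):
        print(length, round(ic, 3))
```

Multiples of the true length also score high, so prefer the smallest length that spikes and cross-check it against the Kasiski spacings.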

Statistical cryptanalysis mindset

Classical cryptanalysis is hypothesis testing under uncertainty. You rarely prove a cipher type with one metric. You accumulate evidence: IC suggests polyalphabetic, Kasiski suggests length five, column three has a peak matching T, a crib word like AND appears under that mapping. Each step eliminates inconsistent stories.

Modern cryptography deliberately destroys these trails through diffusion and confusion — every output bit depends on many input bits, and local statistics vanish. Studying classical breaks is therefore not nostalgia. It teaches what “no statistical leakage” actually means and why AES designers obsess over avalanche effects.

From lab to practice

Use this page alongside hands-on puzzles on Cipher Challenges, the focused Frequency Analysis Lab, and the main Cipher Portal for encryption checks. When you can explain why a histogram breaks Caesar but not Enigma, you understand the security upgrade rotor machines attempted — and why operator mistakes, cribs, and mechanized search still defeated them in wartime conditions.

Responsible learning means practicing on educational samples or your own exercises. DecodeCipher teaches historical techniques to clarify cryptography engineering, not to attack real private communications.

Frequently Asked Questions

What IC value indicates monoalphabetic text?

English plaintext typically yields IC ≈ 0.066. Monoalphabetic ciphertext preserves that value. Samples under thirty letters fluctuate; prefer longer texts when possible.

Can Kasiski find the Vigenère keyword itself?

No — it estimates length. After splitting columns, use frequency analysis per column to recover each key letter.
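Once a length is chosen, each column yields one key letter. The sketch below picks, for every column, the shift whose decrypted letters carry the most English frequency weight, a simple correlation score. The ENGLISH table is an approximate, commonly cited set of percentages and is an assumption, as is the function name recover_key; columns with only a handful of letters will often be guessed wrong.

```python
import string

# Approximate English letter frequencies in percent, A to Z (assumption).
ENGLISH = dict(zip(string.ascii_uppercase, [
    8.2, 1.5, 2.8, 4.3, 12.7, 2.2, 2.0, 6.1, 7.0, 0.15, 0.8, 4.0, 2.4,
    6.7, 7.5, 1.9, 0.10, 6.0, 6.3, 9.1, 2.8, 1.0, 2.4, 0.15, 2.0, 0.07,
]))


def recover_key(ciphertext: str, key_length: int) -> str:
    """Guess one key letter per column by maximizing English frequency weight."""
    values = [ord(c) - 65 for c in ciphertext.upper() if "A" <= c <= "Z"]
    key = []
    for i in range(key_length):
        column = values[i::key_length]

        def fit(shift: int) -> float:
            # Sum the English frequency of each letter after undoing `shift`.
            return sum(ENGLISH[chr((v - shift) % 26 + 65)] for v in column)

        key.append(chr(max(range(26), key=fit) + 65))
    return "".join(key)


if __name__ == "__main__":
    # Toy case: plaintext EEEEEE under key AB encrypts to EFEFEF.
    print(recover_key("EFEFEF", 2))
```

If the recovered keyword is almost a word, fix the odd letters by hand and decrypt again; a chi-squared score per column is a common, more robust alternative to this correlation score.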

How is this different from the Frequency Analysis Lab?

The Frequency Lab focuses on monoalphabetic leakage with Caesar brute force. This Cryptanalysis Lab adds IC, Kasiski, n-grams, and cipher identification for a broader attack toolkit.

Does analysis run on a server?

All tools on this page execute locally in your browser. Nothing you paste is sent for analysis.


Educational cryptanalysis only. Tools run client-side; no ciphertext is stored.