Playing with Numbers

Statistics help only when we can (a) identify the phenomenon of interest, (b) measure it on a sensible scale, and (c) interpret and communicate the result clearly. When any link breaks—ambiguous concepts, shaky proxies, or sloppy presentation—numbers create more heat than light.

Many institutions now publish annual rankings on corruption, democracy, and press freedom. These lists are catnip for media: they feel “objective” and are easy to headline. Yet the messy parts—construct definitions, data collection, modeling choices, and caveats—often vanish in translation. What survives are numbers repurposed to fit whatever story needs telling.

How rankings go wrong

  1. Vague concepts, improvised scales.
    Quantifying fuzzy constructs requires proxies; every proxy is a theory.
    Example: Suppose “democracy” is proxied by elite turnover. Country A keeps incumbents (low turnover) but has competitive elections; Country B churns leaders through palace coups (high turnover). A turnover-based index could rank B “more democratic,” contradicting the construct we meant to capture.
  2. Relative ≠ absolute.
    Ranks only order units; they don’t say how good anyone is in level terms.
    Example: A press-freedom index scores Country X at 70/100 in 2022 and 72/100 in 2025 (a real improvement), but its rank falls from #40 → #60 because many peers improved more. Interpreting the rank drop as decline confuses relative position with absolute level.
  3. Composite scales hide weighting choices.
    Indices blend indicators; weights encode value judgments, often implicitly.
    Example: Two countries tie on legal protections but differ on violence against reporters (V) and ownership concentration (O). If the index weights V:O = 3:1, Country C (V=1, O=3) beats D (V=2, O=1). Flip the weights to 1:3 and D beats C. Headlines change, reality doesn’t. (A toy version of this flip is sketched in code after the list.)
  4. Sampling that isn’t generalizable.
    Expert surveys can be informative yet unrepresentative and uneven across places.
    Example: An index polls 25 urban experts per country. In Country E, most media is rural; intimidation there goes unseen by the panel. E looks “fine” on the index, not because it is, but because the sampling frame missed the problem.
  5. Missing data and under-reporting.
    Sparse data can masquerade as good outcomes unless exposure and reporting are modeled.
    Example: Country F records 1 assault on journalists; Country G records 0. If F has 50,000 journalists and G has 500, the per-journalist rate is 0.002% vs. 0%—but if G’s incident reporting is weak, “0” may reflect silence, not safety. Treating “no report” as “no problem” biases ranks.
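
To make items 3 and 5 concrete, here is a minimal Python sketch using only the made-up figures from the examples above; the country labels, weights, and counts are illustrative, not values from any real index.

```python
# Toy composite index: the same indicator scores, ranked under two weighting schemes.
# V = violence against reporters, O = ownership concentration (lower is better).
scores = {"C": {"V": 1, "O": 3}, "D": {"V": 2, "O": 1}}

def composite(country, w_v, w_o):
    """Weighted 'badness': a lower composite means a better rank."""
    return w_v * scores[country]["V"] + w_o * scores[country]["O"]

for w_v, w_o in [(3, 1), (1, 3)]:
    best = min(scores, key=lambda c: composite(c, w_v, w_o))
    print(f"weights V:O = {w_v}:{w_o} -> "
          f"C={composite('C', w_v, w_o)}, D={composite('D', w_v, w_o)}, best: {best}")
# weights V:O = 3:1 -> C=6, D=7, best: C
# weights V:O = 1:3 -> C=10, D=5, best: D

# Counts only mean something relative to a denominator, and a zero may be
# silence rather than safety (item 5's Country F vs. Country G).
incidents = {"F": 1, "G": 0}
journalists = {"F": 50_000, "G": 500}
for c in incidents:
    print(f"{c}: {incidents[c]} reported, rate = {incidents[c] / journalists[c]:.3%}")
# F: 1 reported, rate = 0.002%
# G: 0 reported, rate = 0.000%  (indistinguishable from "nothing gets reported in G")
```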

A recurring case: press-freedom lists

Press-freedom rankings mix inputs (laws, harassment, killings, ownership, censorship) into a score and then a rank. Useful—but easy to misread; toy versions of the patterns below are sketched in code after the list.

  • Rank drift as danger.
    Example: Country H’s score rises 68 → 70, rank falls #55 → #62. Headlines say “press freedom plummets.” The level improved; others just improved more.
  • Counts without denominators.
    Example: Country I logs 30 harassment cases with 100k journalists (0.03%). Country J logs 4 cases with 1k journalists (0.4%). Raw counts make I look worse; rates reverse the ordering.
  • Exposure mis-specification.
    Example: One internet-blocking incident in a country where 90% are online exposes millions; the same in a country with 5% online affects far fewer. Scoring both events identically ignores affected share.
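
A similarly hypothetical sketch of the first and third bullets: a score that rises while the rank falls because the invented peer scores rise faster, and one identical incident weighted by the share of people who are actually online (the population figure is also invented). The middle bullet is the same rate arithmetic as the Country F/G snippet above.

```python
# Rank drift: Country H's score improves, but every listed peer improves more,
# so H's position in the ordering falls anyway (peer scores are invented).
scores_2022 = {"H": 68, "P1": 66, "P2": 67, "P3": 65}
scores_2025 = {"H": 70, "P1": 74, "P2": 73, "P3": 72}

def rank(country, scores):
    """Rank 1 = highest score."""
    return sorted(scores, key=scores.get, reverse=True).index(country) + 1

print(f"H: score 68 -> 70, rank #{rank('H', scores_2022)} -> #{rank('H', scores_2025)}")
# H: score 68 -> 70, rank #1 -> #4

# Exposure: the same single blocking incident reaches very different numbers of
# people depending on how many are online (population is hypothetical).
population = 50_000_000
for label, online_share in [("90% online", 0.90), ("5% online", 0.05)]:
    print(f"{label}: one incident exposes ~{population * online_share:,.0f} people")
# 90% online: one incident exposes ~45,000,000 people
# 5% online: one incident exposes ~2,500,000 people
```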

These are design problems, not arguments against measurement.
