Error-Free Multi-dimensional Thinking
Recent measurement work in American public opinion often reports strikingly strong low-dimensional structure in survey responses. In ideal-point and related scaling frameworks, a single latent dimension can correctly classify a large share of binary issue responses, with only modest improvements when a second (orthogonal) dimension is added. For example, Jessee reports that a one-dimensional ideal-point model achieves an overall correct classification rate of 79.0% for respondents’ roll-call-style issue items, with a two-dimensional model increasing that rate only to 82.3% (see Jessee 2004). Tausanovitch and Warshaw similarly note that scaling their “super survey” with a one-dimensional model correctly classifies 78.8% of responses, and that moving to two dimensions increases the rate to 80.2%—an improvement of only 1.4 percentage points (see Tausanovitch and Warshaw 2012; see also Sood and Iyengar 2014).
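To make the metric concrete, the sketch below simulates binary issue responses from a two-dimensional model and computes percent correctly predicted using one versus two latent dimensions. Everything here is simulated and hypothetical, and respondent positions are treated as known covariates for simplicity; real ideal-point models estimate them jointly with the item parameters.

```python
# A minimal sketch of the "percent correctly predicted" (PCP) comparison.
# Simulated data and hypothetical parameters only; respondent positions are
# treated as known covariates, whereas real ideal-point models estimate them
# jointly with the item parameters.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_resp, n_items = 2000, 40

# True preferences live in two weakly correlated dimensions
# (say, economic and cultural).
theta = rng.multivariate_normal([0, 0], [[1, 0.3], [0.3, 1]], size=n_resp)

# Item discriminations: most items load mainly on dimension 1.
loadings = np.column_stack([rng.normal(1.0, 0.3, n_items),
                            rng.normal(0.3, 0.3, n_items)])
cutpoints = rng.normal(0, 1, n_items)

logits = theta @ loadings.T - cutpoints          # n_resp x n_items
Y = (rng.random((n_resp, n_items)) < 1 / (1 + np.exp(-logits))).astype(int)

def pcp(X):
    """Fit a per-item logistic model on X and return the overall PCP."""
    hits = 0
    for j in range(n_items):
        m = LogisticRegression().fit(X, Y[:, j])
        hits += (m.predict(X) == Y[:, j]).sum()
    return hits / Y.size

print("1D PCP:", round(pcp(theta[:, :1]), 3))
print("2D PCP:", round(pcp(theta), 3))
```

In setups like this, the second dimension typically buys only a few percentage points of overall accuracy, which is the shape of the published comparisons above.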
This appears to conflict with a long-standing view in American politics that points in almost the opposite direction: most people’s policy preferences are weakly structured. In fact, on many issues, many people may not have real preferences at all (“non-preferences”). The evidence most often cited for this view comes from Converse: correlations between preferences measured two years apart are modest (r ≈ .4–.5), while within-wave, cross-issue correlations are lower still (r ≈ .2).
What explains this double disagreement—about the authenticity of preferences and about the structure (or “constraint”) of preferences?
Evidence For Authenticity of Preferences
Instability in repeated survey responses is compatible with at least three mechanisms: genuine attitude change, lack of a well-formed preference, and measurement error. The empirical challenge is that ordinary panel correlations bundle all three.
Short-interval designs were an early attempt to reduce the “true change” component by shrinking the time between measurements. Brown’s experimental work, for instance, reports substantial test–retest reliability over spans of weeks, with reliability coefficients often in the .6–.9 range (and sometimes higher) across conditions and intervals (see Brown 1970). This kind of evidence pushes against the strongest “pure noise” reading: many respondents can reproduce consistent directional judgments over short horizons.
But short intervals introduce a different inferential worry: stability may partly reflect memory, consistency motives, or panel conditioning, with respondents recalling what they said last time or inferring what they “should” say now. So short-interval stability is evidence that responses are not random, but it does not by itself establish that respondents are expressing deeply held, well-thought-out preferences.
This ambiguity suggests that short-interval designs are informative but not decisive unless they are paired with additional diagnostics that can separate memory from genuine stability. A straightforward approach is to introduce small design perturbations at retest, e.g., rewording items, reversing scale direction, or changing item context, so that simple recall is less helpful while a substantive preference should still map onto the same side or ordering. Asking respondents how certain they are of their answers may be another way around the puzzle.
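A toy simulation makes the reverse-coding logic concrete: respondents answering from a substantive preference should land on the same side of the scale after the reversed retest is recoded, while respondents who merely repeat their previous scale point should not. Everything below (the two respondent types, the noise levels) is made up for illustration.

```python
# A toy illustration (entirely simulated) of the reverse-coding diagnostic:
# if retest stability only reflected recall of the literal scale point,
# reversing the response scale at retest should destroy it.
import numpy as np

rng = np.random.default_rng(1)
n = 1000
attitude = rng.normal(0, 1, n)                       # latent preference

def respond(att, reverse=False):
    """Map a latent attitude onto a noisy 1-7 scale, optionally reversed."""
    raw = np.clip(np.round(4 + 1.5 * att + rng.normal(0, 0.7, att.size)), 1, 7)
    return 8 - raw if reverse else raw

wave1 = respond(attitude)                            # normal scale direction

# Two hypothetical respondent types at the reversed-scale retest:
substantive = respond(attitude, reverse=True)        # answers from the attitude
recall_only = wave1                                  # repeats the literal wave-1
                                                     # scale point despite reversal

def directional_agreement(w1, w2_reversed):
    """Share on the same side of the midpoint after un-reversing wave 2."""
    return np.mean(np.sign(w1 - 4) == np.sign((8 - w2_reversed) - 4))

print("substantive responders:", directional_agreement(wave1, substantive))
print("recall-only responders:", directional_agreement(wave1, recall_only))
```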
Over-time measurement-error models (per item) offer a different route: treat observed responses as noisy indicators of a latent attitude and attempt to recover the latent stability (see Achen 1975). This strategy can be informative, but it comes with assumptions about the structure of errors over time.
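As a heavily simplified illustration of that logic, here is the classic three-wave decomposition (the algebra behind Heise-style corrections, which is close in spirit to what such models rely on): assuming constant reliability, uncorrelated errors, and no direct wave-1-to-wave-3 effect, reliability and true-attitude stability can be separated from the observed inter-wave correlations. The correlations below are made up.

```python
# A heavily simplified sketch of the three-wave logic behind over-time
# measurement-error corrections (Heise-style): assume constant reliability,
# uncorrelated errors, and no direct wave1 -> wave3 effect.
# The correlations below are made up for illustration.
r12, r23, r13 = 0.45, 0.47, 0.40     # observed inter-wave correlations

reliability = r12 * r23 / r13        # share of observed variance that is "signal"
stability_12 = r13 / r23             # true-attitude stability, wave 1 -> 2
stability_23 = r13 / r12             # true-attitude stability, wave 2 -> 3
stability_13 = r13 / reliability     # implied stability, wave 1 -> 3

print(f"reliability   = {reliability:.2f}")
print(f"stability 1-2 = {stability_12:.2f}, 2-3 = {stability_23:.2f}, "
      f"1-3 = {stability_13:.2f}")
```

With these made-up numbers, observed correlations of roughly .4–.5 are consistent with latent stability above .85 and reliability near .5, which is the flavor of the argument against reading raw panel correlations as evidence of non-attitudes. The price, again, is the assumptions about error structure.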
Relatedly, a recent line of argument (associated with Dean Lacy, subject to further validation) suggests that much apparent instability is driven by a small number of highly implausible transitions between waves, such as moving from one end of a scale to the other. (This conclusion is the reverse of Converse’s interpretation based on a Markov model. Converse argued that, aside from a small subset of consistent respondents, much of the remaining variation was essentially noise.)
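A back-of-the-envelope version of that diagnostic is easy to sketch: flag transitions larger than some threshold and ask what share of total observed movement they account for. The simulation below uses made-up data and a made-up 5% “jumper” group purely to illustrate the bookkeeping; the substantive claim would of course need real panel data.

```python
# A toy sketch of the "implausible transitions" diagnostic: how much of raw
# wave-to-wave instability comes from a small group making end-to-end jumps?
# Entirely simulated data.
import numpy as np

rng = np.random.default_rng(2)
n = 5000
wave1 = rng.integers(1, 8, n)                        # 7-point scale, wave 1

# Most respondents move at most one point; ~5% jump to the opposite extreme.
wave2 = np.clip(wave1 + rng.integers(-1, 2, n), 1, 7)
jumpers = rng.random(n) < 0.05
wave2[jumpers] = np.where(wave1[jumpers] >= 4, 1, 7)

change = np.abs(wave1 - wave2)
print("share who moved at all:          ", (change > 0).mean())
print("share with large (>=3 pt) jumps: ", (change >= 3).mean())
print("large jumps as share of all movement:",
      change[change >= 3].sum() / change.sum())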
Other evidence for “authentic” preferences comes from measurement-error models that assume an underlying latent trait (or traits) and pool responses across disparate policy items (see, for instance, Ansolabehere, Rodden, and Snyder 2006; also Tausanovitch and Warshaw 2012 – see above). But this approach builds in a major assumption: that an underlying trait exists and meaningfully organizes these preferences. It is perfectly reasonable to test whether preferences are correlated; it is more contentious to assume they are structured by an unobserved mental construct (see here).
Evidence For Constraint
The new scaling results are genuinely intriguing. One possibility is that the discrepancy between recent results and the conventional wisdom reflects an increase in the structure of preferences over time. However, other research suggests that constraint has not increased over time (see Baldassarri and Gelman 2008). Either way, two cautions are necessary before interpreting scaling results as a refutation of “weak constraint.”
First, modern scaling results use dichotomous items, and dichotomization can reduce some kinds of measurement error. (Of course, there are less blunt ways to reduce measurement error, such as using multiple items to measure preferences on a single policy domain.) The key lesson is that adjustments for measurement error and the measurement of constraint should be kept conceptually separate.
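A small simulation (with made-up parameters) illustrates the multi-item point: heavy item-level error attenuates a single-item cross-domain correlation down to Converse-like levels, while averaging a handful of items per domain recovers much of the underlying correlation, with no latent-trait model required.

```python
# A minimal simulation (made-up parameters) of why multi-item scales matter:
# noisy single items attenuate the observed correlation between two policy
# domains, while averaging several items per domain recovers much of it.
import numpy as np

rng = np.random.default_rng(3)
n, k = 5000, 6                                   # respondents, items per domain
true_r = 0.6                                     # latent cross-domain correlation
latent = rng.multivariate_normal([0, 0], [[1, true_r], [true_r, 1]], size=n)

noise_sd = 1.5                                   # heavy item-level error
econ_items = latent[:, [0]] + rng.normal(0, noise_sd, (n, k))
cult_items = latent[:, [1]] + rng.normal(0, noise_sd, (n, k))

single_item_r = np.corrcoef(econ_items[:, 0], cult_items[:, 0])[0, 1]
scale_r = np.corrcoef(econ_items.mean(axis=1), cult_items.mean(axis=1))[0, 1]

print(f"true latent correlation:  {true_r:.2f}")
print(f"single-item correlation:  {single_item_r:.2f}")
print(f"6-item scale correlation: {scale_r:.2f}")
```

With these parameters, the single-item correlation lands near .2 (Converse territory) while the six-item scales correlate at roughly .44, even though the latent domains correlate at .6.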
Second, dimensionality-reduction methods are sensitive to the pool of items. If most items concern economic issues (as in Tausanovitch and Warshaw 2013), the first principal component will naturally reflect that dimension. And because most predictive gains come from correctly predicting the majority of items, overall “percent correctly predicted” may be a poor diagnostic for whether a second dimension, say, cultural issues, matters.
Cross-validation across large, pre-specified item groups (large enough to overcome idiosyncratic error) would be a useful strategy. Moreover, overall predictive gains can miss meaningful subgroup heterogeneity: different groups may have different preference structures (for example, groups that are more socially conservative but economically liberal).
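As a sketch of what such a diagnostic might look like, the simulation below uses hypothetical data, a deliberately lopsided item pool, and a simple holdout split (standing in for full cross-validation). It scores respondents on the first latent dimension only, as a stand-in for a fitted one-dimensional ideal point, and reports percent correctly predicted overall and within each pre-specified item group.

```python
# A sketch of the per-group diagnostic (simulated data, hypothetical item
# groups): overall percent correctly predicted can look strong even when a
# one-dimensional summary barely beats the base rate on the minority group.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
n, n_econ, n_cult = 3000, 32, 8               # lopsided item pool
theta = rng.normal(0, 1, (n, 2))              # two independent latent dimensions

def simulate(dim, n_items):
    """Binary responses driven by a single latent dimension."""
    cut = rng.normal(0, 1, n_items)
    logits = 2.0 * theta[:, [dim]] - cut
    return (rng.random((n, n_items)) < 1 / (1 + np.exp(-logits))).astype(int)

Y = np.hstack([simulate(0, n_econ), simulate(1, n_cult)])
is_cult = np.arange(n_econ + n_cult) >= n_econ

train, test = train_test_split(np.arange(n), test_size=0.5, random_state=0)
x = theta[:, [0]]                             # one-dimensional summary only

def group_pcp(cols):
    """Held-out percent correctly predicted for a pre-specified item group."""
    hits, total = 0, 0
    for j in np.where(cols)[0]:
        m = LogisticRegression().fit(x[train], Y[train, j])
        hits += (m.predict(x[test]) == Y[test, j]).sum()
        total += len(test)
    return hits / total

print("overall PCP :", round(group_pcp(np.ones_like(is_cult, dtype=bool)), 3))
print("economic PCP:", round(group_pcp(~is_cult), 3))
print("cultural PCP:", round(group_pcp(is_cult), 3))
```

Because economic items dominate the pool, the overall number tracks the economic number; the cultural group is where a second dimension would have to show its value.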
Interpretation of Constraint
Even if we grant that responses are not pure noise and that low-dimensional summaries predict well, what is generating that structure? Is it a belief system or an alignment structure?
- Belief-system: People organize politics around stable meta-principles, e.g., commitments to free markets, egalitarianism, moral traditionalism, zero-sum worldviews, or other relatively general value frameworks that yield consistent positions across many policies.
- Alignment structure: Politics supplies coordination devices, e.g., party labels, coalition cues, and elite messaging (reinforced by affective identities that encourage selective exposure, motivated recall, and motivated skepticism), that reliably sort people into bundles of positions, even if they do not internally represent those bundles as a single ideological map.
Low-dimensional prediction can emerge under either story. In the alignment story, a latent “dimension” is real and useful, but it is best interpreted as an aggregate alignment signal rather than a window into a unidimensional “policy ideology” inside each respondent. This interpretation is especially plausible in a world where elites work to bundle issues into a small number of conflict lines. Because a multi-dimensional issue space creates cross-cutting coalitions and unstable majorities, bundling helps parties and movements simplify choice, mobilize supporters, and make elections legible.