The Wrong Pairameter


In a trust game, the first mover has an endowment and chooses how much to send to the second mover. Whatever is sent gets multiplied. The second mover then decides what fraction of the multiplied amount to return. Sending more creates more total surplus. Returning more determines how that surplus gets split.

If you are evaluating an intervention in this setting, the welfare-relevant questions are about regimes. What happens to total surplus and its distribution when both sides are treated? When only the first mover is treated? When only the second mover is treated? Each regime is a deployable scenario. Maybe a policy can reach both sides of a market. Maybe it can only reach one. The surplus and distributional consequences differ across these cases, and the differences matter for deciding what to do.

The standard analysis does not recover these objects. It compares treated participants to control participants, conditioning only on a player's own treatment status. For a pair-level outcome, this averages over the partner's treatment status, mixing together regimes that have different welfare implications. A treated first mover paired with a treated second mover does well: they sent more, and the partner returned a higher fraction. A treated first mover paired with a control second mover does poorly: they sent more, but the partner returned the same low fraction as before. The naive comparison pools these together. The resulting estimand does not correspond to any regime someone would actually deploy.

The game, formally

Let the first mover's endowment be $E$, the amount sent be $s$, and the multiplier be $k$. The second mover returns a share $\rho$ of the multiplied amount. Payoffs:

$$\pi_1 = E - s + ks\rho, \qquad \pi_2 = ks(1 - \rho).$$

Total surplus is $E + (k-1)s$, which depends only on $s$. How the surplus gets divided depends only on $\rho$. The welfare consequences of an intervention decompose cleanly: any effect on surplus flows through how much the first mover sends, and any effect on distribution flows through what fraction the second mover returns. These are mechanically separate channels.
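The decomposition is easy to check numerically. A minimal sketch in plain Python (the function name is mine; the formulas are the payoffs above):

```python
def payoffs(E, k, s, rho):
    """Trust-game payoffs for one pair: the first mover sends s out of
    endowment E, the transfer is multiplied by k, and the second mover
    returns a share rho of the multiplied amount."""
    pi1 = E - s + k * s * rho   # first mover: what's kept plus what's returned
    pi2 = k * s * (1 - rho)     # second mover: the rest of the multiplied amount
    return pi1, pi2

# Total surplus pi1 + pi2 equals E + (k - 1) * s, up to float rounding:
# changing rho only moves payoff between players; changing s changes the total.
pi1, pi2 = payoffs(E=6, k=3, s=3.6, rho=0.33)
print(pi1 + pi2)
```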

Four ordered pair types arise from the treatment assignments: control–control, treated–control, control–treated, and treated–treated. Each corresponds to a different regime with potentially different surplus and distributional consequences.

What the naive estimator actually targets

Let $Y$ be a pair-level outcome and $Z_1$ the first mover's treatment status. The standard contrast is:

$$\Delta^{\text{naive}} = E[Y \mid Z_1 = 1] - E[Y \mid Z_1 = 0].$$

Since $Y$ depends on both players, conditioning only on $Z_1$ averages over the second mover's status. If treatment is assigned independently with probability $p$:

$$\Delta^{\text{naive}} = p(\mu_{11}^Y - \mu_{01}^Y) + (1 - p)(\mu_{10}^Y - \mu_{00}^Y),$$

where $\mu_{zw}^Y = E[Y(z,w)]$ is the expected outcome for ordered pair type $(z, w)$. This is a weighted mixture of two different contrasts, not any single dyadic comparison: not $\mu_{11}^Y - \mu_{00}^Y$, not $\mu_{10}^Y - \mu_{00}^Y$, not $\mu_{01}^Y - \mu_{00}^Y$. The weights are determined by the experimental randomization probability, which has no policy meaning. The naive estimator does not give a noisy answer to the right question. It gives a precise answer to the wrong one.
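To make the mixture concrete, here is the identity evaluated with the first-mover-payoff cell means from the theoretical table in the simulation section, taking $p = 0.5$:

```python
# Cell means mu[z, w] = E[Y(z, w)] for the first mover's payoff,
# taken from the theoretical cell-means table below.
mu = {(0, 0): 5.964, (0, 1): 6.720, (1, 0): 5.961, (1, 1): 6.780}
p = 0.5  # probability the partner is treated

naive = p * (mu[1, 1] - mu[0, 1]) + (1 - p) * (mu[1, 0] - mu[0, 0])
both_treated = mu[1, 1] - mu[0, 0]

print(round(naive, 4))         # 0.0285: the naive contrast is near zero
print(round(both_treated, 4))  # 0.816: the T-T regime contrast is much larger
```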

Simulation

To see how large the gap can be, I simulated a trust game with endowment 6 and a tripling multiplier, where the exclusion restriction holds perfectly: treatment affects only each player's own decision rule, never the partner's. Control first movers send 3.6. Treated first movers send 3.9. Control second movers return 33%. Treated second movers return 40%. Each study has 100 participants per role, with independent treatment assignment at probability 0.5 and small idiosyncratic noise. I ran 1,000 studies.
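A NumPy sketch of the naive analysis under this design (the noise standard deviations are my assumption; the text says only "small idiosyncratic noise"):

```python
import numpy as np

rng = np.random.default_rng(0)
E, k, p, n, n_studies = 6, 3, 0.5, 100, 1_000

naive_fm = []
for _ in range(n_studies):
    z = rng.random(n) < p   # first movers' own treatment
    w = rng.random(n) < p   # second movers' own treatment, independent
    # Exclusion restriction: treatment shifts own behavior only.
    s = np.where(z, 3.9, 3.6) + rng.normal(0, 0.1, n)       # amount sent
    rho = np.where(w, 0.40, 0.33) + rng.normal(0, 0.02, n)  # return share
    pi1 = E - s + k * s * rho   # first mover i is matched with second mover i
    # Naive contrast: condition only on the first mover's own treatment.
    naive_fm.append(pi1[z].mean() - pi1[~z].mean())

print(np.mean(naive_fm))  # near the theoretical 0.0285
```

Because each treated first mover's partner is treated only half the time, the own-treatment contrast averages the two regimes and lands near zero.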

Theoretical cell means

| Ordered pair | Send | Return share | FM payoff | SM payoff | Total surplus |
|---|---|---|---|---|---|
| C–C | 3.6 | 0.33 | 5.964 | 7.236 | 13.2 |
| C–T | 3.6 | 0.40 | 6.720 | 6.480 | 13.2 |
| T–C | 3.9 | 0.33 | 5.961 | 7.839 | 13.8 |
| T–T | 3.9 | 0.40 | 6.780 | 7.020 | 13.8 |

The regime-level welfare consequences are visible in the table. Total surplus rises only when the first mover is treated (13.8 vs. 13.2), regardless of the second mover's status. The second mover's treatment has no effect on surplus at all. It changes only the split: treating the second mover shifts 0.756 from the second mover to the first mover.

A policy that reaches only first movers creates more surplus but leaves the first mover's payoff essentially unchanged ($-0.003$). The extra surplus flows entirely to the second mover. A policy that reaches only second movers creates no new surplus but raises the first mover's payoff by 0.756 through redistribution. A policy that reaches both sides raises the first mover's payoff by 0.816: new surplus gets created and the first mover actually captures some of it.

These are the objects someone designing policy would want to compare.

Monte Carlo results

| Quantity | MC mean |
|---|---|
| Naive own-treatment difference in FM payoff | 0.027 |
| Synthetic pair: T–T minus C–C | 0.814 |
| Synthetic pair: T–C minus C–C | −0.003 |
| Synthetic pair: C–T minus C–C | 0.755 |

The naive estimator says treatment barely matters for the first mover. The regime-level cell means say it matters a great deal, but the gains come almost entirely through the second mover's channel. A researcher reporting only the naive estimate would miss both the size and the source of the welfare effect.

Two kinds of estimand

Role-specific effects ask whether treatment changes what each individual does, where $A$ is the first mover's amount sent and $B$ the second mover's return share, each a function of own treatment:

$$\tau_1 = E[A(1) - A(0)], \qquad \tau_2 = E[B(1) - B(0)].$$

These tell you whether behavior changed. They do not tell you about welfare. The standard design handles them well: randomize individually, collect decisions before matching, form pairs afterward.

Dyadic cell means ask what happens to the interaction under each regime:

$$\mu_{zw}^Y = E[Y(z,w)], \qquad z, w \in \{0,1\}.$$

These are the welfare objects. They tell you how much surplus each regime creates and how it gets distributed. They correspond to deployable scenarios. The standard design is fine for role-specific effects. It is misaligned with dyadic cell means, and researchers routinely talk as though they care about the latter while estimating the former.

Estimation

If matching is random within blocks and decisions are collected before matching, one can estimate expected cell means by synthetic pairing. Let $\mathcal{I}^{(1)}_{sz}$ and $\mathcal{I}^{(2)}_{sw}$ be the sets of first and second movers with treatment states $z$ and $w$ in session $s$, with sizes $n^{(1)}_{sz}$ and $n^{(2)}_{sw}$. The estimator is:

$$\widehat{\mu}^{\,\text{exp}}_{zw} = \frac{\sum_s \sum_{i \in \mathcal{I}^{(1)}_{sz}} \sum_{j \in \mathcal{I}^{(2)}_{sw}} g(A_i, B_j)}{\sum_s n^{(1)}_{sz}\, n^{(2)}_{sw}}.$$

Synthetic pairing reuses individuals across many pseudo-pairs, so treating each synthetic pair as independent would understate uncertainty. Randomization inference, participant-level bootstrap, or session-level bootstrap are sensible alternatives.
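A single-session sketch of the synthetic-pairing estimator (the function name is mine; `g` is the pair-level outcome function, here the first mover's payoff with $E = 6$, $k = 3$):

```python
import numpy as np

def synthetic_cell_mean(A, B, z, w, g, z_cell, w_cell):
    """Average g(A_i, B_j) over every cross pair of a first mover with
    treatment z_cell and a second mover with treatment w_cell."""
    A_cell = A[z == z_cell]   # first movers' decisions in this cell
    B_cell = B[w == w_cell]   # second movers' decisions in this cell
    return float(np.mean([g(a, b) for a in A_cell for b in B_cell]))

g_fm = lambda s, rho: 6 - s + 3 * s * rho   # first mover's payoff

A = np.array([3.5, 3.7, 3.8, 4.0])      # amounts sent
z = np.array([0, 0, 1, 1])              # first movers' treatments
B = np.array([0.30, 0.36, 0.38, 0.42])  # return shares
w = np.array([0, 0, 1, 1])              # second movers' treatments
print(synthetic_cell_mean(A, B, z, w, g_fm, 1, 1))  # T-T cell mean
```

The full estimator pools these cross-pair sums across sessions, dividing by the total number of synthetic pairs as in the formula above.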

Design implications

If the question is about individual behavior, individual-level randomization with post-decision matching is fine. If the question is about welfare, the study should be organized around regimes from the start: define dyadic cell means as primary estimands, preserve role labels and match structure, retain treatment status for both sides along with full response schedules, and analyze regimes directly rather than backing into them through own-treatment regressions.
