The Wrong Pairameter
Researchers often want to know whether an intervention improves how two people interact. In practice, that means looking at encounter-level metrics: trust, reciprocity, payoffs, total surplus. In a trust game, one might care about how much the first mover sends, how much the second mover returns, the payoff to each side, or the total value generated. In an ultimatum game, the relevant quantities are offer size, acceptance, and resulting payoffs. These are not all the same thing. Some are actions taken by individuals. Others are properties of the encounter itself.
That distinction matters because experiments are often designed and analyzed at the individual level even when the substantive claim is dyadic. A common design randomizes treatment person by person, collects decisions before matching, and forms pairs only afterward. This is well suited to estimating how treatment changes a person's own decision rule. But it is not aligned with a pair-level estimand. Once the question becomes "what happens when a treated first mover meets a treated second mover," the natural object is no longer an individual treatment effect. It is an ordered dyadic contrast.
A motivating example
Suppose the intervention is a short message telling participants that people on the other side are often more fair and cooperative than they assume. A researcher might hope this improves the interaction. But what would that mean?
In a trust game, it could mean at least four different things: first movers send more (trust), second movers return more (reciprocity), first movers end up with higher payoffs (trust rewarded), or the interaction creates more total surplus (efficiency). These need not move together. If the message makes first movers more trusting but leaves second movers unchanged, send amounts may rise while first-mover payoffs fall. If the message instead makes second movers more reciprocal, even untreated first movers may do better. If it changes both sides, trust and payoffs may rise together.
The real question is not whether the intervention worked. It is: worked on which metric, through which side, and for which kind of pair?
Two estimands that are easy to confuse
Let $Z_1 \in \{0,1\}$ and $Z_2 \in \{0,1\}$ denote the treatment status of the first and second mover, respectively.
Role-specific effects. Let $A$ be the first mover's action and $B$ the second mover's action or response rule. The role-specific effects are
$$ \tau_1 = E[A(1) - A(0)], \qquad \tau_2 = E[B(1) - B(0)]. $$
These ask whether treatment changes what each role does.
Dyadic cell means. Let $Y$ be an encounter-level outcome. The natural dyadic objects are
$$ \mu_{zw}^Y = E[Y(z,w)], \qquad z,w \in \{0,1\}, $$
giving four ordered pair types: $\mu_{00}^Y$ (control-control), $\mu_{10}^Y$ (treated first mover, control second mover), $\mu_{01}^Y$ (control first mover, treated second mover), and $\mu_{11}^Y$ (treated-treated).
These two families are not interchangeable. In asymmetric games, $T$-$C$ and $C$-$T$ represent different mechanisms. One changes initiation. The other changes response.
Researchers often talk as though they care about the second family while estimating the first.
A plausible research design
Consider a design someone might reasonably use. Treatment is randomized at the individual level. Participants make choices before being matched with a partner. Matching occurs only after all decisions are collected. Participants do not observe the partner's treatment status when choosing. In strategy-method settings, second movers specify contingent responses before seeing the first mover's realized move.
This design is fine if the target is role-specific. It isolates how treatment changes an individual's decision rule. Formally,
$$ A_i = a(Z_{1i}, U_i), \qquad B_j = b(Z_{2j}, V_j), $$
where $U_i$ and $V_j$ are idiosyncratic determinants. Neither choice rule depends on the partner's treatment status at the moment of decision. So for $\tau_1$ and $\tau_2$, the design is well aligned with the estimand.
The trouble starts when one begins making statements about what the intervention does to the interaction.
Estimating pair-level outcomes
Suppose someone wants to learn whether treatment improves a pair-level outcome $Y$ but estimates a simple own-treatment contrast for the first mover:
$$ \Delta^{\text{naive}} = E[Y \mid Z_1 = 1] - E[Y \mid Z_1 = 0]. $$
This looks reasonable until one notices that $Y$ depends on both players. Conditioning only on $Z_1$ averages over the second mover's treatment status. If the second mover is treated with probability $p$, independently of $Z_1$, then
$$ E[Y \mid Z_1 = 1] = p\,\mu_{11}^Y + (1 - p)\,\mu_{10}^Y, $$
$$ E[Y \mid Z_1 = 0] = p\,\mu_{01}^Y + (1 - p)\,\mu_{00}^Y. $$
So
$$ \Delta^{\text{naive}} = p(\mu_{11}^Y - \mu_{01}^Y) + (1 - p)(\mu_{10}^Y - \mu_{00}^Y). $$
This is not a clean dyadic contrast. It is a weighted mixture of two pair-type comparisons, and it is generally not equal to $\mu_{11}^Y - \mu_{00}^Y$, nor to $\mu_{10}^Y - \mu_{00}^Y$, nor to $\mu_{01}^Y - \mu_{00}^Y$. It converges to the wrong population object.
Once outcomes depend on both people, comparing them only by their own treatment status mixes together different encounters.
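The mixture formula can be checked numerically. The sketch below is illustrative only: the function name, the four cell means, and the value of `p` are hypothetical and were chosen so that the naive contrast matches none of the three dyadic contrasts.

```python
def naive_own_treatment_contrast(mu, p):
    """E[Y | Z1 = 1] - E[Y | Z1 = 0] when the second mover is treated w.p. p.

    mu maps ordered pair types (z, w) to cell means E[Y(z, w)].
    """
    treated_fm = p * mu[(1, 1)] + (1 - p) * mu[(1, 0)]
    control_fm = p * mu[(0, 1)] + (1 - p) * mu[(0, 0)]
    return treated_fm - control_fm


# Hypothetical cell means, chosen only for illustration.
mu = {(0, 0): 1.0, (1, 0): 1.5, (0, 1): 3.0, (1, 1): 5.0}
p = 0.5  # assumed share of treated second movers

naive = naive_own_treatment_contrast(mu, p)
dyadic = {
    "T-T minus C-C": mu[(1, 1)] - mu[(0, 0)],
    "T-C minus C-C": mu[(1, 0)] - mu[(0, 0)],
    "C-T minus C-C": mu[(0, 1)] - mu[(0, 0)],
}

print(f"naive: {naive:.2f}")
for name, value in dyadic.items():
    print(f"{name}: {value:.2f}")
```

With these made-up numbers the naive contrast is 1.25, while the three dyadic contrasts are 4.00, 0.50, and 2.00.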
Trust game formalization
A trust game makes the problem transparent. Let $s(z)$ denote the amount sent by a first mover with treatment state $z$, and $\rho(w)$ the share of the tripled amount returned by a second mover with treatment state $w$. Assume the first mover starts with an endowment of 6 and the amount sent is tripled. The first mover's payoff is
$$ \pi_1(z,w) = 6 - s(z) + 3\,s(z)\,\rho(w), $$
and the second mover's payoff is
$$ \pi_2(z,w) = 3\,s(z)\,(1 - \rho(w)). $$
Total surplus is
$$ \pi_1(z,w) + \pi_2(z,w) = 6 + 2\,s(z). $$
This last expression is the crucial piece. The first mover's treatment can affect total surplus through sending, while the second mover's treatment affects how the surplus is divided through returning. So $T$-$C$ and $C$-$T$ are different causal objects. One is mainly about efficiency. The other is mainly about distribution.
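For readers who want the intermediate step, the surplus expression follows from adding the two payoffs, at which point the return share cancels:

$$ \pi_1(z,w) + \pi_2(z,w) = \bigl[6 - s(z) + 3\,s(z)\,\rho(w)\bigr] + \bigl[3\,s(z) - 3\,s(z)\,\rho(w)\bigr] = 6 + 2\,s(z). $$

The two one-sided surplus contrasts then differ in kind:

$$ \bigl[6 + 2\,s(1)\bigr] - \bigl[6 + 2\,s(0)\bigr] = 2\,\bigl[s(1) - s(0)\bigr] \quad (T\text{-}C \text{ vs. } C\text{-}C), \qquad \bigl[6 + 2\,s(0)\bigr] - \bigl[6 + 2\,s(0)\bigr] = 0 \quad (C\text{-}T \text{ vs. } C\text{-}C). $$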
Simulation
To see how this plays out, I simulated a stylized trust-style setting where the exclusion story is perfectly true: treatment affects only each player's own decision rule, never the partner's treatment directly.
The structural choices are simple. Control first movers send $3.6$. Treated first movers send $3.9$. Control second movers return $33\%$. Treated second movers return $40\%$. I then simulated 1,000 studies with 302 participants, split between roles, with independent treatment assignment and small idiosyncratic noise. For each study I computed the naive own-treatment difference in first-mover payoff and the ordered pair-type means recovered by synthetic pairing.
Theoretical cell means
| Ordered pair | Send | Return share | FM payoff | SM payoff | Total surplus |
|---|---|---|---|---|---|
| C-C | 3.6 | 0.33 | 5.964 | 7.236 | 13.2 |
| C-T | 3.6 | 0.40 | 6.720 | 6.480 | 13.2 |
| T-C | 3.9 | 0.33 | 5.961 | 7.839 | 13.8 |
| T-T | 3.9 | 0.40 | 6.780 | 7.020 | 13.8 |
Several things are visible. Treating only the second mover raises the first mover's payoff substantially: $6.720 - 5.964 = 0.756$. The first mover is untreated in both cells but benefits because the second mover becomes more reciprocal. Treating only the first mover leaves the first mover's payoff nearly unchanged: $5.961 - 5.964 = -0.003$. The first mover sends more, enlarging the pie, but does not personally capture the gain when the second mover remains untreated. Treating both raises the first mover's payoff by $6.780 - 5.964 = 0.816$.
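The cell means in the table can be reproduced from the structural values and the payoff formulas. This is a minimal sketch: the endowment of 6 is read off the first-mover payoff formula, and all names are illustrative.

```python
ENDOWMENT = 6.0   # first mover's endowment, read off the payoff formula
MULTIPLIER = 3.0  # the amount sent is tripled

SEND = {0: 3.6, 1: 3.9}            # amount sent, by first-mover treatment state
RETURN_SHARE = {0: 0.33, 1: 0.40}  # share returned, by second-mover treatment state


def cell(z, w):
    """Return (FM payoff, SM payoff, total surplus) for ordered pair type (z, w)."""
    s, rho = SEND[z], RETURN_SHARE[w]
    fm = ENDOWMENT - s + MULTIPLIER * s * rho
    sm = MULTIPLIER * s * (1 - rho)
    return fm, sm, fm + sm


LABELS = {(0, 0): "C-C", (0, 1): "C-T", (1, 0): "T-C", (1, 1): "T-T"}
cells = {label: cell(*zw) for zw, label in LABELS.items()}

for label, (fm, sm, total) in cells.items():
    print(f"{label}: FM {fm:.3f}  SM {sm:.3f}  total {total:.1f}")

# First-mover payoff contrasts against the control-control cell.
base_fm = cells["C-C"][0]
fm_contrasts = {label: cells[label][0] - base_fm for label in ("C-T", "T-C", "T-T")}
for label, diff in fm_contrasts.items():
    print(f"{label} minus C-C: {diff:+.3f}")
```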
If the question is whether treatment improves the encounter for the first mover, the answer depends sharply on pair type.
Monte Carlo results
| Quantity | MC mean |
|---|---|
| Naive own-treatment difference in FM payoff | 0.028 |
| Synthetic pair: T-T minus C-C | 0.816 |
| Synthetic pair: T-C minus C-C | -0.003 |
| Synthetic pair: C-T minus C-C | 0.756 |
The naive estimator is near zero. By that metric, one might conclude that treatment barely affects first-mover payoffs. That conclusion is wrong for the dyadic estimand. The fully treated pair does much better. Treating only the second mover helps the first mover a lot. Treating only the first mover does not. The mechanisms are clear in the ordered cells and invisible in the naive average.
This is not a small-sample accident. The simulation obeys the exclusion logic exactly. The problem is not interference. The problem is that the estimator answers a different question.
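A simulation along these lines can be sketched with the Python standard library. This is not the code behind the table above. Several details are assumptions, since the text does not specify them: an even 151/151 split between roles, treatment probability 0.5, Gaussian noise with standard deviations 0.2 (amount sent) and 0.02 (return share), and one-to-one random matching within each study.

```python
import random
from statistics import fmean

SEND = {0: 3.6, 1: 3.9}            # mean amount sent, by first-mover treatment
RETURN_SHARE = {0: 0.33, 1: 0.40}  # mean share returned, by second-mover treatment


def fm_payoff(sent, share):
    """First-mover payoff with an endowment of 6 and a tripled transfer."""
    return 6 - sent + 3 * sent * share


def draw_role(rng, n, p, means, sd):
    """Draw n participants as (treatment, decision); decisions depend only on own treatment."""
    people = []
    for _ in range(n):
        treated = int(rng.random() < p)
        people.append((treated, rng.gauss(means[treated], sd)))
    return people


def one_study(rng, n_per_role=151, p=0.5, sd_send=0.2, sd_share=0.02):
    """Simulate one study; return the naive contrast and the synthetic cell means."""
    first = draw_role(rng, n_per_role, p, SEND, sd_send)
    second = draw_role(rng, n_per_role, p, RETURN_SHARE, sd_share)

    # Random one-to-one matching, performed after all decisions are drawn.
    partners = second[:]
    rng.shuffle(partners)
    realized = [(z, fm_payoff(s, r)) for (z, s), (_, r) in zip(first, partners)]
    naive = (fmean(y for z, y in realized if z == 1)
             - fmean(y for z, y in realized if z == 0))

    # Synthetic pairing: average the payoff over every (first, second) combination.
    cells = {}
    for z in (0, 1):
        sends = [s for zz, s in first if zz == z]
        for w in (0, 1):
            shares = [r for ww, r in second if ww == w]
            cells[(z, w)] = fmean(fm_payoff(s, r) for s in sends for r in shares)
    return naive, cells


def run_monte_carlo(n_studies=1000, seed=0):
    """Average the naive contrast and the ordered-cell contrasts across studies."""
    rng = random.Random(seed)
    results = [one_study(rng) for _ in range(n_studies)]
    summary = {"naive": fmean(naive for naive, _ in results)}
    for label, zw in (("T-T", (1, 1)), ("T-C", (1, 0)), ("C-T", (0, 1))):
        summary[f"{label} minus C-C"] = fmean(c[zw] - c[(0, 0)] for _, c in results)
    return summary
```

Calling `run_monte_carlo()` returns a dictionary with the same four quantities as the table. Under these assumed settings the naive entry should sit near zero, while the three synthetic contrasts should land close to their theoretical values.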
Why this happens
When treatment changes what first movers do and also changes how second movers respond, first-mover outcomes are shaped by both sides. Comparing treated and untreated first movers while ignoring partner treatment pools encounters with control and treated second movers on both sides of the comparison. The naive difference is therefore a weighted average of two own-treatment contrasts: T-C minus C-C, and T-T minus C-T. Those contrasts need not point in the same direction. In the simulation they do not: the first is $5.961 - 5.964 = -0.003$ and the second is $6.780 - 6.720 = 0.060$. Both are small, because the large gain in the fully treated pair comes almost entirely from the second mover's treatment, which is present on both sides of the second contrast and cancels. The average of the two is a mixture, not a dyadic contrast.
That is why the naive estimate can look small even when the fully treated pair differs sharply from the fully control pair.
Estimation
If the estimand is dyadic, the analysis should target dyadic cell means directly. The clean objects are $\mu_{00}^Y$, $\mu_{10}^Y$, $\mu_{01}^Y$, and $\mu_{11}^Y$.
If exact match identifiers are retained, one can estimate realized ordered cell means directly from observed pairs. If matching occurs after decisions are collected and is random within blocks, then one can estimate expected cell means by synthetic pairing. Let $s$ index sessions or randomization blocks, and let $\mathcal{I}^{(1)}_{sz}$ and $\mathcal{I}^{(2)}_{sw}$ be the sets of first and second movers in block $s$ with treatment states $z$ and $w$, with sizes $n^{(1)}_{sz}$ and $n^{(2)}_{sw}$. Let $g(A_i, B_j)$ be the encounter-level outcome implied by first mover $i$'s action and second mover $j$'s response rule. The expected pair-type estimator is
$$ \widehat{\mu}^{\,\text{exp}}_{zw} = \frac{\sum_s \sum_{i \in \mathcal{I}^{(1)}_{sz}} \sum_{j \in \mathcal{I}^{(2)}_{sw}} g(A_i, B_j)}{\sum_s n^{(1)}_{sz}\, n^{(2)}_{sw}}. $$
This estimator aligns with the dyadic estimand. The naive own-treatment regression does not.
One caution: synthetic pairing reuses the same individuals across many pseudo-pairs, so standard errors must account for that dependence. Randomization inference, a participant-level bootstrap, or a session-level bootstrap are sensible choices. Treating each synthetic pair as an independent observation would overstate the effective sample size and understate the uncertainty.
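The pooled synthetic-pairing estimator and a participant-level bootstrap might be sketched as follows. The record layout `(block, treatment, decision)`, the function names, and the toy data are all hypothetical. Resampling within role, block, and treatment arm is one design choice among several; it is used here because it keeps every cell populated.

```python
import random
from statistics import stdev


def fm_payoff(sent, share):
    """Example outcome g: first-mover payoff with endowment 6 and a tripled transfer."""
    return 6 - sent + 3 * sent * share


def synthetic_cell_mean(first, second, z, w, g):
    """Pooled mean of g over all within-block pseudo-pairs of ordered type (z, w).

    first / second: lists of (block, treatment, decision), one record per participant.
    """
    total, count = 0.0, 0
    blocks = sorted({b for b, _, _ in first} & {b for b, _, _ in second})
    for block in blocks:
        a_vals = [a for b, zz, a in first if b == block and zz == z]
        b_vals = [r for b, ww, r in second if b == block and ww == w]
        total += sum(g(a, r) for a in a_vals for r in b_vals)
        count += len(a_vals) * len(b_vals)
    if count == 0:
        raise ValueError("no pseudo-pairs available for this cell")
    return total / count


def bootstrap_se(first, second, contrast, n_boot=200, seed=0):
    """Participant-level bootstrap: resample people, not pseudo-pairs."""
    rng = random.Random(seed)

    def resample(people):
        strata = {}
        for person in people:
            strata.setdefault(person[:2], []).append(person)  # key: (block, treatment)
        out = []
        for key in sorted(strata):
            out.extend(rng.choices(strata[key], k=len(strata[key])))
        return out

    draws = [contrast(resample(first), resample(second)) for _ in range(n_boot)]
    return stdev(draws)


def tt_minus_cc(first, second):
    """Dyadic contrast: treated-treated minus control-control."""
    return (synthetic_cell_mean(first, second, 1, 1, fm_payoff)
            - synthetic_cell_mean(first, second, 0, 0, fm_payoff))


# Toy data for illustration only: (block, treatment, decision).
first = [("s1", 0, 3.5), ("s1", 1, 4.0), ("s1", 1, 3.8),
         ("s2", 0, 3.7), ("s2", 0, 3.6), ("s2", 1, 3.9)]
second = [("s1", 0, 0.30), ("s1", 1, 0.42), ("s1", 0, 0.35),
          ("s2", 1, 0.38), ("s2", 0, 0.33), ("s2", 1, 0.41)]

estimate = tt_minus_cc(first, second)
se = bootstrap_se(first, second, tt_minus_cc)
print(f"T-T minus C-C: {estimate:.3f} (bootstrap SE {se:.3f})")
```

With only a handful of participants per cell the bootstrap standard error is not meaningful. The point is the unit of resampling: each draw resamples participants and then rebuilds all pseudo-pairs from them.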
Implications for design
The lesson is straightforward. If the target is an individual choice, individual-level randomization with post-decision matching is fine. If the target is an interaction, the study should be built around ordered pair types from the start. That means defining dyadic cell means as primary estimands, preserving role labels and match structure, keeping treatment status for both sides, retaining participant-level decisions and full response schedules, and analyzing pair types directly rather than backing into them through own-treatment regressions.