Hungary For More? Optimal 1-to-Many Matching for Causal Inference
The Hungarian algorithm (Kuhn–Munkres) efficiently finds optimal one-to-one matches between treated and control units by minimizing total matching cost (typically Euclidean distance in covariate space). It has been used for estimating treatment effects via matching.
But it has a limitation: it is strictly one-to-one.
In many causal inference settings, especially when estimating the Average Treatment Effect on the Treated (ATT), we have far more controls than treated units. Forcing each treated unit to pair with a single control discards useful data and limits efficiency.
One Too Many or 1-to-Many?
The choice between one-to-one and one-to-many matching involves a fundamental trade-off between bias and variance. One-to-many matching is typically optimal when you have a large pool of control units relative to treated units, noisy outcomes that benefit from averaging, or when match quality degrades slowly (the second and third best matches are nearly as good as the first). In these settings, the variance reduction from using additional matches outweighs the bias from including slightly lower-quality controls.
Conversely, one-to-one matching is preferred when match quality drops sharply after the first match, when you have abundant high-quality matches for all treated units, or when treatment effects are highly heterogeneous and require precise individual counterfactuals. The optimal choice depends on your specific data structure—control-to-treated ratios, covariate overlap, outcome noise levels—and can be evaluated empirically by examining match quality distributions across ranks and conducting sensitivity analyses across different matching ratios.
One-to-How-Many?
We can select the optimal number of matches (k) in a data-driven way if we have experimental benchmark data. Otherwise, two practical strategies are:
- Cross-validation: Hold out a subset of control units, pretend they are "treated," match them to controls, and see how well their outcomes are predicted as a function of k.
- Caliper-based selection: Set a distance threshold (based on domain knowledge, covariate balance, or common heuristics such as 0.2 SD of the logit of the propensity score), and use all controls within that distance as matches.
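The caliper rule is simple to sketch. Here is a minimal illustration; the function name is ours, and for clarity the caliper is applied directly in covariate space rather than on the propensity-score scale:

```python
import numpy as np

def caliper_matches(treated, controls, caliper):
    """For each treated unit, collect all controls within `caliper`
    (Euclidean distance in covariate space)."""
    matches = {}
    for i, t in enumerate(treated):
        dists = np.linalg.norm(controls - t, axis=1)
        matches[i] = np.where(dists <= caliper)[0].tolist()
    return matches

treated = np.array([[0.0, 0.0]])
controls = np.array([[0.1, 0.0], [5.0, 5.0]])
caliper_matches(treated, controls, caliper=1.0)  # → {0: [0]}
```

Note that k now varies per treated unit: units in dense regions of the control distribution get many matches, units in sparse regions get few or none.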
Implementation of 1-to-many
There are three main approaches to 1-to-many matching:
- Min-Cost Max-Flow (Optimal 1-to-k)
The standard generalization to 1-to-k (or variable-ratio) matching uses min-cost max-flow algorithms. Each treated unit becomes a supply node with k units, each control a demand node with capacity 1, and edge weights represent matching costs (e.g., Euclidean distance). The network flow solution finds the globally optimal assignment subject to all constraints. This approach is foundational in the modern matching literature and is implemented in R's optmatch, Python's networkx, and various optimization libraries (see Hansen 2007; Kallus 2020; Stuart 2010).
- Sequential Hungarian (Greedy, Iterative Assignment)
A practical heuristic alternative is to apply the Hungarian algorithm iteratively. In each of k rounds, we solve the 1-to-1 assignment problem between the treated units and the remaining controls, record the matches, and remove the used controls from the pool. This sequential approach is locally optimal at each step, simple to code using standard Hungarian routines, and commonly used in practice for moderate k or large datasets. However, it does not guarantee global optimality across all matches. Variants of this heuristic have been described in applied statistics and econometrics, especially for approximate matching without full optimization (see Abadie & Imbens 2016).
- Graph Duplication (“Blow-up” Hungarian)
A theoretically elegant but underused method is to duplicate each treated unit k times, creating an expanded cost matrix, and solve a standard Hungarian assignment on this larger problem. The resulting matches can be mapped back to the original units to yield a globally optimal 1-to-k assignment. This method is mathematically equivalent to the min-cost flow solution but is rarely implemented in practice, likely due to computational concerns when k or sample sizes are large. Nevertheless, for moderate k and dataset size, this approach is clean, preserves optimality, and leverages robust Hungarian algorithm implementations.
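The flow formulation can be sketched with networkx. This is an illustrative implementation under our own naming; distances are scaled to integers via a `scale` parameter of our choosing, since networkx's min-cost-flow solver expects integral weights:

```python
import networkx as nx
import numpy as np
from scipy.spatial.distance import cdist

def one_to_k_flow(treated, controls, k, scale=1000):
    """Optimal 1-to-k matching as min-cost flow: treated unit i supplies
    k units of flow; each control absorbs at most 1 unit."""
    # Scale distances to integers for networkx's solver.
    cost = np.rint(cdist(treated, controls) * scale).astype(int)
    G = nx.DiGraph()
    for i in range(len(treated)):
        G.add_node(("t", i), demand=-k)            # negative demand = supply
    G.add_node("sink", demand=k * len(treated))    # absorbs all flow
    for j in range(len(controls)):
        G.add_edge(("c", j), "sink", capacity=1, weight=0)
        for i in range(len(treated)):
            G.add_edge(("t", i), ("c", j), capacity=1, weight=int(cost[i, j]))
    flow = nx.min_cost_flow(G)
    return {i: [j for j in range(len(controls)) if flow[("t", i)][("c", j)] > 0]
            for i in range(len(treated))}
```

The problem is feasible as long as the control pool has at least k × (number of treated) units; the capacity-1 edges into the sink enforce matching without replacement.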
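The sequential heuristic is a short loop around an off-the-shelf assignment solver. A sketch using scipy's `linear_sum_assignment` (a Hungarian-style solver); the function name is ours:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def sequential_hungarian(treated, controls, k):
    """Heuristic 1-to-k matching: solve an optimal 1-to-1 assignment
    k times, removing each round's matched controls from the pool."""
    cost = cdist(treated, controls)
    available = np.arange(len(controls))   # control indices still unmatched
    matches = {i: [] for i in range(len(treated))}
    for _ in range(k):
        rows, cols = linear_sum_assignment(cost[:, available])
        for r, c in zip(rows, cols):
            matches[r].append(int(available[c]))
        available = np.delete(available, cols)
    return matches
```

Each round is optimal given the remaining pool, but an early round can "spend" a control that a later round needed more, which is exactly where global optimality can be lost.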
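The blow-up trick is only a few lines on top of a standard assignment solver. A minimal sketch (function name ours), relying on the fact that scipy's `linear_sum_assignment` handles rectangular cost matrices:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def one_to_k_blowup(treated, controls, k):
    """Globally optimal 1-to-k matching via duplication: replicate each
    treated unit k times and solve one standard assignment problem."""
    cost = cdist(treated, controls)            # (n_treated, n_controls)
    blown_up = np.repeat(cost, k, axis=0)      # each treated row appears k times
    rows, cols = linear_sum_assignment(blown_up)
    matches = {i: [] for i in range(len(treated))}
    for r, c in zip(rows, cols):
        matches[r // k].append(int(c))         # map duplicate row back to its unit
    return matches
```

The expanded cost matrix has k × n_treated rows, so the cubic worst-case cost of the Hungarian algorithm is the computational concern mentioned above; for moderate problem sizes it is a non-issue.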