By Gaurav in ML/Statistics — 28 Jun 2025

Pareto ML Deployments

In machine learning, a common deployment strategy is to replace an existing model with one that performs better overall. Another common strategy refines this approach by limiting deployment to user segments or regions where the improvements are clear. Both approaches allow regressions: new errors on cases that the old model got right.

In applications where regressions are expensive, it may be worth considering a more granular deployment strategy. Rather than asking whether the new model is better on average, we can ask where it is better. This shift reframes evaluation and deployment at the example level: if two models disagree, and one is more often right on similar cases, we should prefer it—for that part of the input space.

Instance-level routing—choosing which model to trust on each example—offers one practical approach. Two simple methods are:

Nearest-neighbor routing: For each new case, choose the model that performed best among similar examples in held-out data.
Meta-classifier routing: Train a small classifier to predict, from features, which model is likely to be right for each case.

On the Adult income dataset, these strategies outperformed the standard swap:

Strategy	Accuracy	Regressions vs. A
A	0.8459	0
B	0.8396	368
kNN	0.8501	15
Meta	0.8802	0

Naively replacing A with B increases errors on cases A handled well. The routing methods increase accuracy and sharply reduce or even eliminate regressions.

However, routing brings its own risks. The more finely we tune deployment decisions, the greater the chance of overfitting to validation idiosyncrasies. A meta-classifier might learn quirks of the validation set rather than general patterns. To mitigate this, careful cross-validation, regularization, and simplicity in routing models are essential.

Maintaining multiple models and a routing layer also introduces overhead. Serving infrastructure becomes more complex as additional components must be monitored and maintained. Routing can also add latency, which we may or may not care about (for offline predictions, for instance, we may not care that much). These costs may outweigh the benefits in many settings. But where regressions are costly, instance-level routing strategies can be useful.

More at: https://github.com/finite-sample/pareto-deployment

Subscribe to Gojiberries