Rank‑Preserving Calibration for Multiclass Probabilities
We often need to adjust a fitted $E[Y|X]$ so that the aggregate $E[Y]$ (or class totals) matches known targets. In binary classification, a single logit shift (or a global temperature) is a monotone transform of the model scores, so if person A had a higher class-1 probability than person B, A still ranks above B after calibration. With $J \geq 3$ classes, however, naive per-class logit shifts interact through the row-sum-to-one constraint: the normalization denominator sums over all classes, so "pushing" one class changes the others unevenly across people. In softmax form, per-class intercepts $\alpha_j$ lead to
$$q_{ij} = \frac{w_j\, p_{ij}}{\sum_k w_k\, p_{ik}} \quad \text{where} \quad w_j = e^{\alpha_j}$$
The person-specific denominator can tilt comparisons and flip the within-class order you meant to preserve.
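A tiny numeric illustration of such a flip (the numbers are made up; `P` holds two people's predictions over three classes):

```python
import numpy as np

# Two people, three classes; person A outranks person B on class 1 (0.50 > 0.45).
P = np.array([[0.50, 0.10, 0.40],   # person A
              [0.45, 0.50, 0.05]])  # person B

w = np.array([1.0, 1.0, 10.0])      # per-class reweighting, w_j = exp(alpha_j)
Q = P * w
Q /= Q.sum(axis=1, keepdims=True)   # person-specific denominators

print(Q[:, 0])  # class-1 probabilities: A ~0.109, B ~0.310 -- the within-class order flipped
```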
We pose calibration as a projection problem that keeps rows on the probability simplex and imposes isotonic (rank-preserving) structure within each column while matching column sums. Using Dykstra's alternating projections (and an ADMM alternative), we find the closest calibrated matrix to the original predictions, so we fix aggregates without unnecessary reordering.
Problem Setup
You start with an $N \times J$ matrix $P$ of predicted class probabilities (rows = instances, columns = classes). You want a calibrated matrix $Q$ such that:
- Row-simplex: $Q_i \in \Delta_{J-1}$ for all $i$ (entries non-negative and sum to 1)
- Column totals: $\sum_{i=1}^{N} Q_{ij} = T_j$ for user-supplied targets $T_1,\ldots,T_J$
- Within-class monotonicity: For each class $j$, when instances are sorted by their original class-$j$ scores $P_{\cdot j}$, the calibrated values $Q_{\cdot j}$ are non-decreasing
These constraint sets are convex, and their intersection is non-empty whenever the targets are feasible; since $P$ generally lies outside it, we seek the nearest point in the intersection (in $L_2$). The infeasible case is discussed under Diagnostics and Feasibility below.
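A concrete instance of this setup with toy data (the feasibility check in the last line is implied by the row-simplex constraint: since every row sums to one, feasible column targets must be non-negative and sum to $N$):

```python
import numpy as np

rng = np.random.default_rng(0)
N, J = 1000, 3

# Toy predictions: random logits pushed through a softmax, so each row sums to one.
logits = rng.normal(size=(N, J))
P = np.exp(logits)
P /= P.sum(axis=1, keepdims=True)

# Column-sum targets, e.g. known class totals for the population.
T = np.array([0.5, 0.3, 0.2]) * N
assert np.all(T >= 0) and np.isclose(T.sum(), N)   # necessary for feasibility
```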
Dykstra's Alternating Projections and ADMM
Alternating projections shuttle between constraint sets, projecting onto one and then the other. Dykstra's method augments this with correction terms so the iterates converge to the Euclidean projection of $P$ onto the intersection, not just any feasible point. That matters here because we want the smallest necessary change to achieve calibration.
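A minimal sketch of Dykstra's two-set scheme, with `proj_rows` and `proj_cols` standing in for the projections described in the next subsection (an illustration under those assumptions, not the package's implementation):

```python
import numpy as np

def dykstra(P, proj_rows, proj_cols, n_iter=500, tol=1e-9):
    """Dykstra's alternating projections onto two convex sets, with correction terms."""
    X = P.astype(float)
    R = np.zeros_like(X)   # correction carried into the row-set projection
    C = np.zeros_like(X)   # correction carried into the column-set projection
    for _ in range(n_iter):
        Y = proj_rows(X + R)
        R = X + R - Y            # update the row correction
        X_new = proj_cols(Y + C)
        C = Y + C - X_new        # update the column correction
        if np.max(np.abs(X_new - X)) < tol:   # crude stop; in practice check constraint residuals
            return X_new
        X = X_new
    return X
```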
The Two Projections
- Row projection (simplex): Project each row to the probability simplex so every row stays valid (non-negative and summing to one); within-class order is enforced by the column projection.
- Column projection (isotonic + sum): For each column, sort by the original scores, run an isotonic regression (non-decreasing), then adjust to meet the column total. Pool-Adjacent-Violators (PAV) gives an $O(N)$ solution per column once ordered.
We iterate these projections with Dykstra's corrections until column-sum errors and rank-violation residuals are within tolerance.
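A NumPy sketch of the two projections (helper names such as `project_row_to_simplex`, `pav_non_decreasing`, and `project_column` are illustrative, not the package's API); these plug into the Dykstra loop above as `proj_rows` and `proj_cols`:

```python
import numpy as np

def project_row_to_simplex(v):
    """Euclidean projection of a vector onto the probability simplex (sort-based)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u + (1.0 - css) / k > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1.0)
    return np.maximum(v + theta, 0.0)

def pav_non_decreasing(y):
    """Pool-Adjacent-Violators: the L2-closest non-decreasing sequence to y."""
    sums, counts = [], []
    for v in y:
        sums.append(float(v))
        counts.append(1)
        # Merge adjacent blocks while their means violate monotonicity.
        while len(sums) > 1 and sums[-2] * counts[-1] > sums[-1] * counts[-2]:
            s, c = sums.pop(), counts.pop()
            sums[-1] += s
            counts[-1] += c
    out = np.empty(len(y))
    start = 0
    for s, c in zip(sums, counts):
        out[start:start + c] = s / c
        start += c
    return out

def project_column(col, order, target):
    """L2 projection of one column onto {non-decreasing in `order`, entries sum to `target`}."""
    iso = np.empty_like(col, dtype=float)
    iso[order] = pav_non_decreasing(col[order])     # isotonic in the original-score order
    return iso + (target - iso.sum()) / iso.size    # a uniform shift hits the column total exactly

# Wiring for the Dykstra sketch above (orders[j] = np.argsort(P[:, j], kind="stable")):
# proj_rows = lambda M: np.vstack([project_row_to_simplex(row) for row in M])
# proj_cols = lambda M: np.column_stack([project_column(M[:, j], orders[j], T[j])
#                                        for j in range(M.shape[1])])
```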
The same problem can be written as:
$$\min_{Q} \|Q - P\|^2 \quad \text{subject to} \quad Q \in S_{\text{row}} \cap I_{\text{cols}}$$
where $S_{\text{row}}$ is the row-simplex set and $I_{\text{cols}}$ enforces isotonic columns with specified sums. The package includes an ADMM implementation with variable splitting so each subproblem is a familiar projection or proximal step. ADMM is useful when you want different stopping rules, easy warm-starts, or to experiment with other penalties.
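A hedged consensus-ADMM sketch of this splitting, reusing the projection helpers above (the function name `admm_calibrate`, the penalty `rho`, and the fixed iteration count are illustrative choices, not the package's interface):

```python
import numpy as np

def admm_calibrate(P, T, rho=1.0, n_iter=500):
    """Split Q (row-simplex) from Z (isotonic columns with sums T); drive Q == Z via the dual U."""
    N, J = P.shape
    orders = [np.argsort(P[:, j], kind="stable") for j in range(J)]

    Q = P.astype(float)
    Z = P.astype(float)
    U = np.zeros_like(Q)                       # scaled dual variable for the constraint Q == Z
    for _ in range(n_iter):
        # Q-update: minimize ||Q - P||^2 + (rho/2)||Q - Z + U||^2 over row simplices,
        # i.e. project the blended point onto each row's simplex.
        B = (2.0 * P + rho * (Z - U)) / (2.0 + rho)
        Q = np.vstack([project_row_to_simplex(B[i]) for i in range(N)])
        # Z-update: per-column isotonic projection with the required column sum.
        V = Q + U
        Z = np.column_stack([project_column(V[:, j], orders[j], T[j]) for j in range(J)])
        # Dual ascent on the consensus gap.
        U += Q - Z
    return Z   # Z meets the column constraints exactly; at convergence Q and Z coincide
```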
By default, the solver operates in Euclidean geometry, minimizing $\|Q - P\|^2$. The optional KL mode minimizes $D_{\text{KL}}(Q \,\|\, P)$ instead. Both enforce the same constraints and preserve within-class order; they differ only in the notion of "closest." Euclidean changes resemble additive tweaks; KL changes resemble multiplicative reweighting.
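For intuition (a simplification that looks at one column-sum constraint in isolation and ignores the others), projecting a column $P_{\cdot j}$ onto $\{\sum_i Q_{ij} = T_j\}$ gives
$$\text{Euclidean:}\quad Q_{ij} = P_{ij} + \frac{T_j - \sum_k P_{kj}}{N}, \qquad \text{KL:}\quad Q_{ij} = P_{ij}\,\frac{T_j}{\sum_k P_{kj}},$$
an additive shift per column versus a multiplicative rescaling.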
Diagnostics and Feasibility
Inspect three things after calibration:
- Maximum absolute column-sum error: $\max_j |\sum_i Q_{ij} - T_j|$
- Maximum deviation from row-sum-to-one: $\max_i |\sum_j Q_{ij} - 1|$
- Any residual rank violations at numerical tolerance
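A minimal diagnostic check along these lines (the function name and return format are illustrative):

```python
import numpy as np

def calibration_report(P, Q, T, tol=1e-8):
    """Compute the three post-calibration diagnostics listed above."""
    col_err = np.max(np.abs(Q.sum(axis=0) - T))      # max absolute column-sum error
    row_err = np.max(np.abs(Q.sum(axis=1) - 1.0))    # max deviation from row-sum-to-one
    rank_viol = 0.0                                  # largest decrease along the original ordering
    for j in range(Q.shape[1]):
        ordered = Q[np.argsort(P[:, j], kind="stable"), j]
        rank_viol = max(rank_viol, np.max(np.maximum(-np.diff(ordered), 0.0), initial=0.0))
    return {"max_column_sum_error": float(col_err),
            "max_row_sum_deviation": float(row_err),
            "max_rank_violation": float(rank_viol),
            "within_tolerance": max(col_err, row_err, rank_viol) <= tol}
```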
If your targets $T$ are infeasible given the row-simplex constraint, the solver returns the nearest point; document the gap and revisit the totals.
Python Package: https://github.com/finite-sample/rank_preserving_calibration