Streaming Calibration
Modern applications—from ad platforms calibrating click-through predictions, to polling systems incorporating new responses as they arrive, to ML algorithms adapting fairness thresholds—share a common challenge: maintaining calibrated weights on live data streams.
To address streaming data, we recast raking as a streaming convex optimization problem: minimize the squared error between the current weighted margins and the target proportions, subject to positivity constraints on the weights. The loss function is convex in the normalized weights, and its minimizers coincide with the classical raking solution when feasible. Two online mirror‑descent updates emerge from this formulation (the objective is written out after the list):
- Stochastic Gradient Descent (SGD) performs additive updates. When a new record arrives, we append a weight of one and take a few projected gradient steps on the margin loss. With a diminishing learning rate, this online algorithm converges to the same solution as batch raking.
- Multiplicative Weights Update (MWU) performs multiplicative updates. Each weight is multiplied by the exponential of the negative gradient, making this update a mirror‑descent step under the Kullback–Leibler (KL) divergence. MWU retains the flavor of traditional raking yet operates locally on each record, without revisiting entire strata.
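For reference, the objective both updates descend can be written as follows, in notation assumed here for illustration: x_ij is the indicator that record i falls in category j of a raking margin and t_j is the corresponding target proportion.

```latex
% Squared margin-error objective in the normalized weights p_i = w_i / \sum_k w_k
% (notation assumed for illustration; the package may parameterize this differently):
L(p) \;=\; \sum_{j} \Bigl( \sum_{i} p_i \, x_{ij} \;-\; t_j \Bigr)^{2},
\qquad w_i > 0 .
```

The margins are linear in the normalized weights, which is what makes the loss convex in them.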
Both SGD and MWU run in constant time per observation and operate on a positive weight vector (we normalize only when computing margins). MWU's multiplicative update guarantees positivity, while SGD enforces it through weight clipping; optional upper bounds on the weights can be used to control variance.
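A minimal NumPy sketch of both per‑record updates is below. The function names, learning rates, and clipping bounds are illustrative choices rather than the onlinerake API, and the gradient is recomputed over all records for clarity; a constant‑time implementation would instead maintain running weighted category totals.

```python
import numpy as np

def margin_loss_grad(w, X, targets):
    """Gradient of the squared margin loss with respect to the raw weights w.

    X is an (n, J) matrix of 0/1 category indicators; targets has length J.
    """
    total = w.sum()
    margins = (w @ X) / total            # current weighted proportions
    resid = margins - targets            # margin errors
    # d margins_j / d w_i = (X[i, j] - margins_j) / total
    return 2.0 * ((X - margins) @ resid) / total

def sgd_step(w, X, targets, lr=0.1, w_min=1e-3, w_max=20.0):
    """Additive (projected-gradient) step; clipping keeps weights positive and bounded."""
    return np.clip(w - lr * margin_loss_grad(w, X, targets), w_min, w_max)

def mwu_step(w, X, targets, lr=1.0, w_max=20.0):
    """Multiplicative step (KL mirror descent); positive by construction."""
    return np.minimum(w * np.exp(-lr * margin_loss_grad(w, X, targets)), w_max)

def process_record(w, X, x_new, targets, step=sgd_step, n_steps=3):
    """Append the new record with unit weight, then take a few local steps."""
    w = np.append(w, 1.0)
    X = np.vstack([X, np.atleast_2d(x_new)])
    for _ in range(n_steps):
        w = step(w, X, targets)
    return w, X
```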
We evaluated these online rakers on synthetic streaming surveys under three non‑stationary scenarios (a toy stream generator is sketched after the list):
- Linear drift: characteristics shift gradually from under‑ to over‑representation.
- Sudden shift: the population composition jumps midway through the stream.
- Oscillation: demographics oscillate around the targets.
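As a rough illustration of how such streams can be generated, the snippet below draws one binary characteristic under each drift pattern; the drift magnitudes and oscillation period are arbitrary choices, not the settings used in the experiments reported here.

```python
import numpy as np

def category_prob(t, T, scenario, lo=0.3, hi=0.7):
    """Probability that record t (of T) falls in the focal category.

    Drift shapes mirror the three scenarios; magnitudes are illustrative only.
    """
    if scenario == "linear":        # gradual shift from under- to over-representation
        return lo + (hi - lo) * t / T
    if scenario == "sudden":        # composition jumps midway through the stream
        return lo if t < T / 2 else hi
    if scenario == "oscillating":   # composition oscillates around the 50% target
        return 0.5 + 0.2 * np.sin(2 * np.pi * 3 * t / T)
    raise ValueError(f"unknown scenario: {scenario}")

def simulate_stream(T=300, scenario="linear", seed=0):
    """Yield one 0/1 indicator per record."""
    rng = np.random.default_rng(seed)
    for t in range(T):
        yield int(rng.random() < category_prob(t, T, scenario))
```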
For each scenario, 300 observations were processed with per‑record updates and compared against an unweighted baseline. Table 1 reports the average percentage improvement in absolute margin error for each demographic margin (age, gender, education, region), the final effective sample size (ESS), and the final loss.
Scenario | Method | Age Imp. | Gender Imp. | Educ. Imp. | Region Imp. | ESS | Loss |
---|---|---|---|---|---|---|---|
Linear | SGD | 82.8 % | 78.6 % | 76.8 % | 67.5 % | 251.8 | 0.00147 |
Linear | MWU | 57.2 % | 53.6 % | 46.9 % | 34.6 % | 240.9 | 0.00676 |
Sudden | SGD | 82.9 % | 82.3 % | 79.6 % | 63.5 % | 225.5 | 0.00102 |
Sudden | MWU | 52.6 % | 51.2 % | 46.3 % | 26.3 % | 175.9 | 0.01235 |
Oscillating | SGD | 69.7 % | 78.5 % | 65.6 % | 72.0 % | 278.7 | 0.00023 |
Oscillating | MWU | 49.6 % | 57.3 % | 48.3 % | 50.1 % | 276.0 | 0.00048 |
SGD consistently achieves the largest reductions in margin error and the lowest loss, while MWU still delivers substantial improvements with multiplicative updates. Both methods maintain high ESS, indicating that weights remain stable even under heavy drift.
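To make the evaluation loop concrete, the snippet below wires the illustrative sketches above together for a single binary margin under linear drift. The actual experiments rake four demographic margins at once, and the onlinerake package exposes its own interface; this is only a sketch built on the hypothetical helpers defined earlier.

```python
import numpy as np
# Uses simulate_stream, process_record, and sgd_step from the sketches above.

target = np.array([0.5])          # target proportion for the single binary margin
w = np.empty(0)                   # raw weights, one per processed record
X = np.empty((0, 1))              # category indicators seen so far

for x in simulate_stream(T=300, scenario="linear"):
    w, X = process_record(w, X, np.array([x]), target, step=sgd_step)

print("unweighted margin:", X.mean(axis=0))        # drifts away from the target
print("weighted margin:  ", (w @ X) / w.sum())     # pulled back toward the target
print("ESS:", w.sum() ** 2 / (w ** 2).sum())       # Kish effective sample size
```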
Python Package: https://github.com/finite-sample/onlinerake
pip install onlinerake
Technical Note: https://gsood.com/research/papers/onlinerake.pdf
See also: https://github.com/finite-sample/calibre for nearly-isotonic, relaxed-PAVA, and GAM-based calibrators.