Topic

ML/Statistics

A collection of 152 issues

Out-of-Context: Constrained Tool Based Exploration of Context

Longer context windows have not brought a solution to "context rot." Recursive Language Models (RLMs) take inspiration from out‑of‑core data processing, where systems handle datasets larger than memory by deciding what to load. The paper proposes solving the context rot problem by

Targeted Model Editing

Pretrained generative models (LLMs, text-to-image, etc.) are used as general-purpose engines and then "aligned" toward specific goals: lower toxicity, higher truthfulness, removal of certain objects or styles, domain specialization, and so on. In practice, we rarely want to rebuild the whole model. The real objective is more surgical.

Shipping Fuzzy‑Joined Data

A growing share of empirical work relies on "data products" that integrate multiple administrative and survey sources via fuzzy joins. Researchers typically receive only the final merged panel and treat each cell as if it were directly observed. But upstream, the data producer has made substantive decisions—how

Chaotic Flows: LLMs in Low‑Tolerance Workflows

When you add an LLM to a workflow, you change both the upside and the downside. The net effect is easiest to express as a shift in the workflow's value: $$ \Delta \text{Value} = \Delta \text{Benefit} - N \sum_i f_i \cdot \Delta p_i \cdot L_

Stable LLM Inference

A mathematical function gives the same output for the same input. LLMs do not. Not even at temperature = 0. Determinism matters for reproducibility, debugging, and for keeping pipelines stable. Without it, identical prompts can yield divergent completions, making experiments irreproducible and systems brittle. We can intervene at three broad layers

What Problem Is Traditional RAG Solving?

RAG solves a specific technical problem: it enables models trained on broad public text to answer questions about specialized content without retraining. But solving a technical problem doesn't automatically make RAG the right business solution. A useful way to reason about search-system design is to model its net

Conscious Decoupling: Separating Budgets For Thought and Action

Current LLM services bundle model intelligence with supporting infrastructure into fixed tiers. When you purchase access to GPT‑5 Thinking, Claude Opus 4.1, or similar offerings, you receive a package: some amount of model capability paired with some unspecified quantity of supporting compute. This bundling serves a purpose. It

Testing LLMs: Engineering Confidence Without Certainty

Software testing has always relied on a fragile assumption: that we can enumerate test cases that represent production behavior. We test specific inputs, verify outputs, and trust that production will behave similarly. This works until it doesn't. Search ranking degrades as user behavior evolves. Heuristics fail when traffic

What Problem is DSPy Solving

Follow-up to: Not so Prompt: Prompt Optimization as Model Selection. The practical problem is to choose a prompt (or a set of prompts) that performs best on a task in the "real world" while respecting constraints, e.g., the output must parse, latency should stay below

The Double Descent of Standard Errors

Since the last essay, I have been mulling over whether double descent also implies smaller standard errors for heterogeneous treatment effects (HTE). Estimating heterogeneous treatment effects involves two goals: accurate out-of-sample prediction of $\hat{\tau}(x)$ and valid inference for functionals like average treatment effects and policy values. Standard practice constrains model complexity