Out-of-Context: Constrained Tool-Based Exploration of Context
Longer context windows have not eliminated long‑context failures. In practice, adding more tokens often makes models less reliable. Anthropic describes “context rot” as recall degrading as context length increases. Operationally, recall on search tasks does not drop much. However, as context grows longer, models struggle when the