(Don't) Forget About It: Toward Pareto Improving GD
Machine learning models don't improve like traditional software. When we "update" a model, it sometimes begins to mishandle cases it previously solved, an outcome known as regression or "forgetting." This issue is well studied in continual learning, where models learn multiple tasks sequentially (French, 1999). Standard solutions