Mitigating Catastrophic Forgetting Using Improved Clustering-Based Episodic Memory

Online Continual Learning (Domain-Incremental, Task-Agnostic) — STRATA-I and STRATA-II

Authors
Owen Beabout, Abigail Dodd, Titus Murphy, Enyue Lu
Code
https://github.com/shininglegend/strata
Paper (PDF)
paper.pdf
Poster
poster.pdf
Keywords
continual learning, catastrophic forgetting, episodic memory, domain incremental learning, task-agnostic learning

Abstract

Online Continual Learning (OCL) is a subdomain of machine learning in which models must continuously learn from a perpetual data stream without access to past samples. Domain-incremental learning models aim to adapt to shifting sample distributions (new tasks) while retaining accuracy on previously trained tasks, without needing to know exactly when the task switches. However, such models often lose accuracy on earlier tasks as they train on subsequent ones, a problem known as catastrophic forgetting. We propose two new domain-incremental balanced stochastic gradient models with improved clustering-based episodic memory, STochastic gRAdient with Task-Agnosticity (STRATA-I and STRATA-II), and demonstrate strong performance on several benchmark datasets and tasks compared to previous state-of-the-art models, including reducing forgetting by over 75% in at least two cases.
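The abstract does not spell out how a clustering-based episodic memory works. As a rough illustration only, the sketch below assumes one simple design: an online k-means step assigns each incoming sample to its nearest centroid, and each cluster keeps a small reservoir-sampled buffer, so clusters stand in for tasks without any task labels. `ClusterMemory`, `num_clusters`, and `per_cluster` are hypothetical names. This is not the STRATA implementation (see the linked repository for that).

```python
import math
import random


class ClusterMemory:
    """Hypothetical fixed-size episodic memory that groups samples with an
    online k-means step instead of task labels (illustrative, not STRATA's)."""

    def __init__(self, num_clusters, per_cluster, seed=0):
        self.num_clusters = num_clusters
        self.per_cluster = per_cluster
        self.centroids = []  # running-mean feature vector of each cluster
        self.counts = []     # number of samples assigned to each cluster so far
        self.buffers = []    # stored (x, y) pairs for each cluster
        self.rng = random.Random(seed)

    def _nearest(self, x):
        # Index of the centroid closest to x in Euclidean distance.
        return min(range(len(self.centroids)),
                   key=lambda i: math.dist(x, self.centroids[i]))

    def add(self, x, y):
        if len(self.centroids) < self.num_clusters:
            # The first samples each seed a new cluster.
            self.centroids.append(list(x))
            self.counts.append(1)
            self.buffers.append([(x, y)])
            return
        i = self._nearest(x)
        self.counts[i] += 1
        n = self.counts[i]
        # Online k-means update: move the centroid toward x by 1/n.
        self.centroids[i] = [c + (xi - c) / n for c, xi in zip(self.centroids[i], x)]
        buf = self.buffers[i]
        if len(buf) < self.per_cluster:
            buf.append((x, y))
        else:
            # Reservoir sampling: keep a uniform sample of this cluster's stream.
            j = self.rng.randrange(n)
            if j < self.per_cluster:
                buf[j] = (x, y)

    def sample(self, k):
        # Draw up to k stored samples, taking turns across clusters for balance.
        pools = [list(b) for b in self.buffers]
        for pool in pools:
            self.rng.shuffle(pool)
        out = []
        while len(out) < k and any(pools):
            for pool in pools:
                if pool and len(out) < k:
                    out.append(pool.pop())
        return out


mem = ClusterMemory(num_clusters=2, per_cluster=3)
for t in range(20):
    mem.add([0.01 * t, 0.0], "a")        # samples near the origin
    mem.add([5.0, 5.0 + 0.01 * t], "b")  # samples from a shifted distribution
print([len(b) for b in mem.buffers])        # → [3, 3]
print(sorted(y for _, y in mem.sample(4)))  # → ['a', 'a', 'b', 'b']
```

Sampling round-robin across clusters is one way to keep replay balanced when some clusters receive far more data than others; a real system might also create clusters on demand rather than fixing their number up front.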

Problem

In domain-incremental online continual learning, the data distribution shifts over time, but task identity and task boundaries are not available to the model. As training continues, many models lose accuracy on earlier tasks.
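To make the setting concrete, here is a minimal sketch of a permutation-based domain-incremental stream, assuming the common construction in which every task applies one fixed random permutation to the input features. The label set stays the same across tasks, and the generator yields only `(x, y)` pairs, so the learner is never told the task index or where one task ends. `permuted_stream` is a hypothetical helper, not part of the STRATA code.

```python
import random


def permuted_stream(base_data, num_tasks, seed=0):
    """Hypothetical permutation-based domain-incremental stream.

    Each task applies one fixed random permutation to the input features of
    the same labelled data, so the label set never changes. Only (x, y) pairs
    are yielded: the task index and task boundaries stay hidden.
    """
    rng = random.Random(seed)
    dim = len(base_data[0][0])
    for _ in range(num_tasks):
        perm = list(range(dim))
        rng.shuffle(perm)  # this task's shift in the input distribution
        for x, y in base_data:
            yield [x[p] for p in perm], y  # note: no task id is emitted


data = [([1, 2, 3, 4], 0), ([4, 3, 2, 1], 1)]
stream = list(permuted_stream(data, num_tasks=3))
print(len(stream))             # → 6
print([y for _, y in stream])  # → [0, 1, 0, 1, 0, 1]
```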

Previous Work

Prior work has explored task-agnostic clustering-based episodic memory, which attempts to group samples by underlying task structure without access to explicit task labels [1]. Among episodic-memory approaches that are not task-agnostic, MEGA-I and MEGA-II are particularly strong: they combine the gradients from memory samples and incoming samples, either balancing (MEGA-I) or rotating (MEGA-II) them according to their relative loss values [2].
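The two loss-based gradient-combination ideas can be sketched as follows, based on our reading of [2]. The exact weighting rule, the `eps` threshold, and keeping the norm of `g` unchanged during rotation are assumptions to check against the MEGA paper. Here `g` is the gradient on incoming samples, `g_ref` the gradient on memory samples, and `loss_t` and `loss_ref` their losses. The rotation picks the angle theta that maximizes loss_t * cos(theta) + loss_ref * cos(phi - theta), where phi is the angle between `g` and `g_ref`; setting the derivative to zero gives theta = atan2(loss_ref * sin(phi), loss_t + loss_ref * cos(phi)).

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def mega1_mix(g, g_ref, loss_t, loss_ref, eps=1e-3):
    """Loss-balanced mix in the spirit of MEGA-I (our reading of [2]).

    The memory gradient is scaled by loss_ref / loss_t; once the current loss
    is at most eps, only the memory gradient is used.
    """
    if loss_t > eps:
        alpha1, alpha2 = 1.0, loss_ref / loss_t
    else:
        alpha1, alpha2 = 0.0, 1.0
    return [alpha1 * a + alpha2 * b for a, b in zip(g, g_ref)]


def mega2_rotate(g, g_ref, loss_t, loss_ref):
    """Rotation in the spirit of MEGA-II (our reading of [2]).

    Turns g toward g_ref inside the plane they span, by the angle theta that
    maximizes loss_t*cos(theta) + loss_ref*cos(phi - theta), where phi is the
    angle between g and g_ref. The norm of g is kept (an assumption).
    """
    ng, nr = norm(g), norm(g_ref)
    if ng == 0.0 or nr == 0.0:
        return list(g)
    u = [a / ng for a in g]  # unit vector along g
    proj = dot(u, g_ref)
    phi = math.acos(max(-1.0, min(1.0, proj / nr)))
    # Part of g_ref orthogonal to g; together with u it spans the rotation plane.
    perp = [b - proj * a for a, b in zip(u, g_ref)]
    n_perp = norm(perp)
    if n_perp < 1e-12:
        return list(g)  # g and g_ref are (anti)parallel: no unique plane
    v = [p / n_perp for p in perp]
    theta = math.atan2(loss_ref * math.sin(phi), loss_t + loss_ref * math.cos(phi))
    return [ng * (math.cos(theta) * a + math.sin(theta) * b) for a, b in zip(u, v)]


g, g_ref = [1.0, 0.0], [0.0, 1.0]
print(mega1_mix(g, g_ref, loss_t=2.0, loss_ref=1.0))             # → [1.0, 0.5]
print([round(c, 4) for c in mega2_rotate(g, g_ref, 1.0, 1.0)])  # → [0.7071, 0.7071]
```

With equal losses and orthogonal gradients, the rotated gradient points halfway between the two, which matches the intuition that neither the current data nor the memory should dominate.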

Our Upgrades

Key Results

STRATA-I and STRATA-II generally reduce forgetting compared to baselines across most datasets and task types, with especially strong reductions in class-split settings. Overall accuracy is also competitive and is often the best among the compared models.

Forgetting (Permutation Tasks): STRATA models show reduced forgetting on permutation-based tasks compared to baselines.
Forgetting (Class-Split Tasks): Reductions in forgetting are especially strong in class-split settings, exceeding 75% in some cases.
First-Task Accuracy (Rotation Tasks): STRATA models maintain stronger first-task performance throughout training.
Overall Accuracy (Rotation Tasks): STRATA models frequently achieve the best or statistically tied-best overall accuracy.
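The figures do not state how forgetting and accuracy are computed. A minimal sketch, assuming the definitions commonly used in continual learning: `R[i][j]` is the accuracy on task `j` after training through task `i`; final average accuracy is the mean of the last row; forgetting for a task is the best accuracy it reached before the final stage minus its final accuracy, averaged over all but the last task; and the first-task curve is column 0. The paper may use different definitions, the function names are hypothetical, and the matrix values are made up purely for illustration.

```python
def final_average_accuracy(R):
    """Mean accuracy over all tasks after training on the last task."""
    return sum(R[-1]) / len(R[-1])


def average_forgetting(R):
    """Mean over tasks 0..T-2 of (best accuracy before the final stage minus
    final accuracy). Task T-1 is excluded because it cannot be forgotten yet."""
    T = len(R)
    if T < 2:
        return 0.0
    drops = [max(R[i][j] for i in range(j, T - 1)) - R[-1][j] for j in range(T - 1)]
    return sum(drops) / len(drops)


def first_task_curve(R):
    """Accuracy on the first task after each training stage."""
    return [row[0] for row in R]


# Made-up accuracy matrix for three tasks: R[i][j] = accuracy on task j
# after training through task i (entries with j > i are not used above).
R = [
    [0.95, 0.10, 0.12],
    [0.80, 0.94, 0.15],
    [0.70, 0.85, 0.93],
]
print(round(final_average_accuracy(R), 4))  # → 0.8267
print(round(average_forgetting(R), 4))      # → 0.17
print(first_task_curve(R))                  # → [0.95, 0.8, 0.7]
```

Under these definitions, a claim such as "over 75% less forgetting" would compare `average_forgetting` for a STRATA model against a baseline as a relative reduction.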

Method summary

Episodic memory update

STRATA-I vs STRATA-II

Experimental setup