Catastrophic Forgetting
A problem in neural networks where training on new data causes the model to lose knowledge it previously learned, effectively overwriting old skills with new ones.
Catastrophic forgetting is a well-known failure mode of neural networks: training a model on new information causes it to lose previously learned information. The gradient updates that encode the new task overwrite the weights that supported earlier capabilities, degrading performance on tasks the model previously handled well.
How forgetting happens
Neural networks store knowledge distributed across their weights. When you fine-tune a model on a new task, the training process adjusts these weights to perform well on the new data. But those same weights were responsible for the model's previous capabilities. Changing them to accommodate new knowledge can destroy old knowledge.
For example, if you fine-tune a general-purpose language model exclusively on medical texts, it may become excellent at medical questions but forget how to write poetry, summarise business documents, or solve maths problems.
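The mechanism can be seen even in a deliberately tiny model. The sketch below (an illustrative toy, not a real fine-tuning setup: the one-parameter linear model, tasks, learning rate, and step counts are all invented for demonstration) trains a model on task A, then fine-tunes it on task B alone, and measures how task-A error changes.

```python
import numpy as np

def train(w, xs, ys, lr=0.1, steps=200):
    """Gradient descent on mean squared error for the 1-D linear model y = w * x."""
    for _ in range(steps):
        grad = np.mean(2 * (w * xs - ys) * xs)
        w -= lr * grad
    return w

def mse(w, xs, ys):
    return float(np.mean((w * xs - ys) ** 2))

xs = np.linspace(-1, 1, 50)
task_a = 2.0 * xs    # task A: y = 2x
task_b = -1.0 * xs   # task B: y = -x

w = train(0.0, xs, task_a)
loss_a_before = mse(w, xs, task_a)   # near zero: the model has learned task A

w = train(w, xs, task_b)             # fine-tune on task B only
loss_a_after = mse(w, xs, task_a)    # task A error has climbed: knowledge overwritten
```

The single shared weight cannot satisfy both tasks, so training on task B drags it away from the task-A solution. Large networks have many weights, but the same tension applies wherever tasks rely on overlapping parameters.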
Why this is a fundamental challenge
Catastrophic forgetting is arguably one of the most important open problems in AI. Humans learn continuously without forgetting how to walk when they learn to ride a bicycle. Neural networks, by contrast, tend to be brittle, optimising for one objective at the expense of others.
This limitation has practical consequences:
- Fine-tuning trade-offs: Every time you specialise a model, you risk degrading its general capabilities.
- Continuous learning: Deploying models that learn from new data in production is risky because they may forget important baseline behaviours.
- Multi-task models: Building a single model that excels at many tasks is harder than it should be because training on one task can interfere with others.
Mitigation strategies
- Elastic Weight Consolidation (EWC): Identifies which weights are most important for previous tasks and penalises changes to those specific weights during new training.
- Progressive networks: Add new network capacity for new tasks while freezing the weights responsible for old tasks.
- Replay methods: Periodically retrain on examples from previous tasks alongside new data, reminding the model of what it learned before.
- LoRA and adapter methods: Instead of modifying the base model's weights, add small trainable modules on top. The base knowledge is preserved because the original weights are frozen.
- Multi-task training: Train on all tasks simultaneously from the start, rather than sequentially.
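To make the EWC idea concrete, here is a minimal numpy sketch. Everything about it is illustrative: the two linear regression tasks, the diagonal Fisher proxy (average squared per-example gradient), and the penalty strength `lam` are assumptions chosen for the demo, not values from the EWC paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two toy regression tasks sharing one weight vector (linear model y = X @ w).
# The tasks agree on the first two weights and disagree on the third.
X_a = rng.normal(size=(100, 3))
y_a = X_a @ np.array([1.0, 2.0, 0.0]) + 0.1 * rng.normal(size=100)
X_b = rng.normal(size=(100, 3))
y_b = X_b @ np.array([1.0, 2.0, 3.0])

def grad_mse(w, X, y):
    return 2 * X.T @ (X @ w - y) / len(y)

def mse(w, X, y):
    return float(np.mean((X @ w - y) ** 2))

# Train on task A first.
w = np.zeros(3)
for _ in range(500):
    w -= 0.05 * grad_mse(w, X_a, y_a)
w_a = w.copy()

# Diagonal Fisher proxy: average squared per-example gradient on task A.
fisher = np.mean([grad_mse(w_a, X_a[i:i + 1], y_a[i:i + 1]) ** 2
                  for i in range(len(y_a))], axis=0)

def finetune_on_b(lam):
    # EWC adds the penalty (lam / 2) * sum_i fisher_i * (w_i - w_a_i)^2,
    # whose gradient is lam * fisher * (w - w_a): weights that mattered
    # for task A are expensive to move.
    w = w_a.copy()
    for _ in range(500):
        w -= 0.05 * (grad_mse(w, X_b, y_b) + lam * fisher * (w - w_a))
    return w

mse_plain = mse(finetune_on_b(lam=0.0), X_a, y_a)   # unconstrained fine-tuning
mse_ewc = mse(finetune_on_b(lam=50.0), X_a, y_a)    # EWC-regularised fine-tuning
```

With the penalty active, fine-tuning on task B leaves task-A error substantially lower than unconstrained fine-tuning does, at the cost of a less exact fit to task B.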
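The replay idea can be sketched just as simply. In this toy version (the tasks, buffer size, and 50/50 gradient mix are invented for illustration), a small buffer of stored task-A examples is mixed into every task-B update.

```python
import numpy as np

rng = np.random.default_rng(2)

xs = np.linspace(-1, 1, 50)
task_a_y = 2.0 * xs    # task A: y = 2x
task_b_y = -1.0 * xs   # task B: y = -x

# Small replay buffer: 10 stored task-A examples.
idx = rng.choice(len(xs), size=10, replace=False)
buf_x, buf_y = xs[idx], task_a_y[idx]

def grad(w, x, y):
    """Gradient of mean squared error for the 1-D model y = w * x."""
    return np.mean(2 * (w * x - y) * x)

w_plain = 2.0    # both runs start from a model already fit to task A
w_replay = 2.0
for _ in range(300):
    w_plain -= 0.1 * grad(w_plain, xs, task_b_y)
    # Replay: average the new-task gradient with a gradient on the buffer.
    w_replay -= 0.1 * (0.5 * grad(w_replay, xs, task_b_y)
                       + 0.5 * grad(w_replay, buf_x, buf_y))

def mse_a(w):
    return float(np.mean((w * xs - task_a_y) ** 2))
```

The replayed model ends up compromising between the two tasks instead of abandoning task A outright, so its task-A error stays below that of plain sequential fine-tuning.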
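Finally, a minimal sketch of the frozen-base-plus-adapter idea behind LoRA, stripped down to a single weight matrix. The dimensions, rank, fine-tuning target, and learning rate are all assumptions for the demo; real LoRA applies this per layer inside a transformer.

```python
import numpy as np

rng = np.random.default_rng(1)

d_in, d_out, rank = 8, 8, 2
W_base = rng.normal(size=(d_out, d_in))          # "pretrained" weight, frozen

# Low-rank adapter: effective weight is W_base + B @ A; only A and B train.
A = rng.normal(size=(rank, d_in)) / np.sqrt(d_in)
B = np.zeros((d_out, rank))                      # zero init: adapter starts as a no-op
W_base_snapshot = W_base.copy()

# Hypothetical fine-tuning target: outputs of a shifted weight matrix.
W_target = W_base + rng.normal(scale=0.5, size=(d_out, d_in))
X = rng.normal(size=(64, d_in))
Y = X @ W_target.T

def loss():
    return float(np.mean((X @ (W_base + B @ A).T - Y) ** 2))

loss_before = loss()
lr = 0.01
for _ in range(2000):
    err = X @ (W_base + B @ A).T - Y       # (64, d_out) residuals
    grad_W = err.T @ X / len(X)            # (scaled) gradient w.r.t. the effective weight
    B_new = B - lr * grad_W @ A.T          # chain rule through W = W_base + B @ A:
    A_new = A - lr * B.T @ grad_W          # updates flow only into the adapter
    A, B = A_new, B_new
loss_after = loss()
# W_base never receives an update; deleting the adapter restores the base model exactly.
```

Because the base weights are frozen, forgetting of the pretrained behaviour is impossible by construction: removing the adapter recovers the original model bit for bit.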
Relevance to modern AI
Catastrophic forgetting is a key reason why large language models are pre-trained once on enormous datasets and then carefully fine-tuned rather than continuously updated. The pre-training creates a robust knowledge base, and fine-tuning is done cautiously, often with techniques specifically designed to minimise forgetting.
Why This Matters
Catastrophic forgetting explains why AI models cannot simply be "updated" with new information the way a database can. Understanding this constraint helps you set realistic expectations for model customisation and appreciate why fine-tuning requires careful planning and evaluation.