Catastrophic Forgetting
A problem in neural networks where training on new data causes the model to lose knowledge it previously learned, effectively overwriting old skills with new ones.
Catastrophic forgetting is a well-known failure mode of neural networks: training a model on new information causes it to lose previously learned information. The gradient updates that encode the new task overwrite the weights that supported earlier capabilities, degrading performance on tasks the model previously handled well.
How forgetting happens
Neural networks store knowledge distributed across their weights. When you fine-tune a model on a new task, the training process adjusts these weights to perform well on the new data. But those same weights were responsible for the model's previous capabilities. Changing them to accommodate new knowledge can destroy old knowledge.
For example, if you fine-tune a general-purpose language model exclusively on medical texts, it may become excellent at medical questions but forget how to write poetry, summarise business documents, or solve maths problems.
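The mechanism can be seen even in a deliberately tiny model. The sketch below (an illustrative toy, not a real fine-tuning setup: the one-parameter linear model, tasks, learning rate, and step counts are all invented for demonstration) trains a model on task A, then fine-tunes it on task B alone, and measures how task-A error changes.

```python
import numpy as np

def train(w, xs, ys, lr=0.1, steps=200):
    """Gradient descent on mean squared error for the 1-D linear model y = w * x."""
    for _ in range(steps):
        grad = np.mean(2 * (w * xs - ys) * xs)
        w -= lr * grad
    return w

def mse(w, xs, ys):
    return float(np.mean((w * xs - ys) ** 2))

xs = np.linspace(-1, 1, 50)
task_a = 2.0 * xs    # task A: y = 2x
task_b = -1.0 * xs   # task B: y = -x

w = train(0.0, xs, task_a)
loss_a_before = mse(w, xs, task_a)   # near zero: the model has learned task A

w = train(w, xs, task_b)             # fine-tune on task B only
loss_a_after = mse(w, xs, task_a)    # task A error has climbed: knowledge overwritten
```

The single shared weight cannot satisfy both tasks, so training on task B drags it away from the task-A solution. Large networks have many weights, but the same tension applies wherever tasks rely on overlapping parameters.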
Why this is a fundamental challenge
Catastrophic forgetting is arguably one of the most important open problems in AI. Humans learn continuously without forgetting how to walk when they learn to ride a bicycle. Neural networks, by contrast, tend to be brittle, optimising for one objective at the expense of others.
This limitation has practical consequences:
- Fine-tuning trade-offs: Every time you specialise a model, you risk degrading its general capabilities.
- Continuous learning: Deploying models that learn from new data in production is risky because they may forget important baseline behaviours.
- Multi-task models: Building a single model that excels at many tasks is harder than it should be because training on one task can interfere with others.
Mitigation strategies
- Elastic Weight Consolidation (EWC): Identifies which weights are most important for previous tasks and penalises changes to those specific weights during new training.
- Progressive networks: Add new network capacity for new tasks while freezing the weights responsible for old tasks.
- Replay methods: Periodically retrain on examples from previous tasks alongside new data, reminding the model of what it learned before.
- LoRA and adapter methods: Instead of modifying the base model's weights, add small trainable modules on top. The base knowledge is preserved because the original weights are frozen.
- Multi-task training: Train on all tasks simultaneously from the start, rather than sequentially.
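To make the EWC idea concrete, here is a minimal numpy sketch. Everything about it is illustrative: the two linear regression tasks, the diagonal Fisher proxy (average squared per-example gradient), and the penalty strength `lam` are assumptions chosen for the demo, not values from the EWC paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two toy regression tasks sharing one weight vector (linear model y = X @ w).
# The tasks agree on the first two weights and disagree on the third.
X_a = rng.normal(size=(100, 3))
y_a = X_a @ np.array([1.0, 2.0, 0.0]) + 0.1 * rng.normal(size=100)
X_b = rng.normal(size=(100, 3))
y_b = X_b @ np.array([1.0, 2.0, 3.0])

def grad_mse(w, X, y):
    return 2 * X.T @ (X @ w - y) / len(y)

def mse(w, X, y):
    return float(np.mean((X @ w - y) ** 2))

# Train on task A first.
w = np.zeros(3)
for _ in range(500):
    w -= 0.05 * grad_mse(w, X_a, y_a)
w_a = w.copy()

# Diagonal Fisher proxy: average squared per-example gradient on task A.
fisher = np.mean([grad_mse(w_a, X_a[i:i + 1], y_a[i:i + 1]) ** 2
                  for i in range(len(y_a))], axis=0)

def finetune_on_b(lam):
    # EWC adds the penalty (lam / 2) * sum_i fisher_i * (w_i - w_a_i)^2,
    # whose gradient is lam * fisher * (w - w_a): weights that mattered
    # for task A are expensive to move.
    w = w_a.copy()
    for _ in range(500):
        w -= 0.05 * (grad_mse(w, X_b, y_b) + lam * fisher * (w - w_a))
    return w

mse_plain = mse(finetune_on_b(lam=0.0), X_a, y_a)   # unconstrained fine-tuning
mse_ewc = mse(finetune_on_b(lam=50.0), X_a, y_a)    # EWC-regularised fine-tuning
```

With the penalty active, fine-tuning on task B leaves task-A error substantially lower than unconstrained fine-tuning does, at the cost of a less exact fit to task B.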
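The replay idea can be sketched just as simply. In this toy version (the tasks, buffer size, and 50/50 gradient mix are invented for illustration), a small buffer of stored task-A examples is mixed into every task-B update.

```python
import numpy as np

rng = np.random.default_rng(2)

xs = np.linspace(-1, 1, 50)
task_a_y = 2.0 * xs    # task A: y = 2x
task_b_y = -1.0 * xs   # task B: y = -x

# Small replay buffer: 10 stored task-A examples.
idx = rng.choice(len(xs), size=10, replace=False)
buf_x, buf_y = xs[idx], task_a_y[idx]

def grad(w, x, y):
    """Gradient of mean squared error for the 1-D model y = w * x."""
    return np.mean(2 * (w * x - y) * x)

w_plain = 2.0    # both runs start from a model already fit to task A
w_replay = 2.0
for _ in range(300):
    w_plain -= 0.1 * grad(w_plain, xs, task_b_y)
    # Replay: average the new-task gradient with a gradient on the buffer.
    w_replay -= 0.1 * (0.5 * grad(w_replay, xs, task_b_y)
                       + 0.5 * grad(w_replay, buf_x, buf_y))

def mse_a(w):
    return float(np.mean((w * xs - task_a_y) ** 2))
```

The replayed model ends up compromising between the two tasks instead of abandoning task A outright, so its task-A error stays below that of plain sequential fine-tuning.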
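Finally, a minimal sketch of the frozen-base-plus-adapter idea behind LoRA, stripped down to a single weight matrix. The dimensions, rank, fine-tuning target, and learning rate are all assumptions for the demo; real LoRA applies this per layer inside a transformer.

```python
import numpy as np

rng = np.random.default_rng(1)

d_in, d_out, rank = 8, 8, 2
W_base = rng.normal(size=(d_out, d_in))          # "pretrained" weight, frozen

# Low-rank adapter: effective weight is W_base + B @ A; only A and B train.
A = rng.normal(size=(rank, d_in)) / np.sqrt(d_in)
B = np.zeros((d_out, rank))                      # zero init: adapter starts as a no-op
W_base_snapshot = W_base.copy()

# Hypothetical fine-tuning target: outputs of a shifted weight matrix.
W_target = W_base + rng.normal(scale=0.5, size=(d_out, d_in))
X = rng.normal(size=(64, d_in))
Y = X @ W_target.T

def loss():
    return float(np.mean((X @ (W_base + B @ A).T - Y) ** 2))

loss_before = loss()
lr = 0.01
for _ in range(2000):
    err = X @ (W_base + B @ A).T - Y       # (64, d_out) residuals
    grad_W = err.T @ X / len(X)            # (scaled) gradient w.r.t. the effective weight
    B_new = B - lr * grad_W @ A.T          # chain rule through W = W_base + B @ A:
    A_new = A - lr * B.T @ grad_W          # updates flow only into the adapter
    A, B = A_new, B_new
loss_after = loss()
# W_base never receives an update; deleting the adapter restores the base model exactly.
```

Because the base weights are frozen, forgetting of the pretrained behaviour is impossible by construction: removing the adapter recovers the original model bit for bit.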
Relevance to modern AI
Catastrophic forgetting is a key reason why large language models are pre-trained once on enormous datasets and then carefully fine-tuned rather than continuously updated. The pre-training creates a robust knowledge base, and fine-tuning is done cautiously, often with techniques specifically designed to minimise forgetting.
Why This Matters
Catastrophic forgetting explains why AI models cannot simply be "updated" with new information the way a database can. Understanding this constraint helps you set realistic expectations for model customisation and appreciate why fine-tuning requires careful planning and evaluation.