Entity Extraction

Last reviewed: April 2026

An NLP technique that automatically identifies and classifies named entities (people, organisations, locations, dates, amounts) in unstructured text.

Entity extraction, also called named entity recognition (NER), locates specific pieces of information in text and assigns each one a category, such as person name, company name, location, date, monetary amount, or product name.

How entity extraction works

Given a sentence like "Apple announced a $3 billion investment in Munich on Tuesday," entity extraction identifies:

"Apple" → Organisation
"$3 billion" → Money
"Munich" → Location
"Tuesday" → Date

Modern entity extraction typically relies on pre-trained language models that use context to resolve ambiguous words. In this sentence, the surrounding words "announced" and "investment" signal that "Apple" is a company, not a fruit.
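
To make the output concrete, here is a minimal sketch of the structured result an extractor might return for the sentence above. It is a toy lookup-and-regex extractor, not a pre-trained model, so it cannot use context the way a real model does. The Entity class, the GAZETTEER table, and the patterns are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    text: str
    label: str
    start: int  # character offset where the span begins
    end: int    # character offset just past the span


# Hypothetical lookup table; a real model infers labels from context instead.
GAZETTEER = {"Apple": "Organisation", "Munich": "Location"}

# Hypothetical patterns for dollar amounts and weekday names.
PATTERNS = [
    (re.compile(r"\$\d+(?:\.\d+)?(?: (?:million|billion))?"), "Money"),
    (re.compile(r"\b(?:Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)day\b"), "Date"),
]


def extract_entities(text: str) -> list[Entity]:
    """Return labelled spans sorted by their position in the text."""
    found = []
    for name, label in GAZETTEER.items():
        for m in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            found.append(Entity(m.group(), label, m.start(), m.end()))
    for pattern, label in PATTERNS:
        for m in pattern.finditer(text):
            found.append(Entity(m.group(), label, m.start(), m.end()))
    return sorted(found, key=lambda e: e.start)


sentence = "Apple announced a $3 billion investment in Munich on Tuesday."
for ent in extract_entities(sentence):
    print(f"{ent.text!r} -> {ent.label} [{ent.start}:{ent.end}]")
```

Keeping character offsets alongside each label lets downstream code highlight, redact, or link the exact span in the original text.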

Common entity types

Person — individual names
Organisation — companies, institutions, agencies
Location — cities, countries, addresses
Date/Time — dates, times, durations
Money — monetary values and currencies
Product — specific products or services
Custom entities — industry-specific types (drug names, legal citations, part numbers)
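
When a custom entity type follows a rigid format, a pattern-based matcher is sometimes enough before reaching for a trained model. The sketch below assumes a made-up part number format ("PN-" plus five digits) and a tiny illustrative drug list. Neither comes from a real catalogue.

```python
import re

# Hypothetical formats: part numbers like "PN-48213" and a small drug-name list.
CUSTOM_PATTERNS = {
    "PartNumber": re.compile(r"\bPN-\d{5}\b"),
    "Drug": re.compile(r"\b(?:ibuprofen|metformin|amoxicillin)\b", re.IGNORECASE),
}


def extract_custom(text: str) -> list[tuple[str, str]]:
    """Return (matched text, entity type) pairs in order of appearance."""
    hits = []
    for label, pattern in CUSTOM_PATTERNS.items():
        for m in pattern.finditer(text):
            hits.append((m.start(), m.group(), label))
    return [(value, label) for _, value, label in sorted(hits)]


print(extract_custom("Replace PN-48213 before shipping."))
print(extract_custom("The patient takes Metformin daily."))
```

Patterns like these miss misspellings and unseen names, which is where custom training data becomes necessary.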

Business applications

Document processing — automatically extracting key information from contracts, invoices, and reports
Customer support — identifying product names, order numbers, and issue types from support tickets
Compliance — scanning documents for personally identifiable information (PII)
Media monitoring — tracking mentions of your company, competitors, or executives
Research — extracting relationships between entities from scientific or financial literature
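
As an illustration of the compliance use case, the following sketch masks two kinds of PII with regular expressions. The patterns are deliberately simplified assumptions and will miss many real-world formats. A production scanner would usually combine such rules with a trained model.

```python
import re

# Simplified, illustrative patterns; real PII formats vary far more widely.
PII_PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def redact_pii(text: str) -> str:
    """Replace each detected PII span with a bracketed type label."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


print(redact_pii("Contact jane.doe@example.com or +44 20 7946 0958 for details."))
```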

Approaches

Pre-trained NER models: spaCy pipelines and Hugging Face models that work out of the box for common entity types
LLM-based extraction: prompting a large language model to return the entities in a text; more flexible for custom types, but typically slower and more expensive per document
Fine-tuned models: training a model on your own entity types and domain language, usually the most accurate option when enough labelled data is available
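
The LLM-based approach can be sketched without committing to a particular provider. In the example below, call_llm is a hypothetical callable that takes a prompt and returns the model's raw text reply, and fake_llm stands in for a real API call. The prompt wording and the validation rules are assumptions, not a recommended standard.

```python
import json
from typing import Callable

ALLOWED_TYPES = {"Person", "Organisation", "Location", "Date", "Money", "Product"}


def build_prompt(text: str) -> str:
    types = ", ".join(sorted(ALLOWED_TYPES))
    return (
        "Extract named entities from the text below. "
        f"Allowed types: {types}. "
        'Reply with only a JSON list of objects with keys "text" and "type".\n\n'
        f"Text: {text}"
    )


def extract_with_llm(text: str, call_llm: Callable[[str], str]) -> list[dict]:
    """Ask the model for entities, then keep only replies that can be verified."""
    raw = call_llm(build_prompt(text))
    try:
        candidates = json.loads(raw)
    except json.JSONDecodeError:
        return []  # the model did not return valid JSON
    if not isinstance(candidates, list):
        return []
    valid = []
    for item in candidates:
        if not isinstance(item, dict):
            continue
        value, label = item.get("text"), item.get("type")
        # Drop unknown types and any entity text that is not in the input.
        if (isinstance(value, str) and value and value in text
                and isinstance(label, str) and label in ALLOWED_TYPES):
            valid.append({"text": value, "type": label})
    return valid


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call; returns a canned reply."""
    return json.dumps([
        {"text": "Apple", "type": "Organisation"},
        {"text": "Munich", "type": "Location"},
        {"text": "Berlin", "type": "Location"},  # not in the input, so filtered out
    ])


sentence = "Apple announced a $3 billion investment in Munich on Tuesday."
print(extract_with_llm(sentence, fake_llm))
```

Checking that each returned string actually occurs in the source text is a cheap guard against entities the model made up.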

Challenges

Ambiguity ("Washington" — person, city, or state?)
Nested entities ("Bank of America" contains "America")
Domain-specific entities require custom training data
Performance varies across languages and domains
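
One common way to handle nested or overlapping candidates is to keep the longest span and discard anything that overlaps it. The sketch below shows that policy. It is only one option, and some applications need to keep both the outer and the inner entity. The function name and span format are assumptions.

```python
def resolve_overlaps(spans):
    """Keep the longest spans and drop any span that overlaps a kept one.

    Each span is a (start, end, label) tuple with an exclusive end offset.
    """
    kept = []
    # Visit longer spans first so they win over the spans nested inside them.
    for span in sorted(spans, key=lambda s: (-(s[1] - s[0]), s[0])):
        start, end, _ = span
        if all(end <= k[0] or start >= k[1] for k in kept):
            kept.append(span)
    return sorted(kept)


text = "Bank of America opened an office in America."
outer = (0, len("Bank of America"), "Organisation")
inner_start = text.index("America")    # the "America" inside the bank's name
inner = (inner_start, inner_start + len("America"), "Location")
later_start = text.rindex("America")   # the standalone "America" at the end
later = (later_start, later_start + len("America"), "Location")

for start, end, label in resolve_overlaps([inner, later, outer]):
    print(text[start:end], "->", label)
```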

Why This Matters

Entity extraction turns unstructured text into structured data that your systems can act on. It automates data entry, powers intelligent document processing, and enables analytics on text-heavy business data. For organisations drowning in documents, emails, and reports, it is often among the highest-ROI NLP applications.

Related Terms

Natural Language Processing (NLP)

The branch of AI focused on enabling computers to understand, interpret, and generate human language in useful ways.

Large Language Model (LLM)

A type of AI trained on vast amounts of text to understand and generate human language. ChatGPT, Claude, and Gemini are all built on LLMs.

Machine Learning (ML)

A type of AI where systems learn patterns from data instead of following explicitly programmed rules. The system improves its performance through experience.

Named Entity Recognition (NER)

An NLP task that identifies and classifies proper nouns and specific terms in text into predefined categories like person, organisation, location, and date.
