Practical

Entity Extraction

Last reviewed: April 2026

An NLP technique that automatically identifies and classifies named entities (people, organisations, locations, dates, amounts) in unstructured text.

Entity extraction, also called named entity recognition (NER), finds the spans of text that name specific things, such as person names, company names, locations, dates, monetary amounts, and product names, and assigns each span a category.

How entity extraction works

Given a sentence like "Apple announced a $3 billion investment in Munich on Tuesday," entity extraction identifies:

  • "Apple" → Organisation
  • "$3 billion" → Money
  • "Munich" → Location
  • "Tuesday" → Date

Modern entity extraction uses pre-trained language models that take the surrounding context into account. The model labels "Apple" as a company here (not a fruit) because nearby words such as "announced" and "investment" make that reading far more likely.
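
To make the input and output shape concrete, here is a toy sketch in Python that reproduces the result for the example sentence. It is not how a modern model works: it uses small lookup tables and a regular expression (all of them assumptions made up for this illustration), so it cannot tell a company from a fruit the way a context-aware model can.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    text: str   # the exact span found in the input
    label: str  # the category assigned to it
    start: int  # character offset where the span begins
    end: int    # character offset just past the span


# Hypothetical lookup tables for this illustration only.
ORGANISATIONS = {"Apple"}
LOCATIONS = {"Munich"}
WEEKDAYS = {"Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"}

# Matches amounts such as "$3 billion" or "$12.50".
MONEY = re.compile(r"\$\d+(?:\.\d+)?(?: (?:thousand|million|billion))?")


def extract_entities(text):
    """Return the entities found in text, ordered by position."""
    entities = [Entity(m.group(), "Money", m.start(), m.end())
                for m in MONEY.finditer(text)]
    # Look up every capitalised word in the tables above.
    for m in re.finditer(r"\b[A-Z][a-z]+\b", text):
        word = m.group()
        if word in ORGANISATIONS:
            label = "Organisation"
        elif word in LOCATIONS:
            label = "Location"
        elif word in WEEKDAYS:
            label = "Date"
        else:
            continue
        entities.append(Entity(word, label, m.start(), m.end()))
    return sorted(entities, key=lambda e: e.start)


sentence = "Apple announced a $3 billion investment in Munich on Tuesday."
for entity in extract_entities(sentence):
    print(f"{entity.text!r} -> {entity.label}")
```

Keeping the character offsets alongside the label is a common design choice, because downstream systems often need to highlight or replace the exact span.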

Common entity types

  • Person — individual names
  • Organisation — companies, institutions, agencies
  • Location — cities, countries, addresses
  • Date/Time — dates, times, durations
  • Money — monetary values and currencies
  • Product — specific products or services
  • Custom entities — industry-specific types (drug names, legal citations, part numbers)

Business applications

  • Document processing — automatically extracting key information from contracts, invoices, and reports
  • Customer support — identifying product names, order numbers, and issue types from support tickets
  • Compliance — scanning documents for personally identifiable information (PII)
  • Media monitoring — tracking mentions of your company, competitors, or executives
  • Research — extracting relationships between entities from scientific or financial literature
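
As a rough illustration of the compliance use case, the sketch below scans text for two pattern-shaped kinds of PII (email addresses and phone numbers) and redacts them. The patterns and the names `PII_PATTERNS`, `find_pii`, and `redact` are assumptions for this example. Regular expressions only catch PII that follows a fixed shape, so names and addresses would still need an NER model.

```python
import re

# Deliberately simple patterns; real PII rules are broader and locale-specific.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}


def find_pii(text):
    """Return (label, matched_text, start, end) tuples ordered by position."""
    hits = []
    for label, pattern in PII_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((label, match.group(), match.start(), match.end()))
    return sorted(hits, key=lambda hit: hit[2])


def redact(text):
    """Replace each hit with its label. Assumes hits do not overlap."""
    # Work right to left so earlier offsets stay valid after each replacement.
    for label, _, start, end in reversed(find_pii(text)):
        text = text[:start] + f"[{label}]" + text[end:]
    return text


message = "Contact jane.doe@example.com or +44 20 7946 0958 about order 12345."
print(redact(message))
```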

Approaches

  • Pre-trained NER models — models from libraries such as spaCy and Hugging Face that work out of the box for common entity types
  • LLM-based extraction — prompting a large language model to extract entities from text; more flexible for custom types but typically slower
  • Fine-tuned models — training a model on your own entity types and domain language; usually the most accurate option when enough labelled examples are available
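
The LLM-based approach can be sketched as a prompt plus a defensive parser. In the sketch below, `call_llm` is a hypothetical stand-in for whichever client you use (any function that takes a prompt string and returns the model's reply as a string), and the prompt wording and JSON shape are assumptions, not a standard. The parser discards anything that is malformed, uses an unknown type, or does not literally appear in the source text, which guards against invented entities.

```python
import json

ENTITY_TYPES = ["Person", "Organisation", "Location", "Date", "Money", "Product"]


def build_prompt(text, entity_types=ENTITY_TYPES):
    """Build an extraction prompt that asks for a JSON array only."""
    return (
        "Extract the named entities from the text below.\n"
        f"Allowed types: {', '.join(entity_types)}.\n"
        'Reply with only a JSON array of objects with the keys "text" and "type".\n\n'
        f"Text: {text}"
    )


def parse_response(raw, source_text, entity_types=ENTITY_TYPES):
    """Keep only well-formed entities that literally occur in source_text."""
    try:
        items = json.loads(raw)
    except json.JSONDecodeError:
        return []
    if not isinstance(items, list):
        return []
    entities = []
    for item in items:
        if not isinstance(item, dict):
            continue
        text, label = item.get("text"), item.get("type")
        if (isinstance(text, str) and text
                and text in source_text and label in entity_types):
            entities.append((text, label))
    return entities


def extract_with_llm(text, call_llm):
    """call_llm is a hypothetical callable: prompt string in, reply string out."""
    return parse_response(call_llm(build_prompt(text)), text)
```

Adding a custom type here only means appending a name to `ENTITY_TYPES`, which is the flexibility the LLM approach offers over retraining a model.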

Challenges

  • Ambiguity ("Washington" — person, city, or state?)
  • Nested entities ("Bank of America" contains "America")
  • Domain-specific entities require custom training data
  • Performance varies across languages and domains
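
One common way to handle nested entities is to flatten them by keeping the longest span and dropping any span that overlaps it. The sketch below assumes spans are `(start, end, label)` tuples with an exclusive `end`; the function name `resolve_overlaps` is made up for this example. Some applications need the inner entities as well, in which case flattening is the wrong choice.

```python
def resolve_overlaps(spans):
    """Keep the longest spans and drop any span that overlaps a kept one.

    spans: iterable of (start, end, label) tuples, end exclusive.
    Returns the surviving spans ordered by position.
    """
    # Consider longer spans first; break ties by earlier start.
    by_length = sorted(spans, key=lambda s: (-(s[1] - s[0]), s[0]))
    kept = []
    for start, end, label in by_length:
        # A span survives only if it lies entirely before or after every kept span.
        if all(end <= k_start or start >= k_end for k_start, k_end, _ in kept):
            kept.append((start, end, label))
    return sorted(kept)


text = "Bank of America opened an office in America."
inner = text.index("America")    # "America" inside "Bank of America"
later = text.rindex("America")   # the standalone mention
spans = [
    (0, len("Bank of America"), "Organisation"),
    (inner, inner + len("America"), "Location"),
    (later, later + len("America"), "Location"),
]
for start, end, label in resolve_overlaps(spans):
    print(text[start:end], "->", label)
```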

Why This Matters

Entity extraction turns unstructured text into structured data that your systems can act on. It automates data entry, powers intelligent document processing, and enables analytics on text-heavy business data. For organisations that handle large volumes of documents, emails, and reports, it is often one of the highest-return NLP applications to start with.
