Entity Extraction
An NLP technique that automatically identifies and classifies named entities — people, organisations, locations, dates, amounts — from unstructured text.
Entity extraction (also called named entity recognition or NER) is the process of automatically identifying and categorising specific pieces of information in text — such as person names, company names, locations, dates, monetary amounts, and product names.
How entity extraction works
Given a sentence like "Apple announced a $3 billion investment in Munich on Tuesday," entity extraction identifies:
- "Apple" → Organisation
- "$3 billion" → Money
- "Munich" → Location
- "Tuesday" → Date
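To make the input and output concrete, here is a minimal rule-based sketch that reproduces the four labels above for this one sentence. The lookup list, weekday list, and money pattern are illustrative assumptions. Unlike the context-aware models described next, it only matches surface patterns, so it would label every "Apple" as an organisation.

```python
import re

# Illustrative lookup list for this one sentence; not a real model or dataset.
GAZETTEER = {"Apple": "Organisation", "Munich": "Location"}
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# Money: a currency symbol, a number, and an optional scale word such as "billion".
MONEY_RE = re.compile(r"[$€£]\d+(?:\.\d+)?(?: (?:thousand|million|billion))?")
DATE_RE = re.compile(r"\b(?:" + "|".join(WEEKDAYS) + r")\b")


def extract_entities(text):
    """Return (surface text, label, start, end) tuples in document order."""
    entities = []
    for name, label in GAZETTEER.items():
        for m in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            entities.append((m.group(), label, m.start(), m.end()))
    for pattern, label in ((MONEY_RE, "Money"), (DATE_RE, "Date")):
        for m in pattern.finditer(text):
            entities.append((m.group(), label, m.start(), m.end()))
    return sorted(entities, key=lambda e: e[2])


sentence = "Apple announced a $3 billion investment in Munich on Tuesday."
for surface, label, start, end in extract_entities(sentence):
    print(f'"{surface}" -> {label} [{start}:{end}]')
```

Each result carries character offsets as well as a label, which is what lets downstream systems highlight, link, or redact the exact span.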
Modern entity extraction uses pre-trained language models that take context into account. The model infers that "Apple" is a company here (not a fruit) from the surrounding words "announced" and "investment."
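Language models learn these contextual associations from training data rather than from hand-written rules. Purely to illustrate the idea, the toy below scores each "apple" mention by counting hypothetical cue words within a few tokens on either side; the cue lists and the `window` size are assumptions, not how any real model is configured.

```python
# Hypothetical cue words; a trained model learns such associations from data.
ORG_CUES = {"announced", "investment", "shares", "ceo", "acquired"}
FRUIT_CUES = {"ate", "pie", "juice", "ripe", "orchard"}


def classify_apple(sentence, window=5):
    """Label each "apple" mention by counting cue words within `window` tokens either side."""
    tokens = [t.strip(".,;:!?\"").lower() for t in sentence.split()]
    labels = []
    for i, token in enumerate(tokens):
        if token != "apple":
            continue
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        org_score = sum(t in ORG_CUES for t in context)
        fruit_score = sum(t in FRUIT_CUES for t in context)
        if org_score > fruit_score:
            labels.append("Organisation")
        elif fruit_score > org_score:
            labels.append("Not an entity")
        else:
            labels.append("Unknown")
    return labels


print(classify_apple("Apple announced a $3 billion investment in Munich on Tuesday."))
print(classify_apple("She ate an apple pie."))
```

A fixed cue list breaks as soon as the wording changes, which is exactly why learned, context-aware models replaced this kind of heuristic.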
Common entity types
- Person — individual names
- Organisation — companies, institutions, agencies
- Location — cities, countries, addresses
- Date/Time — dates, times, durations
- Money — monetary values and currencies
- Product — specific products or services
- Custom entities — industry-specific types (drug names, legal citations, part numbers)
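Custom entity types are often bootstrapped with patterns or lookup lists before any model is trained. The sketch below assumes a hypothetical part-number format (`PN-` followed by five digits) and a three-item drug list; both stand in for whatever identifiers your own domain uses.

```python
import re

# Hypothetical custom entity types: the part-number format and the drug list are
# placeholders for whatever identifiers your own domain uses.
CUSTOM_PATTERNS = {
    "PartNumber": re.compile(r"\bPN-\d{5}\b"),
    "Drug": re.compile(r"\b(?:ibuprofen|metformin|amoxicillin)\b", re.IGNORECASE),
}


def extract_custom(text):
    """Return (surface text, label) pairs for every pattern match, in document order."""
    hits = []
    for label, pattern in CUSTOM_PATTERNS.items():
        for m in pattern.finditer(text):
            hits.append((m.start(), m.group(), label))
    return [(surface, label) for _, surface, label in sorted(hits)]


print(extract_custom("Order PN-40213 shipped with the ibuprofen samples."))
```

Patterns like these are also a cheap way to pre-label training data for the fine-tuned approach described under Approaches.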
Business applications
- Document processing — automatically extracting key information from contracts, invoices, and reports
- Customer support — identifying product names, order numbers, and issue types from support tickets
- Compliance — scanning documents for personally identifiable information (PII)
- Media monitoring — tracking mentions of your company, competitors, or executives
- Research — extracting relationships between entities from scientific or financial literature
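For the compliance use case, detection is usually followed by redaction. A minimal sketch, assuming spans arrive as non-overlapping `(start, end, label)` character offsets from any extractor: the e-mail regex is a simplified assumption, and the person span is located by hand where an NER model would normally supply it.

```python
import re

# Simplified e-mail pattern (an assumption); production PII scanners use stricter rules.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def redact(text, spans):
    """Replace each non-overlapping (start, end, label) span with a [LABEL] placeholder.

    Spans are applied right to left so earlier offsets remain valid after each edit.
    """
    for start, end, label in sorted(spans, reverse=True):
        text = text[:start] + f"[{label.upper()}]" + text[end:]
    return text


msg = "Contact Jane Doe at jane.doe@example.com about the audit."
spans = [(m.start(), m.end(), "email") for m in EMAIL_RE.finditer(msg)]

# In practice a person span would come from an NER model; here it is located by hand.
person_start = msg.index("Jane Doe")
spans.append((person_start, person_start + len("Jane Doe"), "person"))

print(redact(msg, spans))
```

Working right to left avoids recomputing offsets after each replacement, since a placeholder rarely has the same length as the text it replaces.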
Approaches
- Pre-trained NER models — spaCy pipelines and Hugging Face models that work out of the box for common entity types
- LLM-based extraction — prompting a large language model to return the entities in a text; more flexible for custom types, but slower
- Fine-tuned models — training a model on your specific entity types and domain language, usually the most accurate option when labelled data is available
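Pre-trained and fine-tuned NER models are commonly trained as token classifiers that emit one BIO tag per token (B- begins an entity, I- continues it, O is outside any entity). A minimal decoder that groups those tags back into entities might look like this; the whitespace-split tokens and the CoNLL-style label names (`ORG`, `LOC`, and so on) are assumptions for the example.

```python
def bio_to_entities(tokens, tags):
    """Group per-token BIO tags into (entity text, label) pairs.

    B-X begins an entity of type X, I-X continues one, and O is outside any entity.
    A stray I-X that does not continue an X entity starts a new entity (a lenient choice).
    """
    entities = []
    span_tokens, span_label = [], None
    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        # Close the open entity unless this tag continues it with the same label.
        if span_tokens and not (prefix == "I" and label == span_label):
            entities.append((" ".join(span_tokens), span_label))
            span_tokens, span_label = [], None
        if prefix in ("B", "I") and not span_tokens:
            span_tokens, span_label = [token], label
        elif prefix == "I":
            span_tokens.append(token)
    if span_tokens:
        entities.append((" ".join(span_tokens), span_label))
    return entities


tokens = ["Apple", "announced", "a", "$3", "billion", "investment", "in", "Munich", "on", "Tuesday"]
tags = ["B-ORG", "O", "O", "B-MONEY", "I-MONEY", "O", "O", "B-LOC", "O", "B-DATE"]
print(bio_to_entities(tokens, tags))
```

Treating a stray I- tag as the start of a new entity is one convention; stricter decoders discard it instead.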
Challenges
- Ambiguity ("Washington" — person, city, or state?)
- Nested entities ("Bank of America" contains "America")
- Domain-specific entities require custom training data
- Performance varies across languages and domains
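When a flat entity list is required, nested candidates such as "Bank of America" and "America" have to be reduced so each character belongs to at most one entity. One common heuristic, similar in spirit to spaCy's `util.filter_spans`, keeps the longest span and drops anything overlapping it. The sketch below applies that rule to hand-written candidate spans (an assumption; in practice they would come from patterns or a model). It handles nesting only and does not resolve the "Washington" ambiguity, which still depends on context.

```python
def resolve_overlaps(candidates):
    """Keep a non-overlapping subset of (start, end, label) spans, preferring longer spans.

    Ties in length go to the span that starts first. Offsets are half-open: [start, end).
    """
    by_priority = sorted(candidates, key=lambda s: (-(s[1] - s[0]), s[0]))
    kept = []
    for start, end, label in by_priority:
        # Two half-open spans are disjoint if one ends at or before the other starts.
        if all(end <= k_start or start >= k_end for k_start, k_end, _ in kept):
            kept.append((start, end, label))
    return sorted(kept)


text = "She works at Bank of America in Washington."


def span(phrase, label):
    """Build a (start, end, label) candidate from the first occurrence of `phrase`."""
    start = text.index(phrase)
    return (start, start + len(phrase), label)


candidates = [
    span("Bank of America", "Organisation"),
    span("America", "Location"),  # nested inside the organisation name
    span("Washington", "Location"),
]
for start, end, label in resolve_overlaps(candidates):
    print(text[start:end], "->", label)
```

Longest-match is a simple default; tasks that need the nested structure itself use dedicated nested-NER models rather than discarding the inner span.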
Why This Matters
Entity extraction turns unstructured text into structured data that your systems can act on. It automates data entry, powers intelligent document processing, and enables analytics on text-heavy business data. For organisations drowning in documents, emails, and reports, entity extraction is often one of the highest-ROI NLP applications.
This topic is covered in our lesson: Building Your First AI Workflow