Optical Character Recognition (OCR)

Last reviewed: April 2026

Technology that converts images of text — scanned documents, photos, PDFs — into machine-readable text that software can search and process.

Optical character recognition (OCR) converts images containing text into text data that computers can search, edit, and process. When you scan a paper document and it becomes a searchable PDF, OCR is doing the work.

How OCR works

Traditional OCR follows four steps:

Image preprocessing: The image is cleaned up — correcting skew, removing noise, adjusting contrast.
Text detection: Regions of the image containing text are identified and isolated.
Character recognition: Each character is identified by comparing its shape against known character patterns.
Post-processing: Spell checking and language models correct likely recognition errors.
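
The steps above can be sketched in miniature. This is a toy illustration, not how any particular OCR engine is implemented: the 3x5 bitmap templates, the function names, and the sample vocabulary are all assumptions made for the example, and the text-detection step is skipped by assuming the glyphs have already been isolated.

```python
import difflib

# Hypothetical 3x5 bitmap templates ("#" = ink). A real system would cover
# a full character set, or use a trained classifier instead of templates.
TEMPLATES = {
    "C": ("###", "#..", "#..", "#..", "###"),
    "A": (".#.", "#.#", "###", "#.#", "#.#"),
    "T": ("###", ".#.", ".#.", ".#.", ".#."),
    "O": ("###", "#.#", "#.#", "#.#", "###"),
}


def binarize(gray, threshold=128):
    """Preprocessing: map grayscale pixels (0-255) to ink '#' or paper '.'."""
    return tuple(
        "".join("#" if px < threshold else "." for px in row) for row in gray
    )


def hamming(a, b):
    """Count the pixels that differ between two same-sized bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def recognize_glyph(glyph):
    """Character recognition: pick the template with the fewest differing pixels."""
    return min(TEMPLATES, key=lambda ch: hamming(glyph, TEMPLATES[ch]))


def correct_word(word, vocabulary):
    """Post-processing: snap a word to its closest vocabulary entry, if any."""
    matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=0.6)
    return matches[0] if matches else word


# A scanned "C" with one stray dark pixel in the middle row.
scan = [
    [20, 30, 10],
    [15, 240, 250],
    [25, 230, 90],
    [10, 245, 235],
    [30, 20, 40],
]
glyphs = [binarize(scan), TEMPLATES["A"], TEMPLATES["T"]]
print("".join(recognize_glyph(g) for g in glyphs))  # CAT
print(correct_word("invoce", ["invoice", "income", "voice"]))  # invoice
```

The stray pixel does not change the result because the noisy glyph is still closer to the "C" template than to any other, which is the basic idea behind comparing shapes against known patterns.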

Modern OCR systems built on deep learning collapse much of this pipeline. Instead of matching characters one at a time, a neural network takes the image (often after a separate text-detection stage) and outputs text directly. These systems handle messy handwriting, unusual fonts, complex layouts, and degraded images far better than traditional approaches.

OCR in the AI era

Combining OCR with large language models has created a new category of document AI. Multimodal models such as Claude and GPT-4 can read documents with complex layouts (tables, charts, multi-column text, forms) and go beyond extracting the text to interpreting its meaning and structure.

This means you can now:

Upload a financial report and ask questions about the data in its tables.
Process invoices and extract structured data (vendor, amount, date, line items) automatically.
Digitise handwritten notes and meeting minutes.
Convert whiteboard photos into structured action items.
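
To make "structured data" concrete, here is a minimal sketch of the invoice case. It uses a simple rule-based parser as a stand-in for a multimodal model, which would cope with far more varied layouts. The sample text, the field labels, and the Invoice class are all hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass
from datetime import date, datetime
from decimal import Decimal


@dataclass
class Invoice:
    vendor: str
    invoice_date: date
    total: Decimal


# Hypothetical OCR output; real invoices vary far more in layout.
SAMPLE_TEXT = """\
Vendor: Acme Supplies Ltd
Invoice date: 2024-03-15
Widget x 4      40.00
Shipping         5.50
Total: 45.50
"""


def parse_invoice(text: str) -> Invoice:
    """Pull labelled fields out of OCR text into a typed record."""

    def field(label: str) -> str:
        # Match "<label>: <value>" on a line of its own, ignoring case.
        pattern = rf"^{re.escape(label)}:\s*(.+?)\s*$"
        match = re.search(pattern, text, re.MULTILINE | re.IGNORECASE)
        if match is None:
            raise ValueError(f"missing field: {label}")
        return match.group(1)

    return Invoice(
        vendor=field("Vendor"),
        invoice_date=datetime.strptime(field("Invoice date"), "%Y-%m-%d").date(),
        total=Decimal(field("Total").replace(",", "")),
    )


invoice = parse_invoice(SAMPLE_TEXT)
print(invoice.vendor, invoice.invoice_date, invoice.total)
```

Whatever does the extraction, the useful output is the same: typed fields (a date, a decimal amount) that downstream systems can validate and store, rather than a block of raw text.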

Business applications

Invoice processing: Automating accounts payable by extracting data from diverse invoice formats.
Contract analysis: Making scanned legal documents searchable and analysable.
Healthcare records: Digitising patient records and clinical notes.
Compliance: Processing paper-based regulatory filings.
Archival: Making historical document collections searchable.

Accuracy considerations

OCR accuracy depends heavily on image quality, text complexity, and language. Clean printed text in common languages can reach 99% or higher accuracy, a figure usually reported per character. Handwritten text, degraded documents, and unusual scripts score lower and may require specialised models.
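
One common way to put a number on accuracy is the character error rate (CER): the edit distance between the OCR output and a trusted reference transcription, divided by the length of the reference. The sketch below assumes that metric; the function names and sample strings are illustrative, and other metrics (such as word error rate) are also used.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: fewest single-character edits between two strings."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_ch in enumerate(reference, start=1):
        current = [i]
        for j, hyp_ch in enumerate(hypothesis, start=1):
            current.append(min(
                previous[j] + 1,                       # delete ref_ch
                current[j - 1] + 1,                    # insert hyp_ch
                previous[j - 1] + (ref_ch != hyp_ch),  # substitute or match
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edits needed to fix the OCR output, per character of the reference."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Two typical confusions: "l" read as "1" and "0" read as "O".
reference = "Total due: 100.00"
ocr_output = "Tota1 due: 1O0.00"
cer = character_error_rate(reference, ocr_output)
print(f"CER: {cer:.3f}, accuracy: {1 - cer:.1%}")  # 2 edits over 17 characters
```

Read this way, 99% accuracy means a CER of 0.01, or roughly 20 wrong characters on a 2,000-character page. That is still enough to corrupt amounts, dates, and names, so extracted values often need validation.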

Why This Matters

OCR bridges the gap between paper-based processes and digital AI capabilities. Many organisations sit on vast stores of information locked in scanned documents, images, and PDFs. Understanding modern OCR capabilities reveals opportunities to unlock this data for AI-powered analysis and automation.

Related Terms

Computer Vision

The field of AI that enables machines to interpret and understand visual information from images and videos, including object recognition, scene understanding, and visual analysis.

Multimodal AI

AI systems that can process and generate multiple types of content — text, images, audio, video — rather than just text alone.

Unstructured Data

Data that does not follow a predefined format — emails, documents, images, videos, and conversations — which AI can now analyse and extract value from.

Automation

Using technology to perform tasks without manual human effort. AI automation goes beyond traditional rule-based automation by handling unstructured tasks like writing, analysis, and decision-making.
