Optical Character Recognition (OCR)
Technology that converts images of text — scanned documents, photos, PDFs — into machine-readable text that software can search and process.
Optical character recognition (OCR) is the technology that converts images containing text into actual text data that computers can search, edit, and process. When you scan a paper document and it becomes a searchable PDF, OCR is doing the work.
How OCR works
Traditional OCR follows several steps:
- Image preprocessing: The image is cleaned up — correcting skew, removing noise, adjusting contrast.
- Text detection: Regions of the image containing text are identified and isolated.
- Character recognition: Each character is identified by comparing its shape against known character patterns.
- Post-processing: Spell checking and language models correct likely recognition errors.
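The traditional steps above can be illustrated with a toy sketch. It is not how any particular OCR engine is implemented: the 3x5 bitmaps in `TEMPLATES`, the helper names, and the threshold of 128 are all hypothetical, and the text-detection step is skipped by assuming each character has already been cut out into its own small bitmap.

```python
from difflib import get_close_matches

# Hypothetical 3x5 character templates ('#' = ink, '.' = background).
TEMPLATES = {
    "C": ["###", "#..", "#..", "#..", "###"],
    "A": [".#.", "#.#", "###", "#.#", "#.#"],
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
    "O": ["###", "#.#", "#.#", "#.#", "###"],
}


def binarise(gray_rows, threshold=128):
    """Preprocessing: turn grayscale pixel rows (0-255) into ink/background rows."""
    return ["".join("#" if px < threshold else "." for px in row) for row in gray_rows]


def pixel_distance(glyph, template):
    """Count the pixels that differ between two bitmaps of the same size."""
    return sum(g != t for g_row, t_row in zip(glyph, template) for g, t in zip(g_row, t_row))


def recognise_char(glyph):
    """Character recognition: pick the template with the fewest differing pixels."""
    return min(TEMPLATES, key=lambda ch: pixel_distance(glyph, TEMPLATES[ch]))


def correct_word(word, vocabulary):
    """Post-processing: snap a recognised word to the closest vocabulary entry, if any."""
    matches = get_close_matches(word, vocabulary, n=1, cutoff=0.6)
    return matches[0] if matches else word


# A "C" with one stray ink pixel, followed by a clean "A" and "T".
noisy_c = ["###", "#..", "#..", "#.#", "###"]
glyphs = [noisy_c, TEMPLATES["A"], TEMPLATES["T"]]
word = "".join(recognise_char(g) for g in glyphs)
print(word)  # CAT
print(correct_word("CQT", ["CAT", "COAT", "DOG"]))  # CAT
```

The stray pixel does not change the result because the noisy glyph is still closer to the "C" template than to any other. Real systems work on far larger images and character sets, which is where simple shape matching starts to break down.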
Modern OCR systems built on deep learning collapse much of this pipeline: neural networks take the image (often after a text-detection stage) and output text directly, learning features from training data rather than relying on hand-built character patterns. These systems generally handle messy handwriting, unusual fonts, complex layouts, and degraded images better than traditional approaches.
OCR in the AI era
The intersection of OCR and large language models has created a new category of document AI. Modern multimodal models like Claude and GPT-4 can read documents with sophisticated layouts — tables, charts, multi-column text, forms — and not just extract the text but understand its meaning and structure.
This means you can now:
- Upload a financial report and ask questions about the data in its tables.
- Process invoices and extract structured data (vendor, amount, date, line items) automatically.
- Digitise handwritten notes and meeting minutes.
- Convert whiteboard photos into structured action items.
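To make the invoice example concrete, here is a minimal sketch of turning already-recognised text into structured fields. It uses plain regular expressions over an assumed, tidy invoice layout (`Vendor:`, `Date:`, `Total:` labels), so it is a rule-based stand-in rather than what a multimodal model does internally; the `Invoice` class, `parse_invoice` function, and sample text are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Invoice:
    vendor: Optional[str]
    date: Optional[str]
    total: Optional[float]


def _first(pattern, text):
    """Return the first capture group of the first match, or None if absent."""
    match = re.search(pattern, text, flags=re.IGNORECASE | re.MULTILINE)
    return match.group(1).strip() if match else None


def parse_invoice(ocr_text):
    """Pull vendor, date, and total out of OCR output with labelled lines."""
    vendor = _first(r"^Vendor:\s*(.+)$", ocr_text)
    date = _first(r"^Date:\s*(\d{4}-\d{2}-\d{2})\s*$", ocr_text)
    total_raw = _first(r"^Total:\s*\$?([\d,]+\.\d{2})\s*$", ocr_text)
    # Strip thousands separators before converting to a number.
    total = float(total_raw.replace(",", "")) if total_raw else None
    return Invoice(vendor=vendor, date=date, total=total)


SAMPLE = """Vendor: Acme Supplies Ltd
Date: 2024-03-15
Widget x2   40.00
Total: $1,240.50
"""

invoice = parse_invoice(SAMPLE)
print(invoice)
```

Fixed patterns like these fail as soon as a supplier uses a different layout or label, which is the problem that model-based extraction across diverse invoice formats is meant to address.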
Business applications
- Invoice processing: Automating accounts payable by extracting data from diverse invoice formats.
- Contract analysis: Making scanned legal documents searchable and analysable.
- Healthcare records: Digitising patient records and clinical notes.
- Compliance: Processing paper-based regulatory filings.
- Archival: Making historical document collections searchable.
Accuracy considerations
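OCR accuracy depends heavily on image quality, text complexity, and language. Clean printed text in common languages can reach accuracy above 99%, a figure typically measured per character; at exactly 99%, a page of 2,000 characters would still contain about 20 errors. Handwritten text, degraded documents, or unusual scripts may require specialised models.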
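One common way to put a number on OCR accuracy is to compare the output with a hand-checked reference transcription and count character-level edits (insertions, deletions, substitutions). The sketch below assumes that definition, using Levenshtein distance; the function names are illustrative, and other metrics such as word-level accuracy are also in use.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance: minimum single-character edits between two strings."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_ch in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_ch in enumerate(hypothesis, start=1):
            cost = 0 if ref_ch == hyp_ch else 1
            curr.append(min(
                prev[j] + 1,         # character missing from the OCR output
                curr[j - 1] + 1,     # extra character in the OCR output
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def character_accuracy(reference, hypothesis):
    """1 minus the character error rate, floored at 0 for very poor output."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return max(0.0, 1 - edit_distance(reference, hypothesis) / len(reference))


# A single misread character ("l" read as "1") in a 13-character string.
score = character_accuracy("invoice total", "invoice tota1")
print(f"{score:.3f}")  # 0.923
```

Short strings make single errors look severe; on a full page the same one-character slip would barely move the score, so accuracy figures are only comparable when measured on similar amounts and kinds of text.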
Why This Matters
OCR bridges the gap between paper-based processes and digital AI capabilities. Many organisations sit on vast stores of information locked in scanned documents, images, and PDFs. Understanding modern OCR capabilities reveals opportunities to unlock this data for AI-powered analysis and automation.