Practical

Object Detection

Last reviewed: April 2026

A computer vision task that identifies and locates specific objects within images or video frames, drawing bounding boxes around each detected object.

Object detection is a computer vision task that not only identifies what objects are in an image but also locates where each object is by drawing a bounding box around it. It goes beyond simple classification ("this image contains a car") to provide spatial information ("there is a car at this location, a pedestrian at that location, and a traffic sign over there").

How object detection works

Modern object detection models process an image and output a list of detected objects, each with:

A class label (what is it — car, person, dog)
A bounding box (where is it — x, y coordinates and dimensions)
A confidence score (how certain the model is)
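
As a minimal sketch of that output, the snippet below models one detection as a small record and keeps only the detections above a confidence threshold. The Detection class, its field names, the top-left-corner box convention, and the 0.5 threshold are all assumptions made for illustration; real libraries use their own formats (some report box centres, others two corner points).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One detected object (hypothetical record, not a library type)."""
    label: str         # class label, e.g. "car"
    x: float           # left edge of the box in pixels (assumed convention)
    y: float           # top edge of the box in pixels
    width: float       # box width in pixels
    height: float      # box height in pixels
    confidence: float  # model certainty, from 0.0 to 1.0


def keep_confident(detections, threshold=0.5):
    """Return detections with confidence >= threshold, most confident first."""
    kept = [d for d in detections if d.confidence >= threshold]
    return sorted(kept, key=lambda d: d.confidence, reverse=True)


# Made-up values standing in for a model's raw output on one image.
detections = [
    Detection("person", 310, 80, 45, 130, 0.81),
    Detection("dog", 500, 200, 60, 40, 0.30),
    Detection("car", 40, 120, 200, 90, 0.92),
]

for d in keep_confident(detections):
    print(f"{d.label}: box=({d.x}, {d.y}, {d.width}, {d.height}) score={d.confidence}")
```

Raising the threshold trades missed objects for fewer false alarms, so the right value depends on which mistake is more costly in a given application.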

Major architectures

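YOLO (You Only Look Once) — a single-stage detector that predicts boxes and classes for the entire image in one pass, making it fast enough to be a common choice for real-time applications
SSD (Single Shot MultiBox Detector) — also single-stage and comparable to YOLO in speed; it predicts from feature maps at several scales to handle objects of different sizes
Faster R-CNN — a two-stage approach (first propose candidate regions, then classify and refine each one) that is typically slower than single-stage detectors and has traditionally been favoured where accuracy matters more than speed
DETR (DEtection TRansformer) — a transformer-based approach that treats detection as a set prediction problem, predicting all objects in the image together as one set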
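
To make "set prediction" a little more concrete: during training, DETR-style models pair each ground-truth object with exactly one prediction, and leftover predictions are treated as "no object". The sketch below illustrates that one-to-one matching on toy data. It is a simplified illustration, not DETR's implementation: the cost function (1 minus IoU, plus a penalty for a wrong label) and the brute-force search are assumptions chosen for readability, and real systems use a richer cost and a dedicated assignment solver.

```python
from itertools import permutations


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_cost(pred, truth):
    """Hypothetical cost: poor overlap and a wrong label both add cost."""
    (p_label, p_box), (t_label, t_box) = pred, truth
    return (1.0 - iou(p_box, t_box)) + (0.0 if p_label == t_label else 1.0)


def best_assignment(preds, truths):
    """For each ground-truth object, the index of its matched prediction.

    Tries every one-to-one pairing and keeps the cheapest, which is only
    practical for a handful of objects.
    """
    if len(preds) < len(truths):
        raise ValueError("need at least as many predictions as ground-truth objects")
    best, best_cost = (), float("inf")
    for perm in permutations(range(len(preds)), len(truths)):
        cost = sum(match_cost(preds[p], truths[t]) for t, p in enumerate(perm))
        if cost < best_cost:
            best, best_cost = perm, cost
    return list(best)


# Toy data: three predictions, two real objects.
preds = [
    ("person", (300, 80, 350, 210)),
    ("car", (42, 118, 238, 212)),
    ("car", (600, 400, 650, 450)),
]
truths = [
    ("car", (40, 120, 240, 210)),
    ("person", (310, 80, 355, 210)),
]

print(best_assignment(preds, truths))  # [1, 0]; prediction 2 is left unmatched
```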

Business applications

Manufacturing — detecting defects on production lines at speeds no human inspector can match
Retail — shelf monitoring, customer behaviour analysis, checkout-free stores
Autonomous vehicles — detecting pedestrians, vehicles, traffic signs, and obstacles in real time
Security — detecting weapons, suspicious behaviour, or safety violations in surveillance footage
Agriculture — counting fruit on trees, detecting pests, assessing crop health from drone imagery
Healthcare — detecting nodules in medical imaging, counting cells in microscopy

Challenges

Small objects — detecting tiny objects in large images remains difficult
Occluded objects — partially hidden objects are harder to detect
Real-time constraints — balancing accuracy with speed for live video applications
Domain adaptation — a model trained on standard photos may struggle with thermal imaging, satellite imagery, or medical scans
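
The real-time constraint in the list above comes down to simple arithmetic: a video at a given frame rate leaves a fixed time budget per frame, and a model whose per-frame latency exceeds it will fall behind. The sketch below works through that check. The function names and the latency figures are hypothetical, and the calculation ignores decoding, pre-processing, and post-processing time, which also eat into the budget in practice.

```python
def frame_budget_ms(fps):
    """Time available to process one frame, in milliseconds."""
    return 1000.0 / fps


def fits_real_time(latency_ms, fps):
    """True if a model with this per-frame latency keeps up with the stream."""
    return latency_ms <= frame_budget_ms(fps)


# Hypothetical per-frame latencies, for illustration only.
candidates = {"small model": 12.0, "large model": 45.0}

for name, latency in candidates.items():
    verdict = "keeps up" if fits_real_time(latency, fps=30) else "falls behind"
    print(f"{name}: {latency} ms per frame, {verdict} at 30 fps "
          f"(budget {frame_budget_ms(30):.1f} ms)")
```

When a model falls behind, common options are a smaller model, a lower input resolution, or running detection on every second or third frame.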

From detection to tracking

On its own, object detection treats each frame independently, so it cannot tell whether the car in one frame is the same car in the next. Object tracking extends detection to video, following detected objects across frames to maintain identity over time.
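
One simple way to link detections across frames is to match each new box to the existing track whose last box overlaps it most. The sketch below does this greedily using intersection over union (IoU). It is a deliberately minimal illustration, not a specific published tracker: the update_tracks function, the 0.3 overlap threshold, and the choice to drop a track as soon as it goes unmatched are all assumptions, and production trackers usually add motion prediction and tolerate short gaps.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def update_tracks(tracks, boxes, next_id, min_iou=0.3):
    """Assign this frame's boxes to tracks; returns (new_tracks, next_id).

    tracks maps a track ID to that object's box in the previous frame.
    """
    # Score every (track, box) pair and consider the best overlaps first.
    pairs = sorted(
        ((iou(box, new_box), track_id, i)
         for track_id, box in tracks.items()
         for i, new_box in enumerate(boxes)),
        reverse=True,
    )
    new_tracks, used = {}, set()
    for score, track_id, i in pairs:
        if score < min_iou:
            break  # remaining pairs overlap too little to be the same object
        if track_id in new_tracks or i in used:
            continue  # this track or this box is already matched
        new_tracks[track_id] = boxes[i]
        used.add(i)
    # Any box left over starts a new track with a fresh ID.
    for i, new_box in enumerate(boxes):
        if i not in used:
            new_tracks[next_id] = new_box
            next_id += 1
    return new_tracks, next_id


# Two frames of made-up detections; the detector lists them in a different order.
frame1 = [(40, 120, 240, 210), (310, 80, 355, 210)]
frame2 = [(315, 82, 360, 212), (50, 120, 250, 210)]

tracks, next_id = update_tracks({}, frame1, next_id=0)
tracks, next_id = update_tracks(tracks, frame2, next_id)
print(tracks)  # each object keeps its ID from the first frame
```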

Why This Matters

Object detection is one of AI's most commercially mature capabilities, with proven deployments in manufacturing, retail, and logistics. Understanding its capabilities and limitations helps you identify which of your visual inspection, monitoring, or counting tasks could be automated — and what accuracy and speed to realistically expect.

Related Terms

Computer Vision

The field of AI that enables machines to interpret and understand visual information from images and videos, including object recognition, scene understanding, and visual analysis.

Deep Learning

A subset of machine learning that uses neural networks with many layers to learn complex patterns. The 'deep' refers to the number of layers, not the depth of understanding.

Image Recognition

AI's ability to identify and categorise objects, people, scenes, and other elements within images, powering applications from photo organisation to medical diagnosis.

Neural Network

A computing system loosely inspired by the human brain, made of layers of interconnected nodes that learn to recognise patterns in data.
