Object Detection
A computer vision task that identifies and locates specific objects within images or video frames, drawing bounding boxes around each detected object.
Object detection is a computer vision task that not only identifies what objects are in an image but also locates where each object is by drawing a bounding box around it. It goes beyond simple classification ("this image contains a car") to provide spatial information ("there is a car at this location, a pedestrian at that location, and a traffic sign over there").
How object detection works
Modern object detection models process an image and output a list of detected objects, each with:
- A class label: what the object is (e.g. car, person, dog)
- A bounding box: where it is, given as x, y coordinates plus the box's dimensions
- A confidence score: how certain the model is about the detection
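The three fields above can be held in a small record type and filtered by confidence. Low-certainty detections are commonly discarded this way before results are used.

This is a minimal Python sketch. The following are illustrative assumptions, not the output format of any particular model:
- the `Detection` class and the `filter_detections` helper
- the (x, y, width, height) box layout, with x, y as the top-left corner
- the 0.5 threshold

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str                                # class label, e.g. "car"
    box: tuple[float, float, float, float]    # assumed (x, y, width, height) in pixels, x/y = top-left corner
    score: float                              # confidence in [0, 1]


def filter_detections(detections, min_score=0.5):
    """Keep only detections whose confidence meets the (hypothetical) threshold."""
    return [d for d in detections if d.score >= min_score]


detections = [
    Detection("car", (120.0, 80.0, 200.0, 110.0), 0.92),
    Detection("person", (400.0, 60.0, 45.0, 130.0), 0.81),
    Detection("dog", (30.0, 200.0, 60.0, 40.0), 0.34),
]

for d in filter_detections(detections):
    print(f"{d.label}: box={d.box}, score={d.score:.2f}")
```

With the default threshold, the low-confidence `dog` detection is dropped. Raising the threshold trades more missed objects for fewer false alarms. The right value depends on the application.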
Major architectures
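- YOLO (You Only Look Once): processes the entire image in a single network pass, which makes it fast enough for live video on suitable hardware. A common choice for real-time applications.
- SSD (Single Shot MultiBox Detector): also a single-pass detector with speed comparable to YOLO. It predicts boxes from feature maps at several scales, which helps it handle objects of different sizes.
- Faster R-CNN: a two-stage approach. A region proposal network first suggests candidate regions, then a second stage classifies them and refines their boxes. Historically more accurate than single-stage detectors, but slower.
- DETR (DEtection TRansformer): a transformer-based approach that treats detection as a set prediction problem, predicting all objects in an image jointly.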
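Detectors such as YOLO, SSD, and Faster R-CNN typically produce several overlapping candidate boxes for the same object. A common post-processing step is non-maximum suppression (NMS): it keeps the highest-scoring box and discards candidates that overlap it heavily. DETR's set-prediction formulation is designed to make this step unnecessary.

The Python sketch below shows greedy NMS, using intersection-over-union (IoU) as the overlap measure. It is an illustration under stated assumptions:
- The function names, the (x1, y1, x2, y2) corner box format, and the 0.5 threshold are assumptions.
- It ignores class labels for simplicity; real pipelines usually run NMS separately per class.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2) corners."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy, class-agnostic NMS; returns indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)           # highest-scoring remaining candidate
        keep.append(best)
        # Discard remaining candidates that overlap the kept box too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep


boxes = [(100, 100, 200, 200), (105, 105, 205, 205), (300, 300, 380, 380)]
scores = [0.9, 0.75, 0.8]
print(nms(boxes, scores))  # [0, 2]
```

The second box overlaps the first with an IoU of about 0.82, so it is suppressed. The third box overlaps neither and survives.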
Business applications
- Manufacturing: detecting defects on production lines at speeds no human inspector can match
- Retail: shelf monitoring, customer behaviour analysis, checkout-free stores
- Autonomous vehicles: detecting pedestrians, vehicles, traffic signs, and obstacles in real time
- Security: detecting weapons, suspicious behaviour, or safety violations in surveillance footage
- Agriculture: counting fruit on trees, detecting pests, assessing crop health from drone imagery
- Healthcare: detecting nodules in medical imaging, counting cells in microscopy
Challenges
- Small objects: detecting tiny objects in large images remains difficult
- Occluded objects: partially hidden objects are harder to detect
- Real-time constraints: balancing accuracy against speed in live video applications
- Domain adaptation: a model trained on standard photos may struggle with thermal imaging, satellite imagery, or medical scans
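Two of these challenges can be made concrete with simple arithmetic. The sketch below does three things:
- It buckets boxes by pixel area, using thresholds that loosely follow the COCO benchmark's small/medium/large convention (32×32 and 96×96 pixels).
- It shows how little of a full HD frame a distant object occupies.
- It computes the per-frame time budget for live video.

The helper names and example sizes are hypothetical. Also note that COCO measures annotation area, not exactly these box areas.

```python
SMALL_MAX_AREA = 32 * 32    # COCO-style "small" threshold, in square pixels
MEDIUM_MAX_AREA = 96 * 96   # COCO-style "medium" threshold, in square pixels


def size_category(width, height):
    """Bucket a box by pixel area, loosely following COCO's small/medium/large split."""
    area = width * height
    if area < SMALL_MAX_AREA:
        return "small"
    if area < MEDIUM_MAX_AREA:
        return "medium"
    return "large"


def image_fraction(width, height, img_width, img_height):
    """Fraction of the whole image covered by a box of the given size."""
    return (width * height) / (img_width * img_height)


def frame_budget_ms(fps):
    """Milliseconds available per frame if processing must keep pace with the video."""
    return 1000.0 / fps


print(size_category(20, 15))     # small, e.g. a distant pedestrian
print(size_category(200, 110))   # large
print(f"{image_fraction(20, 15, 1920, 1080):.4%} of a 1080p frame")
print(f"{frame_budget_ms(30):.1f} ms per frame at 30 fps")
```

At 30 frames per second, detection plus any pre- and post-processing has roughly 33 ms per frame. A model that takes longer must skip frames or run on faster hardware.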
From detection to tracking
Object detection processes individual frames. Object tracking extends this to video, following detected objects across frames to maintain identity over time.
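A very simple tracker can be built on top of per-frame detections. It matches each new box to the previous frame's box it overlaps most, and carries that object's identity forward.

The Python sketch below does this greedily using IoU. The `IoUTracker` class, its 0.3 threshold, and the corner (x1, y1, x2, y2) box format are hypothetical.

Established trackers such as SORT go further in ways this sketch does not:
- a motion model (a Kalman filter)
- optimal assignment (the Hungarian algorithm)
- keeping tracks alive through brief occlusions

```python
from itertools import count


def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2) corners."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


class IoUTracker:
    """Hypothetical minimal tracker: greedy IoU matching between consecutive frames."""

    def __init__(self, iou_threshold=0.3):
        self.iou_threshold = iou_threshold
        self.tracks = {}        # track id -> box from the previous frame
        self._ids = count(1)    # source of fresh track ids

    def update(self, boxes):
        """Assign a track id to each box in the current frame; returns the ids in order."""
        unmatched = dict(self.tracks)
        new_tracks, assigned = {}, []
        for box in boxes:
            best_id, best_iou = None, self.iou_threshold
            for tid, prev in unmatched.items():
                overlap = iou(box, prev)
                if overlap >= best_iou:
                    best_id, best_iou = tid, overlap
            if best_id is None:
                best_id = next(self._ids)   # no good match: start a new track
            else:
                del unmatched[best_id]      # each old track matches at most one box
            new_tracks[best_id] = box
            assigned.append(best_id)
        self.tracks = new_tracks            # tracks not seen in this frame are dropped
        return assigned


tracker = IoUTracker()
print(tracker.update([(100, 100, 150, 200), (300, 120, 340, 210)]))  # [1, 2]
print(tracker.update([(305, 122, 345, 212), (104, 101, 154, 201)]))  # [2, 1]
print(tracker.update([(500, 50, 540, 90)]))                          # [3]
```

In the second frame the detections arrive in a different order, but each keeps the identity of the object it overlaps. In the third frame, a box with no overlapping predecessor starts a new track.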
Why This Matters
Object detection is one of AI's most commercially mature capabilities, with proven deployments in manufacturing, retail, and logistics. Understanding its capabilities and limitations helps you identify which of your visual inspection, monitoring, or counting tasks could be automated, and what accuracy and speed to realistically expect.