On-Device AI
AI models that run directly on your phone, laptop, or other hardware rather than in the cloud, offering faster responses and greater privacy.
On-device AI refers to AI models that run locally on your hardware (your phone, laptop, tablet, or embedded device) rather than sending data to cloud servers for processing. Apple Intelligence, Google's on-device features, and locally running open-source models are all examples.
Why run AI on-device?
- Privacy: Your data never leaves your device. For sensitive applications (health data, financial information, personal communications), this is a significant advantage.
- Latency: No network round-trip means near-instant responses. On-device inference can be 10-100x faster for simple tasks.
- Offline capability: On-device AI works without an internet connection, which is useful for field workers, travellers, or unreliable network environments.
- Cost: No per-query API charges. Once the model is on the device, inference is "free" (just battery and compute).
What makes on-device AI possible
Modern devices have specialised AI hardware:
- Apple Neural Engine: Dedicated AI processor in iPhones, iPads, and Macs.
- Google Tensor chips: Custom processors in Pixel phones optimised for AI.
- Qualcomm AI Engine: AI acceleration in Snapdragon-powered devices.
- Intel NPUs: Neural processing units in recent Intel laptop processors.
Combined with model compression techniques (quantisation, distillation, pruning), capable models can now fit on consumer devices.
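To make quantisation concrete, here is a minimal sketch of symmetric per-tensor int8 quantisation in NumPy. It is illustrative only and not any specific framework's implementation: real on-device runtimes typically quantise per-channel or per-block and handle zero tensors and activations too.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor int8 quantisation: map floats to [-127, 127]."""
    scale = np.max(np.abs(weights)) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from the int8 values."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.2, 0.03, 0.9], dtype=np.float32)
q, scale = quantize_int8(w)
w_approx = dequantize(q, scale)

# int8 storage is 4x smaller than float32, and the rounding error per
# weight is bounded by scale / 2
print(q.nbytes, w.nbytes)            # 4 bytes vs 16 bytes
print(np.max(np.abs(w - w_approx)))  # small reconstruction error
```

The storage saving (4x here, more with 4-bit formats) is what lets multi-billion-parameter models fit in a phone's memory at an acceptable accuracy cost.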
Current on-device capabilities
- Speech recognition: Siri, Google Assistant, and others process voice commands locally.
- Photo enhancement: Computational photography features run on-device.
- Text prediction: Keyboard suggestions and autocomplete.
- Small language models: Models like Phi-3, Gemma, and Llama 3.2 can run on phones and laptops.
- Translation: Real-time offline translation on mobile devices.
Limitations
- Model size constraints: On-device models are much smaller than cloud models, limiting their capability.
- Battery impact: Running AI inference drains battery faster.
- Hardware requirements: Older devices may not have the specialised chips needed for acceptable performance.
- Update complexity: Updating a model on millions of devices is harder than updating a cloud endpoint.
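The model-size constraint above can be made concrete with back-of-the-envelope arithmetic: weight memory is roughly parameter count times bytes per parameter. The sketch below uses an illustrative 3-billion-parameter model and ignores activation and cache memory.

```python
def model_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate weight memory in GB (ignores activations and KV cache)."""
    return num_params * bits_per_param / 8 / 1e9

# An illustrative 3-billion-parameter model at different precisions:
for bits in (16, 8, 4):
    print(f"{bits}-bit: {model_memory_gb(3e9, bits):.1f} GB")
# 16-bit: 6.0 GB, 8-bit: 3.0 GB, 4-bit: 1.5 GB
```

At 16-bit precision the weights alone would crowd out the OS and apps on a typical 8 GB device, which is why on-device deployments lean so heavily on quantisation.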
Why This Matters
On-device AI is reshaping the privacy and performance equation for AI applications. Understanding its capabilities helps you design AI features that work everywhere, protect sensitive data, and reduce ongoing cloud costs. For many routine AI tasks, on-device is becoming the smarter choice.
Continue learning in Practitioner
This topic is covered in our lesson: Choosing the Right Deployment Strategy