Ollama
An open-source tool that makes running large language models on your own computer as simple as a single command, with no cloud APIs and little technical expertise required.
Ollama is an open-source tool that simplifies running large language models locally on your own hardware. With a single command, you can download and run models like Llama, Mistral, Gemma, and dozens of others: no cloud subscription, no API keys, and no data leaving your machine.
Why Ollama matters
Before Ollama, running an LLM locally required significant technical expertise: setting up Python environments, managing dependencies, downloading model weights, configuring hardware acceleration, and troubleshooting compatibility issues. Ollama wraps all of this complexity into a single application that works like a package manager.
Running `ollama run llama3` downloads the model (if it is not already present) and starts an interactive chat session. It is that simple.
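Beyond `run`, a handful of subcommands cover day-to-day model management. A sketch of a typical session (these are standard Ollama subcommands; the model name is an example):

```shell
# Download a model without starting a chat
ollama pull llama3

# List the models installed locally
ollama list

# Start an interactive chat (downloads the model first if needed)
ollama run llama3

# Remove a model you no longer need
ollama rm llama3
```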
Key features
- One-command model management: Download, run, and switch between models with simple commands.
- Automatic hardware detection: Ollama detects your GPU and configures hardware acceleration automatically. It also runs on CPU-only machines, though more slowly.
- Model library: A growing catalogue of popular models, pre-quantised for local execution. Each model is available in multiple size variants.
- API compatibility: Ollama exposes a REST API that is compatible with the OpenAI API format, making it easy to swap local models into applications built for cloud APIs.
- Modelfile customisation: Create custom model configurations with specific system prompts, parameters, and behaviour modifications.
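As an illustration of Modelfile customisation, the sketch below defines a variant of Llama 3 with a baked-in system prompt and a lower temperature. `FROM`, `PARAMETER`, and `SYSTEM` are standard Modelfile directives; the base model and prompt here are examples:

```
# Modelfile: a custom variant of llama3
FROM llama3

# Lower temperature for more deterministic answers
PARAMETER temperature 0.3

# A fixed system prompt baked into the custom model
SYSTEM You are a concise technical assistant. Answer in plain English.
```

You would then build and run the variant with `ollama create my-assistant -f Modelfile` followed by `ollama run my-assistant`.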
Use cases for local AI
- Data privacy: Sensitive documents, customer data, and proprietary information never leave your machine. This is critical for regulated industries.
- Cost control: No per-token API charges. Once the hardware investment is made, inference is effectively free.
- Offline access: Models work without an internet connection, useful for travel, unreliable connectivity, or air-gapped environments.
- Experimentation: Try dozens of models quickly to find the best fit for your specific tasks.
- Development: Build and test AI-powered applications locally before committing to cloud API costs.
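Because Ollama's REST API follows the OpenAI chat-completion format, local development often means pointing existing client code at `localhost`. A minimal sketch, assuming a model named `llama3` has been pulled and the server is on its default port (11434):

```python
import json

def build_chat_request(model, prompt):
    """Build an OpenAI-compatible chat-completion payload for Ollama."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

payload = build_chat_request("llama3", "Summarise this document in one sentence.")
print(json.dumps(payload, indent=2))

# To actually send it (requires a running Ollama server):
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:11434/v1/chat/completions",
#     data=json.dumps(payload).encode(),
#     headers={"Content-Type": "application/json"},
# )
# print(urllib.request.urlopen(req).read().decode())
```

The same payload works against a cloud endpoint, which is what makes swapping between local and hosted models straightforward.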
Hardware requirements
Most modern laptops can run smaller models (7-8 billion parameters) with acceptable speed. For larger models (13-70 billion parameters), a dedicated GPU with 16GB+ of VRAM provides a much better experience. The sweet spot for most users is a 7-8 billion parameter model quantised to 4-bit, which runs well on machines with 16GB of RAM.
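The sizes above follow from simple arithmetic: weight memory is roughly parameters times bits per weight divided by eight, before runtime overhead (KV cache, context buffers) adds more on top. A back-of-the-envelope sketch:

```python
def approx_weight_gb(params_billion, bits_per_weight):
    """Approximate size of model weights in GB (1 GB = 1e9 bytes).

    Runtime overhead (KV cache, buffers) typically adds 20-50% on top,
    so treat this as a lower bound on memory needed.
    """
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

# A 7B model quantised to 4-bit: fits comfortably in 16GB of RAM
print(f"{approx_weight_gb(7, 4):.1f} GB")   # 3.5 GB

# A 70B model at 4-bit: needs a large GPU or lots of system RAM
print(f"{approx_weight_gb(70, 4):.1f} GB")  # 35.0 GB
```

This is why 4-bit quantisation of a 7-8 billion parameter model is the sweet spot: the weights fit in a few gigabytes, leaving headroom on a 16GB machine.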
The broader ecosystem
Ollama integrates with a growing ecosystem of tools:
- Open WebUI: A browser-based chat interface for Ollama models.
- Continue: An IDE extension that connects to Ollama for AI-assisted coding.
- LangChain/LlamaIndex: Popular AI development frameworks support Ollama as a local model provider.
- AnythingLLM: A complete local AI workspace built on top of Ollama.
Why this matters
Ollama democratises access to AI by removing the barriers of cost, complexity, and privacy concerns. Understanding this tool helps you evaluate whether local AI deployment, with its cost savings and privacy advantages, makes sense for your organisation's specific needs.