Ollama
An open-source tool that makes running large language models on your own computer as simple as a single command, with no cloud APIs and little technical expertise required.
Ollama is an open-source tool that simplifies running large language models locally on your own hardware. With a single command, you can download and run models like Llama, Mistral, Gemma, and dozens of others: no cloud subscription, no API keys, and no data leaving your machine.
Why Ollama matters
Before Ollama, running an LLM locally required significant technical expertise: setting up Python environments, managing dependencies, downloading model weights, configuring hardware acceleration, and troubleshooting compatibility issues. Ollama wraps all of this complexity into a single application that works like a package manager.
Running `ollama run llama3` downloads the model (if it is not already present) and starts an interactive chat session. It is that simple.
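Beyond `run`, a handful of subcommands cover day-to-day model management. A sketch of a typical session (these are standard Ollama subcommands; the model name is an example):

```shell
# Download a model without starting a chat
ollama pull llama3

# List the models installed locally
ollama list

# Start an interactive chat (downloads the model first if needed)
ollama run llama3

# Remove a model you no longer need
ollama rm llama3
```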
Key features
- One-command model management: Download, run, and switch between models with simple commands.
- Automatic hardware detection: Ollama detects your GPU and configures hardware acceleration automatically. It also runs on CPU-only machines, though more slowly.
- Model library: A growing catalogue of popular models, pre-quantised for local execution. Each model is available in multiple size variants.
- API compatibility: Ollama exposes a REST API that is compatible with the OpenAI API format, making it easy to swap local models into applications built for cloud APIs.
- Modelfile customisation: Create custom model configurations with specific system prompts, parameters, and behaviour modifications.
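As an illustration of Modelfile customisation, the sketch below defines a variant of Llama 3 with a baked-in system prompt and a lower temperature. `FROM`, `PARAMETER`, and `SYSTEM` are standard Modelfile directives; the base model and prompt here are examples:

```
# Modelfile: a custom variant of llama3
FROM llama3

# Lower temperature for more deterministic answers
PARAMETER temperature 0.3

# A fixed system prompt baked into the custom model
SYSTEM You are a concise technical assistant. Answer in plain English.
```

You would then build and run the variant with `ollama create my-assistant -f Modelfile` followed by `ollama run my-assistant`.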
Use cases for local AI
- Data privacy: Sensitive documents, customer data, and proprietary information never leave your machine. This is critical for regulated industries.
- Cost control: No per-token API charges. Once the hardware investment is made, inference is effectively free.
- Offline access: Models work without an internet connection, useful for travel, unreliable connectivity, or air-gapped environments.
- Experimentation: Try dozens of models quickly to find the best fit for your specific tasks.
- Development: Build and test AI-powered applications locally before committing to cloud API costs.
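Because Ollama's REST API follows the OpenAI chat-completion format, local development often means pointing existing client code at `localhost`. A minimal sketch, assuming a model named `llama3` has been pulled and the server is on its default port (11434):

```python
import json

def build_chat_request(model, prompt):
    """Build an OpenAI-compatible chat-completion payload for Ollama."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

payload = build_chat_request("llama3", "Summarise this document in one sentence.")
print(json.dumps(payload, indent=2))

# To actually send it (requires a running Ollama server):
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:11434/v1/chat/completions",
#     data=json.dumps(payload).encode(),
#     headers={"Content-Type": "application/json"},
# )
# print(urllib.request.urlopen(req).read().decode())
```

The same payload works against a cloud endpoint, which is what makes swapping between local and hosted models straightforward.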
Hardware requirements
Most modern laptops can run smaller models (7-8 billion parameters) with acceptable speed. For larger models (13-70 billion parameters), a dedicated GPU with 16GB+ of VRAM provides a much better experience. The sweet spot for most users is a 7-8 billion parameter model quantised to 4-bit, which runs well on machines with 16GB of RAM.
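The sizes above follow from simple arithmetic: weight memory is roughly parameters times bits per weight divided by eight, before runtime overhead (KV cache, context buffers) adds more on top. A back-of-the-envelope sketch:

```python
def approx_weight_gb(params_billion, bits_per_weight):
    """Approximate size of model weights in GB (1 GB = 1e9 bytes).

    Runtime overhead (KV cache, buffers) typically adds 20-50% on top,
    so treat this as a lower bound on memory needed.
    """
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

# A 7B model quantised to 4-bit: fits comfortably in 16GB of RAM
print(f"{approx_weight_gb(7, 4):.1f} GB")   # 3.5 GB

# A 70B model at 4-bit: needs a large GPU or lots of system RAM
print(f"{approx_weight_gb(70, 4):.1f} GB")  # 35.0 GB
```

This is why 4-bit quantisation of a 7-8 billion parameter model is the sweet spot: the weights fit in a few gigabytes, leaving headroom on a 16GB machine.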
The broader ecosystem
Ollama integrates with a growing ecosystem of tools:
- Open WebUI: A browser-based chat interface for Ollama models.
- Continue: An IDE extension that connects to Ollama for AI-assisted coding.
- LangChain/LlamaIndex: Popular AI development frameworks support Ollama as a local model provider.
- AnythingLLM: A complete local AI workspace built on top of Ollama.
Why this matters
Ollama democratises access to AI by removing the barriers of cost, complexity, and privacy concerns. Understanding this tool helps you evaluate whether local AI deployment, with its cost savings and privacy advantages, makes sense for your organisation's specific needs.