If you’ve been following the world of artificial intelligence lately, you’ve probably heard the phrase “run AI locally” thrown around — but what does it actually mean, and why should you care? Ollama is one of the most popular tools making this possible, and it’s changing the way people interact with large language models. In plain English: Ollama lets you download and run powerful AI chatbots on your own computer, with no internet connection required, no subscription fees, and none of your conversations ever leaving your machine. This guide explains exactly what Ollama is, how it works, and whether it’s right for you.
What Is Ollama?
Ollama is a free, open-source application that lets you download and run large language models (LLMs) directly on your own hardware. Think of it as a kind of app store and runtime engine for AI models — you browse a library of available models, pull the one you want, and run it entirely on your own computer.
The name is a nod to “llama,” as in the Meta Llama family of AI models, but Ollama supports many different models from many different developers. It was created to make local AI genuinely accessible — not just to machine learning researchers with specialist knowledge, but to anyone comfortable using a terminal or command line.
Before tools like Ollama existed, running a large language model locally was a genuinely painful process. You’d need to wrangle model weights in obscure formats, install complex dependencies, and write custom code just to get a basic response. Ollama wraps all of that complexity into a clean interface with simple commands.
Why Run AI Locally? The Case for Ollama
Privacy and Data Control
When you use a cloud-based AI service, every message you send is transmitted to a remote server. Your conversations may be logged, used for training data, or subject to the provider’s data retention policies. For individuals and businesses handling sensitive information — client data, confidential documents, personal health details — this is a genuine concern. With Ollama, everything stays on your device. Your prompts and the AI’s responses never leave your machine.
No Subscription Costs
Premium AI services typically cost anywhere from £15 to £20 per month per user, or significantly more for API-based access. Ollama itself is completely free. The models you run through it are also free to download, and most are released under open or permissive licences. Once your hardware is in place, your running costs are essentially zero.
Offline Operation
Ollama works without an internet connection once the model is downloaded. This makes it useful on flights, in areas with poor connectivity, or in secure environments where internet access is restricted.
Customisation and Experimentation
Running locally means you have full control. You can experiment with different models, adjust parameters, create custom system prompts, and integrate AI into your own scripts and applications — all without worrying about rate limits, API quotas, or a vendor changing their terms of service.
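As a small illustration of that control, Ollama’s Modelfile format lets you bake a custom system prompt and parameters into a reusable model of your own. A minimal sketch (the base model, parameter values, and names here are illustrative, not recommendations):

```
# Modelfile: builds a custom model on top of one you have already pulled
FROM llama3.2

# Sampling parameters; these values are illustrative
PARAMETER temperature 0.8
PARAMETER num_ctx 4096

# A system prompt baked into the custom model
SYSTEM """You are a concise assistant that answers in plain English."""
```

Save this as a file named Modelfile, then run `ollama create my-assistant -f Modelfile` and chat with it via `ollama run my-assistant`.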
How Ollama Works
The Model Library
Ollama maintains a library of pre-packaged AI models at ollama.com/library. Each model in this library has been formatted, optimised, and tested to work with Ollama’s runtime. When you want to use a model, you pull it by name — similar to how you’d pull a container image with Docker.
Quantisation and Efficient Inference
Large language models are, by definition, large. A model like Llama 3 70B stored at its native 16-bit precision needs roughly 140GB of memory for its weights alone, far beyond what most consumer hardware can handle. Ollama solves this with quantisation: a process that reduces the numerical precision of the model’s weights (for example, from 16 bits down to 4 bits per weight), making the model dramatically smaller and faster to run, with only a modest reduction in quality.
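The arithmetic behind those sizes is simple enough to sketch. The estimate below counts weight storage only; a real deployment also needs memory for the context window and runtime overhead, so treat these figures as lower bounds:

```python
def weight_memory_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate memory for model weights alone (1 GB = 10^9 bytes)."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9

# A 70B-parameter model at 16-bit precision: workstation territory
print(weight_memory_gb(70, 16))  # 140.0

# The same model quantised to 4 bits per weight
print(weight_memory_gb(70, 4))   # 35.0

# An 8B model at 4 bits fits comfortably in 16GB of RAM
print(weight_memory_gb(8, 4))    # 4.0
```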
When you download a model through Ollama, you’re typically getting a quantised version of it, packaged in a format called GGUF that is specifically designed for efficient local inference. Ollama uses a library called llama.cpp under the hood to actually run these models.
The Local API Server
When Ollama is running, it starts a local API server on your machine (by default on port 11434). This means you can interact with your locally running AI models not just through the terminal, but through compatible applications, browser extensions, and your own code. Ollama also exposes OpenAI-compatible endpoints, so many applications already built for the OpenAI API can be pointed at Ollama with minimal changes.
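A minimal sketch of talking to that local server from Python, using only the standard library and Ollama’s native /api/generate endpoint. It assumes Ollama is running locally and the llama3.2 model has been pulled; the model name and prompt are placeholders:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(model: str, prompt: str) -> dict:
    # stream=False asks the server for one complete JSON reply
    # rather than a stream of partial tokens.
    return {"model": model, "prompt": prompt, "stream": False}

def ask(model: str, prompt: str) -> str:
    payload = json.dumps(build_request(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Example usage (needs the Ollama server running):
#   print(ask("llama3.2", "Explain quantisation in one sentence."))
```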
GPU Acceleration
If your computer has a compatible graphics card (GPU), Ollama will automatically use it to accelerate model inference. GPUs are dramatically faster than CPUs for the kind of matrix calculations involved in running LLMs. On a capable GPU, responses can stream at a comfortable reading pace.
What Hardware Do You Need?
| Scenario | RAM | GPU | What You Can Run |
|---|---|---|---|
| Minimum | 8GB | Not required | Small models (1B–3B) e.g. Phi-3 Mini, Gemma 2B |
| Comfortable | 16GB | Optional (4GB VRAM) | Mid-range (7B–8B) e.g. Llama 3.1 8B, Mistral 7B |
| Recommended | 32GB | 8GB+ VRAM | Larger models (13B–34B) with good performance |
| Enthusiast | 64GB+ | 16GB+ VRAM | 70B parameter models — approaching GPT-4 class |
One important note for Mac users: Apple Silicon Macs (M1, M2, M3, M4 series) are particularly well-suited to running Ollama. The unified memory architecture means that GPU and CPU share the same memory pool, so a MacBook Pro with 32GB of RAM can run models that would require a dedicated graphics card with 24GB of VRAM on a Windows PC.
What Models Does Ollama Support?
Llama (Meta)
Meta’s Llama family is arguably the most significant series of open-weight LLMs available. Llama 3.1 and Llama 3.2 are available in multiple sizes — from 1B parameters (tiny and fast) up to 70B (large and highly capable). The 8B versions hit a sweet spot for most users.
Mistral and Mixtral
Mistral AI produces high-quality models known for their efficiency. Mistral 7B punches above its weight. Mixtral 8x7B is a “mixture of experts” model that delivers strong performance without using all its parameters simultaneously.
Gemma (Google)
Google’s Gemma models are lightweight and efficient. Gemma 2B and Gemma 7B are good choices for lower-powered hardware and are particularly strong at structured tasks and coding assistance.
Phi (Microsoft)
Microsoft’s Phi series — particularly Phi-3 and Phi-4 — are remarkable small models. Phi-3 Mini has just 3.8 billion parameters but performs comparably to models three or four times its size.
Code-Focused Models
Ollama also offers models specifically trained on code, including DeepSeek Coder, CodeLlama, and Qwen2.5-Coder. These are optimised for generating, explaining, and debugging code across many programming languages.
Quick Start: Running Your First Model
Step 1: Install Ollama
Visit ollama.com and download the installer for your operating system. On Linux, use the one-line install script:
```shell
curl -fsSL https://ollama.com/install.sh | sh
```
Step 2: Pull a Model
Once Ollama is installed, open your terminal and pull a model:
```shell
ollama pull llama3.2
```
Step 3: Run It
Once the model is downloaded, run it with:
```shell
ollama run llama3.2
```
You’ll see a chat prompt appear in your terminal. Type a message and press Enter to talk to the model; type /bye to exit.
Who Is Ollama For?
Developers and programmers — Ollama’s local REST API makes it easy to integrate AI into your own applications and scripts, with zero cost during development and testing.
Privacy-conscious users — Anyone handling sensitive data who doesn’t want their conversations logged by a corporation. If the data never leaves your machine, there is far less to worry about.
Tinkerers and enthusiasts — If you enjoy understanding how technology works and experimenting with it, Ollama is endlessly fascinating. You can compare different models, create custom system prompts, and explore the frontier of what your hardware can do.
Businesses with compliance requirements — Organisations subject to GDPR, HIPAA, or other data regulations may be unable to use cloud AI services for certain types of data processing. Running LLMs locally through Ollama can help satisfy those requirements.
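For the developers mentioned above, here is a sketch of integrating through Ollama’s OpenAI-compatible endpoint, again using only the Python standard library. The model name and message are placeholders, and the request shape is the same one an OpenAI-API client would send:

```python
import json
import urllib.request

# Ollama's OpenAI-compatible endpoint (the path mirrors the OpenAI API)
CHAT_URL = "http://localhost:11434/v1/chat/completions"

def build_chat_payload(model: str, user_message: str) -> dict:
    # Standard chat-completions request body.
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }

def chat(model: str, user_message: str) -> str:
    data = json.dumps(build_chat_payload(model, user_message)).encode("utf-8")
    req = urllib.request.Request(
        CHAT_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        body = json.loads(resp.read())
    return body["choices"][0]["message"]["content"]

# Example usage (needs the Ollama server running with llama3.2 pulled):
#   print(chat("llama3.2", "Summarise this paragraph: ..."))
```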
What Are the Limitations?
Local models, even large ones, generally lag behind the most advanced commercial AI systems like GPT-4o or Claude Sonnet in raw capability. The gap is narrowing quickly, but for the most demanding tasks the best cloud models still have an edge. You’re also limited by your hardware — running a 70B parameter model on a laptop isn’t practical for most people.
Conclusion: Is Ollama Worth It?
Ollama represents something genuinely significant: the democratisation of large language model technology. A few years ago, running a capable AI model locally would have required expensive specialist hardware and deep technical knowledge. Today, Ollama makes it possible for anyone with a modern laptop or desktop to run capable AI models privately, for free, in minutes.
Whether Ollama is right for you depends on your priorities. If you value privacy, dislike subscription fees, want to experiment freely, or simply find the idea of running AI on your own hardware appealing — Ollama is absolutely worth trying. For most people, the answer is probably both: use a cloud service for everyday tasks, and keep Ollama around for the sensitive work, the late-night experiments, and the satisfaction of knowing that your AI runs entirely under your control.


