Running Ollama on a Raspberry Pi lets you have a private, always-on local AI server that costs pennies to run. It won’t match the speed of a desktop GPU, but for small models and non-time-critical tasks it works surprisingly well — and it runs 24/7 on about 5 watts.
What You Need
- Raspberry Pi 5 (8GB) — strongly recommended. The Pi 5’s performance is roughly 3x the Pi 4, making local LLMs genuinely usable. The 8GB RAM version is essential for running anything useful.
- Raspberry Pi 4 (8GB) — works but is slower. Expect 1-3 tokens per second on small models.
- A fast microSD card (or better, an NVMe SSD via PCIe hat for the Pi 5)
- Raspberry Pi OS 64-bit (required — 32-bit will not work)
Installing Ollama
The standard Ollama install script works on Raspberry Pi OS 64-bit:
```bash
curl -fsSL https://ollama.com/install.sh | sh
```
Once installed, start the service:
```bash
sudo systemctl enable ollama
sudo systemctl start ollama
```
Verify the server is responding (listing installed models will fail if the service is down):

```bash
ollama list
```
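Beyond the CLI, you can check the HTTP API directly. This sketch uses only the Python standard library, assumes the default port 11434 and Ollama's `/api/version` endpoint, and returns `False` rather than raising if the server is unreachable:

```python
import json
import urllib.request
import urllib.error

def ollama_is_up(base_url: str = "http://localhost:11434") -> bool:
    """Return True if the Ollama API answers on its version endpoint."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/version", timeout=3) as resp:
            return "version" in json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        return False

print(ollama_is_up())  # True when the service is running, False otherwise
```

Run it on the Pi itself, or pass the Pi's IP in `base_url` from another machine.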
Choosing a Model for Raspberry Pi
On a Pi, smaller is better. The Pi has no discrete GPU, so everything runs on CPU with ARM NEON acceleration. Models larger than 7B will be unusably slow.
| Model | RAM needed | Speed on Pi 5 (approx) | Best for |
|---|---|---|---|
| phi3:mini (3.8B) | ~3GB | 4-6 tok/s | General chat, fast responses |
| gemma2:2b | ~2GB | 5-8 tok/s | Lightweight tasks |
| mistral (7B) | ~5GB | 2-3 tok/s | Better quality, slower |
| llama3.2:3b | ~3GB | 4-5 tok/s | Balanced quality and speed |
| nomic-embed-text | ~1GB | Fast | Embeddings for RAG |
Start with phi3:mini or llama3.2:3b for the best balance of quality and speed on a Pi.
```bash
ollama pull phi3:mini
ollama run phi3:mini
```
Accessing Ollama Over Your Network
By default, Ollama only listens on localhost. To access it from other devices on your network, set the OLLAMA_HOST environment variable via a systemd override:
```bash
sudo systemctl edit ollama
```
Add the following:
```ini
[Service]
Environment="OLLAMA_HOST=0.0.0.0"
```
Then restart:
```bash
sudo systemctl daemon-reload
sudo systemctl restart ollama
```
Now you can access Ollama from any device on your local network at http://your-pi-ip:11434.
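Once the port is open, any machine on the network can hit the API programmatically. Below is a minimal sketch using only the Python standard library; the IP address is a placeholder for your Pi's, and the payload fields (`model`, `prompt`, `stream`) follow Ollama's `/api/generate` endpoint:

```python
import json
import urllib.request

def build_payload(model: str, prompt: str, stream: bool = False) -> dict:
    """Request body for Ollama's /api/generate endpoint."""
    return {"model": model, "prompt": prompt, "stream": stream}

def generate(prompt: str, model: str = "phi3:mini",
             host: str = "http://192.168.1.50:11434") -> str:
    """Send a one-shot (non-streaming) generation request and return the text."""
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(build_payload(model, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.load(resp)["response"]

# generate("Why is the sky blue?")  # needs a reachable Pi, so left commented out
```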
Adding a Web Interface with Open WebUI
Install Open WebUI via Docker for a browser-based chat interface:
```bash
# Install Docker first if needed
curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker $USER

# Run Open WebUI (host.docker.internal is mapped to the Pi itself below)
docker run -d \
  -p 3000:8080 \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
  -v open-webui:/app/backend/data \
  --add-host=host.docker.internal:host-gateway \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:main
```
Open WebUI will be available at http://your-pi-ip:3000 from any browser on your network.
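If you prefer managing the container declaratively, the same flags translate into a `docker-compose.yml`. This is an untested sketch mirroring the command above, not an official compose file from the Open WebUI project:

```yaml
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "3000:8080"
    environment:
      - OLLAMA_BASE_URL=http://host.docker.internal:11434
    volumes:
      - open-webui:/app/backend/data
    extra_hosts:
      - "host.docker.internal:host-gateway"
    restart: always

volumes:
  open-webui:
```

Start it with `docker compose up -d`; the `--restart always` behaviour carries over as `restart: always`.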
Improving Performance
Use an NVMe SSD (Pi 5)
The Pi 5 has a PCIe connector that supports NVMe SSDs via an M.2 hat. Swapping from a microSD card to NVMe dramatically improves model load times — from 30-60 seconds down to 5-10 seconds for a 4GB model.
Increase swap space
If you’re running 7B models on an 8GB Pi and hitting memory limits, increase swap:
```bash
sudo dphys-swapfile swapoff
sudo nano /etc/dphys-swapfile
# Set CONF_SWAPSIZE=4096
sudo dphys-swapfile setup
sudo dphys-swapfile swapon
```
Use quantised models
Ollama uses 4-bit quantised models by default. You can try Q2 quantisation for more aggressive memory reduction at the cost of quality:
```bash
ollama pull llama3.2:3b-instruct-q2_K
```
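To see why quantisation matters on an 8GB board, a back-of-envelope estimate helps. The bits-per-weight figures below are rough averages for llama.cpp-style K-quants (assumptions, not exact numbers), and the result covers weights only, ignoring KV cache and runtime overhead:

```python
def weight_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in decimal GB."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

# A ~3.2B-parameter model at Q4-style vs Q2-style quantisation:
print(round(weight_size_gb(3.2, 4.5), 1))  # → 1.8  (assuming ~4.5 bits/weight)
print(round(weight_size_gb(3.2, 2.6), 1))  # → 1.0  (assuming ~2.6 bits/weight)
```

Dropping from Q4 to Q2 roughly halves the weight footprint, which is why Q2 frees up memory but costs noticeable quality.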
Setting Up as a Home AI Server
With Ollama running as a service and Open WebUI accessible on your network, your Pi becomes a home AI server. You can:
- Chat with it from any device on your network (phone, laptop, tablet)
- Build automations using the Python API
- Connect it to Home Assistant for smart home queries
- Use it for private document summarisation without cloud services
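For automations, it helps to know that when streaming is enabled, Ollama's `/api/generate` returns newline-delimited JSON, one fragment per line, with `"done": true` on the last. A small parser sketch (fed simulated data here, since the real stream arrives over HTTP):

```python
import json

def join_stream(ndjson_lines):
    """Concatenate the 'response' fragments of a streamed Ollama reply."""
    text = []
    for line in ndjson_lines:
        chunk = json.loads(line)
        text.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(text)

# Simulated stream; each element stands in for one line of the HTTP response:
sample = [
    '{"response": "Hello", "done": false}',
    '{"response": ", world!", "done": true}',
]
print(join_stream(sample))  # → Hello, world!
```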
Realistic Expectations
A Raspberry Pi 5 (8GB) running a 3B model will give you roughly 5-8 tokens per second, which works out to roughly the speed at which most people read. For chat, this is perfectly usable. For code generation or long documents you’ll need patience, or a bigger machine. For speed comparisons, see our guide to Ollama vs LM Studio.
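Those token rates translate directly into wait time. A quick calculation (the token counts are illustrative; a few paragraphs is on the order of 500 tokens):

```python
def seconds_for(tokens: int, tokens_per_second: float) -> float:
    """Time to generate a response at a given sustained speed."""
    return tokens / tokens_per_second

print(round(seconds_for(500, 6)))    # → 83  (a 3B model on a Pi 5)
print(round(seconds_for(500, 2.5)))  # → 200 (a 7B model on a Pi 5)
```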
For more model recommendations, see the best Ollama models for summarisation and best Ollama models for roleplay and chat.