
How to Run Ollama on a Raspberry Pi

Running Ollama on a Raspberry Pi lets you have a private, always-on local AI server that costs pennies to run. It won’t match the speed of a desktop GPU, but for small models and non-time-critical tasks it works surprisingly well — and it runs 24/7 on about 5 watts.

What You Need

  • Raspberry Pi 5 (8GB) — strongly recommended. The Pi 5’s performance is roughly 3x the Pi 4, making local LLMs genuinely usable. The 8GB RAM version is essential for running anything useful.
  • Raspberry Pi 4 (8GB) — works but is slower. Expect 1-3 tokens per second on small models.
  • A fast microSD card (or better, an NVMe SSD via PCIe hat for the Pi 5)
  • Raspberry Pi OS 64-bit (required — 32-bit will not work)

Installing Ollama

The standard Ollama install script works on Raspberry Pi OS 64-bit:

curl -fsSL https://ollama.com/install.sh | sh

Once installed, start the service:

sudo systemctl enable ollama
sudo systemctl start ollama

Verify it’s running (this lists installed models; even an empty list confirms the server is responding):

ollama list

Choosing a Model for Raspberry Pi

On a Pi, smaller is better. The Pi has no discrete GPU, so everything runs on CPU with ARM NEON acceleration. Models larger than 7B will be unusably slow.

| Model | RAM needed | Speed on Pi 5 (approx) | Best for |
| --- | --- | --- | --- |
| phi3:mini (3.8B) | ~3GB | 4-6 tok/s | General chat, fast responses |
| gemma2:2b | ~2GB | 5-8 tok/s | Lightweight tasks |
| mistral (7B) | ~5GB | 2-3 tok/s | Better quality, slower |
| llama3.2:3b | ~3GB | 4-5 tok/s | Balanced quality and speed |
| nomic-embed-text | ~1GB | Fast | Embeddings for RAG |
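
The RAM figures above roughly follow a rule of thumb for the 4-bit (Q4) quantisations Ollama ships by default: about half a byte per parameter for the weights, plus around 1GB of headroom for the KV cache and runtime. A quick sketch — the constants here are approximations for planning, not official numbers:

```python
def q4_ram_gb(params_billions: float, overhead_gb: float = 1.0) -> float:
    """Rough RAM estimate for a Q4-quantised model: ~0.5 bytes/param + overhead."""
    return params_billions * 0.5 + overhead_gb

# phi3:mini (3.8B) -> ~2.9GB, close to the ~3GB in the table above
print(f"{q4_ram_gb(3.8):.1f} GB")  # → 2.9 GB
```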

Start with phi3:mini or llama3.2:3b for the best balance of quality and speed on a Pi.

ollama pull phi3:mini
ollama run phi3:mini

Accessing Ollama Over Your Network

By default, Ollama listens only on localhost. To reach it from other devices on your network, set the OLLAMA_HOST environment variable via a systemd override:

sudo systemctl edit ollama

Add the following:

[Service]
Environment="OLLAMA_HOST=0.0.0.0"

Then restart:

sudo systemctl daemon-reload
sudo systemctl restart ollama

Now you can access Ollama from any device on your local network at http://your-pi-ip:11434.
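
Once the service is listening on the network, any machine on your LAN can call the HTTP API directly. Here is a minimal sketch using only the Python standard library against Ollama's documented /api/generate endpoint — the PI_HOST value is a placeholder you'd replace with your Pi's actual IP:

```python
import json
import urllib.request

PI_HOST = "192.168.1.50"  # placeholder: your Pi's LAN IP

def build_payload(model: str, prompt: str) -> dict:
    # stream=False asks for a single JSON response instead of a JSONL stream
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str, host: str = PI_HOST) -> str:
    req = urllib.request.Request(
        f"http://{host}:11434/api/generate",
        data=json.dumps(build_payload(model, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Usage (requires the Pi to be reachable on your network):
#   print(generate("phi3:mini", "In one sentence, what is a Raspberry Pi?"))
```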

Adding a Web Interface with Open WebUI

Install Open WebUI via Docker for a browser-based chat interface:

# Install Docker first if needed
curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker $USER
# Log out and back in (or run: newgrp docker) for the group change to take effect

# Run Open WebUI (use your Pi's local IP)
docker run -d \
  -p 3000:8080 \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
  -v open-webui:/app/backend/data \
  --add-host=host.docker.internal:host-gateway \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:main

Open WebUI will be available at http://your-pi-ip:3000 from any browser on your network.

Improving Performance

Use an NVMe SSD (Pi 5)

The Pi 5 has a PCIe connector that supports NVMe SSDs via an M.2 hat. Swapping from a microSD card to NVMe dramatically improves model load times — from 30-60 seconds down to 5-10 seconds for a 4GB model.

Increase swap space

If you’re running 7B models on an 8GB Pi and hitting memory limits, increase swap:

sudo dphys-swapfile swapoff
sudo nano /etc/dphys-swapfile
# Set CONF_SWAPSIZE=4096
sudo dphys-swapfile setup
sudo dphys-swapfile swapon

Use quantised models

Ollama uses 4-bit quantised models by default. You can try Q2 quantisation for more aggressive memory reduction at the cost of quality:

ollama pull llama3.2:3b-instruct-q2_K

Setting Up as a Home AI Server

With Ollama running as a service and Open WebUI accessible on your network, your Pi becomes a home AI server. You can:

  • Chat with it from any device on your network (phone, laptop, tablet)
  • Build automations using the Python API
  • Connect it to Home Assistant for smart home queries
  • Use it for private document summarisation without cloud services

Realistic Expectations

A Raspberry Pi 5 (8GB) running a 3B model will give you roughly 5-8 tokens per second — similar to a moderately fast typist. For chat, this is perfectly usable. For code generation or long documents you’ll need patience, or a bigger machine. For speed comparisons, see our guide to Ollama vs LM Studio.

For more model recommendations, see the best Ollama models for summarisation and best Ollama models for roleplay and chat.
