Running Ollama on a Raspberry Pi lets you have a private, always-on local AI server that costs pennies to run. It won’t match the speed of a desktop GPU, but for small models and non-time-critical tasks it works surprisingly well — and it runs 24/7 on about 5 watts.
What You Need
- Raspberry Pi 5 (8GB) — strongly recommended. The Pi 5’s performance is roughly 3x the Pi 4, making local LLMs genuinely usable. The 8GB RAM version is essential for running anything useful.
- Raspberry Pi 4 (8GB) — works but is slower. Expect 1-3 tokens per second on small models.
- A fast microSD card (or better, an NVMe SSD via PCIe hat for the Pi 5)
- Raspberry Pi OS 64-bit (required — 32-bit will not work)
Installing Ollama
The standard Ollama install script works on Raspberry Pi OS 64-bit:
```bash
curl -fsSL https://ollama.com/install.sh | sh
```
Once installed, start the service:
```bash
sudo systemctl enable ollama
sudo systemctl start ollama
```
Verify the server is responding (listing installed models will fail if the service is down):

```bash
ollama list
```
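Beyond the CLI, you can check the HTTP API directly. This sketch uses only the Python standard library, assumes the default port 11434 and Ollama's `/api/version` endpoint, and returns `False` rather than raising if the server is unreachable:

```python
import json
import urllib.request
import urllib.error

def ollama_is_up(base_url: str = "http://localhost:11434") -> bool:
    """Return True if the Ollama API answers on its version endpoint."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/version", timeout=3) as resp:
            return "version" in json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        return False

print(ollama_is_up())  # True when the service is running, False otherwise
```

Run it on the Pi itself, or pass the Pi's IP in `base_url` from another machine.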
Choosing a Model for Raspberry Pi
On a Pi, smaller is better. The Pi has no discrete GPU, so everything runs on CPU with ARM NEON acceleration. Models larger than 7B will be unusably slow.
| Model | RAM needed | Speed on Pi 5 (approx) | Best for |
|---|---|---|---|
| phi3:mini (3.8B) | ~3GB | 4-6 tok/s | General chat, fast responses |
| gemma2:2b | ~2GB | 5-8 tok/s | Lightweight tasks |
| mistral (7B) | ~5GB | 2-3 tok/s | Better quality, slower |
| llama3.2:3b | ~3GB | 4-5 tok/s | Balanced quality and speed |
| nomic-embed-text | ~1GB | Fast | Embeddings for RAG |
Start with phi3:mini or llama3.2:3b for the best balance of quality and speed on a Pi.
```bash
ollama pull phi3:mini
ollama run phi3:mini
```
Accessing Ollama Over Your Network
By default, Ollama only listens on localhost. To access it from other devices on your network, set the OLLAMA_HOST environment variable via a systemd override:
```bash
sudo systemctl edit ollama
```
Add the following:
```ini
[Service]
Environment="OLLAMA_HOST=0.0.0.0"
```
Then restart:
```bash
sudo systemctl daemon-reload
sudo systemctl restart ollama
```
Now you can access Ollama from any device on your local network at http://your-pi-ip:11434.
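Once the port is open, any machine on the network can hit the API programmatically. Below is a minimal sketch using only the Python standard library; the IP address is a placeholder for your Pi's, and the payload fields (`model`, `prompt`, `stream`) follow Ollama's `/api/generate` endpoint:

```python
import json
import urllib.request

def build_payload(model: str, prompt: str, stream: bool = False) -> dict:
    """Request body for Ollama's /api/generate endpoint."""
    return {"model": model, "prompt": prompt, "stream": stream}

def generate(prompt: str, model: str = "phi3:mini",
             host: str = "http://192.168.1.50:11434") -> str:
    """Send a one-shot (non-streaming) generation request and return the text."""
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(build_payload(model, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.load(resp)["response"]

# generate("Why is the sky blue?")  # needs a reachable Pi, so left commented out
```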
Adding a Web Interface with Open WebUI
Install Open WebUI via Docker for a browser-based chat interface:
```bash
# Install Docker first if needed
curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker $USER

# Run Open WebUI (host.docker.internal is mapped to the Pi itself below)
docker run -d \
  -p 3000:8080 \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
  -v open-webui:/app/backend/data \
  --add-host=host.docker.internal:host-gateway \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:main
```
Open WebUI will be available at http://your-pi-ip:3000 from any browser on your network.
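If you prefer managing the container declaratively, the same flags translate into a `docker-compose.yml`. This is an untested sketch mirroring the command above, not an official compose file from the Open WebUI project:

```yaml
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "3000:8080"
    environment:
      - OLLAMA_BASE_URL=http://host.docker.internal:11434
    volumes:
      - open-webui:/app/backend/data
    extra_hosts:
      - "host.docker.internal:host-gateway"
    restart: always

volumes:
  open-webui:
```

Start it with `docker compose up -d`; the `--restart always` behaviour carries over as `restart: always`.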
Improving Performance
Use an NVMe SSD (Pi 5)
The Pi 5 has a PCIe connector that supports NVMe SSDs via an M.2 hat. Swapping from a microSD card to NVMe dramatically improves model load times — from 30-60 seconds down to 5-10 seconds for a 4GB model.
Increase swap space
If you’re running 7B models on an 8GB Pi and hitting memory limits, increase swap:
```bash
sudo dphys-swapfile swapoff
sudo nano /etc/dphys-swapfile
# Set CONF_SWAPSIZE=4096
sudo dphys-swapfile setup
sudo dphys-swapfile swapon
```
Use quantised models
Ollama uses 4-bit quantised models by default. You can try Q2 quantisation for more aggressive memory reduction at the cost of quality:
```bash
ollama pull llama3.2:3b-instruct-q2_K
```
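To see why quantisation matters on an 8GB board, a back-of-envelope estimate helps. The bits-per-weight figures below are rough averages for llama.cpp-style K-quants (assumptions, not exact numbers), and the result covers weights only, ignoring KV cache and runtime overhead:

```python
def weight_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in decimal GB."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

# A ~3.2B-parameter model at Q4-style vs Q2-style quantisation:
print(round(weight_size_gb(3.2, 4.5), 1))  # → 1.8  (assuming ~4.5 bits/weight)
print(round(weight_size_gb(3.2, 2.6), 1))  # → 1.0  (assuming ~2.6 bits/weight)
```

Dropping from Q4 to Q2 roughly halves the weight footprint, which is why Q2 frees up memory but costs noticeable quality.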
Setting Up as a Home AI Server
With Ollama running as a service and Open WebUI accessible on your network, your Pi becomes a home AI server. You can:
- Chat with it from any device on your network (phone, laptop, tablet)
- Build automations using the Python API
- Connect it to Home Assistant for smart home queries
- Use it for private document summarisation without cloud services
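For automations, it helps to know that when streaming is enabled, Ollama's `/api/generate` returns newline-delimited JSON, one fragment per line, with `"done": true` on the last. A small parser sketch (fed simulated data here, since the real stream arrives over HTTP):

```python
import json

def join_stream(ndjson_lines):
    """Concatenate the 'response' fragments of a streamed Ollama reply."""
    text = []
    for line in ndjson_lines:
        chunk = json.loads(line)
        text.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(text)

# Simulated stream; each element stands in for one line of the HTTP response:
sample = [
    '{"response": "Hello", "done": false}',
    '{"response": ", world!", "done": true}',
]
print(join_stream(sample))  # → Hello, world!
```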
Realistic Expectations
A Raspberry Pi 5 (8GB) running a 3B model will give you roughly 5-8 tokens per second, which works out to roughly the speed at which most people read. For chat, this is perfectly usable. For code generation or long documents you’ll need patience, or a bigger machine. For speed comparisons, see our guide to Ollama vs LM Studio.
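Those token rates translate directly into wait time. A quick calculation (the token counts are illustrative; a few paragraphs is on the order of 500 tokens):

```python
def seconds_for(tokens: int, tokens_per_second: float) -> float:
    """Time to generate a response at a given sustained speed."""
    return tokens / tokens_per_second

print(round(seconds_for(500, 6)))    # → 83  (a 3B model on a Pi 5)
print(round(seconds_for(500, 2.5)))  # → 200 (a 7B model on a Pi 5)
```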
For more model recommendations, see the best Ollama models for summarisation and best Ollama models for roleplay and chat.