Running Ollama in Docker is one of the cleanest ways to self-host large language models on your own infrastructure. Whether you’re setting up a home lab, deploying to a production server, or just want to keep your host system clean, containerising Ollama gives you isolation, portability, and reproducibility that a bare-metal install simply can’t match. This guide walks you through everything from a basic docker run command to a full Docker Compose stack with Open WebUI, GPU passthrough, persistent storage, and environment variable tuning.
Why Run Ollama in Docker?
- Isolation: Keeps Ollama and its model files completely separate from your host system
- Portability: A docker-compose.yml file reproduces an identical setup on a different machine
- Easy version management: Pin a specific image tag, roll back with a one-line change
- Server deployments: Fits naturally into server environments already running containerised workloads
- Process management: Docker restart policies give you simple daemon behaviour without writing systemd unit files
Prerequisites
- Docker Engine 20.10+ and Docker Compose v2
- For GPU support: NVIDIA Container Toolkit or AMD ROCm drivers
- Sufficient disk space — models range from 2 GB to 70 GB+
docker --version
docker compose version
Basic Docker Run: CPU Only
docker run -d \
-v ollama:/root/.ollama \
-p 11434:11434 \
--name ollama \
ollama/ollama
The -v ollama:/root/.ollama flag creates a named volume so downloaded models persist across container restarts. Verify it’s running:
curl http://localhost:11434
You should see: Ollama is running.
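You can also query the version endpoint to confirm which Ollama release the container is running (assuming the port mapping from the command above):

```shell
# Returns a small JSON document with the server version
curl http://localhost:11434/api/version
```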
Running with an NVIDIA GPU
Install the NVIDIA Container Toolkit, then add --gpus=all:
docker run -d \
--gpus=all \
-v ollama:/root/.ollama \
-p 11434:11434 \
--name ollama \
ollama/ollama
Running with an AMD GPU (ROCm)
docker run -d \
--device /dev/kfd \
--device /dev/dri \
-v ollama:/root/.ollama \
-p 11434:11434 \
--name ollama \
ollama/ollama:rocm
The :rocm tag ships with the ROCm runtime included. You still need the host-side ROCm stack installed (version 5.7+ recommended).
Pulling and Running Models Inside the Container
docker exec -it ollama ollama run llama3.2
docker exec -it ollama ollama pull llama3.2
docker exec -it ollama ollama list
You can also use the REST API directly from the host without exec-ing into the container:
curl http://localhost:11434/api/generate \
-d '{"model": "llama3.2", "prompt": "What is Docker?", "stream": false}'
Docker Compose Setup: Ollama + Open WebUI
Create a docker-compose.yml file:
services:
  ollama:
    image: ollama/ollama
    container_name: ollama
    restart: unless-stopped
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    environment:
      - OLLAMA_HOST=0.0.0.0
      - OLLAMA_NUM_PARALLEL=2
    # Uncomment for NVIDIA GPU:
    # deploy:
    #   resources:
    #     reservations:
    #       devices:
    #         - driver: nvidia
    #           count: all
    #           capabilities: [gpu]

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    restart: unless-stopped
    ports:
      - "3000:8080"
    volumes:
      - open_webui_data:/app/backend/data
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    depends_on:
      - ollama

volumes:
  ollama_data:
  open_webui_data:
Start the stack:
docker compose up -d
Open WebUI will be available at http://localhost:3000. Uncomment the deploy block to enable NVIDIA GPU passthrough.
Persistent Model Storage with Volumes
Named Docker volumes persist models across container rebuilds. If you prefer a host directory bind mount to store models on a specific disk, replace the volume entry:
volumes:
  - /mnt/data/ollama:/root/.ollama
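With a bind mount, it's worth creating the directory yourself first so Docker doesn't create it as a root-owned path with permissions you didn't intend. A sketch for a standalone container (the /mnt/data/ollama path is just an example):

```shell
# Pre-create the host directory backing the bind mount
sudo mkdir -p /mnt/data/ollama

docker run -d \
  -v /mnt/data/ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  ollama/ollama
```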
Environment Variables
- OLLAMA_HOST — bind address (set to 0.0.0.0 for container-to-container access)
- OLLAMA_MODELS — override default model storage path
- OLLAMA_NUM_PARALLEL — concurrent inference requests per model (default: 1)
- OLLAMA_MAX_LOADED_MODELS — models held in memory simultaneously
- OLLAMA_KEEP_ALIVE — how long models stay loaded after last request (default: 5m)
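For a standalone container, these are passed with -e flags. A sketch that keeps models loaded for 30 minutes and allows two models in memory at once (the values here are illustrative, not recommendations):

```shell
docker run -d \
  -e OLLAMA_KEEP_ALIVE=30m \
  -e OLLAMA_MAX_LOADED_MODELS=2 \
  -e OLLAMA_NUM_PARALLEL=4 \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  ollama/ollama
```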
Updating Ollama in Docker
For a standalone container:
docker stop ollama && docker rm ollama
docker pull ollama/ollama
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
For Compose:
docker compose pull && docker compose up -d
Models in the named volume are preserved through updates.
Troubleshooting
GPU Not Detected Inside the Container
Test NVIDIA toolkit setup:
docker run --rm --gpus=all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
If this fails, install/configure the toolkit:
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
Container Can’t Reach Host Ollama
Use host.docker.internal instead of localhost from inside a container. On Linux, add to your Compose service:
extra_hosts:
  - "host.docker.internal:host-gateway"
Models Disappearing After Container Restart
Verify the volume is mounted:
docker inspect ollama | grep -A 10 Mounts
If no mount is listed, recreate the container with the -v ollama:/root/.ollama flag.


