
How to Run Ollama in Docker (Step-by-Step Guide)

Running Ollama in Docker is one of the cleanest ways to self-host large language models on your own infrastructure. Whether you’re setting up a home lab, deploying to a production server, or just want to keep your host system clean, containerising Ollama gives you isolation, portability, and reproducibility that a bare-metal install simply can’t match. This guide walks you through everything from a basic docker run command to a full Docker Compose stack with Open WebUI, GPU passthrough, persistent storage, and environment variable tuning.

Why Run Ollama in Docker?

  • Isolation: Keeps Ollama and its model files completely separate from your host system
  • Portability: A docker-compose.yml file reproduces an identical setup on a different machine
  • Easy version management: Pin a specific image tag, roll back with a one-line change
  • Server deployments: Fits naturally into server environments already running containerised workloads
  • Process management: Docker restart policies give you simple daemon behaviour without writing systemd unit files

Prerequisites

  • Docker Engine 20.10+ and Docker Compose v2
  • For GPU support: NVIDIA Container Toolkit or AMD ROCm drivers
  • Sufficient disk space — models range from 2 GB to 70 GB+
Verify your installation:

docker --version
docker compose version

Basic Docker Run: CPU Only

docker run -d \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  ollama/ollama

The -v ollama:/root/.ollama flag creates a named volume so downloaded models persist across container restarts. Verify it’s running:

curl http://localhost:11434

You should see: Ollama is running.
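Beyond the root endpoint, a couple of read-only API routes are useful for a quick health check — neither requires a model to be loaded:

```shell
# Report the server version
curl -s http://localhost:11434/api/version

# List locally available models (empty right after a fresh install)
curl -s http://localhost:11434/api/tags
```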

Running with an NVIDIA GPU

Install the NVIDIA Container Toolkit, then add --gpus=all:

docker run -d \
  --gpus=all \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  ollama/ollama
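Once the container is up, you can confirm the GPU is actually visible inside it. With the NVIDIA Container Toolkit configured, the runtime injects nvidia-smi into the container:

```shell
# Should list your GPU(s); an error here means passthrough isn't working
docker exec ollama nvidia-smi
```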

Running with an AMD GPU (ROCm)

docker run -d \
  --device /dev/kfd \
  --device /dev/dri \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  ollama/ollama:rocm

The :rocm tag ships with the ROCm runtime included. You still need the host-side ROCm stack installed (version 5.7+ recommended).

Pulling and Running Models Inside the Container

docker exec -it ollama ollama pull llama3.2
docker exec -it ollama ollama run llama3.2
docker exec -it ollama ollama list

You can also use the REST API directly from the host without exec-ing into the container:

curl http://localhost:11434/api/generate \
  -d '{"model": "llama3.2", "prompt": "What is Docker?", "stream": false}'
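The same endpoint is easy to call from code. Here is a minimal sketch in Python using only the standard library — function names are my own, and it assumes the container is reachable at the default port:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # default port mapped in the docker run above


def build_generate_payload(model: str, prompt: str, stream: bool = False) -> bytes:
    """Build the JSON body for POST /api/generate."""
    return json.dumps({"model": model, "prompt": prompt, "stream": stream}).encode()


def generate(prompt: str, model: str = "llama3.2") -> str:
    """Send a non-streaming generate request and return the response text."""
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=build_generate_payload(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Call it with, for example, `generate("What is Docker?")` while the container is running. With `stream=True` the API instead returns newline-delimited JSON chunks, which need to be read incrementally.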

Docker Compose Setup: Ollama + Open WebUI

Create a docker-compose.yml file:

services:
  ollama:
    image: ollama/ollama
    container_name: ollama
    restart: unless-stopped
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    environment:
      - OLLAMA_HOST=0.0.0.0
      - OLLAMA_NUM_PARALLEL=2
    # Uncomment for NVIDIA GPU:
    # deploy:
    #   resources:
    #     reservations:
    #       devices:
    #         - driver: nvidia
    #           count: all
    #           capabilities: [gpu]

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    restart: unless-stopped
    ports:
      - "3000:8080"
    volumes:
      - open_webui_data:/app/backend/data
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    depends_on:
      - ollama

volumes:
  ollama_data:
  open_webui_data:

Start the stack:

docker compose up -d

Open WebUI will be available at http://localhost:3000. Uncomment the deploy block to enable NVIDIA GPU passthrough.
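With the stack running, models can be pulled either through the Open WebUI interface or from the command line against the Compose service:

```shell
# Pull a model into the ollama service container
docker compose exec ollama ollama pull llama3.2

# Follow the service logs if something misbehaves
docker compose logs -f ollama
```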

Persistent Model Storage with Volumes

Named Docker volumes persist models across container rebuilds. If you prefer a host directory bind mount to store models on a specific disk, replace the volume entry:

volumes:
  - /mnt/data/ollama:/root/.ollama
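If you switch an existing setup from the named volume to a bind mount, already-downloaded models won't move on their own. One way to copy them across is a throwaway container that mounts both locations (the host path here matches the example above; adjust to taste):

```shell
# Copy everything from the named volume into the host directory
docker run --rm \
  -v ollama:/from \
  -v /mnt/data/ollama:/to \
  alpine cp -a /from/. /to/
```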

Environment Variables

  • OLLAMA_HOST — bind address (set to 0.0.0.0 for container-to-container access)
  • OLLAMA_MODELS — override default model storage path
  • OLLAMA_NUM_PARALLEL — concurrent inference requests per model (default: 1)
  • OLLAMA_MAX_LOADED_MODELS — models held in memory simultaneously
  • OLLAMA_KEEP_ALIVE — how long models stay loaded after last request (default: 5m)
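These are set like any other container environment variables — via the environment block in Compose (as shown above) or with -e flags on docker run. The values below are just examples:

```shell
docker run -d \
  -e OLLAMA_NUM_PARALLEL=2 \
  -e OLLAMA_KEEP_ALIVE=30m \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  ollama/ollama
```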

Updating Ollama in Docker

For a standalone container:

docker stop ollama && docker rm ollama
docker pull ollama/ollama
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

For Compose:

docker compose pull && docker compose up -d

Models in the named volume are preserved through updates.

Troubleshooting

GPU Not Detected Inside the Container

Test NVIDIA toolkit setup:

docker run --rm --gpus=all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi

If this fails, install/configure the toolkit:

sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

Container Can’t Reach Host Ollama

Use host.docker.internal instead of localhost from inside a container. On Linux, add to your Compose service:

extra_hosts:
  - "host.docker.internal:host-gateway"

Models Disappearing After Container Restart

Verify the volume is mounted:

docker inspect ollama | grep -A 10 Mounts

If no mount is listed, recreate the container with the -v ollama:/root/.ollama flag.
