
How to Run Ollama in Docker (Step-by-Step Guide)

Running Ollama in Docker is one of the cleanest ways to self-host large language models on your own infrastructure. Whether you’re setting up a home lab, deploying to a production server, or just want to keep your host system clean, containerising Ollama gives you isolation, portability, and reproducibility that a bare-metal install simply can’t match. This guide walks you through everything from a basic docker run command to a full Docker Compose stack with Open WebUI, GPU passthrough, persistent storage, and environment variable tuning.

Why Run Ollama in Docker?

  • Isolation: Keeps Ollama and its model files completely separate from your host system
  • Portability: A docker-compose.yml file reproduces an identical setup on a different machine
  • Easy version management: Pin a specific image tag, roll back with a one-line change
  • Server deployments: Fits naturally into server environments already running containerised workloads
  • Process management: Docker restart policies give you simple daemon behaviour without writing systemd unit files

Prerequisites

  • Docker Engine 20.10+ and Docker Compose v2
  • For GPU support: NVIDIA Container Toolkit or AMD ROCm drivers
  • Sufficient disk space — models range from 2 GB to 70 GB+
Verify your installation:

docker --version
docker compose version

Basic Docker Run: CPU Only

docker run -d \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  ollama/ollama

The -v ollama:/root/.ollama flag creates a named volume so downloaded models persist across container restarts. Verify it’s running:

curl http://localhost:11434

You should see: Ollama is running.
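Beyond the root endpoint, a couple of read-only API routes are useful for a quick health check — neither requires a model to be loaded:

```shell
# Report the server version
curl -s http://localhost:11434/api/version

# List locally available models (empty right after a fresh install)
curl -s http://localhost:11434/api/tags
```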

Running with an NVIDIA GPU

Install the NVIDIA Container Toolkit, then add --gpus=all:

docker run -d \
  --gpus=all \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  ollama/ollama
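Once the container is up, you can confirm the GPU is actually visible inside it. With the NVIDIA Container Toolkit configured, the runtime injects nvidia-smi into the container:

```shell
# Should list your GPU(s); an error here means passthrough isn't working
docker exec ollama nvidia-smi
```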

Running with an AMD GPU (ROCm)

docker run -d \
  --device /dev/kfd \
  --device /dev/dri \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  ollama/ollama:rocm

The :rocm tag ships with the ROCm runtime included. You still need the host-side ROCm stack installed (version 5.7+ recommended).

Pulling and Running Models Inside the Container

docker exec -it ollama ollama pull llama3.2
docker exec -it ollama ollama run llama3.2
docker exec -it ollama ollama list

You can also use the REST API directly from the host without exec-ing into the container:

curl http://localhost:11434/api/generate \
  -d '{"model": "llama3.2", "prompt": "What is Docker?", "stream": false}'
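The same endpoint is easy to call from code. Here is a minimal sketch in Python using only the standard library — function names are my own, and it assumes the container is reachable at the default port:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # default port mapped in the docker run above


def build_generate_payload(model: str, prompt: str, stream: bool = False) -> bytes:
    """Build the JSON body for POST /api/generate."""
    return json.dumps({"model": model, "prompt": prompt, "stream": stream}).encode()


def generate(prompt: str, model: str = "llama3.2") -> str:
    """Send a non-streaming generate request and return the response text."""
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=build_generate_payload(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Call it with, for example, `generate("What is Docker?")` while the container is running. With `stream=True` the API instead returns newline-delimited JSON chunks, which need to be read incrementally.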

Docker Compose Setup: Ollama + Open WebUI

Create a docker-compose.yml file:

services:
  ollama:
    image: ollama/ollama
    container_name: ollama
    restart: unless-stopped
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    environment:
      - OLLAMA_HOST=0.0.0.0
      - OLLAMA_NUM_PARALLEL=2
    # Uncomment for NVIDIA GPU:
    # deploy:
    #   resources:
    #     reservations:
    #       devices:
    #         - driver: nvidia
    #           count: all
    #           capabilities: [gpu]

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    restart: unless-stopped
    ports:
      - "3000:8080"
    volumes:
      - open_webui_data:/app/backend/data
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    depends_on:
      - ollama

volumes:
  ollama_data:
  open_webui_data:

Start the stack:

docker compose up -d

Open WebUI will be available at http://localhost:3000. Uncomment the deploy block to enable NVIDIA GPU passthrough.
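With the stack running, models can be pulled either through the Open WebUI interface or from the command line against the Compose service:

```shell
# Pull a model into the ollama service container
docker compose exec ollama ollama pull llama3.2

# Follow the service logs if something misbehaves
docker compose logs -f ollama
```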

Persistent Model Storage with Volumes

Named Docker volumes persist models across container rebuilds. If you prefer a host directory bind mount to store models on a specific disk, replace the volume entry:

volumes:
  - /mnt/data/ollama:/root/.ollama
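If you switch an existing setup from the named volume to a bind mount, already-downloaded models won't move on their own. One way to copy them across is a throwaway container that mounts both locations (the host path here matches the example above; adjust to taste):

```shell
# Copy everything from the named volume into the host directory
docker run --rm \
  -v ollama:/from \
  -v /mnt/data/ollama:/to \
  alpine cp -a /from/. /to/
```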

Environment Variables

  • OLLAMA_HOST — bind address (set to 0.0.0.0 for container-to-container access)
  • OLLAMA_MODELS — override default model storage path
  • OLLAMA_NUM_PARALLEL — concurrent inference requests per model (default: 1)
  • OLLAMA_MAX_LOADED_MODELS — models held in memory simultaneously
  • OLLAMA_KEEP_ALIVE — how long models stay loaded after last request (default: 5m)
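These are set like any other container environment variables — via the environment block in Compose (as shown above) or with -e flags on docker run. The values below are just examples:

```shell
docker run -d \
  -e OLLAMA_NUM_PARALLEL=2 \
  -e OLLAMA_KEEP_ALIVE=30m \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  ollama/ollama
```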

Updating Ollama in Docker

For a standalone container:

docker stop ollama && docker rm ollama
docker pull ollama/ollama
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

For Compose:

docker compose pull && docker compose up -d

Models in the named volume are preserved through updates.

Troubleshooting

GPU Not Detected Inside the Container

Test NVIDIA toolkit setup:

docker run --rm --gpus=all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi

If this fails, install/configure the toolkit:

sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

Container Can’t Reach Host Ollama

Use host.docker.internal instead of localhost from inside a container. On Linux, add to your Compose service:

extra_hosts:
  - "host.docker.internal:host-gateway"

Models Disappearing After Container Restart

Verify the volume is mounted:

docker inspect ollama | grep -A 10 Mounts

If no mount is listed, recreate the container with the -v ollama:/root/.ollama flag.
