
How to Run Ollama with Docker

Running Ollama in Docker lets you deploy local LLMs on any machine or server without installing anything directly on the host. It’s the cleanest approach for server deployments, CI pipelines, or anyone who wants a portable, reproducible AI environment.

Prerequisites

You’ll need Docker installed. For GPU support you’ll also need the NVIDIA Container Toolkit (NVIDIA) or ROCm (AMD). Note that Docker on macOS cannot pass the GPU through to containers, so on Apple Silicon Ollama in Docker runs CPU-only; install Ollama natively if you want Metal acceleration.
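Before proceeding, it’s worth sanity-checking the setup. These are ordinary verification commands (the last one applies only to NVIDIA systems with the Container Toolkit installed):

```shell
# Confirm Docker is installed and the daemon is running
docker --version
docker info --format '{{.ServerVersion}}'

# NVIDIA only: confirm the Container Toolkit can expose the GPU to a container
docker run --rm --gpus all ubuntu nvidia-smi
```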

Basic CPU Setup

Pull and run the official Ollama image:

docker run -d \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  ollama/ollama

Then pull a model into the running container:

docker exec -it ollama ollama pull llama3.1

Test it:

curl http://localhost:11434/api/generate \
  -d '{"model": "llama3.1", "prompt": "Hello!", "stream": false}'
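The same request can be made from Python using only the standard library. This is a sketch against the public /api/generate endpoint; the `build_payload` and `generate` helpers are hypothetical names for illustration, not part of any Ollama SDK:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # port published by the container above

def build_payload(prompt: str, model: str = "llama3.1", stream: bool = False) -> bytes:
    """Encode a request body for POST /api/generate."""
    return json.dumps({"model": model, "prompt": prompt, "stream": stream}).encode()

def generate(prompt: str, model: str = "llama3.1") -> str:
    """POST to /api/generate and return the completed response text."""
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=build_payload(prompt, model),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # With stream=False the server returns a single JSON object
        # whose "response" field holds the generated text
        return json.loads(resp.read())["response"]
```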

NVIDIA GPU Setup

First install the NVIDIA Container Toolkit, then run:

docker run -d \
  --gpus=all \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  ollama/ollama
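To confirm the container can actually see the GPU, run `nvidia-smi` inside it; your card should appear in the output:

```shell
# Should list your GPU; if it errors, recheck the Container Toolkit install
docker exec -it ollama nvidia-smi
```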

AMD GPU Setup

docker run -d \
  --device /dev/kfd \
  --device /dev/dri \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  ollama/ollama:rocm
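If your AMD card is not on Ollama’s officially supported list, the ROCm image can sometimes be made to work by overriding the detected GPU target with an environment variable. The value below (10.3.0, typical for RDNA2-class cards) is an example only; check Ollama’s GPU documentation for the value matching your hardware:

```shell
docker run -d \
  --device /dev/kfd \
  --device /dev/dri \
  -e HSA_OVERRIDE_GFX_VERSION=10.3.0 \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  ollama/ollama:rocm
```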

Using Docker Compose

For a persistent, easily managed setup, use a docker-compose.yml:

CPU only

services:
  ollama:
    image: ollama/ollama
    container_name: ollama
    volumes:
      - ollama_data:/root/.ollama
    ports:
      - "11434:11434"
    restart: unless-stopped

volumes:
  ollama_data:

With NVIDIA GPU

services:
  ollama:
    image: ollama/ollama
    container_name: ollama
    volumes:
      - ollama_data:/root/.ollama
    ports:
      - "11434:11434"
    restart: unless-stopped
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

volumes:
  ollama_data:

Start with:

docker compose up -d
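A few companion commands for managing the stack once it’s up:

```shell
docker compose ps              # container status
docker compose logs -f ollama  # follow the server logs
docker compose down            # stop and remove containers (named volumes persist)
```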

Adding Open WebUI

Pair Ollama with Open WebUI for a ChatGPT-style interface in your browser. Add it to your compose file:

services:
  ollama:
    image: ollama/ollama
    container_name: ollama
    volumes:
      - ollama_data:/root/.ollama
    ports:
      - "11434:11434"
    restart: unless-stopped

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    volumes:
      - open_webui_data:/app/backend/data
    ports:
      - "3000:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    depends_on:
      - ollama
    restart: unless-stopped

volumes:
  ollama_data:
  open_webui_data:

Open WebUI will be available at http://localhost:3000.
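Note that plain `depends_on` only waits for the Ollama container to start, not for the server inside it to be ready. If you want Open WebUI to wait for a responsive server, one option is to add a healthcheck to the `ollama` service and gate the dependency on it. A sketch (tune the intervals to taste):

```yaml
  ollama:
    # ...same service definition as above, plus:
    healthcheck:
      test: ["CMD", "ollama", "list"]
      interval: 15s
      timeout: 5s
      retries: 5

  open-webui:
    # ...same service definition as above, but with:
    depends_on:
      ollama:
        condition: service_healthy
```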

Pre-pulling Models at Startup

To automatically pull models when the container starts, use an init script:

#!/bin/bash
# init-models.sh
ollama serve &
# Wait until the server is actually accepting requests,
# rather than sleeping for a fixed interval
until ollama list >/dev/null 2>&1; do
  sleep 1
done
ollama pull llama3.1
ollama pull nomic-embed-text
wait

Then mount the script and override the entrypoint in your compose file:
services:
  ollama:
    image: ollama/ollama
    container_name: ollama
    volumes:
      - ollama_data:/root/.ollama
      - ./init-models.sh:/init-models.sh
    ports:
      - "11434:11434"
    entrypoint: ["/bin/bash", "/init-models.sh"]
    restart: unless-stopped

volumes:
  ollama_data:

Connecting From Other Containers

When other containers on the same Docker network need to call Ollama, use the service name as the hostname:

# From Python in another container on the same network
# (requires the ollama package: pip install ollama)
import ollama

client = ollama.Client(host='http://ollama:11434')
response = client.chat(
    model='llama3.1',
    messages=[{'role': 'user', 'content': 'Hello!'}]
)
print(response['message']['content'])

Useful Docker Commands

# View logs
docker logs ollama

# List pulled models
docker exec ollama ollama list

# Pull a new model
docker exec ollama ollama pull mistral

# Remove a model to free disk space
docker exec ollama ollama rm mistral

# Stop and remove container (keeps volume data)
docker stop ollama && docker rm ollama
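Model files live in the named volume, so they survive container removal. To see where Docker stores them and how much space they occupy:

```shell
docker volume inspect ollama   # host path of the model store
docker system df -v            # per-volume disk usage
```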

Choosing a Model

For server deployments, the right model depends on your hardware and use case. See the guides to best models for coding, best models for RAG, and best models for summarisation.

Next Steps

With Ollama running in Docker, the next logical step is calling it from Python to build applications, or using LangChain to build a RAG pipeline.
