How to Run Ollama in Proxmox: VM, LXC, and GPU Passthrough

Running Ollama inside Proxmox gives you a clean, isolated environment for your local AI models — separate from the host system, snapshotable, and easy to rebuild or migrate. You have two options: a standard Ubuntu VM (simplest, broadest compatibility) or an LXC container (lighter weight, faster). This guide covers both, including how to pass through a GPU for full inference speed.

Why Run Ollama in Proxmox?

If your home server runs Proxmox, spinning up a dedicated VM or container for Ollama has several advantages over running it directly on the host:

  • Isolation — Ollama and its model files are contained. You can snapshot before pulling a new model, roll back if something breaks, or clone the environment entirely.
  • Resource limits — assign a fixed amount of RAM and CPU so Ollama cannot starve other VMs on the same host
  • Clean teardown — if you want to remove Ollama, you delete the VM rather than uninstalling across a system shared with other services
  • Easy migration — move the VM to a different Proxmox node, or restore it on new hardware

Option 1 is a standard Ubuntu VM. It is the most straightforward approach and has the best driver compatibility, especially for GPU passthrough.

Create the VM

In the Proxmox web UI:

  1. Upload an Ubuntu 24.04 LTS ISO to local storage
  2. Create a new VM: 4 vCPUs, 8–16 GB RAM, 80 GB disk (models are large — 5–10 GB each)
  3. Set display to VirtIO for better performance
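  4. Enable QEMU Guest Agent in the VM options, and install the qemu-guest-agent package inside Ubuntu afterwards so Proxmox can read the VM's IP address and shut it down cleanly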
  5. Install Ubuntu with minimal/server configuration
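
If you prefer the host shell to the web UI, the same VM can be described with qm create. The following is a minimal sketch, not the only way to do it: the VM ID (120), the storage name (local-lvm), the bridge (vmbr0), and the ISO filename are all assumptions that you will need to adapt. The script only prints the command so you can review it first.

```shell
#!/usr/bin/env bash
# Sketch: print a `qm create` command matching the VM settings listed above.
# VMID, STORAGE, the bridge name, and the ISO filename are assumptions; adjust them.

VMID="${VMID:-120}"                                          # hypothetical free VM ID
STORAGE="${STORAGE:-local-lvm}"                              # assumed disk storage
ISO="${ISO:-local:iso/ubuntu-24.04-live-server-amd64.iso}"   # assumed ISO filename

build_vm_cmd() {
  # 4 vCPUs, 16 GB RAM, 80 GB disk, VirtIO display, guest agent enabled
  echo qm create "$VMID" \
    --name ollama --ostype l26 \
    --cores 4 --memory 16384 \
    --scsihw virtio-scsi-pci --scsi0 "${STORAGE}:80" \
    --ide2 "${ISO},media=cdrom" \
    --net0 virtio,bridge=vmbr0 \
    --vga virtio --agent enabled=1
}

# Prints the command only; copy it to the Proxmox host shell once it looks right.
build_vm_cmd
```

Printing rather than executing keeps the script harmless on any machine and makes it easy to spot a wrong storage or ISO name before a VM is created.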

Install Ollama in the VM

curl -fsSL https://ollama.com/install.sh | sh

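By default, Ollama listens only on localhost (127.0.0.1, port 11434) inside the VM. To make it reachable from other machines on your network (or other VMs), add a systemd override. The Ollama API has no built-in authentication, so only expose it on a network you trust: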

sudo systemctl edit ollama

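Add the following lines to the override file, then save and exit:

[Service]
Environment="OLLAMA_HOST=0.0.0.0"

Reload systemd and restart Ollama so the change takes effect:

sudo systemctl daemon-reload && sudo systemctl restart ollama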

Pull and run a model to verify everything works:

ollama run qwen3:8b
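
To confirm that the network binding works, you can query the API from another machine. This is a small sketch that assumes Ollama's default port (11434) and uses its /api/version endpoint; the IP address in the usage comment is a placeholder, not a value from this guide.

```shell
#!/usr/bin/env bash
# Sketch: check that an Ollama server answers over the network.
# The example IP in the usage line below is a placeholder.

# Build the base URL for an Ollama server (11434 is Ollama's default port).
ollama_url() {
  local host="$1" port="${2:-11434}"
  printf 'http://%s:%s\n' "$host" "$port"
}

# Return 0 if the server responds on /api/version within 5 seconds.
ollama_reachable() {
  curl -fsS --max-time 5 "$(ollama_url "$@")/api/version" >/dev/null 2>&1
}

# Usage from another machine: ./check-ollama.sh 192.168.1.50
if [ "$#" -ge 1 ]; then
  if ollama_reachable "$@"; then
    echo "Ollama is reachable at $(ollama_url "$@")"
  else
    echo "No response from $(ollama_url "$@")" >&2
    exit 1
  fi
fi
```

If the check fails from another machine but works with 127.0.0.1 inside the VM, the override was probably not applied or a firewall is blocking the port.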

Option 2 — LXC Container

LXC containers are lighter than VMs — they share the Proxmox host kernel, which means less overhead and faster startup. The trade-off is that GPU passthrough is more complex in LXC than in a VM.

Create an LXC Container

In the Proxmox web UI:

  1. Download an Ubuntu 24.04 LXC template from the template library
  2. Create a new container: 4 cores, 8 GB RAM, 80 GB disk
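  3. Untick Unprivileged container (making the container privileged) if you need GPU access. This is the simplest route to GPU passthrough in LXC; unprivileged containers can be made to work, but they need extra device-permission setup that this guide does not cover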
  4. Enable nesting under Features if you want Docker inside the container
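
The same container can be sketched as a pct create command for the host shell. The container ID (121), the exact template filename, the storage name, and the bridge are assumptions; check the template name under your storage's CT Templates view before using it. As written, the script only prints the command.

```shell
#!/usr/bin/env bash
# Sketch: print a `pct create` command matching the container settings above.
# CTID, TEMPLATE, STORAGE, and the bridge name are assumptions; adjust them.

CTID="${CTID:-121}"                                                                # hypothetical free ID
TEMPLATE="${TEMPLATE:-local:vztmpl/ubuntu-24.04-standard_24.04-2_amd64.tar.zst}"   # assumed filename
STORAGE="${STORAGE:-local-lvm}"                                                    # assumed storage

build_ct_cmd() {
  # 4 cores, 8 GB RAM, 80 GB root disk, privileged, nesting enabled
  echo pct create "$CTID" "$TEMPLATE" \
    --hostname ollama \
    --cores 4 --memory 8192 \
    --rootfs "${STORAGE}:80" \
    --net0 name=eth0,bridge=vmbr0,ip=dhcp \
    --unprivileged 0 \
    --features nesting=1
}

# Prints the command only; review it, then run it on the Proxmox host.
build_ct_cmd
```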

After starting, enter the container and install Ollama the same way as the VM method above.

GPU Passthrough for Full Speed

Without GPU access, Ollama runs on CPU — functional but slow for larger models. GPU passthrough routes your graphics card directly into the VM or container for full inference speed.

For NVIDIA GPUs (VM)

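On the Proxmox host, make sure VT-d (Intel) or AMD-Vi (AMD) is enabled in the BIOS/UEFI, then set the kernel command line in /etc/default/grub:

# For Intel CPUs:
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"

# For AMD CPUs (the AMD IOMMU is enabled by default, so only iommu=pt is added):
GRUB_CMDLINE_LINUX_DEFAULT="quiet iommu=pt"

Apply the change and reboot. Run these as root on the host; a stock Proxmox install does not ship sudo:

update-grub && reboot

If the host boots with systemd-boot instead of GRUB (for example, ZFS on UEFI), put the same parameters in /etc/kernel/cmdline and run proxmox-boot-tool refresh before rebooting.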

After reboot, verify IOMMU is active:

dmesg | grep -e DMAR -e IOMMU
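
Beyond the dmesg check, it helps to see which IOMMU group the GPU landed in, since all devices in a group are passed through together. The sketch below walks /sys/kernel/iommu_groups; the optional directory argument is my own addition so the function can be tried against a test directory. No output means no groups exist, which usually indicates IOMMU is not active.

```shell
#!/usr/bin/env bash
# Sketch: list every IOMMU group and the PCI devices it contains.
# The optional argument (an alternative sysfs root) exists only for testing.

list_iommu_groups() {
  local root="${1:-/sys/kernel/iommu_groups}" dev group addr desc
  for dev in "$root"/*/devices/*; do
    [ -e "$dev" ] || continue             # glob did not match: no groups present
    group="${dev#"$root"/}"               # e.g. 14/devices/0000:01:00.0
    group="${group%%/*}"                  # -> 14
    addr="${dev##*/}"                     # -> 0000:01:00.0
    desc=""
    if command -v lspci >/dev/null 2>&1; then
      desc="$(lspci -nns "$addr" 2>/dev/null)"   # human-readable name and IDs
    fi
    printf 'group %s: %s\n' "$group" "${desc:-$addr}"
  done
}

list_iommu_groups "$@"
```

Groups are listed in glob order (1, 10, 11, 2, ...), not numeric order. Ideally the GPU and its audio function sit in a group of their own.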

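Add the VFIO kernel modules to /etc/modules:

vfio
vfio_iommu_type1
vfio_pci

On kernels older than 6.2, also add vfio_virqfd. Newer kernels have folded it into the vfio module, so the separate entry is no longer needed.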

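Find your GPU’s vendor:device IDs. They are the bracketed pairs such as [10de:2204] at the end of each line, and most cards list both a video and an audio function: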

lspci -nn | grep -i nvidia

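Bind those IDs to vfio-pci so the host does not claim the card (replace the example IDs with your own), then rebuild the initramfs and reboot: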

echo "options vfio-pci ids=10de:2204,10de:1aef" > /etc/modprobe.d/vfio.conf
update-initramfs -u && reboot
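
To avoid copying the IDs by hand, the options line can be generated from the lspci output. This is a sketch under the assumption that every NVIDIA function listed should be handed to vfio-pci; the function name is my own, and it reads from standard input so it can be tried without touching the system.

```shell
#!/usr/bin/env bash
# Sketch: turn `lspci -nn` lines (on stdin) into the vfio-pci options line.

vfio_options_line() {
  local ids
  # Vendor:device pairs look like [10de:2204]; class codes like [0300] have no colon.
  ids="$(grep -oE '\[[0-9a-f]{4}:[0-9a-f]{4}\]' \
          | tr -d '[]' \
          | awk '!seen[$0]++' \
          | paste -sd, -)"
  [ -n "$ids" ] || return 1               # nothing matched: print nothing, fail
  printf 'options vfio-pci ids=%s\n' "$ids"
}

# On the host, review the output before writing it to /etc/modprobe.d/vfio.conf:
#   lspci -nn | grep -i nvidia | vfio_options_line
```

After the reboot, lspci -nnk should report "Kernel driver in use: vfio-pci" for each function of the card.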

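In the Proxmox web UI, add the GPU to your VM: VM → Hardware → Add → PCI Device. Select your GPU and tick All Functions to pass the whole card, including its audio function. Tick Primary GPU only if the card should also act as the VM's display output; a headless Ollama VM does not need it, and with it enabled the VM's virtual display (and therefore the noVNC console) is no longer used.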
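
The same assignment can be made from the host shell with qm set and a hostpci0 entry. The helper below is a sketch: the function name, the VM ID (120), and the PCI address are assumptions. Omitting the function suffix (.0) corresponds to All Functions, and x-vga=1 corresponds to Primary GPU.

```shell
#!/usr/bin/env bash
# Sketch: build the hostpci0 value for `qm set`.
# The VM ID and PCI address used below are placeholders.

hostpci_value() {
  local addr="${1%.*}"        # drop the function suffix: 0000:01:00.0 -> 0000:01:00
  local primary="${2:-0}"     # 1 = mark the card as the VM's primary GPU
  local value="$addr"
  if [ "$primary" = "1" ]; then
    value="$value,x-vga=1"
  fi
  printf '%s\n' "$value"
}

# Prints the command only; run it on the Proxmox host once it looks right.
echo qm set "${VMID:-120}" --hostpci0 "$(hostpci_value 0000:01:00.0)"
```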

Inside the VM, refresh the package lists and install the NVIDIA driver:

sudo apt update
sudo apt install nvidia-driver-535
sudo reboot

Verify the GPU is visible:

nvidia-smi

Ollama detects NVIDIA GPUs automatically — no additional configuration needed. Pull a model and it will use GPU inference.
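
To confirm that a loaded model is actually on the GPU, ollama ps lists running models along with where they are placed. The helper below is a rough sketch that looks for "GPU" in that output; it assumes the first line is a header row and that the placement text contains "GPU" (for example "100% GPU"), which is how current releases appear to print it.

```shell
#!/usr/bin/env bash
# Sketch: succeed if `ollama ps` output (on stdin) shows a model placed on the GPU.
# Assumes a header row followed by one line per loaded model.

uses_gpu() {
  # Skip the header row, then look for "GPU" in the remaining lines.
  tail -n +2 | grep -q 'GPU'
}

# Usage inside the VM, while a model is loaded:
#   ollama ps | uses_gpu && echo "GPU inference active" || echo "running on CPU"
```

A model that does not fit in VRAM may be split between CPU and GPU, in which case this check still passes; read the ollama ps output itself for the exact split.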

For NVIDIA GPUs (LXC)

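GPU passthrough in LXC works differently: the container shares the host's driver and device nodes instead of owning the PCI device. Add these lines to your container’s config file at /etc/pve/lxc/[VMID].conf. The numbers 195 and 234 are device major numbers. 195 is fixed for the core NVIDIA devices, but the nvidia-uvm major is assigned dynamically, so check yours with ls -l /dev/nvidia* on the host and adjust the second line if it differs: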

lxc.cgroup2.devices.allow: c 195:* rwm
lxc.cgroup2.devices.allow: c 234:* rwm
lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file
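
Because the device major numbers can differ between hosts, it may be safer to generate the allow lines from the actual device nodes. This is a sketch with a function name of my own; it relies on GNU stat, which Proxmox (Debian) provides, and prints one line per distinct major number.

```shell
#!/usr/bin/env bash
# Sketch: print lxc.cgroup2.devices.allow lines for the given character devices.

allow_lines() {
  local dev major seen=" "
  for dev in "$@"; do
    [ -c "$dev" ] || continue                   # skip missing nodes and directories
    major=$((16#$(stat -c '%t' "$dev")))        # stat prints the major number in hex
    case "$seen" in *" $major "*) continue ;; esac   # already printed this major
    seen="$seen$major "
    printf 'lxc.cgroup2.devices.allow: c %s:* rwm\n' "$major"
  done
}

# On the Proxmox host, with the NVIDIA driver loaded:
allow_lines /dev/nvidia*
```

If nothing is printed, the /dev/nvidia* nodes do not exist yet, which usually means the host driver is not installed or not loaded.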

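The NVIDIA drivers must be installed on the Proxmox host, and the container needs the matching user-space libraries at exactly the same version (without the kernel module, since the container uses the host's). Keeping those two versions in sync across upgrades is the main complexity of the LXC approach.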
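
A quick way to catch version drift is to compare the host's kernel module version with what nvidia-smi reports inside the container. The sketch below assumes it runs on the Proxmox host, that the nvidia module is loaded, and that nvidia-smi is installed in the container; the function names and the container ID in the usage line are placeholders.

```shell
#!/usr/bin/env bash
# Sketch: compare the host NVIDIA driver version with the one inside a container.
# Run on the Proxmox host; the container ID is passed as the first argument.

host_driver_version() {
  cat /sys/module/nvidia/version 2>/dev/null    # present while the module is loaded
}

container_driver_version() {
  pct exec "$1" -- nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1
}

versions_match() {
  [ -n "$1" ] && [ "$1" = "$2" ]                # empty means "could not be read"
}

# Usage: ./check-driver.sh 121
if [ "$#" -ge 1 ]; then
  host="$(host_driver_version)"
  ct="$(container_driver_version "$1")"
  if versions_match "$host" "$ct"; then
    echo "Driver versions match: $host"
  else
    echo "Mismatch: host='$host' container='$ct'" >&2
    exit 1
  fi
fi
```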

For AMD GPUs

AMD GPU passthrough follows the same VM process (IOMMU, VFIO, PCI device add). Inside the VM, AMD GPUs are typically detected automatically by Ollama via ROCm, though driver installation steps vary by GPU generation. For most AMD cards, install rocm-hip-sdk inside the Ubuntu VM.

Recommended setup for a home Proxmox server running Ollama alongside other services:

  • Ubuntu 24.04 LTS VM — 4 vCPUs, 16 GB RAM, 100 GB disk
  • GPU passthrough if you have a spare or dedicated GPU
  • CPU-only is workable if you stick to 8B-class models (qwen3:8b runs at a usable speed on CPU)
  • Bind Ollama to 0.0.0.0 so other VMs and the host can reach it
  • Combine with Tailscale for remote access (install Tailscale inside the VM)

Disk Space Planning

Models are large. Plan disk space accordingly before creating the VM:

Model                  Approx. disk space
qwen3:8b               ~5 GB
qwen3:14b              ~9 GB
qwen3:32b              ~20 GB
qwen3:30b-a3b (MoE)    ~18 GB
llama3.3 (70b)         ~43 GB

A 100 GB virtual disk comfortably holds 4–6 medium-sized models. On thin-provisioned storage (LVM-thin, ZFS, or qcow2 images), the virtual disk only consumes space on the physical drive as it fills.
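
As a quick sanity check before sizing the disk, you can add up the models you plan to keep. This sketch uses the approximate sizes from the table above, rounded to whole gigabytes; the function name is my own, and the result ignores the few gigabytes the operating system itself needs.

```shell
#!/usr/bin/env bash
# Sketch: how many GB remain on a disk after the listed model sizes (whole GB).

remaining_gb() {
  local disk="$1" total=0 size
  shift
  for size in "$@"; do
    total=$((total + size))
  done
  echo $((disk - total))
}

# qwen3:8b + qwen3:14b + qwen3:32b + qwen3:30b-a3b on a 100 GB disk
remaining_gb 100 5 9 20 18   # prints 48
```

A negative result means the selection does not fit. Once models are pulled, ollama list shows their actual sizes.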