How to Run Ollama in WSL2 (Windows Subsystem for Linux)

Ollama has a native Windows installer, so why bother running it inside WSL2? For many developers, the answer comes down to toolchain consistency. If your Python environment, Docker setup, shell scripts, and development dependencies all live inside WSL2, it makes sense to keep Ollama there too. Running Ollama in your Linux environment means your model calls integrate cleanly with Linux-native tools, with no cross-environment friction and no need to manage two separate Ollama installations.

This guide walks through setting up Ollama inside a WSL2 Ubuntu instance on Windows, including GPU acceleration, API access from the Windows host, and auto-starting the server on WSL2 launch.

Why Run Ollama in WSL2 Rather Than Native Windows?

There are several genuine reasons to prefer WSL2 over the native Windows Ollama installer:

  • Linux-native Python tooling: Tools like virtualenv, pip, and many ML libraries behave more predictably on Linux. If you are building applications around Ollama using Python, keeping everything in WSL2 avoids path quirks and Windows-specific dependency issues.
  • Docker inside WSL2: Docker Desktop on Windows uses WSL2 as its backend. If you are running containers that call Ollama, having Ollama itself inside WSL2 means your containers can reach it over the WSL2 internal network rather than routing through the Windows host.
  • Shell scripting and automation: Bash pipelines, curl calls to the Ollama API, and model management scripts are all simpler to write and run in a Linux shell. Keeping Ollama inside WSL2 means your scripts do not need to bridge two environments.
  • Consistent dev/prod parity: If your production environment is Linux (which it almost certainly is), developing against a Linux-hosted Ollama instance avoids subtle differences in behaviour between operating systems.

That said, if you just want to run a chat interface on Windows with no development workflow attached, the native Windows installer is simpler. WSL2 is the right choice when Ollama is part of a broader Linux-based development environment.

Prerequisites

Before starting, make sure you have the following:

  • Windows 11, or Windows 10 version 2004 or later (Build 19041 and above)
  • WSL2 enabled and an Ubuntu distribution installed
  • If you want GPU acceleration: an NVIDIA GPU with up-to-date Windows drivers (version 527.xx or later)
  • At least 8 GB of RAM allocated to WSL2 for smaller models; 16 GB or more is recommended for anything beyond 7B parameter models

Enabling WSL2 and Installing Ubuntu

If you have not already set up WSL2, the process is straightforward. Open PowerShell or Windows Terminal as Administrator and run:

wsl --install

This single command enables the WSL feature, sets WSL2 as the default version, and installs Ubuntu. Your machine will need to restart once during this process. After rebooting and completing the Ubuntu first-run setup (creating a username and password), you will have a working WSL2 Ubuntu environment.

If you already have WSL installed but want to confirm you are using WSL2 rather than WSL1, run the following in PowerShell:

wsl --list --verbose

The output will show each installed distribution and its WSL version. If any distribution shows version 1, you can upgrade it with:

wsl --set-version Ubuntu 2
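
You can also confirm which flavour you are on from inside the distribution itself, since WSL2 kernels carry a distinctive release string. The sketch below assumes the usual strings ("microsoft-standard-WSL2" for WSL2, "Microsoft" for WSL1); exact values can vary between kernel builds:

```shell
# Detect the WSL flavour from the kernel release string (run inside Ubuntu).
# WSL2 kernels look like "5.15.x-microsoft-standard-WSL2"; WSL1 kernels
# contain "Microsoft" without the WSL2 suffix. Strings may vary by build.
wsl_flavor() {
    case "$(uname -r)" in
        *microsoft-standard-WSL2*) echo "WSL2" ;;
        *[Mm]icrosoft*)            echo "WSL1" ;;
        *)                         echo "not WSL" ;;
    esac
}
wsl_flavor
```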

Once Ubuntu is running under WSL2, update the package list before installing anything:

sudo apt update && sudo apt upgrade -y

Installing Ollama Inside WSL2

With your WSL2 Ubuntu session open, installing Ollama uses exactly the same command as any Linux system. Ollama provides an official install script that detects your environment and sets up the correct binary and systemd service:

curl -fsSL https://ollama.com/install.sh | sh

The script downloads the Ollama binary, installs it to /usr/local/bin/ollama, and attempts to register a systemd service. WSL2 distributions do not always have systemd enabled by default, so the service registration step may report an error; this is harmless and does not affect functionality. You can start the server manually or configure it to launch on WSL2 startup, both of which are covered below.

Once installation completes, verify it worked:

ollama --version

Modern versions of WSL2 support systemd, which lets Ollama run as a proper background service. To enable it, open or create the file /etc/wsl.conf inside your Ubuntu instance:

sudo nano /etc/wsl.conf

Add the following:

[boot]
systemd=true

Save the file, then restart your WSL2 instance from PowerShell:

wsl --shutdown

Reopen Ubuntu. With systemd running, you can enable and start the Ollama service properly:

sudo systemctl enable ollama
sudo systemctl start ollama

Check it is running:

sudo systemctl status ollama

Starting the Ollama Server Without systemd

If you prefer not to enable systemd, you can start the Ollama server manually in the background:

ollama serve &

The ampersand sends the process to the background so you can continue using the terminal. Output from the server will occasionally appear in your terminal session. If you want to suppress it entirely:

ollama serve > /dev/null 2>&1 &
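
A slightly more robust variant only starts a server when one is not already running, and keeps the output in a log file rather than discarding it. This is a sketch, not official Ollama tooling; the log path is an arbitrary choice:

```shell
# Start the Ollama server in the background only if it is not already
# running; keep its output in a log file for later troubleshooting.
# The log location is an arbitrary choice for this sketch.
start_ollama_if_needed() {
    if pgrep -x ollama > /dev/null 2>&1; then
        echo "ollama already running"
    else
        echo "starting ollama"
        nohup ollama serve >> "$HOME/ollama-serve.log" 2>&1 &
    fi
}
start_ollama_if_needed
```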

To pull and run a model once the server is running:

ollama pull llama3
ollama run llama3

GPU Acceleration in WSL2

This is where WSL2 has a significant advantage over a standard Linux VM. Microsoft and NVIDIA worked together to provide CUDA support through WSL2 without requiring you to install NVIDIA drivers inside the Linux environment. The GPU driver installed on the Windows side is exposed to WSL2 automatically.

To use GPU acceleration with Ollama in WSL2:

  • Install or update your NVIDIA drivers on Windows to version 527.xx or later. Do not install NVIDIA drivers inside WSL2 — doing so will break the integration.
  • The CUDA libraries are provided via a special WSL2-specific driver layer. Ollama’s install script installs the necessary CUDA components inside WSL2 automatically when it detects an NVIDIA GPU.

You can verify that your GPU is visible inside WSL2 by running:

nvidia-smi

If this returns your GPU details, CUDA passthrough is working. When you then run an Ollama model, it will automatically use the GPU. To confirm a model is running on the GPU rather than the CPU, run ollama ps while the model is loaded (its PROCESSOR column reports GPU/CPU usage), watch the Ollama server output, which logs the device being used, or check GPU utilisation in Windows Task Manager while a model is active.

AMD GPU users: AMD support in WSL2 is more limited. ROCm, the AMD equivalent of CUDA, has partial WSL2 support depending on your GPU generation. Check the ROCm documentation for current compatibility before expecting GPU acceleration on AMD hardware under WSL2.

Accessing the Ollama API from Your Windows Host

One of the more useful aspects of running Ollama in WSL2 is that the API is automatically accessible from Windows. When Ollama runs inside WSL2 and listens on its default address of 127.0.0.1:11434, WSL2's localhost forwarding routes traffic from the Windows host to the Linux instance transparently via a virtual network adapter.

This means you can open a browser or run a curl command from Windows PowerShell and hit the Ollama API directly:

curl http://localhost:11434/api/tags

This works because WSL2 uses a Hyper-V virtual network and Windows sets up automatic port forwarding (the localhostForwarding setting, enabled by default) between the Windows host and the WSL2 network interface. You do not need to configure anything manually: localhost on the Windows side reaches the WSL2 instance for any port that has a process listening on it.

This means a Windows-native application — a .NET app, an Electron desktop tool, a browser extension — can call the Ollama API at http://localhost:11434 even though Ollama itself is running inside Linux. From the application’s perspective, it is just a local HTTP endpoint.

Calling the Ollama API from a Windows Application

With Ollama running in WSL2, any HTTP client on Windows can interact with it. Here is a basic example using PowerShell to send a prompt to a running model:

$body = @{
    model = "llama3"
    prompt = "Explain WSL2 in one paragraph."
    stream = $false
} | ConvertTo-Json

Invoke-RestMethod -Uri "http://localhost:11434/api/generate" `
    -Method Post `
    -ContentType "application/json" `
    -Body $body

The same endpoint works from any language. A Python script running in the Windows environment (not inside WSL2) can use the requests library against http://localhost:11434 without any special configuration. This flexibility is one of the strongest arguments for the WSL2 approach — Ollama is accessible to both the Linux development environment and Windows applications simultaneously.

Auto-Starting Ollama When WSL2 Opens

If you are not using systemd, you will need to start the Ollama server each time you open a WSL2 session. The simplest way to automate this is to add a startup command to your ~/.bashrc file:

echo 'pgrep ollama > /dev/null || ollama serve > /dev/null 2>&1 &' >> ~/.bashrc

The pgrep check prevents a duplicate server from starting if Ollama is already running. This line will execute every time a new bash session starts inside WSL2.

If you want a cleaner approach using the WSL2 boot command (which runs once per WSL2 startup rather than per terminal), add this to /etc/wsl.conf:

[boot]
command = /usr/local/bin/ollama serve > /var/log/ollama.log 2>&1 &

This starts Ollama at WSL2 boot regardless of whether systemd is enabled, and logs output to a file for troubleshooting.

Common Issues and Fixes

Ollama port not accessible from Windows

If localhost:11434 is not reachable from Windows, check that Ollama is actually listening on 0.0.0.0 and not just 127.0.0.1 (loopback within WSL2 only). You can force this by setting an environment variable before starting the server:

OLLAMA_HOST=0.0.0.0 ollama serve
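
If Ollama is running as a systemd service, a drop-in override keeps this setting across restarts. Below is a sketch that writes the override file directly, equivalent to running sudo systemctl edit ollama and adding the same two lines:

```shell
# Persist OLLAMA_HOST for the systemd service via a drop-in override
# (equivalent to editing the unit with "systemctl edit ollama").
sudo mkdir -p /etc/systemd/system/ollama.service.d
sudo tee /etc/systemd/system/ollama.service.d/override.conf > /dev/null <<'EOF'
[Service]
Environment="OLLAMA_HOST=0.0.0.0"
EOF
sudo systemctl daemon-reload
sudo systemctl restart ollama
```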

Also confirm that Windows Firewall is not blocking the port, and check that your distribution is actually running under WSL2 rather than WSL1 (wsl --list --verbose), since WSL1 has different networking behaviour.
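
If localhost forwarding still misbehaves, you can bypass it and target the WSL2 address directly, provided the server is bound to 0.0.0.0 as above. From inside the WSL2 shell, the first address reported by hostname -I is the one the Windows host can reach; note that it changes whenever WSL2 restarts:

```shell
# Print the WSL2 instance's IP address (run inside the WSL2 shell).
# The Windows host can then reach Ollama at http://<this address>:11434,
# provided the server is bound to 0.0.0.0. The address is ephemeral and
# changes across WSL2 restarts.
wsl_ip="$(hostname -I | awk '{print $1}')"
echo "$wsl_ip"
```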

GPU not detected in WSL2

If nvidia-smi fails inside WSL2, the most common cause is outdated Windows GPU drivers. Update to the latest NVIDIA driver from the NVIDIA website (not the Windows Update version, which can lag behind). Also confirm you have not accidentally installed NVIDIA drivers inside WSL2 itself — remove them if so.

WSL2 running out of memory

By default, WSL2 claims up to 50% of your total system RAM. For large language models this can be insufficient, and WSL2 may start swapping aggressively. Create or edit the file C:\Users\YourUsername\.wslconfig in Windows to set explicit limits:

[wsl2]
memory=12GB
swap=8GB
processors=6

Adjust the values to match your hardware. After saving, restart WSL2 with wsl --shutdown from PowerShell for the changes to take effect. If you have a machine with 32 GB RAM, allocating 16–20 GB to WSL2 gives Ollama enough headroom to run 13B parameter models comfortably.

Slow model load times

Model files stored on the Windows filesystem and accessed through WSL2’s /mnt/c/ path suffer significant I/O overhead. Ensure your Ollama model storage (~/.ollama/models) is inside the WSL2 filesystem (the Linux ext4 volume), not on a mounted Windows drive. This alone can make a large difference in load times.
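
A quick way to check where your models actually live, assuming the default storage location or an OLLAMA_MODELS override (any path under /mnt/ is a mounted Windows drive, accessed through WSL2's slower 9P file bridge):

```shell
# Report whether Ollama's model directory is on the Linux filesystem or
# on a mounted Windows drive (paths under /mnt/ go through the 9P bridge
# and are much slower). Honours the OLLAMA_MODELS override if set.
model_storage_status() {
    dir="${OLLAMA_MODELS:-$HOME/.ollama/models}"
    case "$dir" in
        /mnt/*) echo "windows-filesystem: $dir" ;;
        *)      echo "linux-filesystem: $dir" ;;
    esac
}
model_storage_status
```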

Summary

Running Ollama inside WSL2 is a natural fit for developers who already work in a Linux environment on Windows. The installation process mirrors standard Linux setup, GPU passthrough works without Linux-side driver management, and the API is seamlessly accessible from both WSL2 and the Windows host. With a small amount of configuration for auto-start and memory limits, WSL2 becomes a reliable home for Ollama as part of a broader development workflow.
