If you have searched for “ollama gpu not detected” then you are almost certainly staring at painfully slow inference times and a growing suspicion that your expensive graphics card is doing absolutely nothing. Whether you are running a local AI assistant for your business, testing large language models in-house, or evaluating Ollama before committing to a wider deployment, GPU acceleration is not optional. Without it, even a modest 7B model can take minutes to respond to a simple prompt.
This guide walks through every common cause of Ollama failing to detect a GPU, covering both NVIDIA CUDA errors and AMD ROCm failures, and explains how to verify and fix each one. It is written for IT managers and business owners running Windows, Linux, or WSL2 environments on their own hardware. If you are still deciding whether to self-host at all, our guide on how to run Ollama on a home server covers the full setup from scratch.
Why Ollama Falls Back to CPU
Ollama is designed to detect your GPU automatically at startup. When it does, model inference is offloaded to the GPU and responses are dramatically faster. When detection fails, Ollama silently falls back to CPU mode. There is no loud error message by default, which is exactly why so many users do not realise the problem exists until they benchmark their inference speeds or check the logs.
The root causes split cleanly into two camps: NVIDIA CUDA issues and AMD ROCm issues. Both frameworks act as the bridge between Ollama and your GPU hardware. If the framework is missing, misconfigured, or the wrong version, the bridge does not exist and Ollama falls back to CPU. Understanding which camp your hardware sits in is the first step before running any diagnostics.
There is also a third, less obvious category: environment issues. These include missing environment variables, containerisation problems (such as Docker not being configured to pass through the GPU), and permission errors in Linux that prevent Ollama from accessing GPU devices at all. Each of these is covered in detail below.
Step One: Confirm Whether Ollama Is Using Your GPU
Before fixing anything, confirm the problem. Run a model and immediately check GPU utilisation. On Windows, open Task Manager, click Performance, then select your GPU. If the GPU engine labelled “Compute” stays at zero while Ollama is processing a prompt, the GPU is not being used. On Linux, use the nvidia-smi command for NVIDIA cards or rocm-smi for AMD cards and watch the GPU utilisation column during inference.
You can also check the Ollama logs directly. On Linux, run journalctl -u ollama --no-pager | grep -i gpu. On Windows, Ollama logs are typically found at %LOCALAPPDATA%\Ollama\logs. Look for lines referencing CUDA, ROCm, or “no GPU found”. The logs will usually tell you exactly what Ollama tried to detect and why it gave up.
A quick and useful alternative is to run ollama ps whilst a model is loaded. The output will show which device the model is running on. If it says “CPU” where you expect to see your GPU name, you have confirmed the issue and can move on to the fixes below.
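The checks described above can be run from a Linux terminal in one short session. This sketch assumes ollama, nvidia-smi (or rocm-smi for AMD), and journalctl are on your PATH and that Ollama runs as a systemd service:

```shell
# Load a model in another terminal first (e.g. `ollama run llama3 "hello"`), then:

# 1. Confirm which device the loaded model is on (look for a GPU name vs "cpu")
ollama ps

# 2. Watch live GPU utilisation during inference (NVIDIA; use rocm-smi for AMD)
nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv -l 1

# 3. Search the service logs for GPU detection lines
journalctl -u ollama --no-pager | grep -iE 'cuda|rocm|gpu'
```

If step 2 shows zero compute utilisation while a prompt is being processed, move on to the fixes below.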
Fixing NVIDIA CUDA Errors in Ollama
The most common Ollama CUDA error scenario is a missing or incompatible CUDA toolkit. Ollama requires CUDA 11.3 or later for NVIDIA GPU support. The toolkit is separate from your display driver. Many users install the display driver and assume CUDA is included, but a full CUDA toolkit installation is required for compute workloads.
Download the CUDA toolkit directly from NVIDIA’s developer site and install the version appropriate for your operating system. After installation, verify it worked by running nvcc --version in a terminal. If the command is not found, your PATH environment variable may need updating to include the CUDA binary directory, typically C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.x\bin on Windows.
Driver version mismatches are the second most common NVIDIA problem. Each CUDA version requires a minimum driver version, and if your driver is too old, CUDA will not function correctly. Run nvidia-smi and check the driver version shown at the top of the output, then cross-reference it against NVIDIA’s CUDA compatibility table. If you need to update, download the latest Game Ready or Studio driver from NVIDIA’s website, or use a tool like GeForce Experience if available. On a business server running a professional card such as an RTX 4000 Ada or an older Quadro, update through NVIDIA’s enterprise driver portal instead.
- Install the CUDA Toolkit from developer.nvidia.com (version 11.3 minimum, 12.x recommended)
- Confirm your NVIDIA driver meets the minimum version for your chosen CUDA release
- Verify with nvcc --version and nvidia-smi before restarting Ollama
- On Linux, ensure the nvidia-container-toolkit is installed if running Ollama in Docker
- Restart the Ollama service after any driver or toolkit change
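On Linux, the verification steps above boil down to a short sequence. The CUDA path below is the conventional install location and may differ on your system:

```shell
# Confirm the CUDA toolkit is on the PATH and report its version
nvcc --version

# Confirm the driver is loaded and note its version in the output banner
nvidia-smi

# If nvcc is not found, the toolkit's bin directory may be missing from PATH
# (adjust the path for your CUDA version and install location)
export PATH=/usr/local/cuda/bin:$PATH

# Restart the Ollama service so it re-runs GPU detection
sudo systemctl restart ollama
```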
Fixing AMD ROCm Not Working With Ollama
AMD ROCm support in Ollama is solid on Linux but limited on Windows, where it remains in its early stages. If you are attempting to use an AMD GPU on Windows and ROCm is not working, the honest answer is that you are fighting a stack that is still maturing. Linux is the recommended platform for AMD GPU inference with Ollama, and Ubuntu 22.04 or distributions based on it offer the best compatibility at the time of writing.
On Linux, begin by installing the ROCm stack from AMD’s official repository. AMD provides a straightforward installation script via amdgpu-install. After installation, verify with rocm-smi and check that your GPU is listed. If it is not, the most common cause is that your GPU is not on AMD’s officially supported list. Ollama can still drive many of these cards through the HSA_OVERRIDE_GFX_VERSION environment variable, which tells ROCm to treat your GPU as a different, supported architecture.
For example, if you have an RX 6600 (gfx1032) which may not be listed as supported, setting HSA_OVERRIDE_GFX_VERSION=10.3.0 before launching Ollama can enable GPU acceleration by mapping it to a supported GFX architecture. This is not guaranteed to work perfectly for every card, but it resolves ROCm not working for a wide range of RDNA 2 and RDNA 3 consumer GPUs that AMD has not formally certified for compute workloads.
- Use Linux (Ubuntu 22.04 strongly recommended) for AMD ROCm with Ollama
- Install ROCm using AMD’s official amdgpu-install script
- Verify GPU detection with rocm-smi after installation
- For unsupported consumer GPUs, set HSA_OVERRIDE_GFX_VERSION to the nearest supported architecture
- Add your user to the render and video groups: sudo usermod -a -G render,video $USER
- Log out and back in after adding groups, then test again
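The steps above can be sketched as a single session on Ubuntu. The amdgpu-install usecase flag follows AMD’s documented usage but may vary by release, and the 10.3.0 override value is the RDNA 2 example from earlier, not a universal setting:

```shell
# Install the ROCm stack via AMD's installer (Ubuntu; check AMD's docs for your release)
sudo amdgpu-install --usecase=rocm

# Allow your user to access the GPU device nodes, then log out and back in
sudo usermod -a -G render,video $USER

# Confirm the GPU is now visible to ROCm
rocm-smi

# For an unsupported RDNA 2 card, persist the override for the Ollama service
# via a systemd drop-in (the editor opens; add the two lines shown in comments):
sudo systemctl edit ollama
#   [Service]
#   Environment="HSA_OVERRIDE_GFX_VERSION=10.3.0"
sudo systemctl restart ollama
```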
WSL2 and Docker GPU Passthrough Issues
A significant number of UK businesses running Ollama on Windows do so inside WSL2 (Windows Subsystem for Linux) or Docker. Both introduce additional GPU passthrough requirements that are easy to miss. In WSL2, GPU support requires Windows 11 or Windows 10 21H2 or later, plus a WSL2-compatible NVIDIA driver installed on the Windows host. The driver must be 470.76 or later for CUDA in WSL2 to function. Crucially, you should not install a separate CUDA toolkit inside WSL2 unless you are sure of the versioning. The CUDA libraries within WSL2 are provided by the Windows host driver, not by a separate Linux installation.
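A quick way to check the WSL2 side is to run the following, partly from an elevated Windows terminal and partly inside the WSL2 distribution. It assumes a recent WSL2-compatible NVIDIA driver is installed on the Windows host:

```shell
# From an elevated Windows terminal: update the WSL2 kernel and GPU integration
wsl --update
wsl --shutdown

# Then inside the WSL2 distribution: the Windows host driver exposes nvidia-smi here.
# If this works but Ollama still sees no GPU, the problem is Ollama-side, not WSL2.
nvidia-smi

# The GPU libraries inside WSL2 are supplied by the Windows host driver here:
ls /usr/lib/wsl/lib
```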
For Docker on Linux, GPU passthrough requires the nvidia-container-toolkit package. Once installed, configure Docker to use the NVIDIA runtime by editing /etc/docker/daemon.json to add the NVIDIA runtime entry, then restart Docker. When launching Ollama in Docker, include the --gpus all flag. Without it, the container has no visibility of the host GPU regardless of what drivers are installed.
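The Linux Docker setup described above can be sketched as follows. The nvidia-ctk helper ships with the container toolkit and writes the daemon.json runtime entry for you; the package name is for Ubuntu/Debian, and the image and volume names follow Ollama’s published Docker instructions:

```shell
# Install the toolkit, then let its helper configure Docker's daemon.json
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# Sanity check: the container should see the host GPU
docker run --rm --gpus all ubuntu nvidia-smi

# Run Ollama with GPU access and persistent model storage
docker run -d --gpus all -v ollama:/root/.ollama -p 11434:11434 \
  --name ollama ollama/ollama
```

Omitting --gpus all is the single most common reason a containerised Ollama runs on CPU.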
On Windows Docker Desktop, GPU support for NVIDIA is available through the WSL2 backend. Ensure GPU resources are enabled in Docker Desktop settings under Resources. AMD GPU passthrough in Docker on Windows is not currently supported in a reliable, production-ready way and should be treated as experimental for business deployments.
VRAM Limitations and Model Layer Offloading
Sometimes Ollama does detect the GPU but only partially uses it. This happens when the model you are loading is larger than your available VRAM: Ollama offloads as many layers as it can to the GPU and runs the remainder on the CPU. The result is that you see some GPU utilisation but performance is still poor. This is not a bug but a deliberate design choice, and it can feel indistinguishable from a GPU detection failure if you are not watching the layer offload count.
Check the Ollama logs during model loading to see how many layers were offloaded. A line such as “offloaded 20/32 layers to GPU” tells you that the model is too large for your VRAM and some layers are running on CPU. The solution is either to use a smaller quantisation variant of the model (such as a Q4_K_M instead of Q8 or full precision), reduce the context length via the OLLAMA_NUM_CTX environment variable, or upgrade your GPU to one with more VRAM.
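To check the offload count on a Linux install, grep the service logs during or after model loading. The model tag below is illustrative only; check the model’s page on ollama.com for the quantisation tags that actually exist:

```shell
# See how many layers were offloaded the last time a model loaded
journalctl -u ollama --no-pager | grep -i 'offloa'

# If you see something like "offloaded 20/33 layers to GPU", pull a smaller
# quantisation variant instead (tag is an example; verify on the model's page)
ollama pull llama2:13b-chat-q4_K_M
```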
For UK businesses buying GPU hardware specifically for local AI inference, NVIDIA RTX 4060 Ti 16GB cards offer good value at typically around GBP 500 to GBP 600, providing enough VRAM to run 13B models fully in GPU memory. For heavier workloads involving 30B or 70B models, data centre grade cards with 24GB or more of VRAM are necessary, though these carry a significantly higher price tag, often starting from around GBP 2,000 for enterprise-grade options.
| GPU | VRAM | Max Model Size (Full GPU) | Approx UK Price |
|---|---|---|---|
| RTX 4060 | 8GB | 7B (Q4) | From around GBP 280 |
| RTX 4060 Ti 16GB | 16GB | 13B (Q4-Q6) | From around GBP 520 |
| RTX 4090 | 24GB | 30B (Q4) | From around GBP 1,800 |
| RX 7900 XTX | 24GB | 30B (Q4) | From around GBP 900 |
| RTX 6000 Ada (Pro) | 48GB | 70B (Q4) | From around GBP 6,500 |
Environment Variables That Control GPU Behaviour in Ollama
Several environment variables directly control how Ollama interacts with your GPU. Knowing these gives you precise control over GPU behaviour beyond what the default configuration provides. The most important is CUDA_VISIBLE_DEVICES for NVIDIA systems. Setting this to 0 tells CUDA to use the first GPU. Setting it to -1 effectively disables GPU use entirely, which is a common accidental misconfiguration that causes Ollama to run on CPU without any obvious error.
On AMD systems, the equivalent is HIP_VISIBLE_DEVICES or ROCR_VISIBLE_DEVICES. Both control which GPU ROCm can see. If these are set incorrectly in a system-level profile, a bash script, or a Docker environment file, Ollama will see no GPU regardless of what hardware is installed. Always audit these variables before spending time on deeper troubleshooting.
- CUDA_VISIBLE_DEVICES=0 enables the first NVIDIA GPU
- CUDA_VISIBLE_DEVICES=-1 disables CUDA GPU access entirely
- HIP_VISIBLE_DEVICES=0 enables the first AMD GPU via ROCm
- HSA_OVERRIDE_GFX_VERSION maps an unsupported AMD GPU to a supported architecture
- OLLAMA_NUM_GPU sets how many GPUs Ollama will use (useful in multi-GPU setups)
- OLLAMA_NUM_CTX reduces the context window to lower VRAM demand
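Auditing these variables is quick. The file paths below are the typical places they end up on a systemd-based Linux distro; adjust for your setup:

```shell
# Search the usual suspects for GPU-related variables (paths are typical examples)
grep -RnsE 'CUDA_VISIBLE_DEVICES|HIP_VISIBLE_DEVICES|ROCR_VISIBLE_DEVICES|HSA_OVERRIDE_GFX_VERSION' \
  ~/.bashrc ~/.profile /etc/environment /etc/systemd/system/ollama.service.d 2>/dev/null

# Also check what the running Ollama service actually has in its environment
sudo systemctl show ollama --property=Environment
```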
If you are deploying Ollama as a service for your business, consider whether local GPU hardware is the right long-term approach versus cloud-hosted inference. Our comparison of Azure vs AWS vs Google Cloud for UK SMEs is worth reading if you are evaluating whether cloud-based AI inference makes more financial sense than maintaining on-premises GPU hardware.
Key Takeaways
- Ollama falls back to CPU silently when GPU detection fails. Always confirm GPU usage via ollama ps, Task Manager, or GPU monitoring tools before assuming everything is working correctly.
- NVIDIA CUDA errors are usually caused by a missing CUDA toolkit, an outdated driver, or a version mismatch. Install the toolkit separately and verify with nvcc --version.
- AMD ROCm not working on Windows is expected at this stage. Use Linux, ideally Ubuntu 22.04, for reliable AMD GPU inference with Ollama.
- For unsupported AMD consumer GPUs, the HSA_OVERRIDE_GFX_VERSION variable can unlock GPU acceleration by mapping to a supported architecture.
- WSL2 GPU support requires a Windows 11 or Windows 10 21H2 host with driver 470.76 or later. Docker requires the NVIDIA container toolkit and the --gpus all flag.
- Partial GPU usage often indicates a VRAM limitation, not a detection failure. Use smaller quantisation variants or reduce context length to fit models fully into GPU memory.
- Environment variables such as CUDA_VISIBLE_DEVICES=-1 can silently disable GPU use. Always audit these in your shell profile, systemd service file, and Docker compose files.
Related Guides
- How to Run Ollama on a Home Server
- Azure vs AWS vs Google Cloud for UK SMEs Comparison
- Best Servers for Small Business in 2025
- AI in Business: Why Now is the Time to Embrace the Future
- How to Configure Your First Server in 2025: Step-by-Step
Frequently Asked Questions
Why does Ollama say it is running on CPU when I have a GPU installed?
Ollama automatically falls back to CPU when it cannot detect a compatible GPU or the required compute framework (CUDA for NVIDIA, ROCm for AMD) is missing or misconfigured. This happens silently with no prominent warning. Check the Ollama logs for GPU-related lines and run ollama ps while a model is loaded to confirm which device it is actually using. In most cases the fix is installing or updating the CUDA toolkit, correcting your driver version, or resolving a misconfigured environment variable such as CUDA_VISIBLE_DEVICES being set to -1.
Does Ollama support AMD GPUs on Windows?
AMD ROCm support on Windows within Ollama is limited and should be considered experimental at this time. AMD ROCm is primarily developed for Linux, and Ollama’s Windows AMD GPU support is still maturing. If you need reliable AMD GPU inference, Linux (particularly Ubuntu 22.04) is the recommended platform. On Windows, many AMD GPU users find that Ollama will run on CPU regardless of the hardware present due to the immaturity of the Windows ROCm stack.
How do I fix Ollama GPU errors in WSL2?
GPU support in WSL2 requires Windows 11 or Windows 10 version 21H2 or later, with an NVIDIA driver version of 470.76 or higher installed on the Windows host. You should not install a standard Linux CUDA toolkit inside WSL2, as the CUDA libraries are provided by the Windows host driver, which WSL2 maps into the Linux environment automatically. If your GPU is still not detected inside WSL2, check that the WSL2 backend is fully updated via wsl --update in a Windows terminal, then restart WSL2 and relaunch Ollama.
My AMD GPU is not on the ROCm supported list. Can I still use it with Ollama?
Yes, in many cases. AMD consumer GPUs that are not on the official ROCm supported list can often be made to work by setting the HSA_OVERRIDE_GFX_VERSION environment variable to the architecture version of a supported GPU that is closest to your own. For example, many RDNA 2 cards work with HSA_OVERRIDE_GFX_VERSION=10.3.0. This is not officially supported by AMD but is widely used in the Ollama community and works reliably for a broad range of unsupported consumer cards. Set this variable before starting the Ollama service.
How much VRAM do I need to run Ollama models fully on the GPU?
The VRAM requirement depends on the model size and quantisation level. A 7B parameter model at Q4 quantisation typically requires around 4 to 5GB of VRAM. A 13B model at Q4 requires around 8 to 9GB, and a 70B model at Q4 requires around 40GB or more. For UK businesses buying dedicated inference hardware, an RTX 4060 Ti 16GB (typically from around GBP 520) handles 13B models fully in GPU memory and is a practical starting point. If you are working with larger models regularly, look at cards with 24GB or more of VRAM.
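The figures above follow a rough rule of thumb you can compute yourself. The coefficients here are assumptions, not exact values: roughly 0.57 bytes per parameter for Q4-quantised weights, plus about 1.5GB for context and runtime overhead:

```shell
# Back-of-envelope VRAM estimate for a Q4-quantised model.
# Both coefficients are rough assumptions, not exact figures.
estimate() {
  awk -v b="$1" 'BEGIN { printf "%.1f GB\n", b * 0.57 + 1.5 }'
}

estimate 7    # prints "5.5 GB"  -> fits an 8GB card
estimate 13   # prints "8.9 GB"  -> fits a 16GB card comfortably
estimate 70   # prints "41.4 GB" -> needs 48GB-class hardware
```

Treat the output as a sizing guide only; real usage varies with context length and quantisation variant.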


