Best GPUs for Ollama in 2026: Which Graphics Card Should You Buy?

Running large language models locally with Ollama has never been more accessible, but choosing the right GPU can make the difference between a smooth experience and a frustrating one. This guide cuts through the noise and tells you exactly which graphics card to buy based on your budget and the models you want to run.

Why Your GPU Matters — and Why VRAM Is Everything

When people talk about GPU performance for Ollama, they often focus on raw speed: shader counts, clock frequencies, memory bandwidth. In practice, the single most important specification is VRAM capacity.

Ollama loads model weights into GPU memory before inference begins. If the model does not fit in VRAM, it spills into system RAM and performance degrades dramatically — often to the point of being unusable for real-time conversation. A slower GPU with more VRAM will consistently outperform a faster GPU with less VRAM when running large models.
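
If you want to confirm where a loaded model actually lives, Ollama's own CLI reports the split. Here is a minimal Python sketch, assuming the `ollama` binary is on your PATH and the server is running, that flags partial CPU offload:

```python
import subprocess

# "ollama ps" lists loaded models with a PROCESSOR column such as
# "100% GPU" or "41%/59% CPU/GPU". Anything short of 100% GPU means
# the model has spilled into system RAM.
result = subprocess.run(
    ["ollama", "ps"],
    capture_output=True, text=True, check=True,
)
print(result.stdout)

for line in result.stdout.splitlines()[1:]:
    if line.strip() and "CPU" in line:
        print("Partial CPU offload detected:", line.split()[0])
```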

How Much VRAM Do You Actually Need?

A useful rule of thumb for Q4 quantised models (the format Ollama uses by default) is approximately 0.5 GB of VRAM per billion parameters, plus a little extra for the context window. This gives you a practical sizing guide:

  • 7B models (Llama 3, Mistral 7B, Gemma 2) — approximately 4 GB VRAM
  • 13B models (Llama 2 13B, CodeLlama 13B) — approximately 7 GB VRAM
  • 34B models (CodeLlama 34B, Yi 34B) — approximately 18–20 GB VRAM
  • 70B models (Llama 3 70B, Qwen2 72B) — approximately 35–40 GB VRAM

Always leave a few gigabytes of headroom above the minimum. Running a model at the absolute edge of your VRAM often causes slowdowns or out-of-memory errors under load.
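
To turn the rule of thumb into numbers, here is a small Python sketch. The 0.5 GB per billion figure and the 2 GB headroom are the approximations from this guide, not exact measurements:

```python
def estimate_vram_gb(params_billions: float,
                     gb_per_billion: float = 0.5,
                     headroom_gb: float = 2.0) -> float:
    """Rough VRAM target for a Q4-quantised model.

    Both defaults are the rule-of-thumb figures from this guide,
    not exact values: real usage varies with the quantisation
    variant and the context window.
    """
    return params_billions * gb_per_billion + headroom_gb

for size in (7, 13, 34, 70):
    print(f"{size}B model: budget ~{estimate_vram_gb(size):.1f} GB of VRAM")
```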

NVIDIA vs AMD vs Apple Silicon

NVIDIA (CUDA)

NVIDIA GPUs are the default choice for Ollama and the entire local AI ecosystem. CUDA support is mature, well-tested, and works out of the box on Windows and Linux (macOS has not supported CUDA for years; on Macs, Ollama uses Apple's Metal instead, covered below). If you want the path of least resistance, buy an NVIDIA card. Driver installation is straightforward, and the community troubleshooting resources are extensive.
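
A quick way to confirm the driver sees your card, and how much VRAM is free, is `nvidia-smi`, which ships with the driver. A small Python wrapper, assuming the tool is on your PATH:

```python
import subprocess

# nvidia-smi ships with the NVIDIA driver. If this command fails,
# Ollama will not see the GPU either.
out = subprocess.run(
    ["nvidia-smi",
     "--query-gpu=name,memory.total,memory.used",
     "--format=csv,noheader"],
    capture_output=True, text=True, check=True,
).stdout

for line in out.strip().splitlines():
    name, total, used = (field.strip() for field in line.split(","))
    print(f"{name}: {used} in use of {total}")
```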

AMD (ROCm)

AMD GPUs are a viable option, particularly on Linux where ROCm support is solid. Ollama has improved AMD support significantly, and cards like the RX 6700 XT and RX 7900 XTX work well in Linux environments. Windows ROCm support remains limited and inconsistent — if you are on Windows and considering AMD, be prepared for additional setup friction. On Linux, AMD is a legitimate budget-friendly alternative.
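
As an illustration of that setup friction: for RDNA2 cards that ROCm does not officially list, such as the RX 6700 XT, a commonly reported workaround is setting the `HSA_OVERRIDE_GFX_VERSION` environment variable before starting the server. The sketch below is a hypothetical launcher; verify the override value for your specific card before relying on it:

```python
import os
import subprocess

# Hypothetical launcher: start the Ollama server with the ROCm
# architecture override widely reported for gfx103x (RDNA2) cards
# such as the RX 6700 XT. Verify the value for your card first.
env = os.environ.copy()
env["HSA_OVERRIDE_GFX_VERSION"] = "10.3.0"

# Blocks while the server runs.
subprocess.run(["ollama", "serve"], env=env, check=True)
```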

Apple Silicon (Metal)

Apple Silicon deserves its own category. The M-series chips use unified memory — RAM and VRAM are the same pool — which means a MacBook Pro with 32 GB of RAM can devote most of that memory to Ollama; note that macOS reserves part of the pool for the system, so the GPU's default share is typically around two-thirds to three-quarters of the total. Ollama uses Apple's Metal framework natively, and performance is excellent relative to power consumption. The M4 Max with 128 GB unified memory can run 70B models comfortably, something that would require an expensive multi-GPU setup on a discrete-card system.
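
On recent macOS releases, that default GPU share can reportedly be raised with the `iogpu.wired_limit_mb` sysctl. Treat the key and the value below as assumptions to check against your macOS version, and leave several gigabytes for the system:

```python
import subprocess

# Hypothetical example: allow the GPU to wire up to 28 GB of a
# 32 GB machine's unified memory, leaving the rest for macOS.
# The sysctl key applies to recent macOS releases on Apple Silicon
# and resets on reboot; confirm it exists on your system first.
subprocess.run(["sudo", "sysctl", "iogpu.wired_limit_mb=28672"], check=True)
```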

GPU Recommendations by Budget

Budget Tier — Under £200

The RTX 3060 12GB is the standout recommendation at this price point and arguably the best value GPU in the entire Ollama ecosystem. The 12 GB VRAM figure is the key — NVIDIA also sells lower-VRAM variants of the 3060, so make sure to get the 12 GB version. It runs 7B and 13B models with ease, and Q4 quantised 13B models sit comfortably within its capacity with room to spare. For a first Ollama GPU, this is the card to buy.

The RX 6700 XT 12GB is the AMD alternative at a similar price. Performance is comparable for inference workloads, and the 12 GB VRAM gives you the same model coverage as the 3060. Best suited for Linux users who want to save money versus NVIDIA options.

Mid-Range — £200 to £500

The RTX 4070 12GB sits at the upper end of this bracket and is the best card here for users who want both speed and efficiency. The Ada Lovelace architecture is significantly faster than the 3060 for inference, and 12 GB VRAM keeps it versatile. Power consumption is low relative to performance, making it a strong choice for a machine that runs continuously.

The RTX 3090 24GB is the other major recommendation in this range — but on the used market. New 3090s are rare and overpriced, but second-hand units frequently appear for £400–£500. The 24 GB VRAM is transformative: you can run Q4 quantised 34B models entirely on-GPU, which is a capability tier above most consumer cards. If you find one in good condition at a fair price, it is worth serious consideration.

High-End — £500 to £1,000

The RTX 4080 Super 16GB is the mainstream high-end pick. Sixteen gigabytes of fast GDDR6X memory handles 13B models with ease and pushes into 20B territory. The newer Ada architecture gives it faster prompt processing than the 3090, but token generation is bound by memory bandwidth, where the 3090's wider bus keeps pace; the 4080 Super's real advantages are efficiency and a warranty, while its VRAM ceiling is lower.

The RTX 4090 24GB begins to appear in this price bracket on the used market as the RTX 50 series drives down second-hand prices — see the enthusiast section for detail.

Enthusiast — £1,000 and Above

The RTX 4090 24GB is the best single consumer GPU for Ollama with no caveats. Twenty-four gigabytes of extremely fast GDDR6X memory combined with the Ada Lovelace architecture means it can run 70B models in Q2 quantisation, handle every 34B model at Q4, and process responses faster than most people can read them for 7B and 13B models. If you want one card that does everything, this is it.
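
If you want to measure that speed on your own hardware rather than take anyone's word for it, Ollama's local REST API reports token counts and timings with every response. A self-contained benchmark sketch, assuming the server is running on the default port and `llama3` stands in for any model you have pulled:

```python
import json
import urllib.request

# Benchmark generation speed via Ollama's local REST API.
# eval_count and eval_duration (nanoseconds) are standard fields
# in the /api/generate response; swap in any model you have pulled.
payload = json.dumps({
    "model": "llama3",
    "prompt": "Explain VRAM in one paragraph.",
    "stream": False,
}).encode()

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    body = json.load(resp)

tokens_per_second = body["eval_count"] / (body["eval_duration"] / 1e9)
print(f"{tokens_per_second:.1f} tokens/s")
```

The resulting tokens-per-second figure is the number that differs most between the cards in this guide.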

For users who need more than 24 GB without moving to professional hardware, a dual RTX 3090 setup can pool VRAM for a total of 48 GB. Ollama supports multi-GPU inference, allowing models too large for a single consumer card to run across two. This requires a compatible motherboard and adequate power supply, but it remains far cheaper than a data centre GPU.
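
Once both cards are installed, it is worth confirming that a large model has actually been split across them. The same `nvidia-smi` query used earlier, extended with the GPU index, shows per-card memory use while a model is loaded:

```python
import subprocess

# With a large model loaded, per-card memory use should show the
# weights spread across both GPUs rather than sitting on one.
out = subprocess.run(
    ["nvidia-smi",
     "--query-gpu=index,name,memory.used,memory.total",
     "--format=csv,noheader"],
    capture_output=True, text=True, check=True,
).stdout

for line in out.strip().splitlines():
    idx, name, used, total = (f.strip() for f in line.split(","))
    print(f"GPU {idx} ({name}): {used} / {total}")
```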

GPU Comparison Table

GPU | VRAM | Approx. Price (2026) | Best For
RTX 3060 12GB | 12 GB | £150–£180 | 7B and 13B models — best budget pick
RX 6700 XT 12GB | 12 GB | £140–£170 | 7B and 13B on Linux
RTX 4070 12GB | 12 GB | £350–£420 | 7B–13B, fast and efficient
RTX 3090 24GB (used) | 24 GB | £400–£500 | Up to 34B Q4 — best mid-range VRAM
RTX 4080 Super 16GB | 16 GB | £700–£800 | 13B–20B, fast modern architecture
RTX 4090 24GB | 24 GB | £900–£1,100 (used) | 70B in Q2, best single consumer GPU
Dual RTX 3090 | 48 GB total | £900–£1,100 | 70B Q4 and very large models
Apple M2 / M3 (16–32 GB) | 16–32 GB unified | Varies (Mac pricing) | 7B–34B, silent and efficient
Apple M4 Max (96–128 GB) | Up to 128 GB unified | Varies (Mac pricing) | 70B and beyond

What to Avoid

  • Cards with 8 GB of VRAM or less — even 7B models at Q4 quantisation take roughly 4–5 GB once the context window is included, leaving little headroom for anything bigger. Cards like the RTX 4060 8GB are not recommended for serious local AI use.
  • The RTX 4060 Ti 16GB — a counterintuitive one, but this card uses a narrow memory bus that makes it slower than its VRAM figure suggests. The RTX 3090 is a better choice for the same money on the used market.
  • Older NVIDIA cards (GTX 10-series and RTX 20-series) — CUDA support for these generations is being phased out, and they often lack sufficient VRAM anyway.
  • AMD on Windows for Ollama — unless you are an experienced user prepared to troubleshoot ROCm setup, the experience is inconsistent on Windows. Stick to NVIDIA on Windows.

Verdict

For the majority of users running Ollama at home, two cards stand above the rest.

The RTX 3060 12GB is the best entry point. At around £150–£180, it runs every 7B model and most 13B models without issue. It draws modest power, is widely available second-hand, and its 12 GB VRAM makes it genuinely capable rather than a compromise. If your budget is under £200, buy this card without hesitation.

The RTX 4070 12GB is the step-up recommendation for users who want meaningfully faster inference and lower power consumption. The generational leap from the 3060 is noticeable in response speed, and the card runs cool and quietly. If you can stretch to £350–£420, the 4070 will serve you well for years.

Whatever you choose, prioritise VRAM above all else. A card with more memory will always outperform a technically faster card that forces your models into system RAM.
