Llama 3.3 is Meta’s most capable openly available model to date. It delivers performance close to frontier commercial models like GPT-4o on many tasks, while being free to run locally. If you have the hardware for it, it’s one of the best models available in Ollama. Here’s how to set it up.
What is Llama 3.3?
Meta released Llama 3.3 in December 2024. It’s a 70 billion parameter model — the same size as Llama 3.1 70B — but trained with improved techniques that deliver noticeably better performance, particularly on:
- Reasoning and multi-step problem solving
- Code generation and debugging
- Following complex instructions
- Mathematical tasks
On most benchmarks, Llama 3.3 70B matches or exceeds models like GPT-4o mini and Claude 3 Haiku, with the difference that you run it entirely on your own hardware: no API costs, and no data leaving your machine.
System Requirements
Llama 3.3 is a large model and requires serious hardware:
| Setup | RAM / VRAM needed | Expected speed |
|---|---|---|
| CPU only | 48 GB RAM minimum | Very slow (2–5 tokens/sec) |
| Single GPU (e.g. RTX 4090) | 24 GB VRAM | Fast with quantisation |
| Dual GPU or workstation | 48 GB+ VRAM | Full quality, fast |
| Mac with Apple Silicon | 64 GB unified memory | Good — Apple Silicon handles this well |
If your machine doesn’t meet at least one of these configurations, Llama 3.3 isn’t the right choice for your hardware. Consider Qwen2.5 14B or Phi-4 instead.
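As a quick sanity check, the tiers in the table above can be turned into a small script that reads total system memory and suggests whether Llama 3.3 is viable. This is a rough sketch: the thresholds come from the table, and the memory probe relies on `os.sysconf`, which works on Linux and macOS but not Windows.

```python
import os


def total_ram_gb() -> float:
    """Total physical RAM in GB (Linux/macOS; os.sysconf is unavailable on Windows)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1e9


def recommend(ram_gb: float) -> str:
    """Map total RAM to the rough tiers in the hardware table above."""
    if ram_gb >= 64:
        return "llama3.3 (comfortable, e.g. Apple Silicon with 64 GB unified memory)"
    if ram_gb >= 48:
        return "llama3.3 (CPU-only works, but expect roughly 2-5 tokens/sec)"
    return "consider a smaller model such as qwen2.5:14b or phi4"


if __name__ == "__main__":
    ram = total_ram_gb()
    print(f"{ram:.0f} GB RAM -> {recommend(ram)}")
```

Note this only checks system RAM; if you have a GPU with 24 GB+ of VRAM, the GPU rows of the table apply instead.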
How to Install Llama 3.3 in Ollama
With Ollama installed, pull the model:

ollama pull llama3.3
The download is approximately 43 GB (for the default Q4 quantisation). This will take a while on most connections.
To run it:
ollama run llama3.3
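Beyond the interactive prompt, Ollama also serves a local HTTP API on port 11434, which is how you’d call Llama 3.3 from your own scripts. A minimal sketch using only the Python standard library (the endpoint and payload fields follow Ollama’s `/api/generate` API; the prompt is just a placeholder):

```python
import json
import urllib.request

# Ollama's default local endpoint.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str) -> dict:
    """Payload for Ollama's /api/generate endpoint; stream=False returns one JSON object."""
    return {"model": model, "prompt": prompt, "stream": False}


def generate(model: str, prompt: str) -> str:
    """Send a prompt to the local Ollama server and return the model's response text."""
    payload = json.dumps(build_request(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


# Usage (requires the Ollama server to be running locally):
# print(generate("llama3.3", "Explain quantisation in one sentence."))
```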
Quantisation Options
Ollama offers different quantised versions of Llama 3.3 that trade quality for lower memory requirements:
ollama pull llama3.3:70b-instruct-q4_K_M
This is the default and the best balance of quality and size. If you need to squeeze into less RAM, try:
ollama pull llama3.3:70b-instruct-q3_K_S
Q3 reduces quality noticeably in exchange for a markedly smaller footprint, on the order of a quarter less memory than Q4.
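The size difference between quantisations follows directly from bits per weight: parameters times bits per weight, divided by eight bits per byte. A back-of-the-envelope estimator (the bits-per-weight figures below are approximate averages for each scheme, not exact on-disk sizes):

```python
# Approximate average bits per weight for common GGUF quantisation schemes.
# Rule-of-thumb figures, not exact on-disk averages.
BITS_PER_WEIGHT = {
    "q4_K_M": 4.85,
    "q3_K_S": 3.5,
    "q8_0": 8.5,
}


def model_size_gb(params_billion: float, quant: str) -> float:
    """Estimate model file size in GB: parameters x bits per weight / 8 bits per byte."""
    return params_billion * 1e9 * BITS_PER_WEIGHT[quant] / 8 / 1e9


# For a 70B model, Q4_K_M works out to roughly the ~43 GB download noted above,
# while Q3_K_S lands around 31 GB.
q4 = model_size_gb(70, "q4_K_M")
q3 = model_size_gb(70, "q3_K_S")
```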
What is Llama 3.3 Good At?
- Complex reasoning — chains of thought, logical deduction, analysis
- Code generation — writes clean, commented code across most languages
- Long-form writing — reports, documentation, structured content
- Summarisation — handles long documents well with its 128K context window
- Following complex prompts — reliably executes detailed, multi-part instructions
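To make use of that 128K context window for summarisation, it helps to check whether a document will fit before sending it. A crude sketch using the common heuristic of roughly four characters per token (actual counts depend on the tokenizer):

```python
CONTEXT_WINDOW = 128_000  # Llama 3.3's context length, in tokens
CHARS_PER_TOKEN = 4       # crude heuristic; real counts vary by tokenizer


def estimate_tokens(text: str) -> int:
    """Very rough token estimate from character count."""
    return len(text) // CHARS_PER_TOKEN


def fits_in_context(text: str, reserve_for_output: int = 2_000) -> bool:
    """Check whether a document fits while leaving room for the model's answer."""
    return estimate_tokens(text) <= CONTEXT_WINDOW - reserve_for_output


# Example: a ~400,000-character report is ~100,000 tokens and still fits.
```

One caveat: Ollama may run models with a shorter context than the model supports by default; the `num_ctx` option controls the context length actually used.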
Llama 3.3 vs DeepSeek R1 on Ollama
Both are high-end 70B class models. DeepSeek R1 uses a chain-of-thought reasoning approach that makes it exceptional for maths and logical problems — it shows its working, which is useful for verification. Llama 3.3 is more balanced and faster for general tasks. For pure reasoning and maths, try DeepSeek R1. For everything else, Llama 3.3 is the better everyday choice.
Llama 3.3 vs Llama 3.1 70B — Worth Upgrading?
Yes. Llama 3.3 70B outperforms Llama 3.1 70B across almost all benchmarks and is the same size. If you’re already running Llama 3.1 70B, it’s worth switching.
