How to Run Llama 3.3 on Ollama (70B Model Guide)

Llama 3.3 is Meta’s most capable open-source model to date. It delivers performance close to frontier commercial models like GPT-4o on many tasks, while being free to run locally. If you have the hardware for it, it’s one of the best models available in Ollama. Here’s how to set it up.

What is Llama 3.3?

Meta released Llama 3.3 in December 2024. It’s a 70 billion parameter model — the same size as Llama 3.1 70B — but trained with improved techniques that deliver noticeably better performance, particularly on:

  • Reasoning and multi-step problem solving
  • Code generation and debugging
  • Following complex instructions
  • Mathematical tasks

On most benchmarks, Llama 3.3 70B matches or exceeds models like GPT-4o mini and Claude 3 Haiku — the difference being you run it entirely on your own hardware, with no API costs and no data leaving your machine.

System Requirements

Llama 3.3 is a large model and requires serious hardware:

Setup                       | RAM / VRAM needed     | Expected speed
CPU only                    | 48 GB RAM minimum     | Very slow (2–5 tokens/sec)
Single GPU (e.g. RTX 4090)  | 24 GB VRAM            | Usable with quantisation and partial GPU offload
Dual GPU or workstation     | 48 GB+ VRAM           | Full quality, fast
Mac with Apple Silicon      | 64 GB unified memory  | Good — unified memory fits the whole model
If you don’t have 48 GB of RAM, Llama 3.3 isn’t the right choice for your hardware. Consider Qwen2.5 14B or Phi-4 instead.
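Before committing to a 43 GB download, it's worth confirming the machine clears the CPU-only bar. A minimal pre-flight sketch for Linux (it relies on `free`, which macOS doesn't ship, so the script falls back to a notice elsewhere):

```shell
# Pre-flight RAM check before pulling Llama 3.3 (Linux only: uses `free`).
REQUIRED_GB=48
if command -v free >/dev/null 2>&1; then
  TOTAL_GB=$(free -g | awk '/^Mem:/ {print $2}')
  if [ "$TOTAL_GB" -ge "$REQUIRED_GB" ]; then
    echo "OK: ${TOTAL_GB} GB RAM - enough for CPU-only Llama 3.3"
  else
    echo "Only ${TOTAL_GB} GB RAM - consider Qwen2.5 14B or Phi-4 instead"
  fi
else
  echo "free not found (non-Linux system); check memory manually"
fi
```

On a Mac, `sysctl hw.memsize` reports the unified memory total instead.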

How to Install Llama 3.3 in Ollama

With Ollama installed, open a terminal and pull the model:

ollama pull llama3.3

The download is approximately 43 GB (for the default Q4 quantisation). This will take a while on most connections.

To run it:

ollama run llama3.3
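Beyond the interactive prompt, the same model is reachable non-interactively and over Ollama's local HTTP API (which listens on port 11434 by default). A sketch, assuming the Ollama server is running locally; the fallback message fires when it isn't:

```shell
# One-shot prompt from the CLI (non-interactive):
#   ollama run llama3.3 "Why is the sky blue?"
#
# Or hit the local HTTP API directly. Build the request body first:
PAYLOAD='{"model": "llama3.3", "prompt": "Why is the sky blue?", "stream": false}'
# Fire the request if a local server is up; print a notice otherwise.
curl -sf http://localhost:11434/api/generate -d "$PAYLOAD" \
  || echo "Ollama server not reachable on localhost:11434"
```

The API route is what editor plugins and local apps typically use, so it's a quick way to confirm the model is serving correctly.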

Quantisation Options

Ollama offers different quantised versions of Llama 3.3 that trade quality for lower memory requirements:

ollama pull llama3.3:70b-instruct-q4_K_M

This is the default and the best balance of quality and size. If you need to squeeze into less RAM, try:

ollama pull llama3.3:70b-instruct-q3_K_S

Q3 reduces quality noticeably but cuts memory usage by around 20%.
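The sizes follow from simple arithmetic: parameter count times effective bits per weight, divided by 8. A rough sketch (the bits-per-weight figures for llama.cpp K-quants are approximate averages, not exact):

```shell
# Rough model-size estimate: params (billions) x effective bits/weight / 8.
estimate_gb() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.0f", p * b / 8 }'
}
echo "q4_K_M: ~$(estimate_gb 70 4.85) GB"   # close to the ~43 GB download
echo "q3_K_S: ~$(estimate_gb 70 3.5) GB"
```

Note the on-disk size understates runtime needs slightly: the KV cache for a long context adds several more gigabytes on top.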

What is Llama 3.3 Good At?

  • Complex reasoning — chains of thought, logical deduction, analysis
  • Code generation — writes clean, commented code across most languages
  • Long-form writing — reports, documentation, structured content
  • Summarisation — handles long documents well with its 128K context window
  • Following complex prompts — reliably executes detailed, multi-part instructions
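The summarisation case is a good fit for a one-shot prompt, since the 128K context window lets a long document ride along inside it. A sketch — `report.txt` is a placeholder filename, and the final `ollama run` line is left commented because it needs the model already pulled:

```shell
# Build a summarisation prompt from a local file.
# report.txt is a hypothetical input; a stub is used if it doesn't exist.
PROMPT="Summarise the following in five bullet points:
$(cat report.txt 2>/dev/null || echo '(document text here)')"

echo "$PROMPT" | head -n 1   # sanity-check the prompt's first line
# ollama run llama3.3 "$PROMPT"
```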

Llama 3.3 vs DeepSeek R1 on Ollama

Both are high-end 70B class models. DeepSeek R1 uses a chain-of-thought reasoning approach that makes it exceptional for maths and logical problems — it shows its working, which is useful for verification. Llama 3.3 is more balanced and faster for general tasks. For pure reasoning and maths, try DeepSeek R1. For everything else, Llama 3.3 is the better everyday choice.

Llama 3.3 vs Llama 3.1 70B — Worth Upgrading?

Yes. Llama 3.3 70B outperforms Llama 3.1 70B across almost all benchmarks and is the same size. If you’re already running Llama 3.1 70B, it’s worth switching.
