
Llama 3 vs Mistral on Ollama: Which Model Should You Run?

When it comes to running local LLMs with Ollama, two models come up in almost every conversation: Llama 3 from Meta and Mistral from Mistral AI. Both are excellent open-source models that run well on consumer hardware — but they have different strengths. This guide helps you decide which to use for your workload.

Quick Overview

Llama 3 is Meta’s flagship open-source model family. The 8B version is one of the most capable small models available and punches well above its weight class. The 70B version competes with GPT-4 on many benchmarks.

Mistral 7B was the model that proved small models could be genuinely useful. It’s fast, efficient, and remarkably capable for its size. Mistral AI has since released Mistral Nemo (12B) and Mistral Small as successors.

How to Run Them in Ollama

# Run Llama 3.1 8B
ollama run llama3.1

# Run Llama 3.1 70B (requires ~40GB RAM/VRAM)
ollama run llama3.1:70b

# Run Mistral 7B
ollama run mistral

# Run Mistral Nemo 12B
ollama run mistral-nemo

Llama 3 vs Mistral: Benchmark Comparison

Benchmark                 | Llama 3.1 8B | Mistral 7B
--------------------------|--------------|------------
MMLU (general knowledge)  | ~73%         | ~64%
HumanEval (coding)        | ~72%         | ~30%
GSM8K (maths reasoning)   | ~84%         | ~52%
Context window            | 128K tokens  | 32K tokens
Inference speed (approx.) | Moderate     | Fast
RAM needed (4-bit)        | ~6GB         | ~5GB

Coding

Llama 3.1 is significantly stronger at coding. Its HumanEval score (~72%) is far ahead of Mistral 7B's (~30%), making it the better default for code generation, debugging, and explanation tasks. If you're using a local model for coding, Llama 3.1 is the clear choice.

See our full guide to the best Ollama models for coding for a broader comparison including CodeLlama and DeepSeek Coder.

Reasoning and Maths

Again, Llama 3.1 leads on maths and logical reasoning benchmarks: its GSM8K score (~84%) is substantially higher than Mistral 7B's (~52%). For tasks that require multi-step reasoning or calculation, Llama 3.1 handles them more reliably.

For dedicated maths use cases, see the best Ollama models for maths.

Speed

Mistral 7B is slightly faster in practice due to its architecture and lower parameter count. If you’re running on CPU-only hardware or a machine with limited VRAM, Mistral will give you snappier responses. On a GPU with 8GB+ VRAM, the difference becomes negligible.
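Rather than relying on the approximate table values, you can measure generation speed on your own hardware: ollama run --verbose prints timing statistics after each reply, including an "eval rate" line in tokens/s. A small helper to pull that number out; the awk pattern assumes the stats format printed by recent Ollama versions.

```shell
# Extract the generation speed from Ollama's --verbose timing stats.
# --verbose prints lines such as "eval rate: 42.10 tokens/s" after the reply;
# /^eval rate/ deliberately skips the separate "prompt eval rate" line.
eval_rate() {
  awk -F': *' '/^eval rate/ { print $2 }'
}

# Usage (requires a running Ollama server):
#   ollama run mistral  --verbose "Say hello" 2>&1 | eval_rate
#   ollama run llama3.1 --verbose "Say hello" 2>&1 | eval_rate
```

Run each model a few times and compare the rates; the first run includes model load time, so discard it.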

Context Window

Llama 3.1’s 128K context window is a massive advantage for long document processing, summarisation, and extended conversations. Mistral 7B’s 32K window is still generous but limits how much text you can pass in at once.
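Note that Ollama loads models with a default context window much shorter than the model's advertised maximum, so the 128K (or 32K) figure only applies if you raise num_ctx yourself. One way is a custom Modelfile; a sketch, where llama3.1-32k is just an arbitrary local name:

```
# Modelfile (save as ./Modelfile)
FROM llama3.1
PARAMETER num_ctx 32768

# Build and run the long-context variant:
#   ollama create llama3.1-32k -f Modelfile
#   ollama run llama3.1-32k
```

Keep in mind that RAM usage grows with the context size you allocate, so only raise num_ctx as far as your workload needs.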

For summarisation tasks where context length matters, see the best Ollama models for summarisation.

Instruction Following and Chat

Both models have well-tuned instruction-following versions. Llama 3.1’s instruct variant is generally considered to follow complex, multi-part instructions more reliably. For conversational use and roleplay, both perform well — Mistral has a reputation for being creative and engaging. See the best Ollama models for roleplay and chat for more on this.
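To compare the two chat-tuned variants side by side programmatically, you can go through Ollama's REST API instead of the CLI. A minimal sketch, assuming the default server at localhost:11434; the prompt is just a placeholder multi-part instruction.

```shell
# Request body for Ollama's /api/chat endpoint.
body='{
  "model": "llama3.1",
  "stream": false,
  "messages": [
    {"role": "user", "content": "Give me a three-step plan, then summarise it in one line."}
  ]
}'

# Send it (requires a running Ollama server); swap "llama3.1" for "mistral"
# to compare how each model handles the multi-part instruction:
#   curl -s http://localhost:11434/api/chat -d "$body"
```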

When to Choose Llama 3.1

  • You need the best coding performance from a small local model
  • You’re working with long documents (128K context)
  • You want the highest accuracy on general reasoning tasks
  • You have 8GB+ VRAM or 16GB+ RAM

When to Choose Mistral

  • You’re on CPU-only hardware and need faster responses
  • You want a creative, conversational assistant
  • You’re constrained to 6-8GB RAM and need the fastest 7B model
  • You’re experimenting with fine-tuning (Mistral 7B is released under the permissive Apache 2.0 licence)

What About Mistral Nemo?

Mistral Nemo (12B) is worth considering as a middle ground. It’s more capable than Mistral 7B, has a 128K context window, and still runs on consumer hardware with 10-12GB VRAM. Run it with ollama run mistral-nemo. It narrows the gap with Llama 3.1 8B considerably.

Verdict

For most tasks in 2024, Llama 3.1 8B is the better default model. It outperforms Mistral 7B on coding, reasoning, and long-context tasks while running on similar hardware. Mistral retains an edge for pure speed on constrained hardware and for creative conversational use.

If you’re just getting started, pull both and compare them on your own use case: ollama pull llama3.1 and ollama pull mistral.
