DeepSeek R1 and Llama 3.1 are two of the best open-source models you can run locally with Ollama — but they’re built for different things. Llama 3.1 is a strong all-round model; DeepSeek R1 is a reasoning specialist. Here’s how they compare in practice.
## The Key Difference
Llama 3.1 is a standard instruction-tuned model. It generates answers quickly and handles a wide range of tasks well — chat, coding, summarisation, writing.
DeepSeek R1 is a reasoning model. Before answering, it thinks through the problem step by step, emitting its reasoning inside `<think>` tags. This makes it slower but significantly more accurate on problems that require logic, calculation, or multi-step planning.
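Because the chain of thought arrives inside those tags, it's easy to separate it from the final answer when scripting. A minimal sketch (the helper name and sample string are our own, not part of Ollama; it assumes a single `<think>...</think>` block before the answer):

```python
import re

def split_reasoning(response: str) -> tuple[str, str]:
    """Split an R1-style response into (reasoning, answer)."""
    match = re.search(r"<think>(.*?)</think>", response, re.DOTALL)
    if match is None:
        # No thinking block found: treat the whole response as the answer.
        return "", response.strip()
    reasoning = match.group(1).strip()
    answer = response[match.end():].strip()
    return reasoning, answer

raw = "<think>17 * 3 = 51, so 51 + 4 = 55.</think>The answer is 55."
reasoning, answer = split_reasoning(raw)
print(answer)  # → The answer is 55.
```

Keeping the reasoning around (rather than discarding it) is useful for debugging wrong answers — you can see exactly where the model went off track.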
## Head-to-Head Comparison
| Feature | DeepSeek R1 7B | Llama 3.1 8B |
|---|---|---|
| Model type | Reasoning (chain-of-thought) | Standard instruction-tuned |
| Response speed | Slower (thinks first) | Faster |
| Maths / logic | Excellent | Good |
| Coding | Excellent | Very good |
| General chat | Good | Excellent |
| Summarisation | Good | Excellent |
| Context window | 128K tokens | 128K tokens |
| RAM needed (4-bit) | ~5GB | ~6GB |
| Shows reasoning | Yes (`<think>` tags) | No |
| Ollama command | `ollama pull deepseek-r1` | `ollama pull llama3.1` |
## Benchmark Comparison
| Benchmark | DeepSeek R1 7B | Llama 3.1 8B |
|---|---|---|
| MATH-500 (maths) | ~92% | ~84% |
| GSM8K (maths reasoning) | ~91% | ~84% |
| HumanEval (coding) | ~82% | ~72% |
| MMLU (general knowledge) | ~70% | ~73% |
Benchmarks are approximate and vary by quantisation level and test methodology.
## Coding Tasks
Both models are strong at coding, but DeepSeek R1 tends to produce more correct solutions on complex algorithmic problems because it reasons through the logic before writing code. Llama 3.1 is faster and still very capable for everyday coding tasks.
For a full breakdown, see DeepSeek R1 for coding and the best Ollama models for coding.
## Maths and Reasoning
This is where DeepSeek R1 clearly wins. Its chain-of-thought approach means it works through multi-step problems rather than pattern-matching to an answer. For anything involving calculation, proof, or logical deduction, R1 is the better tool. See DeepSeek R1 for maths and reasoning.
## General Chat and Writing
Llama 3.1 wins here. It’s more conversational, responds faster, and doesn’t spend time “thinking” when a question doesn’t require it. DeepSeek R1 can feel over-engineered for simple questions — it’ll sometimes write three paragraphs of internal reasoning before answering “what’s the capital of France?”
## Speed
Llama 3.1 is noticeably faster because it generates the answer directly. DeepSeek R1 produces additional thinking tokens before the response, which adds latency. On a typical query, R1 might take 2-3x as long. For real-time chat this is noticeable; for batch processing it matters less.
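A back-of-envelope calculation shows where that gap comes from (the token counts and throughput below are illustrative assumptions, not measurements):

```python
# Rough latency model: thinking tokens add delay before the answer,
# assuming both models generate at the same speed.
thinking_tokens = 200    # assumed chain-of-thought length for R1
answer_tokens = 150      # assumed answer length for either model
tokens_per_second = 30   # assumed local generation speed

r1_time = (thinking_tokens + answer_tokens) / tokens_per_second
llama_time = answer_tokens / tokens_per_second
print(f"R1 ≈ {r1_time:.0f}s, Llama 3.1 ≈ {llama_time:.0f}s "
      f"(~{r1_time / llama_time:.1f}x slower)")
# → R1 ≈ 12s, Llama 3.1 ≈ 5s (~2.3x slower)
```

Note the slowdown factor depends only on the ratio of thinking tokens to answer tokens — short questions with long reasoning chains are hit hardest.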
## When to Use DeepSeek R1
- Maths problems, proofs, or calculation tasks
- Complex coding challenges where correctness matters more than speed
- Multi-step reasoning or planning tasks
- Problems where you want to see the working, not just the answer
## When to Use Llama 3.1
- Everyday chat and Q&A
- Writing, editing, and summarisation
- Situations where response speed matters
- General-purpose use where you switch between many task types
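If you drive Ollama from scripts, the two lists above boil down to a simple routing rule. A sketch (the task categories are our own labels; the model tags match the pull commands used in this article):

```python
# Hypothetical model router: reasoning-heavy tasks go to R1,
# everything else goes to the faster Llama 3.1.
REASONING_TASKS = {"maths", "proof", "algorithm", "planning"}

def pick_model(task_type: str) -> str:
    if task_type.lower() in REASONING_TASKS:
        return "deepseek-r1"  # slower, but shows its working
    return "llama3.1"         # faster, better for chat and writing

print(pick_model("proof"))  # → deepseek-r1
print(pick_model("chat"))   # → llama3.1
```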
## Can You Run Both?
Yes — Ollama lets you switch between models instantly. Many users keep both pulled and switch based on the task:
```shell
ollama pull deepseek-r1
ollama pull llama3.1

# Use R1 for hard problems
ollama run deepseek-r1

# Use Llama for general chat
ollama run llama3.1
```
See the full setup guide: How to run DeepSeek R1 on Ollama.


