Llama 3.2 introduced something new to the Llama family — genuinely useful small models. The 1B and 3B variants are fast enough to run on almost any hardware, including phones and Raspberry Pis, while still producing coherent, helpful responses. Here’s everything you need to know about running Llama 3.2 in Ollama.
What’s New in Llama 3.2?
Meta released Llama 3.2 in September 2024 with two main additions to the family:
- Small text models (1B and 3B) — lightweight models designed for edge devices, fast local inference, and low-resource environments
- Vision models (11B and 90B) — multimodal models that can understand images alongside text
This guide focuses on the text models. For vision use, see our guide on Ollama multimodal models.
Llama 3.2 Model Sizes in Ollama
| Model | RAM needed | Best for |
|---|---|---|
| llama3.2:1b | ~1.5 GB | Ultra-low resource, Raspberry Pi, simple tasks |
| llama3.2:3b | ~3 GB | Fast responses on older PCs, good everyday assistant |
| llama3.2-vision:11b | ~8 GB | Vision model — text + image understanding |
| llama3.2-vision:90b | ~55 GB | Vision model — high quality, server hardware only |
How to Install Llama 3.2 in Ollama
```sh
ollama pull llama3.2
```

This pulls the default 3B model. To get the 1B instead:

```sh
ollama pull llama3.2:1b
```

To start an interactive chat with the default model:

```sh
ollama run llama3.2
```
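Beyond the interactive CLI, you can call the model programmatically. Ollama serves a local HTTP API on port 11434 by default; the sketch below posts a one-shot, non-streaming prompt to the `/api/generate` endpoint using only the Python standard library. It assumes the Ollama server is running locally (the helper names are ours, not part of Ollama):

```python
import json
import urllib.request

# Ollama's local HTTP API listens on this port by default
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(model: str, prompt: str) -> dict:
    # Non-streaming request body for Ollama's /api/generate endpoint
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(model, prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # With stream disabled, the full completion arrives in "response"
        return json.loads(resp.read())["response"]
```

With the server up (the desktop app, or `ollama serve`), `generate("llama3.2", "Why is the sky blue?")` returns the model's text reply as a string.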
Llama 3.2 1B vs 3B — Which Should You Use?
The 1B model is very fast but noticeably limited — it can answer basic questions and follow simple instructions, but struggles with anything complex. Think of it as useful for specific, narrow tasks in applications rather than general chat.
The 3B model is significantly better in quality while still being very fast. For most people running Llama 3.2 as a lightweight local assistant, 3B is the right choice. It handles general conversation, summarisation and basic coding reasonably well.
How Does Llama 3.2 3B Compare to Larger Models?
Compared to Llama 3.1 8B or Qwen2.5 7B, the 3B model is noticeably less capable on complex tasks. The trade-off is speed — it responds much faster and runs on hardware that simply can’t handle 7B+ models. If you have a PC with 8 GB of RAM to spare, step up to a 7B model for better quality.
Good Use Cases for Llama 3.2 Small Models
- Running on a Raspberry Pi — the 1B model runs (slowly) on a Pi 4 with 4 GB RAM
- Embedded in applications — fast response time makes 1B/3B suitable for apps where latency matters
- Low-end laptops — works on machines with 4–6 GB of total RAM
- Quick local lookups — simple Q&A where you just need a fast answer
- Text classification and extraction — the 3B handles structured tasks like categorisation reliably
Llama 3.2 vs Llama 3.3 — What’s the Difference?
Llama 3.3 is a later, improved 70B model — it’s not a replacement for 3.2 but a separate high-end model. If you’re running on modest hardware, stick with Llama 3.2. If you have 48 GB+ RAM or a powerful GPU, see our guide: How to Run Llama 3.3 on Ollama.
