How to Run Llama 3.2 on Ollama (Small Models Explained)

Llama 3.2 introduced something new to the Llama family — genuinely useful small models. The 1B and 3B variants are fast enough to run on almost any hardware, including phones and Raspberry Pis, while still producing coherent, helpful responses. Here’s everything you need to know about running Llama 3.2 in Ollama.

What’s New in Llama 3.2?

Meta released Llama 3.2 in September 2024 with two main additions to the family:

  • Small text models (1B and 3B) — lightweight models designed for edge devices, fast local inference, and low-resource environments
  • Vision models (11B and 90B) — multimodal models that can understand images alongside text

This guide focuses on the text models. For vision use, see our guide on Ollama multimodal models.

Llama 3.2 Model Sizes in Ollama

| Model | RAM needed | Best for |
| --- | --- | --- |
| llama3.2:1b | ~1.5 GB | Ultra-low resource, Raspberry Pi, simple tasks |
| llama3.2:3b | ~3 GB | Fast responses on older PCs, good everyday assistant |
| llama3.2:11b | ~8 GB | Vision model — text + image understanding |
| llama3.2:90b | ~55 GB | Vision model — high quality, server hardware only |

How to Install Llama 3.2 in Ollama

ollama pull llama3.2

This pulls the default 3B model. To get the 1B variant instead:

ollama pull llama3.2:1b

To run it:

ollama run llama3.2
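Beyond the CLI, Ollama also serves a local HTTP API (on port 11434 by default), which is how you'd call Llama 3.2 from your own code. Here's a minimal sketch using only Python's standard library — the helper names are ours, and it assumes `ollama serve` is running with the model pulled:

```python
import json
import urllib.request

# Ollama's default local endpoint for one-shot text generation
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_generate_request(model: str, prompt: str) -> dict:
    """Build the JSON body for Ollama's /api/generate endpoint.
    stream=False asks for a single complete response instead of chunks."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    """Send a prompt to a locally running Ollama server and return the reply text."""
    body = json.dumps(build_generate_request(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Requires a running Ollama server with llama3.2 pulled:
# print(generate("llama3.2", "Explain what a context window is in one sentence."))
```

Swap in `"llama3.2:1b"` as the model name to hit the smaller variant through the same API.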

Llama 3.2 1B vs 3B — Which Should You Use?

The 1B model is very fast but noticeably limited — it can answer basic questions and follow simple instructions, but struggles with anything complex. Think of it as useful for specific, narrow tasks in applications rather than general chat.

The 3B model is significantly better in quality while still being very fast. For most people running Llama 3.2 as a lightweight local assistant, 3B is the right choice. It handles general conversation, summarisation and basic coding reasonably well.

How Does Llama 3.2 3B Compare to Larger Models?

Compared to Llama 3.1 8B or Qwen2.5 7B, the 3B model is noticeably less capable on complex tasks. The trade-off is speed — it responds much faster and runs on hardware that simply can’t handle 7B+ models. If you have a PC with 8 GB of RAM to spare, step up to a 7B model for better quality.

Good Use Cases for Llama 3.2 Small Models

  • Running on a Raspberry Pi — the 1B model runs (slowly) on a Pi 4 with 4 GB RAM
  • Embedded in applications — fast response time makes 1B/3B suitable for apps where latency matters
  • Low-end laptops — works on machines with 4–6 GB of total RAM
  • Quick local lookups — simple Q&A where you just need a fast answer
  • Text classification and extraction — the 3B handles structured tasks like categorisation reliably
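The classification point is worth making concrete: small models are much more reliable when the prompt constrains the output to a fixed label set. A minimal sketch — the function and example labels below are illustrative, not part of any Ollama API:

```python
def classification_prompt(text: str, labels: list[str]) -> str:
    """Build a prompt that forces the model to pick exactly one label.
    Restricting the answer space is what makes 1B/3B models dependable
    on categorisation-style tasks."""
    return (
        "Classify the following text into exactly one of these categories: "
        + ", ".join(labels)
        + ".\nRespond with only the category name.\n\nText: "
        + text
    )

# Example (labels are hypothetical):
prompt = classification_prompt(
    "My package never arrived.", ["billing", "shipping", "technical"]
)
# Pass `prompt` to `ollama run llama3.2` or to the local HTTP API.
```

With the 3B model, prompts like this usually return a bare label you can match against the list directly; with the 1B model, expect to validate the output and retry occasionally.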

Llama 3.2 vs Llama 3.3 — What’s the Difference?

Llama 3.3 is a later, improved 70B model — it’s not a replacement for 3.2 but a separate high-end model. If you’re running on modest hardware, stick with Llama 3.2. If you have 48 GB+ RAM or a powerful GPU, see our guide: How to Run Llama 3.3 on Ollama.
