How to Run Llama 4 on Ollama (Scout and Maverick Guide)

Llama 4 is Meta’s most capable open-weight model family to date, released in April 2025. It introduces a mixture-of-experts (MoE) architecture and native multimodal support, meaning it handles both text and images out of the box. Two variants are available to run locally via Ollama: Scout and Maverick. This guide covers what each model offers, what hardware you need, and how to get started.

Llama 4 Scout vs Maverick: Which Should You Run?

Llama 4 comes in two locally-runnable variants:

  • Llama 4 Scout — 17B active parameters (109B total across experts), supports up to a 10 million token context window. Because all experts must be resident in memory, expect roughly 60GB+ of VRAM or unified memory at 4-bit quantisation. This is the smaller variant and the one most local users will run.
  • Llama 4 Maverick — 17B active parameters (400B total), significantly more capable but requiring far more memory. Best suited for multi-GPU servers or very high-memory workstations.

For most users: run Scout. It is the smaller of the two and runs well on a multi-GPU workstation or an Apple Silicon Mac with 64GB or more of unified memory; partial CPU offload is possible on smaller configurations, at reduced speed.

Hardware Requirements

  • Llama 4 Scout: roughly 60GB+ of VRAM or unified memory at 4-bit quantisation (multi-GPU NVIDIA setups, or a 64GB+ Apple Silicon Mac); CPU offload works but is slow
  • Llama 4 Maverick: multi-GPU server or very high-memory workstation (roughly 230GB+ at 4-bit) — not practical for most home users

Llama 4 Scout at Q4 quantisation weighs in at roughly 60–70GB, so a single 24GB consumer GPU cannot hold it on its own; an Apple Silicon Mac with 64GB+ unified memory, or a multi-GPU setup, is a more realistic target.
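As a back-of-envelope check, weight memory scales with the total parameter count, not the active count, because every expert must be resident even though only 17B parameters fire per token. A rough sketch (real footprints also include the KV cache and runtime overhead, and vary by quantisation format):

```python
def approx_weight_gb(total_params_billions: float, bits_per_param: float) -> float:
    """Rough memory footprint of the model weights alone, in decimal GB."""
    total_bytes = total_params_billions * 1e9 * bits_per_param / 8
    return total_bytes / 1e9

# Scout: 109B total parameters at ~4.5 bits/param (typical Q4_K_M average)
print(round(approx_weight_gb(109, 4.5)))  # prints 61

# Maverick: 400B total parameters at the same quantisation
print(round(approx_weight_gb(400, 4.5)))  # prints 225
```

This is why Scout needs workstation-class memory despite running only 17B parameters per inference pass.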

How to Install Llama 4 Scout on Ollama

Make sure you have Ollama installed and up to date before pulling Llama 4. Run ollama --version and update if needed.

# Pull Llama 4 Scout
ollama pull llama4

# Or pull Maverick (requires significantly more memory)
ollama pull llama4:maverick

The Scout model is the default when you pull llama4. The download is large (tens of gigabytes), so expect it to take a while on most connections.

Running Llama 4 Scout

# Start an interactive chat session
ollama run llama4

# Run a single prompt
ollama run llama4 "Explain the difference between MoE and dense transformer models"
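The one-shot prompt above can also be sent from Python. A minimal sketch using only the standard library against Ollama's /api/generate endpoint, assuming a local server on the default port 11434:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(model: str, prompt: str) -> bytes:
    """JSON body for a non-streaming /api/generate call."""
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()

def generate(model: str, prompt: str) -> str:
    """Send one prompt and return the full completion text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_payload(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    print(generate("llama4", "Explain the difference between MoE and dense transformer models"))
```

Setting "stream": False returns the whole answer in one JSON object rather than token-by-token chunks.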

Using Llama 4’s Vision Capabilities

Llama 4 is natively multimodal — you can pass images directly without any additional setup. Using the Ollama Python library:

import ollama

response = ollama.chat(
    model='llama4',
    messages=[{
        'role': 'user',
        'content': 'What is in this image?',
        'images': ['path/to/your/image.jpg']
    }]
)
print(response['message']['content'])

This works with PNG, JPEG, and WebP images. Unlike earlier vision models that used a separate vision encoder, Llama 4’s multimodal capability is baked into the base model.
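The same image request can be made over the REST API, where the user message carries a base64-encoded image in an images array. A stdlib-only sketch (the file path is a placeholder, as in the example above):

```python
import base64
import json
import urllib.request

def build_vision_payload(model: str, text: str, image_bytes: bytes) -> bytes:
    """Chat payload with one base64-encoded image attached to the user message."""
    return json.dumps({
        "model": model,
        "stream": False,
        "messages": [{
            "role": "user",
            "content": text,
            "images": [base64.b64encode(image_bytes).decode("ascii")],
        }],
    }).encode()

if __name__ == "__main__":
    with open("path/to/your/image.jpg", "rb") as f:  # placeholder path
        payload = build_vision_payload("llama4", "What is in this image?", f.read())
    req = urllib.request.Request(
        "http://localhost:11434/api/chat",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read())["message"]["content"])
```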

Llama 4 via the REST API

curl http://localhost:11434/api/chat -d '{
  "model": "llama4",
  "messages": [
    {"role": "user", "content": "What are the key features of Llama 4?"}
  ]
}'
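By default /api/chat streams its reply as newline-delimited JSON, so the curl call above returns one object per chunk. A stdlib-only sketch of consuming that stream from Python (assumes the same local server):

```python
import json
import urllib.request

def parse_stream_line(line: bytes) -> str:
    """Extract the content fragment from one NDJSON chunk of /api/chat."""
    chunk = json.loads(line)
    return chunk.get("message", {}).get("content", "")

if __name__ == "__main__":
    body = json.dumps({
        "model": "llama4",
        "messages": [{"role": "user", "content": "What are the key features of Llama 4?"}],
    }).encode()
    req = urllib.request.Request(
        "http://localhost:11434/api/chat",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        for line in resp:  # one JSON object per line until "done": true
            print(parse_stream_line(line), end="", flush=True)
    print()
```

Streaming lets you display tokens as they arrive instead of waiting for the full response.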

What Llama 4 Scout Is Good At

  • Long context reasoning — the 10M token context window is transformative for document analysis (note that Ollama uses a much smaller context by default; raise the num_ctx parameter, and expect memory use to grow with context length)
  • Coding — strong performance on standard coding benchmarks
  • Multimodal tasks — image description, chart analysis, visual QA
  • General reasoning — improved significantly over Llama 3.3

Llama 4 vs Llama 3.3 on Ollama

If you are currently running Llama 3.3 70B, Scout offers comparable or better quality with faster inference thanks to the MoE architecture — only 17B parameters are active per inference pass. Total memory use is higher, though: all 109B parameters must be loaded, versus 70B for Llama 3.3. The native multimodal support is an additional capability Llama 3.3 does not have.

Troubleshooting

  • Out of memory: Try a lower quantisation — check the llama4 page on the Ollama model library for the quantised tags currently published — or accept partial CPU offload at reduced speed
  • Slow inference: Ensure your GPU is being used — run ollama ps to see if the model is loaded on GPU
  • Model not found: Update Ollama to the latest version — Llama 4 requires a recent build
