
How to Run Gemma 4 on Ollama (All Sizes Explained)

Gemma 4 is Google’s latest open-weight model family, released in April 2026. It comes in four sizes — E2B, E4B, E12B, and E27B — all natively multimodal, meaning they handle text and images without any additional setup. The smaller variants (E2B and E4B) run comfortably on a laptop GPU, making Gemma 4 one of the most accessible high-quality models available for local inference.

Gemma 4 Model Sizes

  • Gemma 4 E2B — 2.3B parameters. Runs on 4–6GB VRAM. Fast and lightweight, good for simple tasks.
  • Gemma 4 E4B — 4B parameters. Runs on 6–8GB VRAM. The sweet spot for most laptop users — strong quality for the size.
  • Gemma 4 E12B — 12B parameters. Requires 12–16GB VRAM. Significantly stronger reasoning and coding performance.
  • Gemma 4 E27B — 27B parameters. Requires 24GB+ VRAM. Near-frontier performance for a local model.

Start with E4B if you have a modern laptop GPU. Step up to E12B or E27B if you have a desktop GPU with more VRAM.
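The VRAM guidance above can be captured in a small helper for scripts that pick a model automatically. This is an illustrative sketch, not part of Ollama — the `suggest_gemma4_tag` function and its thresholds are mine, taken from the list above:

```python
# Illustrative helper: map available VRAM (in GB) to an Ollama tag.
# Thresholds follow the size guidance above; adjust for your hardware.

def suggest_gemma4_tag(vram_gb: float) -> str:
    """Return the largest Gemma 4 tag expected to fit in the given VRAM."""
    if vram_gb >= 24:
        return "gemma4:e27b"
    if vram_gb >= 12:
        return "gemma4:e12b"
    if vram_gb >= 6:
        return "gemma4:e4b"
    return "gemma4:e2b"

print(suggest_gemma4_tag(8))  # prints: gemma4:e4b
```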

How to Install Gemma 4 on Ollama

# Pull the default (E4B)
ollama pull gemma4

# Pull a specific size
ollama pull gemma4:e2b
ollama pull gemma4:e4b
ollama pull gemma4:e12b
ollama pull gemma4:e27b

Running Gemma 4

# Interactive chat
ollama run gemma4

# Single prompt
ollama run gemma4 "Explain transformer attention in simple terms"

# Run a specific size
ollama run gemma4:e12b

Using Gemma 4 Vision Features

All Gemma 4 variants support image input natively. Using the Python library:

# Requires the ollama Python package: pip install ollama
import ollama

response = ollama.chat(
    model='gemma4',
    messages=[{
        'role': 'user',
        'content': 'Describe what you see in this image',
        'images': ['screenshot.png']  # local file path; the library reads and encodes it
    }]
)
print(response['message']['content'])

This works for diagram analysis, screenshot interpretation, photo descriptions, and more. The E4B model handles vision tasks well for its size.
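Outside the Python library, the same request can be sent to Ollama's REST API (`POST /api/chat`), where images are passed as base64-encoded strings rather than file paths. A minimal payload builder, as a sketch (the helper name is mine):

```python
import base64
import json

def build_chat_payload(model: str, prompt: str, image_bytes: bytes) -> str:
    """Build a JSON body for Ollama's /api/chat endpoint with one image.

    Unlike the Python library, which accepts file paths, the REST API
    expects each image as a base64 string.
    """
    payload = {
        "model": model,
        "messages": [{
            "role": "user",
            "content": prompt,
            "images": [base64.b64encode(image_bytes).decode("ascii")],
        }],
        "stream": False,
    }
    return json.dumps(payload)

# In practice, read image_bytes from a file before building the payload:
# body = build_chat_payload("gemma4", "Describe this image", open("screenshot.png", "rb").read())
```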

Gemma 4 for Coding

Gemma 4 shows significant improvements on coding benchmarks compared to Gemma 3. The E12B and E27B variants are competitive with much larger models on standard coding tasks. For coding on a budget GPU, E4B is a solid choice:

ollama run gemma4:e4b "Write a Python function to parse a CSV file and return a list of dictionaries"
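Chat models usually wrap generated code in a markdown fence, so scripts that consume the reply need to strip the surrounding prose. A small helper for that (the function name and regex are mine, not part of Ollama):

```python
import re

# Build the triple-backtick fence programmatically so the sketch is easy
# to embed in documentation.
FENCE = "`" * 3

def extract_code_block(reply: str) -> str:
    """Return the contents of the first fenced code block in a model reply.

    Falls back to the whole reply if no fence is found, since smaller
    models sometimes answer with bare code.
    """
    pattern = FENCE + r"(?:\w+)?\n(.*?)" + FENCE
    match = re.search(pattern, reply, re.DOTALL)
    return match.group(1).strip() if match else reply.strip()
```

For example, a reply like "Sure: \<fenced python block\>" reduces to just the code inside the fence, ready to write to a `.py` file.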

What Hardware Do You Need?

Model  VRAM Required  Typical Hardware
E2B    4–6GB          GTX 1660, RTX 3060, integrated GPU
E4B    6–8GB          RTX 3060, RTX 4060, MacBook Air M2
E12B   12–16GB        RTX 3080, RTX 4080, Mac with 16GB unified memory
E27B   24GB+          RTX 3090, RTX 4090, Mac Studio with 32GB unified memory

Gemma 4 vs Gemma 3 on Ollama

Gemma 4 is substantially better than Gemma 3 at the same parameter count. Key improvements include native multimodal support (Gemma 3 required a separate vision model), significantly better coding performance, and improved instruction following. If you are currently running Gemma 3, upgrading to the equivalent Gemma 4 size is worth doing.

Troubleshooting

  • Out of memory: Drop to a smaller size — E2B runs on most hardware
  • Slow on CPU: Gemma 4 is designed for GPU inference — CPU-only will be very slow above E4B
  • Model not found: Update Ollama to the latest version
