Gemma 4 is Google’s latest open-weight model family, released in April 2026. It comes in four sizes — E2B, E4B, E12B, and E27B — all natively multimodal, meaning they handle text and images without any additional setup. The smaller variants (E2B and E4B) run comfortably on a laptop GPU, making Gemma 4 one of the most accessible high-quality models available for local inference.
Gemma 4 Model Sizes
- Gemma 4 E2B — 2.3B parameters. Runs on 4–6GB VRAM. Fast and lightweight, good for simple tasks.
- Gemma 4 E4B — 4B parameters. Runs on 6–8GB VRAM. The sweet spot for most laptop users — strong quality for the size.
- Gemma 4 E12B — 12B parameters. Requires 12–16GB VRAM. Significantly stronger reasoning and coding performance.
- Gemma 4 E27B — 27B parameters. Requires 24GB+ VRAM. Near-frontier performance for a local model.
Start with E4B if you have a modern laptop GPU. Step up to E12B or E27B if you have a desktop GPU with more VRAM.
How to Install Gemma 4 on Ollama
```shell
# Pull the default (E4B)
ollama pull gemma4

# Pull a specific size
ollama pull gemma4:e2b
ollama pull gemma4:e4b
ollama pull gemma4:e12b
ollama pull gemma4:e27b
```
Running Gemma 4
```shell
# Interactive chat
ollama run gemma4

# Single prompt
ollama run gemma4 "Explain transformer attention in simple terms"

# Run a specific size
ollama run gemma4:e12b
```
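Ollama also exposes a local REST API (on port 11434 by default), so you can call Gemma 4 from any language, not just the CLI. A minimal sketch using only the Python standard library — it separates building the request body for `POST /api/generate` from actually sending it, since sending requires a running Ollama server with the model pulled:

```python
import json
import urllib.request


def build_generate_payload(model: str, prompt: str) -> dict:
    # Request body for Ollama's POST /api/generate endpoint.
    # stream=False asks for one complete JSON response instead of chunks.
    return {"model": model, "prompt": prompt, "stream": False}


def generate(model: str, prompt: str, host: str = "http://localhost:11434") -> str:
    # Requires a running Ollama server with the model already pulled.
    body = json.dumps(build_generate_payload(model, prompt)).encode()
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


payload = build_generate_payload("gemma4:e4b", "Say hello in one word")
print(json.dumps(payload))
```

The same endpoint works from curl or any HTTP client; only the JSON body shape matters.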
Using Gemma 4 Vision Features
All Gemma 4 variants support image input natively. Using the `ollama` Python library (install with `pip install ollama`):
```python
import ollama

response = ollama.chat(
    model='gemma4',
    messages=[{
        'role': 'user',
        'content': 'Describe what you see in this image',
        'images': ['screenshot.png'],
    }]
)
print(response['message']['content'])
```
This works for diagram analysis, screenshot interpretation, photo descriptions, and more. The E4B model handles vision tasks well for its size.
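If you call the REST API directly instead of the Python library, images go in the `images` field as base64-encoded strings rather than file paths (the library handles this conversion for you). A small helper sketch:

```python
import base64


def encode_image(path: str) -> str:
    # Ollama's REST API expects images as base64-encoded strings,
    # not file paths.
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("ascii")


def vision_message(text: str, image_path: str) -> dict:
    # Build one user message carrying both text and an image.
    return {
        "role": "user",
        "content": text,
        "images": [encode_image(image_path)],
    }
```

The resulting dict drops straight into the `messages` array of a `POST /api/chat` request body.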
Gemma 4 for Coding
Gemma 4 shows significant improvements on coding benchmarks compared to Gemma 3. The E12B and E27B variants are competitive with much larger models on standard coding tasks. For coding on a budget GPU, E4B is a solid choice:
```shell
ollama run gemma4:e4b "Write a Python function to parse a CSV file and return a list of dictionaries"
```
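Coding replies typically arrive as markdown with fenced code blocks around the actual code. A small hypothetical post-processing helper (not part of Ollama) to pull the code out of a reply:

```python
import re


def extract_code_blocks(markdown: str) -> list[str]:
    # Capture the contents of ```...``` fences, ignoring the language tag.
    pattern = r"```[a-zA-Z0-9_+-]*\n(.*?)```"
    return [m.strip() for m in re.findall(pattern, markdown, flags=re.DOTALL)]


reply = "Here you go:\n```python\nprint('hi')\n```\nHope that helps."
print(extract_code_blocks(reply))  # ["print('hi')"]
```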
What Hardware Do You Need?
| Model | VRAM Required | Typical Hardware |
|---|---|---|
| E2B | 4–6GB | GTX 1660, RTX 3060, integrated GPU |
| E4B | 6–8GB | RTX 3060, RTX 4060, MacBook Air M2 |
| E12B | 12–16GB | RTX 3080, RTX 4080, Mac with 16GB |
| E27B | 24GB+ | RTX 3090, RTX 4090, Mac Studio 32GB |
Gemma 4 vs Gemma 3 on Ollama
Gemma 4 is substantially better than Gemma 3 at the same parameter count. Key improvements include native multimodal support (Gemma 3 required a separate vision model), significantly better coding performance, and improved instruction following. If you are currently running Gemma 3, upgrading to the equivalent Gemma 4 size is worth doing.
Troubleshooting
- Out of memory: Drop to a smaller size — E2B runs on most hardware
- Slow on CPU: Gemma 4 is designed for GPU inference — CPU-only will be very slow above E4B
- Model not found: Update Ollama to the latest version
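For the "model not found" case, it helps to confirm what is actually pulled. `ollama list` prints a table of installed models; a small sketch that parses its output to find Gemma 4 tags (the column layout — model tag first, header row on top — is an assumption based on current Ollama CLI output):

```python
import subprocess


def installed_gemma4_tags(listing: str) -> list[str]:
    # Parse the text output of `ollama list`: the first column is the tag.
    tags = []
    for line in listing.splitlines()[1:]:  # skip the header row
        parts = line.split()
        if parts and parts[0].startswith("gemma4"):
            tags.append(parts[0])
    return tags


def check_installed() -> list[str]:
    # Requires the Ollama CLI on PATH.
    out = subprocess.run(["ollama", "list"], capture_output=True, text=True)
    return installed_gemma4_tags(out.stdout)
```

If the tag you want is missing from the list, pull it explicitly (e.g. `ollama pull gemma4:e4b`) before running.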
