
How to Run Gemma 3 on Ollama (Google’s Multimodal Model)

Gemma 3 is Google’s latest open-weight model family and one of the most versatile models you can run in Ollama. What makes it stand out is its support for both text and images, making it one of the few multimodal models you can run locally at this size. Here’s how to get started.

What is Gemma 3?

Gemma 3 was released by Google DeepMind in March 2025 and represents a significant step up from Gemma 2. The key improvements are:

  • Multimodal support — Gemma 3 can understand and describe images, not just text
  • Longer context — 128K token context window (up from 8K in Gemma 2)
  • Better instruction following — more reliable at complex, multi-step prompts
  • Multilingual — supports over 140 languages

Gemma 3 Model Sizes in Ollama

Model        RAM needed   Notes
gemma3:1b    ~2 GB        Ultra-lightweight, basic tasks
gemma3:4b    ~4 GB        Good for older hardware, solid quality
gemma3:12b   ~9 GB        Recommended: best balance of size and capability
gemma3:27b   ~18 GB       High quality, needs 32 GB RAM

How to Install Gemma 3 in Ollama

ollama pull gemma3

This pulls the default 4B model. For the recommended 12B:

ollama pull gemma3:12b

To run it:

ollama run gemma3:12b

How to Use Gemma 3 with Images

Gemma 3 supports image inputs from the command line. Ollama has no --image flag; instead, you include the image's file path directly in the prompt and Ollama attaches it automatically:

ollama run gemma3:12b "What is in this image? /path/to/image.jpg"

Or via the API:

curl http://localhost:11434/api/generate -d '{
  "model": "gemma3:12b",
  "prompt": "Describe what you see in this image",
  "images": ["base64_encoded_image_here"],
  "stream": false
}'
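If you are scripting this, the base64 step is the part that usually trips people up: Ollama expects raw base64 strings with no data: URI prefix. A minimal Python sketch that builds the same request body (the function name is made up; the endpoint is Ollama's default):

```python
import base64
import json

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def build_image_request(image_bytes: bytes, prompt: str,
                        model: str = "gemma3:12b") -> str:
    """Return a JSON body for /api/generate with one base64-encoded image."""
    payload = {
        "model": model,
        "prompt": prompt,
        # Ollama expects plain base64 strings, no "data:image/..." prefix.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # one JSON response instead of a stream
    }
    return json.dumps(payload)

# To actually send it (only works with Ollama running locally):
# from urllib.request import Request, urlopen
# body = build_image_request(open("photo.jpg", "rb").read(), "Describe this image")
# resp = urlopen(Request(OLLAMA_URL, body.encode(),
#                        {"Content-Type": "application/json"}))
# print(json.loads(resp.read())["response"])
```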

For a more user-friendly way to use image inputs, Open WebUI supports image uploads with Gemma 3 out of the box.

What is Gemma 3 Good At?

  • Image understanding — describe photos, read text in images, analyse diagrams
  • Long document processing — the 128K context makes it well suited for summarising lengthy texts
  • Multilingual content — strong performance across many languages
  • General knowledge Q&A — well-rounded responses across most topics
  • Creative writing — better than most small models at creative tasks

Gemma 3 vs Llama 3.2 Vision

Both support images locally. Gemma 3 tends to give more detailed image descriptions and handles longer context better. Llama 3.2 Vision has a larger community and more third-party integrations. For most users, Gemma 3 12B is the better choice if multimodal capability is a priority.

Gemma 3 vs Phi-4

Phi-4 is stronger on pure reasoning and maths. Gemma 3 is more versatile — better at creative tasks, images, and multilingual content. If you want one model that handles a wide range of tasks, Gemma 3 12B is a strong pick.
