How to Install Ollama on Mac — Apple Silicon and Intel Guide

Ollama has quickly become the go-to tool for running large language models locally, and Mac users are in a particularly strong position to take advantage of it. Whether you’re on a modern Apple Silicon Mac with unified memory or an older Intel machine, Ollama runs natively on macOS with minimal configuration. This guide walks you through every step — from installation to running your first model — with clear separation between Apple Silicon and Intel performance expectations so you know exactly what to expect from your hardware.

Why Mac Is an Excellent Platform for Ollama

Apple Silicon Macs — covering the M1, M2, M3, and M4 chip families — have a significant architectural advantage when running local AI models. The key is unified memory architecture (UMA), which means the CPU and GPU share the same memory pool. In practice, this allows Ollama to offload model layers to the GPU without any memory copying overhead. A Mac with 16GB of unified memory can run models that would require a dedicated GPU with significantly more VRAM on a Windows machine.

Apple Silicon also benefits from Metal GPU acceleration, which Ollama leverages automatically. Intel Macs can still run Ollama effectively, particularly for smaller models, but the performance ceiling is considerably lower.

System Requirements

Apple Silicon (M1, M2, M3, M4)

  • macOS: Ventura (13) or later recommended
  • RAM (Unified Memory): 8GB minimum; 16GB recommended; 32GB+ for larger models
  • Storage: At least 10GB free — models range from 2GB to over 40GB

Intel Mac

  • macOS: Monterey (12) or later
  • RAM: 16GB minimum for a reasonable experience
  • Expectation: Inference is CPU-bound and noticeably slower — stick to 3B and 7B models

Installation Method 1: Official Installer

  1. Go to ollama.com and click Download for macOS
  2. Open the downloaded .dmg file
  3. Drag the Ollama application into your Applications folder
  4. Open Ollama from your Applications folder or via Spotlight (Cmd + Space, type “Ollama”)

On first launch, macOS may show a Gatekeeper security warning — see the Troubleshooting section below. Once open, a small llama icon appears in your menu bar. The app runs a local API server in the background on port 11434.
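To confirm the background server is actually listening, a quick port check is enough. The following is a minimal Python sketch; the `ollama_running` helper is illustrative, not part of Ollama itself:

```python
import socket

def ollama_running(host="127.0.0.1", port=11434, timeout=1.0):
    """Return True if something is listening on the Ollama API port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    print("Ollama server reachable:", ollama_running())
```

If this prints False right after launch, give the app a few seconds to start, then check again.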

Installation Method 2: Homebrew

If you already use Homebrew, install Ollama with:

brew install ollama

Then start the server:

ollama serve

Or run as a persistent background service that starts on login:

brew services start ollama

Running Your First Model

With Ollama running, open Terminal and start a model. The ollama run command downloads the model if you don’t already have it, then starts an interactive chat session.

ollama run llama3.2:3b

This downloads approximately 2GB and runs comfortably on any Mac with 8GB of memory. Once loaded, type at the >>> prompt. Type /bye to exit.

To list your downloaded models, or remove one to free up disk space:

ollama list
ollama rm llama3.2:3b

Model Recommendations by Mac Specification

8GB Unified Memory (Apple Silicon)

  • llama3.2:3b — Fast, capable for general chat and coding questions
  • phi3:mini — Microsoft’s 3.8B model, strong at reasoning for its size
  • gemma2:2b — Very quick, good for fast responses

Avoid models above 7B on 8GB — macOS will begin swapping to disk, making inference impractically slow.

16GB Unified Memory (Apple Silicon)

  • llama3.1:8b — Excellent all-rounder for chat, coding, and analysis
  • mistral:7b — Fast and very capable for its size
  • qwen2.5:14b — High-quality responses, comfortably within 16GB
  • deepseek-coder-v2:16b — Excellent for code generation and review

32GB and Above (Apple Silicon)

  • gemma2:27b — Excellent balance of capability and speed
  • qwen2.5:32b — Top-tier reasoning and coding performance
  • mixtral:8x7b — Mixture-of-experts; requires ~28GB
  • llama3.1:70b (Q4) — Fits in ~40GB; requires 48GB+ to run comfortably

Intel Mac (16GB RAM)

  • llama3.2:3b — Usable speeds on a modern Intel Core i7/i9
  • phi3:mini — Lightweight and reasonably fast
  • gemma2:2b — Good for quick tasks

Avoid anything above 7B on Intel — generation speed drops to a few tokens per second, making extended conversations frustrating.
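The sizing guidance above follows from simple arithmetic: a Q4-quantized model stores roughly 4–5 bits per parameter, plus runtime overhead for the KV cache. A back-of-the-envelope check in Python (the constants are rules of thumb, not exact figures):

```python
def approx_model_gb(params_billion, bits_per_weight=4.5):
    """Approximate memory footprint of a Q4-quantized model,
    with ~20% added for KV cache and runtime overhead."""
    return params_billion * bits_per_weight / 8 * 1.2

def fits_in_ram(params_billion, ram_gb, macos_headroom_gb=4):
    """True if the model fits, leaving headroom for macOS itself."""
    return approx_model_gb(params_billion) <= ram_gb - macos_headroom_gb
```

By this estimate a 7B model needs roughly 5GB, which is why 7B is comfortable on 16GB but borderline on 8GB once macOS takes its share.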

Performance by Apple Silicon Chip

  • M1 / M1 Pro / M1 Max: Excellent for 7B models; capable with 13B on Pro/Max variants. Roughly 25–45 tok/s on a 7B model
  • M2 / M2 Pro / M2 Max: Higher memory bandwidth than the M1 equivalents (100GB/s vs. ~68GB/s on the base chip), a noticeable step up for larger models since token generation is memory-bandwidth-bound
  • M3 / M3 Pro / M3 Max: GPU improvements are meaningful for ML workloads; 32B models become practical on M3 Max
  • M4 / M4 Pro / M4 Max: Currently the fastest Apple Silicon for Ollama; the M4 Max with 128GB can run 70B parameter models at useful speeds

Using Ollama via Terminal

Run a one-shot prompt without interactive mode:

ollama run llama3.2:3b "Explain what a REST API is in two sentences"

Check which models are currently loaded in memory:

ollama ps

Query the REST API directly:

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2:3b",
  "prompt": "What is the capital of France?",
  "stream": false
}'
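The same endpoint can be called from Python using only the standard library. This is a minimal sketch; the `generate` helper is illustrative and assumes the Ollama server is running on the default port:

```python
import json
from urllib import request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(prompt, model="llama3.2:3b"):
    """Request body for Ollama's /api/generate endpoint."""
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()

def generate(prompt, model="llama3.2:3b"):
    """Send a one-shot prompt and return the generated text."""
    req = request.Request(OLLAMA_URL, data=build_payload(prompt, model),
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Setting "stream" to false returns the whole answer in one JSON object; leave it out to receive a stream of partial responses instead.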

Troubleshooting Common Mac Issues

Gatekeeper Security Warning on First Launch

  1. Open System Settings → Privacy & Security
  2. Scroll down to find the message about Ollama being blocked
  3. Click Open Anyway and confirm in the dialog

Alternatively, right-click (Control-click) the Ollama app in Finder and select Open. You only need to do this once.

Model Running Too Slowly on Intel Mac

Switch to a smaller model (3B or less). Ollama cannot use the GPU effectively on most Intel Mac configurations, so it falls back to CPU-only inference.

Port Conflict on 11434

Find which process is holding the port:

lsof -i :11434

Kill any previous Ollama process by PID, or change the port with:

OLLAMA_HOST=127.0.0.1:11435 ollama serve
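If you move the server to a different port this way, anything that talks to the API needs the new address too. A small Python helper that mirrors the OLLAMA_HOST convention (an illustrative sketch, not Ollama's own code):

```python
import os

def ollama_base_url(default="127.0.0.1:11434"):
    """Resolve the API base URL, honoring the OLLAMA_HOST variable."""
    host = os.environ.get("OLLAMA_HOST", default)
    if not host.startswith(("http://", "https://")):
        host = "http://" + host
    return host
```

Clients that resolve the address this way follow the server automatically whenever OLLAMA_HOST changes.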

ollama Command Not Found After Homebrew Install

For Apple Silicon, Homebrew installs to /opt/homebrew/bin. Add it to your shell profile:

echo 'export PATH="/opt/homebrew/bin:$PATH"' >> ~/.zshrc
source ~/.zshrc

What to Try Next

With Ollama running on your Mac, consider installing Open WebUI — a browser-based chat interface that connects to your local Ollama instance and gives you a ChatGPT-style experience with full privacy. For developers, Ollama’s OpenAI API compatibility means many existing tools work with local models by simply changing the base URL. Whether you’re experimenting with AI for the first time or building applications that need to run without external servers, your Mac — especially with Apple Silicon — is one of the best consumer platforms for local AI work available today.
