Ollama has quickly become the go-to tool for running large language models locally, and Mac users are in a particularly strong position to take advantage of it. Whether you’re on a modern Apple Silicon Mac with unified memory or an older Intel machine, Ollama runs natively on macOS with minimal configuration. This guide walks you through every step — from installation to running your first model — with clear separation between Apple Silicon and Intel performance expectations so you know exactly what to expect from your hardware.
Why Mac Is an Excellent Platform for Ollama
Apple Silicon Macs — covering the M1, M2, M3, and M4 chip families — have a significant architectural advantage when running local AI models. The key is unified memory architecture (UMA), which means the CPU and GPU share the same memory pool. In practice, this allows Ollama to offload model layers to the GPU without any memory copying overhead. A Mac with 16GB of unified memory can run models that would require a dedicated GPU with significantly more VRAM on a Windows machine.
Apple Silicon also benefits from Metal GPU acceleration, which Ollama leverages automatically. Intel Macs can still run Ollama effectively, particularly for smaller models, but the performance ceiling is considerably lower.
System Requirements
Apple Silicon (M1, M2, M3, M4)
- macOS: Ventura (13) or later recommended
- RAM (Unified Memory): 8GB minimum; 16GB recommended; 32GB+ for larger models
- Storage: At least 10GB free — models range from 2GB to over 40GB
Intel Mac
- macOS: Monterey (12) or later
- RAM: 16GB minimum for a reasonable experience
- Expectation: Inference is CPU-bound and noticeably slower — stick to 3B and 7B models
Installation Method 1: Download the Ollama App (Recommended)
- Go to ollama.com and click Download for macOS
- Open the downloaded .dmg file
- Drag the Ollama application into your Applications folder
- Open Ollama from your Applications folder or via Spotlight (Cmd + Space, type “Ollama”)
On first launch, macOS may show a Gatekeeper security warning — see the Troubleshooting section below. Once open, a small llama icon appears in your menu bar. The app runs a local API server in the background on port 11434.
Installation Method 2: Homebrew
If you already use Homebrew, install Ollama with:
brew install ollama
Then start the server:
ollama serve
Or run as a persistent background service that starts on login:
brew services start ollama
Running Your First Model
With Ollama running, open Terminal and start a model. The ollama run command downloads the model if you don’t already have it, then starts an interactive chat session.
ollama run llama3.2:3b
This downloads approximately 2GB and runs comfortably on any Mac with 8GB of memory. Once loaded, type at the >>> prompt. Type /bye to exit.
To list the models you have downloaded, or to delete one and reclaim disk space:
ollama list
ollama rm llama3.2:3b
Model Recommendations by Mac Specification
8GB Unified Memory (Apple Silicon)
- llama3.2:3b — Fast, capable for general chat and coding questions
- phi3:mini — Microsoft’s 3.8B model, strong at reasoning for its size
- gemma2:2b — Very quick, good for fast responses
Avoid models above 7B on 8GB — macOS will begin swapping to disk, making inference impractically slow.
16GB Unified Memory (Apple Silicon)
- llama3.1:8b — Excellent all-rounder for chat, coding, and analysis
- mistral:7b — Fast and very capable for its size
- qwen2.5:14b — High-quality responses, comfortably within 16GB
- deepseek-coder-v2:16b — Excellent for code generation and review
32GB and Above (Apple Silicon)
- gemma2:27b — Excellent balance of capability and speed
- qwen2.5:32b — Top-tier reasoning and coding performance
- mixtral:8x7b — Mixture-of-experts; requires ~28GB
- llama3.1:70b (Q4) — Fits in ~40GB; requires 48GB+ to run comfortably
Intel Mac (16GB RAM)
- llama3.2:3b — Usable speeds on a modern Intel Core i7/i9
- phi3:mini — Lightweight and reasonably fast
- gemma2:2b — Good for quick tasks
Avoid anything above 7B on Intel — generation speed drops to a few tokens per second, making extended conversations frustrating.
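A quick way to sanity-check the recommendations above: multiply the parameter count by the bytes per weight at your quantization level, then add headroom for the KV cache and leave a few GB for macOS itself. The helper below is a back-of-the-envelope sketch, not Ollama’s actual allocator; the overhead and reserve figures are rough assumptions.

```python
def estimated_memory_gb(params_billion: float, quant_bits: int = 4,
                        overhead_gb: float = 2.0) -> float:
    """Rough footprint: weights at quant_bits per parameter,
    plus a flat allowance for KV cache and runtime overhead."""
    weights_gb = params_billion * quant_bits / 8  # e.g. 7B at 4-bit ≈ 3.5 GB
    return weights_gb + overhead_gb

def fits(params_billion: float, unified_memory_gb: float,
         quant_bits: int = 4, reserve_for_macos_gb: float = 4.0) -> bool:
    """Leave room for macOS; once the model spills to swap,
    inference becomes impractically slow."""
    need = estimated_memory_gb(params_billion, quant_bits)
    return need <= unified_memory_gb - reserve_for_macos_gb

print(fits(7, 16))   # a 4-bit 7B model on a 16GB Mac
print(fits(70, 32))  # a 4-bit 70B model on a 32GB Mac
```

The same arithmetic explains the 70B figures above: 70B parameters at 4 bits is roughly 35GB of weights before overhead, which is why 48GB+ of unified memory is the comfortable floor.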
Performance by Apple Silicon Chip
- M1 / M1 Pro / M1 Max: Excellent for 7B models; capable with 13B on Pro/Max variants. Roughly 25–45 tok/s on a 7B model
- M2 / M2 Pro / M2 Max: Modest generational gains over M1, with the base M2’s higher memory bandwidth a noticeable step up for larger models
- M3 / M3 Pro / M3 Max: GPU improvements are meaningful for ML workloads; 32B models become practical on M3 Max
- M4 / M4 Pro / M4 Max: Currently the fastest Apple Silicon for Ollama; the M4 Max with 128GB can run 70B parameter models at useful speeds
Using Ollama via Terminal
Run a one-shot prompt without interactive mode:
ollama run llama3.2:3b "Explain what a REST API is in two sentences"
Check which models are currently loaded in memory:
ollama ps
Query the REST API directly:
curl http://localhost:11434/api/generate -d '{
"model": "llama3.2:3b",
"prompt": "What is the capital of France?",
"stream": false
}'
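The same endpoint is easy to call from Python using only the standard library. This is a minimal sketch assuming the default server address and the llama3.2:3b model pulled earlier; swap in any model you have installed.

```python
import json
import urllib.request

def build_payload(prompt: str, model: str = "llama3.2:3b") -> dict:
    # stream=False returns a single JSON object instead of a stream of chunks
    return {"model": model, "prompt": prompt, "stream": False}

def generate(prompt: str, model: str = "llama3.2:3b",
             host: str = "http://localhost:11434") -> str:
    """POST to Ollama's /api/generate endpoint and return the response text."""
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(build_payload(prompt, model)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Usage (with Ollama running):
#   print(generate("What is the capital of France?"))
```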
Troubleshooting Common Mac Issues
Gatekeeper Security Warning on First Launch
- Open System Settings → Privacy & Security
- Scroll down to find the message about Ollama being blocked
- Click Open Anyway and confirm in the dialog
Alternatively, right-click (Control-click) the Ollama app in Finder and select Open. You only need to do this once.
Model Running Too Slowly on Intel Mac
Switch to a smaller model (3B or less). Ollama cannot use the GPU effectively on most Intel Mac configurations, so it falls back to CPU-only inference.
Port Conflict on 11434
lsof -i :11434
Kill any previous Ollama process by PID, or change the port with:
OLLAMA_HOST=127.0.0.1:11435 ollama serve
ollama Command Not Found After Homebrew Install
For Apple Silicon, Homebrew installs to /opt/homebrew/bin. Add it to your shell profile:
echo 'export PATH="/opt/homebrew/bin:$PATH"' >> ~/.zshrc
source ~/.zshrc
What to Try Next
With Ollama running on your Mac, consider installing Open WebUI — a browser-based chat interface that connects to your local Ollama instance and gives you a ChatGPT-style experience with full privacy. For developers, Ollama’s OpenAI API compatibility means many existing tools work with local models by simply changing the base URL. Whether you’re experimenting with AI for the first time or building applications that need to run without external servers, your Mac — especially with Apple Silicon — is one of the best consumer platforms for local AI work available today.
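To illustrate the OpenAI compatibility mentioned above: Ollama also serves the OpenAI chat-completions format under /v1, so pointing a client at the local base URL is usually all that is needed. This standard-library sketch assumes the default port and the llama3.2:3b model from earlier.

```python
import json
import urllib.request

def chat_payload(messages: list, model: str = "llama3.2:3b") -> dict:
    # Same message shape the OpenAI chat API uses
    return {"model": model, "messages": messages}

def chat(messages: list, model: str = "llama3.2:3b",
         base_url: str = "http://localhost:11434/v1") -> str:
    """Call Ollama's OpenAI-compatible /chat/completions endpoint."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(chat_payload(messages, model)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]

# Usage (with Ollama running):
#   print(chat([{"role": "user", "content": "Say hello in five words."}]))
```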