Ollama 0.19, released in March 2026, introduced an MLX backend for Apple Silicon Macs. MLX is Apple’s machine learning framework optimised specifically for the M-series chip architecture. Enabling it gives approximately 2x faster inference compared to the default Metal backend, a significant improvement for anyone running local AI on a Mac.
## What Is MLX and Why Does It Matter?
MLX is Apple’s open-source machine learning framework, designed from the ground up for Apple Silicon’s unified memory architecture. Unlike Metal (GPU compute) or CPU inference, MLX is specifically optimised for the way M1/M2/M3/M4 chips share memory between CPU and GPU cores. The result is noticeably faster token generation with lower power consumption.
Without MLX enabled, Ollama on macOS uses the Metal backend, which remains the default. MLX delivers around 2x the tokens per second for supported models on the same hardware.
## Requirements
- Apple Silicon Mac (M1, M2, M3, or M4 — any variant)
- 32GB or more unified memory — the MLX backend currently requires 32GB minimum
- Ollama 0.19 or later
- macOS Sequoia or later recommended
If you have a Mac with 8GB or 16GB of unified memory, the MLX backend is not yet available for your configuration. It will likely expand to lower memory configs in future releases.
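You can check how much unified memory your Mac has from the terminal. This is a minimal sketch using `sysctl hw.memsize`, which reports total memory in bytes; the 32GB threshold below reflects the minimum stated above:

```shell
# Check whether this Mac meets the 32 GB minimum for the MLX backend.
# hw.memsize reports total unified memory in bytes (macOS only).
mem_bytes=$(sysctl -n hw.memsize)
mem_gb=$((mem_bytes / 1024 / 1024 / 1024))

if [ "$mem_gb" -ge 32 ]; then
  echo "MLX backend eligible: ${mem_gb} GB unified memory"
else
  echo "MLX backend not yet available at ${mem_gb} GB; Metal remains the default"
fi
```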
## How to Enable the MLX Backend
First, update Ollama to the latest version:
```shell
# Check your current version
ollama --version

# Update via Homebrew
brew upgrade ollama

# Or download the latest from ollama.com
```
Once on 0.19+, enable MLX by setting an environment variable before starting Ollama:
```shell
# Enable MLX backend
export OLLAMA_USE_MLX=1

# Start Ollama
ollama serve
```
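Once `ollama serve` is running, you can confirm the server is responding before pulling any models. Ollama exposes a small HTTP API on port 11434 by default; `/api/version` returns the running version:

```shell
# Confirm the server is up on its default port.
curl -s http://localhost:11434/api/version
```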
To make this permanent, add the environment variable to your shell profile (~/.zshrc):
```shell
echo 'export OLLAMA_USE_MLX=1' >> ~/.zshrc
source ~/.zshrc
```
If you are running Ollama as a macOS app (from the menu bar), set the environment variable in a launchd plist or via the Ollama app settings if available in your version.
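One way to pass the variable to GUI apps without writing a plist by hand is `launchctl setenv`, which applies to processes launched afterwards in your login session. This is a sketch; whether your Ollama build reads the variable this way is worth verifying:

```shell
# Make OLLAMA_USE_MLX visible to apps launched from the Finder/menu bar.
# launchd-managed apps do not read ~/.zshrc, so export alone is not enough.
launchctl setenv OLLAMA_USE_MLX 1

# Quit and relaunch the Ollama menu-bar app so it picks up the variable.
```

Note that `launchctl setenv` does not persist across reboots; for a permanent setting you would still need a launchd plist.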
## Verifying MLX Is Active
```shell
# Pull a supported model
ollama pull qwen2.5:7b

# Run and check the logs
ollama run qwen2.5:7b "Hello"
```
In the Ollama logs (~/.ollama/logs/server.log), you should see references to MLX during model loading if it is active. You will also notice significantly faster first-token latency and generation speed.
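A quick way to check is to grep the log for MLX references. This assumes the log path above; adjust if your install logs elsewhere:

```shell
# Look for MLX mentions written to the server log during model loading.
if grep -qi "mlx" ~/.ollama/logs/server.log; then
  echo "MLX backend appears active"
else
  echo "No MLX references found; likely still on the Metal backend"
fi
```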
## Supported Models
At launch, MLX support in Ollama 0.19 covers:
- Qwen2.5 and Qwen3 family
- Llama 3.x family
- Gemma 3 and Gemma 4
- Mistral 7B and variants
Coverage is expanding with each release. Check the Ollama changelog for the current supported model list.
## Performance Benchmarks (Mac Studio M2 Ultra, 64GB)
| Model | Metal Backend | MLX Backend | Improvement |
|---|---|---|---|
| Qwen2.5 7B | 28 tok/s | 54 tok/s | ~2x |
| Llama 3.3 70B | 12 tok/s | 23 tok/s | ~2x |
| Gemma 3 27B | 18 tok/s | 35 tok/s | ~2x |
Results vary by model and Mac configuration, but the ~2x improvement is consistent across the tested models.
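To reproduce a rough comparison on your own machine, `ollama run --verbose` prints timing statistics after each response, including an eval rate in tokens per second. Restart `ollama serve` with and without `OLLAMA_USE_MLX=1` between runs, since the server, not the client, reads the variable:

```shell
# Print the generation speed for a short prompt; the "eval rate" line is the
# tokens-per-second figure used in the table above.
ollama run --verbose qwen2.5:7b "Write a haiku about unified memory." 2>&1 \
  | grep "eval rate"
```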
## What About Macs with 8GB or 16GB?
The current MLX backend in Ollama requires 32GB unified memory. Apple Silicon Macs with 8GB or 16GB can still run Ollama using the Metal backend, which remains the default. Metal performance on M-series chips is already excellent — the MLX improvement is on top of an already fast baseline.
Future Ollama releases are expected to extend MLX support to lower memory configurations.
