
Ollama MLX: How to Enable Faster Inference on Apple Silicon

Ollama 0.19, released in March 2026, introduced an MLX backend for Apple Silicon Macs. MLX is Apple’s machine learning framework optimised specifically for the M-series chip architecture. Enabling it gives approximately 2x faster inference compared to the previous Metal backend — a significant improvement for anyone running local AI on a Mac.

What Is MLX and Why Does It Matter?

MLX is Apple’s open-source machine learning framework, designed from the ground up for Apple Silicon’s unified memory architecture. Unlike Metal (GPU compute) or CPU inference, MLX is specifically optimised for the way M1/M2/M3/M4 chips share memory between CPU and GPU cores. The result is noticeably faster token generation with lower power consumption.

Before Ollama 0.19, Macs used the Metal backend. MLX delivers around 2x the tokens per second for supported models on the same hardware.

Requirements

  • Apple Silicon Mac (M1, M2, M3, or M4 — any variant)
  • 32GB or more unified memory — the MLX backend currently requires 32GB minimum
  • Ollama 0.19 or later
  • macOS Sequoia or later recommended

If your Mac has 8GB or 16GB of unified memory, the MLX backend is not yet available for your configuration. Support will likely expand to lower-memory configurations in future releases.
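To check whether a given Mac clears the 32GB bar before flipping anything on, you can query total unified memory with sysctl. This is a small sketch; `hw.memsize` is the macOS key for total physical memory in bytes, and the fallback to 0 just keeps the script from erroring on non-macOS systems:

```shell
# Query total unified memory (macOS: hw.memsize is total physical RAM in bytes)
mem_bytes=$(sysctl -n hw.memsize 2>/dev/null || echo 0)
mem_gb=$(( mem_bytes / 1073741824 ))  # bytes -> GiB

echo "Unified memory: ${mem_gb} GB"
if [ "$mem_gb" -ge 32 ]; then
  echo "Meets the MLX backend minimum (32 GB)"
else
  echo "Below the MLX minimum; Ollama will stay on the Metal backend"
fi
```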

How to Enable the MLX Backend

First, update Ollama to the latest version:

# Check your current version
ollama --version

# Update via Homebrew
brew upgrade ollama

# Or download the latest from ollama.com

Once on 0.19+, enable MLX by setting an environment variable before starting Ollama:

# Enable MLX backend
export OLLAMA_USE_MLX=1

# Start Ollama
ollama serve

To make this permanent, add the environment variable to your shell profile (~/.zshrc for zsh, the default shell on modern macOS):

echo 'export OLLAMA_USE_MLX=1' >> ~/.zshrc
source ~/.zshrc

If you are running Ollama as a macOS app (from the menu bar), set the environment variable in a launchd plist or via the Ollama app settings if available in your version.
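For the launchd route, a minimal LaunchAgent sketch follows. The label, the file name (~/Library/LaunchAgents/com.example.ollama.plist), and the Homebrew binary path /opt/homebrew/bin/ollama are illustrative assumptions; adjust them to your install (run `which ollama` to confirm the path):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <!-- Example label; any reverse-DNS name works -->
  <key>Label</key>
  <string>com.example.ollama</string>
  <!-- Assumed Homebrew path on Apple Silicon -->
  <key>ProgramArguments</key>
  <array>
    <string>/opt/homebrew/bin/ollama</string>
    <string>serve</string>
  </array>
  <!-- The variable this article enables -->
  <key>EnvironmentVariables</key>
  <dict>
    <key>OLLAMA_USE_MLX</key>
    <string>1</string>
  </dict>
  <key>RunAtLoad</key>
  <true/>
</dict>
</plist>
```

Load it with `launchctl load ~/Library/LaunchAgents/com.example.ollama.plist`. Note that this runs the server directly rather than the menu bar app; alternatively, `launchctl setenv OLLAMA_USE_MLX 1` sets the variable for GUI apps launched afterwards, though it does not persist across reboots.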

Verifying MLX Is Active

# Pull a supported model
ollama pull qwen2.5:7b

# Run and check the logs
ollama run qwen2.5:7b "Hello"

In the Ollama logs (~/.ollama/logs/server.log), you should see references to MLX during model loading if it is active. You will also notice significantly faster first-token latency and generation speed.
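Rather than reading the whole log, you can grep it for backend mentions. A small helper, assuming the log path quoted above; the function name is just an example:

```shell
# check_mlx_log: print recent MLX-related lines from an Ollama server log.
check_mlx_log() {
  grep -i "mlx" "$1" | tail -n 5
}

# Usage with the default log location mentioned in this article:
# check_mlx_log ~/.ollama/logs/server.log
```

If the command prints nothing, the MLX backend was not used for the last model load.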

Supported Models

At launch, MLX support in Ollama 0.19 covers:

  • Qwen2.5 and Qwen3 family
  • Llama 3.x family
  • Gemma 3 and Gemma 4
  • Mistral 7B and variants

Coverage is expanding with each release. Check the Ollama changelog for the current supported model list.

Performance Benchmarks (Mac Studio M2 Ultra, 64GB)

Model           Metal Backend   MLX Backend   Improvement
Qwen2.5 7B      28 tok/s        54 tok/s      ~2x
Llama 3.3 70B   12 tok/s        23 tok/s      ~2x
Gemma 3 27B     18 tok/s        35 tok/s      ~2x

Results vary by model and Mac configuration, but the roughly 2x improvement held across all tested models.
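As a sanity check on the "~2x" column, the speedups can be recomputed from the raw tokens-per-second figures in the table:

```shell
# Recompute the Improvement column: MLX tok/s divided by Metal tok/s
awk 'BEGIN {
  printf "Qwen2.5 7B:    %.2fx\n", 54/28
  printf "Llama 3.3 70B: %.2fx\n", 23/12
  printf "Gemma 3 27B:   %.2fx\n", 35/18
}'
```

All three land between 1.9x and 2x, matching the rounded column.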

What About Macs with 8GB or 16GB?

The current MLX backend in Ollama requires 32GB unified memory. Apple Silicon Macs with 8GB or 16GB can still run Ollama using the Metal backend, which remains the default. Metal performance on M-series chips is already excellent — the MLX improvement is on top of an already fast baseline.

Future Ollama releases are expected to extend MLX support to lower memory configurations.
