Home / AI / Ollama / How to Use Ollama with Cursor IDE: Local AI for Free

How to Use Ollama with Cursor IDE: Local AI for Free

How to Use Ollama with Cursor IDE: Local AI for Free

Cursor is the fastest-growing AI code editor in 2026, but at £13–16 a month for Pro you are paying for premium model access you may not need if you already run Ollama locally. Cursor supports any OpenAI-compatible endpoint, which means you can point it at your local Ollama instance and use models like DeepSeek-R1 or Qwen2.5-Coder at zero ongoing cost. There is one important caveat no other guide covers clearly: Cursor’s Tab autocomplete cannot use local models — it is locked to Cursor’s proprietary Fusion model. Chat and inline edits work; Tab does not. This guide covers the complete setup, which features work, the best models, and an honest cost comparison so you can decide whether local Ollama is the right trade-off for you.

Why Use Ollama with Cursor?

The main reasons are cost, privacy, and no usage caps on chat.

Cost: Cursor Pro costs $20/month (~£16) or $16/month (~£13) on annual billing. With Ollama running locally, Chat and inline edits are completely free — no subscription, no per-token billing, no monthly limits on how much you ask. The only cost is the electricity to run your GPU.

Privacy: Cursor’s Privacy Mode prevents code being stored on Cursor’s servers, but prompts and code snippets still transit Cursor’s infrastructure. For teams with strict data handling requirements — GDPR, air-gapped environments, proprietary codebases — that is not sufficient. Ollama keeps everything on your machine.

No usage caps: Free Cursor Hobby gives you 50 premium model requests per month. With a local model via Ollama, there are no caps at all on chat requests.

If you also want local tab autocomplete, the better route is Ollama with VS Code using the Continue extension — Continue supports local tab autocomplete where Cursor does not.

What You Need Before You Start

  • Ollama installed and runningollama serve must be active. Check with curl http://localhost:11434/api/tags.
  • A coding model pulled — see model recommendations below; ollama pull qwen2.5-coder:7b is a good starting point on most hardware
  • ngrok installed — required because Cursor’s sandbox environment cannot reach localhost directly (see next section for why and how to skip it)
  • Cursor installed — any version from 2025 onwards supports custom OpenAI-compatible endpoints
  • Sufficient VRAM or RAM — a 7B model needs at least 8 GB RAM; a 14B model needs 12 GB. CPU-only inference works but responses are slow (10–30 seconds).

This guide assumes Ollama is running on the same machine as Cursor. If you are running Ollama on a remote server, skip the ngrok section and use your server’s public URL directly.

Why ngrok Is Required (The Technical Reason)

This is the step most guides either skip or under-explain. Cursor’s AI backend runs in a sandboxed environment that cannot make direct requests to localhost:11434. Browser-based CORS policies also block direct localhost access. A publicly reachable HTTPS URL is required — ngrok creates one by tunnelling your local Ollama port to a temporary public address.

Install ngrok from ngrok.com, sign up for a free account, and authenticate:

ngrok config add-authtoken your_token_here

Before starting the tunnel, tell Ollama to accept requests from any origin:

# Mac / Linux
export OLLAMA_ORIGINS="*"

# Windows (Command Prompt)
set OLLAMA_ORIGINS=*

To make this permanent, add it to your systemd service override or shell profile. Then start the tunnel:

ngrok http 11434 --host-header="localhost:11434"

ngrok will display a forwarding URL like https://abc123.ngrok-free.app. Copy this — you will need it in Cursor’s settings. Note: free ngrok URLs change every time you restart the tunnel. A paid ngrok account ($8/month) gives you a static domain that never changes.

Alternatively, if you run Ollama on a server with a real domain and SSL, you can use that URL directly and skip ngrok entirely.

Skipping ngrok: Run Ollama on a Server

If you run Ollama on a VPS or home server with a public domain, you can skip ngrok entirely and use your server URL directly in Cursor’s base URL field. This gives you a permanent URL that never changes, faster response times for large models (dedicated GPU), and access from any machine without repeating the ngrok setup.

The base URL format for a server-hosted setup:

https://ollama.yourdomain.com/v1

Make sure Ollama is behind a reverse proxy with HTTPS — Cursor requires HTTPS for non-localhost URLs. See the Ollama security guide for setting up nginx with TLS in front of Ollama, including the proxy_buffering off setting that is required for streaming to work correctly.

Ollama Cloud is another option that avoids ngrok — cloud models run on Ollama’s own servers and are accessed via a direct API key. If you mainly want large model access in Cursor without managing local hardware at all, see Ollama Cloud for how the :cloud suffix routing works.

Configuring Cursor to Use Ollama

With Ollama running and your ngrok tunnel active, configure Cursor:

  1. Open Cursor Settings (Cmd+, on Mac, Ctrl+, on Windows/Linux)
  2. Navigate to the Models tab
  3. Click Add Model and type the exact model name as it appears in Ollama — for example qwen2.5-coder:7b or deepseek-r1:14b. The name must match exactly what ollama list shows.
  4. Scroll down to the OpenAI API Key section
  5. Toggle on Override OpenAI Base URL
  6. Enter your base URL: https://abc123.ngrok-free.app/v1 (include the /v1 suffix)
  7. In the API Key field, enter ollama — Ollama does not validate keys but Cursor requires a non-empty value
  8. Deselect all other models — leave only your local model selected. This prevents the “does not work with your current plan” error.
  9. Click Verify — a green confirmation means Cursor can reach your Ollama instance

Open Chat with Cmd+L / Ctrl+L and select your model from the dropdown. You are now running fully local AI inside Cursor.

Which Cursor Features Work With a Local Model

This is the most important section if you are evaluating whether local Ollama meets your needs in Cursor:

Feature Works with Ollama? Notes
Chat (Cmd+L) ✅ Yes Works well; primary use case
Inline edit (Cmd+K) ✅ Yes Single-file edits work reliably
Composer / Agent (Cmd+I) ⚠️ Partial Multi-file editing can work; not universal — depends on model tool-calling support
Tab autocomplete ❌ No Cursor Tab uses a proprietary hardcoded “Fusion” model — this cannot be replaced
Codebase indexing ❌ No Requires Cursor’s cloud infrastructure
Background agents ❌ No Cloud-only feature

The Tab autocomplete limitation is confirmed by Cursor’s own team on the community forum. Cursor Tab is a custom fine-tuned model — it is not a generic LLM call and cannot be swapped. If Tab autocomplete is essential to your workflow, you need a Cursor subscription or should consider VS Code with Continue instead.

Which Ollama Models Work Best With Cursor

For Cursor Chat and inline edits, you want a strong coding model that fits your VRAM budget. The context window matters here too — set num_ctx to at least 16,384 for multi-file tasks.

Model Disk size Min VRAM Best for
qwen2.5-coder:7b ~4.7 GB 8 GB Low-spec machines; fast responses
deepseek-r1:14b ~9 GB 12 GB Complex debugging; thinks through problems
qwen3:14b ~9 GB 12 GB Good all-rounder for coding and chat
devstral:24b ~14 GB 16 GB Multi-file and agentic coding; 256K context
qwen2.5-coder:32b ~20 GB 24 GB GPT-4o-level coding quality; 92.7% HumanEval

For most setups, qwen2.5-coder:7b is the right starting point — it is fast enough that Chat responses arrive quickly, and code quality is solid for everyday tasks. Move up to deepseek-r1:14b or qwen2.5-coder:32b if you have the VRAM and need better reasoning on complex problems.

Enabling Thinking Mode for Complex Coding Problems

If you use a Qwen3 or DeepSeek-R1 model, you can take advantage of chain-of-thought reasoning for particularly tricky debugging sessions. With Ollama thinking mode enabled, the model works through the problem step by step before delivering an answer — useful for multi-step bugs or architecture questions where the reasoning process is as valuable as the answer.

For Cursor Chat, you can control thinking mode per request via the API options parameter, or by creating a Modelfile variant with thinking enabled by default:

FROM qwen3:14b
PARAMETER num_ctx 32768
ollama create qwen3-cursor -f Modelfile

Add qwen3-cursor as a model in Cursor using the same setup steps above. Switch to it for complex problems and back to a faster model for quick edits. Note that thinking mode adds 5–10× latency — use it selectively rather than as a default.

Common Errors and Fixes

“Does not work with your current plan”

Cursor is validating the model name against its own plan whitelist and rejecting it. Fix: go back to Settings → Models and deselect every model except your local one. With no other model selected, Cursor cannot fall through to a plan-based check.

Verify fails with connection error

Check in order: (1) ollama serve is running — try curl http://localhost:11434/api/tags locally; (2) ngrok tunnel is still active and the URL in Cursor matches the current forwarding address; (3) OLLAMA_ORIGINS="*" is set in the same process where Ollama is running.

Model name not found

The model name in Cursor must match exactly what Ollama knows. Run ollama list and copy the name verbatim — including the tag, for example qwen2.5-coder:7b not qwen2.5-coder.

ngrok URL keeps changing

Free ngrok generates a new URL every restart. Either pay for a static ngrok domain ($8/month), run Ollama on a server with a real domain, or create a short shell alias that starts the tunnel and prints the new URL for you to paste into Cursor settings:

alias ollama-cursor='ngrok http 11434 --host-header="localhost:11434" & sleep 2 && curl -s http://localhost:4040/api/tunnels | python3 -m json.tool | grep public_url'

This starts the tunnel in the background and immediately prints the forwarding URL.

Responses are slow or time out in Cursor

Cursor has a response timeout that can trigger on slow CPU inference or very large models. Try a smaller model first (qwen2.5-coder:7b instead of a 32B variant) to confirm the connection works, then scale up. If you are on CPU-only hardware, responses on a 14B model can take 30–60 seconds — within that range but tight against some timeouts.

Cursor Free vs Pro vs Ollama: Cost Comparison

Cursor Hobby (Free) Cursor Pro Ollama Local
Monthly cost £0 ~£13–16/month £0 (hardware already owned)
Tab autocomplete Limited Unlimited ❌ Not available
Chat requests ~500 free-tier Unlimited (credit pool) Unlimited
Model quality GPT-4o, Claude 3.5 GPT-4o, Claude 3.5, o1 Depends on your hardware
Privacy Prompts transit Cursor Prompts transit Cursor Fully local, zero egress
Offline use No No Yes

The honest verdict: if Tab autocomplete is important to your workflow, a Cursor subscription is hard to replace with local Ollama alone. If you primarily use Chat and Cmd+K, local Ollama covers those use cases completely free. Many developers run both — Cursor Pro for Tab autocomplete, Ollama for long chat sessions and document-heavy queries where they do not want to burn through their credit pool. If you later decide you want full local tab autocomplete as well, the VS Code + Continue combination is the most complete local-first coding environment available without any ongoing subscription cost.

Related articles: What is Hermes Agent and How Does It Work with Ollama?, What is Kimi K2.6 and Is It Worth Using on Ollama?, Ollama + OpenCode: Free Local AI Coding Agent Setup