What Is Continue?
Continue is a free, open-source VS Code extension that turns any locally-running model into a coding assistant. It integrates directly into the editor sidebar and inline with your code, giving you chat, autocomplete, and code editing — all powered by models running on your own machine via Ollama.
Unlike GitHub Copilot, Continue never sends your code to an external server. Everything runs locally, making it well-suited for proprietary codebases, offline environments, and privacy-conscious teams.
Prerequisites
Before you start, you need:
- VS Code (or a compatible fork like Cursor or VSCodium)
- Ollama installed and running — see the Windows, Mac, or Linux install guides if needed
- At least one model pulled — a code-focused model works best
Step 1: Pull a Good Coding Model
For coding tasks you want a model that understands code structure, can generate functions, and handles fill-in-the-middle (FIM) completions. Recommended options:
For chat and code generation
ollama pull qwen2.5-coder:7b
Qwen2.5-Coder 7B offers excellent code quality for its size and handles a wide range of languages. It’s a solid all-rounder that runs comfortably on 8 GB of VRAM.
For autocomplete
ollama pull deepseek-coder-v2:16b
Or if you’re constrained on RAM, a smaller model with FIM support:
ollama pull starcoder2:3b
For lower-spec hardware
ollama pull qwen2.5-coder:3b
The 3B models run on integrated graphics or CPU-only and still provide useful completions.
Step 2: Install Continue in VS Code
Open VS Code, then either:
- Press Ctrl+P (Windows/Linux) or Cmd+P (Mac) and type ext install Continue.continue
- Or search “Continue” in the Extensions panel (Ctrl+Shift+X)
After installing, a Continue icon appears in the Activity Bar on the left. Click it to open the sidebar panel.
Step 3: Configure Continue to Use Ollama
Continue uses a JSON configuration file. Open it by clicking the gear icon in the Continue sidebar, or go to:
- Windows: %USERPROFILE%\.continue\config.json
- Mac/Linux: ~/.continue/config.json
A minimal working configuration with Ollama:
{
  "models": [
    {
      "title": "Qwen2.5 Coder 7B",
      "provider": "ollama",
      "model": "qwen2.5-coder:7b"
    }
  ],
  "tabAutocompleteModel": {
    "title": "Starcoder2 3B",
    "provider": "ollama",
    "model": "starcoder2:3b"
  }
}
Save the file. Continue automatically picks up changes — no restart needed.
Step 4: Using Continue for Chat
Open the Continue sidebar and type a question. You can:
- Ask general questions: “Explain what a context manager is in Python”
- Reference selected code: Highlight code in the editor, then press Ctrl+L (Windows/Linux) or Cmd+L (Mac) to add it to the chat context automatically
- Add entire files: Type @filename.py in the chat to include a file’s contents
- Reference docs: Add URLs with @https://... and Continue will fetch and include the page
Step 5: Inline Editing
Select a block of code in the editor, then press Ctrl+I (Windows/Linux) or Cmd+I (Mac) to open the inline edit prompt. Type an instruction:
- “Add type hints to this function”
- “Refactor this to use async/await”
- “Add error handling and logging”
Continue shows a diff you can accept or reject — similar to how GitHub Copilot’s edit mode works.
Step 6: Tab Autocomplete
If you’ve configured a tabAutocompleteModel, Continue provides inline ghost-text suggestions as you type. Press Tab to accept the suggestion.
Autocomplete works best with smaller, faster models (3B–7B) because the suggestions need to appear quickly. If completions feel slow, switch to a smaller model:
ollama pull deepseek-coder:1.3b
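Why do autocomplete models need FIM support? They complete code *between* your cursor and the text that follows it, not just at the end of the file. A minimal sketch of how a FIM prompt is assembled for StarCoder-style models (Continue and the model handle this for you; the sentinel token names follow the StarCoder convention and may differ for other model families):

```python
def build_fim_prompt(prefix: str, suffix: str) -> str:
    # StarCoder-style FIM sentinel tokens: the model generates the
    # "middle" that fits between the prefix and the suffix.
    return f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"

# Code before the cursor, and code after it:
before = "def add(a, b):\n    return "
after = "\n\nprint(add(2, 3))\n"
prompt = build_fim_prompt(before, after)
```

This is why a model without FIM training gives poor inline completions even if it chats well: it was never taught to fill a gap conditioned on both sides.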
Advanced Configuration: Multiple Models
You can configure multiple models and switch between them. This is useful if you want a larger model for complex questions and a faster one for quick queries:
{
  "models": [
    {
      "title": "Qwen2.5 Coder 7B (fast)",
      "provider": "ollama",
      "model": "qwen2.5-coder:7b"
    },
    {
      "title": "Llama3.2 (general)",
      "provider": "ollama",
      "model": "llama3.2"
    },
    {
      "title": "DeepSeek Coder 16B (thorough)",
      "provider": "ollama",
      "model": "deepseek-coder-v2:16b"
    }
  ],
  "tabAutocompleteModel": {
    "title": "StarCoder2 3B",
    "provider": "ollama",
    "model": "starcoder2:3b"
  },
  "contextProviders": [
    { "name": "diff" },
    { "name": "open" },
    { "name": "terminal" }
  ]
}
Use the model picker in the sidebar to switch between models mid-conversation.
Context Providers: Give Continue More Awareness
Context providers let you include extra information in your prompts. Useful ones:
- diff — includes your current git diff
- open — includes all open files
- terminal — includes recent terminal output
- codebase — semantic search across your repo (requires an embedding model)
In the chat, type @ to see available context providers and include them in your message.
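The codebase provider needs an embedding model to index your repo. A sketch of the relevant additions to config.json, assuming you have pulled an embedding model such as nomic-embed-text with ollama pull nomic-embed-text (check the Continue docs for the exact field names in your version):

```json
{
  "contextProviders": [
    { "name": "codebase" }
  ],
  "embeddingsProvider": {
    "provider": "ollama",
    "model": "nomic-embed-text"
  }
}
```

The first indexing pass over a large repo can take a few minutes; after that, @codebase queries are fast.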
Connecting Continue to a Remote Ollama Instance
If Ollama runs on a server or another machine on your network, point Continue at it with the apiBase setting:
{
"models": [
{
"title": "Remote Llama",
"provider": "ollama",
"model": "llama3.2",
"apiBase": "http://192.168.1.50:11434"
}
]
}
Make sure Ollama on the remote machine is bound to its network interface:
OLLAMA_HOST=0.0.0.0 ollama serve
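Before editing config.json, it's worth confirming the remote instance is actually reachable from your workstation. A minimal check (the helper name and the example address are illustrative, not part of Continue):

```python
from urllib.request import urlopen

def ollama_reachable(api_base: str, timeout: float = 3.0) -> bool:
    """Return True if an Ollama server answers /api/tags at api_base."""
    try:
        with urlopen(f"{api_base}/api/tags", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Connection refused, timeout, DNS failure, etc.
        return False

# Example: ollama_reachable("http://192.168.1.50:11434")
```

If this returns False, check the server's firewall and confirm OLLAMA_HOST was set before Ollama started.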
Model Recommendations by Hardware
| System RAM / VRAM | Chat model | Autocomplete model |
|---|---|---|
| 4–6 GB | qwen2.5-coder:3b | starcoder2:3b |
| 8 GB | qwen2.5-coder:7b | starcoder2:3b |
| 16 GB | qwen2.5-coder:14b | deepseek-coder-v2:16b |
| 24 GB+ | deepseek-coder-v2:16b | deepseek-coder-v2:16b |
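The rows above follow a rough rule of thumb: a 4-bit quantized model (Ollama's default) needs about half a gigabyte of memory per billion parameters, plus roughly a gigabyte for the KV cache and runtime overhead. A sketch of that heuristic (the constants are approximations, not Ollama specifications):

```python
def approx_vram_gb(params_billion: float) -> float:
    """Rough memory estimate for a Q4-quantized model:
    ~0.5 bytes per parameter, plus ~1 GB of runtime overhead."""
    return params_billion * 0.5 + 1.0

# A 7B model lands around 4.5 GB, fitting comfortably in 8 GB of VRAM.
```

Longer context windows grow the KV cache, so treat these numbers as a floor, not a ceiling.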
Troubleshooting
Continue can’t connect to Ollama
Check that Ollama is running (ollama serve or check the system tray). Verify the API is reachable:
curl http://localhost:11434/api/tags
On Windows, Ollama normally starts automatically with the system, but if it isn’t running yet, open it from the Start menu or run ollama serve in a terminal.
Autocomplete suggestions are slow
Switch to a smaller autocomplete model. The chat model and autocomplete model are independent — you can use a large model for chat and a fast small model for completions.
“Model not found” error
Make sure the model name in config.json exactly matches what ollama list shows. Names are case-sensitive and must include the tag if it’s not :latest.
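If you want to cross-check programmatically, the /api/tags endpoint returns JSON with a models array whose entries carry a name field that includes the tag. A small sketch that extracts those names from a saved response (the sample payload is trimmed to the relevant field):

```python
import json

def installed_models(tags_json: str) -> list[str]:
    """Extract model names (with tags) from an Ollama /api/tags response."""
    return [m["name"] for m in json.loads(tags_json)["models"]]

# Example payload in the shape Ollama returns, trimmed for clarity:
sample = '{"models": [{"name": "qwen2.5-coder:7b"}, {"name": "starcoder2:3b"}]}'
names = installed_models(sample)
```

Every model string in config.json should appear verbatim in that list.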


