Running AI code assistance locally with Ollama and VS Code gives you GitHub Copilot-style autocomplete and chat — without sending your code to any external server. This guide covers two main approaches: the Continue extension (recommended) and Cline.
Why Run AI Code Assistance Locally?
Cloud-based tools like GitHub Copilot work well, but they send your code to external servers. For private codebases, client work, or environments with strict data policies, a fully local setup is preferable. Ollama handles the model; VS Code extensions handle the integration.
Prerequisites
Install Ollama and pull a good coding model:
ollama pull llama3.1
# or for dedicated code models:
ollama pull deepseek-coder-v2
ollama pull qwen2.5-coder:7b
See the best Ollama models for coding for a full comparison. Make sure Ollama is running (ollama serve).
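Before configuring any extension, it helps to confirm the server is actually reachable. A quick sketch using Ollama's `/api/tags` endpoint (which lists pulled models and only answers when the server is up on its default port, 11434):

```shell
# Quick health check: /api/tags responds only when the Ollama server is up.
status=$(curl -s http://localhost:11434/api/tags > /dev/null && echo "running" || echo "not running")
echo "Ollama is $status"
```

If this reports "not running", start the server with `ollama serve` before continuing.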
Option 1: Continue (Recommended)
Continue is the most mature open-source AI coding assistant for VS Code. It supports chat, inline autocomplete, and slash commands — all pointing at your local Ollama instance.
Installation
- Open VS Code
- Go to Extensions (Ctrl+Shift+X / Cmd+Shift+X)
- Search for Continue and install it
Configuration
After installation, Continue will open a config file at ~/.continue/config.json. Update it to use Ollama:
{
  "models": [
    {
      "title": "Llama 3.1",
      "provider": "ollama",
      "model": "llama3.1",
      "apiBase": "http://localhost:11434"
    },
    {
      "title": "DeepSeek Coder V2",
      "provider": "ollama",
      "model": "deepseek-coder-v2",
      "apiBase": "http://localhost:11434"
    }
  ],
  "tabAutocompleteModel": {
    "title": "Qwen2.5 Coder",
    "provider": "ollama",
    "model": "qwen2.5-coder:7b",
    "apiBase": "http://localhost:11434"
  }
}
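After editing, it's worth sanity-checking the file before reloading VS Code, since a stray trailing comma will make Continue silently fall back to defaults. One way to validate it, assuming python3 is on your PATH, is Python's built-in json.tool:

```shell
# Exits nonzero if the file is missing or is not valid JSON
result=$(python3 -m json.tool ~/.continue/config.json > /dev/null 2>&1 && echo "valid" || echo "invalid or missing")
echo "config.json is $result"
```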
Using Continue
- Chat panel: Click the Continue icon in the sidebar, or press Ctrl+Shift+L (Cmd+Shift+L on Mac)
- Inline chat: Select code, press Ctrl+Shift+J to ask a question about it
- Autocomplete: Starts suggesting as you type once a tab autocomplete model is configured
- Slash commands: Type /edit, /comment, or /test in the chat panel
Useful Slash Commands
/edit refactor this function to use async/await
/comment add docstrings to all functions
/test write unit tests for this class
/share export this conversation
Option 2: Cline
Cline (formerly Claude Dev) is an agentic coding assistant that can read and edit files, run terminal commands, and complete multi-step tasks. It’s more autonomous than Continue.
Installation and Setup
- Install the Cline extension from the VS Code marketplace
- Open Cline settings and set the provider to OpenAI Compatible
- Set the base URL to http://localhost:11434/v1
- Set the API key to ollama (any string works)
- Set the model name to your pulled model (e.g. llama3.1)
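You can smoke-test the same OpenAI-compatible endpoint Cline will use with a direct request. A sketch, assuming llama3.1 is already pulled (the Bearer token can be any string — Ollama ignores it, but some clients insist on sending one):

```shell
# Hit Ollama's OpenAI-compatible chat endpoint, the one Cline talks to
resp=$(curl -s http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ollama" \
  -d '{"model": "llama3.1", "messages": [{"role": "user", "content": "Say hello"}]}')
echo "${resp:-no response - is Ollama running?}"
```

If this returns a JSON completion, Cline's settings above will work with the same values.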
Which Model to Use for What
| Task | Recommended Model | Command |
|---|---|---|
| Chat and explanation | Llama 3.1 8B | ollama pull llama3.1 |
| Code generation | DeepSeek Coder V2 16B | ollama pull deepseek-coder-v2 |
| Tab autocomplete | Qwen2.5 Coder 7B | ollama pull qwen2.5-coder:7b |
| Lightweight / fast | Phi-3 Mini | ollama pull phi3:mini |
Performance Tips
- Use a dedicated autocomplete model — smaller models (3B-7B) respond faster for tab completion; save the larger models for chat
- GPU acceleration — if you have an NVIDIA or AMD GPU, Ollama will use it automatically for much faster responses
- Keep Ollama running in the background — it starts as a daemon by default, so VS Code extensions can connect any time
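One tuning knob worth knowing here is the OLLAMA_KEEP_ALIVE environment variable, which controls how long a model stays loaded in memory after its last request (the default is a few minutes). Raising it avoids cold-start latency when you pause between completions — a sketch, with 1h as an illustrative value:

```shell
# Keep models resident longer after their last request to avoid reload delays
export OLLAMA_KEEP_ALIVE=1h
echo "models will stay loaded for $OLLAMA_KEEP_ALIVE after last use"
# restart the server so the setting takes effect: ollama serve
```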
Troubleshooting
Continue can’t connect to Ollama: Check that Ollama is running with ollama list. If not, run ollama serve.
Slow autocomplete: Switch to a smaller model like qwen2.5-coder:1.5b for faster suggestions.
Model not found error: Make sure you’ve pulled the model — ollama pull model-name — before referencing it in your config.
Next Steps
Once you have Ollama working in VS Code, consider calling Ollama from Python to build your own tools, or set up Ollama in Docker for a portable development environment.
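As a starting point for building your own tools, here is a minimal script-level call against Ollama's native /api/generate endpoint — the same API a Python client would wrap. Setting "stream": false returns the whole completion as a single JSON object:

```shell
# One-shot generation request against Ollama's native API (assumes llama3.1 is pulled)
body='{"model": "llama3.1", "prompt": "Explain recursion in one sentence.", "stream": false}'
resp=$(curl -s http://localhost:11434/api/generate -d "$body")
echo "${resp:-no response - is Ollama running?}"
```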