
Using Ollama with VS Code: Continue Extension Setup Guide


What Is Continue?

Continue is a free, open-source VS Code extension that turns any locally running model into a coding assistant. It integrates directly into the editor sidebar and inline with your code, giving you chat, autocomplete, and code editing — all powered by models running on your own machine via Ollama.

Unlike GitHub Copilot, this setup never sends your code to an external server: with Ollama as the backend, everything runs locally, making it well-suited for proprietary codebases, offline environments, and privacy-conscious teams.

Prerequisites

Before you start, you need:

  • VS Code (or a compatible fork like Cursor or VSCodium)
  • Ollama installed and running — see the Windows, Mac, or Linux install guides if needed
  • At least one model pulled — a code-focused model works best

Step 1: Pull a Good Coding Model

For coding tasks you want a model that understands code structure, can generate functions, and handles fill-in-the-middle (FIM) completions. Recommended options:

For chat and code generation

ollama pull qwen2.5-coder:7b

Qwen2.5-Coder 7B offers excellent code quality for its size and handles a wide range of languages. It’s a solid all-rounder that runs comfortably on 8 GB of VRAM.

For autocomplete

ollama pull deepseek-coder-v2:16b

Or if you’re constrained on RAM, a smaller model with FIM support:

ollama pull starcoder2:3b

For lower-spec hardware

ollama pull qwen2.5-coder:3b

The 3B models run on integrated graphics or CPU-only and still provide useful completions.

Step 2: Install Continue in VS Code

Open VS Code, then either:

  • Press Ctrl+P (Windows/Linux) or Cmd+P (Mac) and type ext install Continue.continue
  • Or search “Continue” in the Extensions panel (Ctrl+Shift+X)

After installing, a Continue icon appears in the Activity Bar on the left. Click it to open the sidebar panel.

Step 3: Configure Continue to Use Ollama

Continue uses a JSON configuration file. Open it by clicking the gear icon in the Continue sidebar, or go to:

  • Windows: %USERPROFILE%\.continue\config.json
  • Mac/Linux: ~/.continue/config.json

A minimal working configuration with Ollama:

{
  "models": [
    {
      "title": "Qwen2.5 Coder 7B",
      "provider": "ollama",
      "model": "qwen2.5-coder:7b"
    }
  ],
  "tabAutocompleteModel": {
    "title": "Starcoder2 3B",
    "provider": "ollama",
    "model": "starcoder2:3b"
  }
}

Save the file. Continue automatically picks up changes — no restart needed.

Step 4: Using Continue for Chat

Open the Continue sidebar and type a question. You can:

  • Ask general questions: “Explain what a context manager is in Python”
  • Reference selected code: Highlight code in the editor, then press Ctrl+L (Windows/Linux) or Cmd+L (Mac) to add it to the chat context automatically
  • Add entire files: Type @filename.py in the chat to include a file’s contents
  • Reference docs: Add URLs with @https://... and Continue will fetch and include the page

Step 5: Inline Editing

Select a block of code in the editor, then press Ctrl+I (Windows/Linux) or Cmd+I (Mac) to open the inline edit prompt. Type an instruction:

  • “Add type hints to this function”
  • “Refactor this to use async/await”
  • “Add error handling and logging”

Continue shows a diff you can accept or reject — similar to how GitHub Copilot’s edit mode works.

Step 6: Tab Autocomplete

If you’ve configured a tabAutocompleteModel, Continue provides inline ghost-text suggestions as you type. Press Tab to accept the suggestion.

Autocomplete works best with smaller, faster models (3B–7B) because the suggestions need to appear quickly. If completions feel slow, switch to a smaller model:

ollama pull deepseek-coder:1.3b
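Beyond swapping models, Continue's JSON config also exposes autocomplete tuning options. Field names can vary between versions, so treat this as a sketch and check your version's config reference if a key is rejected; limiting the prompt size and adding a typing debounce both cut perceived latency:

```json
{
  "tabAutocompleteModel": {
    "title": "DeepSeek Coder 1.3B",
    "provider": "ollama",
    "model": "deepseek-coder:1.3b"
  },
  "tabAutocompleteOptions": {
    "maxPromptTokens": 1024,
    "debounceDelay": 350,
    "multilineCompletions": "auto"
  }
}
```

A shorter prompt means less context for the model to read on every keystroke, which is usually the right trade-off for autocomplete.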

Advanced Configuration: Multiple Models

You can configure multiple models and switch between them. This is useful if you want a larger model for complex questions and a faster one for quick queries:

{
  "models": [
    {
      "title": "Qwen2.5 Coder 7B (fast)",
      "provider": "ollama",
      "model": "qwen2.5-coder:7b"
    },
    {
      "title": "Llama3.2 (general)",
      "provider": "ollama",
      "model": "llama3.2"
    },
    {
      "title": "DeepSeek Coder 16B (thorough)",
      "provider": "ollama",
      "model": "deepseek-coder-v2:16b"
    }
  ],
  "tabAutocompleteModel": {
    "title": "StarCoder2 3B",
    "provider": "ollama",
    "model": "starcoder2:3b"
  },
  "contextProviders": [
    { "name": "diff" },
    { "name": "open" },
    { "name": "terminal" }
  ]
}

Use the model picker in the sidebar to switch between models mid-conversation.
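Each model entry can also carry its own generation settings via a completionOptions block, so the "thorough" model can run cooler and longer than the fast one. A sketch, with values chosen as an illustration:

```json
{
  "models": [
    {
      "title": "DeepSeek Coder 16B (thorough)",
      "provider": "ollama",
      "model": "deepseek-coder-v2:16b",
      "completionOptions": {
        "temperature": 0.2,
        "maxTokens": 2048
      }
    }
  ]
}
```

A low temperature keeps code generation deterministic, which is generally what you want for refactoring and bug fixes.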

Context Providers: Give Continue More Awareness

Context providers let you include extra information in your prompts. Useful ones:

  • diff — includes your current git diff
  • open — includes all open files
  • terminal — includes recent terminal output
  • codebase — semantic search across your repo (requires embedding model)

In the chat, type @ to see available context providers and include them in your message.
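The codebase provider is the one that needs extra setup: Continue has to build an embeddings index of your repo. Assuming you have pulled an embeddings model such as nomic-embed-text (ollama pull nomic-embed-text), the additional config is a sketch like this:

```json
{
  "embeddingsProvider": {
    "provider": "ollama",
    "model": "nomic-embed-text"
  },
  "contextProviders": [
    { "name": "codebase" }
  ]
}
```

After saving, the first @codebase query takes longer while the index is built; subsequent queries reuse it.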

Connecting Continue to a Remote Ollama Instance

If Ollama runs on a server or another machine on your network, point Continue at it with the apiBase setting:

{
  "models": [
    {
      "title": "Remote Llama",
      "provider": "ollama",
      "model": "llama3.2",
      "apiBase": "http://192.168.1.50:11434"
    }
  ]
}

Make sure Ollama on the remote machine is listening on all network interfaces rather than only on localhost:

OLLAMA_HOST=0.0.0.0 ollama serve
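On Linux, where Ollama typically runs as a systemd service, setting the variable inline like this won't survive a reboot. The Ollama docs describe adding a service override instead; a sketch:

```ini
# /etc/systemd/system/ollama.service.d/override.conf
# (create with: systemctl edit ollama)
[Service]
Environment="OLLAMA_HOST=0.0.0.0"
```

Then run systemctl daemon-reload and systemctl restart ollama. Keep in mind that binding to 0.0.0.0 exposes the API to your whole network, so keep the machine behind a firewall or restrict access to trusted hosts.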

Model Recommendations by Hardware

  System RAM / VRAM   Chat model              Autocomplete model
  4–6 GB              qwen2.5-coder:3b        starcoder2:3b
  8 GB                qwen2.5-coder:7b        starcoder2:3b
  16 GB               qwen2.5-coder:14b       deepseek-coder-v2:16b
  24 GB+              deepseek-coder-v2:16b   deepseek-coder-v2:16b

Troubleshooting

Continue can’t connect to Ollama

Check that Ollama is running (ollama serve or check the system tray). Verify the API is reachable:

curl http://localhost:11434/api/tags

On Windows, Ollama normally runs as a background tray app that starts with the system; if it isn’t running yet, open it from the Start menu or run ollama serve in a terminal.

Autocomplete suggestions are slow

Switch to a smaller autocomplete model. The chat model and autocomplete model are independent — you can use a large model for chat and a fast small model for completions.

“Model not found” error

Make sure the model name in config.json exactly matches what ollama list shows. Names are case-sensitive and must include the tag if it’s not :latest.
