How to Use Ollama with VS Code: Continue and Cline Extensions

Ollama brings powerful AI models to your local machine, and VS Code is where most developers spend their working day. Connecting the two gives you free, private AI coding assistance that runs entirely on your hardware — no API keys, no usage costs, and no data leaving your computer.

There are two excellent VS Code extensions for working with Ollama: Continue and Cline. They serve different purposes and are best used together. This guide shows you how to install and configure both.

Prerequisites

Before starting, you need:

  • Ollama installed and running — see our guide on how to install Ollama on Windows 11 if you haven’t done this yet.
  • VS Code installed — download from code.visualstudio.com if needed.
  • A coding-focused model pulled in Ollama. We recommend qwen2.5-coder:7b as an excellent balance of speed and code quality for most hardware. Pull it with: ollama pull qwen2.5-coder:7b

If you want to explore which models perform best for different tasks, our guide to the best Ollama models covers the landscape in detail.

Continue Extension: Setup and Configuration

Installing Continue

Open VS Code, go to the Extensions panel (Ctrl+Shift+X or Cmd+Shift+X), search for Continue, and install the extension by Continue.dev. It’s free and open-source.

Once installed, you’ll see a Continue icon in the left sidebar. Click it to open the Continue panel.

Configuring Continue to Use Ollama

Continue uses a JSON configuration file. To open it, click the settings gear icon at the bottom of the Continue panel, or press Ctrl+Shift+P (Cmd+Shift+P on Mac) and search for Continue: Open config.json.

Replace the contents with the following configuration:

{
  "models": [
    {
      "title": "Qwen 2.5 Coder 7B",
      "provider": "ollama",
      "model": "qwen2.5-coder:7b",
      "apiBase": "http://localhost:11434"
    }
  ],
  "tabAutocompleteModel": {
    "title": "Qwen Autocomplete",
    "provider": "ollama",
    "model": "qwen2.5-coder:7b",
    "apiBase": "http://localhost:11434"
  },
  "embeddingsProvider": {
    "provider": "ollama",
    "model": "nomic-embed-text",
    "apiBase": "http://localhost:11434"
  }
}

Save the file. Continue will connect to Ollama immediately. You can add multiple models to the models array and switch between them from the Continue chat panel.
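For example, here is a models array with two chat models to switch between. The second entry is illustrative — it assumes you have also pulled that model with ollama pull llama3.1:8b; substitute any model you actually have installed.

```json
"models": [
  {
    "title": "Qwen 2.5 Coder 7B",
    "provider": "ollama",
    "model": "qwen2.5-coder:7b",
    "apiBase": "http://localhost:11434"
  },
  {
    "title": "Llama 3.1 8B",
    "provider": "ollama",
    "model": "llama3.1:8b",
    "apiBase": "http://localhost:11434"
  }
]
```

Every entry in the dropdown corresponds to one object in this array, so the title field is what you will see when switching models in the chat panel.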

How to Use Continue

Chat panel: Click the Continue icon in the sidebar to open the chat. You can ask questions about your code, request explanations, or ask for new code to be generated. Highlight a block of code and press Ctrl+L (Cmd+L on Mac) to include the selected code in your next message automatically.

Inline editing: Highlight code in the editor and press Ctrl+I (Cmd+I on Mac). Type an instruction such as “refactor this function to use async/await” and Continue will suggest changes inline, which you can accept or reject.

Autocomplete: Continue provides tab-completion as you type. Press Tab to accept suggestions. This works similarly to GitHub Copilot but uses your local Ollama model instead.

Cline Extension: Setup and Configuration

What Is Cline?

Cline (formerly Claude Dev) is a more autonomous AI coding agent. While Continue is focused on in-editor chat and completions, Cline can read multiple files, write code across your project, run terminal commands, and complete multi-step tasks — all with your approval at each step.

Installing Cline

Search for Cline in the VS Code Extensions panel and install it. You’ll see a Cline icon appear in the left sidebar.

Configuring Cline to Use Ollama

  1. Click the Cline icon in the sidebar to open the Cline panel.
  2. Click the settings gear icon (top-right of the Cline panel).
  3. Under API Provider, select Ollama from the dropdown.
  4. Set the Base URL to http://localhost:11434.
  5. In the Model field, type the model name, e.g. qwen2.5-coder:7b.
  6. Click Save.

How to Use Cline

Open the Cline panel and describe a task in plain English. Examples:

  • “Create a new Python function in utils.py that parses a CSV file and returns a list of dictionaries”
  • “Find all TODO comments in this project and create a markdown file listing them”
  • “Refactor the authentication module to use JWT tokens instead of sessions”

Cline will propose a plan, then ask for permission before making each change. You can approve, reject, or modify each action. This makes it safe to use on real projects — it won’t make changes you haven’t explicitly allowed.

Continue vs Cline: Which Should You Use?

Use Continue when:

  • You want quick answers or explanations while coding
  • You need inline code suggestions or tab completion
  • You’re working on a specific function or block and want focused assistance

Use Cline when:

  • You have a larger multi-step task that spans several files
  • You want the AI to explore the codebase and make coordinated changes
  • You’re scaffolding a new feature from scratch

Most developers use both — Continue for day-to-day coding help and Cline for bigger feature work. They don’t conflict and can both be active at the same time.

Performance Tips

Pre-Load Your Model

By default Ollama unloads models from memory after five minutes of inactivity. The first request after an unload has a cold-start delay. To keep your coding model warm while you work, open a terminal and run:

ollama run qwen2.5-coder:7b

Type /bye to exit the interactive session — the model stays loaded in memory for future requests.
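If you would rather not keep a terminal session around, Ollama also honors a keep-alive setting via the OLLAMA_KEEP_ALIVE environment variable. The sketch below is a suggestion, not the only approach — the one-hour value is just an example, and the variable must be set in the environment of the Ollama server process before it starts:

```shell
# Keep loaded models in memory for an hour instead of the default five minutes.
# Use -1 to keep them loaded indefinitely. This must be set in the environment
# of the Ollama server process before it starts (then run: ollama serve).
export OLLAMA_KEEP_ALIVE=1h
```

On Windows, set it as a user environment variable and restart the Ollama service for it to take effect.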

Context Window Size

The default context window for most Ollama models is 2048 tokens, which can be limiting for large files. You can increase it in the Continue config.json by adding "contextLength": 8192 inside the model object. Be aware that larger context windows consume more RAM.
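For example, the Qwen model entry from the configuration above with an 8K context window (assuming your machine has the RAM to spare):

```json
{
  "title": "Qwen 2.5 Coder 7B",
  "provider": "ollama",
  "model": "qwen2.5-coder:7b",
  "apiBase": "http://localhost:11434",
  "contextLength": 8192
}
```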

Use a Smaller Model for Autocomplete

Tab completion fires on every keystroke, so a faster, smaller model works better for it. Consider using qwen2.5-coder:1.5b for autocomplete and reserving the 7B model for chat.
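In Continue's config.json, that split looks like the following — the chat model stays in the models array, while tabAutocompleteModel points at the smaller tag (this assumes you have pulled it with ollama pull qwen2.5-coder:1.5b):

```json
"tabAutocompleteModel": {
  "title": "Qwen Autocomplete 1.5B",
  "provider": "ollama",
  "model": "qwen2.5-coder:1.5b",
  "apiBase": "http://localhost:11434"
}
```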

Troubleshooting

Continue Shows “No Models Available”

Check that Ollama is running (ollama list should output model names). Verify the apiBase in config.json is http://localhost:11434 and that the model name exactly matches what ollama list shows, including the tag (e.g. qwen2.5-coder:7b, not just qwen2.5-coder).

Cline Cannot Connect to Ollama

Some versions of Cline require Ollama to have the OpenAI-compatible API enabled. Try setting the Base URL to http://localhost:11434/v1 and selecting OpenAI Compatible as the provider instead of Ollama.
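If Cline still cannot connect, it helps to confirm that Ollama's HTTP API is reachable independently of either extension. The following is a minimal sketch using only the Python standard library; the prompt and model name are illustrative, and the function returns None rather than raising when the server is unreachable:

```python
import json
from urllib import request, error

# The kind of request body that gets sent to Ollama's /api/chat endpoint.
# The prompt here is just an illustration.
payload = {
    "model": "qwen2.5-coder:7b",
    "messages": [{"role": "user", "content": "Say hello in one word."}],
    "stream": False,
}
body = json.dumps(payload).encode()

def chat(base_url="http://localhost:11434"):
    """Send one chat request to Ollama; return the reply text, or None if unreachable."""
    req = request.Request(
        base_url + "/api/chat",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    try:
        with request.urlopen(req, timeout=120) as resp:
            return json.loads(resp.read())["message"]["content"]
    except error.URLError:
        return None  # connection refused: Ollama is not running or the URL is wrong

if __name__ == "__main__":
    reply = chat()
    print(reply if reply is not None else "Could not reach Ollama at localhost:11434")
```

If this script prints a reply, the API itself is fine and the problem is in the extension settings; if it cannot connect, check that the Ollama server is actually running and that nothing else is bound to port 11434.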

Responses Are Very Slow

A 7B model needs at least 8 GB of RAM and runs best on a GPU. If responses are taking more than 30 seconds, either switch to a smaller model (qwen2.5-coder:1.5b) or check that Ollama is actually using your GPU (run ollama ps during generation to see whether the model is loaded on GPU or CPU).

Summary

Setting up Continue and Cline with Ollama takes about ten minutes and gives you a genuinely capable AI coding assistant that is completely private and costs nothing to run. Continue handles everyday chat and completions, while Cline handles larger autonomous tasks.

For a broader introduction to what you can do with local AI models, see our complete Ollama beginner’s guide.
