
Ollama OpenAI API Compatibility: Drop-In Replacement Guide

Ollama includes a built-in OpenAI-compatible API endpoint. This means you can take existing code written for the OpenAI API — Python scripts, applications, integrations — and point it at your local Ollama instance with a single line change. No cloud, no API costs, full privacy.

The OpenAI-Compatible Endpoint

When Ollama is running, it exposes an OpenAI-compatible endpoint at:

http://localhost:11434/v1

This endpoint supports the same request and response format as OpenAI’s API, including the /v1/chat/completions endpoint used by most applications.

Switching from OpenAI to Ollama: One Line Change

If you are using the OpenAI Python SDK, you only need to change the base_url and set a dummy API key:

from openai import OpenAI

# Before: pointing at OpenAI
# client = OpenAI(api_key="sk-...")

# After: pointing at local Ollama
client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama"  # Required by the SDK but ignored by Ollama
)

response = client.chat.completions.create(
    model="llama3.3",  # Any model you have pulled in Ollama
    messages=[
        {"role": "user", "content": "Explain how neural networks work"}
    ]
)
print(response.choices[0].message.content)
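Because the only differences are base_url and api_key, you can make the backend switchable with a small helper. A minimal sketch (the OLLAMA_BASE_URL environment variable and the helper name are illustrative, not part of either API):

```python
import os

def client_kwargs() -> dict:
    """Build OpenAI-SDK constructor kwargs for either backend.

    If OLLAMA_BASE_URL is set, target the local Ollama endpoint;
    otherwise fall back to the real OpenAI API.
    """
    base_url = os.environ.get("OLLAMA_BASE_URL")
    if base_url:
        # Ollama ignores the key, but the SDK requires a non-empty value.
        return {"base_url": base_url, "api_key": "ollama"}
    return {"api_key": os.environ["OPENAI_API_KEY"]}

# client = OpenAI(**client_kwargs())
```

This keeps the provider choice out of your application code, so the same script runs locally or against the cloud depending on the environment.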

Supported Endpoints

Ollama’s OpenAI-compatible layer supports the most commonly used endpoints:

  • POST /v1/chat/completions — chat with a model (streaming supported)
  • POST /v1/completions — raw text completions
  • POST /v1/embeddings — generate embeddings for RAG and similarity search
  • GET /v1/models — list available models
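GET /v1/models returns OpenAI's standard list envelope, with your local Ollama model names as the ids. A sketch parsing a sample payload (the payload below is illustrative; the values will reflect whatever models you have pulled):

```python
import json

# Example response body from GET http://localhost:11434/v1/models
# (shape follows OpenAI's list format; the values here are made up)
sample = '''
{
  "object": "list",
  "data": [
    {"id": "llama3.3", "object": "model", "owned_by": "library"},
    {"id": "nomic-embed-text", "object": "model", "owned_by": "library"}
  ]
}
'''

models = json.loads(sample)["data"]
ids = [m["id"] for m in models]
print(ids)  # → ['llama3.3', 'nomic-embed-text']
```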

Using with LangChain

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="llama3.3",
    base_url="http://localhost:11434/v1",
    api_key="ollama"
)

response = llm.invoke("What are the benefits of local AI inference?")
print(response.content)

Using with curl

curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.3",
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ]
  }'

Streaming Responses

client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama"
)

stream = client.chat.completions.create(
    model="llama3.3",
    messages=[{"role": "user", "content": "Write a short story about a robot"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Using with Tools and Applications That Support Custom OpenAI Endpoints

Many tools allow you to set a custom OpenAI base URL. Common examples:

  • Continue (VS Code) — set provider to openai, base URL to http://localhost:11434/v1
  • Open WebUI — natively supports Ollama, but also works via the OpenAI compatibility layer
  • LibreChat — supports custom OpenAI endpoints
  • n8n — use the OpenAI node with a custom base URL
  • Any app with “Custom OpenAI endpoint” or “base URL” settings
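For example, in Continue the relevant settings live in its config file. A sketch (the key names follow Continue's JSON config schema as of recent versions; check its documentation if the schema has changed):

```json
{
  "models": [
    {
      "title": "Ollama (llama3.3)",
      "provider": "openai",
      "model": "llama3.3",
      "apiBase": "http://localhost:11434/v1",
      "apiKey": "ollama"
    }
  ]
}
```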

Embeddings with the OpenAI-Compatible Endpoint

response = client.embeddings.create(
    model="nomic-embed-text",  # Pull this model first: ollama pull nomic-embed-text
    input="The quick brown fox jumps over the lazy dog"
)
embedding = response.data[0].embedding
print(f"Embedding dimensions: {len(embedding)}")
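Embeddings from this endpoint can be compared with plain cosine similarity for RAG and semantic search. A minimal sketch of the math (the vectors below are hardcoded stand-ins for real response.data[i].embedding output):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# In practice these come from client.embeddings.create(...).data[i].embedding
doc_vec = [0.1, 0.3, 0.5]
query_vec = [0.2, 0.1, 0.4]
print(round(cosine_similarity(doc_vec, query_vec), 3))  # → 0.922
```

Scores close to 1.0 mean the texts are semantically similar; rank documents by this score to retrieve context for a RAG prompt.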

Limitations vs the Real OpenAI API

  • Not all OpenAI API features are supported — function calling support varies by model
  • Model names are your local Ollama model names, not OpenAI model names
  • No authentication or rate limiting by default (fine for local use)
  • Vision inputs follow Ollama’s format rather than exactly matching OpenAI’s image input format
