Ollama OpenAI API Compatibility: Drop-In Replacement Guide

Ollama includes a built-in OpenAI-compatible API endpoint. This means you can take existing code written for the OpenAI API (Python scripts, applications, integrations) and point it at your local Ollama instance by changing the base URL and the model name. No cloud, no API costs, and your prompts stay on your own machine.

The OpenAI-Compatible Endpoint

When Ollama is running, it exposes an OpenAI-compatible endpoint at:

http://localhost:11434/v1

For the routes Ollama implements, this endpoint uses the same request and response format as OpenAI’s API, including the /v1/chat/completions endpoint used by most applications.

Switching from OpenAI to Ollama: One Line Change

If you are using the OpenAI Python SDK, change the base_url, set a dummy API key, and pass the name of a model you have pulled in Ollama:

from openai import OpenAI

# Before: pointing at OpenAI
# client = OpenAI(api_key="sk-...")

# After: pointing at local Ollama
client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama"  # Required by the SDK but ignored by Ollama
)

response = client.chat.completions.create(
    model="llama3.3",  # Any model you have pulled in Ollama
    messages=[
        {"role": "user", "content": "Explain how neural networks work"}
    ]
)
print(response.choices[0].message.content)
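
If you want to keep both backends available, the choice can live in one small helper. This is a minimal sketch, assuming a hypothetical client_kwargs function and use_ollama flag; neither is part of the OpenAI SDK or Ollama. The returned dict is meant to be passed on as OpenAI(**client_kwargs(...)).

```python
from typing import Optional

OLLAMA_BASE_URL = "http://localhost:11434/v1"  # endpoint from this guide


def client_kwargs(use_ollama: bool, openai_key: Optional[str] = None) -> dict:
    """Return keyword arguments for OpenAI(...). Hypothetical helper."""
    if use_ollama:
        # The SDK requires some key; Ollama ignores its value.
        return {"base_url": OLLAMA_BASE_URL, "api_key": "ollama"}
    if not openai_key:
        raise ValueError("an OpenAI API key is required when use_ollama is False")
    # Leaving base_url out keeps the SDK on its default OpenAI endpoint.
    return {"api_key": openai_key}


print(client_kwargs(use_ollama=True))
```

Keeping the switch in one place means the rest of the code (the chat.completions.create calls) stays identical for both backends, apart from the model name.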

Supported Endpoints

Ollama’s OpenAI-compatible layer supports the most commonly used endpoints:

  • POST /v1/chat/completions — chat with a model (streaming supported)
  • POST /v1/completions — raw text completions
  • POST /v1/embeddings — generate embeddings for RAG and similarity search
  • GET /v1/models — list available models
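
A quick way to confirm the compatibility layer is reachable is to call GET /v1/models. The sketch below uses only the Python standard library. It assumes the response follows the OpenAI list shape (a "data" array of objects with an "id" field); parse_model_ids and list_models are illustrative names, not part of any library.

```python
import json
import urllib.request

BASE_URL = "http://localhost:11434/v1"  # endpoint from this guide


def parse_model_ids(payload: dict) -> list:
    """Extract model ids from an OpenAI-style list response (assumed shape)."""
    return [item["id"] for item in payload.get("data", [])]


def list_models(base_url: str = BASE_URL) -> list:
    """GET /v1/models and return the model ids. Needs a running Ollama."""
    with urllib.request.urlopen(f"{base_url}/models", timeout=10) as resp:
        return parse_model_ids(json.load(resp))


if __name__ == "__main__":
    try:
        print(list_models())
    except OSError as exc:  # urllib's URLError is a subclass of OSError
        print(f"Could not reach Ollama: {exc}")
```

The ids returned here are the values to pass as model in the other requests.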

Using with LangChain

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="llama3.3",
    base_url="http://localhost:11434/v1",
    api_key="ollama"
)

response = llm.invoke("What are the benefits of local AI inference?")
print(response.content)

Using with curl

curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.3",
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ]
  }'
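
The same request can be sent from Python without the OpenAI SDK, which makes it easier to see what actually goes over the wire. This is a sketch using only the standard library; build_chat_request and chat are illustrative names, and the reply is assumed to follow the OpenAI chat completion shape (choices[0].message.content).

```python
import json
import urllib.request

CHAT_URL = "http://localhost:11434/v1/chat/completions"


def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build the same POST request that the curl command sends."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        CHAT_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(model: str, prompt: str) -> str:
    """Send the request and return the reply text. Needs a running Ollama."""
    request = build_chat_request(model, prompt)
    with urllib.request.urlopen(request, timeout=120) as resp:
        reply = json.load(resp)
    return reply["choices"][0]["message"]["content"]


if __name__ == "__main__":
    try:
        print(chat("llama3.3", "Hello, how are you?"))
    except OSError as exc:  # connection refused, timeout, HTTP errors
        print(f"Could not reach Ollama: {exc}")
```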

Streaming Responses

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama"
)

stream = client.chat.completions.create(
    model="llama3.3",
    messages=[{"role": "user", "content": "Write a short story about a robot"}],
    stream=True
)

for chunk in stream:
    # Skip chunks that carry no choices or no new text
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
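
Under the hood, OpenAI-style streaming arrives as server-sent events: lines starting with "data:" that each carry one JSON chunk, ending with a "data: [DONE]" sentinel. The sketch below shows how those lines turn into text, which is useful when you stream without the SDK. iter_content is an illustrative name, and the sample events are hand-written, not captured from a real server.

```python
import json
from typing import Iterable, Iterator


def iter_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style server-sent event lines."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


# Illustrative events, not captured from a real server.
sample = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant", "content": "Once"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": " upon a time"}}]}',
    "",
    "data: [DONE]",
]
print("".join(iter_content(sample)))  # prints: Once upon a time
```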

Using with Tools and Applications That Support Custom OpenAI Endpoints

Many tools allow you to set a custom OpenAI base URL. Common examples:

  • Continue (VS Code) — set provider to openai, base URL to http://localhost:11434/v1
  • Open WebUI — natively supports Ollama, but also works via the OpenAI compatibility layer
  • LibreChat — supports custom OpenAI endpoints
  • n8n — use the OpenAI node with a custom base URL
  • Any app with “Custom OpenAI endpoint” or “base URL” settings

Embeddings with the OpenAI-Compatible Endpoint

from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

response = client.embeddings.create(
    model="nomic-embed-text",  # Pull this model first
    input="The quick brown fox jumps over the lazy dog"
)
embedding = response.data[0].embedding
print(f"Embedding dimensions: {len(embedding)}")
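
Once you have embeddings, similarity search usually comes down to comparing vectors. A common choice is cosine similarity; the sketch below is a plain-Python version with toy 3-dimensional vectors standing in for real embeddings. The function name and sample values are illustrative only.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings.
query = [1.0, 0.0, 1.0]
doc_close = [2.0, 0.0, 2.0]  # same direction as the query
doc_far = [0.0, 1.0, 0.0]    # orthogonal to the query

print(cosine_similarity(query, doc_close))
print(cosine_similarity(query, doc_far))
```

With real data you would embed each document once, store the vectors, and rank documents by their similarity to the embedded query. Both sides must come from the same embedding model for the scores to be comparable.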

Limitations vs the Real OpenAI API

  • Not all OpenAI API features are supported; for example, function calling support varies by model
  • Model names are your local Ollama model names, not OpenAI model names
  • No authentication or rate limiting by default (fine for local use)
  • Vision inputs follow Ollama’s format rather than exactly matching OpenAI’s image input format
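
Because model names differ, existing code that hard-codes OpenAI model names will need them translated. One way to handle this is a small alias table, sketched below. The mapping and the to_local_model helper are hypothetical; pick local models you have actually pulled, and expect their behaviour to differ from the OpenAI models they stand in for.

```python
# Hypothetical mapping from OpenAI model names to local Ollama models.
MODEL_ALIASES = {
    "gpt-4o": "llama3.3",
    "text-embedding-3-small": "nomic-embed-text",
}


def to_local_model(name: str) -> str:
    """Return the local model for an OpenAI name; pass other names through."""
    return MODEL_ALIASES.get(name, name)


print(to_local_model("gpt-4o"))    # prints: llama3.3
print(to_local_model("llama3.3"))  # prints: llama3.3
```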

For a full reference of every Ollama command and flag, see the Ollama CLI Cheat Sheet.