
Ollama OpenAI API Compatibility: Drop-In Replacement Guide

Ollama includes a built-in OpenAI-compatible API endpoint. This means you can take existing code written for the OpenAI API — Python scripts, applications, integrations — and point it at your local Ollama instance with a single line change. No cloud, no API costs, full privacy.

The OpenAI-Compatible Endpoint

When Ollama is running, it exposes an OpenAI-compatible endpoint at:

http://localhost:11434/v1

This endpoint supports the same request and response format as OpenAI’s API, including the /v1/chat/completions endpoint used by most applications.

Switching from OpenAI to Ollama: One Line Change

If you are using the OpenAI Python SDK, you only need to change the base_url and set a dummy API key:

from openai import OpenAI

# Before: pointing at OpenAI
# client = OpenAI(api_key="sk-...")

# After: pointing at local Ollama
client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama"  # Required by the SDK but ignored by Ollama
)

response = client.chat.completions.create(
    model="llama3.3",  # Any model you have pulled in Ollama
    messages=[
        {"role": "user", "content": "Explain how neural networks work"}
    ]
)
print(response.choices[0].message.content)
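Because the only differences are base_url and api_key, you can make the backend switchable with a small helper. A minimal sketch (the OLLAMA_BASE_URL environment variable and the helper name are illustrative, not part of either API):

```python
import os

def client_kwargs() -> dict:
    """Build OpenAI-SDK constructor kwargs for either backend.

    If OLLAMA_BASE_URL is set, target the local Ollama endpoint;
    otherwise fall back to the real OpenAI API.
    """
    base_url = os.environ.get("OLLAMA_BASE_URL")
    if base_url:
        # Ollama ignores the key, but the SDK requires a non-empty value.
        return {"base_url": base_url, "api_key": "ollama"}
    return {"api_key": os.environ["OPENAI_API_KEY"]}

# client = OpenAI(**client_kwargs())
```

This keeps the provider choice out of your application code, so the same script runs locally or against the cloud depending on the environment.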

Supported Endpoints

Ollama’s OpenAI-compatible layer supports the most commonly used endpoints:

  • POST /v1/chat/completions — chat with a model (streaming supported)
  • POST /v1/completions — raw text completions
  • POST /v1/embeddings — generate embeddings for RAG and similarity search
  • GET /v1/models — list available models
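GET /v1/models returns OpenAI's standard list envelope, with your local Ollama model names as the ids. A sketch parsing a sample payload (the payload below is illustrative; the values will reflect whatever models you have pulled):

```python
import json

# Example response body from GET http://localhost:11434/v1/models
# (shape follows OpenAI's list format; the values here are made up)
sample = '''
{
  "object": "list",
  "data": [
    {"id": "llama3.3", "object": "model", "owned_by": "library"},
    {"id": "nomic-embed-text", "object": "model", "owned_by": "library"}
  ]
}
'''

models = json.loads(sample)["data"]
ids = [m["id"] for m in models]
print(ids)  # → ['llama3.3', 'nomic-embed-text']
```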

Using with LangChain

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="llama3.3",
    base_url="http://localhost:11434/v1",
    api_key="ollama"
)

response = llm.invoke("What are the benefits of local AI inference?")
print(response.content)

Using with curl

curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.3",
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ]
  }'

Streaming Responses

client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama"
)

stream = client.chat.completions.create(
    model="llama3.3",
    messages=[{"role": "user", "content": "Write a short story about a robot"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Using with Tools and Applications That Support Custom OpenAI Endpoints

Many tools allow you to set a custom OpenAI base URL. Common examples:

  • Continue (VS Code) — set provider to openai, base URL to http://localhost:11434/v1
  • Open WebUI — natively supports Ollama, but also works via the OpenAI compatibility layer
  • LibreChat — supports custom OpenAI endpoints
  • n8n — use the OpenAI node with a custom base URL
  • Any app with “Custom OpenAI endpoint” or “base URL” settings
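For example, in Continue the relevant settings live in its config file. A sketch (the key names follow Continue's JSON config schema as of recent versions; check its documentation if the schema has changed):

```json
{
  "models": [
    {
      "title": "Ollama (llama3.3)",
      "provider": "openai",
      "model": "llama3.3",
      "apiBase": "http://localhost:11434/v1",
      "apiKey": "ollama"
    }
  ]
}
```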

Embeddings with the OpenAI-Compatible Endpoint

response = client.embeddings.create(
    model="nomic-embed-text",  # Pull this model first: ollama pull nomic-embed-text
    input="The quick brown fox jumps over the lazy dog"
)
embedding = response.data[0].embedding
print(f"Embedding dimensions: {len(embedding)}")
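Embeddings from this endpoint can be compared with plain cosine similarity for RAG and semantic search. A minimal sketch of the math (the vectors below are hardcoded stand-ins for real response.data[i].embedding output):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# In practice these come from client.embeddings.create(...).data[i].embedding
doc_vec = [0.1, 0.3, 0.5]
query_vec = [0.2, 0.1, 0.4]
print(round(cosine_similarity(doc_vec, query_vec), 3))  # → 0.922
```

Scores close to 1.0 mean the texts are semantically similar; rank documents by this score to retrieve context for a RAG prompt.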

Limitations vs the Real OpenAI API

  • Not all OpenAI API features are supported — function calling support varies by model
  • Model names are your local Ollama model names, not OpenAI model names
  • No authentication or rate limiting by default (fine for local use)
  • Vision inputs follow Ollama’s format rather than exactly matching OpenAI’s image input format
