Ollama includes a built-in OpenAI-compatible API endpoint. This means you can take existing code written for the OpenAI API — Python scripts, applications, integrations — and point them at your local Ollama instance with a single line change. No cloud, no API costs, full privacy.
The OpenAI-Compatible Endpoint
When Ollama is running, it exposes an OpenAI-compatible endpoint at:
http://localhost:11434/v1
This endpoint supports the same request and response format as OpenAI’s API, including the /v1/chat/completions endpoint used by most applications.
Switching from OpenAI to Ollama: One Line Change
If you are using the OpenAI Python SDK, you only need to change the base_url and set a dummy API key:
```python
from openai import OpenAI

# Before: pointing at OpenAI
# client = OpenAI(api_key="sk-...")

# After: pointing at local Ollama
client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama",  # Required by the SDK but ignored by Ollama
)

response = client.chat.completions.create(
    model="llama3.3",  # Any model you have pulled in Ollama
    messages=[
        {"role": "user", "content": "Explain how neural networks work"}
    ]
)

print(response.choices[0].message.content)
```
Supported Endpoints
Ollama’s OpenAI-compatible layer supports the most commonly used endpoints:
- POST /v1/chat/completions — chat with a model (streaming supported)
- POST /v1/completions — raw text completions
- POST /v1/embeddings — generate embeddings for RAG and similarity search
- GET /v1/models — list available models
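All of these routes hang off the same base URL. As a quick orientation sketch (the paths come from the list above; the base URL is Ollama's default), here is how the full request URLs resolve:

```python
# Sketch: resolving full request URLs from Ollama's OpenAI-compatible base URL.
BASE_URL = "http://localhost:11434/v1"

ENDPOINTS = {
    "chat": "/chat/completions",
    "completions": "/completions",
    "embeddings": "/embeddings",
    "models": "/models",
}

def endpoint_url(name: str) -> str:
    """Join the base URL with one of the endpoint paths above."""
    return BASE_URL.rstrip("/") + ENDPOINTS[name]

print(endpoint_url("chat"))  # http://localhost:11434/v1/chat/completions
```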
Using with LangChain
```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="llama3.3",
    base_url="http://localhost:11434/v1",
    api_key="ollama"
)

response = llm.invoke("What are the benefits of local AI inference?")
print(response.content)
```
Using with curl
```shell
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.3",
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ]
  }'
```
Streaming Responses
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama"
)

stream = client.chat.completions.create(
    model="llama3.3",
    messages=[{"role": "user", "content": "Write a short story about a robot"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```
Using with Tools and Applications That Support Custom OpenAI Endpoints
Many tools allow you to set a custom OpenAI base URL. Common examples:
- Continue (VS Code) — set the provider to openai and the base URL to http://localhost:11434/v1
- Open WebUI — natively supports Ollama, but also works via the OpenAI compatibility layer
- LibreChat — supports custom OpenAI endpoints
- n8n — use the OpenAI node with a custom base URL
- Any app with “Custom OpenAI endpoint” or “base URL” settings
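For tools built on the official OpenAI SDKs, you can often skip per-app configuration entirely: recent SDK versions read the base URL and API key from environment variables. A minimal sketch (check your tool's documentation to confirm it honors these variables):

```shell
# Point any tool that uses the official OpenAI SDK at local Ollama.
export OPENAI_BASE_URL="http://localhost:11434/v1"
export OPENAI_API_KEY="ollama"   # required by the SDKs, ignored by Ollama

echo "$OPENAI_BASE_URL"
```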
Embeddings with the OpenAI-Compatible Endpoint
```python
# Reuses the client configured earlier with base_url="http://localhost:11434/v1"
response = client.embeddings.create(
    model="nomic-embed-text",  # Pull this model first: ollama pull nomic-embed-text
    input="The quick brown fox jumps over the lazy dog"
)

embedding = response.data[0].embedding
print(f"Embedding dimensions: {len(embedding)}")
```
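Once you have embeddings, similarity search for RAG typically comes down to cosine similarity between vectors. A minimal, dependency-free sketch — the short vectors here are stand-ins for real embedding output from the call above:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Stand-ins for vectors returned by client.embeddings.create(...)
doc_vec = [0.1, 0.3, 0.5]
query_vec = [0.1, 0.3, 0.5]

print(round(cosine_similarity(doc_vec, query_vec), 6))  # identical vectors -> 1.0
```

In a real pipeline you would embed each document once, store the vectors, and rank documents by their similarity to the embedded query.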
Limitations vs the Real OpenAI API
- Not all OpenAI API features are supported — function calling support varies by model
- Model names are your local Ollama model names, not OpenAI model names
- No authentication or rate limiting by default (fine for local use)
- Vision inputs follow Ollama’s format rather than exactly matching OpenAI’s image input format
