Ollama’s local REST API makes it straightforward to call local LLMs from Python — either directly with the requests library, via the official Ollama Python package, or through the OpenAI SDK. This guide covers all three approaches so you can pick the one that fits your project.
Prerequisites
Make sure Ollama is installed and running, and that you’ve pulled at least one model:
ollama pull llama3.1
ollama serve # starts the API server on http://localhost:11434 (skip this if the Ollama desktop app is already running)
Method 1: The Official Ollama Python Library
The simplest option. Install the package and start calling models immediately:
pip install ollama
Basic chat completion
import ollama

response = ollama.chat(
    model='llama3.1',
    messages=[
        {'role': 'user', 'content': 'Explain what a REST API is in simple terms.'}
    ]
)
print(response['message']['content'])
Streaming responses
import ollama

stream = ollama.chat(
    model='llama3.1',
    messages=[{'role': 'user', 'content': 'Write a short poem about Python.'}],
    stream=True
)

for chunk in stream:
    print(chunk['message']['content'], end='', flush=True)
Generate (single prompt, no chat history)
import ollama
response = ollama.generate(model='llama3.1', prompt='What is the capital of France?')
print(response['response'])
Method 2: Using the OpenAI Python SDK
If your project already uses the OpenAI SDK, you can point it at your local Ollama instance with one change — the base_url:
pip install openai
from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:11434/v1',
    api_key='ollama'  # required by the SDK but ignored by Ollama
)

response = client.chat.completions.create(
    model='llama3.1',
    messages=[
        {'role': 'system', 'content': 'You are a helpful assistant.'},
        {'role': 'user', 'content': 'Summarise the key features of Python 3.12.'}
    ]
)
print(response.choices[0].message.content)
This approach is ideal for projects where you want to swap between local Ollama models and OpenAI’s API without changing your application code.
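One way to make that swap concrete is to read the endpoint, key, and model from environment variables and fall back to local Ollama defaults. The variable names below (LLM_BASE_URL, LLM_API_KEY, LLM_MODEL) are illustrative choices, not anything the SDK or Ollama defines:

```python
import os

def llm_config() -> dict:
    """Build client settings from the environment, defaulting to local Ollama.

    The environment variable names here are arbitrary examples."""
    return {
        'base_url': os.environ.get('LLM_BASE_URL', 'http://localhost:11434/v1'),
        'api_key': os.environ.get('LLM_API_KEY', 'ollama'),
        'model': os.environ.get('LLM_MODEL', 'llama3.1'),
    }

cfg = llm_config()
# Then construct the client from the config:
# client = OpenAI(base_url=cfg['base_url'], api_key=cfg['api_key'])
print(cfg['base_url'])
```

Setting `LLM_BASE_URL=https://api.openai.com/v1` and a real API key would then point the same code at OpenAI with no source changes.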
Method 3: Direct HTTP Requests
No SDK required — just the requests library (pip install requests):
import requests

response = requests.post(
    'http://localhost:11434/api/chat',
    json={
        'model': 'llama3.1',
        'messages': [{'role': 'user', 'content': 'What is machine learning?'}],
        'stream': False
    }
)
data = response.json()
print(data['message']['content'])
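If you omit 'stream': False, the /api/chat endpoint streams its reply as newline-delimited JSON, one chunk per line, with a final chunk marked "done": true. Rather than make a live request, the sketch below reassembles the content from canned lines in that shape (the text in them is invented):

```python
import json

def join_stream(lines):
    """Concatenate the message content from newline-delimited JSON chat chunks."""
    parts = []
    for line in lines:
        if not line.strip():
            continue
        chunk = json.loads(line)
        parts.append(chunk['message']['content'])
        if chunk.get('done'):
            break
    return ''.join(parts)

# Canned chunks in the streaming response shape (content invented here)
sample = [
    '{"message": {"role": "assistant", "content": "Machine "}, "done": false}',
    '{"message": {"role": "assistant", "content": "learning..."}, "done": true}',
]
print(join_stream(sample))  # Machine learning...
```

In a real call you would iterate over `response.iter_lines()` from a streaming requests.post and feed each line through the same parsing logic.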
Working with Embeddings
Ollama also supports embedding generation, which is useful for RAG pipelines and semantic search:
import ollama

# Pull an embedding model first:
# ollama pull nomic-embed-text
response = ollama.embeddings(
    model='nomic-embed-text',
    prompt='The quick brown fox jumps over the lazy dog'
)
embedding = response['embedding']
print(f"Embedding dimensions: {len(embedding)}")
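Once you have embeddings, comparing them usually means cosine similarity. A dependency-free version looks like this; the vectors below are toy values, but two vectors from the embeddings call above would be compared the same way:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0 (identical direction)
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0 (orthogonal)
```

In a RAG pipeline you would embed the query, compute this score against each stored document embedding, and take the highest-scoring documents as context.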
Multi-turn Conversation
Build a simple chat loop that maintains conversation history:
import ollama

messages = []
print("Chat with Llama 3.1 (type 'quit' to exit)")

while True:
    user_input = input("You: ")
    if user_input.lower() == 'quit':
        break
    messages.append({'role': 'user', 'content': user_input})
    response = ollama.chat(model='llama3.1', messages=messages)
    assistant_message = response['message']['content']
    messages.append({'role': 'assistant', 'content': assistant_message})
    print(f"Assistant: {assistant_message}\n")
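Note that the messages list above grows without bound, and every turn resends the whole history to the model. A simple mitigation is to keep only the most recent turns; trim_history below is a sketch of that idea, not part of the ollama library:

```python
def trim_history(messages, max_turns=10):
    """Keep at most the last max_turns user/assistant pairs."""
    return messages[-(max_turns * 2):]

# Fabricated 30-message history: user at even indices, assistant at odd
history = [
    {'role': 'user', 'content': f'q{i}'} if i % 2 == 0
    else {'role': 'assistant', 'content': f'a{i}'}
    for i in range(30)
]
trimmed = trim_history(history, max_turns=10)
print(len(trimmed))  # 20
```

If your history starts with a system message, hold it out and re-prepend it after trimming so it always survives; fancier approaches summarise the dropped turns instead of discarding them.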
Listing and Managing Models
import ollama

# List available models
models = ollama.list()
for model in models['models']:
    print(model['model'])

# Pull a new model
ollama.pull('mistral')

# Check model details
info = ollama.show('llama3.1')
print(info['modelfile'])
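Since pulls can be multi-gigabyte downloads, you may want to check whether a model is already installed before calling ollama.pull. Given the names returned by the listing above (which carry a tag such as :latest), a small membership check could look like this; is_pulled is a hypothetical helper, not an ollama function:

```python
def is_pulled(name, installed):
    """True if name matches an installed model, treating a bare name as :latest."""
    if ':' not in name:
        name += ':latest'
    return name in installed

# Example installed-model names as returned by the list call
installed = ['llama3.1:latest', 'nomic-embed-text:latest']
print(is_pulled('llama3.1', installed))  # True
print(is_pulled('mistral', installed))   # False
```

You would build `installed` from the listing loop shown earlier and pull only when the check fails.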
Choosing the Right Model for Your Python Project
The best model depends on your use case:
- Coding assistance: See best Ollama models for coding
- Summarisation: See best Ollama models for summarisation
- RAG and embeddings: See best Ollama models for RAG
Next Steps
Once you’re comfortable calling Ollama from Python, the natural next step is building a RAG pipeline with LangChain or integrating Ollama into a web app using FastAPI. You can also explore the Docker deployment guide for running Ollama in a containerised environment.