
How to Use Ollama with LangChain

LangChain is one of the most widely used frameworks for building LLM-powered applications. Combining it with Ollama gives you a fully local, private AI pipeline: no API keys, no data leaving your machine, no per-token costs. This guide covers the essentials: basic calls, chains, and a working RAG pipeline.

Prerequisites

pip install langchain langchain-ollama langchain-community chromadb

Make sure Ollama is running with at least one model pulled:

ollama pull llama3.1
ollama pull nomic-embed-text  # for embeddings / RAG

Basic Chat with LangChain and Ollama

from langchain_ollama import ChatOllama

llm = ChatOllama(model="llama3.1")

response = llm.invoke("What is the difference between RAM and storage?")
print(response.content)

Streaming

from langchain_ollama import ChatOllama

llm = ChatOllama(model="llama3.1")

for chunk in llm.stream("Explain Docker in simple terms."):
    print(chunk.content, end="", flush=True)

Prompt Templates

Use prompt templates to keep your prompts reusable and structured:

from langchain_ollama import ChatOllama
from langchain_core.prompts import ChatPromptTemplate

llm = ChatOllama(model="llama3.1")

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant that explains technical concepts simply."),
    ("human", "Explain {concept} as if I'm a complete beginner.")
])

chain = prompt | llm

response = chain.invoke({"concept": "vector databases"})
print(response.content)

Building a RAG Pipeline

Retrieval-Augmented Generation (RAG) lets your model answer questions based on your own documents. Here’s a complete working example using a local vector store.

For model recommendations, see the best Ollama models for RAG.

Step 1: Load and split documents

from langchain_community.document_loaders import TextLoader, DirectoryLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load a single file
loader = TextLoader("my_document.txt")
docs = loader.load()

# Or load all .txt files from a folder
# loader = DirectoryLoader("./docs", glob="**/*.txt")
# docs = loader.load()

splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(docs)

Step 2: Create embeddings and vector store

from langchain_ollama import OllamaEmbeddings
from langchain_community.vectorstores import Chroma

embeddings = OllamaEmbeddings(model="nomic-embed-text")

vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    persist_directory="./chroma_db"
)

Step 3: Build the retrieval chain

from langchain_ollama import ChatOllama
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

llm = ChatOllama(model="llama3.1")

retriever = vectorstore.as_retriever(search_kwargs={"k": 3})

prompt = ChatPromptTemplate.from_template("""
Answer the question based only on the context below. 
If you don't know the answer from the context, say so.

Context: {context}

Question: {question}
""")

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

answer = chain.invoke("What are the main topics covered in the document?")
print(answer)

Step 4: Load an existing vector store

from langchain_ollama import OllamaEmbeddings
from langchain_community.vectorstores import Chroma

embeddings = OllamaEmbeddings(model="nomic-embed-text")

# Load previously persisted store
vectorstore = Chroma(
    persist_directory="./chroma_db",
    embedding_function=embeddings
)

Conversation Memory

Add memory to maintain context across multiple questions:

from langchain_ollama import ChatOllama
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory

llm = ChatOllama(model="llama3.1")

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    MessagesPlaceholder(variable_name="history"),
    ("human", "{input}")
])

chain = prompt | llm

store = {}

def get_session_history(session_id):
    if session_id not in store:
        store[session_id] = InMemoryChatMessageHistory()
    return store[session_id]

chain_with_history = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="history"
)

config = {"configurable": {"session_id": "user_1"}}

print(chain_with_history.invoke({"input": "My name is Alice."}, config=config).content)
print(chain_with_history.invoke({"input": "What's my name?"}, config=config).content)


Next Steps

Once you have a working LangChain + Ollama setup, consider packaging it with Docker for portable deployment, or building a web interface using FastAPI and the Ollama Python library.
