
CrewAI + Ollama: Run Multi-Agent AI Workflows Locally


CrewAI is an open-source Python framework for building multi-agent AI workflows — where instead of one model answering a question, you define a team of agents with different roles that collaborate to complete a task. Running it with Ollama keeps the entire pipeline local: no OpenAI API keys, no per-token costs, no data sent to external services. This guide shows you how to set it up and build your first crew.

What Is CrewAI?

CrewAI models how teams work. You define agents (each with a role, goal, and backstory), assign them tasks, wire them into a crew, and kick off the workflow. The agents can hand work off to each other, use tools like web search or file reading, and produce a final output that combines their individual contributions.

The framework has four core building blocks:

  • Agent: a role with a defined purpose (e.g. “Senior Researcher”, “Technical Writer”)
  • Task: a specific piece of work with an expected output
  • Tool: a capability agents can call, such as web search, file I/O, or custom functions
  • Crew: the assembled team that executes the tasks according to a process, sequential by default or hierarchical with a manager agent

A single LLM call is a request-response exchange. A CrewAI workflow is a pipeline: each agent completes its task and passes its output forward as context for the next. Because the work is broken into focused steps, the result is often more thorough than what a single prompt produces.
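The idea of "passing context forward" can be shown with a toy sequential pipeline in plain Python. This is a sketch for intuition only and is not how CrewAI is implemented. The agents here are hypothetical stand-in functions (`research`, `write`). The `context` indices loosely mirror the `context=[...]` argument that CrewAI tasks accept.

```python
from dataclasses import dataclass, field
from typing import Callable

# A stand-in "agent": a function taking (task description, context) -> output.
AgentFn = Callable[[str, list[str]], str]


@dataclass
class ToyTask:
    description: str
    agent: AgentFn
    # Indices of earlier tasks whose outputs this task receives as context
    # (loosely mirroring CrewAI's Task(context=[...]); not the real API).
    context: list[int] = field(default_factory=list)


def run_sequential(tasks: list[ToyTask]) -> list[str]:
    """Run tasks in list order, feeding selected earlier outputs forward."""
    outputs: list[str] = []
    for task in tasks:
        ctx = [outputs[i] for i in task.context]
        outputs.append(task.agent(task.description, ctx))
    return outputs


# Hypothetical stand-in agents; a real crew would call an LLM here.
def research(description: str, context: list[str]) -> str:
    return f"notes on: {description}"


def write(description: str, context: list[str]) -> str:
    return f"article based on [{'; '.join(context)}]"


if __name__ == "__main__":
    results = run_sequential([
        ToyTask("local LLMs", research),
        ToyTask("blog summary", write, context=[0]),
    ])
    print(results[-1])  # → article based on [notes on: local LLMs]
```

In a real crew, each stand-in function is replaced by an LLM call driven by the agent's role, goal and backstory.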

Why Run CrewAI with Ollama?

Most CrewAI tutorials use OpenAI as the LLM backend. Ollama lets you run the same workflows with no API costs and full privacy, although local models are generally less capable than the largest hosted ones. A research-and-write workflow that would cost $0.10–$0.50 per run with GPT-4o costs nothing in API fees with a local model; you pay only for your own hardware and electricity. For development, testing, and internal tooling, that difference adds up fast.

It also means your agent workflows can process confidential data (internal documents, client information, proprietary research) without that data leaving your network, provided the tools your agents use are local too.

What You Need

  • Ollama installed and running with at least one capable model pulled (qwen3:14b is recommended, e.g. via ollama pull qwen3:14b; llama3.3 also works but is a 70B model that needs a high-memory machine)
  • Python 3.10 or higher
  • pip
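Before you install anything else, check that Ollama is reachable and that the model has been pulled. The sketch below uses only the Python standard library and calls Ollama's /api/tags endpoint, which lists locally available models. It assumes Ollama is listening on its default address, http://localhost:11434. The helper names are illustrative.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default address


def _normalise(name: str) -> str:
    """Ollama reports untagged models as 'name:latest'."""
    return name if ":" in name else f"{name}:latest"


def has_model(tags: dict, model: str) -> bool:
    """Check an /api/tags response body for a model such as 'qwen3:14b'."""
    wanted = _normalise(model)
    return any(_normalise(m.get("name", "")) == wanted for m in tags.get("models", []))


def check_ollama(model: str, base_url: str = OLLAMA_URL) -> bool:
    """Return True if Ollama answers at base_url and `model` has been pulled."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
            tags = json.load(resp)
    except OSError:  # connection refused, timeout, HTTP error (URLError is an OSError)
        return False
    return has_model(tags, model)


if __name__ == "__main__":
    ok = check_ollama("qwen3:14b")
    print("ready" if ok else "start Ollama and run: ollama pull qwen3:14b")
```

If this prints the "start Ollama" hint, fix that first. CrewAI cannot reach a model that Ollama itself does not serve.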

Step 1 — Install CrewAI

pip install crewai crewai-tools

This installs the core framework and the optional tools package (web search, file read/write, scraping).

Step 2 — Configure Ollama as the LLM Backend

CrewAI uses LiteLLM under the hood, which supports Ollama natively. Set the model using the ollama/ prefix:

from crewai import Agent, Task, Crew, LLM

llm = LLM(
    model="ollama/qwen3:14b",
    base_url="http://localhost:11434"
)

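That is the only configuration needed to point CrewAI at your local Ollama instance. Pass this llm object to each agent you create. An agent without an llm falls back to CrewAI's default OpenAI model, which requires an OpenAI API key.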
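If CrewAI later fails with connection or model errors, it helps to rule out Ollama itself. This sketch uses the standard library to send one prompt straight to Ollama's /api/generate endpoint, bypassing CrewAI and LiteLLM. It then reports generation speed from the response's eval_count and eval_duration fields; eval_duration is in nanoseconds. Note that the bare model name (qwen3:14b) is used here, because the ollama/ prefix is only needed in CrewAI's LLM class. The function names are illustrative.

```python
import json
import urllib.request


def build_request(model: str, prompt: str,
                  base_url: str = "http://localhost:11434") -> urllib.request.Request:
    """Build a non-streaming POST to Ollama's /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def tokens_per_second(response: dict) -> float:
    """Generated tokens (eval_count) divided by eval_duration in seconds."""
    seconds = response["eval_duration"] / 1e9  # nanoseconds -> seconds
    return response["eval_count"] / seconds if seconds else 0.0


def smoke_test(model: str = "qwen3:14b") -> None:
    """Send one prompt and print the reply plus generation speed."""
    # Generous timeout: the first call may need to load the model into memory.
    req = build_request(model, "Reply with one word: ready")
    with urllib.request.urlopen(req, timeout=300) as resp:
        data = json.load(resp)
    print(data["response"].strip())
    print(f"{tokens_per_second(data):.1f} tokens/s")
```

Call smoke_test() with Ollama running. The tokens-per-second figure also gives a rough idea of how long a multi-agent run will take on your hardware.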

Step 3 — Build Your First Crew

Here is a complete working example: a two-agent crew that researches a topic and then writes a summary report.

from crewai import Agent, Task, Crew, LLM

llm = LLM(model="ollama/qwen3:14b", base_url="http://localhost:11434")

# Define agents
researcher = Agent(
    role="Senior Research Analyst",
    goal="Find comprehensive information about the given topic",
    backstory="You are an expert researcher with a talent for finding and synthesising information.",
    llm=llm,
    verbose=True
)

writer = Agent(
    role="Technical Writer",
    goal="Produce a clear, well-structured written summary",
    backstory="You turn complex research into readable, accurate reports.",
    llm=llm,
    verbose=True
)

# Define tasks
research_task = Task(
    description="Research the key advantages and use cases of running AI models locally using Ollama. Cover: privacy, cost, hardware requirements, and limitations.",
    expected_output="A detailed set of research notes covering the main points, minimum 500 words.",
    agent=researcher
)

write_task = Task(
    description="Using the research notes provided, write a clear summary article suitable for a technical blog audience.",
    expected_output="A structured article with introduction, main sections, and conclusion.",
    agent=writer,
    context=[research_task]
)

# Assemble and run the crew
crew = Crew(
    agents=[researcher, writer],
    tasks=[research_task, write_task],
    verbose=True
)

result = crew.kickoff()
print(result)

Save the script as crew.py and run it with python crew.py. The researcher agent completes its task first. The writer agent then receives that output as context and produces the final article. The full chain runs locally on your Ollama models.

Adding Tools

Tools extend what your agents can do. The most commonly used are web search and file handling:

from crewai_tools import SerperDevTool, FileReadTool

search_tool = SerperDevTool()  # requires SERPER_API_KEY env var
file_tool = FileReadTool()

researcher = Agent(
    role="Senior Research Analyst",
    goal="Research the topic thoroughly using web search",
    backstory="Expert researcher who finds current, accurate information.",
    tools=[search_tool],
    llm=llm
)

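Web search requires a Serper API key (free tier available), and every search query is sent to Serper's servers. For fully local tooling, use FileReadTool, DirectoryReadTool, and CodeInterpreterTool, which need no external APIs. Note that CodeInterpreterTool runs code in a Docker container by default, so it needs Docker installed.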
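For a fully local research step, a custom tool can stand in for web search. The function below is a hypothetical example; search_notes is not part of crewai-tools. It scans a folder of .txt and .md files for a keyword and returns matching lines with file names and line numbers. It is written in plain Python so it can be tested on its own.

```python
from pathlib import Path


def search_notes(folder: str, keyword: str, max_hits: int = 20) -> str:
    """Search .txt and .md files under `folder` for `keyword` (case-insensitive).

    Returns matching lines as 'file:line_no: text', one per line, or a short
    message when nothing matches. Plain-text output is easy for an agent to read.
    """
    needle = keyword.lower()
    hits: list[str] = []
    for path in sorted(Path(folder).rglob("*")):
        # Skip directories and anything that is not a text or Markdown file.
        if path.suffix.lower() not in {".txt", ".md"} or not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for line_no, line in enumerate(text.splitlines(), start=1):
            if needle in line.lower():
                hits.append(f"{path.name}:{line_no}: {line.strip()}")
                if len(hits) >= max_hits:  # cap output to protect the context window
                    return "\n".join(hits)
    return "\n".join(hits) if hits else f"No matches for '{keyword}'."
```

Recent CrewAI releases let you expose a function like this to agents through the @tool decorator from crewai.tools, which uses the docstring as the tool description. You then pass the decorated function in tools=[...], as with SerperDevTool above. Check the decorator's import path against your installed version.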

Process Types

By default, CrewAI runs tasks sequentially: each agent completes its task before the next begins. The alternative is a hierarchical process, in which a manager agent coordinates the team:

from crewai import Process

crew = Crew(
    agents=[researcher, writer],
    tasks=[research_task, write_task],
    process=Process.hierarchical,  # manager agent coordinates the team
    manager_llm=llm
)

The hierarchical process adds a manager agent that plans, delegates, and reviews the work. It is useful for complex multi-step workflows where task dependencies are not linear. The trade-off is extra LLM calls for the manager's planning and review.

Which Models Work Best

| Use case | Recommended model | Notes |
|---|---|---|
| General multi-agent workflows | qwen3:14b | Best balance of capability and speed |
| Complex reasoning / manager agent | qwen3:32b or llama3.3 (70B) | Better at instruction following and planning; needs much more memory |
| Fast iteration / testing | qwen3:8b | Quick feedback loop during development |
| Low-spec hardware | qwen3:4b | Works, but struggles with complex tasks |

Larger models are noticeably better at following the structured output formats CrewAI expects. If an agent produces malformed responses or goes off-task, switching to a larger model often fixes it. Try that before spending time on prompt engineering.
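When you ask an agent for JSON, it also helps to validate the reply in code rather than trusting it. This is a minimal sketch. It assumes the agent was asked for a single JSON object with the hypothetical keys title and summary. It tolerates a surrounding Markdown code fence, which some models add. It raises ValueError when the output is unusable, so the caller can retry or switch models. CrewAI tasks also offer their own output_json and output_pydantic options; this helper is independent of them.

```python
import json
import re

TICKS = "`" * 3  # a Markdown code-fence marker (three backticks)
# Matches an optional fence (with or without a "json" label) around the payload.
_FENCE = re.compile("^" + TICKS + r"(?:json)?\s*(.*?)\s*" + TICKS + "$", re.DOTALL)


def parse_agent_json(text: str, required: tuple[str, ...] = ("title", "summary")) -> dict:
    """Parse an agent's reply as a JSON object and check for required keys.

    Raises ValueError (json.JSONDecodeError is a subclass) if the reply is not
    a JSON object or lacks a required key.
    """
    cleaned = text.strip()
    match = _FENCE.match(cleaned)
    if match:
        cleaned = match.group(1)  # keep only what is inside the fence
    data = json.loads(cleaned)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [key for key in required if key not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data
```

If this validation keeps failing on a small model, moving up a size is usually the quickest fix.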

Practical Use Cases

  • Content pipelines — research agent finds information, writer agent drafts, editor agent refines
  • Code review — analyst agent reads code, security agent checks for vulnerabilities, documentation agent writes summary
  • Data analysis — data reader agent parses files, analyst agent interprets, reporter agent produces output
  • Internal knowledge base queries — combine with file tools to process internal documents
  • Automated reporting — schedule a crew to run nightly and produce summaries from data files

Limitations to Know

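  • Token usage multiplies: each agent turn consumes tokens, so a 3-agent crew can use 3–5x the tokens of a single prompt. On local models this costs nothing in API fees, but it does mean longer run times.
  • Smaller models struggle with structured output: agents that need to return JSON or follow strict formats work better with 14B+ models.
  • Thinking mode and CrewAI: Qwen3 produces <think> reasoning blocks by default, which lengthen runs and can interfere with the formats CrewAI parses. You cannot switch this off from the model string (model="ollama/qwen3:14b --no-think" is not supported). Qwen3 does honor a /no_think soft switch, so you can add /no_think to a task description or backstory and avoid /think in your prompts. Separately, long multi-agent prompts can overflow Ollama's default context window. Raise num_ctx (for example with PARAMETER num_ctx in a Modelfile) if agents seem to lose earlier context.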
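If Qwen3's reasoning text still ends up in an agent's output, a small post-processing step can remove it before the result is saved or parsed. This is a minimal sketch. It assumes the reasoning is wrapped in Qwen3-style <think>...</think> tags. It is not a CrewAI feature, and strip_think is a hypothetical helper name.

```python
import re

# Non-greedy and DOTALL so each block is removed separately, even across newlines.
_THINK = re.compile(r"<think>.*?</think>", re.DOTALL)


def strip_think(text: str) -> str:
    """Remove <think>...</think> blocks, then trim surrounding whitespace."""
    return _THINK.sub("", text).strip()


if __name__ == "__main__":
    raw = "<think>\nThe user wants a title.\n</think>\n\nLocal LLMs with CrewAI"
    print(strip_think(raw))  # → Local LLMs with CrewAI
```

Applying this to the crew's final output is cheap. It keeps stray reasoning out of saved reports even when /no_think is forgotten in a prompt.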