CrewAI is an open-source Python framework for building multi-agent AI workflows — where instead of one model answering a question, you define a team of agents with different roles that collaborate to complete a task. Running it with Ollama keeps the entire pipeline local: no OpenAI API keys, no per-token costs, no data sent to external services. This guide shows you how to set it up and build your first crew.
What Is CrewAI?
CrewAI models how teams work. You define agents (each with a role, goal, and backstory), assign them tasks, wire them into a crew, and kick off the workflow. The agents can hand work off to each other, use tools like web search or file reading, and produce a final output that combines their individual contributions.
The framework has four core building blocks:
- Agent — a role with a defined purpose (e.g. “Senior Researcher”, “Technical Writer”)
- Task — a specific piece of work with an expected output
- Tool — capabilities agents can call, such as web search, file I/O, or custom functions
- Crew — the assembled team that executes the tasks in sequence or in parallel
Where a single LLM call is one request and one response, a CrewAI workflow is a pipeline. Each agent completes its task and passes its output forward as context for the next. The result is often more thorough than a single prompt because the work is broken into focused steps.
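To make the pipeline idea concrete, here is a minimal sketch in plain Python. It does not use CrewAI at all: `Step`, `run_pipeline`, and the two lambda "agents" are hypothetical stand-ins that only illustrate how each stage's output becomes the next stage's context.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """One pipeline stage: a role name plus a function that does the work."""
    role: str
    run: Callable[[str], str]  # takes the context so far, returns this step's output


def run_pipeline(steps: List[Step], initial_input: str) -> str:
    """Run the steps in order, feeding each step's output to the next one."""
    context = initial_input
    for step in steps:
        context = step.run(context)
    return context


# Stand-ins for LLM-backed agents: plain string functions, purely illustrative.
researcher = Step("Researcher", lambda topic: f"notes on {topic}")
writer = Step("Writer", lambda notes: f"article based on {notes}")

final_output = run_pipeline([researcher, writer], "local LLMs")
print(final_output)  # article based on notes on local LLMs
```

In a real crew each `run` function is replaced by an agent calling the model, but the hand-off pattern is the same.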
Why Run CrewAI with Ollama?
Most CrewAI tutorials use OpenAI as the LLM backend. Ollama lets you run the same workflows with no API costs and full privacy. A research-and-write workflow that might cost $0.10–$0.50 per run with GPT-4o incurs no per-token charge with a local model. For development, testing, and internal tooling, that difference adds up fast.
It also means your agent workflows can process confidential data — internal documents, client information, proprietary research — without that data leaving your network.
What You Need
- Ollama installed and running with at least one capable model pulled (qwen3:14b recommended; llama3.3 also works but is a 70B model and needs far more memory)
- Python 3.10 or higher
- pip
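A quick pre-flight check can confirm these prerequisites before you install anything. The sketch below uses only the standard library and assumes Ollama is listening on its default address; the function names are made up for this example. It queries Ollama's `/api/tags` endpoint, which lists the models already pulled.

```python
import json
import sys
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default address


def python_ok(version_info=sys.version_info) -> bool:
    """True if the interpreter is Python 3.10 or newer."""
    return tuple(version_info[:2]) >= (3, 10)


def model_is_pulled(tags_response: dict, model: str) -> bool:
    """Check a parsed /api/tags response for a model such as 'qwen3:14b'."""
    # Ollama lists untagged pulls as '<name>:latest'.
    wanted = model if ":" in model else f"{model}:latest"
    names = {m.get("name", "") for m in tags_response.get("models", [])}
    return wanted in names


def fetch_tags(base_url: str = OLLAMA_URL) -> dict:
    """Ask the local Ollama server which models are available."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return json.load(resp)


# Example, with Ollama running:
#     print(python_ok(), model_is_pulled(fetch_tags(), "qwen3:14b"))
```

If `fetch_tags()` raises a connection error, the Ollama server is most likely not running.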
Step 1 — Install CrewAI
pip install crewai crewai-tools
This installs the core framework and the optional tools package (web search, file read/write, scraping).
Step 2 — Configure Ollama as the LLM Backend
CrewAI uses LiteLLM under the hood, which supports Ollama natively. Set the model using the ollama/ prefix:
from crewai import Agent, Task, Crew, LLM

llm = LLM(
    model="ollama/qwen3:14b",
    base_url="http://localhost:11434"
)
That is the only configuration needed to point CrewAI at your local Ollama instance. Pass this llm object to each agent you create.
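If you switch between models or machines often, you may prefer to keep the model name and URL out of the script. The following is one possible sketch: `CREW_OLLAMA_MODEL` and `CREW_OLLAMA_URL` are hypothetical environment variable names invented for this example (neither CrewAI nor Ollama reads them), and `crewai` is imported lazily so the helper can be used without the package installed.

```python
import os


def ollama_llm_kwargs(env=os.environ) -> dict:
    """Build the keyword arguments for crewai.LLM from environment variables.

    CREW_OLLAMA_MODEL and CREW_OLLAMA_URL are hypothetical names chosen for
    this sketch; they are not read by CrewAI or Ollama themselves.
    """
    model = env.get("CREW_OLLAMA_MODEL", "qwen3:14b")
    base_url = env.get("CREW_OLLAMA_URL", "http://localhost:11434")
    # The ollama/ prefix selects the Ollama provider, as described above.
    if not model.startswith("ollama/"):
        model = f"ollama/{model}"
    return {"model": model, "base_url": base_url.rstrip("/")}


def build_llm():
    """Create the LLM object using the same two arguments shown above."""
    from crewai import LLM  # imported here so the helper works without crewai
    return LLM(**ollama_llm_kwargs())
```

With nothing set in the environment, `build_llm()` produces the same configuration as the snippet above.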
Step 3 — Build Your First Crew
Here is a complete working example: a two-agent crew that researches a topic and then writes a summary report.
from crewai import Agent, Task, Crew, LLM

llm = LLM(model="ollama/qwen3:14b", base_url="http://localhost:11434")

# Define agents
researcher = Agent(
    role="Senior Research Analyst",
    goal="Find comprehensive information about the given topic",
    backstory="You are an expert researcher with a talent for finding and synthesising information.",
    llm=llm,
    verbose=True
)

writer = Agent(
    role="Technical Writer",
    goal="Produce a clear, well-structured written summary",
    backstory="You turn complex research into readable, accurate reports.",
    llm=llm,
    verbose=True
)

# Define tasks
research_task = Task(
    description="Research the key advantages and use cases of running AI models locally using Ollama. Cover: privacy, cost, hardware requirements, and limitations.",
    expected_output="A detailed set of research notes covering the main points, minimum 500 words.",
    agent=researcher
)

write_task = Task(
    description="Using the research notes provided, write a clear summary article suitable for a technical blog audience.",
    expected_output="A structured article with introduction, main sections, and conclusion.",
    agent=writer,
    context=[research_task]
)

# Assemble and run the crew
crew = Crew(
    agents=[researcher, writer],
    tasks=[research_task, write_task],
    verbose=True
)

result = crew.kickoff()
print(result)
Save the script as crew.py and run it with python crew.py. The researcher agent completes its task first, then the writer agent receives that output as context and produces the final article. The full chain runs locally on your Ollama models.
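Printing the result is fine for a first run, but you will usually want to keep the article. Here is a small standard-library sketch that writes the final text to a timestamped file; `save_report`, the `reports` folder, and the file naming are all assumptions made for illustration.

```python
from datetime import datetime
from pathlib import Path


def save_report(text: str, out_dir: str = "reports", stem: str = "crew_report") -> Path:
    """Write the crew's final text to a timestamped Markdown file and return its path."""
    folder = Path(out_dir)
    folder.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    path = folder / f"{stem}-{stamp}.md"
    path.write_text(text, encoding="utf-8")
    return path


# In crew.py, after kickoff():
#     result = crew.kickoff()
#     print(save_report(str(result)))
```

`str(result)` gives the same text that `print(result)` shows, so the saved file matches what you see in the terminal.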
Adding Tools
Tools extend what your agents can do. The most commonly used are web search and file handling:
from crewai_tools import SerperDevTool, FileReadTool

search_tool = SerperDevTool()  # requires SERPER_API_KEY env var
file_tool = FileReadTool()

researcher = Agent(
    role="Senior Research Analyst",
    goal="Research the topic thoroughly using web search",
    backstory="Expert researcher who finds current, accurate information.",
    tools=[search_tool],
    llm=llm
)
Web search requires a Serper API key (free tier available). If you want fully local tooling, use FileReadTool, DirectoryReadTool, and CodeInterpreterTool, which need no external APIs.
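Custom functions are another way to stay fully local. The sketch below wraps an ordinary Python function as a tool. It assumes the `tool` decorator is importable from `crewai.tools`, as in recent CrewAI releases (older versions exposed it from `crewai_tools`), so check the import against your installed version. `count_words` and `make_word_count_tool` are names invented for this example.

```python
def count_words(text: str) -> str:
    """Count the words in a piece of text and return the total as a string."""
    return str(len(text.split()))


def make_word_count_tool():
    """Wrap count_words as a CrewAI tool.

    Assumes the `tool` decorator lives in crewai.tools; the import is done
    lazily so count_words itself can be used and tested without crewai.
    """
    from crewai.tools import tool
    return tool("Word counter")(count_words)


# Usage inside crew.py (requires crewai), adding the tool to the writer:
#     writer = Agent(
#         role="Technical Writer",
#         goal="Produce a clear, well-structured written summary",
#         backstory="You turn complex research into readable, accurate reports.",
#         tools=[make_word_count_tool()],
#         llm=llm,
#     )
```

The function's docstring and type hints matter here, because they become the description the agent sees when deciding whether to call the tool.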
Process Types
By default, CrewAI runs tasks sequentially: each agent completes its task before the next begins. The alternative is the hierarchical process, in which a manager agent coordinates the work:
from crewai import Process

crew = Crew(
    agents=[researcher, writer],
    tasks=[research_task, write_task],
    process=Process.hierarchical,  # manager agent coordinates the team
    manager_llm=llm
)
Hierarchical process adds a manager agent that plans, delegates, and reviews the work — useful for complex multi-step workflows where task dependencies are not linear.
Which Models Work Best
| Use Case | Recommended Model | Notes |
|---|---|---|
| General multi-agent workflows | qwen3:14b | Best balance of capability and speed |
| Complex reasoning / manager agent | qwen3:32b or llama3.3 | Better at instruction following and planning |
| Fast iteration / testing | qwen3:8b | Quick feedback loop during development |
| Low-spec hardware | qwen3:4b | Works but struggles with complex tasks |
Larger models are noticeably better at following the structured output formats CrewAI expects. If an agent is producing malformed responses or going off-task, switching to a larger model usually fixes it before any prompt engineering is needed.
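One way to make model switching painless is to keep the table's recommendations in a single lookup. This is a sketch only: the tags are copied from the table above, while the use-case keys and the `model_string` helper are labels invented for this example.

```python
# Tags copied from the table above; the keys are labels invented for this sketch.
MODEL_BY_USE_CASE = {
    "general": "qwen3:14b",
    "manager": "qwen3:32b",
    "testing": "qwen3:8b",
    "low_spec": "qwen3:4b",
}


def model_string(use_case: str) -> str:
    """Return the ollama/-prefixed model string for a use case.

    Unknown use cases fall back to the general-purpose recommendation.
    """
    tag = MODEL_BY_USE_CASE.get(use_case, MODEL_BY_USE_CASE["general"])
    return f"ollama/{tag}"


print(model_string("testing"))  # ollama/qwen3:8b
```

You can then pass `model_string("testing")` as the `model` argument while iterating, and change one word when an agent starts producing malformed output.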
Practical Use Cases
- Content pipelines — research agent finds information, writer agent drafts, editor agent refines
- Code review — analyst agent reads code, security agent checks for vulnerabilities, documentation agent writes summary
- Data analysis — data reader agent parses files, analyst agent interprets, reporter agent produces output
- Internal knowledge base queries — combine with file tools to process internal documents
- Automated reporting — schedule a crew to run nightly and produce summaries from data files
Limitations to Know
- Token cost multiplies — each agent turn consumes tokens. A 3-agent crew can use 3–5x the tokens of a single prompt. On local models this is free, but it does mean longer run times.
- Smaller models struggle with structured output — agents that need to return JSON or follow strict formats work better with 14B+ models.
- Thinking mode and CrewAI: if using Qwen3, disable thinking mode for agent tasks. A model string such as model="ollama/qwen3:14b --no-think" is not directly supported, but you can set num_ctx and avoid /think prompts in backstories.
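For the thinking-mode issue, two small helpers may be useful. This is a hedged sketch, not a CrewAI feature: it assumes Qwen3 wraps its reasoning in `<think>...</think>` tags and honours the `/no_think` soft switch described in Qwen3's documentation when served through Ollama. The function names are hypothetical.

```python
import re

# Qwen3 wraps its reasoning in <think>...</think> when thinking mode is on.
_THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", flags=re.DOTALL)


def strip_thinking(text: str) -> str:
    """Remove <think>...</think> sections so only the final answer remains."""
    return _THINK_BLOCK.sub("", text).strip()


def no_think(prompt: str) -> str:
    """Append Qwen3's /no_think soft switch to a prompt, unless already present."""
    trimmed = prompt.rstrip()
    return trimmed if trimmed.endswith("/no_think") else f"{trimmed} /no_think"


print(strip_thinking("<think>plan the answer</think>\nFinal answer"))  # Final answer
```

`no_think` could be applied to a task description, and `strip_thinking` to `str(result)` before saving, if reasoning text still leaks into the output.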
Related Guides
- What Is Ollama? A Beginner’s Guide to Local AI
- How to Run Qwen3 on Ollama
- Ollama + MCP: Building Local AI Agents Without the Cloud
- LibreChat + Ollama: Setup Guide
- AnythingLLM + Ollama: Chat with Your Documents Privately






