If you’re running n8n for workflow automation and Ollama for local large language models, combining the two gives you a genuinely useful AI automation stack that keeps every byte of data on your own infrastructure. No OpenAI API bills, no data leaving your network, and far fewer compliance headaches when processing customer information under GDPR. This guide walks through the architecture, how to connect the two tools, and three practical workflows you can build today.
The Architecture
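n8n and Ollama can run on the same server or on separate machines within the same local network. When co-located, n8n calls Ollama’s REST API at http://localhost:11434. If n8n itself runs in Docker, localhost refers to the n8n container, so use host.docker.internal (available on Docker Desktop, or via an extra_hosts mapping on Linux) or the Ollama service name on a shared Docker network instead. When running on separate hosts, for example Ollama on a dedicated GPU machine and n8n on your main server, use the host’s local IP or hostname: http://ollama-host:11434. Ollama listens only on 127.0.0.1 by default, so on a dedicated host set OLLAMA_HOST=0.0.0.0 before starting it. Ollama exposes a simple HTTP API with no authentication by default, so keep both services behind your firewall or VPN.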
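Ollama must already have the model you intend to use before n8n calls it. Run ollama pull llama3.2 (or your chosen model) on the Ollama host before building any workflows. Models are loaded into memory on the first request and, by default, stay loaded for five minutes after the last one (configurable with the keep_alive request parameter or the OLLAMA_KEEP_ALIVE environment variable), so the first call after a period of inactivity will be slower than subsequent ones.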
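To catch a missing model before a workflow fails, you can ask Ollama which models it has. The sketch below is a minimal Python check, assuming the standard /api/tags endpoint and treating a bare name such as llama3.2 as llama3.2:latest. The function names and the OLLAMA_URL default are illustrative assumptions, not part of n8n or Ollama.

```python
import json
import urllib.request

# Assumed default; use http://ollama-host:11434 when Ollama runs on another machine.
OLLAMA_URL = "http://localhost:11434"


def model_is_pulled(tags: dict, model: str) -> bool:
    """Return True if `model` appears in a parsed /api/tags response.

    Ollama lists models with an explicit tag (e.g. "llama3.2:latest"),
    so a bare name like "llama3.2" is compared as "llama3.2:latest".
    """
    wanted = model if ":" in model else f"{model}:latest"
    names = {entry.get("name", "") for entry in tags.get("models", [])}
    return wanted in names


def fetch_tags(base_url: str = OLLAMA_URL, timeout: float = 5.0) -> dict:
    """GET /api/tags, which lists the models available on the Ollama host."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)
```

Calling model_is_pulled(fetch_tags(), "llama3.2") from a deployment script or health check tells you whether to run ollama pull on the Ollama host before activating the workflows.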
Connecting n8n to Ollama
There are two approaches. The most universal is the HTTP Request node, which works in every version of n8n. Set the method to POST, the URL to http://localhost:11434/api/generate, and send the body as JSON:
{
  "model": "llama3.2",
  "prompt": "{{ $json.text }}",
  "stream": false
}
Setting stream to false makes Ollama return a single JSON response rather than a token-by-token stream, which is what you want for workflow automation. The response contains a response field with the model’s text output; extract it with an expression like {{ $json.response }} in subsequent nodes. If the incoming text can contain double quotes or line breaks, as emails and documents usually do, write the prompt line as "prompt": {{ JSON.stringify($json.text) }} (without surrounding quotes) so the body stays valid JSON.
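For debugging outside n8n, the same request can be reproduced in a few lines of Python. This is a minimal sketch using only the standard library; generate and build_generate_body are hypothetical helper names, and the 120-second timeout is an assumed value chosen to cover cold starts. Serialising the body with json.dumps also shows why escaping matters: quotes and newlines in the prompt are encoded safely instead of breaking the JSON.

```python
import json
import urllib.request


def build_generate_body(model: str, prompt: str) -> bytes:
    """Serialise an /api/generate request body with streaming disabled.

    json.dumps escapes quotes and newlines in the prompt, avoiding the
    invalid JSON you get from pasting raw text into a quoted string.
    """
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")


def generate(prompt: str, model: str = "llama3.2",
             base_url: str = "http://localhost:11434",
             timeout: float = 120.0) -> str:
    """POST a non-streaming request and return the model's text output."""
    req = urllib.request.Request(
        f"{base_url}/api/generate",
        data=build_generate_body(model, prompt),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        data = json.load(resp)
    # With stream=false the whole output arrives in the "response" field.
    return data["response"].strip()
```

If generate("Reply with OK only.") works from the n8n host but the HTTP Request node fails, the problem is in the node configuration rather than in Ollama.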
Alternatively, recent versions of n8n include built-in Ollama nodes in the AI section (look under “AI” when adding a node). The Ollama Chat Model and Ollama Model sub-nodes plug into root nodes such as Basic LLM Chain or AI Agent, which handle the API call and response parsing automatically and use LangChain under the hood. For simple prompt-in/text-out tasks, the HTTP Request node is more transparent and easier to debug.
Workflow 1: Email Categorisation
This workflow classifies inbound emails and applies Gmail labels automatically — no external AI API involved.
- Gmail Trigger — polls your inbox for new messages. Configure it with your Gmail credentials and set the poll interval (every minute is reasonable for email triage).
- HTTP Request node — calls Ollama with a classification prompt. Set the body to:
{"model": "llama3.2", "prompt": "Classify this email into exactly one category: Support, Sales, Spam, or Other. Reply with the category name only.\n\nSubject: {{ $json.subject }}\n\nBody: {{ $json.snippet }}", "stream": false}
Extract the result with {{ $json.response.trim() }}.
- Switch node — branch on the extracted category. Add a routing rule for each value (Support, Sales, Spam) so each category gets its own output. An IF node only has true and false outputs, so it suits a single check such as “Spam or not”.
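- Gmail node — on each branch, apply the relevant label to the message. Because the HTTP Request node’s output replaces the trigger’s data, reference the message ID from the trigger with an expression such as {{ $('Gmail Trigger').item.json.id }}, adjusting the node name to match your workflow.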
Recommended model: llama3.2 handles short classification prompts well and is fast enough for near-real-time email processing. Keep your prompt concise and instruct the model to return only the category name — this prevents parsing issues when extracting the response.
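Even with a strict prompt, a model may answer “Sales.” or “ spam”. One defensive option is to normalise the reply before routing it. The Python sketch below shows that logic; the same idea could be ported to a Code node between the HTTP Request and Switch nodes. Falling back to Other is an assumption, chosen so every email reaches some branch.

```python
CATEGORIES = ("Support", "Sales", "Spam", "Other")


def normalise_category(reply: str) -> str:
    """Map a raw model reply onto one of CATEGORIES.

    Strips whitespace, surrounding quotes and trailing punctuation, then
    compares case-insensitively; anything unrecognised becomes "Other".
    """
    cleaned = reply.strip().strip(".:!\"'").strip()
    for category in CATEGORIES:
        if cleaned.lower() == category.lower():
            return category
    return "Other"
```

Routing on the normalised value means the Switch rules only ever need to match four exact strings.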
Workflow 2: Document Summarisation API
This workflow exposes a summarisation endpoint your team can call from any tool — a browser extension, a script, or another workflow.
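- Webhook Trigger — listens for POST requests containing a text field in the JSON body. n8n gives you a unique webhook URL when you activate the workflow. Set the node’s Respond option to “Using ‘Respond to Webhook’ Node” so the caller receives the summary rather than an immediate acknowledgement.
- HTTP Request node — calls Ollama with the submitted text:
{"model": "mistral", "prompt": "Summarise the following in exactly 3 bullet points. Be concise.\n\n{{ $json.body.text }}", "stream": false}
Extract the response field.
- Respond to Webhook node — returns the summary to the caller with a 200 status code. Because the summary contains line breaks, build the body as {{ JSON.stringify({ summary: $json.response }) }} rather than pasting the expression into a hand-written string such as {"summary": "{{ $json.response }}"}, which produces invalid JSON when the text contains newlines or quotes.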
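Any script that can send JSON over HTTP can use the endpoint. The following Python sketch is a hypothetical client: the webhook URL you pass in is a placeholder for the one n8n shows on your Webhook node, and the 60-second timeout is an assumption sized for local inference, which can take tens of seconds.

```python
import json
import urllib.request


def build_summary_request(webhook_url: str, text: str) -> urllib.request.Request:
    """Build a POST carrying {"text": ...}, the field the Webhook Trigger reads."""
    return urllib.request.Request(
        webhook_url,
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarise(webhook_url: str, text: str, timeout: float = 60.0) -> str:
    """Call the workflow and return the "summary" field of its JSON reply."""
    with urllib.request.urlopen(build_summary_request(webhook_url, text), timeout=timeout) as resp:
        return json.load(resp)["summary"]
```

A short timeout is the most common reason such a client reports failures even though the workflow eventually completes.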
Recommended model: Mistral performs well on summarisation tasks and tends to follow structured output instructions reliably. If you’re summarising longer documents, chunk the input before sending it. The usable context depends on the model and on the num_ctx setting Ollama runs it with, and input beyond that limit is truncated, so sending 20,000 words in a single prompt will cause errors or silently lose much of the document.
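A simple way to chunk is by word count: summarise each chunk, then summarise the combined summaries. The Python sketch below splits text into overlapping word windows. The 1,500-word default and 100-word overlap are assumptions, since words map only roughly onto tokens and the right size depends on the model and its num_ctx setting.

```python
def chunk_words(text: str, max_words: int = 1500, overlap: int = 100) -> list[str]:
    """Split text into chunks of at most max_words words.

    Consecutive chunks share `overlap` words so that sentences cut at a
    boundary still appear whole in at least one chunk.
    """
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    chunks = []
    step = max_words - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once this window reaches the end of the text.
        if start + max_words >= len(words):
            break
    return chunks
```

Each chunk can then be sent to the webhook (or straight to Ollama) on its own, keeping every request within the model’s context.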
This workflow is latency-tolerant by design: the caller waits while the model runs, and local inference typically takes 5–30 seconds depending on your hardware and model size. For internal tools where that wait is acceptable, this is perfectly usable, provided the calling script or extension allows a long enough timeout. For customer-facing real-time responses, cloud APIs will be faster, but that defeats the purpose of keeping data local.
Workflow 3: Weekly Content Ideas Digest
This workflow generates blog post ideas on a schedule and delivers them as a weekly digest.
- Schedule Trigger — set to run every Monday at 8am. No external service needed.
- HTTP Request node — calls Ollama with a content generation prompt:
{"model": "llama3.2", "prompt": "Generate exactly 5 blog post ideas for a B2B SaaS company selling CRM software to UK small businesses. Format each idea as a title only, one per line.", "stream": false}
Adjust the topic to match your site, and extract {{ $json.response }}.
- Google Sheets node — append the generated ideas to a spreadsheet. Use the “Append Row” operation and map the date and the ideas text to appropriate columns. This builds a running backlog over time.
- Send Email node (Gmail or SMTP) — send a digest to yourself or your content team with the five ideas in the message body. Map the Ollama response directly into the email body field.
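Models often add numbering or bullets even when asked for bare titles, so it can help to clean the output before it reaches the spreadsheet. This Python sketch assumes one idea per line and shows one possible row layout (ISO date in the first column, ideas joined by newlines in the second); the helper names are hypothetical.

```python
import re
from datetime import date

# List prefixes such as "1.", "2)", "-", "*" or "•" at the start of a line.
_PREFIX = re.compile(r"^\s*(?:\d+[.)]|[-*•])\s*")


def parse_ideas(response: str) -> list[str]:
    """Turn one-title-per-line model output into a clean list of titles."""
    ideas = []
    for line in response.splitlines():
        # Drop list markers, surrounding whitespace and wrapping quotes.
        title = _PREFIX.sub("", line).strip().strip('"')
        if title:
            ideas.append(title)
    return ideas


def sheet_row(response: str, run_date: date) -> list[str]:
    """Build a [date, ideas] row with the ideas newline-joined into one cell."""
    return [run_date.isoformat(), "\n".join(parse_ideas(response))]
```

Note that a title such as “10 Ways to Grow” is left intact, because the pattern only strips numbers followed by “.” or “)”.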
Recommended model: llama3.2 is a solid general-purpose choice here. If you find the ideas are too generic, try adding more context to your prompt — industry, target audience, content format preferences — and iterate. Because this runs weekly and asynchronously, inference time is irrelevant.
Handling Ollama’s Response Format
Every non-streaming response from /api/generate is a JSON object, and the field you need is always response. In n8n expressions, access it as {{ $json.response }}. Watch for leading or trailing whitespace: use .trim() wherever you’re matching against expected values (such as category names in a Switch or IF node). If a model returns unexpected formatting, tighten your prompt by instructing it explicitly to reply with only the required output and nothing else.
Performance Considerations
Local inference is slower than cloud APIs. A GPU-accelerated Ollama instance (NVIDIA with CUDA, or Apple Silicon) will typically process a prompt in 2–10 seconds for a 7B parameter model such as Mistral; the default llama3.2 tag is a smaller 3B model and is usually quicker. CPU-only inference on the same model can take 30–90 seconds. This is fine for asynchronous workflows such as email triage, scheduled generation, and webhook-based tools, but unsuitable for anything requiring a sub-second response.
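Rather than relying on rough figures, you can measure your own hardware. A non-streaming /api/generate response includes timing fields reported in nanoseconds, among them load_duration, total_duration, eval_count and eval_duration. The Python sketch below converts them into seconds and tokens per second; the helper name is hypothetical.

```python
NS_PER_SECOND = 1_000_000_000


def timing_summary(resp: dict) -> dict:
    """Summarise the timing fields of a parsed /api/generate response."""
    eval_ns = resp.get("eval_duration", 0)
    tokens = resp.get("eval_count", 0)
    return {
        # Time spent loading the model; large on a cold start.
        "load_s": resp.get("load_duration", 0) / NS_PER_SECOND,
        # Total request time as measured by Ollama.
        "total_s": resp.get("total_duration", 0) / NS_PER_SECOND,
        # Generation speed; guarded against a zero duration.
        "tokens_per_s": tokens / (eval_ns / NS_PER_SECOND) if eval_ns else 0.0,
    }
```

A large load_s on the first call after a quiet period, followed by near-zero values on later calls, is the keep-alive behaviour described earlier in practice.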
For UK businesses processing customer data, the trade-off is compelling: you eliminate per-request AI API costs entirely, all data is processed on your own hardware, and you have a clear, auditable data flow with no third-party AI provider whose data processing agreement you need to review, which simplifies your GDPR obligations. For workflows handling support tickets, customer emails, or internal documents, keeping inference local is a straightforward way to reduce both cost and compliance risk.