A joint analysis by SentinelOne and Censys published in early 2026 scanned the internet for 293 days and found 175,000 unique Ollama hosts exposed across 130 countries — many running multiple AI models with no authentication, no firewall, and no protection whatsoever. Nearly half had tool-calling capabilities enabled, meaning an attacker could not just steal your compute but potentially execute code on your server. Ollama ships with zero authentication by default, and every tutorial that tells you to set OLLAMA_HOST=0.0.0.0 to enable remote access will make your server one of those 175,000 if you are on a public VPS. This guide shows you the right way to expose Ollama safely — and what the wrong way costs.
Are You Already Exposed? How to Check
Before changing anything, confirm your current exposure. Run this from the server:
ss -tlnp | grep 11434
If the output shows 0.0.0.0:11434 or *:11434, Ollama is listening on all interfaces. If your server has a public IP and no firewall blocking port 11434, you are exposed right now. If it shows 127.0.0.1:11434, you are fine — Ollama is local-only.
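If you check several machines, the three cases above can be turned into a small helper. This is a minimal sketch, not an official tool: `classify_ollama_bind` is a hypothetical name, and it assumes the local address is the fourth column of `ss -tln` output, which holds when only TCP sockets are listed.

```shell
# Classify Ollama's bind address from `ss -tln` output read on stdin.
# Prints one of: exposed, local-only, other-interface, not-listening.
# Assumption: the local address is column 4 of the ss output.
classify_ollama_bind() {
  local port
  port="${1:-11434}"
  awk -v port="$port" '
    BEGIN { result = "not-listening" }
    # Wildcard binds: reachable on every interface.
    $4 == ("0.0.0.0:" port) || $4 == ("*:" port) || $4 == ("[::]:" port) {
      result = "exposed"; exit
    }
    # Loopback binds: reachable only from this machine.
    $4 == ("127.0.0.1:" port) || $4 == ("[::1]:" port) {
      if (result == "not-listening") result = "local-only"
      next
    }
    # Any other address ending in :port, for example a VPN interface.
    substr($4, length($4) - length(port)) == (":" port) {
      result = "other-interface"
    }
    END { print result }
  '
}
```

Run it as `ss -tln | classify_ollama_bind`. A result of `other-interface` means Ollama is bound to one specific address, and you still need to confirm that address is not public.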
To check whether port 11434 is reachable from outside, you can test from a separate machine:
curl -s http://YOUR_SERVER_PUBLIC_IP:11434/api/tags
If this returns a JSON list of your installed models, the API is publicly accessible with no authentication. Treat this as an active security incident and remediate immediately using the steps in this guide.
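The external check can be scripted for repeated use. The sketch below assumes the `/api/tags` response is a JSON object whose first key is `models`; both function names are hypothetical, and the pattern match is a rough heuristic rather than a JSON parser.

```shell
# Return 0 if stdin looks like an Ollama /api/tags response,
# i.e. a JSON object that opens with a "models" array.
looks_like_ollama_tags() {
  grep -Eq '^[[:space:]]*\{[[:space:]]*"models"[[:space:]]*:[[:space:]]*\['
}

# Probe a host from OUTSIDE the server. The host argument is a placeholder
# for your server's public IP; the timeout of 5 seconds is an arbitrary choice.
probe_ollama() {
  local host port
  host="$1"
  port="${2:-11434}"
  if curl -s --max-time 5 "http://$host:$port/api/tags" | looks_like_ollama_tags; then
    echo "EXPOSED: $host:$port answers /api/tags without authentication"
  else
    echo "no unauthenticated Ollama response from $host:$port"
  fi
}
```

Call it as `probe_ollama YOUR_SERVER_PUBLIC_IP` from a machine that is not on the same private network as the server.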
Also check your firewall rules:
sudo ufw status | grep 11434
If nothing is returned, no UFW rule mentions port 11434. The port is then governed by UFW’s default incoming policy (check it with sudo ufw status verbose), or by nothing at all if UFW is inactive. Whether that matters depends on your server’s network configuration: a cloud provider security group may protect the port at the infrastructure level, but relying solely on a cloud firewall without application-level defence is a single point of failure.
Why OLLAMA_HOST=0.0.0.0 on a Public VPS Is a Security Mistake
Ollama’s default configuration binds to 127.0.0.1:11434, which means only processes on the same machine can reach the API. When you change this to OLLAMA_HOST=0.0.0.0, Ollama binds to every network interface simultaneously: loopback, LAN, any VPN adapters, and on a public VPS the internet-facing NIC.
There is no authentication layer in Ollama itself. The API does not support API keys, bearer tokens, or any login mechanism out of the box. The moment port 11434 is reachable from the internet, every one of these endpoints is fully open:
- GET /api/tags: lists all installed models, revealing your project structure and any fine-tuned proprietary models
- POST /api/generate: runs inference at your expense
- POST /api/pull: downloads any model from the internet onto your server, consuming disk space and bandwidth
- POST /api/push: can be used to inject a modified or backdoored model into your registry
- DELETE /api/delete: deletes any installed model with no confirmation prompt
On a home network where port 11434 is not forwarded, setting OLLAMA_HOST=0.0.0.0 is relatively low-risk — your router provides a basic perimeter. On a VPS or cloud instance with a public IP, it is immediate and silent exposure.
What an Attacker Can Do with Your Exposed Ollama API
This is not a theoretical risk. The Censys/SentinelOne research documented an active criminal marketplace (attributed to a threat actor known as “Hecker”) reselling hijacked AI compute — a practice now called LLMjacking. Here is what actually happens:
- Free inference (LLMjacking): the most common attack. Spam /api/generate with requests and consume your GPU. One analysis estimated $46,000/day in compute abuse from a single high-spec exposed server. You pay the electricity and hosting bill; the attacker gets the output.
- Model exfiltration: pull your fine-tuned models via the copy endpoints. If your model was trained on proprietary data, customer records, or internal documents, those weights are now exfiltrated.
- Model poisoning: replace a legitimate model with a backdoored variant via /api/push. Any application using that model will then be running attacker-controlled inference without any visible indication.
- Denial of service: trigger maximum-length inference requests to exhaust GPU memory and crash the Ollama service, or repeatedly pull large models to fill the disk.
- Lateral movement: if tool-calling is enabled, attackers can craft prompts that cause the model to make outbound HTTP calls, interact with internal APIs, or execute shell commands, pivoting from the Ollama port into the wider network.
UpGuard independently identified six critical vulnerabilities in the Ollama framework. Separately, CVE-2024-28224 is a DNS rebinding flaw in Ollama versions before 0.1.29: a malicious web page can use it to reach the full API through a victim’s browser, even when Ollama is bound only to localhost.
The Right Way: Tailscale for Secure Remote Access
Tailscale is the cleanest solution for accessing Ollama from another machine without exposing it to the internet. It creates a private encrypted network (a “tailnet”) between your devices — Ollama on the server is reachable only by machines you have explicitly authorised. There are no open ports, no public IPs, and no passwords to manage on the Ollama side.
Step 1: Install Tailscale on your server
curl -fsSL https://tailscale.com/install.sh | sh
sudo tailscale up
Log in with your Tailscale account when prompted. Your server will appear in the Tailscale admin console at tailscale.com.
Step 2: Get your server’s Tailscale IP
tailscale ip -4
This returns a stable 100.x.x.x address. You will bind Ollama to this address so it only listens on the Tailscale interface.
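Before writing the address into a service file, it is worth checking that it really is a tailnet address and not a public one. Tailscale assigns IPv4 addresses from the 100.64.0.0/10 range, so the check reduces to a range test. A minimal sketch follows; `is_tailscale_ip` is a hypothetical helper name.

```shell
# Return 0 if the argument is an IPv4 address inside 100.64.0.0/10,
# the range Tailscale assigns tailnet addresses from.
is_tailscale_ip() {
  printf '%s\n' "$1" | awk -F. '
    NF != 4 { exit 1 }
    { for (i = 1; i <= 4; i++) if ($i !~ /^[0-9]+$/ || $i > 255) exit 1 }
    # 100.64.0.0/10 spans 100.64.0.0 through 100.127.255.255.
    { exit !($1 == 100 && $2 >= 64 && $2 <= 127) }
  '
}
```

For example, `is_tailscale_ip "$(tailscale ip -4)" && echo "ok to bind"` prints the message only when the address falls in that range.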
Step 3: Bind Ollama to the Tailscale IP via systemd
sudo systemctl edit ollama.service
In the override editor that opens, add:
[Service]
Environment="OLLAMA_HOST=100.x.x.x:11434"
Replace 100.x.x.x with your actual Tailscale IP. Save, then reload:
sudo systemctl daemon-reload
sudo systemctl restart ollama
Step 4: Verify the bind address
ss -tlnp | grep 11434
You should now see 100.x.x.x:11434 — not 0.0.0.0:11434. If 0.0.0.0 still appears, the systemd override did not apply; check the file was saved to the correct override path (/etc/systemd/system/ollama.service.d/override.conf).
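If the override does not seem to apply, reading the value back from the drop-in file rules out a typo. The sketch below assumes the `Environment="OLLAMA_HOST=..."` line format shown in Step 3; `ollama_host_from_override` is a hypothetical helper, and it does not handle multiple variables on one `Environment=` line.

```shell
# Print the OLLAMA_HOST value set in a systemd drop-in file.
# Returns 1 if the file is unreadable or sets no OLLAMA_HOST.
ollama_host_from_override() {
  local file
  file="${1:-/etc/systemd/system/ollama.service.d/override.conf}"
  [ -r "$file" ] || return 1
  # Match lines like: Environment="OLLAMA_HOST=100.x.x.x:11434"
  # If the variable is set more than once, keep the last value.
  sed -n 's/^[[:space:]]*Environment="\{0,1\}OLLAMA_HOST=\([^"[:space:]]*\)"\{0,1\}[[:space:]]*$/\1/p' "$file" \
    | tail -n 1 | grep .
}
```

Running `ollama_host_from_override` with no argument reads the default override path named above.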
Step 5: Lock it down with UFW as a second layer
sudo ufw allow in on tailscale0 to any port 11434 proto tcp
sudo ufw deny in to any port 11434 proto tcp
sudo ufw reload
This adds defence-in-depth: even if Ollama’s bind address were ever misconfigured, UFW blocks port 11434 on all non-Tailscale interfaces.
From any other device enrolled in your tailnet, Ollama is now reachable at http://100.x.x.x:11434 or via MagicDNS at http://your-server.your-tailnet.ts.net:11434. Install Tailscale on your laptop or desktop, enroll it in the same tailnet, and access your models as if they were local.
Tailscale Serve — the zero-config option
If you do not want to change Ollama’s bind address at all, Tailscale Serve proxies the local API through your tailnet with automatic TLS in one command:
tailscale serve --bg http://localhost:11434
Ollama stays bound to 127.0.0.1. Tailscale handles routing, TLS, and access control. Only authenticated tailnet members can reach it.
If You Must Expose Ollama Publicly: Nginx + Basic Auth
In situations where a VPN is not an option — for example, a shared API endpoint for a small team — a reverse proxy with authentication is the minimum viable protection. This is still internet-facing, so it carries more risk than Tailscale, but it adds an authentication layer that the raw Ollama API completely lacks.
Keep Ollama bound to 127.0.0.1 (the default), then put nginx in front of it:
server {
    listen 443 ssl;
    server_name ollama.yourdomain.com;

    ssl_certificate     /etc/letsencrypt/live/ollama.yourdomain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/ollama.yourdomain.com/privkey.pem;
    ssl_protocols       TLSv1.3;

    location / {
        auth_basic           "Ollama";
        auth_basic_user_file /etc/nginx/.htpasswd;

        proxy_pass       http://127.0.0.1:11434;
        proxy_set_header Host $host;

        proxy_buffering          off;
        proxy_read_timeout       300s;
        chunked_transfer_encoding on;
    }
}
Two settings most guides omit: proxy_buffering off is required for Ollama’s streaming token output to work — with buffering on, responses hang or arrive all at once after a long delay. proxy_read_timeout 300s prevents nginx from killing long inference requests mid-generation.
Create the password file:
sudo apt install apache2-utils
sudo htpasswd -c /etc/nginx/.htpasswd your_username
For more on working with Ollama’s API including authentication headers in client requests, see the Ollama REST API complete developer guide.
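On the client side, HTTP Basic authentication is a single header: the word `Basic` followed by the Base64 encoding of `username:password`. The sketch below builds that header by hand to show what the proxy expects; `basic_auth_header` is a hypothetical helper, and in practice `curl -u your_username` produces the same header for you.

```shell
# Build the HTTP Basic Authorization header for a username and password.
# The credential is only Base64-encoded, not encrypted, so it must
# travel over TLS.
basic_auth_header() {
  printf 'Authorization: Basic %s\n' \
    "$(printf '%s:%s' "$1" "$2" | base64 | tr -d '\n')"
}
```

For example, `curl -H "$(basic_auth_header your_username 'your_password')" https://ollama.yourdomain.com/api/tags` should return the model list, while the same request without the header should be rejected by nginx with a 401.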
Cloudflare Tunnel: A Third Option
If you do not want to run Tailscale and do not have a domain for nginx TLS, Cloudflare Tunnel is another zero-port-forwarding option. It creates an outbound-only tunnel from your server to Cloudflare’s edge, and you access Ollama via a Cloudflare-managed URL with optional access policies.
# Install cloudflared
curl -L https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64.deb -o cloudflared.deb
sudo dpkg -i cloudflared.deb
# Authenticate and create a tunnel
cloudflared tunnel login
cloudflared tunnel create ollama-tunnel
Configure the tunnel to proxy localhost:11434, then add a Cloudflare Access policy requiring authentication before requests reach your server. This keeps Ollama on 127.0.0.1, adds MFA-backed access control, and requires no inbound firewall rules.
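One way to configure the tunnel is a cloudflared config file with an ingress rule pointing at the local Ollama port. The following is a sketch under assumptions: the hostname is a placeholder for a name in a Cloudflare-managed zone, the credentials path assumes the default location that `cloudflared tunnel create` writes to, and `write_tunnel_config` is a hypothetical helper.

```shell
# Write a cloudflared config that forwards one hostname to local Ollama.
# Arguments: tunnel ID, public hostname (placeholder), output file path.
write_tunnel_config() {
  local tunnel_id hostname out
  tunnel_id="$1"
  hostname="$2"
  out="$3"
  cat > "$out" <<EOF
tunnel: $tunnel_id
credentials-file: $HOME/.cloudflared/$tunnel_id.json
ingress:
  - hostname: $hostname
    service: http://localhost:11434
  # Catch-all rule: anything that matches no hostname gets a 404.
  - service: http_status:404
EOF
}
```

After writing the file to `~/.cloudflared/config.yml`, the tunnel would typically be started with `cloudflared tunnel run ollama-tunnel`. The Access policy itself is configured in the Cloudflare dashboard and is not covered by this file.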
The trade-off versus Tailscale: Cloudflare Tunnel routes traffic through Cloudflare’s infrastructure, and because TLS terminates at their edge, your requests and model outputs are visible to Cloudflare. Tailscale’s WireGuard tunnels are end-to-end encrypted and usually peer-to-peer; when a direct connection is not possible, traffic falls back to Tailscale’s relay servers, which see only encrypted packets. For home lab use with non-sensitive models, Cloudflare Tunnel is a perfectly reasonable option. For proprietary models or production use, Tailscale’s architecture is preferable.
Rate Limiting: Stop Inference Abuse Even with Auth
A compromised credential or a runaway client can still drain your GPU even behind basic auth. Rate limiting at the nginx level caps how many inference requests any single IP can make per minute:
http {
    limit_req_zone $binary_remote_addr zone=ollama:10m rate=5r/m;
}

server {
    location / {
        limit_req zone=ollama burst=3 nodelay;
        # ... rest of proxy config
    }
}
Five requests per minute with a burst of three is a reasonable baseline for a personal instance. If you are running a small team setup, increase the rate and consider keying the zone on $http_authorization instead of $binary_remote_addr to rate-limit per credential rather than per IP.
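To see what these numbers mean in practice, the sketch below simulates the accounting for a single client. It is a simplified model of the leaky-bucket behaviour nginx documents for limit_req, not nginx’s actual code: the rate of 5r/m drains one request’s worth of allowance every 12 seconds, and burst=3 with nodelay lets up to three requests above that rate through immediately. `simulate_limit_req` is a hypothetical name.

```shell
# Simulate limit_req with rate=5r/m, burst=3, nodelay for one client.
# Arguments: request arrival times in whole seconds, in ascending order.
# Prints "<time> accept" or "<time> reject" for each request.
simulate_limit_req() {
  local rate_per_min burst excess last t drained candidate
  rate_per_min=5
  burst=3
  excess=0   # outstanding allowance, in thousandths of a request
  last=-1    # arrival time of the last accepted request
  for t in "$@"; do
    if [ "$last" -lt 0 ]; then
      # First request from this client: nothing outstanding yet.
      candidate=0
    else
      # Allowance drained since the last accepted request.
      drained=$(( (t - last) * 1000 * rate_per_min * 1000 / 60000 ))
      candidate=$(( excess - drained ))
      if [ "$candidate" -lt 0 ]; then candidate=0; fi
      candidate=$(( candidate + 1000 ))
    fi
    if [ "$candidate" -gt $(( burst * 1000 )) ]; then
      echo "$t reject"   # rejected requests do not change the state
    else
      echo "$t accept"
      excess=$candidate
      last=$t
    fi
  done
}
```

Running `simulate_limit_req 0 0 0 0 0 12` accepts the first four requests (one at the base rate plus three burst), rejects the fifth, and accepts the request at 12 seconds once one slot has drained. In nginx a rejected request receives a 503 by default.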
Ollama Security Checklist
Use this before exposing Ollama to any network beyond localhost:
- ✅ Ollama is NOT binding to 0.0.0.0 on a public VPS: verify with ss -tlnp | grep 11434
- ✅ Remote access uses Tailscale or nginx with authentication, never raw port 11434 directly to the internet
- ✅ UFW blocks port 11434 on all non-Tailscale interfaces — defence-in-depth even if bind config is wrong
- ✅ Rate limiting is configured on any public-facing proxy
- ✅ TLS is enforced — Tailscale handles this automatically; nginx requires a certificate (use Certbot)
- ✅ Fine-tuned or proprietary models are stored only on trusted servers — not on any machine with public exposure
- ✅ Ollama is updated regularly — CVE-2024-28224 and other vulnerabilities were patched in subsequent releases. See how to update Ollama and its models for the process.
- ✅ Tool-calling access is restricted — if tool use is enabled, treat the API like shell access. Only fully trusted clients should be able to reach a tool-calling-enabled Ollama instance.
If you are running Open WebUI in front of Ollama to provide a chat interface for your team, the proxy security considerations apply there too — see the Open WebUI setup guide for the recommended nginx configuration.






