
How to Run Ollama on a Home Server


Learning how to run Ollama on a home server is one of the most practical steps a UK small business can take towards using artificial intelligence without paying monthly subscription fees or sending sensitive data to third-party cloud providers. Ollama is an open-source tool that lets you download and run large language models locally, giving you a private, self-hosted AI assistant that runs entirely on your own hardware.

For IT managers and business owners who want control over their data, Ollama on a home server or repurposed office machine is a genuinely compelling option. This guide walks you through exactly what you need, how to set it up, and what to expect from day-to-day use.


What Is Ollama and Why Should UK Businesses Care?

Ollama is a free, open-source application that allows you to pull and run large language models (LLMs) directly on your own machine. Think of it as a local version of ChatGPT, but one that never leaves your network. You can run models such as Llama 3, Mistral, Gemma 2, Phi-3, and dozens of others, all through a simple command-line interface or via a web front-end.

For UK businesses handling customer data, financial records, or any information subject to GDPR, this is significant. When you run a model locally, your prompts and responses never leave your premises. There is no API key, no usage bill, and no risk of your business data being used to train a commercial model. Once the model is downloaded, it works entirely offline if needed.

Ollama is compatible with Linux, macOS, and Windows. For a home server setup, Linux is the most common and best-supported operating system, with Ubuntu Server or Debian being popular choices. Windows Server environments can also run Ollama natively, which makes it easy to test on an existing machine before committing to a dedicated box.


What Hardware Do You Actually Need?

The hardware requirements for running Ollama depend heavily on which model you want to use. Smaller models like Phi-3 Mini or Gemma 2 2B can run on modest hardware with 8 GB of RAM, while larger models like Llama 3 70B require a machine with 64 GB of RAM or a dedicated GPU with significant VRAM. For most small business use cases, a middle-ground setup is perfectly adequate.

GPU acceleration dramatically improves response times. Ollama supports NVIDIA GPUs via CUDA and AMD GPUs via ROCm. If your server has a consumer-grade NVIDIA card such as an RTX 3060 (12 GB VRAM) or better, you will get fast, near-instant responses from 7B parameter models. Without a GPU, Ollama falls back to CPU-only inference, which works but is considerably slower, especially for longer responses.

Use Case                    | Recommended RAM | GPU VRAM     | Suitable Models
Light testing / drafting    | 8 GB            | Not required | Phi-3 Mini, Gemma 2 2B
Day-to-day business tasks   | 16 GB           | 6-8 GB       | Mistral 7B, Llama 3 8B
Complex reasoning / coding  | 32 GB           | 12-16 GB     | Llama 3 70B (quantised)
High-performance local AI   | 64 GB+          | 24 GB+       | Llama 3 70B (full), DeepSeek

A repurposed desktop PC or a small form-factor machine such as a mini PC running an Intel Core i7 or AMD Ryzen 7 with 32 GB of RAM and an RTX 3060 will handle most SME workloads comfortably. You can source used hardware from UK suppliers on eBay, Scan, or Overclockers, often for well under £600 for a capable setup. If you are already running a NAS for file storage, check whether it has the CPU headroom to run Ollama in CPU-only mode for lighter tasks.
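
Once Ollama is installed (covered in the next section), it is easy to confirm whether a model is actually running on the GPU or has quietly fallen back to CPU. A quick check, assuming the Mistral model has been pulled and an NVIDIA card is fitted:

ollama run mistral "Say hello"   # one-off prompt that loads the model into memory
ollama ps                        # shows loaded models and whether they are running on GPU or CPU
nvidia-smi                       # VRAM usage should rise while the model is loaded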


Step-by-Step: Installing Ollama on a Linux Home Server

The installation process on Linux is remarkably straightforward. Ollama provides a one-line install script that handles everything automatically, including CUDA detection if you have an NVIDIA GPU installed. Before starting, make sure your server is running a supported Linux distribution such as Ubuntu 22.04 LTS or Debian 12, that you have sudo access, and that your NVIDIA drivers are installed if applicable.
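
If you want to confirm those prerequisites before running the installer, two quick checks cover it (the second only applies if the server has an NVIDIA card):

cat /etc/os-release   # confirms the distribution and version
nvidia-smi            # confirms the NVIDIA driver is installed and the GPU is visible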

Open a terminal and run the following command to download and install Ollama:

curl -fsSL https://ollama.com/install.sh | sh

Once the installation completes, Ollama runs as a background service. You can verify it is active by running systemctl status ollama. To pull your first model, use the ollama pull command followed by the model name. For example:

  • ollama pull mistral downloads the Mistral 7B model (around 4 GB)
  • ollama pull llama3 downloads Meta’s Llama 3 8B model (around 4.7 GB)
  • ollama pull phi3 downloads Microsoft’s Phi-3 Mini (around 2.3 GB)
  • ollama pull gemma2 downloads Google’s Gemma 2 9B model (around 5.4 GB)

Once a model is downloaded, you can chat with it directly from the terminal using ollama run mistral, replacing the model name as appropriate. Responses are streamed token by token, just as you would see with an online AI tool. To exit, type /bye and press Enter. Models are stored locally (under ~/.ollama/models when Ollama runs under your own user account; a Linux service install keeps them under the ollama service account's home directory instead) and do not need to be re-downloaded unless you delete them.
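
Two quick checks confirm the setup is healthy: ollama list shows every model stored locally, and because Ollama also exposes a REST API on port 11434, you can send it a one-off prompt with curl. A minimal example against the Mistral model, assuming it has been pulled:

ollama list
curl http://localhost:11434/api/generate -d '{"model": "mistral", "prompt": "Explain in one sentence what a local LLM is.", "stream": false}'

The same API is what web front-ends such as Open WebUI use behind the scenes.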


Installing Ollama on Windows Server or Windows 11

If your home server runs Windows, the setup is equally simple. Visit the official Ollama website at ollama.com and download the Windows installer. Run the .exe file and Ollama will install as a background service. Once installed, open a Command Prompt or PowerShell window and use the same ollama pull and ollama run commands as on Linux.

Windows GPU support requires an NVIDIA GPU with up-to-date drivers and CUDA installed. AMD GPU support on Windows is more limited at the time of writing, so CPU-only inference may be your starting point on AMD hardware in a Windows environment. Response times on CPU are acceptable for short prompts but can feel slow for longer generation tasks. If you want to run a quick network check while setting things up, our guide on how to test if a network port is open using PowerShell is useful for verifying that Ollama's API port (11434) is accessible on your local network.
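
A quick way to confirm reachability from another device is a plain request against that port (192.168.1.50 is a placeholder for your server's address, and the request will only succeed from other machines once OLLAMA_HOST has been changed as described below):

curl http://192.168.1.50:11434/

A healthy install replies with the short message "Ollama is running".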

One important note for Windows users: Ollama’s API listens on localhost by default. If you want other devices on your network to access the model (for example, to connect a web interface from another PC), you will need to set the environment variable OLLAMA_HOST=0.0.0.0 before starting the service. On Windows, you can do this through System Properties, then Environment Variables, then add a new system variable with that name and value.
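
On a Linux server the equivalent change is usually made with a systemd override, since the install script registers Ollama as a systemd service. Running sudo systemctl edit ollama opens an override file; add the two lines below, save, then run sudo systemctl restart ollama:

[Service]
Environment="OLLAMA_HOST=0.0.0.0"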


Adding a Web Interface with Open WebUI

Running Ollama from the terminal is fine for testing, but most business users will want a proper chat interface. Open WebUI (formerly Ollama WebUI) is the most popular front-end for Ollama and provides a clean, ChatGPT-style browser interface. It supports multiple models, conversation history, system prompts, and even document upload for context-aware responses.

The easiest way to run Open WebUI alongside Ollama is using Docker. If you do not have Docker installed, run sudo apt install docker.io on Ubuntu. Then pull and launch Open WebUI with the following command:

docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main

Once running, navigate to http://your-server-ip:3000 from any browser on your network. You will be prompted to create an admin account on first launch. From there, you can select any model you have downloaded via Ollama and begin chatting immediately. Open WebUI also supports connecting to external APIs such as OpenAI if you ever want to blend local and cloud-based models within the same interface.
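
If the page does not load, the standard Docker commands are the first place to look:

docker ps --filter name=open-webui   # confirms the container is up
docker logs --tail 50 open-webui     # shows recent start-up output and any errors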

This kind of self-hosted AI setup pairs well with a broader strategy of keeping business infrastructure on-premises. If you are also evaluating whether to keep servers local or move to the cloud more broadly, our comparison of on-prem vs hosted servers after end of life covers the key considerations for UK businesses in detail.


Practical Business Use Cases for Ollama

Once Ollama is running, the question becomes what to actually use it for. The answer depends on your business, but there are several tasks that translate well to a locally-hosted LLM without needing a high-end model or GPU. The key advantage over cloud tools is that you can feed in genuinely sensitive information such as customer records, financial summaries, or internal documents without any risk of that data leaving your network.

  • Drafting and proofreading internal documents, quotes, and proposals
  • Summarising meeting notes or lengthy email threads (see the example after this list)
  • Writing and reviewing Python or PowerShell scripts for internal automation
  • Answering staff questions from a custom knowledge base using document context
  • Generating product descriptions, social media posts, or marketing copy drafts
  • Analysing sales data or customer feedback when pasted directly into the chat
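
As a simple illustration of the summarisation use case, Ollama can also be driven non-interactively, which makes it easy to fold into scripts. A sketch, assuming a notes.txt file on the server and the Mistral model already pulled:

ollama run mistral "Summarise the following meeting notes in five bullet points: $(cat notes.txt)"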

For UK wholesalers, builders' merchants, and trade businesses in particular, a local AI tool that can be primed with company-specific prompts and context can replicate many of the benefits described in guides covering AI for wholesale distribution, without the subscription cost or the data privacy trade-off of using a hosted platform.

It is worth being realistic about limitations too. Locally-hosted models on modest hardware will be slower than commercial APIs and may produce less polished output than GPT-4 class models on complex tasks. However, for the vast majority of everyday business writing, summarisation, and coding assistance, a well-configured Mistral or Llama 3 instance on a home server performs remarkably well and is available around the clock at zero ongoing cost.


Security and Network Considerations

Running a service on your local network requires some basic security housekeeping. By default, Ollama and Open WebUI are not intended to be exposed directly to the public internet. If you want to access your local AI from outside your office or home, use a VPN rather than opening ports on your router. A simple WireGuard VPN tunnel is the recommended approach for remote access, keeping the service entirely private.

Open WebUI has built-in authentication, which means staff accounts can be created with individual logins and access can be restricted. However, you should still ensure the server running Ollama is kept up to date with operating system patches, particularly if it is on the same network as other business systems. Separating your AI server onto a VLAN is a sensible precaution in more security-conscious environments.
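
If you have opened Ollama up to the local network with OLLAMA_HOST=0.0.0.0, it is also sensible to restrict which addresses can reach it at the firewall level. A minimal ufw rule set on Ubuntu, assuming a 192.168.1.0/24 office LAN (adjust the subnet and ports to match your own setup):

sudo ufw allow OpenSSH                                            # keep remote admin access before enabling the firewall
sudo ufw allow from 192.168.1.0/24 to any port 11434 proto tcp   # Ollama API, LAN only
sudo ufw allow from 192.168.1.0/24 to any port 3000 proto tcp    # Open WebUI, LAN only
sudo ufw enable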

Model files themselves are large but inert. They do not execute code independently and do not phone home once downloaded. Ollama does check for updates when online, but all inference happens entirely locally. This makes the privacy model genuinely strong compared to any cloud-based AI tool, and the data sovereignty argument is particularly relevant for UK businesses operating under UK GDPR obligations.


Key Takeaways

  • Ollama lets you run large language models locally on your own server hardware with no ongoing subscription cost
  • Installation takes minutes on Linux or Windows using the official installer or install script
  • A mid-range PC with 16 GB RAM and a modern GPU handles most small business AI tasks effectively
  • Open WebUI provides a polished browser-based chat interface that works across your whole local network
  • All prompts and responses stay on your server, making this ideal for GDPR-conscious UK businesses
  • Do not expose Ollama directly to the internet; use a VPN for remote access
  • Smaller models like Mistral 7B and Llama 3 8B deliver strong results for drafting, summarising, and coding tasks
  • GPU acceleration via NVIDIA CUDA gives the fastest response times, but CPU-only inference is perfectly usable for lighter workloads


Frequently Asked Questions

Does Ollama work on Windows as well as Linux?

Yes. Ollama has a native Windows installer available from ollama.com and works on Windows 10 and Windows 11. It also runs on Windows Server environments. GPU acceleration works with NVIDIA cards on Windows using CUDA. AMD GPU support on Windows is more limited compared to Linux, so CPU-only inference may be the starting point for AMD hardware in a Windows environment.

Can I run Ollama without a GPU?

Yes, Ollama falls back to CPU inference automatically if no compatible GPU is detected. Smaller models such as Phi-3 Mini and Gemma 2 2B are quite usable on CPU-only hardware with 8 to 16 GB of RAM, though response times will be slower than with GPU acceleration. For a dedicated business server where speed matters, a GPU is recommended but not essential for getting started.

Can I use the models for commercial purposes?

Most open-source models available through Ollama have licences that permit commercial use, but the terms vary. Meta’s Llama 3 is available under a community licence that allows commercial use for most businesses, though organisations with over 700 million monthly active users face additional conditions. Mistral models are generally released under Apache 2.0, which permits broad commercial use. Always check the specific licence for any model you intend to use in a production business context.

How much storage space do the models require?

Model size varies significantly. Smaller models like Phi-3 Mini require around 2 to 3 GB of storage, while mid-range 7B and 8B models such as Mistral and Llama 3 require approximately 4 to 5 GB. Larger models such as Llama 3 70B in a quantised format can require 40 GB or more. For a home server running two or three models for everyday business use, a 500 GB SSD provides comfortable headroom alongside the operating system and other applications.
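
To see what the models on your server are actually consuming, ollama list reports the size of each downloaded model, and on Linux a du check against the model store gives the total (the path below is the default for a per-user install; a service install keeps models under the ollama service account instead):

ollama list
du -sh ~/.ollama/models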

Can multiple users on my network access Ollama at the same time?

Yes. When Ollama is configured to listen on your local network address (by setting OLLAMA_HOST=0.0.0.0) and Open WebUI is deployed as the front-end, multiple staff members can access it simultaneously through their browsers. Performance will depend on your server’s hardware, as each concurrent conversation draws on CPU or GPU resources. For small teams of two to five users with a mid-range GPU server, concurrent use is generally very manageable.
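
If requests do start to queue under heavier use, Ollama exposes concurrency tuning through environment variables; OLLAMA_NUM_PARALLEL, for example, sets how many requests a loaded model will serve at once, and is configured the same way as OLLAMA_HOST (for instance in the systemd override on Linux). Higher values draw more RAM or VRAM, so increase it gradually:

[Service]
Environment="OLLAMA_NUM_PARALLEL=2"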


