This is the complete Ollama help centre — a single, regularly updated resource covering everything you need to know about installing, configuring, and getting the most out of Ollama. Whether you’re running your first local language model or optimising inference on a home server, you’ll find what you need here.
Ollama has become the go-to tool for running large language models locally on your own hardware. It handles model downloads, GPU acceleration, and a clean API layer so you can focus on what matters: using AI privately, offline, and without paying per token. This hub brings together every guide, tutorial, and troubleshooting article on this site into one place.
Quick Navigation
- What Is Ollama?
- Getting Started: Installation
- Choosing the Right Model
- Hardware Requirements
- Integrations & API
- Troubleshooting
- Advanced Usage
- Ollama vs the Alternatives
What Is Ollama?
Ollama is an open-source runtime that lets you download and run large language models (LLMs) directly on your own computer — no cloud account, no API key, no internet connection required once a model is downloaded. It wraps model files in a simple command-line interface and exposes a local REST API, including OpenAI-compatible endpoints alongside its native ones, making it easy to plug into existing tools and workflows.
Unlike cloud-based AI services, everything you type stays on your machine. This makes Ollama particularly valuable for businesses handling sensitive data, developers who want to work offline, or anyone who simply doesn’t want their prompts stored on a third-party server. It supports a wide range of model families including Llama 3, Mistral, Gemma, Phi, Qwen, and many more.
Ollama is free, open source, and available for Windows, macOS, and Linux. It handles GPU acceleration automatically when a compatible GPU is detected, falling back to CPU if not.
- What Is Ollama? The Plain-English Guide to Running AI Locally →
- Ollama for Beginners: Complete Getting Started Guide →
- Ollama FAQ: Common Questions Answered →
Getting Started: Installation
Getting Ollama running takes less than five minutes on most systems. The installation process differs slightly between Windows, macOS, and Linux, but the core experience is the same: download the installer or run a single shell command, then pull your first model and start chatting from the terminal or a connected UI.
These guides walk through each platform step by step, including how to verify your installation, run your first model, and confirm GPU acceleration is working correctly. If you’re new to local AI, start with the guide for your operating system and then move on to choosing a model.
- How to Install Ollama on Windows 11 →
- How to Install Ollama on Mac — Apple Silicon and Intel Guide →
- How to Install Ollama on Linux →
- How to Run Ollama in WSL2 (Windows Subsystem for Linux) →
- How to Run Ollama in Docker →
- How to Run Ollama on a Home Server →
- How to Run Ollama on a Raspberry Pi →
Choosing the Right Model
Ollama’s model library includes dozens of open-weight models, ranging from lightweight 1B-parameter models that run comfortably on 8 GB of RAM to 70B+ models that need a high-end workstation or server. Choosing the right model depends on your use case, your hardware, and how much latency you’re willing to tolerate.
General-purpose chat, coding assistance, summarisation, and document Q&A all have different sweet spots. Smaller models like Phi-3 Mini or Gemma 2 2B punch well above their weight for focused tasks, while larger models like Llama 3.1 70B produce noticeably more capable output if your hardware can support them. The guides below help you match model to machine.
- Best Ollama Models in 2026: Which Should You Run? →
- Best Ollama Models for Coding in 2026 →
- Best Ollama Models for Writing in 2026 →
- How Much RAM Do You Need to Run Ollama Models? →
Hardware Requirements
Ollama can run on surprisingly modest hardware, but performance varies enormously depending on whether you’re using a GPU, how much VRAM or system RAM you have, and which model you’ve loaded. Understanding these constraints upfront saves you from frustrating slowness or failed model loads.
The key metrics to understand are VRAM (for GPU inference), system RAM (for CPU inference or models that overflow VRAM), and storage (model files range from 2 GB to 40+ GB). For serious use, a dedicated GPU with 8 GB or more of VRAM makes a significant difference.
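As a rough rule of thumb, a quantized model's weights take about (parameters × bits ÷ 8) bytes, plus headroom for the KV cache and runtime buffers. The helper below is an illustrative sketch of that back-of-the-envelope maths, not an official formula — the 20% overhead multiplier is an assumption:

```python
def estimate_model_gb(params_billions: float, quant_bits: int = 4,
                      overhead: float = 1.2) -> float:
    """Rough memory footprint for a quantized model.

    Weights take params * bits/8 bytes; the multiplier adds headroom
    for the KV cache and runtime buffers (an assumption, not a spec).
    """
    weight_gb = params_billions * quant_bits / 8  # 1B params at 8-bit ≈ 1 GB
    return round(weight_gb * overhead, 1)

# A 7B model at the common 4-bit quantization: roughly 4 GB
print(estimate_model_gb(7))    # → 4.2
# A 70B model at 4-bit: well beyond a single 24 GB GPU
print(estimate_model_gb(70))   # → 42.0
```

This is why a 7B model fits happily on an 8 GB GPU while a 70B model needs server-class hardware or heavy CPU offloading.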
- How Much RAM Do You Need to Run Ollama Models? →
- Best GPUs for Ollama in 2026 →
- How to Run Ollama on a Home Server →
Integrations & API
One of Ollama’s biggest strengths is its local REST API, which also serves OpenAI-compatible endpoints. This means you can point tools like Open WebUI, Continue (the VS Code extension), LangChain, and dozens of other applications at your local Ollama instance with minimal configuration changes.
The API listens on http://localhost:11434 by default and supports streaming responses, embeddings, and multi-turn conversations. You can also expose it to your local network so other devices can use your Ollama instance without needing their own installation.
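To give a feel for the API, here is a minimal sketch using only the Python standard library. It assumes a running Ollama server and an already-pulled model (the model name is just an example):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default listen address

def build_generate_payload(model: str, prompt: str) -> dict:
    """Request body for Ollama's native /api/generate endpoint.

    stream=False asks for one complete JSON reply instead of a
    stream of newline-delimited chunks.
    """
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    """POST a prompt to a local Ollama instance and return the reply text.

    Assumes the model has already been pulled, e.g. `ollama pull llama3.2`.
    """
    body = json.dumps(build_generate_payload(model, prompt)).encode()
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Example (needs a running server and a pulled model):
# print(generate("llama3.2", "In one sentence, what is Ollama?"))
```

Swapping in an OpenAI client library pointed at the compatible endpoint works much the same way; the guides below cover both approaches in detail.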
- How to Set Up Open WebUI with Ollama →
- Ollama REST API: Complete Developer Guide →
- Using Ollama with VS Code: Continue Setup →
- How to Use the Ollama Python Library →
- How to Use Ollama with LangChain →
- How to Use Ollama for Embeddings and RAG →
- How to Access Ollama Over a Network and Remotely →
- How to Get Structured JSON Output from Ollama →
Troubleshooting
Most Ollama problems fall into a small number of categories: slow inference, GPU not being detected, models refusing to load due to insufficient memory, or connection errors when trying to use the API from another application. These are all solvable, and the guides below cover the most common issues with step-by-step fixes.
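Before digging into any of those guides, it helps to rule out the simplest failure first: is the server answering at all? The sketch below (standard library only, names are illustrative) distinguishes "server down" from "model problem":

```python
import urllib.error
import urllib.request

def ollama_reachable(base_url: str = "http://localhost:11434",
                     timeout: float = 2.0) -> bool:
    """Quick connectivity check against a local Ollama instance.

    A running server answers its root endpoint with a short status
    response, so any successful reply means it is up; a connection
    error means it isn't running or is bound to a different address.
    """
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

# If this returns False, fix the server first; if True, the problem
# is more likely the model, memory, or GPU configuration.
```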
- Ollama Running Slow? How to Speed Up Local LLM Inference →
- Ollama GPU Not Detected: How to Fix CUDA and ROCm Errors →
- Ollama Out of Memory Errors: How to Fix Them →
Advanced Usage
Once you’re comfortable with the basics, Ollama offers a range of features that make it a genuinely powerful tool for developers and power users. Modelfiles let you create custom model variants with system prompts, temperature settings, and parameter overrides baked in. You can also run multiple models simultaneously, script interactions via the API, and use Ollama as the inference backend for more complex AI pipelines.
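To make the Modelfile idea concrete, here is a small illustrative sketch — the base model and parameter values are example choices, not recommendations:

```
# Modelfile — defines a custom variant of a base model
FROM llama3.2
SYSTEM "You are a terse assistant. Answer in at most two sentences."
PARAMETER temperature 0.3
PARAMETER num_ctx 8192
```

Build and run it with `ollama create terse-llama -f Modelfile` followed by `ollama run terse-llama` (the model name here is hypothetical). The full guide below covers the complete Modelfile syntax.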
- How to Create Custom Models with Ollama Modelfiles →
- How to Use Multimodal Vision Models with Ollama →
- How to Get Structured JSON Output from Ollama →
Ollama vs the Alternatives
Ollama isn’t the only tool for running LLMs locally. LM Studio, Jan, GPT4All, and llama.cpp are all viable options depending on your workflow. The right choice depends on whether you prioritise a GUI, API compatibility, model format support, or raw performance.
- Ollama vs LM Studio: Which Should You Use in 2026? →
- Ollama vs Jan →
- Ollama vs GPT4All: Which Local AI Tool Should You Use? →
- Ollama vs llama.cpp: Which Should You Use? →
Why Running AI Locally Matters
The case for local AI has never been stronger. Cloud-based LLM APIs are convenient, but they come with real trade-offs: your data leaves your machine, costs scale with usage, and you’re dependent on a third party’s uptime, pricing decisions, and terms of service. Ollama removes all of that. Once a model is downloaded, it runs entirely on your hardware — air-gapped if needed, free to use as heavily as you want, with no data ever leaving your network.
For individuals, this means genuine privacy and zero ongoing cost. For businesses, it means sensitive documents, customer data, and internal knowledge bases can be queried by AI without ever touching external infrastructure. As open-weight models continue to close the capability gap with proprietary cloud models, the calculus is shifting fast. Ollama is the tool that makes local AI practical, and this hub exists to help you use it to its full potential.
This page is updated regularly as new guides are published. Bookmark it as your starting point for everything Ollama.