
How to Create Custom Ollama Models with Modelfiles


What Is an Ollama Modelfile?

A Modelfile is a plain-text configuration file that defines how Ollama should build or customise a model. It works like a Dockerfile — you start from a base model, then layer instructions on top to change its behaviour, system prompt, temperature, context window, and more.

Modelfiles let you create persistent, reusable model variants without touching the underlying weights. Save a Modelfile once and you can recreate that exact configuration on any machine running Ollama.

Modelfile Syntax Overview

A Modelfile is a plain text file — conventionally named Modelfile (no extension). Each line starts with an instruction keyword followed by its value.

FROM llama3.2
SYSTEM "You are a helpful assistant who always responds concisely."
PARAMETER temperature 0.7
PARAMETER num_ctx 4096

The only required instruction is FROM. Everything else is optional.

FROM — Choose Your Base Model

FROM sets the base model. You can reference:

  • A model name: FROM llama3.2 (must already be pulled)
  • A model with tag: FROM llama3.2:3b
  • A local GGUF file: FROM /path/to/model.gguf

# Start from the 3B parameter llama3.2 model
FROM llama3.2:3b
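When starting from a local GGUF, the same layering applies. A minimal sketch — the filename here is hypothetical; substitute the path to a GGUF file you have actually downloaded:

```
# Import a downloaded GGUF and give it a persona
# (filename is illustrative — use your own GGUF path)
FROM ./mistral-7b-instruct.Q4_K_M.gguf
SYSTEM "You are a terse, accurate technical assistant."
```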

SYSTEM — Set a System Prompt

SYSTEM sets the system prompt that shapes the model’s behaviour at the start of every conversation. Use it to define a persona, restrict topics, or enforce output formats.

SYSTEM """
You are a senior Linux systems administrator. 
Answer only questions about Linux, bash scripting, and server management.
Be concise. Show commands in code blocks.
"""

Use triple quotes for multi-line system prompts.

PARAMETER — Tune Model Behaviour

The PARAMETER instruction lets you override the model’s default inference settings. Common parameters:

temperature

Controls randomness. Lower = more deterministic, higher = more creative. Default is usually 0.8.

PARAMETER temperature 0.3   # More focused, less random
PARAMETER temperature 1.2   # More creative, more varied

num_ctx

Context window size in tokens — how much conversation history the model can “see” at once. Ollama’s default is typically 2048 tokens, though newer releases may default higher.

PARAMETER num_ctx 8192   # Increase context window to 8k tokens

top_p and top_k

Fine-tune token sampling. top_p (nucleus sampling) restricts sampling to the smallest set of tokens whose cumulative probability reaches the threshold; top_k restricts it to the K most likely tokens.

PARAMETER top_p 0.9
PARAMETER top_k 40

repeat_penalty

Penalises the model for repeating tokens it has already used. Useful if the model gets stuck in repetitive loops.

PARAMETER repeat_penalty 1.1

num_predict

Maximum number of tokens to generate per response. -1 means unlimited.

PARAMETER num_predict 512
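The parameters above can be combined freely in a single Modelfile. A sketch of a deterministic, long-context variant — the values are illustrative, not recommendations:

```
FROM llama3.2
PARAMETER temperature 0.2
PARAMETER top_p 0.9
PARAMETER top_k 40
PARAMETER repeat_penalty 1.1
PARAMETER num_ctx 8192
PARAMETER num_predict 512
```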

TEMPLATE — Define the Prompt Format

TEMPLATE lets you override the prompt template used to format messages before they’re sent to the model. Templates use Go template syntax: Ollama fills in variables such as {{ .System }}, {{ .Prompt }}, and {{ .Response }}. Most of the time you won’t need this — Ollama automatically applies the correct chat template from the model’s metadata. Use it only if you’re loading a raw GGUF that doesn’t embed template information.

TEMPLATE """{{ if .System }}<|system|>
{{ .System }}<|end|>
{{ end }}{{ if .Prompt }}<|user|>
{{ .Prompt }}<|end|>
{{ end }}<|assistant|>
{{ .Response }}<|end|>
"""

MESSAGE — Pre-seed Conversation History

MESSAGE lets you inject example messages into the model’s context before the first user message. This is useful for few-shot prompting — showing the model examples of how you want it to respond.

MESSAGE user "What is 2 + 2?"
MESSAGE assistant "4"
MESSAGE user "What is 10 divided by 2?"
MESSAGE assistant "5"
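In a complete Modelfile, MESSAGE lines sit alongside FROM and SYSTEM. A minimal few-shot sketch building on the example above:

```
FROM llama3.2
SYSTEM "You answer arithmetic questions with the number only."
MESSAGE user "What is 2 + 2?"
MESSAGE assistant "4"
MESSAGE user "What is 10 divided by 2?"
MESSAGE assistant "5"
```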

ADAPTER — Add LoRA Adapters

If you have a LoRA adapter trained on top of a base model, you can apply it with ADAPTER:

FROM llama3.2
ADAPTER /path/to/adapter.gguf

The adapter must have been trained against the same base model you name in FROM; Ollama then applies the adapter weights on top of the base model.

Building a Model from a Modelfile

Once your Modelfile is written, build it into a named model with ollama create:

ollama create my-linux-expert -f ./Modelfile

This registers the model locally. You can then run it like any other model:

ollama run my-linux-expert

To see all your local models including custom ones:

ollama list
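If you prefer not to keep a separate file around, you can generate the Modelfile inline with a heredoc before building. A minimal sketch — the model and file names are just examples, and the ollama commands (shown commented) assume Ollama is installed with llama3.2 already pulled:

```shell
# Write the Modelfile with a heredoc, then build it with ollama create.
cat > Modelfile <<'EOF'
FROM llama3.2
SYSTEM "You are a senior Linux systems administrator. Be concise."
PARAMETER temperature 0.3
EOF

# Build and run (requires Ollama with llama3.2 already pulled):
#   ollama create my-linux-expert -f ./Modelfile
#   ollama run my-linux-expert
```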

Inspecting an Existing Model’s Modelfile

You can see the Modelfile used to build any model — including the official ones — with ollama show:

ollama show llama3.2 --modelfile

This is handy for understanding what system prompt and parameters a model was built with. Redirect the output to a file (ollama show llama3.2 --modelfile > Modelfile) and edit it as a starting point for your own customisation.

Practical Example: A Concise Code Assistant

Here’s a complete Modelfile for a code-focused assistant that skips lengthy explanations and gets straight to the answer:

FROM codellama:7b

SYSTEM """
You are an expert software engineer. When asked coding questions:
- Respond with working code first, explanation second
- Keep explanations brief and technical
- Always use proper code blocks with language tags
- If a question is ambiguous, make a reasonable assumption and state it
"""

PARAMETER temperature 0.2
PARAMETER num_ctx 8192
PARAMETER repeat_penalty 1.05

Build and run it:

ollama create code-assistant -f ./Modelfile
ollama run code-assistant

Practical Example: A Strict JSON Output Model

For applications that need structured output, you can instruct the model to always respond in JSON:

FROM llama3.2

SYSTEM """
You are a data extraction assistant. You ALWAYS respond with valid JSON only.
Never include any text outside the JSON object.
If you cannot extract the requested data, return {"error": "reason"}.
"""

PARAMETER temperature 0.0
PARAMETER num_predict 1024

Sharing and Pushing Models to Ollama.com

If you have an account on ollama.com, you can push your custom models to share them with others:

# Tag the model with your username
ollama cp my-linux-expert yourusername/linux-expert

# Push to ollama.com
ollama push yourusername/linux-expert

Others can then pull and run it with ollama run yourusername/linux-expert.

Modelfile Quick Reference

Instruction   Required?   Purpose
FROM          Yes         Base model or GGUF path
SYSTEM        No          System prompt / persona
PARAMETER     No          Inference settings (temperature, num_ctx, etc.)
TEMPLATE      No          Override the prompt format template
MESSAGE       No          Pre-seed the conversation with examples
ADAPTER       No          Apply LoRA adapter weights
