What Is an Ollama Modelfile?
A Modelfile is a plain-text configuration file that defines how Ollama should build or customise a model. It works like a Dockerfile — you start from a base model, then layer instructions on top to change its behaviour, system prompt, temperature, context window, and more.
Modelfiles let you create persistent, reusable model variants without touching the underlying weights. Save a Modelfile once and you can recreate that exact configuration on any machine running Ollama.
Modelfile Syntax Overview
A Modelfile is a plain text file — conventionally named Modelfile (no extension). Each line starts with an instruction keyword followed by its value.
FROM llama3.2
SYSTEM "You are a helpful assistant who always responds concisely."
PARAMETER temperature 0.7
PARAMETER num_ctx 4096
The only required instruction is FROM. Everything else is optional.
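Since a Modelfile is just plain text, it is easy to generate programmatically. As a quick illustration, here is a hypothetical Python helper (`build_modelfile` is my own name, not an Ollama API) that assembles the four-line example above:

```python
# Hypothetical helper: assemble Modelfile text from plain Python values.
def build_modelfile(base, system=None, **params):
    """Return Modelfile text for a base model, an optional system
    prompt, and any number of PARAMETER overrides."""
    lines = [f"FROM {base}"]
    if system:
        lines.append(f'SYSTEM "{system}"')
    for name, value in params.items():
        lines.append(f"PARAMETER {name} {value}")
    return "\n".join(lines) + "\n"

text = build_modelfile(
    "llama3.2",
    system="You are a helpful assistant who always responds concisely.",
    temperature=0.7,
    num_ctx=4096,
)
print(text)
```

Write the result to a file named `Modelfile` and it is ready for `ollama create`.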
FROM — Choose Your Base Model
FROM sets the base model. You can reference:
- A model name: FROM llama3.2 (must already be pulled)
- A model with a tag: FROM llama3.2:3b
- A local GGUF file: FROM /path/to/model.gguf
# Start from the 3B parameter llama3.2 model
FROM llama3.2:3b
SYSTEM — Set a System Prompt
SYSTEM sets the system prompt that shapes the model’s behaviour at the start of every conversation. Use it to define a persona, restrict topics, or enforce output formats.
SYSTEM """
You are a senior Linux systems administrator.
Answer only questions about Linux, bash scripting, and server management.
Be concise. Show commands in code blocks.
"""
Use triple quotes for multi-line system prompts.
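If you are generating Modelfiles from code, wrapping a multi-line prompt in that triple-quoted form is a one-liner. A minimal sketch (`system_block` is a hypothetical helper, not part of Ollama):

```python
def system_block(prompt: str) -> str:
    """Wrap a (possibly multi-line) system prompt in the triple-quoted
    SYSTEM form a Modelfile expects. Hypothetical convenience helper."""
    return 'SYSTEM """\n' + prompt.strip() + '\n"""\n'

print(system_block(
    "You are a senior Linux systems administrator.\n"
    "Be concise. Show commands in code blocks."
))
```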
PARAMETER — Tune Model Behaviour
The PARAMETER instruction lets you override the model’s default inference settings. Common parameters:
temperature
Controls randomness. Lower = more deterministic, higher = more creative. Default is usually 0.8.
PARAMETER temperature 0.3 # More focused, less random
PARAMETER temperature 1.2 # More creative, more varied
num_ctx
Context window size in tokens. How much conversation history the model can “see” at once. Default is 2048 for most models.
PARAMETER num_ctx 8192 # Increase context window to 8k tokens
top_p and top_k
Fine-tune token sampling. top_p restricts sampling to the smallest set of tokens whose cumulative probability exceeds p; top_k restricts it to the K most likely tokens.
PARAMETER top_p 0.9
PARAMETER top_k 40
repeat_penalty
Penalises the model for repeating tokens it has already used. Useful if the model gets stuck in repetitive loops.
PARAMETER repeat_penalty 1.1
num_predict
Maximum number of tokens to generate per response. -1 means unlimited.
PARAMETER num_predict 512
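Because each setting becomes one PARAMETER line, a whole tuning profile can be kept in a single dict and rendered in one pass. A short sketch (the dict and rendering are my own, not an Ollama API):

```python
# Hypothetical: render a dict of inference settings as PARAMETER lines.
settings = {
    "temperature": 0.3,   # more focused, less random
    "num_ctx": 8192,      # 8k-token context window
    "top_p": 0.9,
    "top_k": 40,
    "repeat_penalty": 1.1,
    "num_predict": 512,   # cap response length
}
param_lines = "\n".join(f"PARAMETER {k} {v}" for k, v in settings.items())
print(param_lines)
```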
TEMPLATE — Define the Prompt Format
TEMPLATE lets you override the prompt template used to format messages before they’re sent to the model. Most of the time you won’t need this — Ollama automatically applies the correct chat template from the model’s metadata. Use it only if you’re loading a raw GGUF that doesn’t embed template information.
TEMPLATE """{{ if .System }}<|system|>
{{ .System }}<|end|>
{{ end }}{{ if .Prompt }}<|user|>
{{ .Prompt }}<|end|>
{{ end }}<|assistant|>
{{ .Response }}<|end|>
"""
MESSAGE — Pre-seed Conversation History
MESSAGE lets you inject example messages into the model’s context before the first user message. This is useful for few-shot prompting — showing the model examples of how you want it to respond.
MESSAGE user "What is 2 + 2?"
MESSAGE assistant "4"
MESSAGE user "What is 10 divided by 2?"
MESSAGE assistant "5"
ADAPTER — Add LoRA Adapters
If you have a LoRA adapter trained on top of a base model, you can apply it with ADAPTER:
FROM llama3.2
ADAPTER /path/to/adapter.gguf
The adapter weights are merged on top of the base model at creation time.
Building a Model from a Modelfile
Once your Modelfile is written, build it into a named model with ollama create:
ollama create my-linux-expert -f ./Modelfile
This registers the model locally. You can then run it like any other model:
ollama run my-linux-expert
To see all your local models including custom ones:
ollama list
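If you need to check programmatically whether a custom model is registered, one option is to scan the output of `ollama list`. The sample output below is illustrative only (IDs are made up and the column layout may vary between Ollama versions):

```python
def model_names(list_output: str):
    """Extract model names (first column) from `ollama list` output,
    skipping the header row. Illustrative; layout may vary by version."""
    lines = list_output.strip().splitlines()
    return [line.split()[0] for line in lines[1:] if line.strip()]

sample = """NAME                    ID            SIZE    MODIFIED
my-linux-expert:latest  a1b2c3d4e5f6  2.0 GB  2 minutes ago
llama3.2:latest         f6e5d4c3b2a1  2.0 GB  3 days ago
"""
print(model_names(sample))
```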
Inspecting an Existing Model’s Modelfile
You can see the Modelfile used to build any model — including the official ones — with ollama show:
ollama show llama3.2 --modelfile
This is handy for understanding what system prompt and parameters a model was built with, and as a starting point for your own customisation.
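If you want to compare configurations across models, the text that `ollama show --modelfile` prints can be split into instruction/value pairs. A deliberately rough sketch (it ignores comments and does not handle triple-quoted blocks):

```python
def parse_modelfile(text: str):
    """Rough parse of Modelfile text into (INSTRUCTION, value) pairs.
    Skips blank lines and comments; does NOT handle triple-quoted
    SYSTEM/TEMPLATE blocks. Illustration only."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        keyword, _, value = line.partition(" ")
        pairs.append((keyword.upper(), value))
    return pairs

sample = "# built from llama3.2\nFROM llama3.2\nPARAMETER temperature 0.7\n"
print(parse_modelfile(sample))
```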
Practical Example: A Concise Code Assistant
Here’s a complete Modelfile for a code-focused assistant that skips lengthy explanations and gets straight to the answer:
FROM codellama:7b
SYSTEM """
You are an expert software engineer. When asked coding questions:
- Respond with working code first, explanation second
- Keep explanations brief and technical
- Always use proper code blocks with language tags
- If a question is ambiguous, make a reasonable assumption and state it
"""
PARAMETER temperature 0.2
PARAMETER num_ctx 8192
PARAMETER repeat_penalty 1.05
Build and run it:
ollama create code-assistant -f ./Modelfile
ollama run code-assistant
Practical Example: A Strict JSON Output Model
For applications that need structured output, you can instruct the model to always respond in JSON:
FROM llama3.2
SYSTEM """
You are a data extraction assistant. You ALWAYS respond with valid JSON only.
Never include any text outside the JSON object.
If you cannot extract the requested data, return {"error": "reason"}.
"""
PARAMETER temperature 0.0
PARAMETER num_predict 1024
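Even with a system prompt this strict, the application consuming the output should still parse defensively: models can occasionally emit malformed JSON. A minimal sketch of the consuming side (`parse_model_reply` is my own name), reusing the same `{"error": ...}` shape the system prompt asks the model to use:

```python
import json

def parse_model_reply(reply: str):
    """Parse a model reply that should be pure JSON; fall back to the
    {"error": ...} shape on malformed output."""
    try:
        return json.loads(reply)
    except json.JSONDecodeError as exc:
        return {"error": f"invalid JSON from model: {exc.msg}"}

print(parse_model_reply('{"name": "Alice"}'))
print(parse_model_reply("Sorry, I can't."))
```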
Sharing and Pushing Models to Ollama.com
If you have an account on ollama.com, you can push your custom models to share them with others:
# Tag the model with your username
ollama cp my-linux-expert yourusername/linux-expert
# Push to ollama.com
ollama push yourusername/linux-expert
Others can then pull and run it with ollama run yourusername/linux-expert.
Modelfile Quick Reference
| Instruction | Required? | Purpose |
|---|---|---|
| FROM | Yes | Base model or GGUF path |
| SYSTEM | No | System prompt / persona |
| PARAMETER | No | Inference settings (temperature, ctx, etc.) |
| TEMPLATE | No | Override prompt format template |
| MESSAGE | No | Pre-seed conversation with examples |
| ADAPTER | No | Apply LoRA adapter weights |