How to Get Structured JSON Output from Ollama

One of the most practical challenges when integrating large language models into real applications is getting output you can actually use programmatically. Free-form prose is fine for chatbots, but when you need to extract data, feed results into a pipeline, or call downstream services, you need structured, parseable output — and that almost always means JSON.

Ollama provides several mechanisms for controlling output format, ranging from a simple flag that nudges the model toward JSON, to full schema-constrained generation that enforces exact field names and types. This guide covers all of them, with working examples you can adapt immediately.

Why Structured Output Matters

When you call json.loads() on a model’s response, you’re making an assumption: that the string is valid JSON. That assumption fails surprisingly often when you rely on instruction-following alone. Models add explanatory prose, wrap JSON in markdown fences, use single quotes instead of double quotes, or add trailing commas where none are allowed.

For production use cases — data extraction, classification pipelines, tool use, agentic workflows — these failures are not acceptable. You need either constrained decoding (where the model is physically prevented from generating invalid JSON) or at minimum a deterministic way to coerce output into a known schema.

Structured output is also critical for:

  • Data extraction: Pulling named entities, dates, prices, or contact details from unstructured text
  • Classification: Getting a label and confidence score from a model in a consistent format
  • Tool use: Passing structured arguments to functions or APIs
  • Multi-step pipelines: Feeding one model’s output as structured input to the next stage
  • Database population: Writing extracted data directly to rows without manual parsing

Method 1: The format: "json" Flag

The simplest way to request JSON output is to include "format": "json" in your request body. This works with both /api/generate and /api/chat endpoints. It activates grammar-constrained decoding in Ollama, which means the token sampler is restricted to tokens that can legally continue a valid JSON document.

Here is a minimal example using /api/generate:

curl http://localhost:11434/api/generate \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.1",
    "prompt": "Extract the person name, email address, and company from this text: Hi, I am Sarah Chen, lead engineer at Vertex Systems. You can reach me at [email protected]",
    "format": "json",
    "stream": false
  }'

And the equivalent using /api/chat with a messages array:

curl http://localhost:11434/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.1",
    "messages": [
      {
        "role": "user",
        "content": "Extract the person name, email address, and company from this text: Hi, I am Sarah Chen, lead engineer at Vertex Systems. You can reach me at [email protected]. Return JSON only."
      }
    ],
    "format": "json",
    "stream": false
  }'

The format: "json" flag guarantees syntactically valid JSON but does not enforce a schema. The model decides which keys to include. You still need to validate that the keys you expect are actually present.
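Because the flag guarantees syntax but not structure, a small post-check is worth having. A minimal sketch, assuming you expect contact-style fields (the `EXPECTED_KEYS` set and `check_keys` helper are hypothetical names, not part of Ollama):

```python
import json

# With format:"json", json.loads() should succeed, but the keys are still
# the model's choice — so verify the ones you need are actually present.
# EXPECTED_KEYS and check_keys are illustrative, not from the Ollama API.
EXPECTED_KEYS = {"name", "email", "company"}

def check_keys(raw: str) -> dict:
    data = json.loads(raw)  # raises json.JSONDecodeError on invalid JSON
    missing = EXPECTED_KEYS - data.keys()
    if missing:
        raise ValueError(f"response missing keys: {sorted(missing)}")
    return data
```

Run the model's response string through a check like this before touching individual fields.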

Method 2: JSON Schema-Constrained Output

Newer versions of Ollama (0.5.0 and later) support passing a full JSON schema as the format value instead of just the string "json". This tells the constrained decoder exactly which fields to produce, their types, and which are required. The model cannot deviate from the schema.

Here is an example extracting a product record with specific fields:

curl http://localhost:11434/api/generate \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.1",
    "prompt": "Extract product details from: The Bosch PSB 18 LI-2 is a cordless drill available for £89.99. It falls under the Power Tools category and has an 18V lithium battery.",
    "format": {
      "type": "object",
      "properties": {
        "name": { "type": "string" },
        "price": { "type": "number" },
        "currency": { "type": "string" },
        "category": { "type": "string" }
      },
      "required": ["name", "price", "currency", "category"]
    },
    "stream": false
  }'

With this approach, the response will always contain exactly those four keys with the correct types. The required array ensures no field is omitted. This is constrained decoding at its most useful: you are defining an output contract that the model must fulfil regardless of what it would otherwise prefer to generate.

Prompting Effectively for JSON

Even with constrained decoding enabled, the quality of your prompt strongly influences output quality. The model still chooses the values — the schema just constrains the structure. A vague prompt produces vague values inside correctly-shaped JSON.

Follow these principles when writing prompts for structured extraction:

  • Name the fields explicitly: Tell the model exactly what to put in each field. “Extract the product name, price as a number without currency symbol, and top-level category” is better than “extract product details”.
  • Include a brief example: Showing the model one example of the expected output in your system prompt dramatically improves consistency, especially for edge cases like nested objects or arrays.
  • Instruct the model to return only JSON: Even with the flag set, adding “Return only valid JSON, no explanation” to the prompt reduces the chance that a model which only partially respects the flag will add surrounding prose.
  • Handle ambiguity explicitly: Tell the model what to do when a field is missing from the source text: “If the price is not mentioned, use null”.
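Put together, a prompt that follows these principles might look like the sketch below. The wording, field names, and one-shot example are illustrative, not a canonical template:

```python
# An extraction prompt assembled from the principles above: explicit field
# names, a one-shot example, a JSON-only instruction, and null handling.
# The field set and example values are hypothetical.
def build_extraction_prompt(source_text: str) -> str:
    return (
        "Extract the product name, price as a number without currency symbol, "
        "and top-level category from the text below. "
        "Return only valid JSON, no explanation. "
        "If the price is not mentioned, use null.\n\n"
        'Example output: {"name": "Cordless Drill", "price": 89.99, '
        '"category": "Power Tools"}\n\n'
        f"Text: {source_text}"
    )
```

The same string works as the prompt for /api/generate or as the user message content for /api/chat.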

Using Structured Output with the Python Library

The official Ollama Python library exposes the same format parameter. Install it with pip install ollama and then use it as follows:

import ollama
import json

response = ollama.chat(
    model="llama3.1",
    messages=[
        {
            "role": "user",
            "content": "Classify this support ticket and return the result as JSON with fields: category, priority, and summary. Ticket: 'My invoice from last month shows the wrong VAT amount and I need a corrected copy urgently.'"
        }
    ],
    format="json"
)

data = json.loads(response["message"]["content"])
print(data["category"])
print(data["priority"])

You can also pass a full schema dict to format in the same way as the REST API. The library serialises it for you.

Using Pydantic Models for Schema-Constrained Output

If you are already using Pydantic in your application — which is common in FastAPI projects — you can generate a JSON schema directly from a Pydantic model and pass it to Ollama. This keeps your data contracts in one place:

import ollama
import json
from pydantic import BaseModel
from typing import Optional

class ContactRecord(BaseModel):
    name: str
    email: str
    company: Optional[str] = None
    phone: Optional[str] = None

schema = ContactRecord.model_json_schema()

response = ollama.chat(
    model="llama3.1",
    messages=[
        {
            "role": "user",
            "content": "Extract contact details from: Please get in touch with James Whitfield at [email protected], he works at Whitfield Consulting and can be called on 07700 900123."
        }
    ],
    format=schema
)

raw = json.loads(response["message"]["content"])
contact = ContactRecord(**raw)
print(contact.name)
print(contact.email)

This pattern is particularly clean: ContactRecord.model_json_schema() generates the schema, Ollama enforces it, and ContactRecord(**raw) gives you a validated Python object with type hints intact.

Model Choice: Which Models Handle JSON Best

Not all models follow format instructions equally well, and this matters even when using constrained decoding, because the values inside the JSON still depend on instruction-following ability.

  • Llama 3.1 and 3.2: Excellent JSON instruction following. The 8B variant is a strong default for structured extraction tasks — fast, accurate, and reliable with schema constraints.
  • Qwen2.5 (7B, 14B): Very strong at structured output, particularly for data extraction from mixed-language or technical content. Follows field-level instructions consistently.
  • Mistral 7B: Good baseline performance. Handles simple schemas reliably but can struggle with nested or conditional schemas.
  • Gemma 2 (9B): Generally reliable with the format flag, though the values it populates can be more literal and less interpretive than those produced by Llama models.
  • Older or smaller models (Phi-2, TinyLlama, early Llama 2 variants): These often ignore format instructions even with the flag set. Constrained decoding forces valid JSON syntax but the field values may be nonsensical or hallucinated.

For production pipelines, test your specific model against representative samples before committing. JSON format compliance at the syntax level does not guarantee semantic accuracy.

Handling Failures: Validation and Retry Logic

Even with constrained decoding, you should validate output before using it. Syntactically valid JSON does not mean semantically useful JSON. A field expected to contain a price might contain a string like “not mentioned” rather than a number.

A robust pattern looks like this:

import ollama
import json
from pydantic import BaseModel, ValidationError

def extract_with_retry(prompt: str, model: str, schema_class, retries: int = 3):
    schema = schema_class.model_json_schema()
    for attempt in range(retries):
        try:
            response = ollama.chat(
                model=model,
                messages=[{"role": "user", "content": prompt}],
                format=schema
            )
            raw = json.loads(response["message"]["content"])
            return schema_class(**raw)
        except (json.JSONDecodeError, ValidationError):
            if attempt == retries - 1:
                raise
            continue
    return None

Key points in this pattern: json.loads() is wrapped in a try/except for JSONDecodeError, and Pydantic validation catches type mismatches or missing required fields. The retry gives the model another chance with the same prompt — useful for intermittent failures, though if a model consistently fails on a schema you should review the prompt rather than just retrying.

Real-World Use Cases

Extracting Contact Information from Email

A common automation task is parsing inbound emails to extract contact details for CRM entry. Pass the email body as the prompt, define a schema with fields like name, email, phone, company, and job_title, and use Llama 3.1 with constrained decoding. The result can be written directly to your CRM API without manual parsing.

Classifying Support Tickets

Define a schema with fields category (an enum: billing, technical, account, other), priority (low, medium, high, urgent), and one_line_summary. Feed ticket text as the prompt. With Qwen2.5 or Llama 3.1, classification accuracy on well-defined categories is high enough for use in automated routing workflows, with human review reserved for low-confidence cases.
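As a sketch, the ticket schema described above can be written as a plain dict and passed as the format value. The field names mirror the description; the exact label sets are assumptions to adapt to your own routing categories:

```python
# Hypothetical classification schema: the enum constraints mean the decoder
# can only emit one of the listed labels, so the model cannot invent a new
# category or priority value.
ticket_schema = {
    "type": "object",
    "properties": {
        "category": {
            "type": "string",
            "enum": ["billing", "technical", "account", "other"],
        },
        "priority": {
            "type": "string",
            "enum": ["low", "medium", "high", "urgent"],
        },
        "one_line_summary": {"type": "string"},
    },
    "required": ["category", "priority", "one_line_summary"],
}
# Pass it as: ollama.chat(model=..., messages=..., format=ticket_schema)
```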

Parsing Product Details from Unstructured Descriptions

E-commerce and procurement teams often receive product data in inconsistent formats — supplier PDFs, email attachments, scraped pages. A schema covering product_name, sku, price, unit, manufacturer, and category fed to a structured extraction pipeline can normalise hundreds of records per minute on local hardware, with no API costs.

Limitations and What Constrained Decoding Cannot Fix

It is worth being precise about what constrained decoding actually guarantees. Grammar-based constraints — the mechanism Ollama uses — ensure the output is a valid JSON document that matches the schema’s structural requirements. They do not guarantee:

  • That values are accurate extractions rather than hallucinations
  • That optional fields are populated when the source text contains the relevant information
  • That numeric fields contain meaningful numbers rather than arbitrary values
  • That enum fields contain the most appropriate value when the model is uncertain

Some models also have limited support for complex JSON schemas — deeply nested objects, anyOf, oneOf, or array items with their own schemas can produce unexpected behaviour depending on the underlying grammar implementation in a given Ollama release.

The most reliable approach for production systems is to treat structured output as a strong hint combined with application-level validation, rather than a fully trusted contract. Schema constraints bring you most of the way there; your validation layer handles the rest.
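Such a validation layer can be thin. A minimal sketch, with hypothetical field names and plausibility rules layered on top of schema-valid output:

```python
# Semantic checks on top of schema-valid output: the schema guarantees shape,
# these rules judge whether the values are plausible. All names and rules
# here are illustrative assumptions, not part of Ollama.
def semantic_problems(record: dict) -> list:
    problems = []
    price = record.get("price")
    # Reject booleans too: isinstance(True, int) is True in Python.
    if not isinstance(price, (int, float)) or isinstance(price, bool) or price <= 0:
        problems.append("price is not a positive number")
    name = record.get("name", "")
    if not isinstance(name, str) or not name.strip():
        problems.append("name is empty")
    return problems
```

An empty list means the record passed; anything else can be routed to a retry or to human review.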

Summary

Ollama gives you two practical levers for structured output: the "format": "json" flag for syntax-guaranteed JSON, and a full JSON schema object for field-constrained output in newer versions. Pair either with effective prompting — naming fields explicitly, providing examples, and handling missing data — and you have a reliable foundation for data extraction, classification, and pipeline integration using locally-run models. Use the Python library with Pydantic for clean, maintainable integration in real applications, and always validate output before acting on it.
