Choosing the Right Models

Errand has four model slots, each with different requirements. This page explains what each slot needs, introduces a simple tier system for thinking about model capabilities, and gives you concrete recommendations.

Capability Tiers

To make model selection easier, we group models into three capability tiers. These tiers are not official classifications — they are a practical shorthand for thinking about which models are suitable for which tasks.

  • Frontier: The most capable models available. Best reasoning, most reliable tool use, handles complex multi-step tasks with ease. Higher cost, sometimes slower. Current examples: Claude Opus 4, GPT-4.1, Gemini 2.5 Pro.
  • Balanced: Strong all-round performers. Reliable tool use, good reasoning, noticeably faster and more affordable than Frontier models. The sweet spot for most work. Current examples: Claude Sonnet 4, GPT-4o, Gemini 2.5 Flash.
  • Efficient: Fast and affordable. Excellent for straightforward tasks like summarisation, classification, and simple text generation. Not suitable for complex multi-step reasoning. Current examples: Claude Haiku 4.5, GPT-4o Mini, Gemini 2.0 Flash.

Model names change as providers release new versions. The tier a model belongs to matters more than its specific name — when a provider releases a new model, check where it sits in their lineup and map it to the tier above.

The Agent Model

This is the most important model choice you will make. The agent model runs Errand’s core task execution loop: it reads your instructions, plans what to do, calls tools (web search, email, file access, and more), interprets the results, and produces a final output.

The agent model must be strong in several areas:

  • Tool calling. The agent works by calling external tools — searching the web, sending emails, reading files, managing tasks. The model must reliably generate correctly structured tool calls. This is the single most important capability.
  • Multi-step reasoning. Most tasks require several steps: research a topic, then draft an email, then send it. The model must maintain context and make good decisions across multiple turns.
  • Long context window. The agent receives a system prompt, your task description, tool definitions, recalled memories, and the results of every tool call it makes. This adds up quickly. Models with larger context windows handle complex tasks more gracefully.
  • Instruction following. The agent receives detailed instructions about how to behave, what tools are available, and what format to use for its output. The model must follow these reliably.
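
The loop described above can be sketched in a few lines. The sketch below is illustrative only: the tool registry, the stubbed model, and the message format are hypothetical stand-ins, not Errand’s actual implementation.

```python
import json

# Hypothetical tool registry -- a real agent exposes web search, email, etc.
TOOLS = {
    "web_search": lambda query: f"results for {query!r}",
    "send_email": lambda to, body: f"sent to {to}",
}

def stub_model(history):
    """Stand-in for the LLM: emits one structured tool call, then a final answer."""
    if not any(m["role"] == "tool" for m in history):
        return {"tool": "web_search", "args": {"query": "model tiers"}}
    return {"final": "Here is a summary of model tiers."}

def run_agent(task, model=stub_model, max_turns=5):
    history = [{"role": "user", "content": task}]
    for _ in range(max_turns):
        action = model(history)
        if "final" in action:            # the model decides it is done
            return action["final"]
        tool = TOOLS[action["tool"]]     # structured tool call -> execution
        result = tool(**action["args"])
        history.append({"role": "tool", "content": json.dumps(result)})
    raise RuntimeError("agent did not finish within max_turns")

print(run_agent("Summarise model tiers"))
```

Every turn depends on the model producing a well-formed `action` dict; a model that emits malformed tool calls breaks the loop at the `TOOLS[...]` lookup, which is exactly the failure mode described below for weaker models.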

We recommend using at least a Balanced-tier model for the agent. Here is why:

Models below this level — including most Efficient-tier models and smaller open-source models — often struggle with reliable tool calling. In practice, this means:

  • Failed tool calls. The model generates tool calls with incorrect parameters, missing fields, or malformed syntax. The agent gets stuck or produces errors instead of results.
  • Lost context. The model forgets earlier steps in a multi-turn task, repeating work or contradicting its own previous decisions.
  • Poor judgement. The model makes questionable choices about which tools to use or when to stop, leading to incomplete or irrelevant results.
  • Wasted time and cost. A task that fails and needs to be re-run with a better model ends up costing more than using the right model in the first place.

Using an Efficient-tier model for the agent is like hiring an intern for a senior role — they might occasionally get it right, but the reliability is not there for work you depend on.

For most everyday tasks — drafting emails, researching topics, managing to-do lists — a Balanced-tier model is excellent. Consider stepping up to a Frontier model when:

  • Tasks involve complex reasoning across many steps
  • You need the highest possible quality for important work
  • Tasks require synthesising information from many sources
  • You are doing detailed code review or technical writing

You can use Task Profiles to assign a Frontier model to specific types of work without changing your default. For example, set up a “Research” profile that uses a Frontier model, while everyday tasks use your Balanced default.

            Frontier                    Balanced
Quality     Best available              Very good
Speed       Slower                      Faster
Cost        Higher per task             More affordable
Best for    Complex, high-stakes work   Everyday tasks

Most users find that a Balanced-tier model as the default, with a Frontier model available via Task Profiles for demanding work, gives the best overall experience.

The Title Generation Model

When you create a task, Errand automatically generates a short, descriptive title from your task description. This is a single-shot summarisation job — the model reads your description and produces a few words.

This is the easiest choice. Even the smallest, most affordable models handle short-text summarisation well. There is no benefit to using a more powerful model here — it would produce the same quality title at higher cost and slower speed.

Any Efficient-tier model will do the job. Pick whichever one your provider offers at the lowest cost.
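
To illustrate why the smallest model suffices: title generation is a single completion request with no tools, no multi-turn state, and a tiny output. The prompt wording and parameter values below are a hypothetical sketch of the shape of such a request, not Errand’s actual prompt.

```python
def build_title_request(description: str, model: str = "gpt-4o-mini") -> dict:
    """Assemble a single-shot summarisation request. Any Efficient-tier
    model identifier can be substituted for the illustrative default."""
    return {
        "model": model,
        "messages": [
            {"role": "system",
             "content": "Summarise the task in at most six words. "
                        "Reply with the title only."},
            {"role": "user", "content": description},
        ],
        "max_tokens": 20,  # a title never needs more than a few tokens
    }

request = build_title_request(
    "Find three quotes for office chairs and email me a comparison")
print(request["model"])
```

Because the output is capped at a handful of tokens, a more capable model could not produce a meaningfully better result here.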

The Hindsight Model

Hindsight is Errand’s persistent memory system. It uses an AI model to:

  • Extract entities and concepts from your tasks and their results
  • Form “beliefs” — structured knowledge about your preferences, past decisions, and context
  • Search and retrieve relevant memories when a new task starts

The Hindsight model needs good language comprehension — it must understand what is important in a piece of text and extract meaning from it. However, it does not need the multi-step reasoning or tool-calling capabilities that the agent requires.

  • Efficient tier works well for most users. Memories are stored and recalled reliably, and the cost stays low since Hindsight processes text for every task.
  • Balanced tier can provide richer, more nuanced memory extraction if you find that the Efficient model is missing important context. This is a quality-versus-cost decision you can experiment with.

Start with an Efficient-tier model and upgrade if you notice the agent is not recalling relevant context from previous tasks.
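
To make the idea concrete, a “belief” can be pictured as a small structured record plus a retrieval step. The schema and the tag-overlap retrieval below are a hypothetical sketch, not Hindsight’s actual implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Belief:
    """Hypothetical shape of a Hindsight memory record."""
    subject: str        # entity the belief is about, e.g. "user"
    statement: str      # the extracted knowledge
    source_task: str    # task the belief was learned from
    tags: list = field(default_factory=list)

def recall(beliefs, query_tags):
    """Naive retrieval: return beliefs sharing any tag with the new task."""
    return [b for b in beliefs if set(b.tags) & set(query_tags)]

memory = [
    Belief("user", "prefers concise emails", "task-12", ["email", "style"]),
    Belief("user", "works in UTC+1", "task-03", ["scheduling"]),
]
print([b.statement for b in recall(memory, ["email"])])
```

Extraction (filling in `statement` and `tags` from task text) is where the model’s language comprehension matters; the retrieval step itself needs no reasoning at all, which is why an Efficient-tier model is usually enough.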

The Transcription Model

If you want to create tasks by speaking instead of typing, Errand can transcribe your voice input into text. This requires a Whisper-compatible transcription model.

  • Cloud: OpenAI’s Whisper API, accessed through LiteLLM, is the most common choice. Several other providers also offer Whisper-compatible endpoints.
  • Local: You can run Whisper models locally through Ollama or other local model servers, keeping your audio data entirely on your own hardware.

If you do not plan to use voice input, leave the transcription model unconfigured.
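
Because both options speak the same Whisper-compatible API, switching between cloud and local is essentially a matter of pointing requests at a different endpoint. A minimal sketch, with illustrative URLs (the local port shown is a common Ollama default, but yours may differ):

```python
def transcription_endpoint(local: bool) -> str:
    """Pick where Whisper-compatible requests go. URLs are illustrative."""
    if local:
        # A local model server keeps audio on your own hardware.
        return "http://localhost:11434/v1/audio/transcriptions"
    # OpenAI's hosted Whisper API endpoint.
    return "https://api.openai.com/v1/audio/transcriptions"

print(transcription_endpoint(local=True))
```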

Task Profiles — Per-Task Model Overrides

Errand’s Task Profiles feature lets you assign different models to different types of work. Instead of choosing one model for everything, you can create profiles that automatically match tasks to the right model:

  • A Research profile might use a Frontier-tier model with access to web search tools
  • A Quick Reply profile might use an Efficient-tier model for near-instant responses
  • A Code Review profile might use a Frontier-tier model with access to your Git repository

Profiles give you flexibility without requiring you to change your default settings every time. See the Task Profiles documentation for setup instructions.
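
Conceptually, a profile is a mapping from a type of work to a model and a set of tools, with a fallback to your default. The sketch below is hypothetical: the profile names, model placeholders, and tool names are illustrative, not Errand’s actual configuration format.

```python
# Hypothetical profile table: task type -> model + tool access.
PROFILES = {
    "research":    {"model": "frontier-model",  "tools": ["web_search"]},
    "quick_reply": {"model": "efficient-model", "tools": []},
    "default":     {"model": "balanced-model",  "tools": ["web_search", "email"]},
}

def profile_for(task_type: str) -> dict:
    """Unrecognised task types fall back to the default profile."""
    return PROFILES.get(task_type, PROFILES["default"])

print(profile_for("research")["model"])
print(profile_for("unknown")["model"])
```

The fallback is the important design choice: your default settings stay untouched, and a profile only overrides them for tasks that explicitly match.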

Recommended Configurations

Here are some common setups to help you get started. All models are accessed through LiteLLM regardless of which configuration you choose.

For users who want the highest quality results and are comfortable with the associated cost.

Slot              Tier       Example
Agent             Frontier   Claude Opus 4
Title Generation  Efficient  Claude Haiku 4.5
Hindsight         Balanced   Claude Sonnet 4
Transcription     Whisper    whisper-large-v3
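
Since every slot is routed through LiteLLM, each entry above ultimately resolves to a provider-prefixed model string. The identifiers below are illustrative guesses at that mapping, not verified names; check LiteLLM’s provider documentation for the exact strings your versions use.

```python
# Illustrative LiteLLM-style identifiers for this configuration;
# verify exact model names against your provider's current lineup.
PREMIUM = {
    "agent":         "anthropic/claude-opus-4",
    "title":         "anthropic/claude-haiku-4.5",
    "hindsight":     "anthropic/claude-sonnet-4",
    "transcription": "openai/whisper-large-v3",
}

for slot, model in PREMIUM.items():
    print(f"{slot}: {model}")
```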

The sweet spot for most users — excellent results at a reasonable cost.

Slot              Tier       Example
Agent             Balanced   Claude Sonnet 4
Title Generation  Efficient  Claude Haiku 4.5
Hindsight         Efficient  GPT-4o Mini
Transcription     Whisper    whisper-large-v3

Keeps costs low while still delivering good agent performance.

Slot              Tier       Example
Agent             Balanced   Gemini 2.5 Flash
Title Generation  Efficient  Gemini 2.0 Flash
Hindsight         Efficient  Gemini 2.0 Flash
Transcription     Whisper    whisper-large-v3

Everything runs on your own hardware. See Running Models Locally for setup.

Slot              Tier               Example
Agent             Balanced (local)   Llama 3.3 70B (via Ollama)
Title Generation  Efficient (local)  Llama 3.2 3B (via Ollama)
Hindsight         Efficient (local)  Llama 3.2 3B (via Ollama)
Transcription     Whisper (local)    whisper-large-v3 (via Ollama)

Note that the Privacy-First configuration requires significant hardware — see the local models guide for requirements.