Choosing the Right Models
Errand has four model slots, each with different requirements. This page explains what each slot needs, introduces a simple tier system for thinking about model capabilities, and gives you concrete recommendations.
Model Capability Tiers
Section titled “Model Capability Tiers”To make model selection easier, we group models into three capability tiers. These tiers are not official classifications — they are a practical shorthand for thinking about which models are suitable for which tasks.
| Tier | What It Means | Current Examples |
|---|---|---|
| Frontier | The most capable models available. Best reasoning, most reliable tool use, handles complex multi-step tasks with ease. Higher cost, sometimes slower. | Claude Opus 4, GPT-4.1, Gemini 2.5 Pro |
| Balanced | Strong all-round performers. Reliable tool use, good reasoning, noticeably faster and more affordable than Frontier models. The sweet spot for most work. | Claude Sonnet 4, GPT-4o, Gemini 2.5 Flash |
| Efficient | Fast and affordable. Excellent for straightforward tasks like summarisation, classification, and simple text generation. Not suitable for complex multi-step reasoning. | Claude Haiku 4.5, GPT-4o Mini, Gemini 2.0 Flash |
Model names change as providers release new versions. The tier a model belongs to matters more than its specific name — when a provider releases a new model, check where it sits in their lineup and map it to the tier above.
The Agent Model
Section titled “The Agent Model”This is the most important model choice you will make. The agent model runs Errand’s core task execution loop: it reads your instructions, plans what to do, calls tools (web search, email, file access, and more), interprets the results, and produces a final output.
What the Agent Needs
Section titled “What the Agent Needs”The agent model must be strong in several areas:
- Tool calling. The agent works by calling external tools — searching the web, sending emails, reading files, managing tasks. The model must reliably generate correctly structured tool calls. This is the single most important capability.
- Multi-step reasoning. Most tasks require several steps: research a topic, then draft an email, then send it. The model must maintain context and make good decisions across multiple turns.
- Long context window. The agent receives a system prompt, your task description, tool definitions, recalled memories, and the results of every tool call it makes. This adds up quickly. Models with larger context windows handle complex tasks more gracefully.
- Instruction following. The agent receives detailed instructions about how to behave, what tools are available, and what format to use for its output. The model must follow these reliably.
Minimum Recommended: Balanced Tier
Section titled “Minimum Recommended: Balanced Tier”We recommend using at least a Balanced-tier model for the agent. Here is why:
Models below this level — including most Efficient-tier models and smaller open-source models — often struggle with reliable tool calling. In practice, this means:
- Failed tool calls. The model generates tool calls with incorrect parameters, missing fields, or malformed syntax. The agent gets stuck or produces errors instead of results.
- Lost context. The model forgets earlier steps in a multi-turn task, repeating work or contradicting its own previous decisions.
- Poor judgement. The model makes questionable choices about which tools to use or when to stop, leading to incomplete or irrelevant results.
- Wasted time and cost. A task that fails and needs to be re-run with a better model ends up costing more than using the right model in the first place.
Using an Efficient-tier model for the agent is like hiring an intern for a senior role — they might occasionally get it right, but the reliability is not there for work you depend on.
When to Use a Frontier Model
Section titled “When to Use a Frontier Model”For most everyday tasks — drafting emails, researching topics, managing to-do lists — a Balanced-tier model is excellent. Consider stepping up to a Frontier model when:
- Tasks involve complex reasoning across many steps
- You need the highest possible quality for important work
- Tasks require synthesising information from many sources
- You are doing detailed code review or technical writing
You can use Task Profiles to assign a Frontier model to specific types of work without changing your default. For example, set up a “Research” profile that uses a Frontier model, while everyday tasks use your Balanced default.
Trade-Offs
Section titled “Trade-Offs”| Frontier | Balanced | |
|---|---|---|
| Quality | Best available | Very good |
| Speed | Slower | Faster |
| Cost | Higher per task | More affordable |
| Best for | Complex, high-stakes work | Everyday tasks |
Most users find that a Balanced-tier model as the default, with a Frontier model available via Task Profiles for demanding work, gives the best overall experience.
The Title Generation Model
Section titled “The Title Generation Model”When you create a task, Errand automatically generates a short, descriptive title from your task description. This is a single-shot summarisation job — the model reads your description and produces a few words.
Recommended: Efficient Tier
Section titled “Recommended: Efficient Tier”This is the easiest choice. Even the smallest, most affordable models handle short-text summarisation well. There is no benefit to using a more powerful model here — it would produce the same quality title at higher cost and slower speed.
Any Efficient-tier model will do the job. Pick whichever one your provider offers at the lowest cost.
The Hindsight Memory Model
Section titled “The Hindsight Memory Model”Hindsight is Errand’s persistent memory system. It uses an AI model to:
- Extract entities and concepts from your tasks and their results
- Form “beliefs” — structured knowledge about your preferences, past decisions, and context
- Search and retrieve relevant memories when a new task starts
Recommended: Efficient to Balanced
Section titled “Recommended: Efficient to Balanced”The Hindsight model needs good language comprehension — it must understand what is important in a piece of text and extract meaning from it. However, it does not need the multi-step reasoning or tool-calling capabilities that the agent requires.
- Efficient tier works well for most users. Memories are stored and recalled reliably, and the cost stays low since Hindsight processes text for every task.
- Balanced tier can provide richer, more nuanced memory extraction if you find that the Efficient model is missing important context. This is a quality-versus-cost decision you can experiment with.
Start with an Efficient-tier model and upgrade if you notice the agent is not recalling relevant context from previous tasks.
The Transcription Model
Section titled “The Transcription Model”If you want to create tasks by speaking instead of typing, Errand can transcribe your voice input into text. This requires a Whisper-compatible transcription model.
Options
Section titled “Options”- Cloud: OpenAI’s Whisper API, accessed through LiteLLM, is the most common choice. Several other providers also offer Whisper-compatible endpoints.
- Local: You can run Whisper models locally through Ollama or other local model servers, keeping your audio data entirely on your own hardware.
If you do not plan to use voice input, leave the transcription model unconfigured.
Task Profiles — Per-Task Model Overrides
Section titled “Task Profiles — Per-Task Model Overrides”Errand’s Task Profiles feature lets you assign different models to different types of work. Instead of choosing one model for everything, you can create profiles that automatically match tasks to the right model:
- A Research profile might use a Frontier-tier model with access to web search tools
- A Quick Reply profile might use an Efficient-tier model for near-instant responses
- A Code Review profile might use a Frontier-tier model with access to your Git repository
Profiles give you flexibility without requiring you to change your default settings every time. See the Task Profiles documentation for setup instructions.
Example Configurations
Section titled “Example Configurations”Here are some common setups to help you get started. All models are accessed through LiteLLM regardless of which configuration you choose.
Best Quality
Section titled “Best Quality”For users who want the highest quality results and are comfortable with the associated cost.
| Slot | Tier | Example |
|---|---|---|
| Agent | Frontier | Claude Opus 4 |
| Title Generation | Efficient | Claude Haiku 4.5 |
| Hindsight | Balanced | Claude Sonnet 4 |
| Transcription | Whisper | whisper-large-v3 |
Balanced (Recommended)
Section titled “Balanced (Recommended)”The sweet spot for most users — excellent results at a reasonable cost.
| Slot | Tier | Example |
|---|---|---|
| Agent | Balanced | Claude Sonnet 4 |
| Title Generation | Efficient | Claude Haiku 4.5 |
| Hindsight | Efficient | GPT-4o Mini |
| Transcription | Whisper | whisper-large-v3 |
Budget-Conscious
Section titled “Budget-Conscious”Keeps costs low while still delivering good agent performance.
| Slot | Tier | Example |
|---|---|---|
| Agent | Balanced | Gemini 2.5 Flash |
| Title Generation | Efficient | Gemini 2.0 Flash |
| Hindsight | Efficient | Gemini 2.0 Flash |
| Transcription | Whisper | whisper-large-v3 |
Privacy-First
Section titled “Privacy-First”Everything runs on your own hardware. See Running Models Locally for setup.
| Slot | Tier | Example |
|---|---|---|
| Agent | Balanced (local) | Llama 3.3 70B (via Ollama) |
| Title Generation | Efficient (local) | Llama 3.2 3B (via Ollama) |
| Hindsight | Efficient (local) | Llama 3.2 3B (via Ollama) |
| Transcription | Whisper (local) | whisper-large-v3 (via Ollama) |
Note that the Privacy-First configuration requires significant hardware — see the local models guide for requirements.