Choosing the Right Models

Errand has four model slots, each with different requirements. This page explains what each slot needs, introduces a simple tier system for thinking about model capabilities, and gives you concrete recommendations.

Capability Tiers

To make model selection easier, we group models into three capability tiers. These tiers are not official classifications — they are a practical shorthand for thinking about which models are suitable for which tasks.

  • Frontier: The most capable models available. Best reasoning, most reliable tool use, handles complex multi-step tasks with ease. Higher cost, sometimes slower. Current examples: Claude Opus 4, GPT-4.1, Gemini 2.5 Pro.
  • Balanced: Strong all-round performers. Reliable tool use, good reasoning, noticeably faster and more affordable than Frontier models. The sweet spot for most work. Current examples: Claude Sonnet 4, GPT-4o, Gemini 2.5 Flash.
  • Efficient: Fast and affordable. Excellent for straightforward tasks like summarisation, classification, and simple text generation. Not suitable for complex multi-step reasoning. Current examples: Claude Haiku 4.5, GPT-4o Mini, Gemini 2.0 Flash.

Model names change as providers release new versions. The tier a model belongs to matters more than its specific name — when a provider releases a new model, check where it sits in their lineup and map it to the tier above.

The Agent Model

This is the most important model choice you will make. The agent model runs Errand’s core task execution loop: it reads your instructions, plans what to do, calls tools (web search, email, file access, and more), interprets the results, and produces a final output.

The agent model must be strong in several areas:

  • Tool calling. The agent works by calling external tools — searching the web, sending emails, reading files, managing tasks. The model must reliably generate correctly structured tool calls. This is the single most important capability.
  • Multi-step reasoning. Most tasks require several steps: research a topic, then draft an email, then send it. The model must maintain context and make good decisions across multiple turns.
  • Long context window. The agent receives a system prompt, your task description, tool definitions, recalled memories, and the results of every tool call it makes. This adds up quickly. Models with larger context windows handle complex tasks more gracefully.
  • Instruction following. The agent receives detailed instructions about how to behave, what tools are available, and what format to use for its output. The model must follow these reliably.
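
The loop described above can be sketched in a few lines. The sketch below is illustrative only: the tool registry, the stubbed model, and the message format are hypothetical stand-ins, not Errand’s actual implementation.

```python
import json

# Hypothetical tool registry -- a real agent exposes web search, email, etc.
TOOLS = {
    "web_search": lambda query: f"results for {query!r}",
    "send_email": lambda to, body: f"sent to {to}",
}

def stub_model(history):
    """Stand-in for the LLM: emits one structured tool call, then a final answer."""
    if not any(m["role"] == "tool" for m in history):
        return {"tool": "web_search", "args": {"query": "model tiers"}}
    return {"final": "Here is a summary of model tiers."}

def run_agent(task, model=stub_model, max_turns=5):
    history = [{"role": "user", "content": task}]
    for _ in range(max_turns):
        action = model(history)
        if "final" in action:            # the model decides it is done
            return action["final"]
        tool = TOOLS[action["tool"]]     # structured tool call -> execution
        result = tool(**action["args"])
        history.append({"role": "tool", "content": json.dumps(result)})
    raise RuntimeError("agent did not finish within max_turns")

print(run_agent("Summarise model tiers"))
```

Every turn depends on the model producing a well-formed `action` dict; a model that emits malformed tool calls breaks the loop at the `TOOLS[...]` lookup, which is exactly the failure mode described below for weaker models.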

We recommend using at least a Balanced-tier model for the agent. Here is why:

Models below this level — including most Efficient-tier models and smaller open-source models — often struggle with reliable tool calling. In practice, this means:

  • Failed tool calls. The model generates tool calls with incorrect parameters, missing fields, or malformed syntax. The agent gets stuck or produces errors instead of results.
  • Lost context. The model forgets earlier steps in a multi-turn task, repeating work or contradicting its own previous decisions.
  • Poor judgement. The model makes questionable choices about which tools to use or when to stop, leading to incomplete or irrelevant results.
  • Wasted time and cost. A task that fails and needs to be re-run with a better model ends up costing more than using the right model in the first place.

Using an Efficient-tier model for the agent is like hiring an intern for a senior role — they might occasionally get it right, but the reliability is not there for work you depend on.

For most everyday tasks — drafting emails, researching topics, managing to-do lists — a Balanced-tier model is excellent. Consider stepping up to a Frontier model when:

  • Tasks involve complex reasoning across many steps
  • You need the highest possible quality for important work
  • Tasks require synthesising information from many sources
  • You are doing detailed code review or technical writing

You can use Task Profiles to assign a Frontier model to specific types of work without changing your default. For example, set up a “Research” profile that uses a Frontier model, while everyday tasks use your Balanced default.

            Frontier                    Balanced
Quality     Best available              Very good
Speed       Slower                      Faster
Cost        Higher per task             More affordable
Best for    Complex, high-stakes work   Everyday tasks

Most users find that a Balanced-tier model as the default, with a Frontier model available via Task Profiles for demanding work, gives the best overall experience.

The Title Generation Model

When you create a task, Errand automatically generates a short, descriptive title from your task description. This is a single-shot summarisation job — the model reads your description and produces a few words.

This is the easiest choice. Even the smallest, most affordable models handle short-text summarisation well. There is no benefit to using a more powerful model here — it would produce the same quality title at higher cost and slower speed.

Any Efficient-tier model will do the job. Pick whichever one your provider offers at the lowest cost.
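
To illustrate why the smallest model suffices: title generation is a single completion request with no tools, no multi-turn state, and a tiny output. The prompt wording and parameter values below are a hypothetical sketch of the shape of such a request, not Errand’s actual prompt.

```python
def build_title_request(description: str, model: str = "gpt-4o-mini") -> dict:
    """Assemble a single-shot summarisation request. Any Efficient-tier
    model identifier can be substituted for the illustrative default."""
    return {
        "model": model,
        "messages": [
            {"role": "system",
             "content": "Summarise the task in at most six words. "
                        "Reply with the title only."},
            {"role": "user", "content": description},
        ],
        "max_tokens": 20,  # a title never needs more than a few tokens
    }

request = build_title_request(
    "Find three quotes for office chairs and email me a comparison")
print(request["model"])
```

Because the output is capped at a handful of tokens, a more capable model could not produce a meaningfully better result here.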

The Hindsight Model

Hindsight is Errand’s persistent memory system. It uses an AI model to:

  • Extract entities and concepts from your tasks and their results
  • Form “beliefs” — structured knowledge about your preferences, past decisions, and context
  • Search and retrieve relevant memories when a new task starts

The Hindsight model needs good language comprehension — it must understand what is important in a piece of text and extract meaning from it. However, it does not need the multi-step reasoning or tool-calling capabilities that the agent requires.

  • Efficient tier works well for most users. Memories are stored and recalled reliably, and the cost stays low since Hindsight processes text for every task.
  • Balanced tier can provide richer, more nuanced memory extraction if you find that the Efficient model is missing important context. This is a quality-versus-cost decision you can experiment with.

Start with an Efficient-tier model and upgrade if you notice the agent is not recalling relevant context from previous tasks.
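
To make the idea concrete, a “belief” can be pictured as a small structured record plus a retrieval step. The schema and the tag-overlap retrieval below are a hypothetical sketch, not Hindsight’s actual implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Belief:
    """Hypothetical shape of a Hindsight memory record."""
    subject: str        # entity the belief is about, e.g. "user"
    statement: str      # the extracted knowledge
    source_task: str    # task the belief was learned from
    tags: list = field(default_factory=list)

def recall(beliefs, query_tags):
    """Naive retrieval: return beliefs sharing any tag with the new task."""
    return [b for b in beliefs if set(b.tags) & set(query_tags)]

memory = [
    Belief("user", "prefers concise emails", "task-12", ["email", "style"]),
    Belief("user", "works in UTC+1", "task-03", ["scheduling"]),
]
print([b.statement for b in recall(memory, ["email"])])
```

Extraction (filling in `statement` and `tags` from task text) is where the model’s language comprehension matters; the retrieval step itself needs no reasoning at all, which is why an Efficient-tier model is usually enough.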

The Transcription Model

If you want to create tasks by speaking instead of typing, Errand can transcribe your voice input into text. This requires a Whisper-compatible transcription model.

  • Cloud: OpenAI’s Whisper API, accessed through LiteLLM, is the most common choice. Several other providers also offer Whisper-compatible endpoints.
  • Local: You can run Whisper models locally through Ollama or other local model servers, keeping your audio data entirely on your own hardware.

If you do not plan to use voice input, leave the transcription model unconfigured.
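
Because both options speak the same Whisper-compatible API, switching between cloud and local is essentially a matter of pointing requests at a different endpoint. A minimal sketch, with illustrative URLs (the local port shown is a common Ollama default, but yours may differ):

```python
def transcription_endpoint(local: bool) -> str:
    """Pick where Whisper-compatible requests go. URLs are illustrative."""
    if local:
        # A local model server keeps audio on your own hardware.
        return "http://localhost:11434/v1/audio/transcriptions"
    # OpenAI's hosted Whisper API endpoint.
    return "https://api.openai.com/v1/audio/transcriptions"

print(transcription_endpoint(local=True))
```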

Task Profiles — Per-Task Model Overrides

Errand’s Task Profiles feature lets you assign different models to different types of work. Instead of choosing one model for everything, you can create profiles that automatically match tasks to the right model:

  • A Research profile might use a Frontier-tier model with access to web search tools
  • A Quick Reply profile might use an Efficient-tier model for near-instant responses
  • A Code Review profile might use a Frontier-tier model with access to your Git repository

Profiles give you flexibility without requiring you to change your default settings every time. See the Task Profiles documentation for setup instructions.
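
Conceptually, a profile is a mapping from a type of work to a model and a set of tools, with a fallback to your default. The sketch below is hypothetical: the profile names, model placeholders, and tool names are illustrative, not Errand’s actual configuration format.

```python
# Hypothetical profile table: task type -> model + tool access.
PROFILES = {
    "research":    {"model": "frontier-model",  "tools": ["web_search"]},
    "quick_reply": {"model": "efficient-model", "tools": []},
    "default":     {"model": "balanced-model",  "tools": ["web_search", "email"]},
}

def profile_for(task_type: str) -> dict:
    """Unrecognised task types fall back to the default profile."""
    return PROFILES.get(task_type, PROFILES["default"])

print(profile_for("research")["model"])
print(profile_for("unknown")["model"])
```

The fallback is the important design choice: your default settings stay untouched, and a profile only overrides them for tasks that explicitly match.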

Recommended Configurations

Here are some common setups to help you get started. All models are accessed through LiteLLM regardless of which configuration you choose.

For users who want the highest quality results and are comfortable with the associated cost.

Slot              Tier       Example
Agent             Frontier   Claude Opus 4
Title Generation  Efficient  Claude Haiku 4.5
Hindsight         Balanced   Claude Sonnet 4
Transcription     Whisper    whisper-large-v3
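
Since every slot is routed through LiteLLM, each entry above ultimately resolves to a provider-prefixed model string. The identifiers below are illustrative guesses at that mapping, not verified names; check LiteLLM’s provider documentation for the exact strings your versions use.

```python
# Illustrative LiteLLM-style identifiers for this configuration;
# verify exact model names against your provider's current lineup.
PREMIUM = {
    "agent":         "anthropic/claude-opus-4",
    "title":         "anthropic/claude-haiku-4.5",
    "hindsight":     "anthropic/claude-sonnet-4",
    "transcription": "openai/whisper-large-v3",
}

for slot, model in PREMIUM.items():
    print(f"{slot}: {model}")
```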

The sweet spot for most users — excellent results at a reasonable cost.

Slot              Tier       Example
Agent             Balanced   Claude Sonnet 4
Title Generation  Efficient  Claude Haiku 4.5
Hindsight         Efficient  GPT-4o Mini
Transcription     Whisper    whisper-large-v3

Keeps costs low while still delivering good agent performance.

Slot              Tier       Example
Agent             Balanced   Gemini 2.5 Flash
Title Generation  Efficient  Gemini 2.0 Flash
Hindsight         Efficient  Gemini 2.0 Flash
Transcription     Whisper    whisper-large-v3

Everything runs on your own hardware. See Running Models Locally for setup.

Slot              Tier               Example
Agent             Balanced (local)   Llama 3.3 70B (via Ollama)
Title Generation  Efficient (local)  Llama 3.2 3B (via Ollama)
Hindsight         Efficient (local)  Llama 3.2 3B (via Ollama)
Transcription     Whisper (local)    whisper-large-v3 (via Ollama)

Note that the Privacy-First configuration requires significant hardware — see the local models guide for requirements.