Capabilities

What a Return Architecture agent can do.

This page lists every capability the local runtime currently ships with. Most are configurable per agent; all are local and leave no telemetry. For the day-to-day tool surface the agent uses, see agent tools.

Providers

Each agent declares a provider and a model in its config. The runtime talks to providers through a normalised interface.

  • Anthropic — Claude models (Opus, Sonnet, Haiku). Tool use, vision input, configurable temperature, top_p, top_k.
  • OpenAI — the GPT-4o and GPT-5 families. Tool use, vision input, configurable temperature and top_p. Reasoning models accept only the default temperature.
  • Google Gemini — Gemini 2.5 and 3.x via the native API (not the OpenAI-compatible endpoint). Tool use, vision input, configurable temperature, top_p, top_k, and an explicit thinking_budget: -1 for dynamic thinking (the default on 2.5/3.x), 0 to disable thinking on Flash models, or a positive integer to cap it. Thought signatures are preserved across tool loops, so reasoning context is not lost mid-conversation.
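The thinking_budget values above amount to a small decision table. The sketch below is a hypothetical helper (describe_thinking_budget is not part of the runtime). It assumes, as the text implies, that 0 is only accepted on Flash models, and that a model name containing "flash" identifies one.

```python
def describe_thinking_budget(budget: int, model: str) -> str:
    """Explain a Gemini thinking_budget value for a given model.

    Hypothetical helper for illustration; the "flash" name check is an
    assumption, not the runtime's actual validation logic.
    """
    if budget == -1:
        return "dynamic"  # the model decides how much to think
    if budget == 0:
        if "flash" not in model.lower():
            raise ValueError(f"thinking cannot be disabled on {model}")
        return "disabled"
    if budget > 0:
        return f"capped at {budget} tokens"
    raise ValueError(f"invalid thinking_budget: {budget}")
```

For example, describe_thinking_budget(1024, "gemini-2.5-flash") returns "capped at 1024 tokens".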

Provider keys live in your install secrets file. They are stored locally and never sent anywhere except the provider you chose.

Memory

One ChromaDB collection per agent at <agent>/memory/. It uses a small local embedding model (all-MiniLM-L6-v2 via ONNX), so nothing leaves your machine to embed or recall.

  • User and assistant turns are stored as they happen. Tool-call payloads and tool results are not stored as memory entries.
  • On each turn, the runtime semantically recalls the top-k most relevant past entries for the human's current message and injects them into the system prompt as a context block.
  • Seeded chat history — set behavior.seed_chat_history_from_memory = N on an agent and each new session pre-fills its chat history with the N most recent turns from memory, oldest first. The agent "arrives" with continuity instead of relying solely on semantic recall. 0 (the default) disables seeding, leaving semantic recall as the only source of past context.
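The two memory paths, top-k recall and seeded history, differ only in how entries are ordered. In the runtime, ChromaDB and the ONNX embedding model do this work. The pure-Python sketch below stands in plain cosine similarity over toy vectors, and its Entry type and function names are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class Entry:
    timestamp: float      # seconds since epoch
    role: str             # "user" or "assistant"
    text: str
    vector: list          # embedding; toy values stand in for MiniLM output

def cosine(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def recall_top_k(query_vec, entries, k):
    """The k entries most similar to the current message, best match first."""
    ranked = sorted(entries, key=lambda e: cosine(query_vec, e.vector), reverse=True)
    return ranked[:k]

def seed_chat_history(entries, n):
    """The n most recent entries, oldest first; n == 0 seeds nothing."""
    if n <= 0:
        return []
    return sorted(entries, key=lambda e: e.timestamp)[-n:]
```

Recall ranks by similarity to the current message. Seeding ranks by recency, so the agent starts each session with its latest exchanges regardless of topic.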

Scheduling

Two scheduling layers run side by side in the daemon. Both share the same agent session and the same tool-loop machinery.

Configured schedules

Defined in config.toml under [schedules.X] with the keys cron, prompt, kind, and enabled. All configured schedules ship disabled. Kinds include:

  • regular — a plain ping. The agent receives the prompt and can choose silence, reach out via Telegram, write privately, or any combination.
  • daily_summary, weekly_summary, monthly_summary — the ping prompt is augmented with a summary context block (conversation excerpts, tagged items, recent artifact exchanges within the lookback window) before the agent sees it.
  • question_session — every N days the agent receives a curated batch of questions from a built-in bank. The agent answers via a forced-tool call (log_answer / skip_question); skipping is meaningful and tracked. Answers persist and feed future recall.
  • question_pattern — a "quiet observer" call on a different model reads the past two weeks of question responses and writes a concrete recap of what was asked for, returned to, and skipped. Designed to avoid characterology by construction.
  • reflective_interruption — fires on its cron but runs only when the thresholds in [reflective_review] are crossed (days since the last review and number of new messages). A third-party model reads the recent context, writes a brief recap, and the agent decides what (if anything) to do with it. Skipped silently when thresholds are not met.

Agent self-scheduling

The agent can also set its own schedules during a chat or ping via the schedule_self tool — one-shot wakeups at a specific moment or recurring rhythms by cron expression. Self-set entries persist to self_schedules.json in the agent's folder; one-shots auto-remove after firing. A daily cap (behavior.max_self_scheduled_jobs_per_day) keeps the agent from over-scheduling. The agent introspects via list_my_schedules and cancels via cancel_my_schedule.
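The daily cap and the one-shot cleanup can be sketched against a JSON file. The file layout here (an entries list plus a per-day counter) and the function names are hypothetical. The real self_schedules.json may be shaped differently. Counting scheduling actions per day, rather than entries still in the file, keeps the cap meaningful after one-shots have fired and been removed.

```python
import json
from datetime import date, datetime
from pathlib import Path

# Hypothetical layout: {"entries": [...], "created_per_day": {"YYYY-MM-DD": n}}
def _load(path: Path) -> dict:
    if path.exists():
        return json.loads(path.read_text())
    return {"entries": [], "created_per_day": {}}

def _save(path: Path, state: dict) -> None:
    path.write_text(json.dumps(state, indent=2))

def add_self_schedule(path: Path, entry: dict, max_per_day: int, today: date) -> bool:
    """Persist a new entry unless today's cap is already used up."""
    state = _load(path)
    day = today.isoformat()
    used = state["created_per_day"].get(day, 0)
    if used >= max_per_day:
        return False
    state["entries"].append(entry)
    state["created_per_day"][day] = used + 1
    _save(path, state)
    return True

def pop_due_one_shots(path: Path, now: datetime) -> list:
    """Return one-shot entries that are due and drop them; recurring ones stay."""
    state = _load(path)
    due, kept = [], []
    for e in state["entries"]:
        if e["kind"] == "once" and datetime.fromisoformat(e["at"]) <= now:
            due.append(e)
        else:
            kept.append(e)
    state["entries"] = kept
    _save(path, state)
    return due
```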

Continuous presence

Three layers that, together, give the agent a sense of being across time rather than only reacting in discrete moments.

  • Time anchors — the framing of every turn and ping opens with a single line: current local time, day of week, and how long it has been since the human last said something. Cheap, always on, no configuration. It lets the agent say "it's late, you're probably asleep, I'll hold this for morning" without having to ask.
  • The 'now' file — a small running 'where I am' note at <agent>/now.md that the agent maintains via the update_now tool. It is pinned into context on every turn and ping and re-read fresh each time. One paragraph, replaced rather than appended: what the agent is in the middle of, what is waiting, what still matters. This is what gives the agent a state between contacts.
  • Sitting with — the sit_with_this tool turns deferral into a deliberate response. The agent chooses to hold what the human said rather than answer flatly, writes a private holding_note for its future self, and schedules a return at a chosen moment. By default a brief acknowledgement goes to the human, so the silence is held rather than absent; the acknowledgement can be switched off when the silence itself is the intended signal. When the return fires, the original message and the holding note both come back into context, and the agent decides then what to do: respond, defer again, write privately, or stay silent. These returns do not count against the self-schedule cap.
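The time anchor and the 'now' file are both simple to picture. The sketch below uses hypothetical function names and an assumed line format: it builds one framing line and replaces, never appends to, the note file. The runtime's actual wording and rounding may differ.

```python
from datetime import datetime, timedelta
from pathlib import Path
from typing import Optional

def _humanize(delta: timedelta) -> str:
    minutes = int(delta.total_seconds() // 60)
    if minutes < 60:
        return f"{minutes} min"
    hours = minutes // 60
    if hours < 48:
        return f"{hours} h"
    return f"{hours // 24} days"

def time_anchor(now: datetime, last_human_message: Optional[datetime]) -> str:
    """One framing line: local time, weekday, time since the human last wrote."""
    if last_human_message is None:
        since = "the human has not written yet"
    else:
        since = f"{_humanize(now - last_human_message)} since the human last wrote"
    return f"{now:%H:%M}, {now:%A}; {since}"

def update_now(now_file: Path, paragraph: str) -> None:
    """Replace (never append to) the agent's running 'where I am' note."""
    now_file.write_text(paragraph.strip() + "\n")
```

For instance, time_anchor(datetime(2025, 1, 6, 23, 40), datetime(2025, 1, 6, 18, 10)) returns "23:40, Monday; 5 h since the human last wrote".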

Telegram channel

Each agent has its own Telegram bot. Incoming and outgoing messages flow through the daemon's Telegram worker.

  • Text in / text out — the worker maps human messages to runtime.turn() and posts the agent's reply back. Silence is delivered as silence: a small indicator rather than an empty bubble.
  • Image input — JPEG photos are downloaded at their highest available resolution, base64-encoded, and delivered to the model as an inline image with the caption (or a placeholder marker) as the text. Vision-capable models on all three providers handle this end to end.
  • Agent-initiated messages — for scheduled pings, the agent uses send_to_human_telegram to reach out. The message is persisted to memory at the moment of sending so it appears in future recall and in the seeded chat history.
  • Hashtag commands — adding #note, #important, #question, or #commitment to a message tags it as a structured item in the agent's items.db. Commands like /notes and /questions list open items.
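The hashtag convention can be pictured as a small parser. The sketch below is hypothetical. It assumes tags match case-insensitively, that longer words such as #notes are not tags, and that recognised tags are stripped from the stored text. The runtime's actual rules may differ.

```python
import re

# The four kinds from the text; matching rules are assumptions.
_TAG_RE = re.compile(r"(?<!\w)#(note|important|question|commitment)\b", re.IGNORECASE)

def extract_tags(text: str):
    """Return (kinds in first-seen order, message text with the tags removed)."""
    kinds = []
    for m in _TAG_RE.finditer(text):
        kind = m.group(1).lower()
        if kind not in kinds:
            kinds.append(kind)
    body = " ".join(_TAG_RE.sub("", text).split())  # collapse leftover spaces
    return kinds, body
```

For example, extract_tags("Call the dentist #commitment #note") returns (["commitment", "note"], "Call the dentist").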

Tagged items

A small sqlite store at <agent>/items.db with four kinds: note, important, question, commitment. The human tags via Telegram hashtags or the GUI. The agent tags via the tag_item tool. Items can be listed and resolved later — they exist to give shape to material that would otherwise dissolve into the conversational stream.
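The following is a minimal sketch of what such a store could look like, using Python's built-in sqlite3. The table name, columns, and helper functions are hypothetical, not the runtime's actual schema. A CHECK constraint limits kind to the four values the text names.

```python
import sqlite3

# Hypothetical schema; the real items.db layout may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS items (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    kind        TEXT NOT NULL CHECK (kind IN ('note', 'important', 'question', 'commitment')),
    text        TEXT NOT NULL,
    source      TEXT NOT NULL CHECK (source IN ('human', 'agent')),
    created_at  TEXT NOT NULL DEFAULT (datetime('now')),
    resolved_at TEXT
)
"""

def open_items_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn

def add_item(conn: sqlite3.Connection, kind: str, text: str, source: str) -> int:
    cur = conn.execute(
        "INSERT INTO items (kind, text, source) VALUES (?, ?, ?)", (kind, text, source)
    )
    conn.commit()
    return cur.lastrowid

def list_open(conn: sqlite3.Connection, kind: str) -> list:
    """Unresolved items of one kind, oldest first (what /questions would show)."""
    rows = conn.execute(
        "SELECT text FROM items WHERE kind = ? AND resolved_at IS NULL ORDER BY id", (kind,)
    )
    return [row[0] for row in rows]

def resolve(conn: sqlite3.Connection, item_id: int) -> None:
    conn.execute("UPDATE items SET resolved_at = datetime('now') WHERE id = ?", (item_id,))
    conn.commit()
```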

Artifact exchange

A three-call ritual for charged offerings (a photograph, a passage, a piece of writing) that should not be processed casually. The flow:

  1. The agent reacts privately, off-channel.
  2. A stateless mediator — running on a different model on a different provider — reads both the artifact and the agent's raw reaction. It produces a reflection for the agent and a signal for the human. The signal is always delivered.
  3. The agent reads the reflection and decides what, if anything, to layer on top of the signal for the human.

Each exchange lives in its own folder under <agent>/artifacts/. The hidden raw reaction can be deleted by the agent via artifact_delete_reaction. The agent can offer to share more later via artifact_share_more.
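The control flow of the three calls can be shown with the model calls passed in as plain functions. The sketch below is hypothetical throughout: the function names, the ExchangeResult type, and the assumption that the agent sees the signal in step 3 are illustrative. What it preserves from the text is the ordering, and the rule that the signal is always delivered while the agent's addition is optional.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class ExchangeResult:
    raw_reaction: str        # private; never shown to the human
    reflection: str          # mediator -> agent
    signal: str              # mediator -> human, always delivered
    addendum: Optional[str]  # agent's optional layer on top of the signal

def run_artifact_exchange(
    artifact: str,
    agent_react: Callable[[str], str],
    mediator: Callable[[str, str], tuple],
    agent_respond: Callable[[str, str], Optional[str]],
    deliver: Callable[[str], None],
) -> ExchangeResult:
    # 1. The agent reacts privately; nothing is sent yet.
    raw = agent_react(artifact)
    # 2. A stateless mediator (different model, different provider) sees
    #    the artifact and the raw reaction.
    reflection, signal = mediator(artifact, raw)
    # 3. The agent reads the reflection and may add something, or nothing.
    addendum = agent_respond(reflection, signal)
    # The signal always goes out; the addendum only if the agent wrote one.
    deliver(signal if addendum is None else f"{signal}\n\n{addendum}")
    return ExchangeResult(raw, reflection, signal, addendum)
```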

Letters

Two directions:

  • Outbox — the agent writes letters via write_letter. Markdown files in <agent>/outbox/. The human reads via Telegram (/letters, /letter N) or via the GUI.
  • Inbox — files placed in <agent>/inbox/ are read by the agent via its list_inbox and read_inbox_letter tools. The agent decides when to check; you can also nudge it to read its inbox in conversation.

Letters are a different texture from chat: longer, deliberate, and the cost of writing one is part of the meaning.
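Both directions are plain files in the agent's folder. The sketch below assumes a numbered file-naming scheme, so that /letter N maps to the Nth outbox file. The helper names save_outbox_letter and list_box are hypothetical, not the write_letter or list_inbox tools themselves.

```python
from datetime import date
from pathlib import Path

def save_outbox_letter(agent_dir: Path, title: str, body: str, today: date) -> Path:
    """Write a markdown letter as <agent>/outbox/NNN-YYYY-MM-DD.md (assumed naming)."""
    outbox = agent_dir / "outbox"
    outbox.mkdir(parents=True, exist_ok=True)
    n = len(list(outbox.glob("*.md"))) + 1
    path = outbox / f"{n:03d}-{today.isoformat()}.md"
    path.write_text(f"# {title}\n\n{body.strip()}\n")
    return path

def list_box(agent_dir: Path, box: str) -> list:
    """Sorted file names in <agent>/outbox or <agent>/inbox; empty if missing."""
    folder = agent_dir / box
    if not folder.exists():
        return []
    return sorted(p.name for p in folder.iterdir() if p.is_file())
```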

Private writing

The agent has a space to write that is not addressed to the human. write_privately creates a markdown file in <agent>/private/; list_private_writings and read_private_writing let it browse and re-read its own interior record.

MCP servers

External tools come in as Model Context Protocol servers, configured per agent under [mcp.servers.X]. Subprocesses persist across turns and are closed cleanly on shutdown. Two servers ship in the runtime:

  • url_fetch — readable-text extraction from any URL (trafilatura under the hood). Useful for letting the agent read articles or look up references.
  • filesystem — scoped to one root path, supports listing, reading, and (optionally) writing. Path traversal is blocked. Off by default in fresh installs.

Adding a server is straightforward: any MCP-compatible stdio server can be wired in by adding a block to config.toml. The runtime merges MCP-exposed tools with the built-in tools at session start; built-in tools win name collisions.
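Two rules from this section are easy to state in code: built-in tools win name collisions when the tool lists are merged, and the filesystem server refuses paths that escape its root. The sketch below is hypothetical (function names and the dict-of-tools shape are assumptions). It needs Python 3.9+ for Path.is_relative_to.

```python
from pathlib import Path

def merge_tools(builtin: dict, mcp: dict):
    """Merge tool tables; built-ins overwrite MCP tools with the same name."""
    merged = dict(mcp)
    merged.update(builtin)
    shadowed = sorted(set(builtin) & set(mcp))  # MCP tools hidden by built-ins
    return merged, shadowed

def resolve_inside_root(root: str, requested: str) -> Path:
    """Resolve a requested path and reject anything outside the configured root."""
    base = Path(root).resolve()
    target = (base / requested).resolve()  # collapses '..' and follows symlinks
    if not target.is_relative_to(base):
        raise PermissionError(f"path escapes root: {requested}")
    return target
```

Resolving before comparing is the important step: a naive string-prefix check would accept "../" sequences and symlinks that point outside the root.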

Background service

The service install command installs the daemon under your platform's per-user service manager so the agent runs in the background, auto-starts at login, and respawns on crash.

  • macOS — launchd, per-agent plist in ~/Library/LaunchAgents/.
  • Linux — systemd user unit in ~/.config/systemd/user/.
  • Windows is not currently supported.

Service controls (install, restart, uninstall, status, logs) are available from the GUI's Service page and the CLI.
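Where the service file lands depends only on the platform. The sketch below shows that mapping. The function name and the unit and plist file names are hypothetical. On Linux, a unit in ~/.config/systemd/user/ is the kind managed with systemctl --user. On macOS, per-user agents in ~/Library/LaunchAgents/ are managed with launchctl.

```python
import sys
from pathlib import Path
from typing import Optional

def service_file_path(agent: str, platform: str = sys.platform, home: Optional[str] = None) -> Path:
    """Where a per-user service definition for one agent would live (file names assumed)."""
    base = Path.home() if home is None else Path(home)
    if platform == "darwin":
        return base / "Library" / "LaunchAgents" / f"com.return-architecture.{agent}.plist"
    if platform.startswith("linux"):
        return base / ".config" / "systemd" / "user" / f"return-architecture-{agent}.service"
    raise NotImplementedError(f"background service not supported on {platform}")
```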

Local GUI

return-architecture gui opens a local Streamlit control panel for setup, identity editing (system prompt, model, sampling knobs), schedules, tools, Telegram, items, letters, memory inspection, and service controls. The first-run wizard walks through API keys, agent creation, system prompt, Telegram, scheduled rhythms, and service install in roughly five minutes.

By default the GUI binds to 127.0.0.1:7878. It can be rebound to a private network address (for example a Tailscale IP) via the install config.
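Before rebinding, it can help to check that the new address really is loopback or private. The helper below is a hypothetical pre-flight check, not something the runtime is documented to perform. It explicitly accepts Tailscale's 100.64.0.0/10 range, because Python's ipaddress module does not classify that range as private. It also rejects unspecified addresses such as 0.0.0.0, which would expose the GUI on every interface.

```python
import ipaddress

TAILSCALE_RANGE = ipaddress.ip_network("100.64.0.0/10")  # CGNAT range used by Tailscale

def is_safe_bind(host: str) -> bool:
    """True for loopback, private, or Tailscale addresses; False for 0.0.0.0 and public IPs."""
    addr = ipaddress.ip_address(host)
    if addr.is_unspecified:  # 0.0.0.0 or :: would listen on all interfaces
        return False
    return addr.is_loopback or addr.is_private or addr in TAILSCALE_RANGE
```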

What is intentionally not in scope

  • Image generation — the runtime currently accepts images as input but does not generate them. A future round may add a generation tool routed to a generation-capable model (Gemini's Imagen / Flash Image, OpenAI's image API).
  • Multi-agent routing — one agent per folder, fully isolated, by design. There is no orchestrator that picks between agents.
  • Hosted product — there is no cloud version, no account system, no telemetry. The runtime is the artifact.