What a Return Architecture agent can do.
This page lists every capability the local runtime currently ships with. Most are configurable per agent; all are local and leave no telemetry. For the day-to-day tool surface the agent uses, see agent tools.
Providers
Each agent declares a provider and a model in its config. The runtime talks to providers through a normalised interface.
- Anthropic — Claude models (Opus, Sonnet, Haiku). Tool use, vision input, configurable temperature, top_p, top_k.
- OpenAI — GPT-4o, GPT-5 and family. Tool use, vision input, configurable temperature and top_p. Reasoning models accept only the default temperature.
- Google Gemini — Gemini 2.5 and 3.x via the native API (not the OpenAI-compat endpoint). Supports tool use, vision input, configurable temperature, top_p, top_k, and an explicit thinking_budget: -1 for dynamic thinking (default on 2.5/3.x), 0 to disable on Flash, a positive integer to cap. Thought signatures are preserved across tool loops so reasoning context is not lost mid-conversation.
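The per-provider sampling rules in the list above can be summarised as a small filter. This is a minimal sketch rather than the runtime's actual interface: the function name allowed_sampling, the provider keys, and the is_reasoning_model flag are assumptions made for illustration.

```python
# Hypothetical sketch: which sampling knobs each provider accepts,
# following the provider list above. Names are illustrative only.
SUPPORTED = {
    "anthropic": {"temperature", "top_p", "top_k"},
    "openai": {"temperature", "top_p"},
    "gemini": {"temperature", "top_p", "top_k", "thinking_budget"},
}


def allowed_sampling(provider: str, params: dict, is_reasoning_model: bool = False) -> dict:
    """Return only the parameters the given provider accepts."""
    if provider not in SUPPORTED:
        raise ValueError(f"unknown provider: {provider}")
    kept = {k: v for k, v in params.items() if k in SUPPORTED[provider]}
    # OpenAI reasoning models accept only the default temperature,
    # so a custom value is dropped rather than sent.
    if provider == "openai" and is_reasoning_model:
        kept.pop("temperature", None)
    # thinking_budget: -1 = dynamic, 0 = disabled, positive integer = cap.
    budget = kept.get("thinking_budget")
    if budget is not None and budget < -1:
        raise ValueError("thinking_budget must be -1, 0, or a positive integer")
    return kept
```

Dropping unsupported keys silently is one possible design; a stricter variant could raise instead so that misconfigured agents fail early.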
Provider keys live in your install secrets file. They are stored locally and never sent anywhere except the provider you chose.
Memory
One ChromaDB collection per agent at
<agent>/memory/. Uses a small local embedding
model (all-MiniLM-L6-v2 via ONNX) — no calls leave
your machine to embed or recall.
- User and assistant turns are stored as they happen. Tool-call payloads and tool results are not stored as memory entries.
- On each turn, the runtime semantically recalls the top k most relevant past entries based on the human's current message, and injects them as a context block in the system prompt.
- Seeded chat history — set behavior.seed_chat_history_from_memory = N on an agent and each new session pre-fills its chat history with the N most recent turns from memory, oldest first. The agent "arrives" with continuity instead of relying solely on semantic recall. 0 (the default) keeps the original behaviour.
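The two retrieval paths described above (semantic top-k recall and seeded history) can be illustrated without any dependencies. This is only a sketch: the real runtime uses ChromaDB with the all-MiniLM-L6-v2 embedding model, whereas here toy vectors and a hand-written cosine similarity stand in for it, and the function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recall_top_k(query_vec, entries, k):
    """entries is a list of (text, vector); return the k most similar texts."""
    ranked = sorted(entries, key=lambda e: cosine(query_vec, e[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def seed_history(turns, n):
    """Return the n most recent turns, oldest first.

    Assumes turns is already in chronological order.
    n = 0 seeds nothing, matching the default described above.
    """
    if n <= 0:
        return []  # guard: turns[-0:] would return the whole list
    return turns[-n:]
```

Recall ranks by relevance to the current message, while seeding is purely chronological, so the two can surface different entries for the same session.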
Scheduling
Two scheduling layers run side by side in the daemon. Both share the same agent session and the same tool-loop machinery.
Configured schedules
Defined in config.toml under [schedules.X]
with cron, prompt, kind,
and enabled. These ship disabled. Kinds include:
- regular — a plain ping. The agent receives the prompt and can choose silence, reach out via Telegram, write privately, or any combination.
- daily_summary, weekly_summary, monthly_summary — the ping prompt is augmented with a summary context block (conversation excerpts, tagged items, recent artifact exchanges within the lookback window) before the agent sees it.
- question_session — every N days the agent receives a curated batch of questions from a built-in bank. The agent answers via a forced-tool call (log_answer / skip_question); skipping is meaningful and tracked. Answers persist and feed future recall.
- question_pattern — a "quiet observer" call (different model) reads the past two weeks of question responses and writes a concrete recap of what was asked for, returned to, and skipped. Designed to avoid characterology by construction.
- reflective_interruption — fires on its cron but runs only when the thresholds in [reflective_review] are crossed (days since the last review and number of new messages). A third-party model reads the recent context, writes a brief recap, and the agent decides what (if anything) to do with it. Skipped silently when thresholds are not met.
Agent self-scheduling
The agent can also set its own schedules during a chat or ping
via the schedule_self tool — one-shot wakeups at a
specific moment or recurring rhythms by cron expression.
Self-set entries persist to self_schedules.json in
the agent's folder; one-shots auto-remove after firing. A daily
cap (behavior.max_self_scheduled_jobs_per_day)
keeps the agent from over-scheduling. The agent introspects via
list_my_schedules and cancels via
cancel_my_schedule.
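The persistence rules above (JSON on disk, one-shots removed after firing, a daily cap) can be sketched as a small store. The class, its method names, and the JSON layout are hypothetical; only the file name self_schedules.json and the behaviours come from the description.

```python
import json
import tempfile
from datetime import date
from pathlib import Path


class SelfSchedules:
    """Hypothetical JSON-backed store for schedules the agent sets itself."""

    def __init__(self, path: Path, max_per_day: int):
        self.path = path
        self.max_per_day = max_per_day
        if path.exists():
            self.state = json.loads(path.read_text())
        else:
            self.state = {"jobs": [], "created_per_day": {}}

    def _save(self) -> None:
        self.path.write_text(json.dumps(self.state, indent=2))

    def add(self, job_id: str, when: str, one_shot: bool, today: date) -> bool:
        """Store a job unless today's cap is already used up."""
        key = today.isoformat()
        used = self.state["created_per_day"].get(key, 0)
        if used >= self.max_per_day:
            return False
        self.state["jobs"].append({"id": job_id, "when": when, "one_shot": one_shot})
        self.state["created_per_day"][key] = used + 1
        self._save()
        return True

    def mark_fired(self, job_id: str) -> None:
        """One-shots are removed after firing; recurring jobs stay."""
        self.state["jobs"] = [
            j for j in self.state["jobs"]
            if not (j["id"] == job_id and j["one_shot"])
        ]
        self._save()

    def cancel(self, job_id: str) -> None:
        """Remove a job regardless of its type."""
        self.state["jobs"] = [j for j in self.state["jobs"] if j["id"] != job_id]
        self._save()

    def list_ids(self) -> list:
        return [j["id"] for j in self.state["jobs"]]


# Usage: a cap of 2 per day, one one-shot and one recurring job.
with tempfile.TemporaryDirectory() as folder:
    store = SelfSchedules(Path(folder) / "self_schedules.json", max_per_day=2)
    today = date(2025, 3, 14)
    store.add("wake-1", "2025-03-15T08:00", one_shot=True, today=today)
    store.add("weekly", "0 18 * * 5", one_shot=False, today=today)
    third_accepted = store.add("extra", "0 7 * * *", one_shot=False, today=today)
    store.mark_fired("wake-1")
    store.mark_fired("weekly")
    remaining = store.list_ids()
```

The cap is tracked in a separate per-day counter so that a one-shot which has already fired still counts toward the day's total.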
Continuous presence
Three layers together give the agent a sense of existing across time rather than only reacting in discrete moments.
- Time anchors — every turn and ping framing opens with one line: current local time, day of week, and how long since the human last said something. Cheap, always on, no configuration. Lets the agent say "it's late, you're probably asleep, I'll hold this for morning" without having to ask.
- The 'now' file — a small running 'where I am' note at <agent>/now.md that the agent maintains via the update_now tool. Pinned into context on every turn and ping, re-read fresh each time. One paragraph, replaced not appended — what's in the middle, what is waiting, what still matters. This is what gives the agent a state between contacts.
- Sitting with — the sit_with_this tool turns deferral into a deliberate response. The agent chooses to hold what the human said rather than answer flatly, provides a private holding_note for their future self, and schedules a return at a chosen moment. A brief acknowledgement goes to the human by default so the silence is held, not absent; opt-out is supported when the silence itself is the intended signal. When the return fires the original message and the holding note both come back into context and the agent decides then what to do — respond, defer again, write privately, or stay silent. Does not count against the self-schedule cap.
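A time anchor of the kind described in the first item could be built along these lines. The exact wording and format of the runtime's anchor line are not documented here, so the output format and both function names are assumptions.

```python
from datetime import datetime, timedelta


def humanize(delta: timedelta) -> str:
    """Render an elapsed time coarsely: minutes, hours, or days."""
    minutes = int(delta.total_seconds() // 60)
    if minutes < 1:
        return "moments"
    if minutes < 60:
        return f"{minutes} min"
    hours = minutes // 60
    if hours < 24:
        return f"{hours} h"
    return f"{hours // 24} d"


def time_anchor(now: datetime, last_human: datetime) -> str:
    """One line: day of week, local time, and time since the human last spoke."""
    return (f"{now:%A} {now:%Y-%m-%d %H:%M} local; "
            f"last human message {humanize(now - last_human)} ago")


# Usage with fixed timestamps so the result is reproducible.
line = time_anchor(datetime(2025, 3, 14, 23, 30), datetime(2025, 3, 14, 20, 15))
```

Because the line is recomputed on every turn and ping, it needs no stored state beyond the timestamp of the last human message.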
Telegram channel
Each agent has its own Telegram bot. Incoming and outgoing messages flow through the daemon's Telegram worker.
- Text in / text out — the worker maps human messages to runtime.turn() and posts the agent's reply back. Silence is delivered as silence: a small indicator rather than an empty bubble.
- Image input — JPEG photos are downloaded at their highest available resolution, base64-encoded, and delivered to the model as an inline image with the caption (or a placeholder marker) as the text. Vision-capable models on all three providers handle this end to end.
- Agent-initiated messages — for scheduled pings, the agent uses send_to_human_telegram to reach out. The message is persisted to memory at the moment of sending so it appears in future recall and in the seeded chat history.
- Hashtag commands — adding #note, #important, #question, or #commitment to a message tags it as a structured item in the agent's items.db. Commands like /notes and /questions list open items.
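Hashtag detection as described in the last item can be sketched with a regular expression. The four tag names come from the text above; the function name, the case-insensitive matching, and the de-duplication are assumptions about behaviour the page does not specify.

```python
import re

KINDS = ("note", "important", "question", "commitment")
# Match "#kind" only as a whole tag: not preceded by a word character
# and not followed by more letters (so "#notebook" is ignored).
TAG_RE = re.compile(r"(?<!\w)#(" + "|".join(KINDS) + r")\b", re.IGNORECASE)


def extract_tags(message: str) -> list:
    """Return the recognised tag kinds in order of first appearance."""
    seen = []
    for match in TAG_RE.finditer(message):
        kind = match.group(1).lower()
        if kind not in seen:
            seen.append(kind)
    return seen
```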
Tagged items
A small sqlite store at <agent>/items.db
with four kinds: note,
important, question,
commitment. The human tags via Telegram
hashtags or the GUI. The agent tags via the
tag_item tool. Items can be listed and resolved
later — they exist to give shape to material that would
otherwise dissolve into the conversational stream.
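A store of this shape can be sketched with the standard sqlite3 module. The four kinds are taken from the text; the table name, columns, and helper functions are hypothetical and may not match the real items.db schema.

```python
import sqlite3

# Hypothetical schema: the CHECK constraint limits kind to the four documented values.
SCHEMA = """
CREATE TABLE IF NOT EXISTS items (
    id       INTEGER PRIMARY KEY,
    kind     TEXT NOT NULL
             CHECK (kind IN ('note', 'important', 'question', 'commitment')),
    text     TEXT NOT NULL,
    source   TEXT NOT NULL,              -- 'human' or 'agent'
    resolved INTEGER NOT NULL DEFAULT 0
)
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the store; defaults to an in-memory database."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def tag_item(conn, kind: str, text: str, source: str) -> int:
    """Insert an item and return its id; an unknown kind raises IntegrityError."""
    cur = conn.execute(
        "INSERT INTO items (kind, text, source) VALUES (?, ?, ?)",
        (kind, text, source),
    )
    conn.commit()
    return cur.lastrowid


def resolve_item(conn, item_id: int) -> None:
    conn.execute("UPDATE items SET resolved = 1 WHERE id = ?", (item_id,))
    conn.commit()


def open_items(conn, kind: str) -> list:
    """List unresolved items of one kind, oldest first."""
    rows = conn.execute(
        "SELECT id, text FROM items WHERE kind = ? AND resolved = 0 ORDER BY id",
        (kind,),
    )
    return rows.fetchall()
```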
Artifact exchange
A three-call ritual for charged offerings (a photograph, a passage, a piece of writing) that should not be processed casually. The flow:
- The agent reacts privately, off-channel.
- A stateless mediator — running on a different model on a different provider — reads both the artifact and the agent's raw reaction. It produces a reflection for the agent and a signal for the human. The signal is always delivered.
- The agent reads the reflection and decides what, if anything, to layer on top of the signal for the human.
Each exchange lives in its own folder under
<agent>/artifacts/. The hidden raw reaction
can be deleted by the agent via
artifact_delete_reaction. The agent can offer to
share more later via artifact_share_more.
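The three calls can be pictured as a small pipeline. This is a structural sketch only: the three callables stand in for model calls, and every name here (Exchange, run_exchange, delivered_to_human) is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Exchange:
    raw_reaction: str         # private, never sent to the human
    reflection: str           # mediator output for the agent
    signal: str               # mediator output for the human, always delivered
    addition: Optional[str]   # what the agent chooses to add, if anything


def run_exchange(
    artifact: str,
    react: Callable[[str], str],
    mediate: Callable[[str, str], Tuple[str, str]],
    decide: Callable[[str, str], Optional[str]],
) -> Exchange:
    raw = react(artifact)                         # call 1: private reaction
    reflection, signal = mediate(artifact, raw)   # call 2: stateless mediator
    addition = decide(reflection, signal)         # call 3: agent decides
    return Exchange(raw, reflection, signal, addition)


def delivered_to_human(ex: Exchange) -> list:
    """The signal always goes out; the agent's addition is optional."""
    return [ex.signal] + ([ex.addition] if ex.addition else [])
```

Keeping the raw reaction out of delivered_to_human mirrors the point that it stays hidden unless the agent later chooses to share more.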
Letters
Two directions:
- Outbox — the agent writes letters via write_letter. Markdown files in <agent>/outbox/. The human reads via Telegram (/letters, /letter N) or via the GUI.
- Inbox — files placed in <agent>/inbox/ are read by the agent via its list_inbox and read_inbox_letter tools. The agent decides when to check; you can also nudge it to read its inbox in conversation.
Letters are a different texture from chat: longer, deliberate, and the cost of writing one is part of the meaning.
Private writing
The agent has a space to write that is not addressed to the
human. write_privately creates a markdown file in
<agent>/private/;
list_private_writings and
read_private_writing let it browse and re-read its
own interior record.
MCP servers
External tools come in as Model Context Protocol servers,
configured per agent under [mcp.servers.X].
Subprocesses persist across turns and are closed cleanly on
shutdown. Two servers ship in the runtime:
- url_fetch — readable-text extraction from any URL (trafilatura under the hood). Useful for letting the agent read articles or look up references.
- filesystem — scoped to one root path, supports listing, reading, and (optionally) writing. Path traversal is blocked. Off by default in fresh installs.
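Scoping a filesystem tool to one root usually comes down to resolving the requested path and checking that it still sits under the root. The following is a generic sketch of that technique, not the shipped server's code; resolve_inside is a hypothetical name.

```python
from pathlib import Path


def resolve_inside(root: Path, requested: str) -> Path:
    """Resolve requested relative to root, refusing anything that escapes it."""
    root = root.resolve()
    # resolve() collapses ".." segments and follows symlinks, so the
    # comparison below is made on the real target path.
    candidate = (root / requested).resolve()
    if candidate != root and root not in candidate.parents:
        raise PermissionError(f"path escapes root: {requested}")
    return candidate
```

An absolute request such as /etc/passwd replaces the root when joined, resolves outside it, and is rejected by the same check.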
Adding a server is straightforward: any MCP-compatible stdio
server can be wired in by adding a block to
config.toml. The runtime merges MCP-exposed tools
with the built-in tools at session start; built-in tools win
name collisions.
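The merge rule can be sketched as a dictionary merge in which built-in tools are applied last. The function name and data shapes are hypothetical, and the handling of collisions between two MCP servers (first server wins here) is an assumption the text does not cover.

```python
def merge_tools(builtin: dict, mcp_by_server: dict) -> dict:
    """Merge tool tables by name; built-in tools win name collisions."""
    merged = {}
    for server, tools in mcp_by_server.items():
        for name, tool in tools.items():
            # Assumption: among MCP servers, the first one to define a name keeps it.
            merged.setdefault(name, tool)
    # Built-ins are applied last so they overwrite any MCP tool of the same name.
    merged.update(builtin)
    return merged


# Usage: an MCP server that also exposes "write_letter" does not shadow the built-in.
tools = merge_tools(
    builtin={"write_letter": "builtin:write_letter"},
    mcp_by_server={
        "url_fetch": {"fetch": "mcp:url_fetch.fetch"},
        "custom": {"write_letter": "mcp:custom.write_letter"},
    },
)
```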
Background service
The service install command installs the daemon
under your platform's per-user service manager so the agent runs
in the background, auto-starts at login, and respawns on crash.
- macOS — launchd, per-agent plist in ~/Library/LaunchAgents/.
- Linux — systemd user unit in ~/.config/systemd/user/.
- Windows is not currently supported.
Service controls (install, restart,
uninstall, status, logs)
are available from the GUI's Service page and the CLI.
Local GUI
return-architecture gui opens a local Streamlit
control panel for setup, identity editing (system prompt, model,
sampling knobs), schedules, tools, Telegram, items, letters,
memory inspection, and service controls. The first-run wizard
walks through API keys, agent creation, system prompt, Telegram,
scheduled rhythms, and service install in roughly five minutes.
By default the GUI binds to 127.0.0.1:7878. It can
be rebound to a private network address (for example a Tailscale
IP) via the install config.
What is intentionally not in scope
- Image generation — the runtime currently accepts images as input but does not generate them. A future round may add a generation tool routed to a generation-capable model (Gemini's Imagen / Flash Image, OpenAI's image API).
- Multi-agent routing — one agent per folder, fully isolated, by design. There is no orchestrator that picks between agents.
- Hosted product — there is no cloud version, no account system, no telemetry. The runtime is the artifact.