Architecture

Deep technical reference for developers working on the Matrix codebase. For user-facing concepts, see Core Concepts. For the reasoning behind Matrix's approach, see Why Matrix.

Overview

Matrix is a multi-agent orchestration system built on git worktrees and LLM provider APIs. Each agent runs in an isolated git worktree on a dedicated branch, communicates via message queues, and persists its conversation as a JSONL event stream. A central HTTP daemon manages the lifecycle of all agents and exposes a web UI for real-time observation.

┌─────────────────────────────────────────────┐
│              Web UI (React)                 │
│       Real-time task tree + activity log    │
└──────────────────┬──────────────────────────┘
                   │ SSE (events) + REST (commands)
┌──────────────────▼──────────────────────────┐
│          Daemon (Hono HTTP on :7433)        │
│                                             │
│  Project Manager  Agent Lifecycle  Events   │
│  Task Tracker     Worktree Mgr    Config    │
└──────────────────┬──────────────────────────┘
                   │
┌──────────────────▼──────────────────────────┐
│          Provider Abstraction               │
│  Anthropic  │  OpenAI Responses  │  Legacy  │
└─────────────────────────────────────────────┘

The Agent Loop (runProviderLoop)

File: src/provider-shared.ts

This is the heart of the system — a single async function* generator that drives all agent execution. All three providers (Anthropic, OpenAI Responses API, OpenAI Legacy Chat Completions) use the same loop; provider-specific behavior is injected via the ProviderAdapter interface.

typescript
async function* runProviderLoop(
  adapter: ProviderAdapter,
  request: AgentRequest,
  sessionId: string,
  queue?: MessageQueue,
): AsyncGenerator<Event, AgentResult>

It yields Event objects consumed by the daemon for SSE broadcast and JSONL persistence, and returns an AgentResult when the agent exits.
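The yield/return split can be sketched as follows. This is a minimal illustration, not the actual implementation: `LoopEvent`, `LoopResult`, `fakeLoop`, and `consume` are hypothetical stand-ins for the real `Event`, `AgentResult`, `runProviderLoop()`, and `consumeAgentEvents()`.

```typescript
// Hypothetical, simplified shapes; the real Event and AgentResult are richer.
type LoopEvent = { type: "assistant_text"; text: string } | { type: "agent_idle" };
interface LoopResult {
  exitReason: "done_passed" | "done_failed" | "interrupted";
  turns: number;
}

// Stand-in for runProviderLoop(): yields events, then returns a result.
async function* fakeLoop(): AsyncGenerator<LoopEvent, LoopResult> {
  yield { type: "assistant_text", text: "working" };
  yield { type: "agent_idle" };
  return { exitReason: "done_passed", turns: 1 };
}

// Drive the generator by hand. A for-await loop would discard the return
// value, so call next() until done and read the result from the final step.
async function consume(
  gen: AsyncGenerator<LoopEvent, LoopResult>,
  onEvent: (event: LoopEvent) => void,
): Promise<LoopResult> {
  while (true) {
    const step = await gen.next();
    if (step.done) return step.value;
    onEvent(step.value);
  }
}
```

Calling `next()` directly matters here: the final result only appears on the step where `done` is true.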

Step-by-Step Flow

Initialize → [Abort?] ──aborted──→ Exit (AgentResult)
                │
                ok
                ▼
           [Compact?] ──yes──→ Rebuild context ─┐
                │                                │
                no                               │
                ▼                                ▼
           Call API (streaming) ←─────────────────┘
                │
                ▼
          Parse response
                │
                ▼
          [stop_reason?]
           ╱          ╲
      end_turn      tool_use
         │              │
         ▼              ▼
   Wait on queue   Execute tools
   (implicit yield)  (concurrent)
         │              │         ╲
      message       continue    done()/budget
         │              │              │
         └──→ [Abort?] ←┘         Exit ←┘

The full sequence:

  1. Initialize — resume from JSONL (convertEventsToMessages()), drain queue, build tool handlers
  2. Check abort — exit if AbortSignal fired
  3. Compact? — if tokens exceed threshold, extract checkpoint and rebuild context
  4. Call API — streaming request to provider, yield text_delta events
  5. Parse — extract text, tool uses, token usage; emit assistant_text, tool_call events
  6. stop_reason = end_turn → implicit yield: emit agent_idle, wait on queue, emit agent_active
  7. stop_reason = tool_use → execute all tools concurrently (Promise.all), emit tool_result events, drain queue messages at the cancellation point, check budget
  8. Exit — return AgentResult { exitReason, output, costUsd, turns }

Key Mechanics

  • Generator function: runProviderLoop() is an AsyncGenerator<Event, AgentResult>. It yields events during execution and returns the final result. The daemon drives the generator via consumeAgentEvents(), calling .next() until .done.

  • end_turn = implicit yield, never implicit done: When the model stops without calling tools (end_turn), the agent enters idle and waits for messages — it does NOT exit. The handleImplicitYield() helper emits agent_idle, blocks on queue.wait(), then emits agent_active. The agent stays alive until it explicitly calls done() or is interrupted. This means an agent that finishes responding but forgets to call done() simply waits for more input rather than silently exiting.

  • AbortSignal passthrough: The AbortSignal from stopAgent() is passed directly through to the provider's streaming API call. This means stop immediately interrupts AI generation mid-stream — not just between turns. The signal is checked at the top of each loop iteration, but it also cancels the in-flight HTTP request, so the agent doesn't wait for a full response before stopping.

  • Cancellation points: After tool execution but before the next API call, the loop drains pending queue messages. If messages arrived during tool execution, they get injected before the next turn.

  • Tool execution: All tools in a single turn execute concurrently via Promise.all(). The executeTool() function is the single execution path for every tool — built-in, orchestrator, and external MCP.

  • yield() tool_result: Returns just "resumed." — queue messages are delivered as independent text blocks in the same user turn, not embedded in the tool_result. This keeps the tool_result stable for caching.

  • done() tool_result: On resume after done, the tool_result contains only the preamble and working directory — wake messages are not re-embedded.

  • Exit reasons: Every agent exit is classified by an ExitReason enum:

    • done_passed / done_failed — the agent explicitly called done(). This is the agent's own decision.
    • interrupted — everything else: stop, reset, error, queue close, daemon restart.

    The distinction matters for the daemon's response: done_* means the agent finished and its status should be trusted. interrupted means the agent was cut short and may need to resume.
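Concurrent tool execution through a single path might look like the sketch below. The types and the names `runTool` and `runTurnTools` are hypothetical. Turning handler failures into error results, rather than letting them reject, is an assumption made here so that `Promise.all` cannot short-circuit.

```typescript
// Hypothetical, simplified types for illustration only.
interface ToolUse { id: string; name: string; input: Record<string, unknown> }
interface ToolOutcome { toolCallId: string; content: string; isError: boolean }
type Handler = (input: Record<string, unknown>) => Promise<string>;

// Single execution path: one map lookup, one call.
// Assumption: failures become error results instead of rejections.
async function runTool(use: ToolUse, handlers: Map<string, Handler>): Promise<ToolOutcome> {
  const handler = handlers.get(use.name);
  if (!handler) {
    return { toolCallId: use.id, content: `Unknown tool: ${use.name}`, isError: true };
  }
  try {
    return { toolCallId: use.id, content: await handler(use.input), isError: false };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { toolCallId: use.id, content: message, isError: true };
  }
}

// All tool uses from one turn start together; Promise.all keeps request order.
function runTurnTools(uses: ToolUse[], handlers: Map<string, Handler>): Promise<ToolOutcome[]> {
  return Promise.all(uses.map((use) => runTool(use, handlers)));
}
```

Results come back in the order the model requested them, regardless of which tool finishes first.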


Provider Abstraction (ProviderAdapter)

File: src/provider-shared.ts

The ProviderAdapter interface (~18 hooks) abstracts all differences between the three provider APIs (Anthropic Messages, OpenAI Responses, OpenAI Legacy Chat Completions):

| Hook | Purpose |
| --- | --- |
| getContextWindow(model) | Model's context window size |
| getModelPricing(model) | Per-million-token pricing |
| convertEventsToMessages(events) | Reconstruct conversation from JSONL on resume |
| prepareTools(jsonTools: JsonTool[]) | Format frozen tool definitions for the provider |
| callAPI({model, messages, tools, …}) | The actual API call with retries, streaming |
| getResponseText(response) | Extract text from response |
| getToolUses(response) | Extract tool calls from response |
| getTokenUsage(response) | Get input/output/cache token counts |
| getStopReason(response) | "end_turn" or "tool_use" |
| supportsTokenCounting | Whether exact token counting is available |
| countTokens(…) | Exact token count (Anthropic only) |
| buildResponseEvents(response, isCompacting) | Create JSONL events from response |
| addAssistantMessage(messages, response, …) | Append assistant response to history |
| buildUserTurn(…) | Format tool results + queue messages into a user turn |
| computeCost(…) | Calculate USD cost |
| getOuterRetryDelayMs?(attempt, error) | Custom delay before outer retry of failed API call |
| buildResult?(…) | Build final AgentResult with provider-specific fields |

The biggest divergence is buildUserTurn():

  • Anthropic: Single user message with tool_result + text + image blocks
  • OpenAI (both): Separate tool role messages per result, plus a user message for queue text/images
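The two shapes can be compared side by side in a rough sketch. The message types below are simplified versions of the public Anthropic Messages and OpenAI Chat Completions formats, image blocks are omitted, and `buildAnthropicTurn` / `buildOpenAITurn` are hypothetical names rather than the adapters' real methods.

```typescript
// Simplified shapes modeled on the public API formats (assumption: text only).
interface ToolOutput { toolCallId: string; content: string }

type AnthropicBlock =
  | { type: "tool_result"; tool_use_id: string; content: string }
  | { type: "text"; text: string };
interface AnthropicUserMessage { role: "user"; content: AnthropicBlock[] }

type OpenAIMessage =
  | { role: "tool"; tool_call_id: string; content: string }
  | { role: "user"; content: string };

// Anthropic: one user message, tool results first, then queue text blocks.
function buildAnthropicTurn(results: ToolOutput[], queueTexts: string[]): AnthropicUserMessage[] {
  const content: AnthropicBlock[] = [
    ...results.map((r): AnthropicBlock => ({
      type: "tool_result",
      tool_use_id: r.toolCallId,
      content: r.content,
    })),
    ...queueTexts.map((text): AnthropicBlock => ({ type: "text", text })),
  ];
  return [{ role: "user", content }];
}

// OpenAI: one tool-role message per result, plus a user message for queue text.
function buildOpenAITurn(results: ToolOutput[], queueTexts: string[]): OpenAIMessage[] {
  const messages: OpenAIMessage[] = results.map((r): OpenAIMessage => ({
    role: "tool",
    tool_call_id: r.toolCallId,
    content: r.content,
  }));
  if (queueTexts.length > 0) {
    messages.push({ role: "user", content: queueTexts.join("\n\n") });
  }
  return messages;
}
```

With two tool results and one queued message, the first function returns one message with three blocks and the second returns three messages.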

Tool definitions use a JsonTool format ({name, description, jsonSchema}) as the golden source. Tools are computed once at session start via Zod's z.toJSONSchema(), then frozen in a session_config event in JSONL. On resume or fork, providers use these frozen JsonTool[] directly — no Zod regeneration needed. This ensures cache stability: the exact same tool definitions hit the API on every call.

The Anthropic provider is the primary, production-tested path; the OpenAI Responses API provider is newer and functional; the OpenAI Legacy (Chat Completions) provider is maintained but less tested.

Why Not a Framework?

Frameworks like LangChain abstract tool execution, context management, and streaming. But for an orchestration system, these ARE the product. Tool execution timing controls multi-agent coordination. Event persistence enables resume across daemon restarts. Custom compaction preserves agent memory. Cost tracking needs per-task granularity. Message queues need precise timing for implicit yield and cancellation points. Full control over the run loop is a feature, not incidental complexity.


Event System

Files: src/events.ts, src/event-store.ts, src/daemon/event-system.ts

Event Types

All events are a discriminated union on type (Event type in src/events.ts). Every event carries taskId and ts. Events emitted during an agent loop also carry a traceId (ULID) identifying the specific runAgentForNode invocation — used to detect concurrent loops on the same session.

Ephemeral (SSE broadcast only, never persisted):

| Type | Purpose |
| --- | --- |
| text_delta | Streaming text chunks during API response |
| thinking_delta | Streaming extended thinking chunks |
| agent_idle / agent_active | Agent waiting/resuming |
| status | Human-readable status messages |
| clarification_timeout | Clarify timeout fired |

Persisted (JSONL + SSE):

| Type | Purpose |
| --- | --- |
| usage | Token usage snapshot (includes cacheCreationTokens, cacheReadTokens, outputTokens) |
| thinking | Extended thinking block (full content, persisted for replay) |
| message | Unified message format (user, task_complete, background, etc.) |
| assistant_text | Model's text response |
| tool_call / tool_result | Tool execution cycle |
| compact_marker | Compaction checkpoint (events before this skipped on resume) |
| compact_started / compacted_resume | Compaction lifecycle |
| summarization_request | Instruction sent to model for checkpoint generation |
| orchestration_started / orchestration_completed | Session lifecycle |
| task_started | Child agent session began |
| clarification_requested / clarification_answered | User Q&A |
| messages_consumed | IDs of messages materialized into conversation |
| fork_marker | Session was forked from another agent |
| budget_warning / budget_exceeded | Cost tracking |
| session_config | Frozen session configuration (tools, system prompt) for cache stability |
| done_notified | Phase 2 of done() completed (crash recovery marker) |
| error / agent_stopped | Error states |

The persistence decision is made by isPersistedByEmitEvent() — an exhaustive switch with compile-time enforcement via never default.
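The `never` default is a standard TypeScript exhaustiveness pattern. The sketch below applies it to a hypothetical four-member subset of the event union; `SketchEvent` and `isPersisted` are illustrative names, not the real definitions.

```typescript
// Hypothetical subset of the event union; the real one has many more members.
type SketchEvent =
  | { type: "text_delta"; text: string }
  | { type: "agent_idle" }
  | { type: "assistant_text"; text: string }
  | { type: "tool_call"; name: string };

function isPersisted(event: SketchEvent): boolean {
  switch (event.type) {
    case "text_delta":
    case "agent_idle":
      return false; // ephemeral: SSE broadcast only
    case "assistant_text":
    case "tool_call":
      return true; // persisted: JSONL + SSE
    default: {
      // If a member is added to SketchEvent without a case above, `event`
      // is no longer `never` here and this assignment fails to compile.
      const unhandled: never = event;
      throw new Error(`Unhandled event: ${JSON.stringify(unhandled)}`);
    }
  }
}
```

Adding a new event type therefore forces an explicit persistence decision before the code builds.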

Tool display formatting is centralized in event-display.ts — a single source for rendering tool calls and results into human-readable output, used by both the CLI and web UI.

Event Flow

                          yield Event
Provider Loop ─────────────────────────→ consumeAgentEvents()
      │
      │ emit(Event)
      ▼
  emitEvent() ──→ broadcast() → SSE → Web UI
      │
  [Persisted?]
   yes │  no
      ▼   ▼
  EventStore  (skip)
  .append()

Events flow through two paths:

  1. yield: Returned from the generator to consumeAgentEvents() for control flow.
  2. emit: The request.emit callback wired to emitEvent() for persistence and SSE broadcast.

Two-Phase Message Lifecycle

Messages use a two-phase lifecycle to prevent display reordering in the UI:

  1. Persist: A message event is written to JSONL when sent. The frontend sees it but defers display.
  2. Materialize: A messages_consumed event lists the IDs the agent actually consumed. The frontend places them in the correct position.

This matters because messages can arrive between tool executions, and without the two-phase protocol, the UI would show them in the wrong position relative to tool results.
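On the frontend side, the two phases could be handled roughly as below. This is a sketch under assumed event shapes; `UiEvent` and `buildTimeline` are hypothetical and the real UI state is more involved.

```typescript
// Hypothetical, simplified event shapes.
type UiEvent =
  | { type: "message"; id: string; text: string }
  | { type: "messages_consumed"; ids: string[] }
  | { type: "tool_result"; summary: string };

// Returns the lines the UI would show, in display order.
function buildTimeline(events: UiEvent[]): string[] {
  const pending = new Map<string, string>(); // persisted, not yet materialized
  const timeline: string[] = [];
  for (const event of events) {
    switch (event.type) {
      case "message":
        pending.set(event.id, event.text); // phase 1: remember, do not display
        break;
      case "messages_consumed":
        for (const id of event.ids) {
          const text = pending.get(id);
          if (text !== undefined) {
            timeline.push(`message: ${text}`); // phase 2: place where consumed
            pending.delete(id);
          }
        }
        break;
      case "tool_result":
        timeline.push(`tool_result: ${event.summary}`);
        break;
    }
  }
  return timeline;
}
```

A message that arrives before a tool result but is consumed after it ends up below the tool result, matching what the agent actually saw.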

EventStore

File: src/event-store.ts

JSONL-based persistence — one file per session.

  • Append-only with per-session serialization via write queue
  • Generation guard: clear() bumps a generation counter; writes enqueued before clear are silently dropped
  • Reads are synchronous (only during resume)
  • readActive(sessionId): Events after the last compact_marker
  • copySessionFrom(source, target): Copies active events + appends fork_marker
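The `readActive()` rule can be illustrated with a small in-memory sketch. `parseJsonl` and `activeEvents` are hypothetical helpers, and whether the marker itself is returned is an assumption (here it is excluded).

```typescript
interface StoredEvent { type: string; [key: string]: unknown }

// Parse one-JSON-object-per-line text, ignoring blank lines.
function parseJsonl(text: string): StoredEvent[] {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as StoredEvent);
}

// Events strictly after the last compact_marker; all events if there is none.
// Assumption: the marker itself is not part of the active set.
function activeEvents(events: StoredEvent[]): StoredEvent[] {
  let last = -1;
  for (let i = 0; i < events.length; i++) {
    if (events[i].type === "compact_marker") last = i;
  }
  return events.slice(last + 1);
}
```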

Event Converter

File: src/event-converter.ts

walkEventsToMessages() converts JSONL events back into provider message arrays for resume, using EventConverterCallbacks so each provider formats differently while sharing traversal logic. Old tool names in JSONL are mapped to current names via TOOL_NAME_ALIASES.

Cache Architecture

Anthropic's prompt caching uses a prefix-match strategy. The cache prefix order is: tools → system prompt → messages (not system → tools). This means tools must be stable for the cache to hit.

Matrix achieves this through session_config — a persisted event written once at session start that freezes the exact JsonTool[] definitions and system prompt. On resume or fork, providers read tools from the frozen session_config rather than regenerating from Zod schemas. Since the tools are byte-identical across restarts, the cache hit rate reaches 99%+.

The session_config event also stores cacheTtl — root agents get "1h" (long-lived, stable conversations), while regular child tasks use the default 5-minute ephemeral TTL. This is inherited via fork (the session_config event is copied to the child's JSONL).
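A minimal sketch of the freeze-once idea follows, assuming the event carries `tools` and `systemPrompt` fields (assumed names; only `cacheTtl` comes from the text above). `loadOrFreezeConfig` is a hypothetical helper.

```typescript
interface JsonToolSketch { name: string; description: string; jsonSchema: Record<string, unknown> }
type SessionConfig = {
  type: "session_config";
  tools: JsonToolSketch[];
  systemPrompt: string;
  cacheTtl?: "1h"; // absent means the default ephemeral TTL
};
type SessionEvent = SessionConfig | { type: "assistant_text"; text: string };

// Returns the frozen config if the log already has one. Otherwise generates
// it once and appends it, so every later call sees the same definitions.
function loadOrFreezeConfig(
  log: SessionEvent[],
  generate: () => { tools: JsonToolSketch[]; systemPrompt: string },
  isRoot: boolean,
): SessionConfig {
  for (const event of log) {
    if (event.type === "session_config") return event;
  }
  const fresh: SessionConfig = { type: "session_config", ...generate() };
  if (isRoot) fresh.cacheTtl = "1h";
  log.push(fresh);
  return fresh;
}
```

Because the second and later calls never invoke `generate`, any nondeterminism in schema generation cannot change the cached prefix.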


Task Tree

File: src/task-tracker.ts

The task tree is a JSON structure persisted to tree.json. The TaskTracker class manages it. For user-facing lifecycle, see Core Concepts.

TreeNode: TaskNode | FolderNode

The task tree uses a discriminated union — every node is either a TaskNode (has lifecycle, branch, session) or a FolderNode (pure visual grouping):

typescript
type TreeNode = TaskNode | FolderNode;

function isFolder(node: TreeNode): node is FolderNode {
  return node.type === "folder";
}

function isTask(node: TreeNode): node is TaskNode {
  return node.type !== "folder";
}

FolderNode — minimal structure, no behavior:

typescript
interface FolderNode {
  id: string;
  title: string;
  parentId: string | null;
  children: string[];
  type: "folder";              // discriminator
}

TaskNode — full lifecycle:

typescript
interface TaskNode {
  id: string;                  // ULID
  title: string;
  description: string;
  status: TaskStatus;          // draft|pending|in_progress|verify|failed|closed
  branch: string | null;       // e.g. "mxd/01KMAB1234ABCDEF/task-a"
  parentId: string | null;
  children: string[];          // ordered list of child IDs
  worktreePath: string | null;
  costUsd: number;
  budgetUsd?: number;
  editedBy: "user" | "agent";
  color?: string;
  createdAt: string;
  updatedAt: string;
  type?: "task";               // discriminator (optional for backward compat)
  session?: TaskSession;       // RUNTIME-ONLY — not persisted
}

The session field holds the MessageQueue, AbortController, cwd, backgroundProcesses, foregroundExecutions, and optionally messages and allTools for debug introspection. It's stripped during save() and undefined on load().

Folder Transparency

Folders are transparent to task ownership. Two helpers handle this:

  • getTaskAbove(nodeId) — walks up the tree, skipping folders, to find the nearest task ancestor. Used for message routing (determining if a message is "upward" or "downward") and permission checks.
  • getTasksBelow(nodeId) — collects all descendant tasks, recursing through folders. Folders themselves are excluded from the result.

Adding FolderNode to the TreeNode union created 262 type errors — each one a location that had to decide: does this code care about tree structure (use parentId) or task ownership (use getTaskAbove)? The type system forced every caller to make this distinction explicitly.
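The two helpers can be sketched over a plain map of nodes. This is an illustration only: `SketchNode` and `Tree` are hypothetical, and the real helpers are methods that take just a `nodeId`.

```typescript
interface SketchNode {
  id: string;
  parentId: string | null;
  children: string[];
  type?: "task" | "folder";
}
type Tree = Map<string, SketchNode>;

// Walk up from nodeId, skipping folders, to the nearest task ancestor.
function getTaskAbove(tree: Tree, nodeId: string): SketchNode | null {
  let parentId = tree.get(nodeId)?.parentId ?? null;
  while (parentId !== null) {
    const parent = tree.get(parentId);
    if (!parent) return null;
    if (parent.type !== "folder") return parent;
    parentId = parent.parentId;
  }
  return null;
}

// Collect all descendant tasks, recursing through folders and tasks alike.
// Folders themselves are left out of the result.
function getTasksBelow(tree: Tree, nodeId: string): SketchNode[] {
  const result: SketchNode[] = [];
  for (const childId of tree.get(nodeId)?.children ?? []) {
    const child = tree.get(childId);
    if (!child) continue;
    if (child.type !== "folder") result.push(child);
    result.push(...getTasksBelow(tree, childId));
  }
  return result;
}
```

For a task inside a folder, `parentId` points at the folder while `getTaskAbove` returns the task that owns it, which is the structural-versus-ownership distinction described above.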

Short ID Matching

tracker.get(nodeId) supports prefix matching (minimum 8 characters), letting agents reference tasks with shortened IDs to save tokens.
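Prefix lookup might be implemented along these lines. `resolveId` is a hypothetical helper, and returning no match for an ambiguous prefix is an assumption, since the behavior in that case is not described here.

```typescript
const MIN_PREFIX_LENGTH = 8;

function resolveId(ids: string[], query: string): string | undefined {
  if (ids.includes(query)) return query; // exact match wins
  if (query.length < MIN_PREFIX_LENGTH) return undefined; // too short for a prefix
  const matches = ids.filter((id) => id.startsWith(query));
  // Assumption: an ambiguous prefix resolves to nothing rather than a guess.
  return matches.length === 1 ? matches[0] : undefined;
}
```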


Tool Architecture

Files: src/tools/definitions.ts (built-in), src/orchestrator-tools.ts (orchestrator), src/tool-definition.ts (type)

Tool Definition

typescript
interface ToolDefinition<T = Record<string, unknown>> {
  name: string;
  description: string;
  inputSchema: Record<string, ZodType>;  // Zod → JSON Schema for API
  handler: (args: T, extra?: { toolCallId?: string }) => Promise<InternalToolResult>;
}

At session start, ToolDefinitions are converted to JsonTool — the frozen, provider-agnostic format:

typescript
interface JsonTool {
  name: string;
  description: string;
  jsonSchema: Record<string, unknown>;  // JSON Schema (from Zod)
}

JsonTool[] is persisted in the session_config event. On resume or fork, providers call prepareTools(jsonTools) to convert these frozen definitions into their native format — no Zod regeneration, byte-identical across restarts.

Tools are namespaced as mcp__<server>__<name> (e.g., mcp__mxd__bash).

Built-in Tools

createBuiltinTools() in src/tools/definitions.ts:

  • bash — Shell commands with CWD tracking and background process support
  • background — Manage background processes (list, status, kill)
  • read_file — Files with line numbers, offset/limit, image support
  • write_file — Create files with auto-directory creation
  • edit_file — String replacement with exact match
  • list_files — Glob-based file discovery
  • search — Regex search with multiple output modes

Orchestrator Tools

createOrchestratorTools() in src/orchestrator-tools.ts (18 tools):

  • Task management: create_task, update_task, delete_task, reset_task, close_task, reorder_tasks
  • Organization: create_folder, delete_folder, rename_folder
  • Communication: send_message, yield, done, clarify
  • Observation: get_tree, get_task
  • Cross-project: list_projects, send_message_to_project
  • Context: fork_task_context

Plus external MCP tools connected via McpClientManager (src/mcp-client.ts).

Hidden Tool: evaluate_script

When selfBootstrap mode is enabled, a hidden evaluate_script tool is added. It's not listed in the tool definitions — agents call it directly by name. The tool executes arbitrary JavaScript/TypeScript code for runtime introspection, with a ctx argument containing:

  • ctx.messages — live provider messages array
  • ctx.tracker — TaskTracker
  • ctx.queue — MessageQueue
  • ctx.deps — orchestrator deps
  • ctx.projectId, ctx.taskId, ctx.sessionId
  • ctx.daemonCtx — full DaemonContext
  • ctx.allTools — frozen JsonTool[] array

Console output (console.log()) is captured and returned alongside the function's return value. This is used for inspecting live state: comparing JSONL vs in-memory messages, checking provider state, or running quick experiments without creating files.

Single Execution Path

Every tool goes through executeTool() in src/provider-shared.ts:

typescript
async function executeTool(
  toolName: string,
  input: Record<string, unknown>,
  mcpHandlers: Map<string, ToolDefinition<any>>,
  toolCallId?: string,
): Promise<ToolExecResult>

One handler map, one lookup, one call. No special cases.

Task Operations

File: src/task-operations.ts

Task mutations (create, update, close, delete, reset, reorder) are implemented as 6 shared functions in task-operations.ts. Both the MCP tool handlers (agent-facing) and REST API routes (user-facing) are thin wrappers that call the same shared functions.

The behavioral difference between agent and user actions is controlled by an editedBy field ("agent" | "user"), not separate code paths. For example, parent chain notification (informing an ancestor that a task was modified) only fires when editedBy: "user" — agent edits don't trigger notifications since the agent already knows what it did.

MCP Namespace Constants

File: src/tool-names.ts

Tool names are defined as constants in tool-names.ts rather than hardcoded strings. This ensures consistency between tool definitions, handlers, and tests. Similarly, QueueMessage factories centralize message construction, and web/api.ts provides a URL builder for API endpoints.


Context Forking

Files: src/orchestrator-tools.ts (tool), src/event-store.ts (event copy), src/events.ts (fork_marker event)

Context forking lets one agent inherit another agent's full conversation history. It follows Unix fork() semantics: after the fork, the parent and child receive different tool_result messages — the parent sees "You are the PARENT," the child sees "You are the CHILD." This is how each agent knows its identity after the fork.

How It Works

  1. Agent A calls fork_task_context(sourceTaskId, targetTaskId)
  2. EventStore.copySessionFrom() copies all active events (after the last compact_marker) from the source session to the target session
  3. A fork_marker event is appended to the target's JSONL, containing the source task ID and target task metadata
  4. Any orphaned tool_calls in the copied events get synthetic tool_results so the message structure is clean
  5. Agent A receives: "fork_task_context completed. You are the PARENT. Forked source → target. Use send_message to start the child agent."
  6. When the child agent starts, it replays its JSONL and sees the full conversation history plus the fork marker, and its tool_result says: "This tool was executed by the parent agent. You are the CHILD."

Multi-Layer Forks

When a forked agent forks again (A → B → C), the child sees multiple fork_marker events in its JSONL. The rule: the LAST fork_marker defines identity. Everything before the last marker is background knowledge from upstream agents; the agent's own task description and working directory come from after the final marker.
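The last-marker rule can be expressed as a split of the event list. The field names `sourceTaskId` and `targetTaskId` and the helper `splitAtLastFork` are assumptions for illustration; the real `fork_marker` carries richer target metadata.

```typescript
type ForkMarker = { type: "fork_marker"; sourceTaskId: string; targetTaskId: string };
type ForkLogEvent = ForkMarker | { type: "assistant_text"; text: string };

interface ForkSplit {
  identityTaskId: string | null; // who this agent is, per the last marker
  inherited: ForkLogEvent[];     // background knowledge from upstream agents
  own: ForkLogEvent[];           // events after the final marker
}

function splitAtLastFork(events: ForkLogEvent[]): ForkSplit {
  let marker: ForkMarker | null = null;
  let index = -1;
  for (let i = 0; i < events.length; i++) {
    const event = events[i];
    if (event.type === "fork_marker") {
      marker = event; // later markers overwrite earlier ones
      index = i;
    }
  }
  if (marker === null) return { identityTaskId: null, inherited: [], own: events };
  return {
    identityTaskId: marker.targetTaskId,
    inherited: events.slice(0, index),
    own: events.slice(index + 1),
  };
}
```

For an A → B → C chain, the A → B marker stays in the log but lands in `inherited`, so only the B → C marker decides identity.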

Fork Sources

Forks aren't limited to self-forking. An agent can fork from:

  • Itself → child gets the parent's current session knowledge
  • A closed/completed task → child inherits that task's exploration and discoveries
  • A sibling task → child builds on a peer's work

Context Compaction

File: src/compaction.ts

When a conversation approaches the context window limit, compaction compresses it into a structured checkpoint. For user-facing explanation, see Core Concepts.

Threshold

typescript
const COMPACT_BUFFER_RATIO_SMALL = 0.17;  // <1M windows: reserve ~17% buffer
const COMPACT_BUFFER_RATIO_LARGE = 0.08;  // 1M+ windows: reserve ~8% buffer (920K trigger)
const ratio = contextWindow >= 1_000_000 ? COMPACT_BUFFER_RATIO_LARGE : COMPACT_BUFFER_RATIO_SMALL;
const compressThreshold = contextWindow * (1 - ratio);
const lazyCountThreshold = compressThreshold - 16_000;

Smaller windows need more buffer (17%) because the checkpoint and rebuilt context are a larger fraction of the window. 1M+ windows can use a smaller buffer (8%) — a 920K trigger leaves room for a 64K checkpoint plus rebuilt context.

Anthropic (with exact token counting) does a cheap estimate first, then calls countTokens only if close. OpenAI relies on the estimate alone.
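As a worked example of the arithmetic, the sketch below wraps the constants in a function and adds a hypothetical `decide` helper. Reading `lazyCountThreshold` as the point where exact counting starts is an assumption based on its name, not something confirmed above.

```typescript
function compactionThresholds(contextWindow: number) {
  const ratio = contextWindow >= 1_000_000 ? 0.08 : 0.17;
  const compressThreshold = contextWindow * (1 - ratio);
  return { compressThreshold, lazyCountThreshold: compressThreshold - 16_000 };
}

type Check = "skip" | "count_exactly" | "compact";

// Assumed decision rule: with exact counting available, only pay for
// countTokens once the cheap estimate passes lazyCountThreshold; without
// it, compare the estimate to compressThreshold directly.
function decide(estimate: number, contextWindow: number, supportsTokenCounting: boolean): Check {
  const t = compactionThresholds(contextWindow);
  if (supportsTokenCounting) {
    return estimate >= t.lazyCountThreshold ? "count_exactly" : "skip";
  }
  return estimate >= t.compressThreshold ? "compact" : "skip";
}
```

For a 200K window this gives a compaction threshold of about 166,000 tokens and a lazy-count threshold of about 150,000. For a 1M window the thresholds are about 920,000 and 904,000.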

Compaction Flow

1. Token count exceeds threshold (or manual /compact)

2. Inject summarization instruction as user message

3. Model generates checkpoint with <summary>…</summary> tags
   containing 8 sections:
   ├── Story So Far (narrative of decisions and discoveries)
   ├── Current Phase
   ├── Completed Work
   ├── Tree Mental Model
   ├── Rejected Approaches & Lessons
   ├── Key Context
   ├── Pending Work
   └── User Messages Reference (verbatim user words)

4. extractCheckpoint() pulls text from <summary> tags
   Appends system context: working directory + resume instructions

5. buildCompactedContext() combines:
   ├── Fresh memory.md (re-read from disk)
   └── Checkpoint text

6. Conversation replaced with single user message

7. compact_marker event persisted to JSONL
   └── readActive() skips everything before the marker on resume

Key detail: memory.md is re-read from disk after compaction. The agent may have updated it during the session, and compaction is the mechanism that gets fresh institutional knowledge into the compressed context.
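Step 4 of the flow could be approximated as below. `extractSummary` and `buildCheckpoint` are hypothetical names, and the resume instruction wording is invented for the example; only the `<summary>` tags and the working directory line come from the description above.

```typescript
// Pull the text between the first <summary> and </summary> tags, if present.
function extractSummary(responseText: string): string | null {
  const match = /<summary>([\s\S]*?)<\/summary>/.exec(responseText);
  return match ? match[1].trim() : null;
}

// Append system context to the extracted summary.
// Assumption: the real resume instructions are worded differently.
function buildCheckpoint(responseText: string, cwd: string): string | null {
  const summary = extractSummary(responseText);
  if (summary === null) return null;
  return `${summary}\n\nWorking directory: ${cwd}\nResume from the checkpoint above.`;
}
```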


Memory Internals

File: .mxd/memory.md in each git worktree

For user-facing explanation, see Core Concepts.

Loading

The memory.md file is read from disk and included in the first message header at agent launch. After compaction, it's re-read and included in the rebuilt context. The header format:

Working directory: /path/to/worktree

# .mxd/memory.md (Preloaded, do not read again)
<contents of memory.md>

This header is always how context enters the conversation — no special code paths for fresh start vs resume vs post-compaction. The prepareAgentMessage() function in src/daemon/agent-lifecycle.ts constructs it.

Merging Through Git

Each agent has its own copy in its worktree. When branches merge, memory merges through git. The append-only rule prevents agents from modifying inherited entries. Higher-level agents curate the merged result — trimming trivial notes, consolidating related entries, floating important knowledge to the top.


Configuration System

File: src/config.ts

Three-layer configuration. More specific layers override more general ones: local beats repo, and repo beats global:

Global (~/.mxd/config.json)
  └── Repo (.mxd/config.json)                          ← committed to git
       └── Local (~/.mxd/projects/<id>/config.json)    ← per-project

Config Shape

typescript
interface MatrixConfig {
  authGroups?: Record<string, AuthGroup>;  // provider credentials
  defaultAuth?: string;                    // auth group name for root agent
  model?: string;                          // default: "claude-sonnet-4-6"
  childAuth?: string;                      // auth group name for child agents
  childModel?: string;                     // model for child agents
  budgetUsd?: number;                      // per-task cost limit
  maxDepth?: number;                       // max task tree depth
  clarifyTimeoutMs?: number;               // auto-resolve clarify after timeout
  mcpServers?: Record<string, McpServerConfig>;  // external tool servers
  port?: number;                           // default: 7433
  sessionKeep?: number;                    // session JSONL retention count
  selfBootstrap?: boolean;                 // internal: enables evaluate_script tool
  auth?: WebAuthnConfig;                   // web UI auth settings (legacy name; actual auth is RSA-OAEP)
}

interface AuthGroup {
  provider: "anthropic" | "openai";
  anthropicApiKey?: string;
  claudeOauthToken?: string;
  openaiApiKey?: string;
  openaiBaseUrl?: string;      // for OpenAI-compatible APIs
}

interface McpServerConfig {
  command: string;             // executable to run
  args?: string[];             // command-line arguments
  env?: Record<string, string>;  // environment variables
}

Resolution

resolveConfig(global, repo, local) merges layers:

  • Scalars: Local overrides repo overrides global (highest priority layer wins)
  • mcpServers / authGroups / auth: Union merge; same-named entries use the highest priority layer's value
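The two merge rules can be sketched with a reduced config shape. `SketchConfig` and `resolveConfigSketch` are hypothetical; the real `MatrixConfig` has more fields and the real function may treat edge cases differently.

```typescript
interface SketchConfig {
  model?: string;
  port?: number;
  mcpServers?: Record<string, { command: string; args?: string[] }>;
}

function resolveConfigSketch(
  global: SketchConfig,
  repo: SketchConfig,
  local: SketchConfig,
): SketchConfig {
  const merged: SketchConfig = {};
  // Apply layers from lowest to highest priority; later layers win.
  for (const layer of [global, repo, local]) {
    // Scalars: a defined value replaces whatever a lower layer set.
    if (layer.model !== undefined) merged.model = layer.model;
    if (layer.port !== undefined) merged.port = layer.port;
    // Maps: union of names; a same-named entry is replaced as a whole.
    if (layer.mcpServers) {
      merged.mcpServers = { ...merged.mcpServers, ...layer.mcpServers };
    }
  }
  return merged;
}
```

Note that a same-named server entry is replaced entirely rather than deep-merged, which is one reading of "use the highest priority layer's value".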

Agent Lifecycle

File: src/daemon/agent-lifecycle.ts

Launch Sequence

All agents — root and children — launch through a single function: runAgentForNode(). There is no separate root vs child launch path.

1. ensureRootNode() — create or reuse root TaskNode
2. Create MessageQueue
3. Load persisted messages → enqueue
4. Check for interrupted Phase 2 of done() → recover if needed
5. createAgentContext():
   ├── resolveProjectConfig()
   ├── getProjectProvider()
   ├── McpClientManager.connectAll()
   ├── createOrchestratorTools()
   └── createBuiltinTools()
6. Read active events from EventStore (for resume)
7. Fix orphaned tool_calls (buildSessionRepair)
8. Create TaskSession → attach to node
9. runProviderLoop() returns AsyncGenerator
10. consumeAgentEvents() drives the generator (fire-and-forget)
11. Enqueue first user message with header (memory + working directory)
12. Post-completion: Phase 2 of done() (status update, tree broadcast, parent notification)

Auto-Resume on Restart

On daemon restart, every in_progress agent is evaluated independently based on its JSONL state — root and children alike. There is no "mark children failed, resume root" cascade. Each agent is assessed on its own:

  • Yielding (last event is a pending yield tool_call) → Resume with provider loop bypass. Zero API calls — the agent goes straight to queue.wait() and only wakes when a message arrives. This is the cheapest possible resume.
  • Interrupted (has orphaned non-yield tool_calls) → Write synthetic tool_result events for orphaned calls, then normal resume with JSONL replay.
  • Done (status is verify, failed, or closed) → Skip, already finished.

This independent assessment means a tree of 10 agents doesn't trigger 10 API calls on restart. Only the agents that were mid-execution resume actively; yielding agents sleep until needed.
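The per-agent assessment might be modeled as a pure function over status and events. Everything here is hypothetical: the event shapes, the bare tool name `"yield"`, the `replay` fallback, and the choice to check non-yield orphans first when both conditions hold.

```typescript
type ToolCallEvent = { type: "tool_call"; id: string; name: string };
type ResumeEvent =
  | ToolCallEvent
  | { type: "tool_result"; toolCallId: string }
  | { type: "assistant_text"; text: string };
type Status = "draft" | "pending" | "in_progress" | "verify" | "failed" | "closed";
type ResumePlan =
  | { kind: "skip" }
  | { kind: "wait_on_queue" } // yielding: zero API calls
  | { kind: "repair_then_replay"; orphanedCallIds: string[] }
  | { kind: "replay" };

function planResume(status: Status, events: ResumeEvent[]): ResumePlan {
  if (status !== "in_progress") return { kind: "skip" };
  const answered = new Set<string>();
  for (const e of events) {
    if (e.type === "tool_result") answered.add(e.toolCallId);
  }
  const orphans = events.filter(
    (e): e is ToolCallEvent => e.type === "tool_call" && !answered.has(e.id),
  );
  const nonYield = orphans.filter((call) => call.name !== "yield");
  if (nonYield.length > 0) {
    // These calls need synthetic tool_result events before replay.
    return { kind: "repair_then_replay", orphanedCallIds: nonYield.map((c) => c.id) };
  }
  const last = events[events.length - 1];
  if (last !== undefined && last.type === "tool_call" && last.name === "yield" && !answered.has(last.id)) {
    return { kind: "wait_on_queue" };
  }
  return { kind: "replay" }; // assumption: plain replay when neither case applies
}
```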

Stop Cascade

stopAgent() performs a full cascade: signal abort → close queues → cleanup background processes → save tree → write synthetic tool_results for orphaned calls → emit agent_stopped.

Child tasks are stopped using real interrupt: the child's message queue is closed and its abort signal is fired, triggering immediate termination of the in-flight API call. This is a true cascade — not a fake text message telling the agent to stop — ensuring all descendants terminate promptly.

Critically, children stay in_progress — they are not force-failed. They were interrupted, not broken. On the next daemon restart, autoResume will detect them from their JSONL state and resume each one independently. This avoids the old pattern where stopping the root agent would mark all children as failed, requiring manual recovery.

Child Agent Lifecycle

Child agents launch via runChildAgentInBackground():

  1. Compute depth via parent chain
  2. Create MessageQueue, attach TaskSession
  3. createAgentContext() scoped to child's worktree
  4. runChildCore() drives the generator
  5. Post-completion: update cost, check budget, update status, notify parent

Parent notification is crash-safe: all child→parent send_message calls go through deliverMessage(), which persists to the JSONL before enqueuing. If the daemon crashes between the child sending a message and the parent consuming it, the message survives on disk and is recovered on restart.

Parent notification has two paths:

  • done() called: The done tool handler delivers to the parent via deliverMessage()
  • Crash/budget exceeded: Fallback via findParentQueue() — walks up the tree to the nearest running ancestor

Two-Phase done()

The done() tool uses a two-phase commit to ensure crash safety:

Phase 1 (agent-side, inside executeTool):

  1. Validate no children have active sessions (prevents premature done)
  2. Close the agent's message queue
  3. Return acknowledgment as tool_result: "Done acknowledged (passed)" or "Done acknowledged (failed)"
  4. The provider loop exits with done_passed or done_failed exit reason

Phase 2 (daemon-side, in runAgentForNode post-loop):

  1. Update node status (verify for passed, failed for failed)
  2. Write done_notified event to JSONL (crash recovery marker)
  3. Broadcast tree change via SSE
  4. Notify parent via deliverMessage() with a task_complete message
  5. Save tree to disk

Crash recovery: If the daemon crashes between Phase 1 and Phase 2, the done tool_call exists in JSONL without a done_notified event. On restart, findInterruptedDonePhase2() detects this pattern and replays Phase 2. If the crash happens between done_notified and tree save, the status_stale recovery path fixes the node status from the JSONL evidence.
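The detection pattern reduces to a single pass over the events. This sketch uses hypothetical event shapes, assumes the tool is recorded under the bare name `"done"`, and `needsPhase2Replay` is a stand-in for `findInterruptedDonePhase2()`.

```typescript
type DoneEvent =
  | { type: "tool_call"; name: string }
  | { type: "done_notified" }
  | { type: "assistant_text"; text: string };

// True when the log contains a done tool_call with no done_notified after it.
function needsPhase2Replay(events: DoneEvent[]): boolean {
  let pendingDone = false;
  for (const event of events) {
    if (event.type === "tool_call" && event.name === "done") {
      pendingDone = true; // Phase 1 happened
    } else if (event.type === "done_notified") {
      pendingDone = false; // Phase 2 completed for the preceding done
    }
  }
  return pendingDone;
}
```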

traceId: Detecting Concurrent Loops

Every event emitted by runAgentForNode carries a traceId — a ULID generated at loop start. This detects the most dangerous corruption pattern: two agent loops running on the same session simultaneously, interleaving events in the JSONL.

The root cause of the original corruption lay in auto-resume: Phase 2 crash recovery calls deliverMessage(task_complete) to notify the parent. Without the quiet: true flag, this eager-launches the parent agent, but auto-resume is about to launch it too. The result is two loops writing to one JSONL, interleaved events, and permanent API 400 errors.

The fix: all deliverMessage calls during startup use quiet: true (persist to disk only, no eager launch). The traceId infrastructure provides detection: if a session's events contain more than one traceId, something went wrong.

EventStore Generation Guard

File: src/event-store.ts

When a task is reset, the EventStore is cleared — but an async agent cleanup might still try to write. A three-layer defense prevents JSONL reappearance:

  1. stopTask awaits loop exit: The agent fully stops before clear() runs
  2. Generation guard: clear() bumps a per-session generation counter. Writes capture the generation at enqueue time; if it changes before execution, the write is silently dropped
  3. awaitLoopExit: Handles the gap between launchingNodes and agentLoopPromises

The generation guard is pure discard — it never mutates data, never retries, never creates new corruption.
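
The generation guard can be illustrated with a small in-memory model. This is a sketch under assumptions, not the code in src/event-store.ts: it models a single session, replaces the real asynchronous write queue with an explicit `flush()`, and the class and method names are invented for the example.

```typescript
// A sketch of a generation-guarded write queue for one session.
// Class and method names are illustrative only.
class GuardedStore {
  private generation = 0;
  private lines: string[] = [];
  private pending: Array<() => void> = [];

  // Capture the generation at enqueue time; the write itself runs later.
  enqueueWrite(line: string): void {
    const enqueuedAt = this.generation;
    this.pending.push(() => {
      if (enqueuedAt !== this.generation) return; // stale write: drop silently
      this.lines.push(line);
    });
  }

  // Clearing discards stored lines and invalidates every queued write.
  clear(): void {
    this.generation += 1;
    this.lines = [];
  }

  // Execute queued writes in order (stands in for the async writer).
  flush(): void {
    const jobs = this.pending;
    this.pending = [];
    for (const job of jobs) job();
  }

  snapshot(): string[] {
    return [...this.lines];
  }
}
```

A write enqueued before `clear()` but executed after it is dropped, while writes enqueued after `clear()` go through normally.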


Message Queue

File: src/message-queue.ts

A simple async queue for inter-agent communication.

Message Types

typescript
type QueueMessage =
  | { source: "user"; … }
  | { source: "tree_change"; … }
  | { source: "task_complete"; taskId; success; output; … }
  | { source: "task_message"; fromTaskId; content; … }
  | { source: "clarify_response"; answer; … }
  | { source: "user_message_forwarded"; … }
  | { source: "cross_project"; fromProjectId; content; … }
  | { source: "background_complete"; … }
  | { source: "compact" }

Delivery

deliverMessage() in agent-lifecycle.ts is the single delivery path:

  1. Try direct: If session.queue exists and is open → enqueue()
  2. Persist to disk: Write to pending-messages/ directory
  3. Auto-launch: For child nodes, create worktree and launch agent. Persisted messages load on startup.

Queue = cache, disk = durable storage.
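
The delivery order above can be sketched as a single function. This sketch assumes that a successful direct enqueue short-circuits the remaining steps and that `quiet` suppresses the launch; the text does not spell out either detail. The in-memory arrays stand in for session.queue and the pending-messages/ directory, and all names are hypothetical.

```typescript
// Hypothetical in-memory stand-ins for session.queue and pending-messages/.
interface SessionQueue {
  open: boolean;
  messages: string[];
}

type DeliveryOutcome = "enqueued" | "persisted" | "persisted_and_launched";

function deliver(
  message: string,
  queue: SessionQueue | undefined,
  pendingOnDisk: string[],
  opts: { isChildNode: boolean; quiet: boolean },
): DeliveryOutcome {
  // 1. Try direct delivery to a live, open queue.
  if (queue !== undefined && queue.open) {
    queue.messages.push(message);
    return "enqueued";
  }
  // 2. Otherwise persist, so the message survives a restart.
  pendingOnDisk.push(message);
  // 3. Launch the recipient unless the caller asked for quiet delivery.
  if (opts.isChildNode && !opts.quiet) return "persisted_and_launched";
  return "persisted";
}
```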


Worktree Manager

File: src/worktree-manager.ts

Configurable Base Branch

The root task node stores the project's base branch at initialization time. All worktree creation uses this stored baseBranch — the system prompt and agent tooling are branch-agnostic (no hardcoded main or master assumption). This means Matrix works correctly with any branch naming convention.

Branch Naming

mxd/<full-taskId>/<slugified-title>
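
A minimal sketch of building a branch name in this shape follows. The slug rule shown here (lowercase, runs of non-alphanumeric characters collapsed to a hyphen, edges trimmed) is an assumption for illustration; the actual slugifier in src/worktree-manager.ts may behave differently.

```typescript
// Assumed slug rule: lowercase, non-alphanumeric runs become "-",
// leading and trailing "-" are trimmed. The real slugifier may differ.
function slugify(title: string): string {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Builds a branch name of the form mxd/<taskId>/<slug>.
function branchName(taskId: string, title: string): string {
  return `mxd/${taskId}/${slugify(title)}`;
}
```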

Worktree Creation

1. git worktree add -b <branch> <path> <baseBranch>
2. git config --worktree core.hooksPath /dev/null  ← disable hooks
3. Run .mxd/hooks/setup_worktree.sh          ← install deps, etc.

Hooks are disabled per-worktree because child agents must not trigger the parent project's pre-commit hooks.

Setup Hook

The setup hook (.mxd/hooks/setup_worktree.sh) is required — worktree creation fails if it's missing. It handles environment setup that new worktrees need: installing dependencies, copying .env files, running build steps.

On mxd init, Matrix creates a .mxd/hooks/setup_worktree.sh.example file with auto-detected content (detects bun/npm/yarn from lockfiles). This .example file is committed to git but is not the active hook — it's a template. The user must:

  1. Review the .example file
  2. Customize it for their project
  3. Save as setup_worktree.sh (without .example) and make executable

This deliberate step prevents auto-generated hooks from silently doing the wrong thing. If the hook installs wrong dependencies or skips an env file, every sub task fails on startup — an expensive mistake when running many agents in parallel.
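
The lockfile-based detection mentioned above might be sketched as follows. The lockfile names, their precedence, and the generated script body are assumptions for illustration; the text only says that bun/npm/yarn are detected from lockfiles.

```typescript
type PackageManager = "bun" | "yarn" | "npm";

// Assumed lockfile names and precedence (bun, then yarn, then npm).
function detectPackageManager(files: string[]): PackageManager | null {
  const present = new Set(files);
  if (present.has("bun.lockb") || present.has("bun.lock")) return "bun";
  if (present.has("yarn.lock")) return "yarn";
  if (present.has("package-lock.json")) return "npm";
  return null;
}

// Produces a hypothetical body for setup_worktree.sh.example.
function exampleHookBody(pm: PackageManager | null): string {
  const install = pm !== null ? `${pm} install` : "# TODO: add your install command";
  return ["#!/usr/bin/env bash", "set -euo pipefail", install].join("\n");
}
```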


Daemon

File: src/daemon.ts

The daemon is a Hono HTTP server (default port 7433).

Core Context

typescript
interface DaemonContext {
  config: DaemonConfig;
  pm: ProjectManager;
  trackers: Map<string, TaskTracker>;
  sseClients: Set<SSEClient>;
  pendingClarifications: Map<string, PendingClarification[]>;
  eventStores: Map<string, EventStore>;
  restartingProjects: Set<string>;
  launchingNodes: Set<string>;         // prevents duplicate launches during setup
  streamingText: Map<string, string>;  // partial text per node for SSE batching
  agentLoopPromises: Map<string, Promise<void>>;  // tracked for stopTask/resetTask
  requestCount: number;
  startupReady: boolean;
  globalConfig: MatrixConfig;
}

Route Groups

  • Tasks (routes/tasks.ts): CRUD on task nodes, the unified message endpoint (POST /tasks/:nodeId/message), per-task stop/fork/events
  • Agent (routes/agent.ts): project-level agent status, stop, compact, restart, clarify, session management, background process control
  • Projects (routes/projects.ts): register/deregister, project events and clarifications
  • Config (routes/config.ts): read/write configuration at global, repo, and local layers
  • SSE (routes/sse.ts): real-time event stream
  • Auth (routes/auth.ts): RSA-OAEP challenge-response authentication

All agent interaction goes through the unified POST /tasks/:nodeId/message endpoint — the same path handles starting a new agent, sending a message to a running one, or resuming a stopped one.
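
One way to picture the unified endpoint is as a single dispatch on the agent's current state. The state names and the function below are hypothetical; they only illustrate that one entry point covers all three cases.

```typescript
// Hypothetical agent states and resulting actions.
type AgentState = "never_started" | "running" | "stopped";
type MessageAction = "start" | "enqueue" | "resume";

// Maps the recipient's state to what a posted message should trigger.
function resolveMessageAction(state: AgentState): MessageAction {
  switch (state) {
    case "never_started":
      return "start";
    case "running":
      return "enqueue";
    case "stopped":
      return "resume";
  }
}
```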

Startup

1. createApp() → build context, register routes
2. Load projects and config
3. Start Bun.serve()
4. Register SIGTERM/SIGINT handlers
5. runEventMigrations()
6. autoResumeProjects()
7. markReady()

Design Principles

These principles guide contributor decisions. They aren't arbitrary. Each one addresses the core problem: AI is disconnected from reality and from intention. Tests ground it in reality. Tasks ground it in decisions. For the user-facing motivation, see Why Matrix.

Cache Invariant: All State Is a Cache of Disk

Kill the daemon at any point. Restart. Everything resumes. The task tree rebuilds from tree.json. Conversations rebuild from JSONL event files. Running agents are detected and reconnected. Nothing lives only in memory.

This combats hallucination at the infrastructure level — disk state is the objective truth that agents resume from, not their potentially confused in-memory representation. It's also a development enabler — when agents modify the system itself (self-bootstrapping), the daemon restarts frequently. Cheap restarts require disk as source of truth.

Single Path Principle

One code path for each operation. No fallbacks, no dual implementations.

  • executeTool() is the ONE path for all tool execution
  • emitEvent() is the ONE path for all event emission
  • runProviderLoop() is the ONE loop for all three providers
  • task-operations.ts has the ONE set of shared functions for task mutations — MCP tools and REST routes are thin wrappers
  • event-display.ts is the ONE source for tool display formatting

Fallbacks mask bugs and amplify tunnel vision. If path A fails and path B succeeds, you never fix path A. Worse: mental model residue — when old and new systems coexist, agents interpret the new as a variant of the old, leading to wrong reasoning. This is one of the laziness patterns that single-path design prevents.

Event Sourcing

Every state change is an event. Events persist to JSONL. Events broadcast via SSE. The JSONL file IS the conversation — not a log of it, but the source of truth that gets replayed on resume.
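
The replay idea can be sketched with a JSONL round trip: each event is one JSON object per line, and rebuilding the conversation means parsing the lines in order. The `ConversationEvent` shape is a simplified assumption, and a string stands in for the file on disk.

```typescript
// Hypothetical, simplified event shape.
interface ConversationEvent {
  role: "user" | "assistant";
  text: string;
}

// Appends one event as a single JSON line.
function appendEvent(log: string, event: ConversationEvent): string {
  return log + JSON.stringify(event) + "\n";
}

// Rebuilds the conversation by parsing every non-empty line in order.
function replay(log: string): ConversationEvent[] {
  return log
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as ConversationEvent);
}
```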

Methodology Injection

Every agent receives the same system prompt. There are only two roles — root orchestrator and worker — and all agents share the exact same stable prompt. Root agents discover their role from get_tree (their node is at the top). The prompt covers: worker workflow, git discipline, code quality, debugging protocol, orchestration philosophy, memory system, and communication patterns. Strategy goes in the system prompt (WHEN and WHY), mechanics go in tool descriptions (HOW). This separation means you can change workflow without touching tools, and vice versa.

Key system prompt principles:

  • Task descriptions include WHY — agents get motivation context, not just instructions. Without WHY, agents hesitate at edge cases and make conservative choices.
  • Ask when uncertain, never silently fallback — wrong guesses waste more time than questions. Agents are explicitly instructed to send_message(requestReply=true) rather than silently making a conservative choice.
  • Incremental merge — workers commit early and often; orchestrators merge individual commits without waiting for done().
  • close_task rejects in-progress tasks — you cannot close a task that's still running, preventing accidental resource cleanup.
  • Branch-agnostic — no hardcoded branch names; the system works with whatever branch the project uses.

Ownership Framing

"The task above" and "sub task" — not "parent agent" and "child agent." send_message works in both directions with different reach: upward, an agent can message any ancestor in its parent chain (not just the direct parent); downward, only direct sub tasks. create_task can target any position in the tree, because creating a task is recording intent, not executing code. Other mutation operations (update, delete, close, reset) remain scoped to the agent's own sub tree. This asymmetry reflects how real teams communicate: anyone can escalate upward or propose ideas anywhere, but you can only manage your own direct reports. See Why Matrix for the user-facing explanation.

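The send_message reach rule described above (any ancestor upward, only direct sub tasks downward) can be sketched as a check over a parent map. The tree representation and function names here are hypothetical.

```typescript
// Hypothetical tree representation: node id -> parent id (root maps to null).
type ParentMap = Map<string, string | null>;

// True when `ancestor` appears somewhere in `node`'s parent chain.
function isAncestor(parents: ParentMap, ancestor: string, node: string): boolean {
  let current = parents.get(node) ?? null;
  while (current !== null) {
    if (current === ancestor) return true;
    current = parents.get(current) ?? null;
  }
  return false;
}

// Upward: any ancestor is reachable. Downward: only direct sub tasks.
function canSendMessage(parents: ParentMap, from: string, to: string): boolean {
  const isDirectSubTask = parents.get(to) === from;
  return isAncestor(parents, to, from) || isDirectSubTask;
}
```
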
AI-Friendly Technology Choices

Every technology choice optimizes for: how fast does the AI get feedback when it makes a mistake? This is the hallucination countermeasure applied at the tooling level — faster feedback means shorter hallucination windows.

  • TypeScript strict mode — type errors at compile time, not runtime
  • Bun — fast test runner, fast startup (milliseconds vs seconds)
  • Biome — single tool for lint + format, one config file
  • Event-driven architecture — modules are independent, debuggable in isolation
  • Pure functions — testable with simple input/output assertions, no mocks

Anti-patterns: no magic frameworks (convention-over-configuration helps humans, not AI), no obscure libraries (hallucinated APIs are hard to diagnose), no heavy configuration (AI wastes disproportionate time on config files).

Testing Quality Principles

These principles implement "tests are reality for code" at the engineering level:

  • Mutation resistance — a test that can't catch code mutations is worthless. If you can change the implementation and the test still passes, the test proves nothing. Tests must assert on behavior that matters, not incidental structure.
  • Coverage realism — test through real lifecycle paths, not isolated function mocks. A test that calls the actual provider loop with a mock API catches more bugs than a test that mocks every layer boundary.
  • Expect failures — if you never see a test fail during development, something is wrong. Tests should be written or modified before the implementation that makes them pass. A test you've never seen fail might not be testing what you think.

Agent Laziness Patterns

AI agents exhibit predictable anti-patterns that undermine code quality. Understanding them is essential because the system prompt is specifically designed to counteract each one. These patterns are symptoms of AI's disconnection from intention. The agent extends rather than rethinks, because it doesn't feel the consequences of its decisions.

1. Fear of Large Changes

Agents gravitate toward minimal patches. Asked to refactor a module, they'll modify the surface — rename a variable, add a wrapper — while leaving the underlying problem intact. This produces code that looks updated but hasn't actually improved.

Countermeasure: The system prompt explicitly encourages agents to make the changes the task requires, not the smallest possible diff. Agents are instructed that large, correct changes are preferable to small, incomplete ones.

2. Unnecessary Fallbacks

When fixing a bug or adding a feature, agents often add "just in case" code paths — try/catch blocks that swallow errors, fallback values that mask failures, compatibility shims that preserve broken behavior. This directly violates the single path principle: fallbacks mask bugs and create code that appears to work while silently doing the wrong thing.

Countermeasure: The system prompt instructs agents to prefer single code paths and avoid defensive programming for scenarios that can't happen. Tests catch real failures; fallbacks hide them.

3. Not Communicating Proactively

When agents get stuck or uncertain, they tend to work silently — guessing at an approach rather than asking for clarification. This wastes tokens on wrong approaches and produces code based on incorrect assumptions.

Countermeasure: The system prompt emphasizes that asking is cheap, rework is expensive. Agents are instructed to use send_message and clarify proactively rather than guessing.

4. Not Questioning Architecture

Given existing code, agents default to extending the current patterns — even when the patterns are the problem. This is tunnel vision in action: the agent sees the code as given rather than as something it can question.

Countermeasure: The system prompt teaches agents that architecture is replaceable. If there's a simpler way to pass the tests, propose it. Tests survive the architecture, not the other way around.

5. "Unification" That Adds a Third Path

When asked to merge two approaches, agents often create a "unified" version that coexists alongside the originals — three paths instead of one. The old code isn't removed, the new code wraps it, and complexity increases.

Countermeasure: The system prompt explicitly warns against naming things "unified," "improved," or "new" — these names signal that the old version still exists. Real unification means one path replaces two.

Pattern Recognition

These patterns compound. An agent that fears large changes (1) adds a fallback instead of fixing the root cause (2), doesn't ask whether the approach is right (3, 4), and creates a "new unified handler" that wraps the old one (5). The result: five layers of indirection where one simple implementation would suffice. Recognizing the pattern is the first step to breaking it.


Testing Infrastructure

Matrix's test suite is built on custom infrastructure that turns the test framework into a contract enforcer. This applies the "tests as physical reality" idea to the system that builds other systems. Rather than mocking at layer boundaries, tests drive the full agent lifecycle through a mock API that validates protocol invariants automatically.

Mock Instruction DSL

File: src/test-utils/

Tests describe mock API conversations using a JSON-based instruction DSL. Each "turn" specifies what the mock API should return and optionally what it should assert about the request:

typescript
const turns: Turn[] = [
  {
    // Assert the agent's request contains these tool results
    assert: { toolResults: ["file contents..."] },
    // Respond with this tool call
    toolCalls: [{ name: "mcp__mxd__bash", input: { command: "npm test" } }],
  },
  {
    // Assert the bash output was fed back
    assert: { toolResults: [/Tests: \d+ passed/] },
    // Respond with done
    toolCalls: [{ name: "mcp__mxd__done", input: { status: "passed", summary: "All tests pass" } }],
  },
];

Each turn is a complete request-response cycle: the mock validates the incoming request against assert blocks, then returns the specified response. This makes tests readable as conversations — you can see what the agent sends and what it receives at each step.

Turns support variable capture — extracting values from requests to use in later assertions or responses. Per-conversation turn queues allow multiple agents to have independent instruction sequences in the same test.

Prefix Consistency Validation

On every API call, the mock automatically validates that the new conversation is a strict prefix extension of the previous call's conversation. In other words, the mock checks that all previous messages are still present and unchanged, with new messages only appended at the end.

If this check fails, it means one of two things:

  • A JSONL rebuild bug — the conversation was reconstructed incorrectly after a restart
  • A cache miss — the provider's prompt cache won't hit because the prefix changed

Both are serious bugs. By validating prefix consistency on every API call, the mock catches conversation corruption that would otherwise surface as mysterious cache misses or subtle behavioral changes in production.
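
The prefix check itself is small. The sketch below uses a simplified `Message` shape and allows the new conversation to be the same length as the previous one (for example on a retry); both choices are assumptions, not a description of the actual mock.

```typescript
// Simplified message shape for illustration.
interface Message {
  role: string;
  content: string;
}

// True when every message from the previous call is still present,
// unchanged and in the same position, in the new call.
function isPrefixExtension(previous: Message[], next: Message[]): boolean {
  if (next.length < previous.length) return false;
  return previous.every((prev, i) => {
    const candidate = next[i];
    return (
      candidate !== undefined &&
      candidate.role === prev.role &&
      candidate.content === prev.content
    );
  });
}
```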

ValidatingMockAPI

The ValidatingMockAPI wraps the instruction DSL with protocol-level validation that runs on every API call:

  • Turn alternation — messages must alternate between user and assistant roles
  • Tool use / tool result pairing — every tool_use block must have a corresponding tool_result, and vice versa
  • No duplicate messages — the same message cannot appear twice in a conversation
  • Prefix consistency — as described above
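
Two of these checks, turn alternation and tool use / tool result pairing, can be sketched as one validation pass. The block and message shapes below are simplified assumptions (for example `toolUseId` rather than any provider's real field name), and the function is not the actual ValidatingMockAPI code.

```typescript
// Simplified content blocks; real provider payloads carry more fields.
type Block =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string }
  | { type: "tool_result"; toolUseId: string };

interface ChatMessage {
  role: "user" | "assistant";
  blocks: Block[];
}

// Returns a list of protocol violations; an empty list means valid.
function validateConversation(messages: ChatMessage[]): string[] {
  const errors: string[] = [];
  const used = new Set<string>();
  const resolved = new Set<string>();

  messages.forEach((msg, i) => {
    const prev = messages[i - 1];
    if (prev !== undefined && prev.role === msg.role) {
      errors.push(`message ${i}: role ${msg.role} repeats`);
    }
    for (const block of msg.blocks) {
      if (block.type === "tool_use") used.add(block.id);
      if (block.type === "tool_result") {
        if (!used.has(block.toolUseId)) {
          errors.push(`message ${i}: tool_result without tool_use ${block.toolUseId}`);
        }
        resolved.add(block.toolUseId);
      }
    }
  });

  used.forEach((id) => {
    if (!resolved.has(id)) errors.push(`tool_use ${id} has no tool_result`);
  });
  return errors;
}
```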

Test Framework as Contract Enforcer

These validations run automatically on every test that uses the mock API. A test author doesn't need to write assertions for protocol correctness — the infrastructure enforces it. This means every test is also a protocol compliance test, even if its explicit assertions are about something else entirely.


What's Not Implemented Yet

Security Sandbox

DANGER

No file system sandbox, no network restrictions, no command allowlist. Agents have full system access. Acceptable for local development; must be solved for hosted deployment.

Cost Controls

Basic per-task budgetUsd exists (warnings at 80%, stop at 100%). Missing: per-tree budgets, loop detection, idle detection.
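
The two thresholds can be expressed as a small pure function. This is a sketch only: whether the boundaries are inclusive and what the statuses are called are assumptions, and `checkBudget` is a hypothetical name.

```typescript
type BudgetStatus = "ok" | "warn" | "stop";

// Warn from 80% of the budget, stop at 100% (inclusive bounds assumed).
function checkBudget(spentUsd: number, budgetUsd: number): BudgetStatus {
  if (spentUsd >= budgetUsd) return "stop";
  if (spentUsd >= 0.8 * budgetUsd) return "warn";
  return "ok";
}
```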

Failure Defense

  • Implemented: findOrphanedToolCalls() for interrupted executions
  • Missing: infinite loop detection, branch drift detection, automatic conflict resolution

Released under the MIT License.