
Architecture

Deep technical reference for developers working on the Matrix codebase. For user-facing concepts, see Core Concepts. For the reasoning behind Matrix's approach, see Why Matrix.

Overview

Matrix is a multi-agent orchestration system built on git worktrees and LLM provider APIs. Each agent runs in an isolated git worktree on a dedicated branch, communicates via message queues, and persists its conversation as a JSONL event stream. A central HTTP daemon manages the lifecycle of all agents and exposes a web UI for real-time observation.


The Agent Loop (runProviderLoop)

File: src/provider-shared.ts

This is the heart of the system — a single async generator function (async function*) that drives all agent execution. Both the Anthropic and OpenAI providers use the same loop; provider-specific behavior is injected via the ProviderAdapter interface.

typescript
async function* runProviderLoop(
  adapter: ProviderAdapter,
  request: AgentRequest,
  sessionId: string,
  queue?: MessageQueue,
): AsyncGenerator<Event, AgentResult>

It yields Event objects consumed by the daemon for SSE broadcast and JSONL persistence, and returns an AgentResult when the agent exits.

Step-by-Step Flow

The full sequence:

  1. Initialize — resume from JSONL (convertEventsToMessages()), drain queue, build tool handlers
  2. Check abort — exit if AbortSignal fired
  3. Compact? — if tokens exceed threshold, extract checkpoint and rebuild context
  4. Call API — streaming request to provider, yield text_delta events
  5. Parse — extract text, tool uses, token usage; emit assistant_text, tool_call events
  6. stop_reason = end_turn → implicit yield: emit agent_idle, wait on queue, emit agent_active
  7. stop_reason = tool_use → execute all tools concurrently (Promise.all), emit tool_result events, drain cancellation point messages, check budget
  8. Exit — return AgentResult { exitReason, output, costUsd, turns }

Key Mechanics

  • Generator function: runProviderLoop() is an AsyncGenerator<Event, AgentResult>. It yields events during execution and returns the final result. The daemon drives the generator via consumeAgentEvents(), calling .next() until .done.

  • end_turn = implicit yield, never implicit done: When the model stops without calling tools (end_turn), the agent enters idle and waits for messages — it does NOT exit. The handleImplicitYield() helper emits agent_idle, blocks on queue.wait(), then emits agent_active. The agent stays alive until it explicitly calls done() or is interrupted. This means an agent that finishes responding but forgets to call done() simply waits for more input rather than silently exiting.

  • AbortSignal passthrough: The AbortSignal from stopAgent() is passed directly through to the provider's streaming API call. This means stop immediately interrupts AI generation mid-stream — not just between turns. The signal is checked at the top of each loop iteration, but it also cancels the in-flight HTTP request, so the agent doesn't wait for a full response before stopping.

  • Cancellation points: After tool execution but before the next API call, the loop drains pending queue messages. If messages arrived during tool execution, they get injected before the next turn.

  • Tool execution: All tools in a single turn execute concurrently via Promise.all(). The executeTool() function is the single execution path for every tool — built-in, orchestrator, and external MCP.

  • Exit reasons: Every agent exit is classified by an ExitReason enum:

    • done_passed / done_failed — the agent explicitly called done(). This is the agent's own decision.
    • interrupted — everything else: stop, reset, error, queue close, daemon restart.

    The distinction matters for the daemon's response: done_* means the agent finished and its status should be trusted. interrupted means the agent was cut short and may need to resume.
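The two-way split can be illustrated with a small classifier. The cause shapes below are hypothetical — the point is that only an explicit done() call produces a done_* reason, and every other exit collapses to interrupted:

```typescript
type ExitReason = "done_passed" | "done_failed" | "interrupted";

type ExitCause =
  | { kind: "done"; success: boolean }   // the agent's own decision
  | { kind: "stop" }
  | { kind: "reset" }
  | { kind: "error" }
  | { kind: "queue_closed" };

function classifyExit(cause: ExitCause): ExitReason {
  if (cause.kind === "done") return cause.success ? "done_passed" : "done_failed";
  // Everything else: the agent was cut short and may need to resume.
  return "interrupted";
}
```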


Provider Abstraction (ProviderAdapter)

File: src/provider-shared.ts

The ProviderAdapter interface (~18 hooks) abstracts all differences between Anthropic and OpenAI APIs:

  • getContextWindow(model) — Model's context window size
  • getModelPricing(model) — Per-million-token pricing
  • convertEventsToMessages(events) — Reconstruct the conversation from JSONL on resume
  • prepareTools(mcpToolDefs, mcpHandlers) — Format tool definitions for the provider
  • callAPI({model, messages, tools, …}) — The actual API call, with retries and streaming
  • getResponseText(response) — Extract text from the response
  • getToolUses(response) — Extract tool calls from the response
  • getTokenUsage(response) — Get input/output/cache token counts
  • getStopReason(response) — "end_turn" or "tool_use"
  • supportsTokenCounting — Whether exact token counting is available
  • countTokens(…) — Exact token count (Anthropic only)
  • buildResponseEvents(response, isCompacting) — Create JSONL events from the response
  • addAssistantMessage(messages, response, …) — Append the assistant response to history
  • buildToolResultsMessage(…) — Format tool results for the provider
  • buildImplicitYieldMessage(…) — Format the queue drain during an implicit yield
  • computeCost(…) — Calculate USD cost
  • getOuterRetryDelayMs?(attempt, error) — Custom delay before the outer retry of a failed API call
  • buildResult?(…) — Build the final AgentResult with provider-specific fields

The biggest divergence is buildToolResultsMessage():

  • Anthropic: Single user message with tool_result + text + image blocks
  • OpenAI: Separate tool role messages per result, plus a user message for queue text/images
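A minimal sketch of the two shapes — illustrative structures only, not the exact SDK message types of either provider:

```typescript
type ToolResult = { toolCallId: string; content: string };

// Anthropic style: ONE user message carrying a tool_result block per result
function anthropicShape(results: ToolResult[]) {
  return [{
    role: "user" as const,
    content: results.map((r) => ({
      type: "tool_result",
      tool_use_id: r.toolCallId,
      content: r.content,
    })),
  }];
}

// OpenAI style: one `tool` role message PER result
function openaiShape(results: ToolResult[]) {
  return results.map((r) => ({
    role: "tool" as const,
    tool_call_id: r.toolCallId,
    content: r.content,
  }));
}
```

Queue text and images that arrived during tool execution ride along differently too: Anthropic appends extra blocks to the same user message, while OpenAI needs a separate user message after the tool messages.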

Why Not a Framework?

Frameworks like LangChain abstract tool execution, context management, and streaming. But for an orchestration system, these ARE the product. Tool execution timing controls multi-agent coordination. Event persistence enables resume across daemon restarts. Custom compaction preserves agent memory. Cost tracking needs per-task granularity. Message queues need precise timing for implicit yield and cancellation points. Full control over the run loop is a feature, not incidental complexity.


Event System

Files: src/events.ts, src/event-store.ts, src/daemon/event-system.ts

Event Types

All events are a discriminated union on type (Event type in src/events.ts). Every event carries taskId and ts.

Ephemeral (SSE broadcast only, never persisted):

  • text_delta — Streaming text chunks during the API response
  • usage — Token usage snapshot
  • agent_idle / agent_active — Agent waiting/resuming
  • status — Human-readable status messages
  • clarification_timeout — Clarify timeout fired

Persisted (JSONL + SSE):

  • message — Unified message format (user, task_complete, background, etc.)
  • assistant_text — Model's text response
  • tool_call / tool_result — Tool execution cycle
  • compact_marker — Compaction checkpoint (events before it are skipped on resume)
  • compact_started / compacted_resume — Compaction lifecycle
  • summarization_request — Instruction sent to the model for checkpoint generation
  • orchestration_started / orchestration_completed — Session lifecycle
  • task_started — Child agent session began
  • clarification_requested / clarification_answered — User Q&A
  • messages_consumed — IDs of messages materialized into the conversation
  • fork_marker — Session was forked from another agent
  • budget_warning / budget_exceeded — Cost tracking
  • session_config — Frozen session configuration (tools, system prompt) for cache stability
  • error / agent_stopped — Error states

The persistence decision is made by isPersistedByEmitEvent() — an exhaustive switch with compile-time enforcement via never default.
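The exhaustive-switch pattern looks like this on a reduced event union (the union and function here are illustrative, not the real Event type):

```typescript
type Ev =
  | { type: "text_delta" }
  | { type: "assistant_text" }
  | { type: "tool_call" };

function isPersisted(ev: Ev): boolean {
  switch (ev.type) {
    case "text_delta":
      return false; // ephemeral: SSE broadcast only
    case "assistant_text":
    case "tool_call":
      return true; // written to JSONL
    default: {
      // If a new member is added to Ev without a case above, this
      // assignment fails to compile — the decision can't be forgotten.
      const unreachable: never = ev;
      return unreachable;
    }
  }
}
```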

Tool display formatting is centralized in event-display.ts — a single source for rendering tool calls and results into human-readable output, used by both the CLI and web UI.

Event Flow

Events flow through two paths:

  1. yield: Returned from the generator to consumeAgentEvents() for control flow.
  2. emit: The request.emit callback wired to emitEvent() for persistence and SSE broadcast.

Two-Phase Message Lifecycle

Messages use a two-phase lifecycle to prevent display reordering in the UI:

  1. Persist: A message event is written to JSONL when sent. The frontend sees it but defers display.
  2. Materialize: A messages_consumed event lists the IDs the agent actually consumed. The frontend places them in the correct position.

This matters because messages can arrive between tool executions, and without the two-phase protocol, the UI would show them in the wrong position relative to tool results.
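The frontend's side of the protocol can be sketched as follows — the onEvent function and event shapes are hypothetical simplifications of the real UI code:

```typescript
type Pending = Map<string, string>; // messageId → content: seen, display deferred

function onEvent(
  ev:
    | { type: "message"; id: string; content: string }
    | { type: "messages_consumed"; ids: string[] },
  pending: Pending,
  timeline: string[],
): void {
  if (ev.type === "message") {
    // Phase 1: persisted and seen, but NOT displayed yet.
    pending.set(ev.id, ev.content);
  } else {
    // Phase 2: place messages in the order the agent actually consumed them.
    for (const id of ev.ids) {
      const content = pending.get(id);
      if (content !== undefined) {
        timeline.push(content);
        pending.delete(id);
      }
    }
  }
}
```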

EventStore

File: src/event-store.ts

JSONL-based persistence — one file per session.

  • Append-only with per-session serialization
  • Reads are synchronous (only during resume)
  • readActive(sessionId): Events after the last compact_marker
  • copySessionFrom(source, target): Copies active events + appends fork_marker

Event Converter

File: src/event-converter.ts

walkEventsToMessages() converts JSONL events back into provider message arrays for resume, using EventConverterCallbacks so each provider formats differently while sharing traversal logic. Old tool names in JSONL are mapped to current names via TOOL_NAME_ALIASES.


Task Tree

File: src/task-tracker.ts

The task tree is a JSON structure persisted to tree.json. The TaskTracker class manages it. For user-facing lifecycle, see Core Concepts.

TaskNode

typescript
interface TaskNode {
  id: string;                  // ULID
  title: string;
  description: string;
  status: TaskStatus;          // draft|pending|in_progress|passed|failed|closed
  branch: string | null;       // e.g. "mxd/01KMAB1234ABCDEF/task-a"
  parentId: string | null;
  children: string[];          // ordered list of child IDs
  worktreePath: string | null;
  costUsd: number;
  budgetUsd?: number;
  editedBy: "user" | "agent";
  persistent: false | "reset" | "continue";
  color?: string;
  createdAt: string;
  updatedAt: string;
  session?: TaskSession;       // RUNTIME-ONLY — not persisted
}

The session field holds the MessageQueue, cwd, backgroundProcesses, and foregroundExecutions for a running agent. It's stripped during save() and undefined on load().

Persistent Tasks

Tasks can be marked as persistent via the persistent field (false | "reset" | "continue"). Persistent task definitions are stored in .mxd/tasks/<id>.json (git-tracked), separate from the task tree's tree.json.

When a persistent task is closed:

  • Status resets to pending (not closed), so the task runs again in the next cycle.
  • "reset": Session JSONL is deleted — the agent starts fresh each cycle.
  • "continue": Session JSONL is kept — the agent resumes with its full conversation history.

Only the root orchestrator can create persistent tasks. This is used for recurring quality agents that run periodically (e.g., code quality audits, test coverage checks).

Short ID Matching

tracker.get(nodeId) supports prefix matching (minimum 8 characters), letting agents reference tasks with shortened IDs to save tokens.
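A sketch of how such a prefix lookup could work (the findByPrefix helper is illustrative, not the actual TaskTracker implementation — note that an ambiguous prefix should match nothing):

```typescript
function findByPrefix(ids: string[], prefix: string): string | undefined {
  // Below 8 characters the prefix is considered too short to trust.
  if (prefix.length < 8) return undefined;
  const matches = ids.filter((id) => id.startsWith(prefix));
  // Exactly one hit, or no result: an ambiguous prefix must not guess.
  return matches.length === 1 ? matches[0] : undefined;
}
```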


Tool Architecture

Files: src/tools/definitions.ts (built-in), src/orchestrator-tools.ts (orchestrator), src/tool-definition.ts (type)

Tool Definition

typescript
interface ToolDefinition<T = Record<string, unknown>> {
  name: string;
  description: string;
  inputSchema: Record<string, ZodType>;  // Zod → JSON Schema for API
  handler: (args: T, extra?: { toolCallId?: string }) => Promise<InternalToolResult>;
}

Tools are namespaced as mcp__<server>__<name> (e.g., mcp__mxd__bash).

Built-in Tools

createBuiltinTools() in src/tools/definitions.ts:

  • bash — Shell commands with CWD tracking and background process support
  • background — Manage background processes (list, status, kill, await)
  • read_file — Files with line numbers, offset/limit, image support
  • write_file — Create files with auto-directory creation
  • edit_file — String replacement with exact match
  • list_files — Glob-based file discovery
  • search — Regex search with multiple output modes

Orchestrator Tools

createOrchestratorTools() in src/orchestrator-tools.ts:

  • Task management: create_task, update_task, delete_task, reset_task, close_task, reorder_tasks
  • Communication: send_message, yield, done, clarify
  • Observation: get_tree, get_task
  • Cross-project: list_projects, send_message_to_project
  • Context: fork_task_context

Plus external MCP tools connected via McpClientManager (src/mcp-client.ts).

Single Execution Path

Every tool goes through executeTool() in src/provider-shared.ts:

typescript
async function executeTool(
  toolName: string,
  input: Record<string, unknown>,
  mcpHandlers: Map<string, ToolDefinition<any>>,
  toolCallId?: string,
): Promise<ToolExecResult>

One handler map, one lookup, one call. No special cases.
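The shape of that path — one map, one lookup, one call — can be sketched like this (simplified types; the real ToolExecResult carries more):

```typescript
type Handler = (args: Record<string, unknown>) => Promise<{ output: string }>;

async function execute(
  handlers: Map<string, Handler>,
  toolName: string,
  input: Record<string, unknown>,
): Promise<{ output: string }> {
  const handler = handlers.get(toolName);
  // Unknown tools produce a result, not a crash — the model can recover.
  if (!handler) return { output: `Unknown tool: ${toolName}` };
  // Same call for built-in, orchestrator, and external MCP tools.
  return handler(input);
}
```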

Task Operations

File: src/task-operations.ts

Task mutations (create, update, close, delete, reset, reorder) are implemented as 6 shared functions in task-operations.ts. Both the MCP tool handlers (agent-facing) and REST API routes (user-facing) are thin wrappers that call the same shared functions.

The behavioral difference between agent and user actions is controlled by an editedBy field ("agent" | "user"), not separate code paths. For example, parent chain notification (informing an ancestor that a task was modified) only fires when editedBy: "user" — agent edits don't trigger notifications since the agent already knows what it did.

MCP Namespace Constants

File: src/tool-names.ts

Tool names are defined as constants in tool-names.ts rather than hardcoded strings. This ensures consistency between tool definitions, handlers, and tests. Similarly, QueueMessage factories centralize message construction, and web/api.ts provides a URL builder for API endpoints.


Context Forking

Files: src/orchestrator-tools.ts (tool), src/event-store.ts (event copy), src/events.ts (fork_marker event)

Context forking lets one agent inherit another agent's full conversation history. It follows Unix fork() semantics: after the fork, the parent and child receive different tool_result messages — the parent sees "You are the PARENT," the child sees "You are the CHILD." This is how each agent knows its identity after the fork.

How It Works

  1. Agent A calls fork_task_context(sourceTaskId, targetTaskId)
  2. EventStore.copySessionFrom() copies all active events (after the last compact_marker) from the source session to the target session
  3. A fork_marker event is appended to the target's JSONL, containing the source task ID and target task metadata
  4. Any orphaned tool_calls in the copied events get synthetic tool_results so the message structure is clean
  5. Agent A receives: "fork_task_context completed. You are the PARENT. Forked source → target. Use send_message to start the child agent."
  6. When the child agent starts, it replays its JSONL and sees the full conversation history plus the fork marker, and its tool_result says: "This tool was executed by the parent agent. You are the CHILD."

Multi-Layer Forks

When a forked agent forks again (A → B → C), the child sees multiple fork_marker events in its JSONL. The rule: the LAST fork_marker defines identity. Everything before the last marker is background knowledge from upstream agents; the agent's own task description and working directory come from after the final marker.
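The "last marker wins" rule is a single scan over the replayed events. A sketch (event types reduced to strings for illustration):

```typescript
// Given a replayed JSONL as a sequence of event types, return the index of
// the marker that defines this agent's identity, or -1 if it was never forked.
function identityMarkerIndex(eventTypes: string[]): number {
  // Earlier fork_markers are upstream background knowledge only.
  return eventTypes.lastIndexOf("fork_marker");
}
```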

Fork Sources

Forks aren't limited to self-forking. An agent can fork from:

  • Itself → child gets the parent's current session knowledge
  • A closed/passed task → child inherits that task's exploration and discoveries
  • A sibling task → child builds on a peer's work

Context Compaction

File: src/compaction.ts

When conversations exceed the context window, compaction compresses them into a structured checkpoint. For user-facing explanation, see Core Concepts.

Threshold

typescript
const COMPACT_BUFFER_RATIO = 0.17;  // reserve ~17% as buffer
const compressThreshold = contextWindow * (1 - COMPACT_BUFFER_RATIO);
const lazyCountThreshold = compressThreshold - 16_000;

Anthropic (with exact token counting) does a cheap estimate first, then calls countTokens only if close. OpenAI relies on the estimate alone.
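The two-stage check can be sketched as follows. The constants mirror the snippet above; the rounding and the shouldCountExactly helper are illustrative, not the literal src/compaction.ts code:

```typescript
const COMPACT_BUFFER_RATIO = 0.17; // reserve ~17% of the window as buffer

function thresholds(contextWindow: number) {
  const compress = Math.round(contextWindow * (1 - COMPACT_BUFFER_RATIO));
  // Start paying for exact counts 16k tokens before the compress threshold.
  return { compress, lazyCount: compress - 16_000 };
}

function shouldCountExactly(estimatedTokens: number, contextWindow: number): boolean {
  // Only call the provider's countTokens when the cheap estimate is close.
  return estimatedTokens >= thresholds(contextWindow).lazyCount;
}
```

For a 200k-token window this gives a compress threshold of 166k and a lazy-count threshold of 150k: estimates below 150k skip the exact count entirely.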

Compaction Flow

1. Token count exceeds threshold (or manual /compact)

2. Inject summarization instruction as user message

3. Model generates checkpoint with <summary>…</summary> tags
   containing 7 sections:
   ├── User Requests (chronological timeline)
   ├── Current Phase
   ├── Completed Work
   ├── Task Tree State
   ├── Key Insights & Rejected Approaches
   ├── Key Context
   └── Pending Work

4. extractCheckpoint() pulls text from <summary> tags
   Appends system context: working directory + resume instructions

5. buildCompactedContext() combines:
   ├── Fresh memory.md (re-read from disk)
   └── Checkpoint text

6. Conversation replaced with single user message

7. compact_marker event persisted to JSONL
   └── readActive() skips everything before the marker on resume

Key detail: Memory.md is re-read from disk after compaction — the agent may have updated it during the session, and compaction is the mechanism to get fresh institutional knowledge into the compressed context.


Memory Internals

File: .mxd/memory.md in each git worktree

For user-facing explanation, see Core Concepts.

Loading

Memory.md is read from disk and included in the first message header at agent launch. After compaction, it's re-read and included in the rebuilt context. The header format:

Working directory: /path/to/worktree

# .mxd/memory.md (Preloaded, do not read again)
<contents of memory.md>

This header is always how context enters the conversation — no special code paths for fresh start vs resume vs post-compaction. The prepareAgentMessage() function in src/daemon/agent-lifecycle.ts constructs it.
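A sketch of the header construction — the buildHeader function is a hypothetical reduction of prepareAgentMessage(), showing only the format quoted above:

```typescript
function buildHeader(workingDir: string, memoryContents: string): string {
  return [
    `Working directory: ${workingDir}`,
    "",
    "# .mxd/memory.md (Preloaded, do not read again)",
    memoryContents,
  ].join("\n");
}
```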

Merging Through Git

Each agent has its own copy in its worktree. When branches merge, memory merges through git. The append-only rule prevents agents from modifying inherited entries. Higher-level agents curate the merged result — trimming trivial notes, consolidating related entries, floating important knowledge to the top.


Configuration System

File: src/config.ts

Three-layer configuration with lower layers overriding higher:

Global (~/.mxd/config.json)
  └── Repo (.mxd/config.json)                          ← committed to git
       └── Local (~/.mxd/projects/<id>/config.json)    ← per-project

Config Shape

typescript
interface MatrixConfig {
  authGroups?: Record<string, AuthGroup>;  // provider credentials
  defaultAuth?: string;                    // auth group name for root agent
  model?: string;                          // default: "claude-sonnet-4-6"
  childAuth?: string;                      // auth group name for child agents
  childModel?: string;                     // model for child agents
  budgetUsd?: number;                      // per-task cost limit
  maxDepth?: number;                       // max task tree depth
  clarifyTimeoutMs?: number;               // auto-resolve clarify after timeout
  mcpServers?: Record<string, McpServerConfig>;  // external tool servers
  port?: number;                           // default: 7433
  sessionKeep?: number;                    // session JSONL retention count
  selfBootstrap?: boolean;                 // internal: self-development mode
  auth?: WebAuthnConfig;                   // web UI auth settings (legacy name; actual auth is RSA-OAEP)
}

interface AuthGroup {
  provider: "anthropic" | "openai";
  anthropicApiKey?: string;
  claudeOauthToken?: string;
  openaiApiKey?: string;
  openaiBaseUrl?: string;      // for OpenAI-compatible APIs
}

interface McpServerConfig {
  command: string;             // executable to run
  args?: string[];             // command-line arguments
  env?: Record<string, string>;  // environment variables
}

Resolution

resolveConfig(global, repo, local) merges layers:

  • Scalars: Local overrides repo overrides global (the highest-priority layer wins)
  • mcpServers / authGroups / auth: Union merge; same-named entries take the highest-priority layer's value
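A sketch of the merge, reduced to one scalar and one map field (the resolveSketch function is illustrative, not the real resolveConfig):

```typescript
interface Cfg {
  model?: string;                          // scalar field
  mcpServers?: Record<string, string>;     // map field (values simplified)
}

function resolveSketch(global: Cfg, repo: Cfg, local: Cfg): Cfg {
  return {
    // Scalars: the highest-priority defined value wins (local > repo > global).
    model: local.model ?? repo.model ?? global.model,
    // Maps: union merge; a later spread (higher priority) overwrites same keys.
    mcpServers: { ...global.mcpServers, ...repo.mcpServers, ...local.mcpServers },
  };
}
```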

Agent Lifecycle

File: src/daemon/agent-lifecycle.ts

Launch Sequence

1. ensureRootNode() — create or reuse root TaskNode
2. Create MessageQueue
3. Load persisted messages → enqueue
4. createAgentContext():
   ├── resolveProjectConfig()
   ├── getProjectProvider()
   ├── McpClientManager.connectAll()
   ├── createOrchestratorTools()
   └── createBuiltinTools()
5. Read active events from EventStore (for resume)
6. Fix orphaned tool_calls
7. Create TaskSession → attach to root node
8. provider.startSession() → { events: AsyncGenerator, stop() }
9. consumeAgentEvents() drives the generator (fire-and-forget)
10. Enqueue first user message with header (memory + working directory)

Auto-Resume on Restart

On daemon restart, every in_progress agent is evaluated independently based on its JSONL state — root and children alike. There is no "mark children failed, resume root" cascade. Each agent is assessed on its own:

  • Yielding (last event is a pending yield tool_call) → Resume with provider loop bypass. Zero API calls — the agent goes straight to queue.wait() and only wakes when a message arrives. This is the cheapest possible resume.
  • Interrupted (has orphaned non-yield tool_calls) → Write synthetic tool_result events for orphaned calls, then normal resume with JSONL replay.
  • Done (status is passed, failed, or closed) → Skip, already finished.

This independent assessment means a tree of 10 agents doesn't trigger 10 API calls on restart. Only the agents that were mid-execution resume actively; yielding agents sleep until needed.
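The per-agent decision described above amounts to a three-way classification. A sketch (statuses and the "last pending tool call" input are simplified from the real JSONL inspection):

```typescript
type ResumeMode = "skip" | "bypass_wait" | "replay";

function classifyResume(status: string, lastPendingToolCall: string | null): ResumeMode {
  // Finished agents need nothing.
  if (status === "passed" || status === "failed" || status === "closed") return "skip";
  // A pending yield → sleep on the queue; zero API calls until a message arrives.
  if (lastPendingToolCall === "yield") return "bypass_wait";
  // Anything else mid-flight → synthesize tool_results for orphans, then replay.
  return "replay";
}
```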

Stop Cascade

stopAgent() performs a full cascade: signal abort → close queues → cleanup background processes → save tree → write synthetic tool_results for orphaned calls → emit agent_stopped.

Child tasks are stopped using real interrupt: the child's message queue is closed and its abort signal is fired, triggering immediate termination of the in-flight API call. This is a true cascade — not a fake text message telling the agent to stop — ensuring all descendants terminate promptly.

Critically, children stay in_progress — they are not force-failed. They were interrupted, not broken. On the next daemon restart, autoResume will detect them from their JSONL state and resume each one independently. This avoids the old pattern where stopping the root agent would mark all children as failed, requiring manual recovery.

Child Agent Lifecycle

Child agents launch via runChildAgentInBackground():

  1. Compute depth via parent chain
  2. Create MessageQueue, attach TaskSession
  3. createAgentContext() scoped to child's worktree
  4. runChildCore() drives the generator
  5. Post-completion: update cost, check budget, update status, notify parent

Parent notification is crash-safe: all child→parent send_message calls go through deliverMessage(), which persists to the JSONL before enqueuing. If the daemon crashes between the child sending a message and the parent consuming it, the message survives on disk and is recovered on restart.

Parent notification has two paths:

  • done() called: The done tool handler delivers to the parent via deliverMessage()
  • Crash/budget exceeded: Fallback via findParentQueue() — walks up the tree to the nearest running ancestor

Message Queue

File: src/message-queue.ts

A simple async queue for inter-agent communication.

Message Types

typescript
type QueueMessage =
  | { source: "user"; … }
  | { source: "tree_change"; … }
  | { source: "task_complete"; taskId; success; output; … }
  | { source: "task_message"; fromTaskId; content; … }
  | { source: "clarify_response"; answer; … }
  | { source: "user_message_forwarded"; … }
  | { source: "cross_project"; fromProjectId; content; … }
  | { source: "background_complete"; … }
  | { source: "compact" }

Delivery

deliverMessage() in agent-lifecycle.ts is the single delivery path:

  1. Try direct: If session.queue exists and is open → enqueue()
  2. Persist to disk: Write to pending-messages/ directory
  3. Auto-launch: For child nodes, create worktree and launch agent. Persisted messages load on startup.

Queue = cache, disk = durable storage.
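The ladder can be sketched like this — the deliver function and its callback parameters are hypothetical stand-ins for deliverMessage() and its collaborators:

```typescript
interface QueueLike {
  open: boolean;
  enqueue(msg: string): void;
}

function deliver(
  msg: string,
  queue: QueueLike | null,
  persist: (m: string) => void, // write to the pending-messages/ directory
  autoLaunch: () => void,       // create worktree + launch the child agent
): "direct" | "persisted" {
  // 1. Queue is the cache: a live, open queue gets the message immediately.
  if (queue?.open) {
    queue.enqueue(msg);
    return "direct";
  }
  // 2. Disk is durable storage: persist so nothing is lost.
  persist(msg);
  // 3. Auto-launch: the agent loads persisted messages on startup.
  autoLaunch();
  return "persisted";
}
```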


Worktree Manager

File: src/worktree-manager.ts

Configurable Base Branch

The root task node stores the project's base branch at initialization time. All worktree creation uses this stored baseBranch — the system prompt and agent tooling are branch-agnostic (no hardcoded main or master assumption). This means Matrix works correctly with any branch naming convention.

Branch Naming

mxd/<full-taskId>/<slugified-title>

Worktree Creation

1. git worktree add -b <branch> <path> <baseBranch>
2. git config --worktree core.hooksPath /dev/null  ← disable hooks
3. Run .mxd/hooks/setup_worktree.sh          ← install deps, etc.

Hooks are disabled per-worktree because child agents must not trigger the parent project's pre-commit hooks.

Setup Hook

The setup hook (.mxd/hooks/setup_worktree.sh) is required — worktree creation fails if it's missing. It handles environment setup that new worktrees need: installing dependencies, copying .env files, running build steps.

On mxd init, Matrix creates a .mxd/hooks/setup_worktree.sh.example file with auto-detected content (detects bun/npm/yarn from lockfiles). This .example file is committed to git but is not the active hook — it's a template. The user must:

  1. Review the .example file
  2. Customize it for their project
  3. Save as setup_worktree.sh (without .example) and make executable

This deliberate step prevents auto-generated hooks from silently doing the wrong thing. If the hook installs wrong dependencies or skips an env file, every sub task fails on startup — an expensive mistake when running many agents in parallel.


Daemon

File: src/daemon.ts

The daemon is a Hono HTTP server (default port 7433).

Core Context

typescript
interface DaemonContext {
  config: DaemonConfig;
  pm: ProjectManager;
  trackers: Map<string, TaskTracker>;
  sseClients: Set<SSEClient>;
  activeSessions: Map<string, AgentSession>;
  pendingClarifications: Map<string, PendingClarification[]>;
  eventStores: Map<string, EventStore>;
  // ...
}

Route Groups

  • Tasks (routes/tasks.ts): CRUD on task nodes, the unified message endpoint (POST /tasks/:nodeId/message), per-task stop/fork/events
  • Agent (routes/agent.ts): project-level agent status, stop, compact, restart, clarify, session management, background process control
  • Projects (routes/projects.ts): register/deregister, project events and clarifications
  • Config (routes/config.ts): read/write configuration at global, repo, and local layers
  • SSE (routes/sse.ts): real-time event stream
  • Auth (routes/auth.ts): RSA-OAEP challenge-response authentication

All agent interaction goes through the unified POST /tasks/:nodeId/message endpoint — the same path handles starting a new agent, sending a message to a running one, or resuming a stopped one.

Startup

1. createApp() → build context, register routes
2. Load projects and config
3. Start Bun.serve()
4. Register SIGTERM/SIGINT handlers
5. runEventMigrations()
6. autoResumeProjects()
7. markReady()

Design Principles

These principles guide contributor decisions. They aren't arbitrary — each one addresses the two core problems: hallucination (AI says things that aren't true) and architectural tunnel vision (AI extends rather than rethinks). For the user-facing motivation, see Why Matrix.

Cache Invariant: All State Is a Cache of Disk

Kill the daemon at any point. Restart. Everything resumes. The task tree rebuilds from tree.json. Conversations rebuild from JSONL event files. Running agents are detected and reconnected. Nothing lives only in memory.

This combats hallucination at the infrastructure level — disk state is the objective truth that agents resume from, not their potentially confused in-memory representation. It's also a development enabler — when agents modify the system itself (self-bootstrapping), the daemon restarts frequently. Cheap restarts require disk as source of truth.

Single Path Principle

One code path for each operation. No fallbacks, no dual implementations.

  • executeTool() is the ONE path for all tool execution
  • emitEvent() is the ONE path for all event emission
  • runProviderLoop() is the ONE loop for both providers
  • task-operations.ts has the ONE set of shared functions for task mutations — MCP tools and REST routes are thin wrappers
  • event-display.ts is the ONE source for tool display formatting

Fallbacks mask bugs and amplify tunnel vision. If path A fails and path B succeeds, you never fix path A. Worse: mental model residue — when old and new systems coexist, agents interpret the new as a variant of the old, leading to wrong reasoning. This is one of the laziness patterns that single-path design prevents.

Event Sourcing

Every state change is an event. Events persist to JSONL. Events broadcast via SSE. The JSONL file IS the conversation — not a log of it, but the source of truth that gets replayed on resume.

Methodology Injection

Every agent receives the same ~400-line system prompt covering: worker workflow, git discipline, code quality, debugging protocol, orchestration philosophy, memory system, and communication patterns. Strategy goes in the system prompt (WHEN and WHY), mechanics go in tool descriptions (HOW). This separation means you can change workflow without touching tools, and vice versa.

Key system prompt principles:

  • Task descriptions include WHY — agents get motivation context, not just instructions. Without WHY, agents hesitate at edge cases and make conservative choices.
  • Ask when uncertain, never silently fallback — wrong guesses waste more time than questions. Agents are explicitly instructed to send_message(requestReply=true) rather than silently making a conservative choice.
  • Incremental merge — workers commit early and often; orchestrators merge individual commits without waiting for done().
  • close_task rejects in-progress tasks — you cannot close a task that's still running, preventing accidental resource cleanup.
  • Branch-agnostic — no hardcoded branch names; the system works with whatever branch the project uses.

Ownership Framing

"The task above" and "sub task" — not "parent agent" and "child agent." send_message is the same tool for both directions. Communication is coordination between peers with different scopes. This framing encourages agents to take initiative — explore the codebase, make decisions, ask questions — rather than waiting for instructions. See Why Matrix for the user-facing explanation.

AI-Friendly Technology Choices

Every technology choice optimizes for: how fast does the AI get feedback when it makes a mistake? This is the hallucination countermeasure applied at the tooling level — faster feedback means shorter hallucination windows.

  • TypeScript strict mode — type errors at compile time, not runtime
  • Bun — fast test runner, fast startup (milliseconds vs seconds)
  • Biome — single tool for lint + format, one config file
  • Event-driven architecture — modules are independent, debuggable in isolation
  • Pure functions — testable with simple input/output assertions, no mocks

Anti-patterns: no magic frameworks (convention-over-configuration helps humans, not AI), no obscure libraries (hallucinated APIs are hard to diagnose), no heavy configuration (AI wastes disproportionate time on config files).

Testing Quality Principles

These principles implement the test-is-golden philosophy at the engineering level:

  • Mutation resistance — a test that can't catch code mutations is worthless. If you can change the implementation and the test still passes, the test proves nothing. This is test mutation applied as a development discipline. Tests must assert on behavior that matters, not incidental structure.
  • Coverage realism — test through real lifecycle paths, not isolated function mocks. A test that calls the actual provider loop with a mock API catches more bugs than a test that mocks every layer boundary.
  • Expect failures — if you never see a test fail during development, something is wrong. Tests should be written or modified before the implementation that makes them pass. A test you've never seen fail might not be testing what you think.

Agent Laziness Patterns

AI agents exhibit predictable anti-patterns that undermine code quality. Understanding them is essential because the system prompt is specifically designed to counteract each one. These patterns are symptoms of architectural tunnel vision — the agent's tendency to extend rather than rethink.

1. Fear of Large Changes

Agents gravitate toward minimal patches. Asked to refactor a module, they'll modify the surface — rename a variable, add a wrapper — while leaving the underlying problem intact. This produces code that looks updated but hasn't actually improved.

Countermeasure: The system prompt explicitly encourages agents to make the changes the task requires, not the smallest possible diff. Combined with the disposable-architecture framing, this teaches agents that large, correct changes are preferable to small, incomplete ones.

2. Unnecessary Fallbacks

When fixing a bug or adding a feature, agents often add "just in case" code paths — try/catch blocks that swallow errors, fallback values that mask failures, compatibility shims that preserve broken behavior. This directly violates the single path principle: fallbacks mask bugs and create code that appears to work while silently doing the wrong thing.

Countermeasure: The system prompt instructs agents to prefer single code paths and avoid defensive programming for scenarios that can't happen. Tests catch real failures; fallbacks hide them.
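The difference can be sketched with hypothetical config-loading functions (names are illustrative, not from the codebase): the fallback version silently turns corrupt input into a plausible default, while the single-path version lets the failure surface where a test can catch it.

```typescript
// Anti-pattern (illustrative): a fallback that masks failure.
function loadConfigWithFallback(raw: string): { model: string } {
  try {
    return JSON.parse(raw);
  } catch {
    return { model: "default" }; // corrupt config silently becomes a plausible default
  }
}

// Single path: corrupt input throws immediately, where a test can catch it.
function loadConfig(raw: string): { model: string } {
  const parsed = JSON.parse(raw); // throws on bad JSON, by design
  if (typeof parsed.model !== "string") throw new Error("config missing 'model'");
  return parsed;
}
```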

3. Not Communicating Proactively

When agents get stuck or uncertain, they tend to work silently — guessing at an approach rather than asking for clarification. This wastes tokens on wrong approaches and produces code based on incorrect assumptions.

Countermeasure: The system prompt emphasizes that asking is cheap, rework is expensive. Agents are instructed to use send_message and clarify proactively rather than guessing.

4. Not Questioning Architecture

Given existing code, agents default to extending the current patterns — even when the patterns are the problem. This is tunnel vision in action: the agent sees the code as given rather than as something it can question.

Countermeasure: Architecture mutation gives agents a framework for evaluating whether the current architecture is sound. The test-is-golden philosophy explicitly tells agents that architecture is disposable — if there's a simpler way to pass the tests, propose it.

5. "Unification" That Adds a Third Path

When asked to merge two approaches, agents often create a "unified" version that coexists alongside the originals — three paths instead of one. The old code isn't removed, the new code wraps it, and complexity increases.

Countermeasure: The system prompt explicitly warns against naming things "unified," "improved," or "new" — these names signal that the old version still exists. Real unification means one path replaces two.
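A sketch of the distinction, with hypothetical names: the anti-pattern leaves three entry points, while real unification leaves one.

```typescript
// Anti-pattern (illustrative): after "unification," three entry points remain:
//
//   parseEventLegacy()   // old path, never deleted
//   parseEventV2()       // newer path, never deleted
//   parseEventUnified()  // wraps both "just in case"
//
// Real unification: one function replaces both, and the old ones are
// deleted in the same change.
function parseEvent(line: string): { type: string } {
  const obj = JSON.parse(line);
  if (typeof obj.type !== "string") throw new Error("event missing 'type'");
  return obj;
}
```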

Pattern Recognition

These patterns compound. An agent that fears large changes (1) adds a fallback instead of fixing the root cause (2), doesn't ask whether the approach is right (3, 4), and creates a "new unified handler" that wraps the old one (5). The result: five layers of indirection where one simple implementation would suffice. Recognizing the pattern is the first step to breaking it.


Testing Infrastructure

Matrix's test suite is built on custom infrastructure that turns the test framework into a contract enforcer — tests as physical reality applied to the system that builds other systems. Rather than mocking at layer boundaries, tests drive the full agent lifecycle through a mock API that validates protocol invariants automatically.

Mock Instruction DSL

File: src/test-utils/

Tests describe mock API conversations using a JSON-based instruction DSL. Each "turn" specifies what the mock API should return and optionally what it should assert about the request:

typescript
const turns: Turn[] = [
  {
    // Assert the agent's request contains these tool results
    assert: { toolResults: ["file contents..."] },
    // Respond with this tool call
    toolCalls: [{ name: "mcp__mxd__bash", input: { command: "npm test" } }],
  },
  {
    // Assert the bash output was fed back
    assert: { toolResults: [/Tests: \d+ passed/] },
    // Respond with done
    toolCalls: [{ name: "mcp__mxd__done", input: { status: "passed", summary: "All tests pass" } }],
  },
];

Each turn is a complete request-response cycle: the mock validates the incoming request against assert blocks, then returns the specified response. This makes tests readable as conversations — you can see what the agent sends and what it receives at each step.

Turns support variable capture — extracting values from requests to use in later assertions or responses. Per-conversation turn queues allow multiple agents to have independent instruction sequences in the same test.
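The capture mechanic itself is small and can be sketched in a few lines; the helper names and `$var` substitution syntax below are assumptions for illustration, not the real DSL.

```typescript
// Extract a value from a request (e.g. an agent id), then substitute it
// into a later turn's response. Names and syntax are hypothetical.
function capture(text: string, pattern: RegExp): string | undefined {
  return text.match(pattern)?.[1];
}

function substitute(template: string, vars: Record<string, string>): string {
  return template.replace(/\$(\w+)/g, (_, name) => vars[name] ?? `$${name}`);
}

const vars: Record<string, string> = {};
const id = capture("spawned agent-7f3a in worktree", /agent-(\w+)/);
if (id !== undefined) vars.id = id;
if (substitute("send_message to $id", vars) !== "send_message to 7f3a") {
  throw new Error("capture/substitute sketch is wrong");
}
```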

Prefix Consistency Validation

On every API call, the mock automatically validates that the new conversation is a strict prefix extension of the previous call's conversation. In other words, the mock checks that all previous messages are still present and unchanged, with new messages only appended at the end.

If this check fails, it means one of two things:

  • A JSONL rebuild bug — the conversation was reconstructed incorrectly after a restart
  • A cache miss — the provider's prompt cache won't hit because the prefix changed

Both are serious bugs. By validating prefix consistency on every API call, the mock catches conversation corruption that would otherwise surface as mysterious cache misses or subtle behavioral changes in production.
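The check itself is simple; a minimal sketch, assuming messages compare by serialized equality (the real mock's comparison may differ):

```typescript
// A conversation is a valid continuation only if every previous message is
// unchanged and new messages are appended at the end.
type Message = { role: "user" | "assistant"; content: unknown };

function isPrefixExtension(prev: Message[], next: Message[]): boolean {
  if (next.length < prev.length) return false;
  return prev.every((m, i) => JSON.stringify(m) === JSON.stringify(next[i]));
}

const first: Message[] = [{ role: "user", content: "run the tests" }];
const second: Message[] = [...first, { role: "assistant", content: "running" }];
if (!isPrefixExtension(first, second)) throw new Error("appending must pass");
if (isPrefixExtension(second, first)) throw new Error("truncation must fail");
```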

ValidatingMockAPI

The ValidatingMockAPI wraps the instruction DSL with protocol-level validation that runs on every API call:

  • Turn alternation — messages must alternate between user and assistant roles
  • Tool use / tool result pairing — every tool_use block must have a corresponding tool_result, and vice versa
  • No duplicate messages — the same message cannot appear twice in a conversation
  • Prefix consistency — as described above
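Two of these checks sketched over a simplified message shape; the real ValidatingMockAPI internals may differ.

```typescript
// Turn alternation: no two consecutive messages share a role.
function checkAlternation(roles: ("user" | "assistant")[]): boolean {
  return roles.every((role, i) => i === 0 || role !== roles[i - 1]);
}

// Tool pairing: every tool_use id has exactly one tool_result id, and vice versa.
function checkToolPairing(toolUseIds: string[], toolResultIds: string[]): boolean {
  const uses = new Set(toolUseIds);
  const results = new Set(toolResultIds);
  return uses.size === results.size && [...uses].every((id) => results.has(id));
}

if (!checkAlternation(["user", "assistant", "user"])) throw new Error("should alternate");
if (checkAlternation(["user", "user"])) throw new Error("repeated role must fail");
if (!checkToolPairing(["t1", "t2"], ["t2", "t1"])) throw new Error("pairing is order-free");
if (checkToolPairing(["t1"], [])) throw new Error("orphaned tool_use must fail");
```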

Test Framework as Contract Enforcer

These validations run automatically on every test that uses the mock API. A test author doesn't need to write assertions for protocol correctness — the infrastructure enforces it. This means every test is also a protocol compliance test, even if its explicit assertions are about something else entirely.


What's Not Implemented Yet

Security Sandbox

DANGER

No file system sandbox, no network restrictions, no command allowlist. Agents have full system access. Acceptable for local development; must be solved for hosted deployment.

Cost Controls

Basic per-task budgetUsd exists (warnings at 80%, stop at 100%). Missing: per-tree budgets, loop detection, idle detection.
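The existing thresholds reduce to a small decision function; a sketch under stated assumptions (the function name is hypothetical, the 80%/100% values are from the text).

```typescript
// Per-task budget check: warn at 80% of budgetUsd, stop at 100%.
function budgetAction(costUsd: number, budgetUsd: number): "ok" | "warn" | "stop" {
  if (costUsd >= budgetUsd) return "stop";       // hard stop at 100%
  if (costUsd >= budgetUsd * 0.8) return "warn"; // warning at 80%
  return "ok";
}
```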

Failure Defense

  • findOrphanedToolCalls() for interrupted executions
  • Missing: infinite loop detection, branch drift detection, automatic conflict resolution

Released under the MIT License.