Architecture
Deep technical reference for developers working on the Matrix codebase. For user-facing concepts, see Core Concepts. For the reasoning behind Matrix's approach, see Why Matrix.
Overview
Matrix is a multi-agent orchestration system built on git worktrees and LLM provider APIs. Each agent runs in an isolated git worktree on a dedicated branch, communicates via message queues, and persists its conversation as a JSONL event stream. A central HTTP daemon manages the lifecycle of all agents and exposes a web UI for real-time observation.
The Agent Loop (runProviderLoop)
File: src/provider-shared.ts
This is the heart of the system — a single async function* generator that drives all agent execution. Both Anthropic and OpenAI providers use the same loop; provider-specific behavior is injected via the ProviderAdapter interface.
```typescript
async function* runProviderLoop(
  adapter: ProviderAdapter,
  request: AgentRequest,
  sessionId: string,
  queue?: MessageQueue,
): AsyncGenerator<Event, AgentResult>
```
It yields `Event` objects consumed by the daemon for SSE broadcast and JSONL persistence, and returns an `AgentResult` when the agent exits.
Step-by-Step Flow
The full sequence:
- Initialize — resume from JSONL (`convertEventsToMessages()`), drain queue, build tool handlers
- Check abort — exit if `AbortSignal` fired
- Compact? — if tokens exceed threshold, extract checkpoint and rebuild context
- Call API — streaming request to provider, yield `text_delta` events
- Parse — extract text, tool uses, token usage; emit `assistant_text`, `tool_call` events
- stop_reason = end_turn → implicit yield: emit `agent_idle`, wait on queue, emit `agent_active`
- stop_reason = tool_use → execute all tools concurrently (`Promise.all`), emit `tool_result` events, drain cancellation point messages, check budget
- Exit — return `AgentResult { exitReason, output, costUsd, turns }`
Key Mechanics
Generator function: `runProviderLoop()` is an `AsyncGenerator<Event, AgentResult>`. It `yield`s events during execution and `return`s the final result. The daemon drives the generator via `consumeAgentEvents()`, calling `.next()` until `.done`.

`end_turn` = implicit yield, never implicit done: When the model stops without calling tools (`end_turn`), the agent enters idle and waits for messages — it does NOT exit. The `handleImplicitYield()` helper emits `agent_idle`, blocks on `queue.wait()`, then emits `agent_active`. The agent stays alive until it explicitly calls `done()` or is interrupted. This means an agent that finishes responding but forgets to call `done()` simply waits for more input rather than silently exiting.

AbortSignal passthrough: The `AbortSignal` from `stopAgent()` is passed directly through to the provider's streaming API call. This means stop immediately interrupts AI generation mid-stream — not just between turns. The signal is checked at the top of each loop iteration, but it also cancels the in-flight HTTP request, so the agent doesn't wait for a full response before stopping.

Cancellation points: After tool execution but before the next API call, the loop drains pending queue messages. If messages arrived during tool execution, they get injected before the next turn.

Tool execution: All tools in a single turn execute concurrently via `Promise.all()`. The `executeTool()` function is the single execution path for every tool — built-in, orchestrator, and external MCP.

Exit reasons: Every agent exit is classified by an `ExitReason` enum:
- `done_passed` / `done_failed` — the agent explicitly called `done()`. This is the agent's own decision.
- `interrupted` — everything else: stop, reset, error, queue close, daemon restart.

The distinction matters for the daemon's response: `done_*` means the agent finished and its status should be trusted; `interrupted` means the agent was cut short and may need to resume.
Provider Abstraction (ProviderAdapter)
File: src/provider-shared.ts
The ProviderAdapter interface (~18 hooks) abstracts all differences between Anthropic and OpenAI APIs:
| Hook | Purpose |
|---|---|
getContextWindow(model) | Model's context window size |
getModelPricing(model) | Per-million-token pricing |
convertEventsToMessages(events) | Reconstruct conversation from JSONL on resume |
prepareTools(mcpToolDefs, mcpHandlers) | Format tool definitions for the provider |
callAPI({model, messages, tools, …}) | The actual API call with retries, streaming |
getResponseText(response) | Extract text from response |
getToolUses(response) | Extract tool calls from response |
getTokenUsage(response) | Get input/output/cache token counts |
getStopReason(response) | "end_turn" or "tool_use" |
supportsTokenCounting | Whether exact token counting is available |
countTokens(…) | Exact token count (Anthropic only) |
buildResponseEvents(response, isCompacting) | Create JSONL events from response |
addAssistantMessage(messages, response, …) | Append assistant response to history |
buildToolResultsMessage(…) | Format tool results for the provider |
buildImplicitYieldMessage(…) | Format queue drain during implicit yield |
computeCost(…) | Calculate USD cost |
getOuterRetryDelayMs?(attempt, error) | Custom delay before outer retry of failed API call |
buildResult?(…) | Build final AgentResult with provider-specific fields |
The biggest divergence is buildToolResultsMessage():
- Anthropic: Single `user` message with `tool_result` + `text` + `image` blocks
- OpenAI: Separate `tool` role messages per result, plus a `user` message for queue text/images
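The divergence can be illustrated with simplified shapes. These functions are hypothetical; the field names follow the public Anthropic (`tool_use_id`) and OpenAI (`tool_call_id`) message formats in reduced form:

```typescript
// Illustrative shapes only; the real formatting lives in each provider
// adapter's buildToolResultsMessage().
type ToolOutcome = { toolCallId: string; output: string };

// Anthropic: a single user message whose content is tool_result blocks.
function anthropicToolResults(results: ToolOutcome[]) {
  return {
    role: "user" as const,
    content: results.map((r) => ({
      type: "tool_result" as const,
      tool_use_id: r.toolCallId,
      content: r.output,
    })),
  };
}

// OpenAI: one `tool`-role message per result.
function openaiToolResults(results: ToolOutcome[]) {
  return results.map((r) => ({
    role: "tool" as const,
    tool_call_id: r.toolCallId,
    content: r.output,
  }));
}
```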
Why Not a Framework?
Frameworks like LangChain abstract tool execution, context management, and streaming. But for an orchestration system, these ARE the product. Tool execution timing controls multi-agent coordination. Event persistence enables resume across daemon restarts. Custom compaction preserves agent memory. Cost tracking needs per-task granularity. Message queues need precise timing for implicit yield and cancellation points. Full control over the run loop is a feature, not incidental complexity.
Event System
Files: src/events.ts, src/event-store.ts, src/daemon/event-system.ts
Event Types
All events are a discriminated union on type (Event type in src/events.ts). Every event carries taskId and ts.
Ephemeral (SSE broadcast only, never persisted):
| Type | Purpose |
|---|---|
text_delta | Streaming text chunks during API response |
usage | Token usage snapshot |
agent_idle / agent_active | Agent waiting/resuming |
status | Human-readable status messages |
clarification_timeout | Clarify timeout fired |
Persisted (JSONL + SSE):
| Type | Purpose |
|---|---|
message | Unified message format (user, task_complete, background, etc.) |
assistant_text | Model's text response |
tool_call / tool_result | Tool execution cycle |
compact_marker | Compaction checkpoint (events before this skipped on resume) |
compact_started / compacted_resume | Compaction lifecycle |
summarization_request | Instruction sent to model for checkpoint generation |
orchestration_started / orchestration_completed | Session lifecycle |
task_started | Child agent session began |
clarification_requested / clarification_answered | User Q&A |
messages_consumed | IDs of messages materialized into conversation |
fork_marker | Session was forked from another agent |
budget_warning / budget_exceeded | Cost tracking |
session_config | Frozen session configuration (tools, system prompt) for cache stability |
error / agent_stopped | Error states |
The persistence decision is made by isPersistedByEmitEvent() — an exhaustive switch with compile-time enforcement via never default.
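The never-default pattern looks roughly like this (a reduced sketch; the real `Event` union and `isPersistedByEmitEvent()` cover many more members):

```typescript
// Sketch of the compile-time exhaustiveness pattern. Two members per
// side suffice to show it; the real union in src/events.ts is larger.
type DemoEvent =
  | { type: "assistant_text" }
  | { type: "tool_call" }
  | { type: "text_delta" }
  | { type: "agent_idle" };

function isPersistedSketch(event: DemoEvent): boolean {
  switch (event.type) {
    case "assistant_text":
    case "tool_call":
      return true; // JSONL + SSE
    case "text_delta":
    case "agent_idle":
      return false; // SSE only
    default: {
      // If a new member is added to DemoEvent but not handled above,
      // `event` no longer narrows to never and this line fails to compile.
      const unreachable: never = event;
      return unreachable;
    }
  }
}
```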
Tool display formatting is centralized in event-display.ts — a single source for rendering tool calls and results into human-readable output, used by both the CLI and web UI.
Event Flow
Events flow through two paths:
- yield: Returned from the generator to `consumeAgentEvents()` for control flow.
- emit: The `request.emit` callback wired to `emitEvent()` for persistence and SSE broadcast.
Two-Phase Message Lifecycle
Messages use a two-phase lifecycle to prevent display reordering in the UI:
- Persist: A `message` event is written to JSONL when sent. The frontend sees it but defers display.
- Materialize: A `messages_consumed` event lists the IDs the agent actually consumed. The frontend places them in the correct position.
This matters because messages can arrive between tool executions, and without the two-phase protocol, the UI would show them in the wrong position relative to tool results.
EventStore
File: src/event-store.ts
JSONL-based persistence — one file per session.
- Append-only with per-session serialization
- Reads are synchronous (only during resume)
- `readActive(sessionId)`: Events after the last `compact_marker`
- `copySessionFrom(source, target)`: Copies active events + appends `fork_marker`
Event Converter
File: src/event-converter.ts
walkEventsToMessages() converts JSONL events back into provider message arrays for resume, using EventConverterCallbacks so each provider formats differently while sharing traversal logic. Old tool names in JSONL are mapped to current names via TOOL_NAME_ALIASES.
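The alias step amounts to a lookup with passthrough. A sketch with made-up alias entries (the real table is `TOOL_NAME_ALIASES`; these names are invented for illustration):

```typescript
// Hypothetical alias entries; the actual mappings live in TOOL_NAME_ALIASES.
const ALIAS_TABLE: Record<string, string> = {
  run_shell: "bash",
  grep: "search",
};

function canonicalToolName(name: string): string {
  // Old names recorded in JSONL map to their current equivalents;
  // everything else passes through unchanged.
  return ALIAS_TABLE[name] ?? name;
}
```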
Task Tree
File: src/task-tracker.ts
The task tree is a JSON structure persisted to tree.json. The TaskTracker class manages it. For user-facing lifecycle, see Core Concepts.
TaskNode
```typescript
interface TaskNode {
  id: string;                  // ULID
  title: string;
  description: string;
  status: TaskStatus;          // draft|pending|in_progress|passed|failed|closed
  branch: string | null;       // e.g. "mxd/01KMAB1234ABCDEF/task-a"
  parentId: string | null;
  children: string[];          // ordered list of child IDs
  worktreePath: string | null;
  costUsd: number;
  budgetUsd?: number;
  editedBy: "user" | "agent";
  persistent: false | "reset" | "continue";
  color?: string;
  createdAt: string;
  updatedAt: string;
  session?: TaskSession;       // RUNTIME-ONLY — not persisted
}
```
The `session` field holds the `MessageQueue`, `cwd`, `backgroundProcesses`, and `foregroundExecutions` for a running agent. It's stripped during `save()` and `undefined` on `load()`.
Persistent Tasks
Tasks can be marked as persistent via the persistent field (false | "reset" | "continue"). Persistent task definitions are stored in .mxd/tasks/<id>.json (git-tracked), separate from the task tree's tree.json.
When a persistent task is closed:
- Status resets to `pending` (not `closed`), so the task runs again in the next cycle.
- `"reset"`: Session JSONL is deleted — the agent starts fresh each cycle.
- `"continue"`: Session JSONL is kept — the agent resumes with its full conversation history.
Only the root orchestrator can create persistent tasks. This is used for recurring quality agents that run periodically (e.g., code quality audits, test coverage checks).
Short ID Matching
tracker.get(nodeId) supports prefix matching (minimum 8 characters), letting agents reference tasks with shortened IDs to save tokens.
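A sketch of the rule, assuming a simple linear scan (the actual `tracker.get()` implementation may differ):

```typescript
// Hypothetical sketch of short-ID lookup: prefixes of 8+ characters,
// and only unambiguous matches resolve. Not the real TaskTracker code.
function findByPrefix(ids: string[], query: string): string | undefined {
  if (query.length < 8) return undefined; // too short to trust
  const matches = ids.filter((id) => id.startsWith(query));
  return matches.length === 1 ? matches[0] : undefined; // ambiguous → no match
}
```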
Tool Architecture
Files: src/tools/definitions.ts (built-in), src/orchestrator-tools.ts (orchestrator), src/tool-definition.ts (type)
Tool Definition
```typescript
interface ToolDefinition<T = Record<string, unknown>> {
  name: string;
  description: string;
  inputSchema: Record<string, ZodType>;  // Zod → JSON Schema for API
  handler: (args: T, extra?: { toolCallId?: string }) => Promise<InternalToolResult>;
}
```
Tools are namespaced as `mcp__<server>__<name>` (e.g., `mcp__mxd__bash`).
Built-in Tools
createBuiltinTools() in src/tools/definitions.ts:
- `bash` — Shell commands with CWD tracking and background process support
- `background` — Manage background processes (list, status, kill, await)
- `read_file` — Files with line numbers, offset/limit, image support
- `write_file` — Create files with auto-directory creation
- `edit_file` — String replacement with exact match
- `list_files` — Glob-based file discovery
- `search` — Regex search with multiple output modes
Orchestrator Tools
createOrchestratorTools() in src/orchestrator-tools.ts:
- Task management: `create_task`, `update_task`, `delete_task`, `reset_task`, `close_task`, `reorder_tasks`
- Communication: `send_message`, `yield`, `done`, `clarify`
- Observation: `get_tree`, `get_task`
- Cross-project: `list_projects`, `send_message_to_project`
- Context: `fork_task_context`
Plus external MCP tools connected via McpClientManager (src/mcp-client.ts).
Single Execution Path
Every tool goes through executeTool() in src/provider-shared.ts:
```typescript
async function executeTool(
  toolName: string,
  input: Record<string, unknown>,
  mcpHandlers: Map<string, ToolDefinition<any>>,
  toolCallId?: string,
): Promise<ToolExecResult>
```
One handler map, one lookup, one call. No special cases.
Task Operations
File: src/task-operations.ts
Task mutations (create, update, close, delete, reset, reorder) are implemented as 6 shared functions in task-operations.ts. Both the MCP tool handlers (agent-facing) and REST API routes (user-facing) are thin wrappers that call the same shared functions.
The behavioral difference between agent and user actions is controlled by an editedBy field ("agent" | "user"), not separate code paths. For example, parent chain notification (informing an ancestor that a task was modified) only fires when editedBy: "user" — agent edits don't trigger notifications since the agent already knows what it did.
MCP Namespace Constants
File: src/tool-names.ts
Tool names are defined as constants in tool-names.ts rather than hardcoded strings. This ensures consistency between tool definitions, handlers, and tests. Similarly, QueueMessage factories centralize message construction, and web/api.ts provides a URL builder for API endpoints.
Context Forking
Files: src/orchestrator-tools.ts (tool), src/event-store.ts (event copy), src/events.ts (fork_marker event)
Context forking lets one agent inherit another agent's full conversation history. It follows Unix fork() semantics: after the fork, the parent and child receive different tool_result messages — the parent sees "You are the PARENT," the child sees "You are the CHILD." This is how each agent knows its identity after the fork.
How It Works
- Agent A calls `fork_task_context(sourceTaskId, targetTaskId)`
- `EventStore.copySessionFrom()` copies all active events (after the last `compact_marker`) from the source session to the target session
- A `fork_marker` event is appended to the target's JSONL, containing the source task ID and target task metadata
- Any orphaned tool_calls in the copied events get synthetic `tool_result`s so the message structure is clean
- Agent A receives: "fork_task_context completed. You are the PARENT. Forked source → target. Use send_message to start the child agent."
- When the child agent starts, it replays its JSONL and sees the full conversation history plus the fork marker, and its tool_result says: "This tool was executed by the parent agent. You are the CHILD."
Multi-Layer Forks
When a forked agent forks again (A → B → C), the child sees multiple fork_marker events in its JSONL. The rule: the LAST fork_marker defines identity. Everything before the last marker is background knowledge from upstream agents; the agent's own task description and working directory come from after the final marker.
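The rule can be sketched as a reverse scan (simplified event shape, illustrative only):

```typescript
// Sketch of the "last fork_marker wins" identity rule.
type ReplayEvent = { type: string; sourceTaskId?: string };

function identityMarker(events: ReplayEvent[]): ReplayEvent | undefined {
  // Scan from the end: the LAST fork_marker defines the agent's identity;
  // earlier markers are background knowledge from upstream forks.
  for (let i = events.length - 1; i >= 0; i--) {
    if (events[i].type === "fork_marker") return events[i];
  }
  return undefined;
}
```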
Fork Sources
Forks aren't limited to self-forking. An agent can fork from:
- Itself → child gets the parent's current session knowledge
- A closed/passed task → child inherits that task's exploration and discoveries
- A sibling task → child builds on a peer's work
Context Compaction
File: src/compaction.ts
When conversations exceed the context window, compaction compresses them into a structured checkpoint. For user-facing explanation, see Core Concepts.
Threshold
```typescript
const COMPACT_BUFFER_RATIO = 0.17; // reserve ~17% as buffer
const compressThreshold = contextWindow * (1 - COMPACT_BUFFER_RATIO);
const lazyCountThreshold = compressThreshold - 16_000;
```
Anthropic (with exact token counting) does a cheap estimate first, then calls `countTokens` only if close. OpenAI relies on the estimate alone.
Compaction Flow
```
1. Token count exceeds threshold (or manual /compact)
   │
2. Inject summarization instruction as user message
   │
3. Model generates checkpoint with <summary>…</summary> tags
   containing 7 sections:
   ├── User Requests (chronological timeline)
   ├── Current Phase
   ├── Completed Work
   ├── Task Tree State
   ├── Key Insights & Rejected Approaches
   ├── Key Context
   └── Pending Work
   │
4. extractCheckpoint() pulls text from <summary> tags
   Appends system context: working directory + resume instructions
   │
5. buildCompactedContext() combines:
   ├── Fresh memory.md (re-read from disk)
   └── Checkpoint text
   │
6. Conversation replaced with single user message
   │
7. compact_marker event persisted to JSONL
   └── readActive() skips everything before the marker on resume
```
Key detail: memory.md is re-read from disk after compaction — the agent may have updated it during the session, and compaction is the mechanism to get fresh institutional knowledge into the compressed context.
Memory Internals
File: .mxd/memory.md in each git worktree
For user-facing explanation, see Core Concepts.
Loading
Memory.md is read from disk and included in the first message header at agent launch. After compaction, it's re-read and included in the rebuilt context. The header format:
```
Working directory: /path/to/worktree

# .mxd/memory.md (Preloaded, do not read again)
<contents of memory.md>
```
This header is always how context enters the conversation — no special code paths for fresh start vs resume vs post-compaction. The `prepareAgentMessage()` function in src/daemon/agent-lifecycle.ts constructs it.
Merging Through Git
Each agent has its own copy in its worktree. When branches merge, memory merges through git. The append-only rule prevents agents from modifying inherited entries. Higher-level agents curate the merged result — trimming trivial notes, consolidating related entries, floating important knowledge to the top.
Configuration System
File: src/config.ts
Three-layer configuration with lower layers overriding higher:
```
Global (~/.mxd/config.json)
  └── Repo (.mxd/config.json)                        ← committed to git
        └── Local (~/.mxd/projects/<id>/config.json) ← per-project
```
Config Shape
```typescript
interface MatrixConfig {
  authGroups?: Record<string, AuthGroup>;       // provider credentials
  defaultAuth?: string;                         // auth group name for root agent
  model?: string;                               // default: "claude-sonnet-4-6"
  childAuth?: string;                           // auth group name for child agents
  childModel?: string;                          // model for child agents
  budgetUsd?: number;                           // per-task cost limit
  maxDepth?: number;                            // max task tree depth
  clarifyTimeoutMs?: number;                    // auto-resolve clarify after timeout
  mcpServers?: Record<string, McpServerConfig>; // external tool servers
  port?: number;                                // default: 7433
  sessionKeep?: number;                         // session JSONL retention count
  selfBootstrap?: boolean;                      // internal: self-development mode
  auth?: WebAuthnConfig;                        // web UI auth settings (legacy name; actual auth is RSA-OAEP)
}

interface AuthGroup {
  provider: "anthropic" | "openai";
  anthropicApiKey?: string;
  claudeOauthToken?: string;
  openaiApiKey?: string;
  openaiBaseUrl?: string;                       // for OpenAI-compatible APIs
}

interface McpServerConfig {
  command: string;                              // executable to run
  args?: string[];                              // command-line arguments
  env?: Record<string, string>;                 // environment variables
}
```
Resolution
resolveConfig(global, repo, local) merges layers:
- Scalars: Local overrides repo overrides global (highest priority layer wins)
- mcpServers / authGroups / auth: Union merge; same-named entries use the highest priority layer's value
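Under the merge rules above, a reduced two-field sketch (not the actual `resolveConfig()`) looks like:

```typescript
// Minimal sketch of three-layer resolution: one scalar, one map.
type LayerCfg = { model?: string; mcpServers?: Record<string, { command: string }> };

function resolveSketch(global: LayerCfg, repo: LayerCfg, local: LayerCfg): LayerCfg {
  return {
    // Scalar: highest-priority layer that defines it wins (local > repo > global).
    model: local.model ?? repo.model ?? global.model,
    // Map: union merge; same-named entries take the higher layer's value.
    mcpServers: { ...global.mcpServers, ...repo.mcpServers, ...local.mcpServers },
  };
}
```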
Agent Lifecycle
File: src/daemon/agent-lifecycle.ts
Launch Sequence
```
1. ensureRootNode() — create or reuse root TaskNode
2. Create MessageQueue
3. Load persisted messages → enqueue
4. createAgentContext():
   ├── resolveProjectConfig()
   ├── getProjectProvider()
   ├── McpClientManager.connectAll()
   ├── createOrchestratorTools()
   └── createBuiltinTools()
5. Read active events from EventStore (for resume)
6. Fix orphaned tool_calls
7. Create TaskSession → attach to root node
8. provider.startSession() → { events: AsyncGenerator, stop() }
9. consumeAgentEvents() drives the generator (fire-and-forget)
10. Enqueue first user message with header (memory + working directory)
```
Auto-Resume on Restart
On daemon restart, every in_progress agent is evaluated independently based on its JSONL state — root and children alike. There is no "mark children failed, resume root" cascade. Each agent is assessed on its own:
- Yielding (last event is a pending `yield` tool_call) → Resume with provider loop bypass. Zero API calls — the agent goes straight to `queue.wait()` and only wakes when a message arrives. This is the cheapest possible resume.
- Interrupted (has orphaned non-yield tool_calls) → Write synthetic `tool_result` events for orphaned calls, then normal resume with JSONL replay.
- Done (status is `passed`, `failed`, or `closed`) → Skip, already finished.
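The classification can be sketched as a pure function (illustrative; the real decision reads each session's JSONL in src/daemon/agent-lifecycle.ts):

```typescript
// Sketch of the per-agent restart decision.
type ResumeMode = "skip" | "queue_wait" | "replay";

function classifyResume(
  status: "in_progress" | "passed" | "failed" | "closed",
  pendingToolCall: string | null, // name of an unresolved tool_call, if any
): ResumeMode {
  if (status !== "in_progress") return "skip"; // already finished
  if (pendingToolCall === "yield") return "queue_wait"; // zero-API resume
  return "replay"; // synthesize tool_results for orphans, then replay JSONL
}
```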
This independent assessment means a tree of 10 agents doesn't trigger 10 API calls on restart. Only the agents that were mid-execution resume actively; yielding agents sleep until needed.
Stop Cascade
stopAgent() performs a full cascade: signal abort → close queues → cleanup background processes → save tree → write synthetic tool_results for orphaned calls → emit agent_stopped.
Child tasks are stopped using real interrupt: the child's message queue is closed and its abort signal is fired, triggering immediate termination of the in-flight API call. This is a true cascade — not a fake text message telling the agent to stop — ensuring all descendants terminate promptly.
Critically, children stay in_progress — they are not force-failed. They were interrupted, not broken. On the next daemon restart, autoResume will detect them from their JSONL state and resume each one independently. This avoids the old pattern where stopping the root agent would mark all children as failed, requiring manual recovery.
Child Agent Lifecycle
Child agents launch via runChildAgentInBackground():
- Compute depth via parent chain
- Create MessageQueue, attach TaskSession
- `createAgentContext()` scoped to child's worktree
- `runChildCore()` drives the generator
- Post-completion: update cost, check budget, update status, notify parent
Parent notification is crash-safe: all child→parent send_message calls go through deliverMessage(), which persists to the JSONL before enqueuing. If the daemon crashes between the child sending a message and the parent consuming it, the message survives on disk and is recovered on restart.
Parent notification has two paths:
- `done()` called: The `done` tool handler delivers to the parent via `deliverMessage()`
- Crash/budget exceeded: Fallback via `findParentQueue()` — walks up the tree to the nearest running ancestor
Message Queue
File: src/message-queue.ts
A simple async queue for inter-agent communication.
Message Types
```typescript
type QueueMessage =
  | { source: "user"; … }
  | { source: "tree_change"; … }
  | { source: "task_complete"; taskId; success; output; … }
  | { source: "task_message"; fromTaskId; content; … }
  | { source: "clarify_response"; answer; … }
  | { source: "user_message_forwarded"; … }
  | { source: "cross_project"; fromProjectId; content; … }
  | { source: "background_complete"; … }
  | { source: "compact" }
```
Delivery
deliverMessage() in agent-lifecycle.ts is the single delivery path:
- Try direct: If `session.queue` exists and is open → `enqueue()`
- Persist to disk: Write to the `pending-messages/` directory
- Auto-launch: For child nodes, create the worktree and launch the agent. Persisted messages load on startup.
Queue = cache, disk = durable storage.
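The chain can be sketched as follows (illustrative; the auto-launch step is omitted, and the real `deliverMessage()` has more state to consult):

```typescript
// Sketch of the delivery fallback: live queue first, durable disk second.
type LiveQueue = { open: boolean; items: string[] };

function deliverSketch(
  queue: LiveQueue | null,
  persistToPendingDir: (msg: string) => void,
  msg: string,
): "direct" | "persisted" {
  if (queue?.open) {
    queue.items.push(msg); // fast path: agent is live, enqueue directly
    return "direct";
  }
  persistToPendingDir(msg); // durable path: loaded on next agent startup
  return "persisted";
}
```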
Worktree Manager
File: src/worktree-manager.ts
Configurable Base Branch
The root task node stores the project's base branch at initialization time. All worktree creation uses this stored baseBranch — the system prompt and agent tooling are branch-agnostic (no hardcoded main or master assumption). This means Matrix works correctly with any branch naming convention.
Branch Naming
`mxd/<full-taskId>/<slugified-title>`
Worktree Creation
```
1. git worktree add -b <branch> <path> <baseBranch>
2. git config --worktree core.hooksPath /dev/null   ← disable hooks
3. Run .mxd/hooks/setup_worktree.sh                 ← install deps, etc.
```
Hooks are disabled per-worktree because child agents must not trigger the parent project's pre-commit hooks.
Setup Hook
The setup hook (.mxd/hooks/setup_worktree.sh) is required — worktree creation fails if it's missing. It handles environment setup that new worktrees need: installing dependencies, copying .env files, running build steps.
On mxd init, Matrix creates a .mxd/hooks/setup_worktree.sh.example file with auto-detected content (detects bun/npm/yarn from lockfiles). This .example file is committed to git but is not the active hook — it's a template. The user must:
- Review the `.example` file
- Customize it for their project
- Save it as `setup_worktree.sh` (without `.example`) and make it executable
This deliberate step prevents auto-generated hooks from silently doing the wrong thing. If the hook installs wrong dependencies or skips an env file, every sub task fails on startup — an expensive mistake when running many agents in parallel.
Daemon
File: src/daemon.ts
The daemon is a Hono HTTP server (default port 7433).
Core Context
```typescript
interface DaemonContext {
  config: DaemonConfig;
  pm: ProjectManager;
  trackers: Map<string, TaskTracker>;
  sseClients: Set<SSEClient>;
  activeSessions: Map<string, AgentSession>;
  pendingClarifications: Map<string, PendingClarification[]>;
  eventStores: Map<string, EventStore>;
  // ...
}
```
Route Groups
- Tasks (routes/tasks.ts): CRUD on task nodes, the unified message endpoint (`POST /tasks/:nodeId/message`), per-task stop/fork/events
- Agent (routes/agent.ts): project-level agent status, stop, compact, restart, clarify, session management, background process control
- Projects (routes/projects.ts): register/deregister, project events and clarifications
- Config (routes/config.ts): read/write configuration at global, repo, and local layers
- SSE (routes/sse.ts): real-time event stream
- Auth (routes/auth.ts): RSA-OAEP challenge-response authentication
All agent interaction goes through the unified POST /tasks/:nodeId/message endpoint — the same path handles starting a new agent, sending a message to a running one, or resuming a stopped one.
Startup
```
1. createApp() → build context, register routes
2. Load projects and config
3. Start Bun.serve()
4. Register SIGTERM/SIGINT handlers
5. runEventMigrations()
6. autoResumeProjects()
7. markReady()
```
Design Principles
These principles guide contributor decisions. They aren't arbitrary — each one addresses the two core problems: hallucination (AI says things that aren't true) and architectural tunnel vision (AI extends rather than rethinks). For the user-facing motivation, see Why Matrix.
Cache Invariant: All State Is a Cache of Disk
Kill the daemon at any point. Restart. Everything resumes. The task tree rebuilds from tree.json. Conversations rebuild from JSONL event files. Running agents are detected and reconnected. Nothing lives only in memory.
This combats hallucination at the infrastructure level — disk state is the objective truth that agents resume from, not their potentially confused in-memory representation. It's also a development enabler — when agents modify the system itself (self-bootstrapping), the daemon restarts frequently. Cheap restarts require disk as source of truth.
Single Path Principle
One code path for each operation. No fallbacks, no dual implementations.
- `executeTool()` is the ONE path for all tool execution
- `emitEvent()` is the ONE path for all event emission
- `runProviderLoop()` is the ONE loop for both providers
- task-operations.ts has the ONE set of shared functions for task mutations — MCP tools and REST routes are thin wrappers
- event-display.ts is the ONE source for tool display formatting
Fallbacks mask bugs and amplify tunnel vision. If path A fails and path B succeeds, you never fix path A. Worse: mental model residue — when old and new systems coexist, agents interpret the new as a variant of the old, leading to wrong reasoning. This is one of the laziness patterns that single-path design prevents.
Event Sourcing
Every state change is an event. Events persist to JSONL. Events broadcast via SSE. The JSONL file IS the conversation — not a log of it, but the source of truth that gets replayed on resume.
Methodology Injection
Every agent receives the same ~400-line system prompt covering: worker workflow, git discipline, code quality, debugging protocol, orchestration philosophy, memory system, and communication patterns. Strategy goes in the system prompt (WHEN and WHY), mechanics go in tool descriptions (HOW). This separation means you can change workflow without touching tools, and vice versa.
Key system prompt principles:
- Task descriptions include WHY — agents get motivation context, not just instructions. Without WHY, agents hesitate at edge cases and make conservative choices.
- Ask when uncertain, never silently fall back — wrong guesses waste more time than questions. Agents are explicitly instructed to `send_message(requestReply=true)` rather than silently making a conservative choice.
- Incremental merge — workers commit early and often; orchestrators merge individual commits without waiting for `done()`.
- `close_task` rejects in-progress tasks — you cannot close a task that's still running, preventing accidental resource cleanup.
- Branch-agnostic — no hardcoded branch names; the system works with whatever branch the project uses.
Ownership Framing
"The task above" and "sub task" — not "parent agent" and "child agent." send_message is the same tool for both directions. Communication is coordination between peers with different scopes. This framing encourages agents to take initiative — explore the codebase, make decisions, ask questions — rather than waiting for instructions. See Why Matrix for the user-facing explanation.
AI-Friendly Technology Choices
Every technology choice optimizes for: how fast does the AI get feedback when it makes a mistake? This is the hallucination countermeasure applied at the tooling level — faster feedback means shorter hallucination windows.
- TypeScript strict mode — type errors at compile time, not runtime
- Bun — fast test runner, fast startup (milliseconds vs seconds)
- Biome — single tool for lint + format, one config file
- Event-driven architecture — modules are independent, debuggable in isolation
- Pure functions — testable with simple input/output assertions, no mocks
Anti-patterns: no magic frameworks (convention-over-configuration helps humans, not AI), no obscure libraries (hallucinated APIs are hard to diagnose), no heavy configuration (AI wastes disproportionate time on config files).
Testing Quality Principles
These principles implement the test-is-golden philosophy at the engineering level:
- Mutation resistance — a test that can't catch code mutations is worthless. If you can change the implementation and the test still passes, the test proves nothing. This is test mutation applied as a development discipline. Tests must assert on behavior that matters, not incidental structure.
- Coverage realism — test through real lifecycle paths, not isolated function mocks. A test that calls the actual provider loop with a mock API catches more bugs than a test that mocks every layer boundary.
- Expect failures — if you never see a test fail during development, something is wrong. Tests should be written or modified before the implementation that makes them pass. A test you've never seen fail might not be testing what you think.
Agent Laziness Patterns
AI agents exhibit predictable anti-patterns that undermine code quality. Understanding them is essential because the system prompt is specifically designed to counteract each one. These patterns are symptoms of architectural tunnel vision — the agent's tendency to extend rather than rethink.
1. Fear of Large Changes
Agents gravitate toward minimal patches. Asked to refactor a module, they'll modify the surface — rename a variable, add a wrapper — while leaving the underlying problem intact. This produces code that looks updated but hasn't actually improved.
Countermeasure: The system prompt explicitly encourages agents to make the changes the task requires, not the smallest possible diff. Combined with disposable architecture, agents are trained to understand that large, correct changes are preferable to small, incomplete ones.
2. Unnecessary Fallbacks
When fixing a bug or adding a feature, agents often add "just in case" code paths — try/catch blocks that swallow errors, fallback values that mask failures, compatibility shims that preserve broken behavior. This directly violates the single path principle: fallbacks mask bugs and create code that appears to work while silently doing the wrong thing.
Countermeasure: The system prompt instructs agents to prefer single code paths and avoid defensive programming for scenarios that can't happen. Tests catch real failures; fallbacks hide them.
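A hypothetical before/after sketch of the single-path principle (not Matrix code), using config parsing as the example:

```typescript
// Anti-pattern: swallow the error and return a "just in case" fallback.
// Malformed input now looks like a working default configuration, and
// no test can tell the difference from the outside.
function parseConfigWithFallback(json: string): { model: string } {
  try {
    return JSON.parse(json);
  } catch {
    return { model: "default" }; // masks the real failure
  }
}

// Single path: malformed input throws, so the bug surfaces immediately
// and a test can catch it.
function parseConfig(json: string): { model: string } {
  return JSON.parse(json);
}
```

The fallback version never fails, which is exactly the problem: a caller passing garbage gets a silently wrong answer instead of a stack trace.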
3. Not Communicating Proactively
When agents get stuck or uncertain, they tend to work silently — guessing at an approach rather than asking for clarification. This wastes tokens on wrong approaches and produces code based on incorrect assumptions.
Countermeasure: The system prompt emphasizes that asking is cheap, rework is expensive. Agents are instructed to use send_message and clarify proactively rather than guessing.
4. Not Questioning Architecture
Given existing code, agents default to extending the current patterns — even when the patterns are the problem. This is tunnel vision in action: the agent sees the code as given rather than as something it can question.
Countermeasure: Architecture mutation gives agents a framework for evaluating whether the current architecture is sound. The test-is-golden philosophy explicitly tells agents that architecture is disposable — if there's a simpler way to pass the tests, propose it.
5. "Unification" That Adds a Third Path
When asked to merge two approaches, agents often create a "unified" version that coexists alongside the originals — three paths instead of one. The old code isn't removed, the new code wraps it, and complexity increases.
Countermeasure: The system prompt explicitly warns against naming things "unified," "improved," or "new" — these names signal that the old version still exists. Real unification means one path replaces two.
Pattern Recognition
These patterns compound. An agent that fears large changes (1) adds a fallback instead of fixing the root cause (2), doesn't ask whether the approach is right (3, 4), and creates a "new unified handler" that wraps the old one (5). The result: five layers of indirection where one simple implementation would suffice. Recognizing the pattern is the first step to breaking it.
Testing Infrastructure
Matrix's test suite is built on custom infrastructure that turns the test framework into a contract enforcer — tests as physical reality applied to the system that builds other systems. Rather than mocking at layer boundaries, tests drive the full agent lifecycle through a mock API that validates protocol invariants automatically.
Mock Instruction DSL
File: src/test-utils/
Tests describe mock API conversations using a JSON-based instruction DSL. Each "turn" specifies what the mock API should return and optionally what it should assert about the request:
```typescript
const turns: Turn[] = [
  {
    // Assert the agent's request contains these tool results
    assert: { toolResults: ["file contents..."] },
    // Respond with this tool call
    toolCalls: [{ name: "mcp__mxd__bash", input: { command: "npm test" } }],
  },
  {
    // Assert the bash output was fed back
    assert: { toolResults: [/Tests: \d+ passed/] },
    // Respond with done
    toolCalls: [{ name: "mcp__mxd__done", input: { status: "passed", summary: "All tests pass" } }],
  },
];
```

Each turn is a complete request-response cycle: the mock validates the incoming request against `assert` blocks, then returns the specified response. This makes tests readable as conversations — you can see what the agent sends and what it receives at each step.
Turns support variable capture — extracting values from requests to use in later assertions or responses. Per-conversation turn queues allow multiple agents to have independent instruction sequences in the same test.
Prefix Consistency Validation
On every API call, the mock automatically validates that the new conversation is a strict prefix extension of the previous call's conversation. In other words, the mock checks that all previous messages are still present and unchanged, with new messages only appended at the end.
If this check fails, it means one of two things:
- A JSONL rebuild bug — the conversation was reconstructed incorrectly after a restart
- A cache miss — the provider's prompt cache won't hit because the prefix changed
Both are serious bugs. By validating prefix consistency on every API call, the mock catches conversation corruption that would otherwise surface as mysterious cache misses or subtle behavioral changes in production.
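The check itself is simple. This is a minimal sketch assuming a simplified message shape; the mock's actual implementation may differ:

```typescript
// Simplified message shape for illustration.
interface Msg {
  role: "user" | "assistant";
  content: string;
}

// The new conversation must keep every previous message unchanged, in
// order, with new messages appended only at the end.
function isPrefixExtension(prev: Msg[], next: Msg[]): boolean {
  if (next.length < prev.length) return false;
  return prev.every(
    (m, i) => m.role === next[i].role && m.content === next[i].content,
  );
}
```

Any edit to an earlier message, or any dropped message, fails the check, which is precisely the class of bug a JSONL rebuild or compaction error would introduce.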
ValidatingMockAPI
The ValidatingMockAPI wraps the instruction DSL with protocol-level validation that runs on every API call:
- Turn alternation — messages must alternate between user and assistant roles
- Tool use / tool result pairing — every `tool_use` block must have a corresponding `tool_result`, and vice versa
- No duplicate messages — the same message cannot appear twice in a conversation
- Prefix consistency — as described above
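Two of these checks can be sketched under assumed, simplified shapes (the real ValidatingMockAPI is more thorough):

```typescript
// Simplified message shape for illustration only.
interface MockMsg {
  role: "user" | "assistant";
  toolUseIds?: string[];
  toolResultIds?: string[];
}

// Turn alternation: roles must strictly alternate.
function checkAlternation(msgs: MockMsg[]): boolean {
  return msgs.every((m, i) => i === 0 || m.role !== msgs[i - 1].role);
}

// Tool pairing: every tool_use id needs a matching tool_result, and
// every tool_result must answer a known tool_use.
function checkToolPairing(msgs: MockMsg[]): boolean {
  const uses = new Set(msgs.flatMap((m) => m.toolUseIds ?? []));
  const results = new Set(msgs.flatMap((m) => m.toolResultIds ?? []));
  return (
    Array.from(uses).every((id) => results.has(id)) &&
    Array.from(results).every((id) => uses.has(id))
  );
}
```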
Test Framework as Contract Enforcer
These validations run automatically on every test that uses the mock API. A test author doesn't need to write assertions for protocol correctness — the infrastructure enforces it. This means every test is also a protocol compliance test, even if its explicit assertions are about something else entirely.
What's Not Implemented Yet
Security Sandbox
DANGER
No file system sandbox, no network restrictions, no command allowlist. Agents have full system access. Acceptable for local development; must be solved for hosted deployment.
Cost Controls
Basic per-task `budgetUsd` exists (warnings at 80%, stop at 100%). Missing: per-tree budgets, loop detection, idle detection.
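The existing thresholds amount to something like the following sketch (`checkBudget` is a hypothetical name, not a function in the Matrix codebase):

```typescript
type BudgetStatus = "ok" | "warn" | "stop";

// Hypothetical illustration of the described per-task budget thresholds.
function checkBudget(costUsd: number, budgetUsd: number): BudgetStatus {
  if (costUsd >= budgetUsd) return "stop";       // stop at 100%
  if (costUsd >= 0.8 * budgetUsd) return "warn"; // warn at 80%
  return "ok";
}
```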
Failure Defense
- `findOrphanedToolCalls()` for interrupted executions
- Missing: infinite loop detection, branch drift detection, automatic conflict resolution