Skip to content

Core Concepts

This page explains what you get when you use Matrix — the features and workflows from a user's perspective. For internal implementation details, see Architecture.


Agent Basics

An AI agent is a Large Language Model (LLM) running in a loop with access to tools. You give it a goal, and it autonomously observes, reasons, acts, and repeats until the goal is achieved.

Observe → Think → Act → Check Result → Done?
  ↑                                      │
  └──────────── No ──────────────────────┘
                                    Yes → Done

This is different from a chatbot. A chatbot is a single request-response. An agent keeps going autonomously — observing, thinking, acting — until it reaches its goal or gets stuck.

Tools

LLMs can only generate text. Tools bridge text generation to real-world actions. The LLM generates a structured "tool call" (e.g., bash("npm test")), the system executes it, and the output is fed back to the LLM for the next decision.

Tools aren't just convenience — they're how AI gets grounded in reality. Every tool call produces an objective result that the AI cannot hallucinate: a test passes or fails, a file exists or doesn't, a command succeeds or errors. This is what makes autonomous AI programming possible — tools are physical reality for AI.

Matrix gives agents the same tools a human developer uses:

CategoryToolsPurpose
File operationsread_file, write_file, edit_file, list_files, searchRead and modify code
Commandsbash, backgroundRun tests, build, install deps, manage background processes
Task managementcreate_task, update_task, get_tree, get_task, doneManage the task tree
Communicationsend_message, yield, clarifyCoordinate with other agents and users
Organizationcreate_folder, delete_folder, rename_folderVisual grouping in the task tree
Cross-projectlist_projects, send_message_to_projectTalk to agents in other projects
Lifecyclefork_task_context, close_task, reset_task, delete_task, reorder_tasksManage agent lifecycle

Plus any external tools connected via MCP (Model Context Protocol) servers.


Task Tree

When you give Matrix a goal, it doesn't just hand it to a single agent. Complex work gets decomposed into a task tree — a hierarchy of tasks that agents work on in parallel. Each worker runs tests independently in its own worktree, so the test feedback loop operates at every level of the tree simultaneously.

Add auth system
├── JWT middleware      (branch A)
├── Login endpoint     (branch B)
└── Auth tests         (branch C)

Here's how it works:

  1. A root orchestrator receives your goal and reads the codebase to plan an approach.
  2. It creates sub-tasks, each assigned to a separate agent on its own git branch.
  3. Workers run in parallel — no waiting for one to finish before starting the next.
  4. When a worker finishes, the orchestrator reviews and merges its branch.
  5. If a worker fails, the orchestrator can retry with new instructions or restructure the approach.

Recursive Decomposition

The task tree is genuinely recursive — any agent can create sub-agents. A worker assigned "build the authentication system" can decide it's too complex and become a sub-orchestrator, spawning its own workers for JWT middleware, login endpoints, and session management. There's no hard limit on nesting depth.

Root
├── Auth
│   ├── JWT
│   ├── Login
│   └── Sessions
├── Payment
│   ├── Stripe
│   └── Webhooks
└── Tests

Task Lifecycle

Tasks flow through these statuses:

draft → pending → in_progress ─┬→ verify → closed
                                │     ↑       ↑
                                └→ failed ─────┘

                                (retry → in_progress)
StatusMeaning
draftIdea captured, not ready for execution. Cannot be started.
pendingReady to execute, waiting to be started.
in_progressAgent is actively working.
verifyAgent called done("passed") — work is complete, awaiting review and merge by the task above. Note: the agent calls done("passed") but the node status becomes verify, not passed.
failedAgent called done("failed") or was interrupted. Can be retried with send_message (preserves context) or reset_task (fresh start).
closedBranch merged, worktree cleaned up via close_task. Node preserved in tree for history. Can be reactivated with send_message.

The done() tool uses a two-phase commit: Phase 1 (agent-side) validates that no children have active sessions and emits the tool call. Phase 2 (daemon-side) updates the node status, broadcasts the tree change, and notifies the parent via deliverMessage. If the daemon crashes between phases, Phase 2 is recovered on restart from the JSONL — the done tool_call is replayed and the status update completes.

Draft Tasks

Draft tasks are how intention gets captured in real time. Any moment — during a coding session, while reviewing output, between meetings — you send a one-line message and a draft is created. No format, no approval, no ceremony.

Drafts can't be executed (their status is draft). They're notes in the task tree — lightweight anchors for ideas that haven't formed yet. When an idea matures, promote it to pending and it becomes a real task. If it doesn't, it stays or gets deleted. Either way, nothing was lost.

Traditional spec-driven workflows require you to define all requirements upfront in a formal document. But requirements emerge during development, not before it. Draft tasks let you capture intention as it surfaces, from usage, from frustration, from "wait, this should work differently." The task itself is the intention artifact. No separate spec to drift.

Incremental Merge

The orchestrator doesn't have to wait for a worker to call done() before using its work. Workers are encouraged to commit early and commit often, sending progress updates via send_message. The orchestrator can merge individual commits from a worker's branch at any time — cherry-picking completed pieces while the worker continues on the rest.

This matters for large tasks. Instead of an all-or-nothing merge at the end, work flows upward continuously. If a worker completes 3 out of 5 sub-features and then fails on the 4th, the orchestrator already has the first 3 merged. It can assign the remaining work to a new agent without losing progress.

Failure Handling

When a worker agent fails:

  1. The worker explains why — not a stack trace, but a description written by an agent that understands the problem.
  2. The orchestrator decides what to do — resume with new instructions, reset for a fresh approach, or restructure the task tree.
  3. Context is preserved — a resumed worker keeps its full conversation history. No cold start.

Folder Nodes

As a task tree grows, visual organization becomes important. Folder nodes provide grouping without adding lifecycle complexity. A folder has only an id, title, parentId, and children — no status, no branch, no session, no cost.

Root
├── 📁 Backend
│   ├── JWT middleware       (task)
│   ├── Login endpoint       (task)
│   └── Session management   (task)
├── 📁 Frontend
│   ├── Auth page            (task)
│   └── Dashboard            (task)
└── Integration tests        (task)

Folders are transparent to ownership. When the system determines which task owns a node (for message routing, permission checks, or budget tracking), it walks up the tree and skips folders — getTaskAbove() returns the nearest task ancestor, not the nearest node. Similarly, getTasksBelow() collects all descendant tasks through any number of nested folders. Tasks inside a folder belong to the nearest task ancestor above the folder, not to the folder itself.

Manage folders with three tools: create_folder, delete_folder (must be empty), and rename_folder.

Discussion Mode

Not every interaction is a command. When a user talks directly to an agent — asking questions, discussing approaches, giving feedback — the agent should yield() to wait for the next message, not call done(). Calling done() during a conversation ends the agent's session; the next message requires a costly restart with context rebuild.

The agent holds two things simultaneously: (1) the original task it owes done() to, and (2) the live conversation with the user. When the discussion settles and the task is complete, then the agent calls done() with a summary covering both the work and the discussion.

User Message Forwarding

When a user sends a message to any agent in the tree, every ancestor up to root receives a user_message_forwarded CC notification. This is automatic — no configuration, no subscriptions.

Root sees all user interactions across the entire tree in real time: corrections, approvals, redirections, questions. This makes the task tree an information router — the user's decisions are visible at every level that matters, without drowning higher-level agents in full context.


Git Worktree Isolation

When multiple agents work in parallel, they need isolation. If two agents edit the same file at the same time in the same directory, they'll overwrite each other's changes.

Each agent gets its own git worktree — a separate directory linked to the same repository, with its own branch:

Repository (shared)
├── .worktrees/
│   ├── task-A-jwt-middleware/     ← Agent 1's worktree (branch A)
│   ├── task-B-login-endpoint/    ← Agent 2's worktree (branch B)
│   └── task-C-auth-tests/        ← Agent 3's worktree (branch C)
└── main working tree/            ← Orchestrator's worktree (main)

Agents read and write freely without conflicting with each other. When done, branches merge back — just like developers using feature branches. No custom sync protocol needed — just git.

The base branch is configurable — it's stored on the root task node at project initialization. Worktrees are created from this base branch, and the system prompt is branch-agnostic (no hardcoded main assumption).


Memory System

.mxd/memory.md is a file in your project that persists institutional knowledge across agent sessions.

Agents write to memory during their work — pitfalls discovered, API quirks, architectural decisions, patterns that worked. When branches merge, memory merges through git. Higher-level agents curate the merged result, floating important knowledge to the top and trimming the noise.

If the test suite defines what the software does, memory explains why — why a particular approach was chosen, why an alternative was rejected, what the pitfalls are. Together, tests and memory form the project's complete institutional knowledge: one executable, one narrative.

Why This Matters

  • Survives across sessions: Stop and restart — agents load memory on startup.
  • Survives compaction: When conversations get compressed, memory stays intact on disk and gets re-read.
  • Grows with the project: Every task adds new discoveries. Over time, memory becomes your project's accumulated wisdom.
  • Natural selection: Sub-agents write freely. Parent agents curate at merge. By the time knowledge reaches the main branch, it's been filtered through multiple levels — the same way institutional knowledge works in human organizations.

Analogy

Memory is the team wiki that agents actually read and update. Not documentation that goes stale — living knowledge maintained by the agents who use it every day.


Context Compaction

Long sessions fill up the context window. When it gets too large, Matrix compresses the conversation into a structured checkpoint. The agent loses the exact wording of early messages but retains the essential knowledge to continue.

Most tools generate a summary: "what happened." Matrix generates a narrative: "what we learned and why."

The checkpoint has 8 sections:

  1. Story So Far — not a timeline of events, but a narrative of how understanding evolved. Why decisions were made, what was tried and rejected, how the approach changed. The journey is the knowledge.
  2. Current Phase — what's happening right now
  3. Completed Work — what's done, with commit references
  4. Tree Mental Model — what the agent thinks about each active task, not just the tree data
  5. Rejected Approaches & Lessons — explicitly preserved so the agent doesn't repeat past mistakes
  6. Key Context — environment, accounts, file locations
  7. Pending Work — what still needs to happen
  8. User Messages Reference — verbatim or close paraphrase of everything the user said

Section 8 is critical. The user's exact words are the highest-signal content in any session. Other tools summarize them away. Matrix preserves them because user decisions are what give agents conviction to follow through instead of hedging.

The re-compaction rule: strengthen the narrative, don't drop it. Once is an incident, twice is a pattern, three times is architecture. Each compaction distills decisions, it doesn't destroy them. The 800K session becomes a 20K checkpoint that preserves the reasoning chain while dropping the mechanical tool calls. The mechanical work is in git. The thinking is in the checkpoint.

This connects directly to the real gap: every task's last checkpoint IS a decision artifact.

After compaction, memory.md is re-read from disk, so any updates the agent made during the session are included in the fresh context.


Cross-Project Communication

This is the feature that sets Matrix apart. Most AI coding tools are project-scoped — your API server and your frontend are separate universes. Matrix connects them.

One Daemon, All Projects

A single Matrix daemon (port 7433) manages every project on your machine. Register projects with mxd init, and the daemon tracks them all.

bash
mxd init /path/to/api-server
mxd init /path/to/web-frontend
mxd init /path/to/shared-library

# All three projects, one daemon, one UI
# Target any project from anywhere with -p
mxd send -p api-server "Add rate limiting to all endpoints"

Every project has its own task tree, agents, and memory. But they all live under the same roof.

Talking Across Projects

An agent in one project can message an agent in another:

send_message_to_project(projectId, "What's the API endpoint format for user authentication?")

The message arrives in the other project's orchestrator queue. If no agent is running, one is automatically launched to respond. This isn't a request-response API — it's a conversation. The receiving agent can ask follow-ups, consult its own memory, or relay the question to its sub-tasks.

What This Enables

  • Coordinated releases: An agent in one project triggers dependent updates across downstream projects.
  • Knowledge sharing: Instead of maintaining API docs that go stale, the API project's agent IS the documentation. Other projects ask it directly.
  • Multi-repo refactoring: Rename a concept across your entire codebase — not just one repository, but every repository that touches it.
  • Cross-project awareness: When your shared library deprecates a function, downstream projects' agents start adapting their code in parallel.

The Dashboard

The web UI at localhost:7433 uses a tab-based layout (VSCode-style preview/pin tabs). Click a task in the sidebar to open a preview tab; double-click or start editing to pin it. Each tab shows one task's full story — activity log, description, or tool output. A view mode switcher toggles between activity (live streaming conversation) and description (the task's goal and context). The sidebar is collapsible and resizable, showing the task tree hierarchy with status colors.

Switch between projects seamlessly. Watch agents across your entire codebase working simultaneously — the API server's agents writing endpoints while the frontend's agents build components that consume them.


Fork Context

When a new agent spawns for a sub-task, it normally starts cold — no knowledge of what the parent agent discovered. Context forking solves this.

fork_task_context copies the parent agent's full conversation history into the child's session. The child starts with everything the parent knows: files read, patterns discovered, decisions made. It has its own identity and task, but it doesn't waste time re-reading files the parent already explored.

This is especially powerful combined with compaction — a parent that has worked on a long task can fork its accumulated knowledge to a new child, even after multiple compaction cycles.


What It Looks Like in Practice

Simple Task

bash
mxd send "Fix the bug in user authentication"
# Agent works autonomously — analyzes code, makes changes, runs tests, commits

Complex Multi-Task

bash
mxd send "Refactor the payment module to use Stripe instead of PayPal"
# The orchestrator will:
# 1. Analyze the codebase
# 2. Create sub-tasks (remove PayPal, add Stripe types, implement API, update tests)
# 3. Spawn worker agents in parallel on separate git worktrees
# 4. Merge results and run the full test suite

Interactive Collaboration

bash
mxd send "Build the new dashboard page"
# ... agent starts working ...
mxd send "Use Recharts instead of Chart.js, and make all charts responsive"
# Agent receives the message and adjusts its approach

Cross-Project Coordination

bash
# In the API library project — make breaking changes
mxd send "Release v3: rename getUserById to getUser, change to camelCase"

# In each frontend project — migrate
mxd send -p web-frontend "Migrate from api-lib v2 to v3 — ask the api-lib project for details"
# Frontend agents use send_message_to_project to get migration details
# from the agent that actually made the changes

For how these features are built internally, see Architecture.For why Matrix takes this approach, see Why Matrix.Ready to set up? See Getting Started.

Released under the MIT License.