Skip to content

Why Matrix

One developer. Team-quality code. Projects you're proud to scale.


The Lever

Every AI model, at every capability level, can be amplified by the right leverage.

Today's models have specific weaknesses. Tomorrow's models will be stronger, but they'll be aimed at bigger problems, and those problems will have their own gaps. The cycle doesn't end. Smarter models don't need less scaffolding; they need different scaffolding for bigger ambitions.

The industry calls these systems "harnesses" or "scaffolding," and acknowledges they have an expiration date. Model companies say this explicitly: with each new release, re-examine the scaffolding, remove what's no longer needed. Every new capability they ship obsoletes a batch of third-party projects.

So every line of Matrix code might become unnecessary someday. We accept this. But here's the bet:

The ability to design leverage doesn't expire.

If next-generation AI can build a complete project from a simple description, we don't retire. We hand AI even larger challenges. Running a company, leading research, exploring possibilities we haven't imagined. Those will need new leverage.

This is the universal principle. Now let's zoom to the present.


The Problem Everyone Feels

Most developers have started feeling it. The code AI writes works, but maintaining it hurts. Sometimes more than writing it yourself would have. Refactoring collapses. Copy-paste multiplies. Code written last week gets rewritten this week. Projects that started fast hit a wall when every change touches everything.

The community feels it too. OpenClaw's AI-heavy review produced bloated, repetitive code. Claude Code ships cache bugs that burn through entire session quotas in 90 minutes. Cursor agents work in parallel but can't share what they learn. We clearly lack better scaffolding for AI code quality.

Our goal: let one person build a well-architected, well-tested, extensible project at the speed of a team.

Not a prototype that merely runs. Not a demo that falls apart when you add a feature. A real project you'd be proud to maintain and grow.

The Real Gap

Every AI coding tool today shares one assumption: the only thing that matters is the final code. Issue in, PR out, context gone. The agent finishes, the session ends, the knowledge evaporates. Next task starts from zero.

It's not that these tools have no memory. CLAUDE.md exists. Memory files exist. But look at what they capture: "we use PostgreSQL," "API is REST," "tests use Jest." They compress every decision into a snapshot of what the project is. The process that led there is gone. Why PostgreSQL and not SQLite? Was GraphQL tried? Why was it abandoned? Nobody knows. The memory tells you where you are, not how you got here.

And it's not that these tools can't resume sessions. Most tools have session files sit on disk. But they're just timestamped files collecting dust. You'd have to remember "which session discussed the API design? Last Tuesday's? Wednesday's?" and manually resume it.

This is the real loss. Not the facts, but the reasoning. Not the sessions, but the ability to find and reuse them. A new team member who reads "we use PostgreSQL" makes different decisions than one who reads "we tried SQLite, it broke under concurrent writes during the load test in task #47, and the user decided PostgreSQL was worth the ops overhead." The first has a fact. The second has judgment.

There are tools with agent trees, worktrees, inter-agent messaging. Impressive features. But the design philosophy is the same: produce the code, deliver the result, compress what you learned into a few lines of memory, move on.

Most people recognized the first layer of this problem: AI hallucinates, so ground it in reality. Tests, CI, type checking.

Reframe

All hallucination is the norm. Producing output that matches reality is the exception. This isn't a bug to fix. It's the fundamental nature of a system that learned language without ever touching the world.

Fewer people recognized the second layer: AI doesn't know what's right, only what's asked. Some tried to address it with decision records, structured reasoning frameworks, spec-driven development. The instinct is correct.

Reframe

Code is not the asset. The decisions that shaped it are. Code can be rewritten from good decisions. Decisions cannot be reconstructed from code. Every tool that protects the code but discards the reasoning is guarding the shadow and losing the substance.

We think we found the right way to preserve decisions. Not in a spec document that drifts. Not in markdown files that collect dust. In the task itself.

What Matrix Is

You send messages to tasks, not to agents. This is the fundamental difference.

In every other tool, you talk to an agent. The agent runs, finishes, and the conversation is gone. You can resume a session if you remember which one it was. But the agent is the thing you interact with, and agents are ephemeral.

In Matrix, you talk to a task. The task is permanent. It has a name, a position in the tree, a description. Agents come and go inside it, sessions start and end, but the task and every decision made within it persist forever. Send a message to a task that's been closed for weeks, and a new agent wakes up inside it with the full history of everything that happened before.

This requires building the entire agent loop from scratch.

Task tree. Tasks form a recursive tree. Any task can create sub-tasks. Agents communicate through the tree: sub-tasks report to tasks above, tasks above merge their sub task's work. When you talk to any agent, the entire parent chain gets a one-line notification. One keystroke, multiplied across every active agent in the tree. The tree is the project's org chart, information router, and decision archive all at once.

This matters because of how information flows. Your messages to agents are usually short, a sentence or two, but they carry the most important signal: corrections, approvals, direction changes. These are the decisions. The tree forwards only your words upward, not the 500K context around them. Each parent already knows what it assigned to its children. Root receives a stream of your decisions from across 10+ active agents without entering any of their sessions. Decisions flow upward. Context stays local. One inbox, the full picture.

From-zero agent loop. Matrix doesn't wrap Claude Code or Codex. It talks directly to provider APIs, runs its own tool execution, manages its own session format. You can't make tasks first class, you can't freeze sessions for cache, you can't fork accumulated context, you can't switch providers, you can't do full lifecycle management if you're wrapping someone else's CLI. The wrapper tools in this space can orchestrate agents in parallel, but they can't change what happens inside each agent's session. Matrix can, because it owns the entire loop.

Cache as infrastructure. Tool definitions are frozen once at session start. Resume replays the same bytes. Fork transfers the frozen config. Three agents forked from the same parent share one cache prefix. Three simultaneous 800K context windows cost less than one uncached session elsewhere. The 1M context window is only affordable with near-perfect cache, and near-perfect cache requires owning the agent loop.

First-class fork. Inspired by Unix fork(). Fork copies one agent's full conversation history into a completely different task, potentially in a completely different part of the tree. A task that spent hours exploring a database design? Fork it into the migration task. The new agent knows why PostgreSQL was chosen, what was tried, what failed. It starts at depth 5 instead of depth 0.

You might expect identity confusion when an agent wakes up in a completely different role with someone else's memories. You'd be right. Our first fork marker told the agent "YOU ARE NOT THE AGENT ABOVE" in all caps. They resisted, clung to their pre-fork identity, refused to accept the new assignment. We changed the framing to "you've been reassigned to a new role" and they immediately understood. We even redesigned the system prompt so every agent in Matrix receives the exact same one. The first line: your role depends on your position in the tree. Fork doesn't create a new agent. It reassigns an existing one.

The Cache Moat

Everyone says the 1M context window is expensive. Most people compact aggressively to stay small, run one session at a time, anxious about cost. With the right cache engineering, pre-filled 800K sessions are still affordable. Fill it with knowledge, freeze it, fork it. The bigger the shared prefix, the more you save.

How? Matrix stores sessions in a provider-agnostic event format, not raw message arrays. Each provider adapter reconstructs the exact same API request every time. This makes cache engineering possible at the infrastructure level.

  • Frozen tools and system prompt. Tool definitions and the system prompt are frozen once at session start and persisted in the event stream. Resume replays the same bytes, same prefix, cache hit. There's no regeneration, no revalidation, no opportunity for drift.
  • Byte-identical reconstruction. Despite using our own session format, the output for any given provider is deterministic. Restart the process and the cache survives. The content sent live during a session and the content reconstructed from that session on resume produce the exact same prefix, byte for byte.
  • Fork prefix sharing. When you fork an agent's context to a new task, the frozen session config transfers: same tool definitions, same system prompt, same prefix bytes. Three agents forked from the same parent share one cache prefix. You pay for creation once; all three read from it.
  • 1-hour TTL. Root orchestrators set a 1-hour cache TTL. This costs 2x the write price but eliminates keepalive heartbeats and survives long operations. Forked sessions inherit the parent's TTL. You can configure all children to use 1h as well. It sounds expensive, but every one of these sessions has accumulated real knowledge worth preserving.

One tradeoff we accept: frozen tools mean a restarted session won't see newly added MCP servers until the next compaction. The provider's cache architecture puts tools at the very front of the prefix. Any change there invalidates everything after it. We chose stability over freshness. But this is a workaround, not a fundamental limitation. Models can call tools that aren't in the tools parameter at all. Our hidden introspection tool proves this: agents call it from system prompt knowledge alone. A future path: send full tools only on the first call to create the cache, then inject only tool diffs in messages for subsequent calls. Tool changes would no longer invalidate the entire prefix.

Real Numbers from a Real Session

503M total input tokens processed. 21 tasks completed across 13 agent sessions. 3 sessions grew to 800K tokens, sharing cache prefixes forked from a shared 650K root. 8 more children created from scratch for scoped work. 1,000+ API calls. 55 daemon restarts, including restarts where the agents' own code and system prompt had been modified by merged commits. The frozen session config preserved the original tools and prompt, byte-identical reconstruction held, zero cache miss. 3 hours of work, 30% of a Max20 5-hour quota.

This isn't a feature checkbox. It's a fundamental architectural advantage that compounds with every session, every fork, every restart. The 1M context window is only affordable with near-perfect cache, and near-perfect cache requires building the entire agent loop around it.

Cache is a knife-edge dance

Near-perfect cache hit is not automatic. It's maintained through operational discipline. A single miss at 800K means full cache creation. The budget anxiety doesn't disappear, it shifts from "can I afford this session" to "is my cache still warm."

TTL matters. Matrix lets you configure cache TTL: by default, 1-hour for root orchestrators (long-lived, high value), 5-minute for child tasks (short-lived, cheap). 1h costs 2x write price but survives long operations.

Fork cache is first-call only. When you fork a 650K session into two sessions, both initially share the 650K prefix cache. After they diverge, each session's cache is independent. If one goes idle past TTL while the other stays active, the active one does NOT keep the idle one's cache warm. Cache is per-shared-prefix only when you keep both sessions within TTL.

Don't idle. A session that goes idle beyond its TTL loses its entire cache. Coming back means full cache creation. At 900K tokens that's a significant cost.

Our best session achieved near-perfect cache hit with continuous activity, no account switching, and fork timing aligned with session lifecycle. The ceiling is high, but reaching it requires understanding how prefix caching actually works.


Self-Bootstrapping

Matrix develops itself using itself.

The system prompt, tool definitions, compaction logic, memory system, all refined by agents running on the system they were refining. Bug fixes go through the same orchestrate → decompose → parallel execute → merge flow that any user project would.

Case Study: The Persistent Tasks Experiment

Matrix agents were given a bold architectural bet: "persistent tasks," tasks that survive across daemon restarts, run periodically, with dual-source sync, special done() semantics, and a third role in the system prompt. The design was clean on paper. We deployed it on our own project.

Within 27 hours, the results were clear:

  • Knowledge fragmentation. Persistent tasks accumulated stale context that couldn't be compacted cleanly. Agents made decisions based on outdated state.
  • Cache hostility. The third system prompt role broke prefix caching. Every persistent task restart paid full prompt cost instead of hitting the cache.
  • Corporate disease. Routing layers emerged to handle the persistent/non-persistent distinction, coordination overhead that added complexity without value.
  • Conditional pollution. Every codepath needed if (persistent) branches. The feature was metastasizing through the codebase.

The decision: delete entirely. -1,940 lines across 44 files. The result? Zero test failures. Every use case persistent tasks were designed for (periodic work, long-lived coordination, resumable projects) was already covered by existing features: regular tasks, fork context, and the wake-on-message pattern.

The lesson isn't "we made a mistake." The lesson is that the cost of trying was low, the cost of reverting was near zero, and we knew within 27 hours. That's what maintainability looks like. Not avoiding wrong turns, but making wrong turns cheap. Explore boldly, catch problems fast, and trust the test suite to tell you what still works.

That task node is still in the tree today. You can see "Delete persistent task feature entirely" in the screenshot on our homepage. Any new agent can fork its context and inherit the full story of why persistent tasks died. That's what "closed tasks are wealth" means in practice.


Who Matrix Is For

Matrix is for individual developers who want to scale up without scaling down on quality. Not just "generate a PR from this issue." A real engineering workflow that produces well-architected, well-tested, maintainable software.

You are the right user if:

  • You want to move fast AND have architectural standards, not one or the other.
  • Your tasks require decomposition, parallel work, and coordination.
  • You work across multiple interconnected projects and want them to communicate.
  • You want institutional memory across sessions, agents that remember what they learned.
  • You want to drive the process, making decisions across a tree of parallel work, not fire-and-forget.

You might prefer a simpler tool if:

  • Your tasks are well-defined and repetitive, a flat orchestrator might be more efficient.
  • You don't need multi-project coordination.
  • You prefer a managed service. Matrix runs locally.

Current Status

Matrix is functional and in daily use.

  • Comprehensive test suite (unit + integration), all passing
  • Supports Anthropic (Claude), OpenAI Responses API, and OpenAI-compatible providers
  • Self-bootstrapping: the system develops itself using itself daily

What's still in progress:

  • Security sandbox. Agents currently have full system access. Must be solved for hosted deployment.
  • Cost controls. Basic per-task budgets exist, but aggregate tree budgets and loop detection are not yet implemented.
  • Public release. Installation is currently from source.

Ready to try it? See Getting Started.Want to understand the core features? See Core Concepts.Curious about the internals? See Architecture.

Released under the MIT License.