We were talking to Opus 4.7 twelve days ago
When we said "claude-opus-4-6", we were sometimes talking to Opus 4.7. We can prove it bit-for-bit.
Twelve days ago we wrote about an unexplained +29% token inflation on our traffic. Between two consecutive calls in the same session, input tokens jumped from 220,712 to 284,800 with no content change. We ended the post with three open hypotheses: hidden prompt injection, a tokenizer change, or an account-level A/B flag. We couldn't decide which.
Today, 2026-04-16, Anthropic released Claude Opus 4.7. Its migration guide specifies:
"Claude Opus 4.7 uses a new tokenizer... This new tokenizer may use roughly 1x to 1.35x as many tokens when processing text compared to previous models."
The +29% we observed on April 4 (284,800 / 220,712 ≈ 1.29x) falls inside that range, toward its upper end. But correlation isn't proof. We have Matrix's JSONL, where every request body we ever sent is recorded byte-for-byte, event-by-event, so we could do better than correlation. We could reproduce the exact request from that moment and tokenize it today.
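As a quick sanity check on that arithmetic, here is a minimal sketch. The two token counts come from the April 4 post and the bounds come from the quoted migration guide; the variable names are purely illustrative.

```python
# Input token counts reported for the two consecutive calls on April 4.
before = 220_712
after = 284_800

# Multiplicative inflation between the two calls.
ratio = after / before
inflation_pct = (ratio - 1) * 100

# Range quoted from the Opus 4.7 migration guide ("roughly 1x to 1.35x").
low, high = 1.0, 1.35
in_range = low <= ratio <= high

print(f"extra tokens: {after - before}")
print(f"ratio: {ratio:.4f} ({inflation_pct:+.1f}%)")
print(f"inside quoted range: {in_range}")
```

A ratio inside the quoted range is only consistent with a tokenizer change; it does not distinguish that hypothesis from the other two on its own.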
The experiment
We cut our JSONL at the two specific moments from the April 4 post: right before the 220K call (reqA) and right before the 284K call (reqB). We checked out the Matrix commit that was running our daemon at the time of those calls, and fed each cut JSONL through that commit's serialization code to rebuild the exact request body that was sent. Then we called today's count_tokens API on each body, once with model: "claude-opus-4-6" and once with "claude-opus-4-7".
| Request | Historical reported | Today, opus-4-6 | Today, opus-4-7 |
|---|---|---|---|
| reqA (20:54:03 PT) | 220,712 | 220,712 ← bit-exact | 284,471 |
| reqB (21:03:32 PT) | 284,800 | 220,967 | 284,800 ← bit-exact |
The historical numbers reproduce to the token. reqA's 220,712 matches what opus-4-6 returns today. reqB's 284,800 matches what opus-4-7 returns today.
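The comparison step can be sketched roughly as follows. This is not Matrix's replay code: the function names are hypothetical, the API key is assumed to live in the `ANTHROPIC_API_KEY` environment variable, and the rebuilt body is assumed to already contain only fields that the count_tokens endpoint accepts (for example `messages`, `system`, `tools`). The only thing varied between calls is the `model` field.

```python
import json
import os
import urllib.request

COUNT_TOKENS_URL = "https://api.anthropic.com/v1/messages/count_tokens"


def count_tokens_http(body: dict) -> int:
    """POST a request body to count_tokens and return its input_tokens."""
    req = urllib.request.Request(
        COUNT_TOKENS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "x-api-key": os.environ["ANTHROPIC_API_KEY"],  # assumed env var
            "anthropic-version": "2023-06-01",
            "content-type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["input_tokens"]


def count_under_models(body: dict, models: list, count=count_tokens_http) -> dict:
    """Count the same body once per model id, changing only the "model" field."""
    return {m: count({**body, "model": m}) for m in models}


def matching_models(reported: int, counts: dict) -> list:
    """Return the model ids whose count today equals the historical count."""
    return [m for m, n in counts.items() if n == reported]
```

With a rebuilt body loaded into `body`, calling `count_under_models(body, ["claude-opus-4-6", "claude-opus-4-7"])` and then `matching_models(284800, counts)` would show which tokenizer, if any, reproduces the historical number. The `count` parameter is injectable so the comparison logic can be exercised without network access.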
Nine minutes apart, same session, essentially identical bodies: one was counted by the 4.6 tokenizer, the other by the 4.7 tokenizer. Twelve days before Opus 4.7 was publicly announced.
We strongly suspect that at 21:03:32 PT on April 4, our request, still declared as "model": "claude-opus-4-6" in the request body and still echoed back as "model": "claude-opus-4-6" in the response, was actually being served by Opus 4.7.
It wasn't continuous, and it isn't something you can notice by looking
We don't believe this was a total hot-swap of all opus-4-6 traffic. Our own data shows that most of our calls in the following days still behaved like 4.6. The inflation and the cache misses we observed were intermittent: some sessions, some calls, apparently some accounts. The two-account experiment in our April 4 post showed one OAuth account receiving +67K phantom tokens on an identical body while the other showed zero. Same body, same tokenizer input, different backend routing.
This is what makes the issue invisible to regular users. Unless you are counting your own request bytes and comparing to the API's reported usage, there is no signal. response.model continues to report the model you asked for. Billing (as far as we can see) reports normally. The only externally observable fingerprint is that identical text gets tokenized to different counts at different moments, and unless you are keeping a JSONL and actively checking, that fingerprint washes out in day-to-day usage.
We're not making claims about which accounts were affected, how routing decisions were made, or how the billing was handled. We don't have the data for any of that. What we have is two specific calls, nine minutes apart, whose input token counts are each a bit-exact match to two different model tokenizers, and a dozen days between those two moments and the public Opus 4.7 release.
About the replay
Everything that makes this experiment possible is the same property that makes Matrix's cache engineering work: our JSONL preserves the exact byte sequence of every request. There is no information loss, no summarization, no lossy serialization. Check out the commit that was running when the request was made, feed the same JSONL through it, and you get the same bytes out.
In our case, there was one small complication: between April 4 and today, we changed the format of one event type in the JSONL (compacted-session checkpoints were stored one way, then rewritten to a second format in a later migration). The April 4 walker didn't recognize the new format on read. A 10-line patch to make it recognize both formats restored the original byte sequence. After that, the captured request body was bit-exact identical to what was sent on April 4, and count_tokens confirmed it.
What you can do
If you run long-lived API traffic and want to check whether your token counts are consistent over time: send a stable prefix to /v1/messages/count_tokens, record the result, and repeat at intervals. If the count changes on an identical body, something on the server side has changed. That's the signal we used here, just applied after the fact rather than continuously.
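One way to run that check continuously is sketched below, under stated assumptions. The log file layout, the function names, and the `count` callable are all hypothetical; `count` stands for whatever function you use to POST a body to /v1/messages/count_tokens and return its `input_tokens`. The sketch fingerprints the body so that a changed count is only flagged when the body itself is unchanged.

```python
import hashlib
import json
import time


def body_fingerprint(body: dict) -> str:
    """SHA-256 of a canonical JSON encoding, so identical bodies hash identically."""
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def last_count(log_path: str, fingerprint: str):
    """Return the most recent logged count for this fingerprint, or None."""
    last = None
    try:
        with open(log_path, encoding="utf-8") as f:
            for line in f:
                rec = json.loads(line)
                if rec["body_sha256"] == fingerprint:
                    last = rec["input_tokens"]
    except FileNotFoundError:
        pass  # first run: nothing logged yet
    return last


def record_and_compare(log_path: str, body: dict, count) -> bool:
    """Count the body, append the observation, and report whether it drifted.

    Returns True only if a previous observation exists for the same body
    and its token count differs from the current one.
    """
    fp = body_fingerprint(body)
    previous = last_count(log_path, fp)
    current = count(body)
    with open(log_path, "a", encoding="utf-8") as f:
        record = {"ts": time.time(), "body_sha256": fp, "input_tokens": current}
        f.write(json.dumps(record) + "\n")
    return previous is not None and previous != current
```

Calling `record_and_compare` on a schedule with a fixed body gives an append-only history of counts, and a `True` return marks the first observation after a server-side change. Note that the fingerprint is taken over a canonical re-encoding of the body, which is a simplification compared with preserving the exact request bytes.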
The claim
We declared claude-opus-4-6 in our request. On April 4 at 21:03:32 PT, our request body was counted by what appears to be the Opus 4.7 tokenizer: a model that wasn't publicly announced for another twelve days. response.model reported "claude-opus-4-6" throughout. The only reason we noticed is that we had kept a faithful record of our request bytes and could check them against today's count_tokens endpoint.
We suspect we were being served by Opus 4.7 intermittently for those twelve days. We can't say how often, or on which specific calls, beyond this one. That is as much as we can say with confidence.