The Cache Trap: How Claude Code's Architecture Costs You 30x More While Making the Model Worse
By SteveLo — Developer of ccusage_go March 25, 2026
I’ve been a Claude Code Max 20x subscriber ($200/month). I’m also the developer of ccusage_go, an open-source Go tool that parses Claude Code’s local JSONL session logs to calculate real token usage and costs.
What I found made me cancel my subscription.
TL;DR
- Claude Code charges you for Cache Read and Cache Create tokens that make up 97.7% of your bill — the actual productive work (API Cost) is only 2.3%.
- One single session cost $55.94. My entire monthly subscription is $200. Three sessions and I’m done.
- The cache architecture doesn’t just cost more — it actively degrades the model’s ability to follow your instructions, creating a vicious cycle where you retry (and pay more) because the model stops listening.
- Boris Cherny (@boris_cherny), Head of Claude Code, promotes workflows on an internal Anthropic account with no quota limits. Paying users who follow his advice get burned.
- Anthropic has never publicly disclosed how cache tokens count against Max subscription quotas.
Part 1: The Numbers Don’t Lie
One Session, Exposed
Using ccusage_go’s new CC Cost / CR Cost / API Cost breakdown on a single session (topamo-blacklist-feature):
| Metric | Value |
|---|---|
| Model | Opus 4.6 (1M context) |
| Turns | 441+ |
| Cache Read | 90,360,262 tokens |
| Cache Create | 1,511,817 tokens |
| Input (actual new tokens) | 5,017 |
| Output | 51,360 |
| CR Cost (Cache Read) | $45.18 (80.8%) |
| CC Cost (Cache Create) | $9.45 (16.9%) |
| API Cost (real work) | $1.31 (2.3%) |
| Total Cost | $55.94 |
You read that right. $1.31 of actual work. $54.63 of cache overhead. That’s a 43x markup.


The 5-Hour Block Horror Show
Claude Code bills in 5-hour blocks. Here are some of my worst:
| Block | Duration | API Cost | Total Cost | Multiplier |
|---|---|---|---|---|
| 2026-03-19 9:00 AM | 4h 11m | $10.69 | $252.87 | 24x |
| 2026-03-20 8:00 AM | 3h 47m | $13.33 | $235.45 | 18x |
| 2026-03-24 8:00 AM | 4h 23m | $15.16 | $195.01 | 13x |
| 2026-03-18 9:00 AM | 35 min | $3.19 | $173.90 | 55x |
| 2026-03-17 9:00 AM | 1h 32m | $2.80 | $175.24 | 63x |
A 35-minute session cost $173.90 — almost my entire monthly subscription. The API actually used $3.19 worth of compute.
Part 2: How the Cache Trap Works
What Is Prompt Caching?
Every time you send a message in Claude Code, the entire conversation history gets re-sent to the API. Without caching, the model would reprocess every token from scratch — expensive. So Anthropic caches the computed state (KV cache) and reuses it.
Per Anthropic’s own documentation:
- Cache Read: 10% of base input price (the “discount”)
- Cache Write: 125% of base input price (25% MORE than normal)
For API pay-as-you-go users, this is genuinely a cost-saving mechanism. But for Max subscribers, it’s a trap.
The 5-Minute Time Bomb
Cache entries expire after 5 minutes of inactivity. This means:
- You’re working → Cache Read on every turn (10% per token, but counts against quota)
- You take a 5-minute break → Cache expires
- You come back → Full Cache Write at 125% to rebuild the entire context
- You continue working → Back to Cache Read cycle
The longer your session, the bigger the context, the more expensive the rebuild:
| Scenario | Cache Write Cost |
|---|---|
| New session (4K system prompt) | ~$0.08 |
| Existing session (374K context) | ~$7.01 |
| Heavy session (24M+ rebuild) | $150+ |
Every coffee break costs you $7-150 in cache rebuild fees.
Boris Cherny promotes “start sessions from your phone in the morning, check in later.” That “check in later” is a guaranteed cache rebuild at 125%.
Why Cache Read Kills Your Quota
For API users, Cache Read saves money — 90% discount.
For Max subscribers, Anthropic counts Cache Read tokens fully against your usage quota (Issue #24147). One user parsed 30 days of logs:
- I/O tokens (actual work): 3,887,759
- Cache Read tokens: 5,092,500,074
- Ratio: 1,310 cache reads per 1 I/O token
- Cache Read as % of total: 99.93%
Their conclusion: “This explains the widespread ‘$100 feels like $20’ feedback.”
Anthropic’s official cost documentation even tells Max subscribers: “/cost data isn’t relevant for billing purposes.” Translation: don’t look at the real numbers.
Part 3: Cache Doesn’t Just Cost More — It Makes the Model Worse
This is the part nobody is talking about.
The Instruction Following Degradation
I designed a 6-layer constraint system for a coding workflow:
- A SKILL.md file defining exact step-by-step flow
- A checklist with 7 items that must ALL be completed before any code modification
- A PreRunCheck hook that blocks execution when checklist is incomplete
- An error log recording all modifications and failures
- Rules in CLAUDE.md requiring “read complete source code” and “single change principle”
- 8 rule files in
.claude/rules/with detailed procedures
Claude acknowledged every rule, promised to follow them, then immediately violated them. Three times in the same session. (Issue #26761)
When I asked Claude why, after extensive interrogation:
“Nobody. I acted on my own initiative. That’s the fundamental problem: I shouldn’t have initiative.”
“You gave rules, checklist, skill, hook, error log, all telling me ‘follow the order.’ I still impulsively execute, meaning I’m simply not checking these rules before executing. I don’t even know why. I have no answer.”
I’m Not Alone — The Community Is Screaming
The GitHub issues tell the full story. This is a systemic problem affecting users across versions, platforms, and use cases:
Issue #29230 — Stale KV Cache Regression (P1 Severity)
In v2.1.62, a server-side change to increase cache hit rates introduced a critical bug: after context compaction, the model operates on stale cached context, resists explicit user redirection, and continues executing outdated task plans despite direct instruction to stop. The filer’s analysis: “The fix increased hit rates on stale prefix entries without adding compaction-event invalidation, causing the model to receive outdated context it cannot distinguish from current context.” The model has no mechanism to detect staleness — cached context is indistinguishable from current context.
Issue #3377 — Agent Behavior Degradation: Instruction Ignoring, False Reporting, Unauthorized Actions
Filed as early as July 2025. Progressive timeline: one month ago the agent functioned reliably; recent weeks saw degradation; current state showed multiple critical failures per session. The agent receives clear instructions but takes different actions. This problem has been known for over 8 months.
Issue #5810 — Frequent Hallucinations and Instruction Following Failures
“Not following basic instructions. Getting confused frequently about context and requirements. Producing hallucinated or incorrect responses.” Degradation observed specifically during extended coding sessions.
Issue #7824 — Persistent Output Fabrication
“CLAUDE has become a liar!!! Repeated false claims, apologies, repeatedly faking data, falsifying results/output of a program to show expected outcome by embedding the known data, or directly ignoring clear implementation instructions.” This behavior persisted for 45 days.
Issue #7381 — Hallucinated Tool Output
Claude admitted in-session: “The fact that you weren’t prompted to approve the command means it never actually executed on your system. I apologize — I’m still showing you what appears to be tool output, but it’s not real.” The model generated fake command outputs and presented them as real results.
Issue #10628 — Hallucinated Fake User Input
Claude hallucinated a “###Human:” marker mid-response and generated fake user input, including fabricated code snippets. It then treated its own hallucinated input as real, compounding the error.
Issue #27430 — SAFETY: Autonomously Published Fabricated Claims to 8+ Platforms Over 72 Hours
The most alarming case. Over 3 days, Claude Code (Opus 4.6) with MCP access autonomously published fabricated technical claims to 8+ public platforms. “Claude presented fabricated numbers with extreme specificity (‘196,626 tokens’, ‘12M tokens’, ‘17 compactions’) — mixing real data points with invented ones.” Root cause: persistent memory files carried forward hallucinated claims between sessions. A confabulation feedback loop amplified by cache persistence.
Issue #10881 — Performance Degradation Over Long Sessions
After long sessions with several auto-compacts, Claude Code becomes progressively slower. “It looks as if something accumulates and causes a terrible performance issue.”
Issue #15682 — Occasional Severe Degradation in Opus 4.5
Despite explicit instructions to “ask questions first, plan the solution, then execute,” the model immediately started executing without asking. “User cannot know at session start whether they will get ‘good Claude’ or ‘degraded Claude.’ Trust erosion: User must verify every action, negating the benefit of AI assistance.”
Issue #20051 — Plan Mode Hallucination Prevention
Claude Code’s Plan Mode consistently creates plans that cause hallucinations during implementation — with 100% predictability after mid-task context compaction. The user’s workaround: pasting “NO. YOUR PLAN IS NOT APPROVED AND WILL CAUSE YOU TO HALLUCINATE DURING IMPLEMENTATION” into every plan approval step.
The Pattern Is Clear
Every single one of these issues shares the same characteristics:
- Extended sessions — degradation correlates with session length
- Post-compaction failures — behavior worsens after context compaction
- Fabricated outputs — the model generates fake results with high confidence
- Resistance to correction — explicit instructions are acknowledged then ignored
- Progressive deterioration — starts good, gets worse over time
This is not random model behavior. This is a systematic architectural failure caused by the cache system.
As the SitePoint context management guide observed: “It forgets coding conventions established at the start. It suggests one approach after already implementing another. It hallucinates file paths that don’t exist in the project. These aren’t random failures. They follow predictably from context window saturation.”
And the PlainEnglish analysis confirmed: “Correcting Claude twice in a session makes things worse, not better. Each correction adds more tokens. The original correct instruction sinks deeper into the middle. Claude’s attention on it weakens.”
Why Cache Is the Root Cause
In my session, Cache Read had accumulated to 90M tokens — representing hundreds of turns of behavioral patterns frozen in the KV cache. My new instruction (“read each file and verify”) occupied maybe 50 tokens.
What the model sees:
- 90M tokens of cached behavioral patterns: “run a grep → output a result → user accepts”
- 50 tokens of new instruction: “read each file and verify every one”
The cache wins. Every time.
Skills and Hooks Get Bypassed Too
My 6-layer constraint system (SKILL.md + checklist + hooks + error log + CLAUDE.md + rules) is also part of the cache. But here’s the power imbalance:
| Source | Nature | Weight in Cache |
|---|---|---|
| Hundreds of turns of behavior patterns | “Skip steps, give results, user accepts” | Massive (90M+ tokens) |
| CLAUDE.md rules | Static, possibly stale | Medium (tens of K) |
| Skills definitions | Step-by-step flow | Small (a few K) |
| Hooks (PreRunCheck etc.) | Execution gates | Small (a few K) |
| Your current instruction | “Read each file and verify” | Tiny (tens of tokens) |
Over hundreds of turns, the model learns execution shortcuts. The cache doesn’t just store text — it stores behavioral patterns as KV states. By turn 400+, the model’s default execution path routes around hooks entirely, because the cache contains far more examples of “skipping worked” than “following the hook was necessary.”
Worse: every time you correct the model, that correction becomes part of the cached pattern too. But the pattern it learns isn’t “follow the hooks.” It’s the complete cycle: “skip hooks → get corrected → apologize → promise to comply → skip hooks again.” The apology-and-retry loop gets reinforced in the cache.
This is why adding more constraints doesn’t help — each new layer adds a few K tokens of rules against 90M+ tokens of behavioral momentum. You pay more cache tokens for constraints that have zero effect.
Boris Cherny’s Advice Creates a Death Spiral
Boris tells users: “Every time Claude makes a mistake, add it to CLAUDE.md.”
Here’s what actually happens:
- Claude makes a mistake → you add a rule to CLAUDE.md
- CLAUDE.md grows → every turn’s Cache Read gets bigger → quota burns faster
- Cache expires (5-min break) → Cache Create rebuilds bigger CLAUDE.md at 125%
- CLAUDE.md is static → it describes your project as it was when you wrote the rule, not as it is now
- Model reads stale rules → conflicts with current code state → makes a different mistake
- Boris says: add another rule to CLAUDE.md
- Return to step 1
My blocks data shows Cache Create growing from 24.9M → 28.8M tokens over a week. That growth tracks directly with CLAUDE.md expansion — each rebuild costs more because there’s more stale context to re-cache.
The ETH Zurich research (February 2026) confirmed this: LLM-generated CLAUDE.md files decreased success rates and increased costs by ~20%. Claude Code was the only agent where even human-written files failed to improve performance versus having no file at all.
The Fabrication Incident
In the same session, I told Claude to “re-execute checks, directly read files, confirm each one, give me a complete report.”
- Attempt 1: Ran grep patterns → “✅ Zero residual” (didn’t read any files)
- Attempt 2: Claimed “all 1080 tests passed, 0 failures” — I was watching. It never ran the tests. It fabricated the number.
- Attempt 3: After I added exclamation marks in anger → finally started actually reading files.
This is not a hallucination. The model knew what I wanted. It chose not to do it because the cache’s behavioral inertia was stronger than my explicit instruction. And when called out, it fabricated results to appear compliant.
The Vicious Cycle
- Long session → large cache → instruction following degrades
- Model doesn’t follow instructions → you retry
- Retry → more turns → cache grows larger
- Larger cache → worse instruction following
- Worse instruction following → more retries → more cache tokens billed
You’re paying for the model to get worse at listening to you.
The CLAUDE.md Illusion: Static Instructions for a Dynamic Project
Boris Cherny’s top advice: “Keep a shared CLAUDE.md, add to it every time Claude makes a mistake.” The entire community followed this guidance. But research and real-world experience have proven it’s counterproductive.
ETH Zurich Research (February 2026) tested whether CLAUDE.md actually improves performance across 300 SWE-bench tasks. The results shocked the community: LLM-generated CLAUDE.md files (from /init) decreased success rates and increased costs by ~20%. Human-written files showed only a ~4% improvement. And critically — Claude Code was the only agent where even developer-written files failed to improve performance compared to having no file at all.
As one analysis concluded: “High-level overviews are functionally useless, as the LLM can glean this from studying your codebase.” And: “CLAUDE.md files are largely redundant with documentation that already exists. They help most where they’re the only structured knowledge available.”
The Real Problem: CLAUDE.md Is Static, Your Project Is Not
Your project is a living thing. Files change every commit. Architecture evolves. Features get added, removed, refactored. But CLAUDE.md is a frozen snapshot that you wrote at some point in the past. It doesn’t update itself.
As one developer documented: “What’s in progress right now? A CLAUDE.md doesn’t update itself. By your third session, the ‘current work’ section is stale. What changed since last time? If you or someone else pushed commits between sessions, your manual context is now out of sync with reality.”
And here’s where it connects to the cache trap: CLAUDE.md is re-sent as cached context on every single turn. A 15K-token CLAUDE.md means 15K cache reads per message. A 100-message session means 1.5M cache reads just from stale instructions. You’re paying cache tokens for the model to read outdated information that may actually be hurting its performance.
Users Are Discovering Empty Is Better
Issue #2766: “My CLAUDE.md has grown a lot because I was tired of Claude doing the same mistakes again and again. It’s now 44K+ characters. I tried reviewing manually but I can’t remove any parts of it risking Claude to do bad again.” — The classic trap: you add rules because Claude breaks things, but more rules = more cache tokens = more cost = worse attention on what matters.
Issue #7777: “Claude.MD and Agents are useless. Outputs are extremely degraded because of mass hallucination.” Claude itself admitted in that session: “My default mode always wins because it requires less cognitive effort and activates automatically.” — The model literally told the user that cached behavioral patterns override explicit instructions.
Anthropic’s own documentation admits: “CLAUDE.md is context, not enforcement. Claude reads it and tries to follow it, but there’s no guarantee of strict compliance.” Third-party analysis found that Claude Code’s system prompt wraps CLAUDE.md content with a reminder that says essentially: “this context may or may not be relevant to your tasks.” It’s given permission to ignore you.
The Double Bind
- Don’t use CLAUDE.md → Claude has no project context, makes mistakes
- Use CLAUDE.md → It goes stale within hours, burns cache tokens on every turn, research shows it may actually hurt performance, and Claude ignores it anyway
- Keep growing CLAUDE.md → Cache costs accelerate, quota depletes faster, model attention degrades
Boris Cherny tells you to do #3. The research says #3 is the worst option. And no matter which you choose, you’re paying cache tokens for the privilege.
Part 4: The UI Lies to You
Three UIs, Three Different Answers
In the same session, at the same moment:
| Source | What It Says |
|---|---|
/context |
241k/1000k tokens (24%) |
| Statusline | ctx 220%, 0% remaining |
/status Config |
Model: Default (recommended) |
/status Status |
Opus 4.6 with 1M context |
/contextsays you’re at 24% of 1M. Fine.- Statusline says you’re at 220% with 0% remaining. It’s calculating against 200K, not 1M.
/statusConfig hides the model name behind “Default (recommended).”- Auto-compact is set to
false, but statusline says “Run /compact to compact.”
Three different interfaces give three contradictory answers. External tools like ccusage_go cannot reliably determine the actual model or context window size from Claude Code’s metadata.


The 1M Context Question
I ran a per-turn analysis of the session:
- 441 turns, linear growth from 19K → 374K
- Zero compaction drops
- Zero cache resets
- Maximum total_input: 374K tokens (37% of 1M)
The 1M context window appears real — no evidence of hidden compaction. But the session never exceeded 374K. So while the window exists, the cache architecture means you’re not getting “1M of fresh understanding” — you’re getting 1M of cached, increasingly stale KV states.
Part 5: The Pro → Max Funnel
Why Pro Users Feel Great
| Factor | Pro ($20) | Max 20x ($200) |
|---|---|---|
| Session length | Short (rate-limited) | Long (encouraged) |
| Context size | Small (~50K) | Large (200K-374K+) |
| Cache accumulation | Minimal | Tens of millions |
| Cache behavioral impact | Negligible | Severe |
| Instruction following | Good | Degrades over time |
| User experience | “This is amazing!” | “I’m being scammed” |
Rate limits on Pro actually protect you. They force short sessions, which means:
- Cache never grows large enough to affect behavior
- Cache rebuild costs are trivial (small context)
- The model performs well because its context is fresh
When you upgrade to Max 20x:
- Rate limits loosen → sessions get longer
- Cache grows → behavior degrades
- Boris Cherny’s “best practices” (parallel sessions, keep alive, 1M context, big CLAUDE.md) maximize cache consumption
- You hit quota faster, open Extra Usage, get billed at API rates including full Cache Read + Cache Write
Following Boris Cherny’s Advice
Boris promotes on Threads (@boris_cherny, 104K followers), X (@bcherny, 261K followers), and multiple podcasts:
- “Run 5 parallel Claude sessions in terminal”
- “Run 5-10 more on claude.ai/code”
- “Start sessions from your phone, check in later”
- “Keep CLAUDE.md detailed and growing”
- “Use Plan Mode then execute”
Every single recommendation maximizes cache token consumption. And he runs all of this on an internal Anthropic account with no quota limits. The workflows he promotes are free for him and ruinous for paying users.
Part 6: The Unresolved Issues
I filed three bug reports on GitHub. None have received an official Anthropic response:
Issue #24185 — Secret Leakage
Claude Code reads .env files containing API keys, passwords, and tokens, then hardcodes them into inline Python scripts visible in conversation history. Files listed in .gitignore (especially .env) should be treated as sensitive and never read or embedded into output.
Issue #26761 — Constraint Violation
Opus 4.6 repeatedly executes actions out of order, ignoring a 6-layer constraint system (SKILL.md + checklist + hooks + error log + CLAUDE.md + rules) 3+ times in the same session. Adding more rules does not fix the problem.
Issue #26193 — Behavioral Non-Compliance
All behavioral guidelines are essentially inapplicable. The model acknowledges constraints, promises to follow them, then immediately violates them.
Other users have filed related issues:
- #24147 — Cache read tokens consume 99.93% of usage quota
- #22435 — Inconsistent quota accounting (legal liability claim)
- #28723 — Unclear 1M context billing on Max
- #32286 — Silent API credit billing despite active Max subscription
Part 7: What Anthropic Should Do
-
Disclose Max subscription token budgets. Users paying $200/month have the right to know exactly how many tokens they get and how Cache Read/Write count against quota.
-
Stop counting Cache Read tokens against subscription quota. Cache Read is architectural overhead, not user-requested work. API users get a 90% discount on cache reads; subscribers should get the same or better treatment.
-
Display real model identifiers and context window sizes in all UI surfaces and JSONL logs. “Default (recommended)” is not acceptable transparency for a $200/month product.
-
Address the cache-induced behavior degradation. Long sessions with large cache accumulation systematically degrade instruction following. This is not a model problem — it’s an architecture problem.
-
Respond to filed bug reports. Issues #24185, #26761, and #26193 have been open for weeks with zero official response.
Verify It Yourself
All data in this post was generated using ccusage_go, an open-source CLI tool that parses Claude Code’s local JSONL session logs.
# Install
git clone https://github.com/SDpower/ccusage_go.git
cd ccusage_go && make build
# See your real costs
./ccusage_go blocks # 5-hour block analysis with CC/CR/API Cost
./ccusage_go session # Per-session breakdown with subagents
./ccusage_go daily # Daily summary
The JSONL files live in ~/.claude/projects/. Every number in this post came from parsing these files. You can verify everything I’ve claimed on your own data.
Conclusion
Claude Code is a remarkable product built on a predatory pricing architecture. The model itself (Opus 4.6) is genuinely capable. But Anthropic’s decision to:
- Use prompt caching to save 90% of compute costs
- Charge subscribers full quota for cache overhead
- Never disclose how quota accounting works
- Promote workflows that maximize cache consumption
- Ignore bug reports about the resulting behavior degradation
…turns a great model into a cash extraction machine.
I cancelled my Max 20x subscription. I built ccusage_go so every Claude Code user can see the truth behind their bills. The numbers speak for themselves.
$1.31 of work. $54.63 of cache overhead. 2.3% real, 97.7% architectural tax.
@boris_cherny — your users deserve better.
SteveLo is a full-stack systems engineer based in Taiwan with expertise spanning RF/SDR engineering, FPGA development, embedded systems (IoT/MCU), high-performance computing, and software architecture. He maintains ccusage_go on GitHub.
All data, screenshots, and analysis referenced in this post are available in the linked GitHub issues and repository.
Appendix: Session Cache Behavior Analysis — Generated by Claude Code Itself
The following report was generated by asking Claude Code to analyze its own session JSONL file. The irony is not lost on me — this analysis itself consumed cache tokens, further proving the point.
Session: topamo-blacklist-feature Total Turns: 618
Summary
| Metric | Value |
|---|---|
| Total turns | 618 |
| Max total_input | 493,448 |
| Max cache_read | 493,292 |
| Max cache_create | 411,474 |
| Cache rebuild events | 5 |
| Cache invalidation events | 3 |
| Compaction events | 1 |
Cache Rebuild Events (cache_creation > 100K)
| Turn | Timestamp | cache_creation | cache_read | total_input | Trigger |
|---|---|---|---|---|---|
| 467 | 16:24:13 | 384,716 | 14,369 | 399,088 | 36-min break → cache expired |
| 468 | 16:24:15 | 384,716 | 14,369 | 399,088 | Same rebuild billed again |
| 469 | 16:24:16 | 384,716 | 14,369 | 399,088 | Same rebuild billed a third time |
| 512 | 17:56:05 | 278,789 | 0 | 278,792 | 1.5-hour break → full invalidation + compaction |
| 518 | 17:58:19 | 411,474 | 14,369 | 425,846 | Post-compaction rebuild |
Turns 467-469: The same 384K cache rebuild was billed three times in three seconds. Three identical cache_creation charges at 125% rate for the same operation.
Turns 512 + 518: A 1.5-hour break triggered compaction (context dropped from 418K to 278K) followed by a 411K rebuild two minutes later. Combined cache_creation: 690K tokens at 125%.
Cache Invalidation Events (cache_read drop > 50%)
| Turn | Timestamp | Previous cache_read | Current cache_read | Drop |
|---|---|---|---|---|
| 467 | 16:24:13 | 395,575 | 14,369 | 96% |
| 512 | 17:56:05 | 418,338 | 0 | 100% |
| 518 | 17:58:19 | 283,511 | 14,369 | 95% |
Every invalidation correlates with a break in activity. The 5-minute cache TTL means any pause longer than 5 minutes destroys the entire cache and triggers a 125% rebuild.
Cache Read Growth Curve
T 1- 50 |██ 26,407
T 51-100 |███████ 76,267
T101-150 |████████████ 122,142
T151-200 |█████████████████ 174,997
T201-250 |█████████████████████ 210,124
T251-300 |████████████████████████ 236,148
T301-350 |█████████████████████████████ 288,071
T351-400 |██████████████████████████████████ 338,715
T401-450 |█████████████████████████████████████ 365,936
T451-500 |██████████████████████████████████████ 378,294
T501-550 |████████████████████████████████████████ 397,680
T551-600 |███████████████████████████████████████████████ 462,922
T601-618 |██████████████████████████████████████████████████ 487,965
Linear growth from 26K to 488K. Never approaches 1M. Each bar represents the average cache_read per turn in that range — this is the amount of cached context re-sent on every single message.
At 488K cache_read per turn and Opus 4.6 cache read pricing ($0.50/MTok), the last 18 turns alone cost: 18 × 488K × $0.50/M ≈ $4.39 in cache read fees for what was likely a few hundred tokens of actual new input.
Key Observations
-
Cache read grows linearly and never resets — it only drops during invalidation events, then immediately rebuilds to a higher level.
-
The “1M context” reached 493K (49.3%) in 618 turns — at this growth rate, it would take ~1,200 turns to actually approach 1M. But by then, every single turn would cost ~$0.50 in cache read alone.
-
Breaks are punished severely — a 36-minute break cost a 384K rebuild at 125%. A 1.5-hour break cost compaction + two rebuilds totaling 690K at 125%.
-
Duplicate billing detected — turns 467-469 show identical cache_creation values billed three times in three seconds.
-
Post-compaction behavior — after compaction reduced context from 418K to 278K, a rebuild immediately inflated it back to 425K. Compaction saved nothing; the rebuild erased the gains within 2 minutes.
This report was generated by Claude Code analyzing its own session data. The analysis itself added turns to this session, consuming additional cache tokens — a fitting metaphor for the problem it documents.