Article Summary (Model: gpt-5.4-mini)
Subject: Quota Burn Bug
The Gist:
This GitHub issue reports that Claude Code’s Pro Max 5x plan can burn through a quota window in about 1.5 hours even under only moderate use. The reporter logs token usage from session files and argues the likely cause is that cache_read tokens are counted at full rate for quota purposes, which would erase the benefit of caching. They also point to background sessions, auto-compaction, and the 1M context window as amplifiers.
Key Claims/Facts:
- Token accounting: The report argues the quota drain matches full-rate counting of cache_read tokens, not discounted counting.
- Context growth: Large, long-running sessions and auto-compaction can create very large per-call token bursts.
- Shared quota pool: Idle or background sessions may still consume from the same quota window.
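To make the token-accounting claim concrete, here is a minimal sketch of the two counting models the report contrasts. All per-call numbers and the 10% discount factor are hypothetical illustrations, not figures from the issue:

```python
# Hypothetical per-call token usage, loosely shaped like the large
# cache_read-dominated calls the report describes (invented numbers).
calls = [
    # (input_tokens, cache_read_tokens, output_tokens)
    (2_000, 180_000, 1_500),
    (2_500, 190_000, 1_200),
    (3_000, 200_000, 2_000),
]

def quota_burn(calls, cache_read_weight):
    """Total quota-counted tokens under a given cache_read weighting.

    cache_read_weight=1.0 models full-rate counting (the suspected bug);
    a small weight such as 0.1 models discounted counting.
    """
    return sum(
        inp + cache_read_weight * cached + out
        for inp, cached, out in calls
    )

full_rate = quota_burn(calls, 1.0)   # cache reads count fully
discounted = quota_burn(calls, 0.1)  # cache reads count at 10%

print(f"full-rate counting:  {full_rate:,.0f} quota tokens")
print(f"discounted counting: {discounted:,.0f} quota tokens")
print(f"ratio: {full_rate / discounted:.1f}x faster burn")
```

Because cache_read dominates each call, the choice of weight swings the burn rate by nearly an order of magnitude here, which is why the weighting question matters so much for the 5-hour window.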
Discussion Summary (Model: gpt-5.4-mini)
Consensus: Frustrated and skeptical overall, with many users saying the quota behavior has become unpredictable or too aggressive, while others question whether the workload or session setup explains the burn rate.
Top Critiques & Pushback:
Better Alternatives / Prior Art:
- claude-code-cache-fix or cozempic (c47739693, c47739704, c47739725, c47739445).
Expert Context:
One commenter argues the cache_read hypothesis is distinct and worth investigating, and later posts an analysis claiming their data fits a model where cache_read does not meaningfully count toward the 5-hour quota, contrary to the original suspicion (c47739625, c47739759).
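The kind of model-fit the commenter describes can be sketched as a small residual scan: try candidate cache_read weights and see which one best explains the observed quota consumption. The window data below is invented for illustration; only the method mirrors the thread:

```python
# Hypothetical per-window data: logged token breakdown plus the quota
# tokens the window actually consumed (all numbers invented).
windows = [
    # (input, cache_read, output, observed_quota_tokens)
    (10_000, 900_000, 8_000, 19_000),
    (12_000, 1_100_000, 9_000, 22_500),
]

def fit_error(weight):
    """Sum of squared residuals between predicted and observed burn."""
    return sum(
        (inp + weight * cached + out - observed) ** 2
        for inp, cached, out, observed in windows
    )

# Scan candidate weights on [0, 1]. A best fit near 0 would support the
# later claim that cache_read barely counts toward the quota; a best fit
# near 1 would support the original full-rate suspicion.
best = min((w / 100 for w in range(101)), key=fit_error)
print(f"best-fitting cache_read weight: {best:.2f}")
```

In this fabricated dataset the observed burn tracks input plus output tokens almost exactly, so the scan lands on a weight of zero, matching the shape of the commenter’s conclusion.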