A 100,000-line codebase is roughly 1-1.5 million tokens. Even with Claude's 1M context window, you can't dump it all in. And even if you could, you shouldn't --- because the real problem isn't fitting code into the window. It's fitting the RIGHT code.
Research shows that Claude Code's output quality degrades once context usage passes 20-40%, due to attention dilution rather than token exhaustion. A focused 20K-token conversation outperforms a stuffed 100K-token one. Smaller context is cheaper, faster, AND produces better results.
Here's how to manage context like someone who's been debugging this for a while.
Key Takeaways
- Smaller, focused context beats large, unfocused context --- both for quality and cost
- Auto-compaction triggers at ~83.5% capacity. Don't wait for it. Compact manually at natural task boundaries.
- Subagents are the most powerful context management tool --- research stays in their window, not yours
- CLAUDE.md survives compaction (it's re-loaded from disk). Everything else is summarized and potentially lost.
- 3 parallel sessions on a 50K-line project can cover the entire codebase without saturating any window
Short Answer
How do I work with large codebases? Don't load everything. Scope each session to one task. Let Claude search for what it needs instead of pre-loading files. Use subagents for research-heavy work. Compact manually between tasks. Keep your CLAUDE.md focused (under 80 lines). The goal is relevant context, not maximum context.
Why Bigger Context Isn't Better
The intuition is wrong. More context should mean more information, which should mean better output. In practice, the opposite happens.
The Attention Dilution Problem
LLMs don't pay equal attention to everything in the context window. They over-attend to the beginning and end, and under-attend to the middle --- a phenomenon researchers call "Lost in the Middle." When you stuff 100K tokens of code into the window, Claude may miss a critical function buried in the middle while perfectly recalling the import statements at the top.
Multiple studies confirm that models claiming 200K token windows become unreliable around 130K-150K tokens, with sudden performance drops rather than gradual degradation.
The Practical Symptoms
When your context gets polluted, you'll see these:
- Claude duplicates functions it already wrote earlier in the session
- It forgets coding conventions you established 30 minutes ago
- It suggests one approach after already implementing a different one
- It hallucinates file paths that don't exist in your project
If you're seeing these, your context is too full or too noisy. The fix isn't a bigger window. It's a cleaner one.
How Claude Code Manages Context
Understanding the mechanics helps you work with them instead of against them.
Auto-Compaction
Claude Code reserves approximately 33K tokens as a buffer. When usage hits about 83.5% of the window, auto-compaction kicks in:
- The entire conversation is summarized into a compact block
- All message blocks before the compaction point are dropped
- CLAUDE.md files are re-loaded from disk (the only thing guaranteed to survive intact)
- The conversation continues from the summary
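The 83.5% trigger is just the 33K buffer expressed as a fraction of the window. A quick sanity check, assuming a 200K-token total window (the window size here is an assumption; the buffer figure comes from above):

```python
WINDOW = 200_000   # assumed total context window, in tokens
BUFFER = 33_000    # approximate reserved buffer (see above)

trigger = WINDOW - BUFFER         # usage level where auto-compaction fires
print(trigger)                    # 167000
print(f"{trigger / WINDOW:.1%}")  # 83.5%
```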
The /compact Command
You don't have to wait for auto-compaction. Run /compact manually at any time. You can pass custom instructions:
/compact Preserve all modified file paths, test results, and the current branch name
This tells the summarizer what to prioritize. Without guidance, it makes its own judgment calls about what's important --- and its priorities may not match yours.
What Survives Compaction
| Survives | Lost |
|---|---|
| CLAUDE.md (re-loaded from disk) | Exact function signatures from earlier |
| Condensed summary of decisions | Specific error messages |
| General direction of the current task | Detailed conversation history |
| Files currently being edited | "Remember when I said..." context |
Subagents: Context Isolation
This is the single most powerful context management tool. When Claude reads 50 files to find the right one, the contents of the 49 irrelevant files pollute your main context. With subagents, the research happens in a separate context window, and only the relevant summary returns to your conversation.
Use a subagent to search the codebase for all places where
authentication tokens are refreshed. Report back the file paths
and the key patterns.
The subagent reads dozens of files, processes them, and returns a focused 500-token summary. Your main context stays clean.
The Strategy Playbook
File-Level: Load Only What's Relevant
Do: Let Claude's search tools (Grep, Glob) find what it needs dynamically.
Don't: Pre-load files "just in case Claude needs them." Every speculative file read consumes context that never gets freed.
Don't: Use @ imports in CLAUDE.md to embed entire files into every session. Point to files instead:
# Reference Docs
- For API design: read docs/MASTER-PLAN.md
- For database schema: read supabase/schema.sql
This loads the file only when Claude is actually working on that topic.
Project-Level: Your CLAUDE.md Is Context
Your CLAUDE.md loads into every session. If it's 300 lines, that's a few thousand tokens consumed before Claude even starts working. Keep it under 80 lines.
Put critical rules in the first 20 lines. Claude pays more attention to the beginning of context. Your test command, tech stack, and top 3 gotchas should be near the top.
Use nested CLAUDE.md files for domain-specific context. Instead of one massive root file:
project/
├── CLAUDE.md # Core rules (50 lines)
├── frontend/CLAUDE.md # React conventions, component patterns
├── backend/CLAUDE.md # API conventions, database patterns
└── infra/CLAUDE.md # Deployment, CI/CD rules
Subdirectory files load only when Claude works in those directories.
Session-Level: Scope and Refresh
One task per session. Mixing unrelated tasks pollutes context. If you're refactoring authentication and then switch to fixing a CSS bug, the authentication context is dead weight.
Use /clear between unrelated tasks. It resets the conversation while keeping CLAUDE.md loaded. Clean slate, no wasted tokens.
Use Plan mode for multi-file changes. Before refactoring 5 files, switch to Plan mode. This saves approximately 40% of tokens compared to an exploratory approach, because Claude plans the work before generating code.
Add compaction instructions to CLAUDE.md:
# Context Management
When compacting, always preserve:
- Full list of files modified in this session
- Test commands used and their results
- Current branch name and PR context
- Errors encountered and how they were resolved
Architecture-Level: Parallel Sessions
Boris Cherny, the creator of Claude Code, runs 5 local instances simultaneously in numbered git checkouts, plus 5-10 additional sessions on the web.
You don't need that many. But running 2-3 parallel sessions on a large project is a practical strategy:
- Session 1: Backend work (API endpoints, database queries)
- Session 2: Frontend work (components, styling)
- Session 3: Research and exploration (reading docs, understanding patterns)
Each session has focused context. No session needs to know about the others' work. Use git worktrees (claude --worktree) to keep file changes isolated.
Token Economics: Why This Matters for Your Bill
Every message in Claude Code includes the entire previous context as input. Costs compound with each turn.
| Scenario | Context Size | Cost Per Turn (Opus) | 50-Turn Session |
|---|---|---|---|
| Focused session | 20K tokens | ~$0.10 | ~$5 |
| Bloated session | 80K tokens | ~$0.40 | ~$20 |
| Stuffed session | 200K tokens | ~$1.00 | ~$50 |
The bloated session costs 4x more AND produces worse output. You're paying a premium for degraded results.
If a session runs 50 turns carrying 80K tokens of stale context, that's 4M tokens of unnecessary input. At Opus rates, roughly $20 wasted on context that actively hurts performance.
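The table's numbers are easy to reproduce. A minimal sketch, assuming the full context is re-sent as input on every turn, with an illustrative rate of $5 per million input tokens (chosen to match the table; check current pricing for real figures):

```python
def session_cost(context_tokens: int, turns: int, usd_per_mtok: float = 5.0):
    """Rough input cost when every turn re-sends the full context.

    usd_per_mtok is an illustrative rate, not a quoted price.
    """
    per_turn = context_tokens / 1_000_000 * usd_per_mtok
    return per_turn, per_turn * turns

for ctx in (20_000, 80_000, 200_000):
    per_turn, total = session_cost(ctx, turns=50)
    print(f"{ctx:>7,} tokens: ${per_turn:.2f}/turn, ${total:.2f} over 50 turns")
```

Run against the 80K case, it also confirms the stale-context waste: 50 turns times 80K tokens is 4M tokens of input, roughly $20 at this rate.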
The saving grace is caching. On subscription plans, 90%+ of tokens are cache reads (90% cheaper than fresh input). But even with caching, smaller context is always faster and cheaper.
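A blended rate shows how much caching helps, and why it isn't a cure. A sketch using the 90% hit rate and 90% discount mentioned above, with the same illustrative $5-per-million base rate (an assumption, not a quoted price):

```python
def effective_rate(usd_per_mtok=5.0, cache_hit=0.90, cache_discount=0.90):
    # Blended cost per million input tokens: cache hits pay only
    # (1 - cache_discount) of the full rate, misses pay full price.
    return usd_per_mtok * (cache_hit * (1 - cache_discount) + (1 - cache_hit))

print(f"${effective_rate():.2f} per M input tokens")  # $0.95 per M input tokens
```

Caching cuts the bill by roughly 5x here, but the bloated 80K session still costs 4x the focused 20K one; the ratio between sessions never changes.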
The Anti-Patterns
1. Loading Entire Repos
A 200-file repository consumes roughly 800K tokens. You've burned 80% of your window before asking a single question. Claude won't refuse --- it'll just produce progressively worse output as attention dilutes across irrelevant code.
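That 800K figure is a back-of-envelope estimate. A sketch assuming an average of roughly 4,000 tokens per source file (the per-file average is an assumption; real files vary widely):

```python
FILES = 200
TOKENS_PER_FILE = 4_000   # assumed average for a mid-sized source file
WINDOW = 1_000_000        # the 1M window from the intro

load = FILES * TOKENS_PER_FILE
print(load)                    # 800000
print(f"{load / WINDOW:.0%}")  # 80%
```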
2. Over-Stuffed CLAUDE.md
If your CLAUDE.md is 200+ lines, Claude starts ignoring instructions, and important rules get lost in the noise. One developer documented exactly this: after writing 200 lines of rules, Claude ignored them all. Keep it under 80 lines. If Claude already does something correctly without an instruction, delete that instruction.
3. Not Scoping Sessions
The most common anti-pattern. You start with "fix the auth bug," then ask about deployment, then switch to a CSS issue. By turn 30, the context is a mosaic of three unrelated tasks and none of them are getting Claude's full attention.
4. Waiting for Auto-Compact
If you're waiting for auto-compaction to trigger, you've already been operating in degraded mode for thousands of tokens. Compact manually at natural task boundaries: after finishing a feature, after a commit, before switching topics.
5. Dumping "Just in Case" Context
Reading files speculatively wastes context. "Let me read all the auth-related files before we start" loads 10 files when you'll only need 2. Let Claude search for what it needs as the task demands it.
6. Skipping Subagents for Research
If Claude needs to read 30 files to understand a pattern, those 30 files should be in a subagent's context, not yours. The subagent reports back a summary. Your context stays focused.
FAQ
Does the 1M context window solve this problem?
It helps, but it doesn't solve it. Attention dilution still occurs at 20-40% usage. A 1M window means fewer forced compactions, but the quality benefits of focused context remain. Think of 1M as a bigger safety net, not a license to load everything.
How do I know when my context is getting too full?
Watch for the symptoms: duplicated code, forgotten conventions, hallucinated file paths, and inconsistent approaches within a single session. Claude Code also shows a context usage indicator. If you're above 60%, consider compacting.
Should I compact before or after committing?
After. Commit your work, verify it's saved, then compact or start a new session. Compacting before a commit risks losing context about uncommitted changes.
Does CLAUDE.md really survive compaction?
Yes. CLAUDE.md is re-loaded from disk after compaction --- it's the one piece of context that's guaranteed to persist. This is why your most important rules should live there, not in conversation.
What about the /rewind command?
/rewind lets you go back to a previous checkpoint in the conversation and choose "Summarize from here" for targeted compaction. Useful when you realize a particular line of exploration was a dead end and you want to compact just that part.
Key Takeaways
- Focused context beats maximum context --- 20K relevant tokens outperform 100K noisy ones, and cost 4x less
- Compact manually at task boundaries --- don't wait for auto-compaction to kick in at 83.5%
- Subagents are your context firewall --- research in their window, results in yours
- CLAUDE.md is your one guaranteed survivor --- make it count with compaction instructions and critical rules
- Parallel sessions beat one massive session --- 3 focused sessions > 1 bloated session, every time
Further reading: The CLAUDE.md Playbook: 12 Rules That Work | 10 AI Agent Skills Every Developer Should Install