You already feel it: the agent burns 50K tokens before it answers. Sessions go to mush at message 30. Compaction nukes the context you actually needed. jCodeMunch ends that tax.
The agent asks "show me handle_request" and gets ~480 tokens of exact function source, not the 214K-token file it lives in. Parsed by tree-sitter, served with O(1) byte-offset seeks.
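The idea underneath is simple: tree-sitter records a byte span for every definition at index time, so retrieval is a file seek, not a re-parse. Here is a minimal sketch of that mechanism; the index contents, path, and offsets are illustrative, not jCodeMunch's actual internals:

```python
# Sketch only: an index mapping symbol names to (path, start_byte, end_byte)
# spans captured at parse time by tree-sitter. Values are hypothetical.
INDEX = {
    "handle_request": ("routes/api.py", 18_204, 19_771),
}

def get_symbol_source(name: str) -> str:
    """Return one symbol's exact source without reading the whole file."""
    path, start, end = INDEX[name]
    with open(path, "rb") as f:
        f.seek(start)                    # O(1): jump straight to the definition
        return f.read(end - start).decode("utf-8")
```

The seek is what keeps cost proportional to the symbol, not the file: a 214K-token module and a 2K-token module answer the same query at the same price.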
jcodemunch-mcp init detects your client (Claude/Cursor/Windsurf), writes config, installs hooks, indexes the project, and audits token waste in your existing setup.
Hooks fire at PreToolUse, PostToolUse, PreCompact, TaskCompleted, and SubagentStart. The agent stops re-reading what it already has, and stops losing what it just learned.
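For the PreToolUse case, Claude Code hooks receive the pending tool call as JSON on stdin and can block it by exiting with code 2, with stderr fed back to the agent. A conceptual sketch of a duplicate-read guard; the cache here is a stand-in, not jCodeMunch's implementation:

```python
# Conceptual PreToolUse hook: block a redundant whole-file read.
# Claude Code sends the pending call as JSON on stdin; exit code 2
# blocks the call and returns stderr to the agent as guidance.
import json
import sys

ALREADY_READ = {"middleware/auth.py"}  # stand-in for a persisted cache

call = json.load(sys.stdin)
if call.get("tool_name") == "Read":
    path = call.get("tool_input", {}).get("file_path", "")
    if path in ALREADY_READ:
        print(f"{path} is already in context; ask for the symbol you need.",
              file=sys.stderr)
        sys.exit(2)  # block the redundant read
sys.exit(0)          # anything else passes through untouched
```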
get_blast_radius, get_call_hierarchy, find_dead_code, get_dependency_cycles, get_layer_violations, check_rename_safe. 67 tools. AST-grade, not regex-guess.
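To make "AST-grade" concrete, here is one plausible shape for a blast-radius query, shown purely as a sketch: build a reverse call graph from the parsed code, then walk it breadth-first from the symbol you are about to change. The graph literal below is hypothetical:

```python
# Illustrative only: blast radius as BFS over a reverse call graph.
# The graph here is a toy literal; a real index derives it from the AST.
from collections import deque

CALLERS = {  # symbol -> symbols that call it
    "require_auth": ["handle_request", "admin_panel"],
    "handle_request": ["app_main"],
    "admin_panel": ["app_main"],
    "app_main": [],
}

def blast_radius(symbol: str) -> set[str]:
    """Every symbol that can be affected by changing `symbol`."""
    seen, queue = set(), deque([symbol])
    while queue:
        for caller in CALLERS.get(queue.popleft(), []):
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    return seen

print(blast_radius("require_auth"))
# handle_request, admin_panel, app_main (a set, order varies)
```

Grep can find the string require_auth; it cannot tell you that app_main is two hops downstream of it. That transitive step is the difference the graph buys.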
Python, JS/TS, Go, Rust, Java, C/C++, PHP, C#, Ruby, Kotlin, Swift, Dart, Elixir, Erlang, SQL, Razor/Blazor, Pascal, COBOL, Zig, PowerShell … and 60+ more.
Works with Claude, GPT, Gemini, and Groq Remote MCP. Local ONNX embeddings come bundled, so the first run is zero-config. The AI summarizer falls back through Anthropic → Gemini → OpenAI-compat → signature.
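That fallback chain reads as a simple loop: try each provider in order, and if every rung fails, return the bare signature with no AI at all. A minimal sketch under that assumption; the provider wrappers are placeholder stubs, not a real SDK:

```python
# Sketch of the summarizer fallback chain, not jCodeMunch's real code.
# Each rung may raise (no key, network down, quota); the last resort
# needs no provider at all: return the bare signature.

def summarize_with_anthropic(src: str) -> str:      # placeholder stub
    raise RuntimeError("no ANTHROPIC_API_KEY set")

def summarize_with_gemini(src: str) -> str:         # placeholder stub
    raise RuntimeError("no GEMINI_API_KEY set")

def summarize_with_openai_compat(src: str) -> str:  # placeholder stub
    raise RuntimeError("no endpoint configured")

def summarize(source: str, signature: str) -> str:
    for rung in (summarize_with_anthropic,
                 summarize_with_gemini,
                 summarize_with_openai_compat):
        try:
            return rung(source)
        except Exception:
            continue      # this rung is unavailable; fall through
    return signature      # zero-dependency floor: never returns nothing

print(summarize("def handle_request(req): ...", "handle_request(req) -> Response"))
```

The floor is the point: a missing API key degrades quality, never availability.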
Before: the agent reads middleware/auth.py (8K tokens) and routes/api.py (12K tokens).
After: get_symbol_source("require_auth") → 380 tokens; get_call_hierarchy → 1.2K tokens.
"Roughly 5× more efficient context retrieval."
"Preserves your context budget for actual reasoning."
"Structural questions you simply can't ask Grep or Glob."
"The whole game is what you choose not to put in the prompt."