Copilot CLI Context Window: How I Cut Token Usage from 52% to 13%
I'll be honest: I started this whole investigation backwards. I had 117 skills consuming 413K tokens on disk and assumed that was the problem. I spent two hours optimizing them before I thought to actually measure what was in my context window. Turns out, skills are on-demand — they never touch the context window at all.
The biggest consumer was something I never would have guessed: a single plugin loading ~27K tokens of tool definitions into every message. This is the story of how I found it, scoped it down, and — importantly — how you can configure it to match your workflow without losing functionality.
What makes this different? There are already several great articles about MCP context optimization (devbolt.dev, The New Stack, StackOne, blog.pamelafox.org). This one adds real measured token numbers from a production setup, the /context command as a diagnostic tool, the Azure MCP namespace scoping solution, and the Squad orchestration angle.
Step 1: Measure First — Check Your Token Breakdown
I run GitHub Copilot CLI with a multi-agent orchestration setup — half a dozen MCP servers, several plugins, and 117 skills. Mid-session, I got curious about what my context window actually looked like and ran /context:

Context Usage
claude-opus-4.6 · 104k/200k tokens (52%)
System/Tools: 62.5k (31%)
Messages: 41.8k (21%)
Free Space: 55.3k (28%)
Buffer: 40.4k (20%)
52% of the window was already in use. The System/Tools bucket alone was 62.5K tokens, 31% of my 200K window, and that part is spent before I type a single message. It is the baseline cost of my setup: agent instructions, MCP tool definitions, system prompt, memories.
With only 28% free space, complex multi-agent tasks would trigger compaction mid-session. I needed to find what was actually consuming those 62.5K tokens — and the only way to know for sure was to audit what's always-loaded vs. what lives on disk.
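If you want to track these numbers across sessions, the /context output is easy to parse. The following is a minimal sketch, assuming the output keeps the "Name: 12.3k (45%)" line shape shown above; the regex and the function name `parse_buckets` are illustrative and not part of Copilot CLI.

```python
import re

# Sample output copied from the /context run above.
CONTEXT_OUTPUT = """\
claude-opus-4.6 · 104k/200k tokens (52%)
System/Tools: 62.5k (31%)
Messages: 41.8k (21%)
Free Space: 55.3k (28%)
Buffer: 40.4k (20%)
"""

# Matches lines such as "System/Tools: 62.5k (31%)"; the header line has no colon.
BUCKET_RE = re.compile(r"^(?P<name>[^:]+):\s*(?P<k>[\d.]+)k\s*\((?P<pct>\d+)%\)$")

def parse_buckets(text):
    """Return {bucket name: tokens} for each 'Name: 12.3k (45%)' line."""
    buckets = {}
    for line in text.splitlines():
        match = BUCKET_RE.match(line.strip())
        if match:
            buckets[match["name"]] = round(float(match["k"]) * 1000)
    return buckets

buckets = parse_buckets(CONTEXT_OUTPUT)
window = sum(buckets.values())
used = buckets["System/Tools"] + buckets["Messages"]

for name, tokens in buckets.items():
    print(f"{name:<13}{tokens:>8,} {tokens / window:6.1%}")
print(f"used: {used:,} of {window:,} ({used / window:.0%})")
```

The four buckets add up to the full 200K window, and System/Tools plus Messages gives the 104K "used" figure in the header line.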
Step 2: Distinguish Always-Loaded from On-Demand
The first question to ask is not "what's biggest?" but "what's always in context?"
The System/Tools bucket contains everything that loads on every message — unconditionally. If I can reduce that, every operation gets cheaper. Optimizing anything else only helps specific operations.
I built a breakdown:
| Consumer | ~Tokens | When Loaded | Controllable? |
|---|---|---|---|
| MCP/Plugin tool definitions | ~27K+ | Every message | ✅ Scope or remove |
| Agent instructions | ~20K | Every message | ✅ Slim it down |
| System prompt + memories | ~10K | Every message | Partial |
| Skills | ~143K on disk | On-demand only | Can optimize, but won't help context |
| Conversation history | Growing | Accumulates | Fresh sessions help |
Key insight: Skills sit on disk until an agent explicitly requests one. They are never in the context window. Optimizing them makes individual agent spawns cheaper and faster — valuable for performance — but they don't contribute to System/Tools at all. (I learned this after spending two hours optimizing them. Do as I say, not as I did.)
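As a sanity check, the always-loaded rows in the table can be summed and compared with the measured System/Tools bucket. This is a rough sketch using the table's estimates as given; the "when loaded" labels are shorthand for illustration, and conversation history is left out because it is counted under Messages.

```python
# Estimates from the table above, in tokens.
consumers = [
    ("MCP/Plugin tool definitions", 27_000, "every message"),
    ("Agent instructions", 20_000, "every message"),
    ("System prompt + memories", 10_000, "every message"),
    ("Skills", 143_000, "on demand"),
]
MEASURED_SYSTEM_TOOLS = 62_500  # from the /context output

# Only consumers that load on every message count against System/Tools.
always_loaded = [(name, tokens) for name, tokens, when in consumers
                 if when == "every message"]
estimated = sum(tokens for _, tokens in always_loaded)
unaccounted = MEASURED_SYSTEM_TOOLS - estimated

for name, tokens in always_loaded:
    print(f"{name:<30}{tokens:>8,}")
print(f"{'estimated always-loaded':<30}{estimated:>8,}")
print(f"{'measured System/Tools':<30}{MEASURED_SYSTEM_TOOLS:>8,}")
print(f"{'unaccounted':<30}{unaccounted:>8,}")
```

The estimates cover about 57K of the 62.5K measured. The roughly 5.5K gap is in line with the "+" on the tool-definition row.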

Step 3: Audit What's Always-Loaded
The mystery is: what's in that 62.5K System/Tools bucket?
MCP Tool Definitions (~6–10K tokens)
MCP servers inject their tool schemas into every message. I had:
- GitHub MCP — ~15 tools (issues, PRs, code search, actions)
- Mail MCP — ~20 tools (search, send, reply, forward, attachments)
- PowerBI MCP — ~6 tools (execute query, generate query, get schema)
- M365 Agents Toolkit — ~4 tools (knowledge, snippets, schema, troubleshoot)
- IDE — ~2 tools (diagnostics, selection)
These are real — about 47–55 tools across all servers. But they're only ~6–10K tokens total. Where's the other 50K?
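To get a feel for why a few dozen tools add up to several thousand tokens, you can estimate the footprint of a tool schema from its serialized length. The sketch below assumes roughly four characters per token, which is a rule of thumb and not the model's real tokenizer. The sample tool is hypothetical, written in the MCP tool shape (name, description, inputSchema).

```python
import json

CHARS_PER_TOKEN = 4  # assumption: rough rule of thumb, not a real tokenizer

def estimate_tokens(text):
    """Approximate the token count of text from its character length."""
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division

# Hypothetical tool definition, for illustration only.
sample_tools = [
    {
        "name": "search_issues",
        "description": "Search issues in a repository by keyword.",
        "inputSchema": {
            "type": "object",
            "properties": {
                "repo": {"type": "string", "description": "owner/name"},
                "query": {"type": "string", "description": "Search terms"},
            },
            "required": ["repo", "query"],
        },
    },
]

def schema_cost(tools):
    """Return {tool name: estimated tokens} for a list of tool definitions."""
    return {tool["name"]: estimate_tokens(json.dumps(tool)) for tool in tools}

costs = schema_cost(sample_tools)
for name, tokens in sorted(costs.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<20} ~{tokens} tokens")
print(f"total: ~{sum(costs.values())} tokens across {len(costs)} tool(s)")
```

Feed it the real tool list from a server and the sorted output shows which schemas are the heaviest. Treat the numbers as relative rankings, not exact counts.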
The Azure Plugin (~27K tokens) — The Biggest Consumer
I checked ~/.copilot/settings.json and found the Azure plugin enabled:
| Plugin | Source | Impact |
|---|---|---|
| azure | microsoft/azure-skills | 50+ tools, ~27K tokens |
Here's the thing about the Azure MCP Server: it's comprehensive. Version 3.0.0-beta.6 has 259 tools across 56 namespaces — covering everything from ACR to Virtual Desktop to Well-Architected Framework. That breadth is genuinely impressive, and the team clearly designed it to be a one-stop shop for Azure developers.
The good news: the team also thought carefully about how developers actually work. They built in namespace scoping and mode selection so you don't have to load the entire surface area. In its default "namespace" mode, it groups tools by service — but if you're only using a few services, you can filter down to just those. More on that in a moment.
In my case, the default configuration was loading 50+ tool schemas into every message — even when I wasn't doing Azure work in that session. Not a bug, just a configuration I hadn't tuned yet.

Agent Instructions (~20K tokens)
My agent governance file — .github/copilot-instructions.md at the repo root — is 80KB. It loads on every turn. This is the ongoing cost of a sophisticated agent setup: the orchestration rules are comprehensive, and they load unconditionally whether I need them or not.
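The jump from 80KB on disk to roughly 20K tokens follows from a common heuristic of about four bytes per token for English text. A small sketch of that estimate, assuming the heuristic holds for your file (real tokenizers will differ), with helper names invented for illustration:

```python
from pathlib import Path

BYTES_PER_TOKEN = 4  # assumption: rough average for English prose and markdown

def tokens_from_bytes(size_bytes):
    """Approximate token count for a text file of the given size."""
    return size_bytes // BYTES_PER_TOKEN

def tokens_for_file(path):
    """Approximate token count for a file on disk, or None if it is missing."""
    file_path = Path(path)
    if not file_path.is_file():
        return None
    return tokens_from_bytes(file_path.stat().st_size)

# An 80KB instructions file under this heuristic.
print(tokens_from_bytes(80_000))

# Run from the repo root to check your own instructions file.
print(tokens_for_file(".github/copilot-instructions.md"))
```

Running this against the instructions file before and after trimming gives a quick read on how much of the per-turn cost you have removed.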
Step 4: Scope the Azure Plugin to Match How You Work
Once I understood the breakdown, the fix was straightforward. The Azure MCP team built exactly the right lever for this — namespace scoping lets you declare which services matter for your project and ignore the rest. No functionality lost, just a tighter fit.
Option A: Disable Entirely (Full removal)
If you genuinely don't use Azure, just turn it off:
// ~/.copilot/settings.json
"azure@azure-skills": false
This is what I did initially — it dropped System/Tools from 62.5K → 35.2K, freeing ~27K tokens instantly.
