Optimizing Copilot Skills: 65% Token Reduction Across 117 Skills

Optimizing skills felt less like deleting content and more like reorganizing a workshop — same tools, better drawers.
I'd been adding to the .copilot/skills/ directory for a while without taking inventory. Every feature or domain onboarding meant a new skill — sometimes three. The assumption was obvious: more skills = more consistency. For the first few dozen, that was true.
Here's what's weird: I had no actual count.
When I finally looked: 413,591 tokens across 136 skill and reference files (117 distinct skills). Just measuring it revealed the bloat:
- 6 SDK sample review skills: 140K tokens (34% of total budget)
- Dead redirect stubs: consuming tokens for no routing purpose
- Duplicated prose: same guidance repeated across language variants
Not a disaster, but the kind of creeping growth that happens when you build fast and don't audit.
Why this matters: Skills load on demand, so optimizing them doesn't free your active context window. But faster agent spawns and cheaper skill loading? That's a different lever, and I wanted to pull it.
The patterns I found — reference extraction, checklist compression, shared references — work with any tool. I used GitHub Copilot CLI with Squad orchestration to run them in parallel, but you could apply them manually in any editor. The techniques are the point, not the tooling.
Measuring Token Usage with microsoft/waza
The first move: measure. Waza is a skill quality toolkit, and waza_tokens count does exactly that — scans your skills directory and gives you sorted token usage. No guessing. Here's the breakdown:
$ waza_tokens count .copilot/skills/
┌─────────────────────────────────┬────────┐
│ Skill │ Tokens │
├─────────────────────────────────┼────────┤
│ data-plus-ai-sdk-java-sample... │ 25,841 │
│ data-plus-ai-sdk-python-samp... │ 23,921 │
│ ... │ │
│ dina-small-utility │ 312 │
├─────────────────────────────────┼────────┤
│ Total: 117 skills │413,591 │
└─────────────────────────────────┴────────┘
25K tokens for a single skill. That's the starting point. Waza has other tools too — waza_tokens suggest for optimization ideas, waza_quality to verify post-changes, and waza_dev --copilot for frontmatter — but for this work, count was the diagnostic tool.
Planning the Work
The strategy: decompose into phases, ordered by savings potential. Clear the big ones first; small wins come after.
| Phase | Target | Est. Savings |
|---|---|---|
| 1. Kill stubs | 3 empty redirect skills | ~73 tokens |
| 2. Refactor giants | 6 SDK review skills (140K!) | ~120K tokens |
| 3. Optimize large | 14 skills (5K–10K each) | ~30–50K tokens |
| 4. Optimize medium | 60 skills (1K–5K each) | ~10–20K tokens |
| 5. Trim small | 20 skills (under 1K each) | minimal |
| 6. Audit references | Large reference files | ~10–15K tokens |
Why this order matters: I started with stubs not because they saved much, but because they reduced noise before the real work. Phases 2–3 capture the bulk of savings. Phases 4–5 are diminishing returns per skill, but we completed them efficiently by applying the same patterns we'd already proven earlier.
Phase 1: Killing the Stubs
Problem: Three skills were redirect stubs — they pointed to other skills and had fewer than 50 tokens of actual content. No routing logic, no value.
Action: Deleted them.
Result: −73 tokens. Barely registers numerically, but this is the "boring is good" work. A clean directory is easier to maintain, and stubs confuse future maintainers.
Phase 2: The Giants
Problem: Six SDK sample review skills (Java, Python, Go, .NET, TypeScript, Rust) had identical structure: 15–16 detailed rule sections + full code examples inline. Total: 140K tokens (34% of budget). Agents loaded everything every time, even when they only needed one language's rules.
Technique: Reference Extraction. Move verbose rules and examples to references/ files. Keep SKILL.md slim — just routing info, a quick checklist, and blockers. Agents load the overview immediately, fetch detailed rules on demand.
Before:
java-sdk-review/SKILL.md (25,841 tokens)
├── Routing info (2K)
├── Error handling rules (8K + full examples)
├── Concurrency rules (7K + full examples)
├── Async patterns (6K + full examples)
└── ... 12 more sections ...
After:
java-sdk-review/SKILL.md (1,541 tokens)
├── Routing: detect Java SDK samples
├── Quick checklist
│ ├── Error handling: caught, logged, meaningful messages
│ ├── Concurrency: thread-safe, no race conditions
│ ├── Async patterns: proper callback/future chaining
│ └── ... 5 more items