
5 posts tagged with "AI assisted"


Optimizing Copilot Skills: 65% Token Reduction Across 117 Skills

· 13 min read

Watercolor illustration of a craftsperson's workbench being tidied and organized

Optimizing skills felt less like deleting content and more like reorganizing a workshop — same tools, better drawers.

I'd been adding to the .copilot/skills/ directory for a while without taking inventory. Every feature or domain onboarding meant a new skill — sometimes three. The assumption was obvious: more skills = more consistency. For the first few dozen, that was true.

Here's what's weird: I had no actual count.

When I finally looked: 413,591 tokens across 136 skill and reference files (117 distinct skills). Just measuring it revealed the bloat:

  • 6 SDK sample review skills: 140K tokens (34% of total budget)
  • Dead redirect stubs: consuming tokens for no routing purpose
  • Duplicated prose: same guidance repeated across language variants

Not a disaster, but the kind of creeping growth that happens when you build fast and don't audit.

Why this matters: Skills load on demand, so optimizing them doesn't free your active context window. But faster agent spawns and cheaper skill loading? That's a different lever, and I wanted to pull it.

The patterns I found — reference extraction, checklist compression, shared references — work with any tool. I used GitHub Copilot CLI with Squad orchestration to run them in parallel, but you could apply them manually in any editor. The techniques are the point, not the tooling.

Measuring Token Usage with microsoft/waza

The first move: measure. Waza is a skill quality toolkit, and waza_tokens count does exactly that — scans your skills directory and gives you sorted token usage. No guessing. Here's the breakdown:

$ waza_tokens count .copilot/skills/
┌─────────────────────────────────┬────────┐
│ Skill │ Tokens │
├─────────────────────────────────┼────────┤
│ data-plus-ai-sdk-java-sample... │ 25,841 │
│ data-plus-ai-sdk-python-samp... │ 23,921 │
│ ... │ │
│ dina-small-utility │ 312 │
├─────────────────────────────────┼────────┤
│ Total: 117 skills │413,591 │
└─────────────────────────────────┴────────┘

25K tokens for a single skill. That's the starting point. Waza has other tools too — waza_tokens suggest for optimization ideas, waza_quality to verify post-changes, and waza_dev --copilot for frontmatter — but for this work, count was the diagnostic tool.
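If you don't have waza installed, a few lines of Python give a first-order approximation of the same scan. This is a rough stand-in, not waza's real tokenizer; the ~4-characters-per-token ratio is a heuristic assumption:

```python
# Rough stand-in for `waza_tokens count`: scan markdown files and
# estimate tokens at ~4 characters per token (a heuristic, not exact).
from pathlib import Path

def estimate_tokens(text: str) -> int:
    """Crude token estimate: ~4 characters per token."""
    return max(1, len(text) // 4)

def scan_skills(root: str) -> list[tuple[str, int]]:
    """Return (path, estimated tokens) for every markdown file, largest first."""
    results = [
        (str(p), estimate_tokens(p.read_text(encoding="utf-8")))
        for p in Path(root).rglob("*.md")
    ]
    return sorted(results, key=lambda pair: pair[1], reverse=True)
```

Summing the second column over `scan_skills(".copilot/skills/")` gives a ballpark total you can compare across runs, even if the absolute numbers differ from waza's.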

Planning the Work

The strategy: decompose into phases, ordered by savings potential. Clear the big ones first; small wins come after.

| Phase | Target | Est. Savings |
|---|---|---|
| 1. Kill stubs | 3 empty redirect skills | ~73 tokens |
| 2. Refactor giants | 6 SDK review skills (140K!) | ~120K tokens |
| 3. Optimize large | 14 skills (5K–10K each) | ~30–50K tokens |
| 4. Optimize medium | 60 skills (1K–5K each) | ~10–20K tokens |
| 5. Trim small | 20 skills (under 1K each) | minimal |
| 6. Audit references | Large reference files | ~10–15K tokens |

Why this order matters: I started with stubs not because they saved much, but because they reduced noise before the real work. Phases 2–3 capture the bulk of savings. Phases 4–5 are diminishing returns per skill, but we completed them efficiently by applying the same patterns we'd already proven earlier.

Phase 1: Killing the Stubs

Problem: Three skills were redirect stubs — they pointed to other skills and had fewer than 50 tokens of actual content. No routing logic, no value.

Action: Deleted them.

Result: −73 tokens. Barely registers numerically, but this is the "boring is good" work. A clean directory is easier to maintain, and stubs confuse future maintainers.

Phase 2: The Giants

Problem: Six SDK sample review skills (Java, Python, Go, .NET, TypeScript, Rust) had identical structure: 15–16 detailed rule sections + full code examples inline. Total: 140K tokens (34% of budget). Agents loaded everything every time, even when they only needed one language's rules.

Technique: Reference Extraction. Move verbose rules and examples to references/ files. Keep SKILL.md slim — just routing info, a quick checklist, and blockers. Agents load the overview immediately, fetch detailed rules on demand.

Before:

java-sdk-review/SKILL.md (25,841 tokens)
├── Routing info (2K)
├── Error handling rules (8K + full examples)
├── Concurrency rules (7K + full examples)
├── Async patterns (6K + full examples)
└── ... 12 more sections ...

After:

java-sdk-review/SKILL.md (1,541 tokens)
├── Routing: detect Java SDK samples
├── Quick checklist
│ ├── Error handling: caught, logged, meaningful messages
│ ├── Concurrency: thread-safe, no race conditions
│ ├── Async patterns: proper callback/future chaining
│ └── ... 5 more items
└── Reference files in references/java/ (loaded on demand)

Two-tier architecture. Same content, loaded smarter.
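The mechanical part of that extraction can be sketched in a few lines of Python. This is an illustrative sketch, not the actual tooling: the "split on `## ` headings" convention and the `keep` parameter are assumptions made for the example.

```python
# Illustrative reference-extraction sketch: keep the first `keep` sections
# of a SKILL.md inline, move the rest into references/<slug>.md files,
# and append an on-demand link list to the slim core.
from pathlib import Path

def extract_references(skill_md: str, out_dir: str, keep: int = 2) -> str:
    """Split a SKILL.md into a slim core plus references/ files."""
    sections = skill_md.split("\n## ")
    head, rest = sections[:keep], sections[keep:]
    refs = Path(out_dir) / "references"
    refs.mkdir(parents=True, exist_ok=True)
    links = []
    for body in rest:
        title = body.splitlines()[0].strip()
        slug = title.lower().replace(" ", "-")
        (refs / f"{slug}.md").write_text("## " + body, encoding="utf-8")
        links.append(f"- [{title}](references/{slug}.md)")
    slim = "\n## ".join(head)
    return slim + "\n\n## References (loaded on demand)\n" + "\n".join(links)
```

The returned slim document keeps routing up front and links each extracted section, so agents fetch the heavy rules only when they need them.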

Execution: Ran all six in parallel:

| Skill | Before | After | Reduction |
|---|---|---|---|
| Java | 25,841 | 1,541 | 94% |
| Python | 23,921 | 1,083 | 95% |
| Go | 24,355 | 1,815 | 93% |
| .NET | 23,355 | 1,378 | 94% |
| TypeScript | 21,543 | 1,525 | 93% |
| Rust | 21,303 | 1,643 | 92% |
| Total | 140,318 | 8,985 | ~131K saved |

Trade-off: Agents now navigate a two-tier structure (SKILL.md → references/) instead of one flat file. Discoverability costs something. But these skills are used frequently enough that agents will learn the pattern. Zero content was removed — every rule and example is still there, just reference-extracted.

Phase 2 complete: SDK skills before/after showing 94%+ reduction per language

Phase 3: Large Skills

Problem: 14 more skills in the 5K–10K range had the same structure: verbose sections that could be extracted. Examples: azure-mcp-content-generation, dina-reskill, context-diagnostics.

Action: Applied the same reference extraction pattern in 4 parallel batches.

Result: −68,084 tokens (76% reduction)

Running total:

Phase 1 (stubs):   −73 tokens
Phase 2 (giants): −131,333 tokens
Phase 3 (large): −68,084 tokens
────────────────────────────────
Subtotal saved: ~199,490 tokens (48% of starting budget)
Remaining: ~214,101 tokens

At this point, the curve was clear. Phases 4–5 (medium and small skills) would yield diminishing returns per unit effort — but having proven the techniques in Phases 1–3, we already knew how to apply them efficiently at scale.

The PR and the Review

Problem: The PR touched 106 files, 18,571 deletions, 12,176 insertions. Before shipping, we needed to verify:

  • Structural integrity (paths, syntax, references valid)
  • Quality didn't regress (waza_quality scores)
  • Routing logic still precise
  • Didn't over-trim skills below usefulness

Action: Ran four automated review passes:

  1. Structural integrity check — passed
  2. Waza quality verification — passed with notes
  3. Trigger precision validation — passed with notes
  4. Adversarial over-trimming check — caught 2 real issues

Issues found and fixed:

  1. Reference file with broken relative path (in Phase 2)
  2. Skill trimmed below ~800 tokens (lost routing context entirely)

Second pass: ✅ SHIP

Pull request showing 65% Copilot skills token reduction across 106 files

Key finding: Don't reduce a SKILL.md below ~800 tokens for standalone skills. Below that threshold, you lose enough routing context that agents can't determine when or how to use the skill. Exception: Skills with strong internal routing logic (like the unified SDK skill at 469 tokens) can go lower because their dispatch logic compensates.

The ~800-token floor is a practical boundary, discovered through testing.

Phases 4–6: The Curve Flattens

After Phase 3, ~214K tokens remained. Phases 4–6 brought that down to 143K — another ~70K saved by applying the same patterns (checklist compression, reference extraction, deduplication) at smaller scale:

| Phase | Skills | Technique | Savings |
|---|---|---|---|
| 4. Medium | 60 skills (1K–5K each) | Checklist compression, dedup | ~45K |
| 5. Small | 20 skills (under 1K each) | Light trimming | ~10K |
| 6. Reference audit | Large reference files | Consolidation | ~15K |

The per-skill ROI drops in later phases, but having proven the techniques in Phases 1–3, the work was mechanical — same patterns, smaller targets. The current 143K total is sustainable for the usage pattern.

Final Numbers

Before:  413,591 tokens (117 skills, 136 files)
After: 143,354 tokens (114 skills, 130 files)
Saved: 270,237 tokens (65.3% reduction)

This reflects the main optimization PR. The workbench is cleaner — every tool is still there, but they're in labeled drawers instead of piled on the surface. The Bonus Round consolidation happened in a separate session and is described next.

Bonus Round: From Shared References to Unified Skills

After the optimization PR shipped, I ran waza_quality on the six SDK skills and noticed an isolation violation. Each skill had its own SKILL.md, its own routing, its own boilerplate duplicated across six files. That's a pattern violation — not atomic, not clean.

So I rethought it. Three options existed:

  1. Accept the low score — good-enough for a domain-specific exception
  2. Inline shared content — copy the 14 shared reference files into each skill, double the maintenance burden
  3. Single skill with language dispatch — collapse all 7 (6 SDK languages + 1 quickstart) into one unified skill with language auto-detection

I went with option 3. Not because it was obvious, but because it matched the actual use case: agents almost never review samples in all languages simultaneously. They review one language based on the codebase. A single skill with smart routing was actually more correct than the multi-skill pretense.

Result: azure-sdk-sample-review/ — one skill, 469 tokens in SKILL.md, language auto-detection via prompt analysis. Structure:

azure-sdk-sample-review/
├── SKILL.md (469 tokens) — routing + dispatch logic
├── evals/ (7 tasks, 100% passing)
├── references/
│ ├── shared/ (14 files: generic best practices)
│ ├── dotnet/, go/, java/, python/, rust/, typescript/
│ └── quickstart/

Before: 86 files across 6 language folders — Java, Python, Go, .NET, TypeScript, Rust — each with its own copy of the same 14 generic reference files. 45% pure duplication. Update one? Remember to update five more.

After: One shared/ folder holds the 14 generic files. Each language folder keeps only what's actually unique to that SDK. One update, one place. Six duplicated routing SKILL.md files collapsed to a single dispatch mechanism. Waza compliance achieved — no more isolation violations. All 7 behavioral evals running at 100%.

This is the evolution: reference extraction → shared references → unified skill with internal routing. Each step felt right at the time. Looking back, the final architecture is simpler and more correct. Wish I'd seen it from the start. (Work captured in a follow-up PR.)

Dogfooding: The Reskill Skill

I captured the optimization pipeline as a skill — dina-reskill — documenting the 8-pattern workflow (reference extraction, checklist compression, example pruning, and so on).

Then I ran it on itself, because apparently I can't leave well enough alone:

SKILL.md:  2,085 → 1,163 tokens (44% reduction)
Total: 5,401 → 4,288 tokens (21% reduction)

Three review passes: two approvals, one note flagged and fixed. The skill practices what it documents. The SDK skills themselves evolved further after this (described in Bonus Round) — from six separate skills down to a single unified skill. So while dina-reskill captures the second-pass improvements here, the SDK consolidation shows how those patterns continue to evolve as you live with them.

What Actually Worked: The Patterns

If you're building a skills optimization workflow, here are the patterns ranked by impact:

1. Reference Extraction

Principle: Move detailed rules, code examples, and verbose explanations into references/ files. The SKILL.md becomes a slim routing layer — overview, quick checklist, blocker list. Agents load references on demand.

When to use: For any skill over 5K tokens, this should be your first move. Start here, not somewhere else.

Example: The Java SDK skill went from 25,841 tokens (inline rules + examples) to 1,541 tokens (routing + checklist) by extracting ~24K into references/java/.

2. Checklist Compression

Principle: Turn paragraph-style guidance into concise checklists. Same information, fraction of the tokens.

Example:

  • Before: "When reviewing error handling, ensure that all errors are properly caught, logged with appropriate context, and returned with meaningful messages to the caller"
  • After: "✅ Errors: caught, logged with context, meaningful messages"
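The savings are easy to sanity-check with the same rough ~4-characters-per-token heuristic (an assumption, not a real tokenizer):

```python
# Compare the verbose guidance against its checklist form using a
# crude ~4-chars-per-token estimate (heuristic, not a real tokenizer).
before = (
    "When reviewing error handling, ensure that all errors are properly "
    "caught, logged with appropriate context, and returned with meaningful "
    "messages to the caller"
)
after = "Errors: caught, logged with context, meaningful messages"

def tokens(s: str) -> int:
    return max(1, len(s) // 4)

print(f"{tokens(before)} -> {tokens(after)} tokens")  # roughly a 3x cut
```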

3. Example Pruning

Principle: One good example per pattern. If your skill has 3 examples of the same concept, keep the clearest one and move the others to references.

4. Shared References → Unified Skill Routing

Principle: If multiple skills share common guidance, the first instinct is to extract it once and link. That works, but it's often a stepping stone to something better: collapsing near-identical skills into one skill with internal dispatch logic.

When this works: When you have N near-identical skills differing only by one dimension (language, framework, etc.), a unified skill with auto-detection is cleaner than N separate skills with shared references.

Trade-off: One SKILL.md, one set of evals, one routing boundary. Zero isolation violations. But your SKILL.md becomes more complex.

5. Stub Elimination

Principle: If a skill just redirects to another skill, delete it. The router doesn't need a placeholder, and stubs confuse future agents trying to decide what to use.
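Finding stub candidates can itself be automated. A minimal sketch follows; the 50-token threshold and the chars/4 estimate are assumptions carried over from the phases above, not waza's logic:

```python
# Hedged sketch: flag SKILL.md files small enough to be redirect stubs,
# for manual review before deletion. Uses a ~4-chars-per-token estimate.
from pathlib import Path

def find_stub_skills(root: str, max_tokens: int = 50) -> list[str]:
    """Return paths of SKILL.md files at or below the stub threshold."""
    stubs = []
    for skill in Path(root).rglob("SKILL.md"):
        tokens = len(skill.read_text(encoding="utf-8")) // 4
        if tokens <= max_tokens:
            stubs.append(str(skill))
    return sorted(stubs)
```

Treat the output as a review queue, not a delete list: a tiny skill with strong internal routing may be legitimate.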

Honest Lessons: How I Should Have Run This

The work happened over 8 user messages and 2 hours. Here's what went sideways and what would have prevented it:

| What Happened | What Would Have Been Better |
|---|---|
| SDK dedup discovered late (turn 6–8) | Mention "deduplicate shared content" upfront as a known phase |
| Asking about PR + review + results separately | Bundle deliverables: "PR, team review, results file" in one request |
| Phases 4–5 required separate prompts | Front-load scope: "all phases including medium skills" keeps momentum |

The pattern that would have worked: Front-load three things upfront:

  1. The technique or tool (waza_tokens, reference extraction, etc.)
  2. Full scope with known edge cases (all 6 phases, ~800-token floor, etc.)
  3. All deliverables you want at the end

Front-load scope, technique, and deliverables in one message. The AI doesn't lose patience — you do. Every "keep going" prompt is a planning failure you're paying for at execution prices.

The Setup

For reference, here's what I was running: GitHub Copilot CLI with Squad orchestration for the parallel batches, 117 skills in .copilot/skills/, and microsoft/waza for measurement.

Where to Go From Here

To determine if your own skills directory needs this treatment: run waza_tokens count and see the total. If it's over 100K tokens, you have meaningful room to optimize. If you have skills over 5K tokens, reference extraction is almost always worth it.

Everyone's skill architecture is different — the interesting work is figuring out which patterns actually fit your setup. If you try these and discover something that works or something that breaks, I'd be curious to hear what you found.

The same workbench, now organized — tools on pegboard, labeled drawers, clean surface

Same workshop. Same tools. Better organized. That's what 270K tokens of optimization looks like.

Main optimization session: May 11, 2026. 8 user messages, ~2 hours, 270K tokens saved. The Bonus Round consolidation happened in a follow-up session.

Copilot CLI Context Window: How I Cut Token Usage from 52% to 13%

· 11 min read

I'll be honest: I started this whole investigation backwards. I had 117 skills consuming 413K tokens on disk and assumed that was the problem. I spent two hours optimizing them before I thought to actually measure what was in my context window. Turns out, skills are on-demand — they never touch the context window at all.

The biggest consumer was something I never would have guessed: a single plugin loading ~27K tokens of tool definitions into every message. This is the story of how I found it, scoped it down, and — importantly — how you can configure it to match your workflow without losing functionality.

What makes this different? There are already several great articles about MCP context optimization (devbolt.dev, The New Stack, StackOne, blog.pamelafox.org). This one adds: real measured token numbers from a production setup, the /context command as a diagnostic tool, the Azure MCP namespace scoping solution, and the Squad orchestration angle.

Step 1: Measure First — Check Your Token Breakdown

I run GitHub Copilot CLI with a multi-agent orchestration setup — half a dozen MCP servers, several plugins, and 117 skills. Mid-session, I got curious about what my context window actually looked like and ran /context:

Before optimization: /context showing 52% usage and compaction to 40%

Context Usage
claude-opus-4.6 · 104k/200k tokens (52%)

System/Tools: 62.5k (31%)
Messages: 41.8k (21%)
Free Space: 55.3k (28%)
Buffer: 40.4k (20%)

52% consumed before typing a single message. The System/Tools bucket alone was 62.5K tokens — 31% of my 200K window. That's the baseline cost of my setup: agent instructions, MCP tool definitions, system prompt, memories.

With only 28% free space, complex multi-agent tasks would trigger compaction mid-session. I needed to find what was actually consuming those 62.5K tokens — and the only way to know for sure was to audit what's always-loaded vs. what lives on disk.

Step 2: Distinguish Always-Loaded from On-Demand

The first question to ask is not "what's biggest?" but "what's always in context?"

The System/Tools bucket contains everything that loads on every message — unconditionally. If I can reduce that, every operation gets cheaper. Optimizing anything else only helps specific operations.

I built a breakdown:

| Consumer | ~Tokens | When Loaded | Controllable? |
|---|---|---|---|
| MCP/Plugin tool definitions | ~27K+ | Every message | ✅ Scope or remove |
| Agent instructions | ~20K | Every message | ✅ Slim it down |
| System prompt + memories | ~10K | Every message | Partial |
| Skills | ~143K on disk | On-demand only | Can optimize, but won't help context |
| Conversation history | Growing | Accumulates | Fresh sessions help |

Key insight: Skills sit on disk until an agent explicitly requests one. They are never in the context window. Optimizing them makes individual agent spawns cheaper and faster — valuable for performance — but they don't contribute to System/Tools at all. (I learned this after spending two hours optimizing them. Do as I say, not as I did.)

Context consumers breakdown: MCP tools 30-40K, agent instructions 20K, skills on-demand

Step 3: Audit What's Always-Loaded

The mystery is: what's in that 62.5K System/Tools bucket?

MCP Tool Definitions (~6–10K tokens)

MCP servers inject their tool schemas into every message. I had:

  • GitHub MCP — ~15 tools (issues, PRs, code search, actions)
  • Mail MCP — ~20 tools (search, send, reply, forward, attachments)
  • PowerBI MCP — ~6 tools (execute query, generate query, get schema)
  • M365 Agents Toolkit — ~4 tools (knowledge, snippets, schema, troubleshoot)
  • IDE — ~2 tools (diagnostics, selection)

These are real — about 47–55 tools across all servers. But they're only ~6–10K tokens total. Where's the other 50K?

The Azure Plugin (~27K tokens) — The Biggest Consumer

I checked ~/.copilot/settings.json and found the Azure plugin enabled:

| Plugin | Source | Impact |
|---|---|---|
| azure | microsoft/azure-skills | 50+ tools, ~27K tokens |

Here's the thing about the Azure MCP Server: it's comprehensive. Version 3.0.0-beta.6 has 259 tools across 56 namespaces — covering everything from ACR to Virtual Desktop to Well-Architected Framework. That breadth is genuinely impressive, and the team clearly designed it to be a one-stop shop for Azure developers.

The good news: the team also thought carefully about how developers actually work. They built in namespace scoping and mode selection so you don't have to load the entire surface area. In its default "namespace" mode, it groups tools by service — but if you're only using a few services, you can filter down to just those. More on that in a moment.

In my case, the default configuration was loading 50+ tool schemas into every message — even when I wasn't doing Azure work in that session. Not a bug, just a configuration I hadn't tuned yet.

Azure plugin details: 4 plugins consuming context, 50+ tools at 30-40K tokens

Agent Instructions (~20K tokens)

My agent governance file — .github/copilot-instructions.md at the repo root — is 80KB. It loads on every turn. This is the ongoing cost of a sophisticated agent setup: the orchestration rules are comprehensive, and they load unconditionally whether I need them or not.

Step 4: Scope the Azure Plugin to Match How You Work

Once I understood the breakdown, the fix was straightforward. The Azure MCP team built exactly the right lever for this — namespace scoping lets you declare which services matter for your project and ignore the rest. No functionality lost, just a tighter fit.

Option A: Disable Entirely (Full removal)

If you genuinely don't use Azure, just turn it off:

// ~/.copilot/settings.json
"azure@azure-skills": false

This is what I did initially — it dropped System/Tools from 62.5K → 35.2K, freeing ~27K tokens instantly.

Azure plugin disabled: azure@azure-skills set to false

Option B: Namespace Scoping (Keep What You Use)

This is where the Azure MCP Server's design really shines. The team built namespace filtering specifically for this use case — you declare the services relevant to your project, and only those tool schemas load into context.

Configure it in your MCP settings with the --namespace flag:

--namespace appservice --namespace cosmos --namespace keyvault --namespace storage

This gives you 4 namespaces (~24 tools) instead of 56 namespaces (~259 tools) — a significant reduction in context usage while keeping the Azure tools you actually use.
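As a concrete sketch of where those flags live, an MCP server entry scoped this way might look like the following. The exact config schema and package invocation are assumptions here; check the Azure MCP documentation for your client:

```json
{
  "mcpServers": {
    "azure": {
      "command": "npx",
      "args": [
        "-y", "@azure/mcp@latest", "server", "start",
        "--namespace", "appservice",
        "--namespace", "cosmos",
        "--namespace", "keyvault",
        "--namespace", "storage"
      ]
    }
  }
}
```

With this in place, only those four namespaces' tool schemas load into context; everything else stays one config edit away.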

Azure MCP Modes

The server supports 4 modes that control how tools are exposed:

| Mode | Behavior | Best For |
|---|---|---|
| namespace (default) | One tool per service namespace | Copilot — good balance |
| consolidated | Groups operations by user intent | Natural language workflows |
| single | One routing tool for everything | Maximum simplicity |
| all | Every operation as a separate tool (259!) | Maximum granularity — high context cost |

Pick Your Stack

Here's a quick reference for common developer personas:

| If you work with... | Namespaces to keep |
|---|---|
| Web apps | appservice, cosmos, keyvault, storage, functions |
| Data/Analytics | cosmos, sql, kusto, eventhubs, storage |
| DevOps/Infra | compute, aks, azureterraform, deploy, monitor |
| AI/ML | foundryextensions, search, speech, applicationinsights |

All 56 Namespaces (Reference)

For the curious, here's the full list with tool counts. Use this to build your own --namespace filter:

| Namespace | Tools | Namespace | Tools | Namespace | Tools |
|---|---|---|---|---|---|
| acr | 2 | advisor | 1 | aks | 2 |
| appconfig | 5 | applens | 1 | applicationinsights | 1 |
| appservice | 7 | azurebackup | 16 | azuremigrate | 2 |
| azureterraform | 10 | azureterraformbestpractices | 1 | bicepschema | 1 |
| cloudarchitect | 1 | communication | 2 | compute | 12 |
| confidentialledger | 2 | containerapps | 1 | cosmos | 2 |
| datadog | 1 | deploy | 5 | deviceregistry | 1 |
| eventgrid | 3 | eventhubs | 9 | extension | 3 |
| fileshares | 14 | foundryextensions | 7 | functionapp | 1 |
| functions | 3 | grafana | 1 | group | 2 |
| keyvault | 8 | kusto | 7 | loadtesting | 6 |
| managedlustre | 18 | marketplace | 2 | monitor | 16 |
| mysql | 6 | policy | 1 | postgres | 6 |
| pricing | 1 | quota | 2 | redis | 2 |
| resourcehealth | 2 | role | 1 | search | 6 |
| servicebus | 3 | servicefabric | 2 | signalr | 1 |
| speech | 2 | sql | 13 | storage | 7 |
| storagesync | 18 | subscription | 1 | virtualdesktop | 3 |
| wellarchitectedframework | 1 | workbooks | 5 | | |

VS Code Users

You can also scope Azure MCP visually: click the gear icon next to the chat panel → select/deselect at the server, namespace, or individual tool level. No config files needed.

Other Filtering Options

  • Individual tools: --tool azmcp_storage_account_get --tool azmcp_cosmos_query for surgical precision
  • Combine namespace + tool filters for maximum control

Step 5: Then Optimize On-Demand Content (Optional)

Now that the always-loaded problem was solved, it was the right time to optimize skills — not because they consume context window (they don't), but because they improve individual agent spawn performance.

I spent two hours optimizing 117 Copilot CLI skills — reducing them from 413K to 143K tokens on disk, a 65% reduction. The process used waza_tokens to find bloated skills and patterns like reference extraction and checklist compression.

This didn't move the System/Tools percentage. But it made agent spawns faster and cheaper to run. Both wins are real — you just optimize them for different reasons.

Step 6: Measure Results

Fresh session after Azure disabled: context at 35K/200K (18%)

After scoping the Azure plugin:

System/Tools:  35.2k (18%)
Total usage: ~70k/200k (35%)
Free Space: ~90k (45%)

After upgrading the agent coordinator file:

System/Tools:  25.5k (13%)
Total usage: 26k/200k (13%)
Free Space: 134.1k (67%)

The remaining ~10K drop from 35.2K → 25.5K came from upgrading my agent coordinator file — the new version replaced the old 80KB governance prompt with a leaner one. Skill optimization (270K saved on disk) didn't affect this number because skills are on-demand and never in the context window.

Final state: context at 26K/200K (13%), 67% free space

The Scorecard

| Action | Tokens Freed | Effort | Context Impact |
|---|---|---|---|
| Scope Azure plugin | ~27K | Config change | Significant — always loaded |
| Upgrade agent coordinator file | ~10K | 1 command | Significant — always loaded |
| Optimize 117 skills | ~270K on disk | 2 hours, 106 files | Zero on context — but faster agent spawns |

System/Tools went from 62.5K → 25.5K. Free space went from 28% → 67%. That's 2.4x more room for actual work.

The counterintuitive lesson: The biggest token savings came from the smallest changes — because I measured first instead of guessing.

Why Measurement First Matters

Most people (including me, initially) assume the biggest files on disk must be the problem. It's intuitive. It's wrong.

Skills: 143K on disk → 0K in context. Azure plugin: 50+ tools → ~27K in context every message.

Without checking /context, I would have spent all my time optimizing the wrong thing. I did optimize skills first (and it was worthwhile for other reasons), but the crucial discovery was always-loaded vs. on-demand. I'm reframing my mistake as a teaching moment: measure first, then optimize.

Quick Diagnostic Guide

This is the methodology. Use it whenever context runs tight:

MCP config file layers: user vs repo level, what you can control

  1. Run /context — see your actual breakdown
  2. Check plugins — ~/.copilot/settings.json — scope or disable unused ones (biggest wins are usually here)
  3. Scope your MCPs — use namespace filtering, tool filtering, or mode selection to load only what you need
  4. Check MCP servers — ~/.copilot/mcp-config.json and .copilot/mcp-config.json — remove servers you don't use daily
  5. Check agent instructions — if you use custom agent governance files, they load every turn
  6. Skills are usually fine — they're on-demand, not always-loaded
  7. Start fresh sessions — conversation history accumulates; don't run marathon sessions

The biggest wins are almost always in steps 2–3. Scoping one plugin can save more context than hours of file optimization.

What About Hooks?

One thing I haven't tested yet: Copilot hooks (commit hooks, pre-push hooks, custom event hooks). These are lightweight by design — they're shell scripts or short instructions, not loaded into the context window the way MCP tool definitions are. They fire on specific events rather than sitting in the always-loaded bucket.

That said, if you have hooks that reference large config files or trigger MCP calls, those downstream effects could impact context during execution. Worth running /context before and after adding hooks to verify. My expectation is minimal impact, but I'll update this post once I've measured it directly.

The Setup

  • GitHub Copilot CLI v1.0.40
  • Squad v0.9.4-insider.1 for multi-agent orchestration
  • 117 skills in .copilot/skills/ — now ~143K tokens (optimized)
  • 5 MCP servers (GitHub, Mail, PowerBI, M365 Agents Toolkit, IDE)
  • Azure plugin: scoped to needed namespaces (the one that mattered)
  • Model: Claude Opus 4.6 with 200K context window

Investigation: May 5, 2026. The key lesson: measurement comes before optimization. Run /context and let the data guide your effort, not your intuition about file sizes. And when you find an MCP consuming more than you need — scope it down to match how you actually work.

The skills optimization ran same session — 117 skills reduced by 65% (413K → 143K tokens on disk) using waza tools.

Embarking on a Cloud-native Journey with a Todo API

· 4 min read

Our cloud-native adventure begins with the API layer - the magical bridge between the front-end UI and the back-end services. For our Todo project, we're keeping the API simple and efficient. Express.js is our chosen framework, a tried-and-true Node.js web framework. With the power of Copilot Chat, we'll speed through the process in no time!

How do you typically approach building a new API for a cloud-native project?

Fire Up the Dev Container

In the previous chapter of our journey, 002-developer-environment-setup, we set up a robust dev environment. Now, it's time to bring it to life! Open it in GitHub Codespaces or locally on your computer with Visual Studio Code (Docker installation required).

Whether you're the lead developer or part of a team, whether you're working on a familiar project or exploring new territories, you've got options. Visual Studio Code and the dev container for local work, or Codespaces for a cloud-based approach.

Stay in the dev container

If you are like me, your local computer may not be a workhorse, so Docker may not be running when you start working on your project. You can start using Copilot chat locally, then realize you need the dev container for something. The Copilot chat stays with the environment; it doesn't move (at this time). If you are 20 questions into your conversation, with a few side trips here and there, switching environments and losing the chat you'd reference is frustrating.

If you are using dev containers and Copilot chat, start and stay in the container for the entire conversation.

Building an API with Copilot Chat

In just half an hour, Copilot Chat helped me create a fully functional API, complete with types, linting, tests, and a build-test workflow. Here's a sneak peek into the prompts I used:

  1. Building a todo microservice with Node.js and TypeScript.
  2. Adding tests for the todo API.
  3. Refactoring server.ts for both server and test.
  4. Modifying server.ts for CRUD operations.
  5. Providing initial sample data.
  6. Creating an OpenAPI yaml for the API.
  7. Adding an OpenAPI UI route.
  8. Setting up ESLint with Prettier.
  9. Deciding .gitignore contents.
  10. Moving openapi.yaml to the dist folder using tsc.
  11. Creating a GitHub action for linting, building, and testing.
  12. Identifying missing microservice elements.
  13. Adding type safety.

Refining Types and Refactoring

There was some back-and-forth over types and refactoring. Copilot shone in evaluating incoming API request data. After a few prompts, the validation looked like this:


import Joi, { ValidationErrorItem } from 'joi';

export interface Todo {
  id: number;
  title: string;
}

export interface PartialTodo {
  id?: unknown;
  title?: unknown;
}

const todoSchema = Joi.object({
  id: Joi.number().greater(0).required(),
  title: Joi.string().min(1).max(1000).required(),
}).unknown(false);

const todoPartialSchema = Joi.object({
  id: Joi.number().greater(0),
  title: Joi.string().min(1).max(1000).required(),
}).unknown(false);

export interface TodoValidation {
  valid: boolean;
  error: Error | null | string | ValidationErrorItem[];
  todo: Todo | PartialTodo | null;
}

export const validateTodo = (
  todo: PartialTodo,
  isNewTodo: boolean = false
): TodoValidation => {
  const schema = isNewTodo ? todoPartialSchema : todoSchema;
  const { error } = schema.validate(todo);
  if (error) {
    return {
      valid: false,
      error: error.details,
      todo: null,
    };
  }
  return { valid: true, error: null, todo };
};

Copilot also excelled in creating logging handlers for requests and responses, and in adding those handlers to the route.

Copilot's Strengths

Copilot gave me a flying start. While some answers lacked details, running the app or tests quickly revealed any errors, which were easy to fix.

Copilot's Weaknesses

Despite the conversation and the wealth of examples, I encountered more issues than expected. For more obscure subjects, I'd recommend breaking down the steps more atomically.

Why Not Use Existing Code?

Sure, there are plenty of examples on GitHub. But navigating licenses and attributions can be tricky. I preferred to avoid any potential missteps.

Time Investment

From start to finish, the project took about 2 hours. There were a few hiccups along the way, but each step was small, making issues manageable. Considering everything, 2 hours is a solid benchmark for a proof-of-concept project.