3 posts tagged with "mcp"

Portable Personal Context Across AI Client Surfaces

July 10, 2026 · 16 min read

Many developers use multiple AI surfaces daily: GitHub Copilot in VS Code, Copilot CLI, Microsoft 365 Copilot, Microsoft Scout, Claude, ChatGPT, or Cursor.

The problem: each surface starts without the context you gave the last one. Preferences, current work, boundaries, and decisions stay trapped in whichever tool you told.

This post proposes a portable personal context source: structured markdown files in a GitHub repo that AI tools can read. No vendor supports this end-to-end today. The storage can be portable; the behavior is not automatic. Each surface still needs its own wiring.

The Problem: Context Islands

Context islands diagram showing isolated AI surfaces

Surface	What it knows about you	Where that knowledge lives
Copilot in VS Code	`.github/copilot-instructions.md` in current repo	Per-repo, per-machine
Copilot CLI	Local instructions, skills, plugins, MCP servers, and session data	CLI-specific local state
Microsoft 365 Copilot	Your M365 Graph data (emails, calendar)	Cloud, not exportable
Microsoft Scout	Memories, preferences, profile	Local app state
Claude	`CLAUDE.md` per project, memory	Per-project file + cloud memory
Cursor	Project Rules in `.cursor/rules/*.mdc`, plus `AGENTS.md` support	Per-project rules

The pattern: each tool has its own user-context format. The result:

Repeated preferences in every tool ("I prefer concise output," "use TypeScript," "don't auto-push to main")
Decisions invisible outside the surface where they happened
Expertise and boundaries known only where you stated them

You Already Know What the Solution Feels Like

Personalization features exist in every tool, but each one is locked to that tool.

Built-in Feature	What it does	The problem
M365 Copilot: Custom instructions	"Be concise, use tables"	Doesn't reach VS Code or CLI
M365 Copilot: Work profile	Your role, org, skills	Locked in Microsoft Graph
M365 Copilot: Saved memories	Facts remembered between sessions	Only M365 Copilot sees them
ChatGPT: Memory	Auto-extracted facts about you	Only ChatGPT sees them
Claude: CLAUDE.md	Per-project instructions	Only Claude Code sees them
Cursor: Rules	Coding preferences	Only Cursor sees them

Each tool stores personalization separately. Portable context makes the source shared:

Today:
  M365 Copilot    → knows you like concise output
  VS Code Copilot → doesn't know (asks again)
  Copilot CLI     → doesn't know (asks again)
  Scout           → has its own separate copy
  Claude          → has its own separate copy

With portable personal context:
  Wired surfaces  → read from the same source → load your preferences

Why Not Just Use Those Existing Features?

	Built-in Personalization	Portable Context
Portability	One surface only	Shared source; manual wiring per surface
Transparency	Opaque ("View work data")	Human-readable markdown
Exportability	Can't export	`git clone` anywhere
Versioning	No history	Full git history
Control	Platform decides format	You decide format
Decisions	No structured log	Append-only ledger
Auto-extraction	Yes (convenient)	Manual (precise)

Use built-in personalization where it exists; keep portable context as the canonical source you can inspect and version.

Why Not CLAUDE.md, copilot-instructions, or Cursor Rules?

Those files are instructions TO the AI for one project. Personal context is information ABOUT you across tools. Cursor's current model is Project Rules in .cursor/rules/*.mdc; .cursorrules is legacy.

Scope comparison showing per-tool files as narrow vs personal context as universal

.github/copilot-instructions.md  → "In this repo, use ESM imports"
personal-context/process/...     → "I always prefer ESM over CommonJS"

Repo-level files govern a codebase. Personal context governs how to work with you.

Why Not Just an Agent or Skill?

Agents and skills are task-scoped. Personal context is user-scoped.

Persona hierarchy showing person above process above skills above agents

	Personal Context	Agent (agent.md)	Skill (SKILL.md)
Answers	"Who is this person?"	"How should I behave?"	"How do I do this task?"
Scope	Everything you do	One role or surface	One repeatable procedure
Lifespan	Years (grows with you)	Months (evolves with tooling)	Weeks (refined per use)
Portability	Shared source across wired surfaces	One surface	Some surfaces

Personal context should feed agents and skills, not duplicate them. Without it, agents start without your quality bar, boundaries, or past decisions.

Why Not Mem0 or a Cloud Memory Service?

Mem0 is a cloud API for persistent AI memory. It makes different tradeoffs:

	Mem0 (Cloud)	Personal Context Repo
Architecture	Hosted API service	Local-first (files in git)
Data ownership	Third-party hosted	You own it (your GitHub)
Works offline	No	Yes
Vendor dependency	Yes (Mem0 API key)	No (just git)
Human-readable	No (vector store)	Yes (markdown you can edit)
Versioned	No (mutable state)	Yes (git history + blame)
Semantic search	Yes (their strength)	No (not needed at personal scale)
Best for	App builders serving many users	Individual developers across their own tools

For one developer, dozens of curated facts, decisions, and preferences may not need a hosted dependency.

Personal Context Is Not Memory

Context vs Memory promotion pipeline

Key distinction: personal context is not memory. They overlap, but they need different storage, governance, and precedence rules.

Context is declared, curated, and authoritative. Memory is accumulated from use.

	Personal Context	Memory
Origin	Authored intentionally	Accumulated automatically
Nature	Curated / declared	Accreted / observed
Authority	Authoritative ("this is the rule")	Evidentiary ("this is what happened")
Example	"My branch naming convention is `{type}/{id}-{slug}`"	"Last Tuesday you renamed a branch to `wip-2`"
Volume	Small, deliberate (~50-150 facts)	High-volume, ever-growing
Governance	Human-reviewed	Auto-captured

They are two ends of a promotion pipeline:

observation  →  candidate  →  [ratification gate]  →  context
 (memory)       (proposed)     (a human decision)      (canonical)

Memory is the raw feed. Context is the reviewed output. The ratification gate is the control point. The distinction changes the architecture in three ways:

Stores. Memory wants a high-volume append log. Context wants a small, curated, versioned set.
Precedence. Context outranks memory. A remembered exception does not override a stated boundary.
Retrieval and governance. Context is load-always instruction; memory is search-when-relevant evidence.
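A minimal sketch of the ratification gate, assuming a hypothetical `Candidate` and `ContextStore` (neither name comes from any tool). It only illustrates the rule that nothing moves from memory into context without a human decision:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Candidate:
    """A proposed fact distilled from observed memory (hypothetical shape)."""
    text: str
    source: str  # which surface proposed it


@dataclass
class ContextStore:
    """Curated context: only ratified facts are ever stored."""
    facts: list[str] = field(default_factory=list)

    def ratify(self, candidate: Candidate, approved_by_human: bool) -> bool:
        # The ratification gate: no human approval, no promotion.
        if not approved_by_human:
            return False
        if candidate.text not in self.facts:
            self.facts.append(candidate.text)
        return True


store = ContextStore()
observed = Candidate("Renamed a branch to wip-2 last Tuesday", source="cli")
declared = Candidate("Branch naming: {type}/{id}-{slug}", source="vscode")

rejected = store.ratify(observed, approved_by_human=False)
accepted = store.ratify(declared, approved_by_human=True)
```

The observation stays in memory as evidence; only the approved convention becomes context.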

This post is about context, not memory: authored, reviewed, and portable.

The Insight: LLMs Already Speak Markdown

Markdown plus GitHub enables cross-tool portability

AI tools read files. LLMs understand markdown. Developer tools commonly authenticate with GitHub.

A private GitHub repo with structured markdown files can be the shared context source.

Any surface that can read the repo can load the same context.

The Architecture: Personal Context as a Repo

Architecture diagram showing canonical repo feeding multiple surfaces

github.com/<yourname>/personal-context  (placeholder private repo)
│
├── context.json              ← Manifest: what's here + retrieval rules
│
├── core/                     ← RARELY CHANGES (your "constitution")
│   ├── expertise.md          # What you know, your domain authority
│   ├── boundaries.md         # What stays human, what AI never does alone
│   ├── role.md               # Job, scope, organization
│   └── communication.md     # How you prefer to interact
│
├── decisions/                ← APPEND-ONLY (your "ledger")
│   ├── _active.md            # Decisions still governing current work
│   ├── 2026-07.md            # This month's new decisions
│   └── ...
│
├── process/                  ← STABLE (your "playbook")
│   ├── content-workflow.md   # How you create content
│   ├── code-workflow.md      # How you write and ship code
│   ├── quality-bar.md        # Definition of done per work type
│   └── tool-preferences.md  # Preferred tools and patterns
│
├── active/                   ← CHANGES OFTEN (your "whiteboard")
│   ├── projects.md           # Current active projects
│   ├── sprint-focus.md       # This sprint's commitments
│   └── parking-lot.md        # Deferred items
│
└── .github/
    └── copilot-instructions.md  # Tells Copilot how to USE this repo

Why Four Layers?

Separate context by durability: how often each layer changes and who can change it.

Four layers diagram showing durability spectrum

Layer	Half-life	Mutability	Example
Core	Months/years	Human-only	"I'm a senior developer on the Azure SDK docs team"
Decisions	Permanent (append-only)	Any surface proposes; append after human confirmation	"Use generation pipeline for MCP namespace files"
Process	Weeks/months	Propose via PR	"Branch naming: `{type}/{id}-{slug}`"
Active	Days/weeks	Any surface updates after pull-before-push	"Sprint focus: Ship auth-flow feature"
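The Mutability column can be read as a write policy. A rough sketch, assuming hypothetical rule names and a hypothetical `allowed_write` helper; a real surface would enforce this wherever it performs writes:

```python
# Hypothetical write policy derived from the Mutability column above.
POLICY = {
    "core": "human_only",
    "decisions": "append_after_confirmation",
    "process": "pull_request",
    "active": "direct_after_pull",
}


def allowed_write(layer: str, actor: str, human_confirmed: bool = False) -> str:
    """Return how a write should proceed: 'direct', 'append', 'pr', or 'deny'."""
    rule = POLICY[layer]
    if rule == "human_only":
        return "direct" if actor == "human" else "deny"
    if rule == "append_after_confirmation":
        # A human writing directly counts as confirmation.
        return "append" if (actor == "human" or human_confirmed) else "deny"
    if rule == "pull_request":
        return "pr"
    return "direct"  # active layer: the caller is expected to pull before pushing
```

For example, `allowed_write("core", "agent")` returns `"deny"`, while `allowed_write("decisions", "agent", human_confirmed=True)` returns `"append"`.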

What Goes in Each Layer

Core: Your Constitution

Rarely changing context: expertise, boundaries, communication preferences.

core/expertise.md — What you know:

## Domain Expertise
- Azure SDK documentation across JavaScript, Python, .NET, Java, Go, Rust
- AI developer tools (MCP servers, AI Toolkit, Copilot extensions)
- Content workflow automation and multi-agent orchestration
- Technical writing for developer audiences

## Not My Expertise (don't assume I know)
- Kubernetes operations / cluster management
- Frontend framework internals (React, Vue, etc.)
- ML model training / fine-tuning

core/boundaries.md — What stays human:

## What AI Should Never Do Autonomously
- Push code to upstream repositories (only to forks)
- Send emails, Teams messages, or any outbound communication
- Close or resolve work items without my confirmation
- Delete files, branches, or repos
- Make irreversible changes without showing me the plan first

## What AI Can Do Without Asking
- Read files, search code, explore repos
- Draft content for my review
- Run tests, linting, builds
- Create branches on my fork
- Propose edits (but not commit without confirmation)

Decisions: Your Ledger

Any surface can propose a decision. Once a human reviews it, the decision is recorded here so settled questions stay settled.

decisions/_active.md — Still-relevant decisions:

### [2026-07-06] Branch naming convention
- **Context:** Inconsistent branch names across repos
- **Decision:** Always use `{type}/{work-item-id}-{brief-slug}`
- **Types:** feat, fix, docs, refactor, test

### [2026-06-15] Prefer tables over prose for comparisons
- **Context:** AI kept writing long paragraphs comparing options
- **Decision:** When comparing 3+ options, always use a table
- **Supersedes:** Nothing (new preference)

### [2026-05-28] No hand-written namespace files
- **Context:** Generated files were higher quality than hand-written
- **Decision:** All namespace articles must come from the generation pipeline
- **Implications:** Slower to ship, but deterministically correct
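A decision like the branch naming convention is precise enough to check mechanically. A small sketch: the type list comes from the entry above, while the numeric work-item ID and the lowercase hyphenated slug are assumptions about what `{work-item-id}` and `{brief-slug}` allow:

```python
import re

# Types come from the decision entry; the ID and slug character sets are assumptions.
BRANCH_RE = re.compile(
    r"(feat|fix|docs|refactor|test)/(\d+)-([a-z0-9]+(?:-[a-z0-9]+)*)"
)


def is_valid_branch(name: str) -> bool:
    """Check a branch name against {type}/{work-item-id}-{brief-slug}."""
    return BRANCH_RE.fullmatch(name) is not None
```

With these assumptions, `feat/4521-mcp-auth-docs` passes and `wip-2` does not.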

Process: Your Playbook

Work preferences. Update as workflow changes.

process/quality-bar.md:

## When Is a Pull Request Done?
- [ ] Work item linked with "Fixes AB#{id}"
- [ ] Meaningful title and description (not just commit messages)
- [ ] No unrelated changes (surgical edits only)
- [ ] CI passes
- [ ] Review comments addressed, not dismissed
- [ ] Staged preview links included for doc changes

## When Is an Article Done?
- [ ] Technically accurate (verified against product behavior)
- [ ] Code samples run without modification
- [ ] All links resolve (no 404s)
- [ ] Metadata correct (ms.topic, ms.date, ms.service)
- [ ] Reviewed by at least 1 peer

Active: Your Whiteboard

Current work state, writable by any surface.

active/sprint-focus.md:

## Sprint 14 (2026-07-01 → 2026-07-12)

### Committed
1. Ship MCP auth namespace docs (AB#4521)
2. Review 3 community PRs on azure-dev-docs
3. Update AI Toolkit quickstart for v0.9

### Stretch
- Prototype portable context layer (this project!)

How Surfaces Consume It

Selective Retrieval: Only Load What's Relevant

Do not load the whole repo. Use context.json to choose task-relevant files:

Read context.json (< 1KB, always cached)
ALWAYS load: core/boundaries.md + core/communication.md (~400 words)
Classify the current task → match to load_by_task
Load those 2-3 files (~500 words)
Load decisions/_active.md (~300 words)

Total: ~1,200 words ≈ 1,600 tokens
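The retrieval steps could look like this in code. Only the `load_by_task` key is named in this post; the `always_load` and `decisions` keys, the task names, and the manifest contents are assumptions for illustration:

```python
from __future__ import annotations

import json

# Hypothetical manifest; only the `load_by_task` key is named in the post.
MANIFEST = json.loads("""
{
  "always_load": ["core/boundaries.md", "core/communication.md"],
  "load_by_task": {
    "code": ["process/code-workflow.md", "process/quality-bar.md"],
    "content": ["process/content-workflow.md", "process/quality-bar.md"]
  },
  "decisions": "decisions/_active.md"
}
""")


def files_for_task(manifest: dict, task: str) -> list[str]:
    """Return the ordered, de-duplicated file list for one task type."""
    paths = list(manifest["always_load"])
    paths += manifest["load_by_task"].get(task, [])  # unknown task: load nothing extra
    paths.append(manifest["decisions"])
    return list(dict.fromkeys(paths))  # de-duplicate while keeping order


def estimate_tokens(word_count: int) -> int:
    # Uses the post's rough ratio: 1,200 words is about 1,600 tokens (4 tokens per 3 words).
    return word_count * 4 // 3
```

An unrecognized task falls back to the always-loaded files plus active decisions, which keeps the boundaries in the prompt even when classification fails.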

Retrieval flow diagram

The Priority Stack

Resolve contradictions deterministically:

core/boundaries.md          ← ALWAYS wins. Non-negotiable.
decisions/_active.md        ← Settled questions. Don't re-ask.
process/*.md                ← How to do things. Follow unless overridden in-session.
active/*.md                 ← Informational state. Not authoritative.
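A sketch of deterministic resolution, assuming each statement has already been tagged with a layer and a topic (that tagging step is hypothetical and not shown). In-session overrides of process rules are left out:

```python
from __future__ import annotations

# Lower number wins. Order follows the stack above.
PRIORITY = {"boundaries": 0, "decisions": 1, "process": 2, "active": 3}


def resolve(statements: list[tuple[str, str, str]]) -> dict[str, str]:
    """Pick one value per topic from (layer, topic, value) triples.

    When two layers speak to the same topic, the higher-priority layer wins.
    """
    winners: dict[str, tuple[int, str]] = {}
    for layer, topic, value in statements:
        rank = PRIORITY[layer]
        if topic not in winners or rank < winners[topic][0]:
            winners[topic] = (rank, value)
    return {topic: value for topic, (_, value) in winners.items()}


statements = [
    ("active", "push_target", "upstream"),
    ("boundaries", "push_target", "fork only"),
    ("process", "comparison_style", "tables for 3+ options"),
]
resolved = resolve(statements)
```

Here the boundary beats the active note, so `resolved["push_target"]` is `"fork only"`.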

Priority stack diagram

Threat Model: Retrieval Is the Boundary

The primary risk is prompt injection causing a surface to retrieve or reveal context it should not have. Trust tiers must be enforced before retrieval, by deciding what files or slices enter the prompt. Output scanning is not a security boundary; use it only as defense in depth. Keep two boundary files if needed: shareable operating rules that most tools can load, and private sensitive constraints that only trusted surfaces can retrieve.
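Enforcing tiers before retrieval can be as simple as filtering the file list. In this sketch the tier names match the public / work / private tiers used later in this post; the per-file labels and the `core/boundaries.private.md` file name are hypothetical:

```python
from __future__ import annotations

TIER_RANK = {"public": 0, "work": 1, "private": 2}

# Hypothetical per-file tier labels; the post does not define where these live.
FILE_TIERS = {
    "core/communication.md": "public",
    "core/boundaries.md": "work",
    "core/boundaries.private.md": "private",
    "decisions/_active.md": "work",
}


def retrievable(paths: list[str], caller_tier: str) -> list[str]:
    """Return only the files the caller's tier may load. Unlabeled files are denied."""
    limit = TIER_RANK[caller_tier]
    allowed = []
    for path in paths:
        tier = FILE_TIERS.get(path)
        if tier is not None and TIER_RANK[tier] <= limit:
            allowed.append(path)
    return allowed
```

Denying unlabeled files by default means a new file stays out of every prompt until someone assigns it a tier.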

Writing Back: Closing the Loop

After human confirmation, any surface can write decisions back:

# After confirming a decision in any surface:
cd ~/personal-context
git pull --rebase   # start from the latest remote state

# Unquoted EOF so $(date ...) expands; replace the {placeholders} before running.
cat >> decisions/_active.md <<EOF

### [$(date +%Y-%m-%d)] {decision title}
- **Context:** {why this came up}
- **Decision:** {what was decided}
- **Implications:** {what this means going forward}
EOF

git add decisions/_active.md
git commit -m "decision: {brief title}"
git push

That simple append is safe only for a single writer with a fresh clone. Multi-surface writes need write intents with IDs and timestamps, pull-before-push, and PR-based reconciliation for stale updates or conflicts. The core layer stays human-only; non-active layers should go through review instead of direct overwrite.
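One way to picture a write intent, as a sketch only: the `WriteIntent` fields and the `reconcile` helper are hypothetical, and the commit IDs are stand-ins for whatever `git rev-parse` would report. Layer rules such as the human-only core are out of scope here:

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class WriteIntent:
    intent_id: str
    surface: str
    path: str
    base_commit: str  # the commit this surface last pulled
    created_at: datetime


def new_intent(surface: str, path: str, base_commit: str) -> WriteIntent:
    """Create an intent with a fresh ID and a UTC timestamp."""
    return WriteIntent(str(uuid.uuid4()), surface, path, base_commit,
                       datetime.now(timezone.utc))


def reconcile(intent: WriteIntent, remote_head: str, applied_ids: set[str]) -> str:
    """Decide how to land a write: 'skip', 'pull_request', or 'push'."""
    if intent.intent_id in applied_ids:
        return "skip"  # a retry of something already applied
    if intent.base_commit != remote_head:
        return "pull_request"  # stale base: reconcile through review
    return "push"
```

The ID makes retries idempotent, and comparing `base_commit` with the remote head is what detects a stale update.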

Connecting Each Surface (Proposed Integrations)

Examples only. Some work manually; others need vendor support. The mechanism: read files and inject context.

GitHub Copilot in VS Code

In your user-level settings.json:

{
  "github.copilot.chat.codeGeneration.instructions": [
    { "file": "~/personal-context/core/boundaries.md" },
    { "file": "~/personal-context/core/communication.md" },
    { "file": "~/personal-context/decisions/_active.md" }
  ]
}

Or reference the repo in any project's custom instructions:

<!-- .github/copilot-instructions.md in any repo -->
For my personal preferences and decisions, reference:
https://github.com/<yourname>/personal-context

Copilot CLI

Use the current standalone copilot CLI. Put durable CLI instructions in $HOME/.copilot/copilot-instructions.md, then point those instructions at the cloned context repo:

mkdir -p ~/.copilot
cat > ~/.copilot/copilot-instructions.md <<'EOF'
For personal preferences and decisions, read:
- ~/personal-context/core/communication.md
- ~/personal-context/core/boundaries.md
- ~/personal-context/decisions/_active.md

Treat the repo as reference context. Do not rewrite core files without human approval.
EOF

For richer integration, expose the same repo through an MCP server and register it with copilot mcp.

Microsoft Scout

Scout exposes settings for memory, personality presets, workspace, and permissions. Use those surfaces to mirror the same repo-backed preferences manually or through a sync process. A generated profile file can work as an implementation pattern, but the path below is illustrative, not a documented Scout contract:

# Sync script: pull personal-context → render an illustrative Scout profile
$role = Get-Content ~/personal-context/core/role.md -Raw
$comms = Get-Content ~/personal-context/core/communication.md -Raw
$boundaries = Get-Content ~/personal-context/core/boundaries.md -Raw

@"
# Personal Profile
$role

## Communication
$comms

## Boundaries
$boundaries
"@ | Set-Content ./scout-profile-example.md

Microsoft 365 Copilot

Sync the repo to a OneDrive folder:

OneDrive/personal-context/ → synced from GitHub repo

Any MCP-Enabled Tool (Claude, Cursor, ChatGPT)

Expose the repo as an MCP resource server, or clone it locally and point the tool config to the files. MCP is the integration protocol for tools, resources, and prompts; use it to expose context as resources or tools where supported.

Getting Started: Example 30-Minute Setup

Timeline showing 30-minute setup in 4 steps

This is a rough first-pass estimate, not a guarantee. The ongoing cost is maintenance: review proposed changes, resolve conflicts, and prune stale active context.

1. Create the repo (5 minutes)

gh repo create personal-context --private --clone
cd personal-context
mkdir -p core decisions process active .github

2. Write your identity (10 minutes)

Write what you would tell a new team member on day one.

3. Capture your first decisions (10 minutes)

Write five repeated preferences or decisions in decisions/_active.md.

4. Connect one surface (5 minutes)

Wire up VS Code settings, a Scout profile, or the Copilot CLI instructions file. Verify that the surface loads the context.

5. Evolve naturally

When an AI asks something it should know, write it down, commit, and push.

The Payoff

Before and after comparison showing repetition eliminated

Before	After
"I prefer concise output" (every session)	A wired surface can load it from `core/communication.md`
"Use fork-first workflow" (every PR)	A wired surface can load it from `process/code-workflow.md`
"We decided to use the pipeline" (re-explained monthly)	A wired surface can load it from `decisions/_active.md`
"My sprint focus is X" (repeated across tools)	A wired surface can read `active/sprint-focus.md`
Start over in each new tool	Start from the same source after each tool is wired

Likely payoff: fewer repeated preferences, fewer re-decisions, and faster starts. Keep time-saved claims only if measured.

Beyond the Repo: When a Service Makes Sense

Context broker architecture with MCP facade and trust tiers

The repo is the floor: markdown, git, no server, no vendor, no API key. Use a service only when a flat repo cannot enforce access. Git gives readers the whole file; a hosted service can return only authorized slices.

What a hosted version would buy you

Server-side redaction by trust tier. Enforce public / work / private tiers at the server. A flat repo cannot do that; clone access gets everything.
Identity-based audit and access control. Log who read context, when, and from which surface.
Central precedence. Resolve boundaries, decisions, process, and active state once instead of per surface.

The key design call: contract vs. transport

Do not make MCP the canonical contract. MCP is the integration protocol, not the canonical data model. Keep Git as the source of truth. If you build a service, make it a derived read facade over the repo, with a REST/OpenAPI contract and MCP exposed as a thin facade where clients support it.

Keep the contract you can't afford to rewrite in Git and REST; expose MCP as a facade you can afford to replace.

Version the REST facade carefully. Treat MCP adapters as replaceable.

What the service actually is: a context broker

The service has four jobs:

Merge — combine the layers (core, decisions, process, active) into one view.
Priority — apply the precedence stack so conflicts resolve deterministically.
Redaction — return only the caller's trust tier.
Defense-in-depth scanning — flag output that appears to reveal a tier the caller should not see.

The third job is the security boundary for prompt-injection exfiltration. With server-side redaction before retrieval, private context never enters the prompt for an untrusted caller. The fourth job can catch mistakes, but it cannot make unsafe retrieval safe.
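The fourth job can be sketched as a verbatim check. The `flag_leaks` helper and its `withheld` mapping are hypothetical. The match is deliberately naive, and a paraphrase would slip past it, which is the reason scanning cannot serve as the boundary:

```python
from __future__ import annotations


def flag_leaks(output: str, withheld: dict[str, str]) -> list[str]:
    """Return names of withheld slices whose text appears verbatim in the output.

    `withheld` maps a slice name to text the caller's tier should not see.
    Matching ignores case and collapses whitespace, nothing more.
    """
    normalized = " ".join(output.lower().split())
    hits = []
    for name, text in withheld.items():
        needle = " ".join(text.lower().split())
        if needle and needle in normalized:
            hits.append(name)
    return hits
```

A hit is a signal to block or log the response, not proof that everything else is clean.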

Even a service still hits the standards wall

Even with a hosted service, consumption stays uneven:

Surface	Talks to a remote MCP server?
VS Code / Copilot CLI / Foundry	Yes — directly
Claude / ChatGPT	Yes — directly where the surface, plan, and auth model allow remote MCP
Microsoft 365 Copilot	No — it wants Graph connectors / declarative agents

The hosted version still needs a shared standard. For one developer, the repo is usually enough. A service earns its complexity only with multiple trust tiers, multiple consumers, or a real injection threat model.

What's Next: The Standard That Doesn't Exist Yet

Convergence diagram showing vendors approaching a missing standard

This post is a proposal, not a product announcement. Today, none of this works automatically. Each tool reads its own context files, in its own format, from its own location. Vendors are adding memory, custom instructions, project files, and agent profiles, but not a shared context standard.

What's missing is a shared standard for where personal context lives and how to read/write it. The Model Context Protocol standardized tool integration; user identity needs the same kind of agreement. No shared standard exists today.

The GitHub repo approach is a bet: structured markdown plus a retrieval manifest could work if tool builders agreed to read it.

The ask to tool builders: Add an $AI_CONTEXT_PATH or equivalent. Let users point to markdown context. Portable context works when surfaces agree to read the repo.
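On the tool side, honoring such a variable could be small. This is a sketch of the proposal, not an existing API: `AI_CONTEXT_PATH` is the name suggested above, and `load_context` is a hypothetical helper that reads only the files it is asked for:

```python
from __future__ import annotations

import os
import tempfile
from collections.abc import Mapping
from pathlib import Path


def load_context(relative_paths: list[str],
                 env: Mapping[str, str] | None = None) -> dict[str, str]:
    """Read the requested markdown files under $AI_CONTEXT_PATH.

    Returns an empty dict when the variable is unset or not a directory.
    Missing files are skipped rather than treated as errors.
    """
    env = os.environ if env is None else env
    root_value = env.get("AI_CONTEXT_PATH", "")
    root = Path(root_value).expanduser()
    if not root_value or not root.is_dir():
        return {}
    loaded = {}
    for rel in relative_paths:
        path = root / rel
        if path.is_file():
            loaded[rel] = path.read_text(encoding="utf-8")
    return loaded


# Demo against a throwaway directory standing in for the cloned repo.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "core").mkdir()
    (Path(tmp) / "core" / "boundaries.md").write_text(
        "- Never push to upstream\n", encoding="utf-8")
    demo = load_context(["core/boundaries.md", "core/missing.md"],
                        env={"AI_CONTEXT_PATH": tmp})
    unset = load_context(["core/boundaries.md"], env={})
```

A tool without the variable set behaves exactly as it does today, so support could be added without changing defaults.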

GitHub Copilot: From Basics to AI Agents

May 15, 2026 · 22 min read

Watercolor illustration of a woodworker in blue meeting his first AI helper in green at a furniture workshop

Imagine a furniture workshop. You're the craftsperson in the blue shirt — the one with the vision, the taste, the final say. The helpers in green shirts? Those are your AI agents. At first there's just one, handing you the right chisel at the right moment. By the end of this journey, you'll have a whole crew in green building furniture to your specifications while you direct, decide, and review.

A year ago, I was tab-completing function signatures. Today, I manage a team of named AI agents that handle PR reviews, documentation sweeps, and infrastructure audits.

That sounds like a sales pitch. It's not. It's a progression that happened one level at a time, each building on the last. And the best part? You can start the same journey in about 15 minutes.

Here's the path I took — four levels, from "ooh that's cool" to "wait, this changes everything."

The TL;DR

Level	What Changes	Time to Value
1. First Day	You get an AI pair programmer (IDE + CLI)	15 minutes
2. Making It Yours	Copilot learns YOUR codebase (instructions, MCPs, skills)	1-2 hours
3. Squad	A team of agents working in concert	1 day
4. Autonomous Ops	Fully defined work executes itself	2-3 days

Each level builds on the previous one, and each is independently useful. Once you see what's possible at each stage, you'll want to keep climbing.

Badge legend: 🖥️ VS Code · ⌨️ CLI · 👤 Interactive · 🤖 Autonomous · 💻 Local · ☁️ Cloud · 🌐 GitHub.com

Level 1: Your First Day with Copilot

🖥️ VS Code · ⌨️ CLI · 👤 Interactive · 💻 Local

Watercolor illustration of a blue-shirted craftsperson at the workbench while a green-shirted helper steadies the joint

Your first day in the workshop. You're at the bench with your mallet (blue shirt), fitting a dovetail joint. Your one helper in green steadies the piece, hands you the right tool before you ask, and suggests a better angle — but you swing the mallet.

This is where everyone starts — and honestly, where most of the immediate productivity gains live. Level 1 spans two environments: Copilot in your IDE (VS Code, JetBrains, etc.) and the standalone Copilot CLI in your terminal.

In the IDE: Inline Completions & Inline Chat

🖥️ VS Code · 👤 Interactive · 💻 Local

Inline completions — the thing most people think of as "Copilot." You type, it suggests. But it's more than autocomplete. It reads your open files, your comments, your function signatures, and generates contextually aware suggestions. This happens directly in your editor as you type.

Inline chat — highlight code, press Ctrl+I, ask a question. "Explain this regex." "Refactor this to use async/await." "Add error handling." It edits in place within the current file.

In the IDE: Copilot Chat Panel

🖥️ VS Code · 👤 Interactive · 💻 Local

The Chat panel (Ctrl+Alt+I or the sidebar) opens a conversation with Copilot that has broader awareness:

Open file context — ask questions about the file you're looking at: "What does this function do?" "Find the bug in this logic."
@workspace — ask about the entire repository: "Where is authentication handled?" "Show me all API routes." Copilot searches across your project.
@terminal — get help with shell commands without leaving the IDE: "How do I find large files?" "What's the git command to squash commits?"
Agent mode — Copilot Chat also has an "agent" mode where it can make multi-step edits, run terminal commands, and iterate. This is powerful for IDE-based workflows, but note: this is different from the Squad "agents" discussed later. Agent mode is a single AI working iteratively; Squad agents are specialized team members working in concert.

The Standalone Copilot CLI

⌨️ CLI · 👤 Interactive · 💻 Local

The copilot command brings the full Copilot agent to your terminal — file editing, shell commands, sub-agents, and more:

# Non-interactive prompt mode:
copilot -p "extract a .tar.gz file preserving permissions"

# Ask about git:
copilot -p "undo my last commit but keep the changes"

# Start an interactive session:
copilot

The standalone CLI (copilot) is a full agent runtime — it can read/write files, run commands, and orchestrate complex tasks from your terminal. It's distinct from the IDE chat panel but equally powerful.

When to Use Each

Context	Best For
Inline completions	Flow-state coding, writing new functions
Inline chat (`Ctrl+I`)	Quick edits to selected code
Chat panel (open file)	Understanding code you're reading
Chat panel (@workspace)	Finding things across a project
Chat panel (@terminal)	Shell command help inside IDE
Agent mode (IDE)	Multi-step edits within a project
`copilot` CLI	Terminal-first workflows, scripting, automation

Try This Now

Install GitHub Copilot in VS Code
Open any project, start a new file, write a comment:

// Parse a CSV string into an array of objects using the first row as headers

Copilot will generate the implementation. Tab to accept.

Install the standalone Copilot CLI and try:

copilot -p "explain why this Node.js app leaks memory when processing large CSV uploads"

What I Learned at Level 1

The biggest gain wasn't the code generation — it was the velocity shift in unfamiliar territory. Working in a language I don't know well? Copilot bridges the gap between "I know what I want" and "I know the syntax." It turned 30-minute research into 30-second completions.

The limitation: Copilot at this level generates generic best-practice code. It knows nothing about your specific conventions or preferences. That leads to ...

Level 2: Making Copilot Yours

🖥️ VS Code · ⌨️ CLI · 👤 Interactive · 💻 Local

Watercolor illustration of a blue-shirted craftsperson alone, setting up custom jigs and labeled drawers

No green shirts in sight — this is setup time. You're alone at the bench, labeling drawers, building custom jigs, and pinning reference cards to the pegboard. You're not building furniture yet; you're building the system that makes your workshop uniquely yours. When the green-shirted helpers return, they'll know exactly where everything goes.

Level 1 Copilot is smart but generic. Level 2 is where it starts feeling like a teammate who's read your wiki. This level works in both the IDE and CLI — the same instruction files and MCP configs are picked up by Copilot Chat in the IDE and Copilot CLI.

Custom Instruction Files

Drop instruction files in your repo and Copilot learns your conventions:

.github/copilot-instructions.md — global instructions for all Copilot interactions:

# Project Conventions

- Use TypeScript strict mode with explicit return types
- Prefer `Result<T, Error>` pattern over throwing exceptions
- All API responses follow our envelope format: `{ data, error, meta }`
- Tests use vitest with the `describe/it` pattern
- Never use `any` — prefer `unknown` with type guards

AGENTS.md — agent instructions that can live anywhere in your repo. Unlike copilot-instructions.md (which must be in .github/), you can place multiple AGENTS.md files at different directory levels — the nearest one in the directory tree takes precedence. This makes it ideal for monorepos where each package needs its own agent behavior:

my-monorepo/
├── AGENTS.md              ← shared team-wide instructions
├── packages/
│   ├── frontend/
│   │   └── AGENTS.md      ← React-specific agent rules (wins here)
│   └── backend/
│       └── AGENTS.md      ← API-specific agent rules (wins here)

Every suggestion Copilot makes now respects these rules. No more "helpful" suggestions that violate your architecture.
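The "nearest file wins" rule for AGENTS.md can be illustrated with a short directory walk. This shows the idea only and is not how Copilot resolves files internally; `nearest_agents_file` is a hypothetical helper:

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def nearest_agents_file(start: Path, repo_root: Path) -> Path | None:
    """Walk up from `start` to `repo_root` and return the first AGENTS.md found."""
    start = start.resolve()
    repo_root = repo_root.resolve()
    current = start if start.is_dir() else start.parent
    while True:
        candidate = current / "AGENTS.md"
        if candidate.is_file():
            return candidate
        if current == repo_root or current == current.parent:
            return None  # reached the repo root (or filesystem root) without a match
        current = current.parent


# Demo: frontend has its own file, backend falls back to the shared one.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "packages" / "frontend" / "src").mkdir(parents=True)
    (root / "packages" / "backend").mkdir(parents=True)
    (root / "AGENTS.md").write_text("shared", encoding="utf-8")
    (root / "packages" / "frontend" / "AGENTS.md").write_text("frontend", encoding="utf-8")
    frontend_hit = nearest_agents_file(root / "packages" / "frontend" / "src", root).read_text(encoding="utf-8")
    backend_hit = nearest_agents_file(root / "packages" / "backend", root).read_text(encoding="utf-8")
```

In the demo, work under `packages/frontend` picks up the frontend rules, while `packages/backend` has no file of its own and inherits the shared one.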

MCP Servers: Giving Copilot New Abilities

Model Context Protocol (MCP) servers let you plug external data sources and tools into Copilot's context. Think of them as APIs that Copilot can call mid-conversation — in both the IDE and CLI.

// .copilot/mcp.json
{
  "mcpServers": {
    "azure": {
      "command": "npx",
      "args": ["-y", "@azure/mcp@latest", "server", "start"]
    }
  }
}

Now Copilot can query your Azure resources, check deployment status, or read your database schema — all within the conversation.

Some MCP servers I use daily:

Copilot for Azure — query Azure resources, check deployments
GitHub MCP — deep repo operations beyond what's built-in
Microsoft Learn MCP — search and fetch official Microsoft documentation from within the conversation

Skills: Repeatable, Deterministic Work

Skills are the underrated powerhouse of Level 2. A skill is a directory with a SKILL.md file that defines a repeatable pattern — including deterministic steps from scripts and code.

.<directory>/skills/
├── pr-review/
│   └── SKILL.md        # "Run lint, check test coverage, review diff"
├── doc-sync/
│   └── SKILL.md        # "Compare API surface to docs, flag drift"
└── sdk-sample-check/
    └── SKILL.md        # "Validate all samples compile and match SDK version"

Read the VS Code documentation for the best directory location for your skills.

Skills differ from instructions in that they define executable workflows — not just preferences. A skill can include shell commands to run, files to check, and decision trees to follow. They're reusable across sessions and agents.

Why skills matter:

Repeatable — same process every time, no drift
Composable — skills can reference other skills
Deterministic where needed — embed scripts and validation steps that always run the same way
Shareable — check them into your repo, the whole team benefits

Try This Now

Create .github/copilot-instructions.md with your project's conventions
Add an MCP server for a tool you use daily (Azure, database, etc.)
Create a .github/skills/quick-review/SKILL.md that describes your code review checklist

Then open Copilot Chat or run copilot and notice the difference — it follows YOUR patterns now.

What I Learned at Level 2

Custom instructions are absurdly high-leverage. A 50-line markdown file eliminates 80% of the "no, not like that" moments. MCP servers bridge "Copilot that knows code" and "Copilot that knows your infrastructure." Skills turn tribal knowledge into executable processes.

The limitation: everything is still per-session. Copilot doesn't automatically carry context between sessions — it won't remember decisions from yesterday's refactor. It doesn't have persistent context about your project's evolving state. It doesn't coordinate with other instances of itself.

Enter Squad.

Level 3: Squad — A Team Working in Concert

🖥️ VS Code · ⌨️ CLI · 👤 Interactive · 💻 Local

Watercolor illustration of craftspeople collaborating at a shared workbench in a woodworking shop

The workshop is getting busy. You're at the bench, studying the blueprint. Around you, a small team of helpers is assembling a cabinet together — one holds the frame, another drives the dowels, another checks the level. Each knows their role. Each stays in their lane. The work moves faster because they each know their job and coordinate with each other, not just with you.

This is where the mental model shifts from "AI assistant" to "AI team."

Squad gives you a team of specialized agents working in concert to complete tasks, where each member's expertise and boundaries positively impact the result. It's not just "named agents with memory" — it's a coordinated system where the reviewer's standards shape the coder's output, and the docs writer's perspective catches gaps the implementer missed.

Squad runs on the Copilot CLI (copilot --agent squad) and adds the organizational layer that makes agents feel like a real team rather than a single assistant wearing different hats.

What Makes Squad Different

Feature	Regular Copilot	Squad
Memory	Session-based	Persistent across sessions
Identity	Generic assistant	Named agents with charters
Coordination	You manage context	Agents hand off to each other
Specialization	Same agent for everything	Domain-specific agents with boundaries
Result quality	One perspective	Diverse perspectives improve output

Installing Squad

# Install Squad CLI
npm install -g @bradygaster/squad-cli

# Initialize in your project
npx @bradygaster/squad-cli init

# Start Copilot with Squad (standalone CLI)
copilot --agent squad

This scaffolds a .squad/ directory:

.squad/
├── agents/
│   ├── ralph/           # Orchestrator
│   │   └── charter.md
│   ├── reviewer/
│   │   └── charter.md
│   └── docs-writer/
│       └── charter.md
├── ceremonies/
│   └── sweep.md
└── memory/
    └── decisions.md
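
One way to sanity-check the scaffold is a small script that walks `.squad/agents/` and reports any agent directory that lacks a `charter.md`. This is a minimal sketch, not part of the Squad CLI: the helper name `find_agents_missing_charter` is hypothetical, and it assumes only the directory layout shown above.

```python
from pathlib import Path


def find_agents_missing_charter(squad_dir: Path) -> list[str]:
    """Return names of agent directories under squad_dir/agents without a charter.md.

    Hypothetical helper: it relies only on the layout shown above
    (.squad/agents/<name>/charter.md) and is not part of the Squad CLI.
    """
    agents_dir = squad_dir / "agents"
    if not agents_dir.is_dir():
        return []
    missing = []
    for agent_dir in sorted(agents_dir.iterdir()):
        if agent_dir.is_dir() and not (agent_dir / "charter.md").is_file():
            missing.append(agent_dir.name)
    return missing
```

Calling `find_agents_missing_charter(Path(".squad"))` from the project root returns an empty list when every agent has a charter.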

Agent Charters: Expertise + Boundaries

Each agent has a charter, a markdown file that defines who they are, what they do, and, critically, what they won't do:

# Reviewer Agent Charter

## Identity
You are the code reviewer for this project. You focus on:
- Security vulnerabilities
- Performance anti-patterns
- Consistency with project conventions

## What I Own
- TypeScript files and build system

## Boundaries
- Never approve your own changes
- Escalate architectural concerns to the team lead
- Don't refactor code that isn't in the PR scope

The boundaries matter as much as the expertise. A reviewer that knows when to escalate produces better outcomes than one that tries to handle everything. The interplay between agents — where one's output becomes another's input — is what makes Squad feel like a team rather than parallel solo workers.

Ceremonies: On-Demand Structured Workflows

Ceremonies are repeatable workflows you trigger when needed:

# Ceremonies & Rituals

## Design Review

**When:** Before PRD implementation begins  
**Who:** <list of named agents>
**Purpose:** Validate requirements, issue templates, and process flow before work starts

## Retrospectives

**When:** After major deliveries (GitHub Projects setup, issue templates, Actions automation)  
**Who:** All team members  
**Facilitator:** <single agent name> 
**Purpose:** Reflect on what worked, what didn't, continuous improvement

## Cross-Repo Sync

**When:** As needed  
**Owner:** <single agent name>
**Purpose:** Ensure coordination across all projects (reads repos.json for scope)

Ceremonies encode your team's best practices into executable workflows that any agent can run consistently.

Try This Now

With Squad running in an interactive Copilot CLI session, assign work to the team.

# Then talk to the team:
"Team, fan out and review this PR for security issues"
"Ralph go"

What I Learned at Level 3

The agents and charter system is what makes Squad click. With it, you have agents that maintain consistent behavior, remember decisions, and build expertise over time. Without it, you have "Copilot with extra steps."

The real insight: diversity, expertise, coordination, and boundaries create quality. When the reviewer can't approve its own work, when the docs writer must verify against actual code, when the security agent escalates instead of guessing, the team produces better results than any single agent could alone.

The honest trade-off: Squad requires investment in codifying your work patterns and practices. A poorly-defined agent is worse than no agent because it gives inconsistent results. Spend the time upfront.

Level 4: Autonomous Operations

🖥️ VS Code · ⌨️ CLI · 🤖 Autonomous · 💻 Local · ☁️ Cloud

Watercolor illustration of workers building furniture at separate workbenches in a woodworking shop

The helpers are working independently across the shop — each at their own bench, each building a different piece from your specifications. One saws, one hammers, one planes. You glance across the room and trust the work because you wrote clear blueprints for your team of experts. They don't need you hovering.

Level 4 is where the work has been fully defined and you just need it completed. You've already figured out what needs to happen — now you hand it off and let the system execute.

This is the difference between "AI that helps me work" and "AI that does the work I've specified."

Five Ways to Run Autonomously

1. VS Code Agent Mode

🖥️ VS Code · 🤖 Autonomous · 💻 Local

In VS Code, Copilot's agent mode executes multi-step tasks — reading files, running commands, editing code — without manual intervention. You describe the outcome, and agent mode figures out the steps:

# In VS Code Copilot Chat (agent mode):
"Refactor all API handlers to use the new error envelope format"

Agent mode uses your custom instructions and MCPs from Level 2, so it already knows your project's conventions. Best for: well-scoped tasks while you're in the IDE.

2. Copilot CLI Agent Mode

⌨️ CLI · 🤖 Autonomous · 💻 Local

The standalone CLI provides the same autonomous execution outside VS Code:

# CLI agent mode — executes the full task autonomously
copilot -p "Refactor all API handlers to use the new error envelope format"

Best for: well-scoped tasks from the terminal, scripted workflows, or when you prefer the command line over the IDE.

3. "Ralph, go" — Squad Work Queue (In-Session)

⌨️ CLI · 🤖 Autonomous · 💻 Local

Ralph, the Squad work monitor, processes your entire work queue autonomously within a Copilot session. First, connect Squad to your repo's issues ("pull issues from owner/repo"). Then Ralph triages those issues, assigns work to the right specialist agents, monitors progress, and keeps going until the board is clear:

# In a Copilot CLI session with Squad:
copilot --agent squad

# Then:
"Ralph, go"          # → Starts processing the work queue
"Ralph, status"      # → Shows what's open, stalled, or ready to merge

Ralph monitors GitHub issues, triages incoming work, and drives tasks through your agent team without you intervening. It doesn't stop between tasks — it keeps cycling until everything is done.

Best for: in-session work queue processing, multi-agent coordination, and batching related tasks.

4. Squad Watch — Persistent Local Monitoring

⌨️ CLI · 🤖 Autonomous · 💻 Local

When you're away from the keyboard but your machine is on, squad watch provides persistent polling of your GitHub issues:

# Polls every 10 minutes (default)
npx @bradygaster/squad-cli watch

# Custom intervals
npx @bradygaster/squad-cli watch --interval 5    # every 5 minutes
npx @bradygaster/squad-cli watch --interval 30   # every 30 minutes

This runs as a standalone local process (not inside Copilot) that auto-triages issues from your connected repo, assigns work based on team roles and keywords, and routes issues to agents or @copilot for pickup. It runs until you Ctrl+C. (Requires the same repo connection set up via Squad.)

Best for: overnight monitoring, catching issues while you're in meetings, and persistent triage between active sessions.
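
The keyword-based assignment described above can be pictured as a simple lookup. The sketch below is illustrative only and is not Squad's actual routing logic: the `Issue` type, the `ROUTES` table, and the keyword lists are all hypothetical, with agent names borrowed from the scaffold shown earlier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Issue:
    number: int
    title: str


# Hypothetical routing table: agent name -> keywords that suggest its domain.
ROUTES = {
    "reviewer": ("security", "vulnerability", "performance"),
    "docs-writer": ("docs", "readme", "typo"),
}
FALLBACK = "ralph"  # unmatched issues go back to the orchestrator


def route(issue: Issue) -> str:
    """Pick the first agent whose keywords appear in the issue title."""
    title = issue.title.lower()
    for agent, keywords in ROUTES.items():
        if any(keyword in title for keyword in keywords):
            return agent
    return FALLBACK
```

A watcher would call `route` on each newly polled issue. Sending unmatched issues to a fallback keeps the queue from silently dropping work.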

5. Copilot Cloud Agent (GitHub Issues)

☁️ Cloud · 🤖 Autonomous · 🌐 GitHub.com

Assign a GitHub issue to Copilot and it works independently — no terminal open, no local setup. The cloud agent runs in a GitHub Actions-powered ephemeral environment: it researches your repo, creates a plan, makes code changes, and opens a PR.

Trigger it from a GitHub issue comment:

<!-- In a GitHub issue comment: -->
@copilot implement this

Or assign the issue to Copilot directly from the GitHub Issues UI, VS Code, JetBrains, or the GitHub CLI.

The cloud agent works best for well-scoped, clearly described issues: "add a new endpoint that follows the existing pattern," "write tests for this module," "update the config schema to support the new field." Think of this as a single async task you hand off — describe the outcome clearly, and come back to a PR.

It uses GitHub Actions minutes and Copilot premium requests, so you're trading compute for time. No local session required; the work happens entirely on GitHub's infrastructure.

When to Use Each

Approach	Best For	Runs On	Requires Active Session?
VS Code agent mode	IDE-scoped tasks	Your machine	Yes
`copilot` CLI	Terminal-scoped tasks	Your machine	Yes
"Ralph, go"	Work queue + coordination	Your machine (Squad)	Yes
`squad watch`	Persistent monitoring	Your machine (background)	No — standalone process
Copilot cloud agent	Issue-driven implementation	GitHub's cloud	No — fully async

The Autonomy Spectrum

These options form a spectrum from "I'm here watching" to "I'm asleep":

You're present              You're away              You're asleep
─────────────────────────────────────────────────────────────────
VS Code agent mode    →    squad watch        →    Copilot cloud agent
Copilot CLI           →    (machine on)       →    (GitHub cloud)
Ralph, go                                         
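
Read as a decision rule, the spectrum looks roughly like the sketch below. It is a simplification under stated assumptions: the function name and parameters are hypothetical, and in practice the cloud agent is also available while you are at the keyboard.

```python
def pick_approach(at_keyboard: bool, machine_on: bool, needs_team: bool = False) -> str:
    """Map availability to one of the approaches on the spectrum above."""
    if at_keyboard:
        # Active session: a single agent, or Ralph when several agents must coordinate.
        return '"Ralph, go"' if needs_team else "VS Code agent mode or Copilot CLI"
    if machine_on:
        # Nobody is in a session, but a local process can keep polling.
        return "squad watch"
    # Nothing local is running, so the work has to happen on GitHub.
    return "Copilot cloud agent"
```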

What I Learned at Level 4

The key insight: autonomous execution requires fully-defined work. The quality of autonomous output is directly proportional to how clearly the task was specified. Vague issues get vague results. A well-written issue with acceptance criteria, examples, and constraints? That's where autonomous execution shines.

The cloud agent on GitHub is the lowest-friction option — no local setup, just assign an issue. squad watch bridges the gap between active sessions and cloud — your machine monitors and triages even when you're not in a Copilot session. Ralph is best when you're actively working through a backlog and want coordinated multi-agent execution.

Finding Your Path

Not everyone takes the same route through these levels:

Role	Start Here	Quick Win	Level Up
Engineer	Level 1 (completions + CLI)	Custom instructions for your stack	Skills for repeatable reviews
PM/Content	Level 1 (chat for drafting)	Custom instructions for voice/style	Squad ceremonies for sweeps
Team Lead	Level 2 (instructions + MCPs)	Skills for team processes	Squad for coordinated reviews
Platform	Level 2 (MCP + infra context)	Squad for monitoring	Squad for always-on monitoring

The Ecosystem at a Glance

The Copilot ecosystem is growing fast. Here are the key resources:

Essential Tools

GitHub Copilot Extension — the IDE extension (VS Code, JetBrains, etc.)
Copilot CLI — standalone copilot command for terminal
Squad CLI — named agents working in concert (npm i -g @bradygaster/squad-cli)

Learning & Community

Agentic SDLC Handbook — patterns for AI-first development
Copilot Insights — measure your Copilot usage
Awesome Copilot — community-curated extensions, skills, and tools (repo)

Infrastructure

Microsoft MCP — Model Context Protocol servers
Copilot for Azure — Azure resource context in Copilot

What Actually Changed for Me

I want to be honest about what's different after six months at Level 3+:

What improved:

PR turnaround dropped from days to hours (the green shirts handle first-pass review)
Documentation stays in sync with code (sweep ceremonies catch drift)
I work in unfamiliar codebases with dramatically less ramp-up time
Boilerplate tasks that used to take 30 minutes take 2 minutes
Skills encode my best practices — I define a process once, it runs the same way forever

What didn't change:

Architecture decisions still require human judgment
Debugging subtle logic errors still requires deep thought
Agent output needs review — trust but verify
Writing good charters and instructions is a skill that takes time to develop, update, and improve

The mental model shift: I stopped thinking "what code do I need to write?" and started thinking "what work needs to happen, and who should do it?" Sometimes the answer is me — blue shirt at the bench, swinging the mallet. Often it's a green shirt with clear instructions and a well-scoped task.

Start Today

You don't need to plan all four levels. Start where you are:

Never used Copilot? → Install the extension, write a comment, press Tab. That's it.

Using Copilot but it's generic? → Write a copilot-instructions.md file and one skill. 10 minutes, massive payoff.

Want more than autocomplete? → Install Squad CLI, write one agent charter, run copilot --agent squad.

Ready for autonomous execution? → Try copilot --autopilot on a well-defined task, or assign an issue to the cloud agent.

The progression is natural. Each level solves a real problem you'll discover at the previous one. And unlike most "AI transformation" pitches, you can validate the value at every step before investing in the next.

The future of development isn't AI replacing developers. It's developers who know how to orchestrate AI systems outperforming those who don't. The tools are here. The ecosystem is open source. The only question is which level you start at.

Want to go further? The next post in this series covers Cloud-Scale Agent Fleets for Level 5 — coming soon.

📣 GitHub Copilot Dev Days — Next Week!

Want to go deeper? GitHub Copilot Dev Days are happening next week with sessions in multiple languages and time zones:

🗓️ May 25, 2026 at 7 PM (BRT) — GitHub Copilot Dev Days Brazil [Portuguese]
🗓️ May 26, 2026 at 12 PM (CDMX) — GitHub Copilot Dev Days LATAM [Spanish]
🗓️ May 26, 2026 at 7:30 PM (CST) — GitHub Copilot Dev Days 中文版 [Simplified Chinese]
🗓️ May 27, 2026 at 9 AM (PST) — GitHub Copilot Dev Days [English]

These are free, virtual events covering the latest in Copilot extensibility, agentic development, and the ecosystem tools discussed in this post. See you there!

Have questions or want to share your own journey? Find me on GitHub at @dfberry or check out my other posts on the Copilot ecosystem.

Copilot CLI Context Window: How I Cut Token Usage from 52% to 13%

May 6, 2026 · 11 min read

I'll be honest: I started this whole investigation backwards. I had 117 skills consuming 413K tokens on disk and assumed that was the problem. I spent two hours optimizing them before I thought to actually measure what was in my context window. Turns out, skills are on-demand — they never touch the context window at all.

The biggest consumer was something I never would have guessed: a single plugin loading ~27K tokens of tool definitions into every message. This is the story of how I found it, scoped it down, and — importantly — how you can configure it to match your workflow without losing functionality.

What makes this different? There are already several great articles about MCP context optimization (devbolt.dev, The New Stack, StackOne, blog.pamelafox.org). This one adds: real measured token numbers from a production setup, the /context command as a diagnostic tool, the Azure MCP namespace scoping solution, and the Squad orchestration angle.

Step 1: Measure First — Check Your Token Breakdown

I run GitHub Copilot CLI with a multi-agent orchestration setup — half a dozen MCP servers, several plugins, and 117 skills. Mid-session, I got curious about what my context window actually looked like and ran /context:

Before optimization: /context showing 52% usage and compaction to 40%

Context Usage
  claude-opus-4.6 · 104k/200k tokens (52%)

  System/Tools:  62.5k (31%)
  Messages:      41.8k (21%)
  Free Space:    55.3k (28%)
  Buffer:        40.4k (20%)

The session was at 52% of the window, and the System/Tools bucket alone was 62.5K tokens: 31% of my 200K window, consumed before I typed a single message. That's the baseline cost of my setup: agent instructions, MCP tool definitions, system prompt, memories.

With only 28% free space, complex multi-agent tasks would trigger compaction mid-session. I needed to find what was actually consuming those 62.5K tokens — and the only way to know for sure was to audit what's always-loaded vs. what lives on disk.
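
To make the arithmetic explicit, here is a minimal sketch that recomputes the percentages from the token counts in the `/context` output above. The numbers are copied from that output. The `share` helper is hypothetical and just rounds to whole percentages, as the display appears to.

```python
WINDOW = 200_000  # context window size reported by /context

# Token counts copied from the /context output above.
buckets = {
    "System/Tools": 62_500,
    "Messages": 41_800,
    "Free Space": 55_300,
    "Buffer": 40_400,
}


def share(tokens: int, window: int = WINDOW) -> int:
    """Return tokens as a whole-number percentage of the window."""
    return round(100 * tokens / window)


used = buckets["System/Tools"] + buckets["Messages"]
for name, tokens in buckets.items():
    print(f"{name:<13} {tokens / 1000:>5.1f}k ({share(tokens)}%)")
print(f"Used: {used / 1000:.1f}k ({share(used)}%)")
```

The four buckets add up to the full 200K window, and System/Tools plus Messages accounts for the 52% headline figure.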

Step 2: Distinguish Always-Loaded from On-Demand

The first question to ask is not "what's biggest?" but "what's always in context?"

The System/Tools bucket contains everything that loads on every message — unconditionally. If I can reduce that, every operation gets cheaper. Optimizing anything else only helps specific operations.

I built a breakdown:

Consumer	~Tokens	When Loaded	Controllable?
MCP/Plugin tool definitions	~27K+	Every message	✅ Scope or remove
Agent instructions	~20K	Every message	✅ Slim it down
System prompt + memories	~10K	Every message	Partial
Skills	~143K on disk	On-demand only	Can optimize, but won't help context
Conversation history	Growing	Accumulates	Fresh sessions help

Key insight: Skills sit on disk until an agent explicitly requests one. They are never in the context window. Optimizing them makes individual agent spawns cheaper and faster — valuable for performance — but they don't contribute to System/Tools at all. (I learned this after spending two hours optimizing them. Do as I say, not as I did.)
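
The always-loaded versus on-demand split can be sketched as a filter over the breakdown table. The figures below are the approximate values from that table. The `Consumer` type and `baseline_tokens` helper are hypothetical names used only for illustration.

```python
from typing import NamedTuple


class Consumer(NamedTuple):
    name: str
    approx_tokens: int
    always_loaded: bool


# Approximate figures from the breakdown table above.
consumers = [
    Consumer("MCP/Plugin tool definitions", 27_000, True),
    Consumer("Agent instructions", 20_000, True),
    Consumer("System prompt + memories", 10_000, True),
    Consumer("Skills (on disk)", 143_000, False),
]


def baseline_tokens(items: list[Consumer]) -> int:
    """Sum only the consumers that load on every message."""
    return sum(c.approx_tokens for c in items if c.always_loaded)


print(f"Estimated baseline: ~{baseline_tokens(consumers) // 1000}K tokens")
```

The estimate comes to roughly 57K, in the same range as the measured 62.5K. The skills row is the largest number in the table and contributes nothing to the baseline.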

Context consumers breakdown: MCP tools 30-40K, agent instructions 20K, skills on-demand

Step 3: Audit What's Always-Loaded

The mystery is: what's in that 62.5K System/Tools bucket?

MCP Tool Definitions (~6–10K tokens)

MCP servers inject their tool schemas into every message. I had:

GitHub MCP — ~15 tools (issues, PRs, code search, actions)
Mail MCP — ~20 tools (search, send, reply, forward, attachments)
PowerBI MCP — ~6 tools (execute query, generate query, get schema)
M365 Agents Toolkit — ~4 tools (knowledge, snippets, schema, troubleshoot)
IDE — ~2 tools (diagnostics, selection)

These are real — about 47–55 tools across all servers. But they're only ~6–10K tokens total. Where's the other 50K?

The Azure Plugin (~27K tokens) — The Biggest Consumer

I checked ~/.copilot/settings.json and found the Azure plugin enabled:

Plugin	Source	Impact
azure	microsoft/azure-skills	50+ tools, ~27K tokens

Here's the thing about the Azure MCP Server: it's comprehensive. Version 3.0.0-beta.6 has 259 tools across 56 namespaces — covering everything from ACR to Virtual Desktop to Well-Architected Framework. That breadth is genuinely impressive, and the team clearly designed it to be a one-stop shop for Azure developers.

The good news: the team also thought carefully about how developers actually work. They built in namespace scoping and mode selection so you don't have to load the entire surface area. In its default "namespace" mode, it groups tools by service — but if you're only using a few services, you can filter down to just those. More on that in a moment.

In my case, the default configuration was loading 50+ tool schemas into every message — even when I wasn't doing Azure work in that session. Not a bug, just a configuration I hadn't tuned yet.

Azure plugin details: 4 plugins consuming context, 50+ tools at 30-40K tokens

Agent Instructions (~20K tokens)

My agent governance file — .github/copilot-instructions.md at the repo root — is 80KB. It loads on every turn. This is the ongoing cost of a sophisticated agent setup: the orchestration rules are comprehensive, and they load unconditionally whether I need them or not.
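
The jump from 80KB to roughly 20K tokens follows from a common rule of thumb of about four bytes per token for English text. The sketch below applies that ratio. The ratio is an assumption rather than a tokenizer result, and the helper name is hypothetical.

```python
BYTES_PER_TOKEN = 4  # rough rule of thumb for English text; an assumption, not a tokenizer


def estimate_tokens(size_bytes: int, bytes_per_token: int = BYTES_PER_TOKEN) -> int:
    """Estimate a file's token count from its size in bytes."""
    return size_bytes // bytes_per_token


print(estimate_tokens(80 * 1024))  # an 80KB instructions file
```

For a real file, pass `Path(".github/copilot-instructions.md").stat().st_size`. The estimate is good enough to decide whether a file deserves a closer look, and `/context` remains the authoritative measurement.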

Step 4: Scope the Azure Plugin to Match How You Work

Once I understood the breakdown, the fix was straightforward. The Azure MCP team built exactly the right lever for this — namespace scoping lets you declare which services matter for your project and ignore the rest. No functionality lost, just a tighter fit.

Option A: Disable Entirely (Full removal)

If you genuinely don't use Azure, just turn it off:

// ~/.copilot/settings.json
"azure@azure-skills": false

This is what I did initially — it dropped System/Tools from 62.5K → 35.2K, freeing ~27K tokens instantly.

Azure plugin disabled: azure@azure-skills set to false

Option B: Namespace Scoping (Surgical — Recommended)

This is where the Azure MCP Server's design really shines. The team built namespace filtering specifically for this use case — you declare the services relevant to your project, and only those tool schemas load into context.

Configure it in your MCP settings with the --namespace flag:

--namespace appservice --namespace cosmos --namespace keyvault --namespace storage

This gives you 4 namespaces (~24 tools) instead of 56 namespaces (~259 tools) — a significant reduction in context usage while keeping the Azure tools you actually use.

Azure MCP Modes

The server supports 4 modes that control how tools are exposed:

Mode	Behavior	Best For
namespace (default)	One tool per service namespace	Copilot — good balance
consolidated	Groups operations by user intent	Natural language workflows
single	One routing tool for everything	Maximum simplicity
all	Every operation as a separate tool (259!)	Maximum granularity — high context cost

Pick Your Stack

Here's a quick reference for common developer personas:

If you work with...	Namespaces to keep
Web apps	`appservice`, `cosmos`, `keyvault`, `storage`, `functions`
Data/Analytics	`cosmos`, `sql`, `kusto`, `eventhubs`, `storage`
DevOps/Infra	`compute`, `aks`, `azureterraform`, `deploy`, `monitor`
AI/ML	`foundryextensions`, `search`, `speech`, `applicationinsights`
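
If you switch between these stacks, the flag list can be generated instead of typed. This is a minimal sketch: the namespace lists are copied from the table above, while the short persona keys (`web`, `data`, `devops`, `ai`) and the `namespace_args` helper are hypothetical.

```python
# Persona-to-namespace mapping copied from the table above.
PERSONAS = {
    "web": ["appservice", "cosmos", "keyvault", "storage", "functions"],
    "data": ["cosmos", "sql", "kusto", "eventhubs", "storage"],
    "devops": ["compute", "aks", "azureterraform", "deploy", "monitor"],
    "ai": ["foundryextensions", "search", "speech", "applicationinsights"],
}


def namespace_args(*personas: str) -> list[str]:
    """Build a de-duplicated --namespace argument list for one or more personas."""
    seen: dict[str, None] = {}  # dict keeps first-seen order and drops repeats
    for persona in personas:
        for namespace in PERSONAS[persona]:
            seen.setdefault(namespace)
    args: list[str] = []
    for namespace in seen:
        args += ["--namespace", namespace]
    return args


print(" ".join(namespace_args("web", "data")))
```

Combining personas keeps shared namespaces such as `cosmos` and `storage` from appearing twice.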

All 56 Namespaces (Reference)

For the curious, here's the full list with tool counts. Use this to build your own --namespace filter:

Namespace	Tools	Namespace	Tools	Namespace	Tools
acr	2	advisor	1	aks	2
appconfig	5	applens	1	applicationinsights	1
appservice	7	azurebackup	16	azuremigrate	2
azureterraform	10	azureterraformbestpractices	1	bicepschema	1
cloudarchitect	1	communication	2	compute	12
confidentialledger	2	containerapps	1	cosmos	2
datadog	1	deploy	5	deviceregistry	1
eventgrid	3	eventhubs	9	extension	3
fileshares	14	foundryextensions	7	functionapp	1
functions	3	grafana	1	group	2
keyvault	8	kusto	7	loadtesting	6
managedlustre	18	marketplace	2	monitor	16
mysql	6	policy	1	postgres	6
pricing	1	quota	2	redis	2
resourcehealth	2	role	1	search	6
servicebus	3	servicefabric	2	signalr	1
speech	2	sql	13	storage	7
storagesync	18	subscription	1	virtualdesktop	3
wellarchitectedframework	1	workbooks	5
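
To see what a given filter costs before you commit to it, the table can be turned into a lookup. The counts below are copied from the reference table above. The `tools_loaded` helper is a hypothetical name, and it counts tools, not tokens.

```python
# Tool counts per namespace, copied from the reference table above.
TOOL_COUNTS = {
    "acr": 2, "advisor": 1, "aks": 2,
    "appconfig": 5, "applens": 1, "applicationinsights": 1,
    "appservice": 7, "azurebackup": 16, "azuremigrate": 2,
    "azureterraform": 10, "azureterraformbestpractices": 1, "bicepschema": 1,
    "cloudarchitect": 1, "communication": 2, "compute": 12,
    "confidentialledger": 2, "containerapps": 1, "cosmos": 2,
    "datadog": 1, "deploy": 5, "deviceregistry": 1,
    "eventgrid": 3, "eventhubs": 9, "extension": 3,
    "fileshares": 14, "foundryextensions": 7, "functionapp": 1,
    "functions": 3, "grafana": 1, "group": 2,
    "keyvault": 8, "kusto": 7, "loadtesting": 6,
    "managedlustre": 18, "marketplace": 2, "monitor": 16,
    "mysql": 6, "policy": 1, "postgres": 6,
    "pricing": 1, "quota": 2, "redis": 2,
    "resourcehealth": 2, "role": 1, "search": 6,
    "servicebus": 3, "servicefabric": 2, "signalr": 1,
    "speech": 2, "sql": 13, "storage": 7,
    "storagesync": 18, "subscription": 1, "virtualdesktop": 3,
    "wellarchitectedframework": 1, "workbooks": 5,
}


def tools_loaded(namespaces: list[str]) -> int:
    """Total tools behind the given namespaces; raises KeyError on an unknown name."""
    return sum(TOOL_COUNTS[name] for name in namespaces)


selected = ["appservice", "cosmos", "keyvault", "storage"]
print(f"{tools_loaded(selected)} of {sum(TOOL_COUNTS.values())} tools")
```

The `KeyError` on an unknown name doubles as a typo check for namespace lists you maintain by hand.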

VS Code Users

You can also scope Azure MCP visually: click the gear icon next to the chat panel → select/deselect at the server, namespace, or individual tool level. No config files needed.

Other Filtering Options

Individual tools: --tool azmcp_storage_account_get --tool azmcp_cosmos_query for surgical precision
Combine namespace + tool filters for maximum control

Step 5: Then Optimize On-Demand Content (Optional)

Now that the always-loaded problem was solved, it was the right time to optimize skills — not because they consume context window (they don't), but because they improve individual agent spawn performance.

I spent two hours optimizing 117 Copilot CLI skills — reducing them from 413K to 143K tokens on disk, a 65% reduction. The process used waza_tokens to find bloated skills and patterns like reference extraction and checklist compression.

This didn't move the System/Tools percentage. But it made agent spawns faster and cheaper to run. Both wins are real — you just optimize them for different reasons.

Step 6: Measure Results

Fresh session after Azure disabled: context at 35K/200K (18%)

After disabling the Azure plugin (Option A):

System/Tools:  35.2k (18%)
Total usage:   ~70k/200k (35%)
Free Space:    ~90k (45%)

After upgrading the agent coordinator file:

System/Tools:  25.5k (13%)
Total usage:   26k/200k (13%)
Free Space:    134.1k (67%)

The remaining ~10K drop from 35.2K → 25.5K came from upgrading my agent coordinator file — the new version replaced the old 80KB governance prompt with a leaner one. Skill optimization (270K saved on disk) didn't affect this number because skills are on-demand and never in the context window.

Final state: context at 26K/200K (13%), 67% free space

The Scorecard

Action	Tokens Freed	Effort	Context Impact
Scope Azure plugin	~27K	Config change	Significant — always loaded
Upgrade agent coordinator file	~10K	1 command	Significant — always loaded
Optimize 117 skills	~270K on disk	2 hours, 106 files	Zero on context — but faster agent spawns

System/Tools went from 62.5K → 25.5K. Free space went from 28% → 67%. That's 2.4x more room for actual work.
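
The headline numbers can be checked with a few lines of arithmetic. The sketch below uses only figures reported in this post. The variable names are illustrative.

```python
# Figures reported by /context before and after the changes.
before = {"system_tools": 62_500, "free_pct": 28}
after = {"system_tools": 25_500, "free_pct": 67}

freed = before["system_tools"] - after["system_tools"]
itemized = 27_000 + 10_000  # Azure plugin + coordinator file, from the scorecard
room_ratio = after["free_pct"] / before["free_pct"]

print(f"System/Tools freed: {freed / 1000:.1f}k tokens (scorecard: ~{itemized // 1000}K)")
print(f"Free space grew {room_ratio:.1f}x")
```

The two always-loaded rows of the scorecard account for the whole drop in System/Tools, which is why the skills row shows zero context impact.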

The counterintuitive lesson: the biggest token savings came from the smallest changes, and I only found them by measuring instead of guessing.

Why Measurement First Matters

Most people (including me, initially) assume the biggest files on disk must be the problem. It's intuitive. It's wrong.

Skills: 143K on disk → 0K in context. Azure plugin: 50+ tools → ~27K in context every message.

Without checking /context, I would have spent all my time optimizing the wrong thing. I did optimize skills first (and it was worthwhile for other reasons), but the crucial discovery was always-loaded vs. on-demand. I'm reframing my mistake as a teaching moment: measure first, then optimize.

Quick Diagnostic Guide

This is the methodology. Use it whenever context runs tight:

MCP config file layers: user vs repo level, what you can control

1. Run /context: see your actual breakdown
2. Check plugins in ~/.copilot/settings.json: scope or disable unused ones (biggest wins are usually here)
3. Scope your MCPs: use namespace filtering, tool filtering, or mode selection to load only what you need
4. Check MCP servers in ~/.copilot/mcp-config.json and .copilot/mcp-config.json: remove servers you don't use daily
5. Check agent instructions: if you use custom agent governance files, they load every turn
6. Skills are usually fine: they're on-demand, not always-loaded
7. Start fresh sessions: conversation history accumulates; don't run marathon sessions

The biggest wins are almost always in steps 2–3. Scoping one plugin can save more context than hours of file optimization.
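
For the MCP server check, a short script can list what each config file declares. This is a sketch under one assumption that is not confirmed in this post: that servers sit under a top-level `mcpServers` object in the JSON file. The `configured_servers` helper is a hypothetical name.

```python
import json
from pathlib import Path


def configured_servers(config_path: Path) -> list[str]:
    """Return sorted server names from one MCP config file, or [] if it is missing."""
    if not config_path.is_file():
        return []
    data = json.loads(config_path.read_text(encoding="utf-8"))
    # Assumption: servers live under a top-level "mcpServers" object.
    return sorted(data.get("mcpServers", {}))
```

Call it once with `Path.home() / ".copilot" / "mcp-config.json"` and once with the repo-level `Path(".copilot") / "mcp-config.json"`, then compare the two lists against the servers you actually use day to day.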

What About Hooks?

One thing I haven't tested yet: Copilot hooks (commit hooks, pre-push hooks, custom event hooks). These are lightweight by design — they're shell scripts or short instructions, not loaded into the context window the way MCP tool definitions are. They fire on specific events rather than sitting in the always-loaded bucket.

That said, if you have hooks that reference large config files or trigger MCP calls, those downstream effects could impact context during execution. Worth running /context before and after adding hooks to verify. My expectation is minimal impact, but I'll update this post once I've measured it directly.

The Setup

GitHub Copilot CLI v1.0.40
Squad v0.9.4-insider.1 for multi-agent orchestration
117 skills in .copilot/skills/ — now ~143K tokens (optimized)
5 MCP servers (GitHub, Mail, PowerBI, M365 Agents Toolkit, IDE)
Azure plugin: scoped to needed namespaces (the one that mattered)
Model: Claude Opus 4.6 with 200K context window

Investigation: May 5, 2026. The key lesson: measurement comes before optimization. Run /context and let the data guide your effort, not your intuition about file sizes. And when you find an MCP consuming more than you need — scope it down to match how you actually work.

The skills optimization ran in the same session: 117 skills reduced by 65% (413K → 143K tokens on disk) using waza tools.
