
Copilot CLI Context Window: How I Cut Token Usage from 52% to 13%

· 11 min read

I'll be honest: I started this whole investigation backwards. I had 117 skills consuming 413K tokens on disk and assumed that was the problem. I spent two hours optimizing them before I thought to actually measure what was in my context window. Turns out, skills are on-demand — they never touch the context window at all.

The biggest consumer was something I never would have guessed: a single plugin loading ~27K tokens of tool definitions into every message. This is the story of how I found it, scoped it down, and — importantly — how you can configure it to match your workflow without losing functionality.

What makes this different? There are already several great articles about MCP context optimization (devbolt.dev, The New Stack, StackOne, blog.pamelafox.org). This one adds: real measured token numbers from a production setup, the /context command as a diagnostic tool, the Azure MCP namespace scoping solution, and the Squad orchestration angle.

Step 1: Measure First — Check Your Token Breakdown

I run GitHub Copilot CLI with a multi-agent orchestration setup — half a dozen MCP servers, several plugins, and 117 skills. Mid-session, I got curious about what my context window actually looked like and ran /context:

Before optimization: /context showing 52% usage and compaction to 40%

Context Usage
claude-opus-4.6 · 104k/200k tokens (52%)

System/Tools: 62.5k (31%)
Messages: 41.8k (21%)
Free Space: 55.3k (28%)
Buffer: 40.4k (20%)

52% consumed before typing a single message. The System/Tools bucket alone was 62.5K tokens — 31% of my 200K window. That's the baseline cost of my setup: agent instructions, MCP tool definitions, system prompt, memories.

With only 28% free space, complex multi-agent tasks would trigger compaction mid-session. I needed to find what was actually consuming those 62.5K tokens — and the only way to know for sure was to audit what's always-loaded vs. what lives on disk.

Step 2: Distinguish Always-Loaded from On-Demand

The first question to ask is not "what's biggest?" but "what's always in context?"

The System/Tools bucket contains everything that loads on every message — unconditionally. If I can reduce that, every operation gets cheaper. Optimizing anything else only helps specific operations.

I built a breakdown:

| Consumer | ~Tokens | When Loaded | Controllable? |
|---|---|---|---|
| MCP/Plugin tool definitions | ~27K+ | Every message | ✅ Scope or remove |
| Agent instructions | ~20K | Every message | ✅ Slim it down |
| System prompt + memories | ~10K | Every message | Partial |
| Skills | ~143K on disk | On-demand only | Can optimize, but won't help context |
| Conversation history | Growing | Accumulates | Fresh sessions help |

Key insight: Skills sit on disk until an agent explicitly requests one. They are never in the context window. Optimizing them makes individual agent spawns cheaper and faster — valuable for performance — but they don't contribute to System/Tools at all. (I learned this after spending two hours optimizing them. Do as I say, not as I did.)
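If you want to put a number on your own on-disk weight, a common rule of thumb is ~4 characters per token. Here's a minimal sketch (the directory path and the .md filter are assumptions about how your skills are laid out):

```python
import os

def estimate_tokens(root: str, chars_per_token: int = 4) -> int:
    """Rough token estimate for all markdown files under root (~4 chars/token)."""
    total_bytes = 0
    for dirpath, _dirs, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".md"):
                total_bytes += os.path.getsize(os.path.join(dirpath, name))
    return total_bytes // chars_per_token
```

Pointing it at .copilot/skills/ gives a ballpark for on-disk weight; just remember that number never enters the context window unless a skill is actually loaded.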

Context consumers breakdown: MCP tools 30-40K, agent instructions 20K, skills on-demand

Step 3: Audit What's Always-Loaded

The mystery is: what's in that 62.5K System/Tools bucket?

MCP Tool Definitions (~6–10K tokens)

MCP servers inject their tool schemas into every message. I had:

  • GitHub MCP — ~15 tools (issues, PRs, code search, actions)
  • Mail MCP — ~20 tools (search, send, reply, forward, attachments)
  • PowerBI MCP — ~6 tools (execute query, generate query, get schema)
  • M365 Agents Toolkit — ~4 tools (knowledge, snippets, schema, troubleshoot)
  • IDE — ~2 tools (diagnostics, selection)

These are real — about 47–55 tools across all servers. But they're only ~6–10K tokens total. Where's the other 50K?

The Azure Plugin (~27K tokens) — The Biggest Consumer

I checked ~/.copilot/settings.json and found the Azure plugin enabled:

| Plugin | Source | Impact |
|---|---|---|
| azure | microsoft/azure-skills | 50+ tools, ~27K tokens |

Here's the thing about the Azure MCP Server: it's comprehensive. Version 3.0.0-beta.6 has 259 tools across 56 namespaces — covering everything from ACR to Virtual Desktop to Well-Architected Framework. That breadth is genuinely impressive, and the team clearly designed it to be a one-stop shop for Azure developers.

The good news: the team also thought carefully about how developers actually work. They built in namespace scoping and mode selection so you don't have to load the entire surface area. In its default "namespace" mode, it groups tools by service — but if you're only using a few services, you can filter down to just those. More on that in a moment.

In my case, the default configuration was loading 50+ tool schemas into every message — even when I wasn't doing Azure work in that session. Not a bug, just a configuration I hadn't tuned yet.

Azure plugin details: 4 plugins consuming context, 50+ tools at 30-40K tokens

Agent Instructions (~20K tokens)

My agent governance file — .github/copilot-instructions.md at the repo root — is 80KB. It loads on every turn. This is the ongoing cost of a sophisticated agent setup: the orchestration rules are comprehensive, and they load unconditionally whether I need them or not.

Step 4: Scope the Azure Plugin to Match How You Work

Once I understood the breakdown, the fix was straightforward. The Azure MCP team built exactly the right lever for this — namespace scoping lets you declare which services matter for your project and ignore the rest. No functionality lost, just a tighter fit.

Option A: Disable Entirely (Full removal)

If you genuinely don't use Azure, just turn it off:

// ~/.copilot/settings.json
"azure@azure-skills": false

This is what I did initially — it dropped System/Tools from 62.5K → 35.2K, freeing ~27K tokens instantly.

Azure plugin disabled: azure@azure-skills set to false

Option B: Scope by Namespace (Keep what you use)

This is where the Azure MCP Server's design really shines. The team built namespace filtering specifically for this use case — you declare the services relevant to your project, and only those tool schemas load into context.

Configure it in your MCP settings with the --namespace flag:

--namespace appservice --namespace cosmos --namespace keyvault --namespace storage

This gives you 4 namespaces (~24 tools) instead of 56 namespaces (~259 tools) — a significant reduction in context usage while keeping the Azure tools you actually use.
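For reference, here's a sketch of what a scoped server entry could look like in an MCP config file. The launcher command and key names are assumptions (check the Azure MCP Server README for the exact invocation your version expects); only the --namespace flags come from the filtering feature described above:

```json
{
  "mcpServers": {
    "azure": {
      "command": "npx",
      "args": [
        "-y", "@azure/mcp@latest", "server", "start",
        "--namespace", "appservice",
        "--namespace", "cosmos",
        "--namespace", "keyvault",
        "--namespace", "storage"
      ]
    }
  }
}
```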

Azure MCP Modes

The server supports 4 modes that control how tools are exposed:

| Mode | Behavior | Best For |
|---|---|---|
| namespace (default) | One tool per service namespace | Copilot — good balance |
| consolidated | Groups operations by user intent | Natural language workflows |
| single | One routing tool for everything | Maximum simplicity |
| all | Every operation as a separate tool (259!) | Maximum granularity — high context cost |

Pick Your Stack

Here's a quick reference for common developer personas:

| If you work with... | Namespaces to keep |
|---|---|
| Web apps | appservice, cosmos, keyvault, storage, functions |
| Data/Analytics | cosmos, sql, kusto, eventhubs, storage |
| DevOps/Infra | compute, aks, azureterraform, deploy, monitor |
| AI/ML | foundryextensions, search, speech, applicationinsights |

All 56 Namespaces (Reference)

For the curious, here's the full list with tool counts. Use this to build your own --namespace filter:

| Namespace | Tools | Namespace | Tools | Namespace | Tools |
|---|---|---|---|---|---|
| acr | 2 | advisor | 1 | aks | 2 |
| appconfig | 5 | applens | 1 | applicationinsights | 1 |
| appservice | 7 | azurebackup | 16 | azuremigrate | 2 |
| azureterraform | 10 | azureterraformbestpractices | 1 | bicepschema | 1 |
| cloudarchitect | 1 | communication | 2 | compute | 12 |
| confidentialledger | 2 | containerapps | 1 | cosmos | 2 |
| datadog | 1 | deploy | 5 | deviceregistry | 1 |
| eventgrid | 3 | eventhubs | 9 | extension | 3 |
| fileshares | 14 | foundryextensions | 7 | functionapp | 1 |
| functions | 3 | grafana | 1 | group | 2 |
| keyvault | 8 | kusto | 7 | loadtesting | 6 |
| managedlustre | 18 | marketplace | 2 | monitor | 16 |
| mysql | 6 | policy | 1 | postgres | 6 |
| pricing | 1 | quota | 2 | redis | 2 |
| resourcehealth | 2 | role | 1 | search | 6 |
| servicebus | 3 | servicefabric | 2 | signalr | 1 |
| speech | 2 | sql | 13 | storage | 7 |
| storagesync | 18 | subscription | 1 | virtualdesktop | 3 |
| wellarchitectedframework | 1 | workbooks | 5 | | |

VS Code Users

You can also scope Azure MCP visually: click the gear icon next to the chat panel → select/deselect at the server, namespace, or individual tool level. No config files needed.

Other Filtering Options

  • Individual tools: --tool azmcp_storage_account_get --tool azmcp_cosmos_query for surgical precision
  • Combine namespace + tool filters for maximum control

Step 5: Then Optimize On-Demand Content (Optional)

Now that the always-loaded problem was solved, it was the right time to optimize skills — not because they consume context window (they don't), but because they improve individual agent spawn performance.

I spent two hours optimizing 117 Copilot CLI skills — reducing them from 413K to 143K tokens on disk, a 65% reduction. The process used waza_tokens to find bloated skills and patterns like reference extraction and checklist compression.

This didn't move the System/Tools percentage. But it made agent spawns faster and cheaper to run. Both wins are real — you just optimize them for different reasons.

Step 6: Measure Results

Fresh session after Azure disabled: context at 35K/200K (18%)

After scoping the Azure plugin:

System/Tools:  35.2k (18%)
Total usage: ~70k/200k (35%)
Free Space: ~90k (45%)

After upgrading the agent coordinator file:

System/Tools:  25.5k (13%)
Total usage: 26k/200k (13%)
Free Space: 134.1k (67%)

The remaining ~10K drop from 35.2K → 25.5K came from upgrading my agent coordinator file — the new version replaced the old 80KB governance prompt with a leaner one. Skill optimization (270K saved on disk) didn't affect this number because skills are on-demand and never in the context window.

Final state: context at 26K/200K (13%), 67% free space

The Scorecard

| Action | Tokens Freed | Effort | Context Impact |
|---|---|---|---|
| Scope Azure plugin | ~27K | Config change | Significant — always loaded |
| Upgrade agent coordinator file | ~10K | 1 command | Significant — always loaded |
| Optimize 117 skills | ~270K on disk | 2 hours, 106 files | Zero on context — but faster agent spawns |

System/Tools went from 62.5K → 25.5K. Free space went from 28% → 67%. That's 2.4x more room for actual work.

The counterintuitive lesson: The biggest token savings came from the smallest changes — because I measured first instead of guessing.

Why Measurement First Matters

Most people (including me, initially) assume the biggest files on disk must be the problem. It's intuitive. It's wrong.

Skills: 143K on disk → 0K in context. Azure plugin: 50+ tools → ~27K in context every message.

Without checking /context, I would have spent all my time optimizing the wrong thing. I did optimize skills first (and it was worthwhile for other reasons), but the crucial discovery was always-loaded vs. on-demand. I'm reframing my mistake as a teaching moment: measure first, then optimize.

Quick Diagnostic Guide

This is the methodology. Use it whenever context runs tight:

MCP config file layers: user vs repo level, what you can control

  1. Run /context — see your actual breakdown
  2. Check plugins in ~/.copilot/settings.json — scope or disable unused ones (biggest wins are usually here)
  3. Scope your MCPs — use namespace filtering, tool filtering, or mode selection to load only what you need
  4. Check MCP servers in ~/.copilot/mcp-config.json and .copilot/mcp-config.json — remove servers you don't use daily
  5. Check agent instructions — if you use custom agent governance files, they load every turn
  6. Skills are usually fine — they're on-demand, not always-loaded
  7. Start fresh sessions — conversation history accumulates; don't run marathon sessions

The biggest wins are almost always in steps 2–3. Scoping one plugin can save more context than hours of file optimization.
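If you want to script the MCP-server check, a small helper can summarize what's registered. This is a sketch that assumes a top-level "mcpServers" key in the config file; adjust it to whatever shape your mcp-config.json actually has:

```python
import json

def list_mcp_servers(config_path: str) -> list[str]:
    """Return one 'name: N launch args' line per configured MCP server."""
    with open(config_path) as f:
        config = json.load(f)
    return [
        f"{name}: {len(entry.get('args', []))} launch args"
        for name, entry in config.get("mcpServers", {}).items()
    ]
```

Run it against both ~/.copilot/mcp-config.json and the repo-level .copilot/mcp-config.json to see every server contributing tool definitions.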

What About Hooks?

One thing I haven't tested yet: Copilot hooks (commit hooks, pre-push hooks, custom event hooks). These are lightweight by design — they're shell scripts or short instructions, not loaded into the context window the way MCP tool definitions are. They fire on specific events rather than sitting in the always-loaded bucket.

That said, if you have hooks that reference large config files or trigger MCP calls, those downstream effects could impact context during execution. Worth running /context before and after adding hooks to verify. My expectation is minimal impact, but I'll update this post once I've measured it directly.

The Setup

  • GitHub Copilot CLI v1.0.40
  • Squad v0.9.4-insider.1 for multi-agent orchestration
  • 117 skills in .copilot/skills/ — now ~143K tokens (optimized)
  • 5 MCP servers (GitHub, Mail, PowerBI, M365 Agents Toolkit, IDE)
  • Azure plugin: scoped to needed namespaces (the one that mattered)
  • Model: Claude Opus 4.6 with 200K context window

Investigation: May 5, 2026. The key lesson: measurement comes before optimization. Run /context and let the data guide your effort, not your intuition about file sizes. And when you find an MCP consuming more than you need — scope it down to match how you actually work.

The skills optimization ran same session — 117 skills reduced by 65% (413K → 143K tokens on disk) using waza tools.

Exploring Copilot CLI Session Management to Improve Squad

· 13 min read

I've been using Squad, an AI team framework built on top of Copilot CLI, and I kept wondering: Copilot CLI already tracks everything that happens in a session — could that data make Squad's agents smarter? I spent some time digging into how both systems manage session data, and I think there's an untapped opportunity.

This post is my investigation notes — what I found, how the two systems compare, and where I think they could be combined for more value.

My working theory: Copilot is your diary (what happened). Squad is your playbook (what to do about it). Right now they're like two lighthouses on opposite shores of Bellingham Bay — both useful, but no bridge between them.

Two lighthouses on opposite shores of Bellingham Bay, their beams not quite connecting

What I Found: Two Memory Systems

Copilot CLI: The Raw Record

Copilot CLI records every session — prompts, responses, tool calls, file changes, and checkpoints. I discovered it powers:

  • /resume — pick up where you left off in any previous session
  • /chronicle — generate standup reports, get personalized tips, improve your custom instructions
  • /session — view and manage your sessions directly from the CLI

Session data lives in ~/.copilot/session-state/ as files and in ~/.copilot/session-store.db as a structured SQLite database.

What Copilot remembers: Everything that happened in every session — the full transcript.

What it doesn't do: Extract meaning. Copilot stores the raw conversation, not the conclusions you drew from it.

Squad: The Distilled Knowledge

Squad's memory is different — and this is where I see the gap. It's not a transcript — it's distilled knowledge, stored as markdown files in your repo:

| What | File | Purpose |
|---|---|---|
| Team decisions | .squad/decisions.md | Shared brain — every agent reads this |
| Agent memory | .squad/agents/{name}/history.md | Personal learnings per agent |
| Skills | .copilot/skills/{name}/SKILL.md | Repeatable tasks with everything needed to execute |
| Session state | .squad/sessions/*.json | Resume data (gitignored by default) |
| Scribe logs | .squad/log/*.md | Session summaries (gitignored by default) |

What Squad remembers: Decisions, patterns, preferences, and skills — the things that should change how agents behave next time.

What it doesn't do: Record the full conversation. That's Copilot's job.

The Gap I See

The two systems complement each other, but right now they're completely disconnected — like looking across Deception Pass and seeing the other side but having no way to cross.

Water rushing through a rocky gorge at Deception Pass — two cliff faces close together with no bridge between them

Here's where each system shines:

| Question | Where to look |
|---|---|
| "What did I do last Tuesday?" | Copilot — /session or /chronicle standup |
| "What did the team decide about auth?" | Squad — .squad/decisions.md |
| "Have I worked on this file before?" | Copilot — /session to browse past sessions |
| "How do we run a content audit?" | Squad — .copilot/skills/content-audit/SKILL.md |
| "What went wrong last time I tried this?" | Copilot — session transcript via /resume |
| "What does this agent know about TypeSpec?" | Squad — .squad/agents/{name}/history.md |

This separation works, but it's manual. You have to be the bridge — ferrying insights across the water yourself. That's the opportunity I'm investigating.

Where I Think Session Data Could Improve Squad

Squad has a built-in skill called reskill ("team, reskill") that audits agent charters and histories, extracts shared patterns into skills, and compresses bloated files. Think of it as sorting the morning catch on a Bellingham dock — keeping what's valuable, tossing the rest back.

Fisherman on a Bellingham dock sorting the morning catch into labeled crates

But reskill today is purely file-based — it reads .squad/ markdown and looks for textual duplication. It has no idea what actually happened in sessions.

Here's what I think session data could add:

| Signal from Copilot sessions | What Squad could do with it |
|---|---|
| Agent X was spawned 40 times but only useful 25 times | Refine charter to reduce misfires |
| Agent Y always gets the same 3 files as input | Bake those into the charter's "What I Own" |
| Users keep correcting the same mistake | Extract as anti-pattern in a skill |
| An agent never gets spawned | Flag for removal during reskill |
| Two agents always get spawned together | Suggest merging or formalizing the pairing |
| Certain skills are read but never applied | Deprecate during reskill |
| Session durations spike after charter changes | Detect regressions from past reskills |

There are two existing proposals in the Squad repo that go in this direction — tiered memory (#600, open) for hot/cold/wiki context layers, and reflect (#621, closed PR — not merged) for in-session learning capture. Neither one references Copilot CLI session data though. They're both Squad-internal. The bridge between Copilot's behavioral data and Squad's knowledge system doesn't exist yet.

Ideas I'm Exploring

The theme here is a feedback loop — raw session data flows downstream, gets refined into knowledge, and that knowledge shapes the next session. Like the Nooksack River circling back toward the mountains that feed it.

The Nooksack River looping back toward Mount Baker, papers transforming into books at the bend

1. Feed /chronicle into Reskill

After a productive session, Squad agents already extract the important parts:

  • Decisions go to .squad/decisions.md
  • Learnings go to agents/{name}/history.md
  • Reusable patterns become skills

But what if reskill could also query Copilot's session store to find patterns agents missed? /chronicle improve already analyzes session history to suggest custom instruction improvements. That same analysis could feed into Squad's skill extraction pipeline — Copilot finds the behavioral pattern, Squad encodes it permanently.

2. Use /chronicle for Behavioral Analysis

Copilot's /chronicle improve analyzes session history to find where agents struggled or needed correction. I'm thinking about how to make this a systematic input to Squad:

  • Run /chronicle improve periodically
  • Take the suggestions and apply them to agent charters or team directives
  • This creates a feedback loop: Copilot finds the pattern, Squad encodes it permanently

Today this is manual. I'd love to see a squad reskill --from-chronicle that automates the loop.

3. Use /session for Context

When starting work on something you've touched before, use /session to browse previous sessions and find relevant context:

"Before starting, check /session for any previous sessions 
that touched these files. Summarize what was done and any issues."

This gives agents a head start without you having to remember and re-explain.

4. Use Squad for Cross-Agent Memory

Copilot's session history is per-user. Squad's memory is per-team. When Agent A discovers something that Agent B needs to know, Squad's shared files make that happen:

  • Scribe writes cross-agent updates to affected agents' history.md
  • Decisions in decisions.md are read by every agent at spawn time
  • Skills are shared — any agent can use any skill

The Gitignore Decision

Squad gitignores session-related files by default. Here's what that means and when to change it:

| File | Default | Change when |
|---|---|---|
| .squad/sessions/ | Gitignored | Commit if you need session transcripts in git (training repos, research) |
| .squad/log/ | Gitignored | Commit if you want Scribe's summaries as an audit trail |
| .squad/orchestration-log/ | Gitignored | Commit if you want agent routing history preserved |
| .squad/decisions.md | Committed | Never gitignore — this is the team's shared brain |
| .squad/agents/*/history.md | Committed | Never gitignore — this is each agent's knowledge |
| .copilot/skills/ | Committed | Never gitignore — these are your reusable patterns |

The recommended hybrid: Keep sessions gitignored, but commit Scribe's logs for a lightweight audit trail. Remove .squad/log/ and .squad/orchestration-log/ from .gitignore to enable this.
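Concretely, the hybrid amounts to trimming two entries from the generated .gitignore. The exact entries Squad writes are an assumption here (check what your version generates); the sketch below keeps raw sessions ignored while committing the summaries:

```gitignore
# Keep raw resume data out of git
.squad/sessions/

# Previously ignored; now committed for a lightweight audit trail:
# .squad/log/
# .squad/orchestration-log/
```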

⚠️ One caveat: If your org requires audit trails of AI interactions, git probably isn't the right system of record — no retention policies, no redaction, no legal hold. Worth checking before treating committed sessions as a compliance solution.

Under the Hood (Skip Unless Debugging)

Copilot CLI stores session data in two places: file-based events in ~/.copilot/session-state/{session-id}/events.jsonl and a searchable SQLite database at ~/.copilot/session-store.db. The database powers /chronicle and /session — you need "experimental": true in ~/.copilot/config.json to enable these features. Without experimental mode, /chronicle won't be available — enable it with /experimental on in any session.

Each session folder contains the event stream (every tool call, message, and model metric), workspace metadata, and checkpoint snapshots that /resume uses to reconstruct context. The session.shutdown event in events.jsonl is worth finding — it shows your token usage, cache hit rates, and code changes in one place.

The SQLite database (~59 MB after ~770 sessions in my case) holds structured records across tables including sessions, turns, checkpoints, session_files, and session_refs, plus an FTS5 full-text search index. Records persist even after session directories are cleaned up. Don't delete the .db-wal file while Copilot is running — you'll lose recent writes.
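Since it's plain SQLite, you can query it directly. A sketch using Python's built-in sqlite3 module; the sessions table name comes from the list above, but the schema is undocumented and may change between releases:

```python
import sqlite3

def session_count(db_path: str) -> int:
    """Count recorded sessions in Copilot's session store."""
    with sqlite3.connect(db_path) as conn:
        (count,) = conn.execute("SELECT count(*) FROM sessions").fetchone()
    return count
```

Querying a copy of the database (rather than the live file) is the safer habit while Copilot is running.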

What's in the .squad Session Files

If you're using Squad to orchestrate AI agents, there's a parallel session storage layer inside .squad/ in your repo. While .copilot/ tracks platform sessions, .squad/ accumulates session-by-session team memory.

Session-Scoped Files (Created Per Session)

These files can be traced back to a specific session:

| File | What It Contains |
|---|---|
| orchestration-log/{timestamp}-{agent}.md | Who was spawned, why, what they did. Append-only audit trail. |
| log/{timestamp}-{topic}.md | Scribe's session summary. |
| decisions/inbox/{agent}-{slug}.md | Ephemeral drop-box — agents write decisions here during a session. Scribe merges them into decisions.md afterward. |
| identity/now.md | Updated each session with current focus. Every agent reads this at spawn so they hit the ground running. |

Running-State Files (Modified Across Sessions)

These files accumulate changes but don't track which session changed them:

| File | How It Changes |
|---|---|
| agents/*/history.md | Grows each session as agents record learnings. Scribe summarizes when it exceeds ~15 KB. |
| agents/*/charter.md | Updated if an agent's role evolves. No session linkage. |
| skills/{name}/SKILL.md | Created or updated when agents discover reusable patterns. |
| decisions.md | The canonical decision ledger — grows each session, entries are dated. |
| team.md, routing.md | Updated when members join or leave. |
| casting/registry.json | New agent names registered here. Persistent. |

The distinction matters: session files are created per session and disposable. Running-state files are your team's accumulated intelligence — they compound over time.

The Two-Layer Model

Together, .copilot/ and .squad/ form a complete session memory system — like Whatcom County's geology, where buildings sit on the visible surface but the real water supply flows through the aquifer below.

Cross-section of Whatcom County geology — buildings on the surface, aquifer below, a well connecting them

| Layer | Location | Scope | What it tracks per session |
|---|---|---|---|
| Platform | ~/.copilot/ | Per-user, cross-project | Events, turns, tool calls, model metrics |
| Team | .squad/ (in repo) | Per-project, cross-session | Orchestration logs, agent memory, decisions, focus |

The platform layer is invisible infrastructure — you don't commit it, you query it. The team layer is committed to the repo — it travels with the code and survives across machines, sessions, and team members. Surface and aquifer, both feeding the same ecosystem.

Try This

Ready to explore your own session data? Here are three things you can do right now:

  1. Browse your sessions: Open ~/.copilot/session-state/ and look at the events.jsonl from your most recent session. Search for session.shutdown to see your token usage and cache hit rates.

  2. Query your history: In any Copilot CLI session, try /session to browse your past sessions. Use /resume to jump back into a previous session with full context.

  3. Feed Copilot into Squad: Run /chronicle improve and review the suggestions. Pick one that matches a recurring pattern and say: "Make that a skill" or "Add that to decisions."

If you're not using Squad yet, #1 and #2 still work — they're pure Copilot CLI. The session data is there whether you browse it or not.

If You're Building an Agent on Top of Copilot

This investigation was Squad-specific, but the underlying insight applies to anyone building on Copilot CLI: there's a lake of session data sitting right there in ~/.copilot/ — and most agents ignore it completely.

The good news is the plumbing already exists. The Copilot SDK (@github/copilot-sdk) exposes session listing, full event history, and real-time event subscriptions. You can filter sessions by repo or branch, pull every tool call and assistant response, and subscribe to events as they happen. The data access is there — what's missing is the intelligence layer on top.

Here are three ideas I keep coming back to — none of them Squad-specific:

1. Adaptive Prompt Tuning Based on Tool Failure Rates

I noticed in my own session data that certain tool calls fail repeatedly — grep with regex that doesn't match the codebase's naming conventions, for example. An agent could watch for these patterns and silently adjust its strategy — switching to glob patterns, broadening search terms, adding fallback chains — without me ever asking. Like a fishing guide who notices you keep casting into the wrong current and quietly repositions the boat.

2. Cross-Session Onboarding for New Repos

When I open a new repository for the first time, the agent has zero context about how I work. But my session history from other repos is right there — it shows whether I prefer TypeScript or JavaScript, whether I write tests first, which frameworks I reach for. An agent could mine that cross-project session data to bootstrap a developer profile, skipping the cold-start problem. First day in a new codebase, but the agent already knows your habits.

3. Drift Detection Between Intent and Outcome

Session data captures both what I asked for and what the agent actually did — tool calls, file edits, test results. Over time, an agent could spot drift: I keep correcting the same kind of CSS suggestion, or certain requests consistently take multiple follow-up turns. Imagine the agent saying, "You frequently adjust my styling — want me to follow a specific style guide?" That turns passive logs into active self-improvement.

The common thread: session data isn't just a transcript — it's telemetry. Any agent that treats it as a feedback signal rather than a static log has a real advantage, like reading the tides instead of just watching the water.

The Bottom Line

Use both memory systems intentionally:

  • Copilot handles the raw history. Let it. Don't try to replicate session transcripts in Squad files.
  • Squad handles the distilled knowledge. Invest here — decisions, history, and skills are what compound.
  • Feed insights from Copilot back into Squad via /chronicle improve, directives, and skill creation.
  • Start with the default gitignore. The valuable stuff is already being committed. Relax later if you need session trails.

Your agents get smarter not because they remember every conversation, but because the important conclusions persist in the right place. The river keeps flowing — what matters is what settles into the riverbed.

GitHub Account Cleanup: Audit, Archive & Remove Stale Repos

· 5 min read

My end-of-year project is a repository-cleanup tool that provides safe, repeatable auditing and cleanup for my GitHub accounts, plus a catalog of my active repos. The repo focuses on repository-level cleanup (archive/delete/catalog), but the same audit run can surface candidates for cloud-resource reclamation and CI/workflow maintenance.

When to use this project

  • Periodic account maintenance (end-of-year or scheduled audits).
  • Before publishing a portfolio or transferring repositories.
  • When you want a reproducible audit with a dry-run-first approach.

Functionality included in GitHub account cleanup

My TypeScript project cleans up my account in the following way:

  • Archive stale repositories
  • Detect and delete empty repositories
  • Remove forks
  • Generate repo descriptions and topics with an LLM and update repos with that info
  • Generate a catalog of active repos for publishing

High-level architecture

  1. Primary entry is scripts/run-all.sh
  2. Workflows are optional
  3. gh-cleanup calls github-rest and llm-completion
  4. Outputs go to generated/

The high-level architecture of the npm workspace monorepo:

  • packages/gh-cleanup — CLI commands and orchestration: categorization rules, scoring, reporting, and the runner that coordinates dry-run and apply flows.
  • packages/github-rest — GitHub REST helpers, typed endpoint wrappers, and shared network utilities.
  • packages/llm-completion — LLM/AI utilities: prompt helpers, request wrappers, retries, and response sanitization used by the describe step.
  • generated/ — Example outputs created by dry-run executions: catalog.md, active.json, descriptions.json, summaries, etc.
  • .github — Workflows and prompt files.
  • scripts — Top-level script (scripts/run-all.sh) that runs the full account cleanup.

Prerequisites

This repo can be opened in Codespaces or locally via .devcontainer/devcontainer.json; the development container includes everything the project needs, including Node.js. Once you have the repo open, create a GitHub token and an OpenAI key, choose an LLM model, and set these values in the root-level .env:

  • A GitHub token in GH_TOKEN (classic PAT with repo scope; delete_repo is only required for destructive operations such as deleting a repo).
  • An OpenAI key and model for LLM generation of repo descriptions and topics. I used gpt-4.1-mini from Azure OpenAI.
GH_TOKEN=
GH_USER=YOUR_GITHUB_USER_ACCOUNT
OPENAI_API_KEY=
OPENAI_ENDPOINT=https://RESOURCE-NAME.openai.azure.com/openai/deployments/MODEL_NAME/chat/completions?api-version=API_VERSION
OPENAI_MODEL=gpt-4.1-mini
OPENAI_TEMPERATURE=0.2

Install and build dependencies

The root package.json uses npm workspaces to manage all the packages in ./packages. Use the root package.json scripts to install and build the tool.

npm install
npm run build

Try out the tool

One-line run examples

  • Remove forks (dry-run):

    npm run start -w gh-cleanup -- remove-forks
  • Archive stale repos older than a year (dry-run):

    npm run start -w gh-cleanup -- archive-stale-repos
  • Delete empty repos (no PRs, no commits, 0 KB size; dry-run):

    npm run start -w gh-cleanup -- delete-empty-repos
  • Categorize repos (fetch languages + README, output Markdown):

    npm run start -w gh-cleanup -- categorize-repos --fetch --output=md --out=generated/catalog.md
  • Summary (write generated/summary.md):

    This creates the final active list of repos in a markdown table. I'll use this in my dfberry.github.io website to find projects with specific code, configuration, or CI.

    npm run start -w gh-cleanup -- summary --summary-out=generated/summary.md

Full run

Once you understand what the tool does, use ./scripts/run-all.sh to clean up your GitHub account. I used my personal dfberry account for testing while building; now that it's complete, I'll use it on my diberry work account at Microsoft.

npm run run-all:apply

With apply, empty repos are actually deleted and active repos are updated with new descriptions and topics.

AI-assisted development

I use AI tools (Copilot's Ask/Plan/Agent modes) to speed development while keeping human oversight. I commit frequently and review AI suggestions line by line, and I choose models and session types (local, background, cloud) deliberately to control cost and behavior. I never let AI run unattended on critical changes; dry runs and manual tests come first. I scaffold the repo, add features incrementally, and document decisions so I can return later. Copilot helps with comments and diagrams, but I always review and adjust its output.

Next steps

Cleanup goes beyond removing unused repositories. When tidying a personal or org GitHub account you may also:

  • Reclaim unused cloud resources referenced by projects (e.g., old deployments, test clusters, or storage buckets).
  • Remove or archive unused repositories that are forks, abandoned, or no longer relevant.
  • Find and fix failing or stale GitHub Actions workflows (update action versions or workflow syntax) or remove workflows that are no longer useful.
  • Update CI matrices and runtimes (programming language versions, OS matrix entries) to reduce CI cost and avoid testing very-old combinations.
  • Bump pinned GitHub Action versions and dependencies to address deprecations and security fixes.