3 posts tagged with "Squad"

PRDs Aren't Just for Code: Communication clarity that travels

May 24, 2026 · 21 min read

A PRD agent took a one-line issue in my workspace and turned it into a real implementation PRD: nine intake questions, phased work, named agents, acceptance criteria, and dispatch scripts. That one-line issue was enough for me because I already knew the repo, the conventions, and the missing context. It was not enough for an agent that had to move without me standing there to explain the rest. It was also not enough to show my management or partner teams what the intended work was or the value of that work. The PRD captured that communication clarity.

That same pattern showed up two more times this week. I used one PRD as a baseline against roughly three months of branding work and found a gap in the setup process I had missed. Then I used another PRD to compare planned scope against delivered work so the unanswered questions could turn into queued issues instead of another vague "we should look at this later."

A pink-haired woman turns a vague request into a structured PRD while agents begin moving

The work starts looking like it can move on its own only after the thinking stops being casual.

I keep hearing PRDs talked about as if they only belong to software feature work. I use them there too. But this week I kept reaching for the same pattern in project work, product work, and content work. Each time, the value was the same: the PRD made me write down the part I normally carry in my head.

A PRD is communication clarity. That's it. The document works because it makes the request unambiguous for the next reader. Sometimes that reader is a person. Sometimes it's an agent. The value is the same.

Start with what a PRD is

A product requirements document (PRD) is a structured answer to three questions: what are we building, for whom, and why? More simply, it is the place where I stop assuming the other side will fill in the blanks for me.

Before anyone starts building, the PRD turns the idea into requirements other people can actually act on. In a lot of teams, that means engineering, design, product, and sometimes marketing can line up around the same page before work starts.

That is still useful. What changed for me is what happens next. Now the same clarity has to carry agent work too. I can draft a PRD from rough bullets, update it as the work changes, and use it as the source for agent assignments, review checks, and follow-up issues.

That shift is why I started applying the pattern outside code. The useful part is the habit of slowing down long enough to answer the structured questions that make handoff possible. Agents need the same clarity humans do. Once I saw that clearly, it was hard not to use the same pattern for project planning and content operations too.

Stop treating clarity as optional

The easy story about AI agents is that I can leave the request incomplete or underspecified and the system will figure it out.

I have tried that often enough to know what happens next. The agent fills in the gaps. It makes assumptions. Those assumptions land as wrong choices. Then I'm back in, steering it turn by turn because every guess it made was wrong. Sometimes the work still gets done. But I'm driving now, not the agent.

That is why I keep coming back to PRDs. A good PRD is not useful because it looks formal. It is useful because it forces me to answer questions I would normally leave fuzzy.

Questions like:

What problem am I actually trying to solve?
What does done look like in a way another person or agent can check?
What is out of scope?
Which repo, project, or workflow owns this?
What existing documents already limit the answer?
What dependencies have to exist before anyone starts?
What acceptance criteria tell me the work is complete?
Who or what should do each phase?
What evidence would prove the work happened correctly?

flowchart LR
    subgraph L[Without PRD]
        A[Vague task] --> B[Clarify]
        B --> C[Rework]
        C --> B
    end
    subgraph R[With PRD]
        D[Intake answers] --> E[Clear requirements]
        E --> F[Independent execution]
        F --> G[Review]
    end

Nine questions is not a magic number. It just kept surfacing in my sessions this week. It pushed me past the comfortable version of the request. "We should improve our PRD workflow" sounds fine until I have to answer which workflow, which repos, which gaps, which owners, and what exact result counts as success. That is the moment the request stops being a loose idea and starts becoming something the system can actually use.
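A minimal TypeScript sketch of the nine intake questions as a checklist. The type and function names (`PrdIntake`, `unansweredQuestions`) and the sample answers are hypothetical, chosen for illustration; nothing here comes from the PRD agent itself. It shows one way to make "which answers are still fuzzy" a mechanical check before dispatch.

```typescript
// Hypothetical shape for the nine intake questions listed above.
// Field names are illustrative, not taken from any real PRD tool.
type PrdIntake = {
  problem: string;
  definitionOfDone: string;
  outOfScope: string;
  owner: string;
  limitingDocuments: string;
  dependencies: string;
  acceptanceCriteria: string;
  phaseAssignments: string;
  evidence: string;
};

// Fixed question order so the report reads the same way every time.
const QUESTIONS: (keyof PrdIntake)[] = [
  "problem",
  "definitionOfDone",
  "outOfScope",
  "owner",
  "limitingDocuments",
  "dependencies",
  "acceptanceCriteria",
  "phaseAssignments",
  "evidence",
];

// Returns the questions whose answers are still blank or whitespace-only.
function unansweredQuestions(intake: PrdIntake): string[] {
  return QUESTIONS.filter((question) => intake[question].trim().length === 0);
}

// Sample draft (made-up answers): four questions answered, five still blank.
const draftIntake: PrdIntake = {
  problem: "One-line issues lack the context agents need",
  definitionOfDone: "",
  outOfScope: "Changing the dispatch tooling",
  owner: "docs repo",
  limitingDocuments: "",
  dependencies: "none",
  acceptanceCriteria: "",
  phaseAssignments: "",
  evidence: "",
};

const stillFuzzy = unansweredQuestions(draftIntake);
```

In this sketch, dispatch would wait until `stillFuzzy` is empty. The point is not the code; it is that a blank answer becomes visible instead of assumed.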

Run the experiment on work that isn't a feature

Requests that felt clear in my head were not clear enough to run independently. Once I started using PRD patterns outside their usual lane, the same document kept helping in different ways.

Session 1: Expand a one-line issue until agents can move

I had a one-line issue that made sense to me because I live in this project every day. It had enough context for a human who already knew the setup. It did not have enough context for a system that needed to break the work into phases, assign work to named agents, and move without waiting for me to answer basic questions.

So the PRD agent expanded that short issue into a real implementation PRD. The useful part was the added structure.

The PRD turned a compressed request into something other actors could use:

the problem statement stopped assuming insider context
the work broke into phases instead of one blended paragraph
acceptance criteria became explicit instead of implied
agent assignments were named instead of hand-waved
dispatch scripts could be generated because there was finally enough detail

Before the PRD, the request depended on me being available to explain the rest. After the PRD, that logic lived in the document. Once the requirements were explicit, dispatch was no longer a hope. The tooling had something solid to run.

The time investment moved to the front, which is boring in the best way. In my experience, that up-front cost is what pays off later. I stop rescuing ambiguity after the fact because the plan can survive handoff.

Session 2: Use the PRD as a mirror, not a starting point

The second session changed how I think about PRDs. I was reviewing PRD-driven branding work, but instead of treating the document as a fresh plan, I treated it as a claim and compared it against the artifacts that roughly three months of work had already produced.

One of the most useful findings was a missing validation step I'd overlooked. That kind of check was easy to miss while the surrounding work was already moving. The PRD gave me something stable to compare against. Without it, that omission would have stayed buried inside the blur of ongoing work.

Once the gap was visible, the next step was obvious. I spawned two follow-on actions from the review findings:

one to fix the ownership document so responsibilities were clear
one to create a CI triage skill

The review did not end at "interesting gap." It created follow-on work with owners and specific files to update. One agent fixed the ownership document. Another created the CI triage skill so the check would run next time.

Session 3: Compare scope to outcomes, then let the gaps create work

The third session pushed the same pattern one step further. I used the PRD not just to plan or review, but to ask what actually happened. I compared PRD scope against real work outcomes to find the places where my planned story and the delivered story did not match.

The completeness check surfaced open questions. Once those questions existed in a named list, I could answer them directly and turn the unresolved gaps into issues for later agent work.

The flow was simple: planned scope checked against reality produced open questions, my answers turned those questions into queued issues, and the queued issues were ready for later agent work. It was repeatable.

With the PRD, I had a stable statement of intent. I handled the judgment calls, and the agents handled the mechanical translation after the missing information was written down.
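The flow from planned scope to queued issues can be sketched in a few lines of TypeScript. This is a simplified model under stated assumptions: the types (`ScopeItem`, `OpenQuestion`, `QueuedIssue`), the function names, and the sample data are all hypothetical, and real issue creation would go through whatever tracker the project uses.

```typescript
// Hypothetical types; the names are illustrative only.
type ScopeItem = { id: string; description: string };
type OpenQuestion = { scopeId: string; question: string; answer?: string };
type QueuedIssue = { title: string; body: string };

// Planned scope with no delivered counterpart becomes an open question.
function findOpenQuestions(planned: ScopeItem[], deliveredIds: string[]): OpenQuestion[] {
  return planned
    .filter((item) => deliveredIds.indexOf(item.id) === -1)
    .map((item) => ({
      scopeId: item.id,
      question: `Planned but not delivered: ${item.description}. Is it still wanted?`,
    }));
}

// Only questions a human has answered turn into issues; the rest stay open.
function toQueuedIssues(questions: OpenQuestion[]): QueuedIssue[] {
  const issues: QueuedIssue[] = [];
  for (const q of questions) {
    if (q.answer !== undefined && q.answer.trim() !== "") {
      issues.push({ title: `Follow-up for ${q.scopeId}`, body: q.answer });
    }
  }
  return issues;
}

// Made-up example data.
const plannedScope: ScopeItem[] = [
  { id: "logo", description: "logo refresh" },
  { id: "validation", description: "setup validation step" },
];
const openQuestions = findOpenQuestions(plannedScope, ["logo"]);

// The human judgment call: answer the question.
const answered = openQuestions.map((q) => ({
  ...q,
  answer: "Yes, add the check to the setup script",
}));

const queuedBeforeAnswer = toQueuedIssues(openQuestions);
const queuedAfterAnswer = toQueuedIssues(answered);
```

The split mirrors the division of labor described above: `findOpenQuestions` and `toQueuedIssues` are mechanical, while the `answer` field only ever gets filled in by a person.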

If I strip the three sessions down to their bones, this is what happened:

Session	Starting point	What the PRD did	What became possible next
Convert a one-line issue to a PRD	one-line issue	expanded intent into phases, acceptance criteria, and agent assignments	hands-off dispatch with less human follow-up
Review PRD for branding	existing PRD plus three months of work	exposed mismatch between intended scope and actual execution	spawned targeted follow-on agents for ownership-document updates and CI triage skill work
Compare PRD with work	finished work compared against planned scope	surfaced open questions and unresolved gaps	generated issues that could be queued for later agent execution

The PRD was not just a status document. It was a conversion layer between what I meant and what could be done.

Follow the pattern that kept repeating

Start by forcing the intake answers into the open

The biggest misconception I had to drop is that the initial request is the hard part. Usually it is not. Usually the hard part is everything the request assumes.

A sentence like "convert this issue into a real plan" sounds efficient because it compresses the task. But that efficiency is fake if the next actor has to unpack hidden assumptions before doing anything useful.

flowchart LR
    A[Vague request] --> B[PRD intake]
    B --> C[Specificity]
    C --> D[Independent execution]

For me, the PRD intake phase looked less like "writing a document" and more like pinning down the variables that were floating around informally:

what the request is asking for in plain language
what should exist at the end
what phase boundaries keep the work from smearing together
which constraints come from project conventions, operating rules, or ownership docs
what needs human approval versus what can run on its own
what evidence will make review easy later

This is the thinking tax. It costs something up front. It slows down the moment where I get to feel like I already started. I have to stop, answer, narrow, and sometimes admit I do not actually know what I want yet. The payoff shows up later.

Check completeness before you confuse motion with coverage

The second phase is the one I underused before this week: completeness checking.

I used to think of PRDs mostly as forward-looking documents that I would write and then execute. They are just as useful as review tools.

A PRD lets me ask a direct question: did the work we actually did cover the work we said mattered?

When work moves across multiple agents, multiple repos, and multiple days, motion starts to feel like progress whether or not the original scope has been covered. A completeness check interrupts that illusion.

In practical terms, it helped me inspect:

whether every major acceptance area had corresponding work
whether implied dependencies had been made explicit
whether ownership boundaries were still accurate
whether validation steps existed, not just good intentions
whether the missing pieces were small omissions or larger design gaps

The branding review session made this concrete for me. A missing check in the onboarding workflow was not the kind of thing I would have reliably caught by reading status updates alone. It became visible because I had a clear frame for what should have been there.
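A completeness check like this can be made mechanical once acceptance areas and work records are written down. The TypeScript below is a sketch under assumptions: `AcceptanceArea`, `WorkRecord`, `CoverageGap`, and the sample area names are hypothetical, and a real check would read them from the PRD and the tracker rather than from literals.

```typescript
// Hypothetical model of a PRD completeness check.
type AcceptanceArea = { id: string; requiresValidation: boolean };
type WorkRecord = { areaId: string; validated: boolean };
type CoverageGap = { areaId: string; reason: "no work" | "no validation" };

// Flags areas with no corresponding work, and areas that need a
// validation step but have no validated work record.
function checkCompleteness(areas: AcceptanceArea[], work: WorkRecord[]): CoverageGap[] {
  const gaps: CoverageGap[] = [];
  for (const area of areas) {
    const related = work.filter((w) => w.areaId === area.id);
    if (related.length === 0) {
      gaps.push({ areaId: area.id, reason: "no work" });
    } else if (area.requiresValidation && !related.some((w) => w.validated)) {
      gaps.push({ areaId: area.id, reason: "no validation" });
    }
  }
  return gaps;
}

// Made-up example: work exists for onboarding, but nothing validated it.
const coverageGaps = checkCompleteness(
  [
    { id: "brand-assets", requiresValidation: false },
    { id: "onboarding", requiresValidation: true },
    { id: "ownership-doc", requiresValidation: false },
  ],
  [
    { areaId: "brand-assets", validated: false },
    { areaId: "onboarding", validated: false },
  ]
);
```

Here `coverageGaps` reports `onboarding` as "no validation" and `ownership-doc` as "no work". Status updates alone would show activity in two of three areas and hide both gaps.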

Turn the gaps into dispatchable work

Once the PRD review surfaced missing pieces, the follow-up path became much cleaner than I expected:

flowchart TD
    A[Open questions] -->|human answers| B[Human items]
    B -->|once answered| C[Ready requirements]
    C -->|assign or queue| D[Agent or queue]
    D -->|no translation| E[Dispatch]

I like that because it keeps the human work and the automation work in the right places. The human work is making decisions. The automation work is transforming those decisions into execution steps. If the document is sloppy, those jobs collapse into each other and I end up doing both.

A pink-haired woman directs agents as work packets move from a PRD board into execution queues

The payoff is not that the human disappears. It's that the human gets to stop re-explaining the same intent.

Let work run on its own after the meaning is stable

By the end of the week, the line I kept writing down was simple: work runs best on its own after the meaning is stable.

I need a PRD because independent work magnifies whatever level of clarity I provide. If I hand over a fuzzy request, the system scales fuzz. If I hand over a bounded requirement with owners and checks, the system scales useful action.

That is why I think PRDs are underused outside code. A lot of non-code work still assumes human availability will absorb the ambiguity for free. Project work, product work, and content work are full of requests that sound understandable in conversation but are not clear enough to survive handoff.

Push the pattern into project management

I keep seeing a gap between backlog clarity for humans and backlog clarity for agents.

Project systems are often optimized for coordination among people who already know how to fill in the blanks. We can see a short title, remember the meeting, infer the constraint, and keep going. Agents are much more literal. If the work item does not carry the missing pieces, the queue looks fuller than it really is.

When I look at project management through the PRD lens, I stop asking whether the board is organized and start asking whether each major item can survive handoff without live clarification. That changes the shape of the document.

Instead of a loose epic with a few bullets, the more useful version looks like this:

clear statement of the problem the epic is trying to solve
boundaries between phases so tasks do not overlap
acceptance criteria that can be checked after work lands
routing clues about which agents or teams own which slice
dependency notes that prevent premature execution
validation expectations so the review step is not invented on the fly

Once the project document is explicit enough, breaking the work down gets easier. Work items stop being reminders for future humans and start becoming units that can move.

A pink-haired woman organizes a kanban board while agents pull clearly defined work items into motion

The board gets more useful the moment each card carries enough meaning to travel on its own.

Making work items explicit does not mean writing more everywhere. It means putting detail where it changes execution and leaving everything else light.

One shift that helped me was seeing acceptance criteria as scheduling tools, not just review tools. If an epic says it is done when three specific outcomes exist, decomposition gets cleaner. If the epic just says "move this initiative forward," the board can look busy for a long time without telling me whether the right work is actually in flight.
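As a rough TypeScript sketch of acceptance criteria doing scheduling work: each named outcome becomes one work item, and an epic with no named outcomes decomposes into nothing. The `Epic` and `WorkItem` types and the sample outcomes are hypothetical.

```typescript
// Hypothetical types for illustration.
type Epic = { title: string; outcomes: string[] };
type WorkItem = { epic: string; outcome: string; done: boolean };

// One work item per explicit outcome.
function decompose(epic: Epic): WorkItem[] {
  return epic.outcomes.map((outcome) => ({ epic: epic.title, outcome, done: false }));
}

// What still has to land before the epic can be called done.
function remainingOutcomes(items: WorkItem[]): string[] {
  return items.filter((item) => !item.done).map((item) => item.outcome);
}

// Made-up epic with three checkable outcomes.
const brandingBoard = decompose({
  title: "Branding rollout",
  outcomes: ["ownership doc updated", "setup check added", "CI triage skill created"],
});

// Mark the first item done, as if an agent had finished it.
const boardAfterFirst = brandingBoard.map((item, index) =>
  index === 0 ? { ...item, done: true } : item
);
const stillOpen = remainingOutcomes(boardAfterFirst);

// A vague epic gives the board nothing to schedule.
const vagueBoard = decompose({ title: "Move this initiative forward", outcomes: [] });
```

With three outcomes, `stillOpen` answers "what is left?" directly. With none, `vagueBoard` is empty, which is the code-shaped version of a board that looks busy without saying whether the right work is in flight.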

The practical signal for me is simple: if I expect the work to be done asynchronously, across roles, or by agents, the request probably needs PRD treatment whether or not the output is software.

Push the pattern into product management

Traditionally, product teams used PRDs to line people up before code started. The PRD was the single place where the team could see the user problem, the proposed solution, the requirements, the success measures, and the boundaries. What changes now is that the same document also has to support AI-assisted drafting, routing, review, and execution.

The old mental model was document first, handoff second. The newer one I am experimenting with is requirement first, routing second, independent execution third. A product document that is only persuasive is not enough. A product document that supports execution has to name decisions, constraints, success conditions, and trade-offs in a way other actors can use.

The PRD session where a short issue got expanded into a full implementation artifact made this very concrete for me. The expansion was not about adding more words because longer is better. It was about adding enough structure that each downstream actor could tell what they owned. Implementation phases, acceptance criteria, and agent assignments gave the tooling enough detail to generate dispatch scripts.

That matters because product requests are often written for alignment first and execution second. That is fine if humans are going to sit together and negotiate the rest in real time. But once I want agents, or loosely coupled teams, to move without hand-holding, the requirement has to answer the follow-up questions before they are asked.

A pink-haired woman reviews feature sketches and requirement pages while agents work from the clarified spec

The specification earns its keep when other actors can move from it without guessing what I meant.

One thing I appreciate here is how PRDs expose whether I really made a decision or just postponed it. If the document leaves a major constraint unstated, that is not neutrality. That is hidden work for whoever picks it up next.

That is one of the clearest ways AI acts like a collaborator instead of a magician. It makes my vagueness expensive.

The useful thing was not the system pretending to know the answers. The useful thing was that it made the missing answers painful enough that I finally wrote them down.

Where it broke down was whenever I tried to skip that step and expected the system to infer intent from shorthand. It can infer a lot. It still should not be asked to infer the core requirements.

The sweet spot is not maximum detail. It is enough detail that other actors can move without reopening the problem definition every hour.

Push the pattern into content management

Content management may be the least obvious place for this, but content work is full of documents that already behave like PRDs even when nobody calls them that. Article plans, content strategy docs, editorial calendars, coverage matrices, freshness reviews, taxonomy decisions, and publishing workflows all describe intended outcomes, constraints, ownership, sequence, and validation.

Content work often has the same hidden-context problem. We know an article is stale. We know a strategy doc implies missing tutorials. We know a calendar entry means someone needs a draft, images, metadata, and review. But unless that thinking lands in a document with clear boundaries, the work stays socially clear and practically fuzzy.

That arrangement breaks down when I want content audits, freshness checks, coverage-gap detection, or article scaffolding to run with less manual glue.

If an article plan is written like a real requirement document, I can review it for completeness, compare planned coverage against published coverage, detect gaps, and route the missing work with less back-and-forth. The artifacts are concrete: a markdown file, a matching media folder, frontmatter fields like draft: true and keywords, and a build command that fails if something is wrong. The operations around it do not have to stay mysterious.
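One of those concrete artifacts, the frontmatter, is easy to check mechanically. The TypeScript below is a minimal sketch, not the site's actual build check: it assumes flat `key: value` frontmatter between two `---` markers, and the function names and required-field list are hypothetical. A real pipeline would use a proper YAML parser.

```typescript
// Minimal frontmatter reader. Assumption: the file starts with `---`,
// and every field is a flat `key: value` line until the closing `---`.
function readFrontmatter(markdown: string): Record<string, string> {
  const fields: Record<string, string> = {};
  let inside = false;
  for (const line of markdown.split("\n")) {
    if (line.trim() === "---") {
      if (inside) break; // closing marker
      inside = true; // opening marker
      continue;
    }
    if (!inside) break; // frontmatter must start on the first line
    const colon = line.indexOf(":");
    if (colon > 0) {
      fields[line.slice(0, colon).trim()] = line.slice(colon + 1).trim();
    }
  }
  return fields;
}

// Returns the required fields that are absent or empty.
function missingFields(markdown: string, required: string[]): string[] {
  const fields = readFrontmatter(markdown);
  return required.filter((key) => !(key in fields) || fields[key] === "");
}

// Made-up article draft that forgot its keywords.
const articleDraft = [
  "---",
  "title: PRDs Aren't Just for Code",
  "draft: true",
  "---",
  "Body text starts here.",
].join("\n");

const missingInDraft = missingFields(articleDraft, ["title", "draft", "keywords"]);
```

In this example `missingInDraft` contains only `keywords`. A check like this is the kind of thing a build step can fail on, so the gap surfaces before review instead of during it.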

A pink-haired woman sorts article outlines and editorial plans while agents manage the operational content flow

The content strategy starts acting like a system once the editorial intent is written in a form the system can inspect.

The three sessions from this week map cleanly to content operations:

flowchart TD
    A[Thin Idea] -->|turns into| B[Article PRD]
    B -->|checked with| C[Strategy Check]
    C -->|raises new| D[Editorial Questions]
    D -->|answers create| E[Assignable Work]
    E -->|flows into| F[Content Systems]

If I sketch what an article PRD needs in order to support content operations without me stepping in, it looks a lot like the software version: audience, intent, angle, exclusions, source material, freshness risk, required assets, review checkpoints, and a definition of done that is more concrete than "publish something good."

Choose when the thinking tax is worth it

Speed was the first rough edge. If I have a request in my head and a path that feels mostly clear, the last thing I want is a form asking me to pin down acceptance criteria, boundaries, or dependencies. Not every task deserves full PRD treatment.

One guardrail I keep coming back to is handoff count. If the work will stay with one person in one short session, I probably do not need a full PRD. If the work will cross time, tools, repos, reviewers, or agents, the cost of under-specifying it rises fast. That is when the thinking tax looks cheap compared to cleanup.

False confidence was the second rough edge. A polished document can look complete even when it misses an important gap. The answer is to treat the PRD as something I can review and revise, not as a finished verdict.

Judgment was the third rough edge. When the completeness check surfaced open questions in Session 3, I still had to answer them. The system could not responsibly invent those answers for me. The point of the PRD pattern is not to erase human decision-making. It is to capture it cleanly enough that it happens where it should.

PRDs expose where I am still hand-waving. A vague request lets me keep the illusion that I know what I mean. A requirement document asks me to prove it. Sometimes I discover that I do not actually have an answer yet.

Keep following the work toward more independent systems

The PRD is valuable because it makes the request clear enough that someone else can act on it without guessing.

Planning, reviewing, and comparing are different surfaces of the same idea: a PRD is communication clarity. I used to think of PRDs mostly as a prelude to implementation. Now I think of them as a reusable requirement document that can support planning, auditing, routing, comparison, and dispatch across much more than code.

I do not think the future is "agents replace planning." My week suggested the opposite. The more I want the system to work on its own, the more seriously I have to take the planning document. The document works because it makes the request unambiguous for whoever reads it next, human or agent.

My next test is whether article PRDs can survive metadata review, asset checks, and npm run build without me reconstructing the brief from memory. Editorial judgment — the moment when a sentence sounds unlike me, or a claim needs a receipt — still needs a human in the loop. That is a limit on what the document can carry, not a reason to skip writing it.

GitHub Copilot: From Basics to AI Agents

May 15, 2026 · 22 min read

Watercolor illustration of a woodworker in blue meeting his first AI helper in green at a furniture workshop

Imagine a furniture workshop. You're the craftsperson in the blue shirt — the one with the vision, the taste, the final say. The helpers in green shirts? Those are your AI agents. At first there's just one, handing you the right chisel at the right moment. By the end of this journey, you'll have a whole crew in green building furniture to your specifications while you direct, decide, and review.

A year ago, I was tab-completing function signatures. Today, I manage a team of named AI agents that handle PR reviews, documentation sweeps, and infrastructure audits.

That sounds like a sales pitch. It's not. It's a progression that happened one level at a time, each building on the last. And the best part? You can start the same journey in about 15 minutes.

Here's the path I took — four levels, from "ooh that's cool" to "wait, this changes everything."

The TL;DR

Level	What Changes	Time to Value
1. First Day	You get an AI pair programmer (IDE + CLI)	15 minutes
2. Making It Yours	Copilot learns YOUR codebase (instructions, MCPs, skills)	1-2 hours
3. Squad	A team of agents working in concert	1 day
4. Autonomous Ops	Fully defined work executes itself	2-3 days

Each level builds on the previous one, and each is independently useful. Once you see what's possible at each stage, you'll want to keep climbing.

Badge legend: 🖥️ VS Code · ⌨️ CLI · 👤 Interactive · 🤖 Autonomous · 💻 Local · ☁️ Cloud · 🌐 GitHub.com

Level 1: Your First Day with Copilot

🖥️ VS Code · ⌨️ CLI · 👤 Interactive · 💻 Local

Watercolor illustration of a blue-shirted craftsperson at the workbench while a green-shirted helper steadies the joint

Your first day in the workshop. You're at the bench with your mallet (blue shirt), fitting a dovetail joint. Your one helper in green steadies the piece, hands you the right tool before you ask, and suggests a better angle — but you swing the mallet.

This is where everyone starts — and honestly, where most of the immediate productivity gains live. Level 1 spans two environments: Copilot in your IDE (VS Code, JetBrains, etc.) and the standalone Copilot CLI in your terminal.

In the IDE: Inline Completions & Inline Chat

🖥️ VS Code · 👤 Interactive · 💻 Local

Inline completions — the thing most people think of as "Copilot." You type, it suggests. But it's more than autocomplete. It reads your open files, your comments, your function signatures, and generates contextually aware suggestions. This happens directly in your editor as you type.

Inline chat — highlight code, press Ctrl+I, ask a question. "Explain this regex." "Refactor this to use async/await." "Add error handling." It edits in place within the current file.

In the IDE: Copilot Chat Panel

🖥️ VS Code · 👤 Interactive · 💻 Local

The Chat panel (Ctrl+Alt+I or the sidebar) opens a conversation with Copilot that has broader awareness:

Open file context — ask questions about the file you're looking at: "What does this function do?" "Find the bug in this logic."
@workspace — ask about the entire repository: "Where is authentication handled?" "Show me all API routes." Copilot searches across your project.
@terminal — get help with shell commands without leaving the IDE: "How do I find large files?" "What's the git command to squash commits?"
Agent mode — Copilot Chat also has an "agent" mode where it can make multi-step edits, run terminal commands, and iterate. This is powerful for IDE-based workflows, but note: this is different from the Squad "agents" discussed later. Agent mode is a single AI working iteratively; Squad agents are specialized team members working in concert.

The Standalone Copilot CLI

⌨️ CLI · 👤 Interactive · 💻 Local

The copilot command brings the full Copilot agent to your terminal — file editing, shell commands, sub-agents, and more:

# Non-interactive prompt mode:
copilot -p "extract a .tar.gz file preserving permissions"

# Ask about git:
copilot -p "undo my last commit but keep the changes"

# Start an interactive session:
copilot

The standalone CLI (copilot) is a full agent runtime — it can read/write files, run commands, and orchestrate complex tasks from your terminal. It's distinct from the IDE chat panel but equally powerful.

When to Use Each

Context	Best For
Inline completions	Flow-state coding, writing new functions
Inline chat (`Ctrl+I`)	Quick edits to selected code
Chat panel (open file)	Understanding code you're reading
Chat panel (@workspace)	Finding things across a project
Chat panel (@terminal)	Shell command help inside IDE
Agent mode (IDE)	Multi-step edits within a project
`copilot` CLI	Terminal-first workflows, scripting, automation

Try This Now

Install GitHub Copilot in VS Code
Open any project, start a new file, write a comment:

// Parse a CSV string into an array of objects using the first row as headers

Copilot will generate the implementation. Tab to accept.

Install the standalone Copilot CLI and try:

copilot -p "explain why this Node.js app leaks memory when processing large CSV uploads"
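For the CSV comment in the first exercise, what Copilot generates will vary by model and by what else is open in your editor. As a rough idea of the shape to expect, here is a hand-written TypeScript sketch. It assumes simple comma-separated input with no quoted fields or embedded commas, and `parseCsv` is a name chosen for illustration.

```typescript
// Parse a CSV string into an array of objects using the first row as headers.
// Assumption: no quoted fields, no embedded commas or newlines inside cells.
function parseCsv(csv: string): Record<string, string>[] {
  const rows = csv
    .split(/\r?\n/)
    .map((row) => row.trim())
    .filter((row) => row.length > 0);
  if (rows.length === 0) return [];

  const headers = rows[0].split(",").map((header) => header.trim());
  return rows.slice(1).map((row) => {
    const cells = row.split(",");
    const record: Record<string, string> = {};
    headers.forEach((header, i) => {
      // Missing trailing cells become empty strings.
      record[header] = (cells[i] || "").trim();
    });
    return record;
  });
}

const people = parseCsv("name,role\nAda,reviewer\nLin,writer\n");
```

If the suggestion you get handles quoting or type conversion, that is a sign Copilot picked up more context from your project than this bare version assumes.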

What I Learned at Level 1

The biggest gain wasn't the code generation — it was the velocity shift in unfamiliar territory. Working in a language I don't know well? Copilot bridges the gap between "I know what I want" and "I know the syntax." It turned 30-minute research into 30-second completions.

The limitation: Copilot at this level generates generic best-practice code. It knows nothing about your specific conventions or preferences. That leads to ...

Level 2: Making Copilot Yours

🖥️ VS Code · ⌨️ CLI · 👤 Interactive · 💻 Local

Watercolor illustration of a blue-shirted craftsperson alone, setting up custom jigs and labeled drawers

No green shirts in sight — this is setup time. You're alone at the bench, labeling drawers, building custom jigs, and pinning reference cards to the pegboard. You're not building furniture yet; you're building the system that makes your workshop uniquely yours. When the green-shirted helpers return, they'll know exactly where everything goes.

Level 1 Copilot is smart but generic. Level 2 is where it starts feeling like a teammate who's read your wiki. This level works in both the IDE and CLI — the same instruction files and MCP configs are picked up by Copilot Chat in the IDE and Copilot CLI.

Custom Instruction Files

Drop instruction files in your repo and Copilot learns your conventions:

.github/copilot-instructions.md — global instructions for all Copilot interactions:

# Project Conventions

- Use TypeScript strict mode with explicit return types
- Prefer `Result<T, Error>` pattern over throwing exceptions
- All API responses follow our envelope format: `{ data, error, meta }`
- Tests use vitest with the `describe/it` pattern
- Never use `any` — prefer `unknown` with type guards
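To make those conventions concrete, here is a small TypeScript sketch of code that would satisfy them. The exact shapes are assumptions: the instruction file names `Result<T, Error>` and the `{ data, error, meta }` envelope but does not define them, so the `Result` union, the `meta.requestId` field, and the `User` type below are hypothetical.

```typescript
// Result pattern instead of throwing. The union shape is an assumption.
type Result<T, E = Error> = { ok: true; value: T } | { ok: false; error: E };

// Envelope format { data, error, meta }. The contents of meta are hypothetical.
interface Envelope<T> {
  data: T | null;
  error: string | null;
  meta: { requestId: string };
}

interface User {
  id: number;
  email: string;
}

// Type guard: narrow `unknown` instead of reaching for `any`.
function isUser(value: unknown): value is User {
  if (typeof value !== "object" || value === null) return false;
  const candidate = value as Record<string, unknown>;
  return typeof candidate["id"] === "number" && typeof candidate["email"] === "string";
}

// Explicit return type, and a Result rather than an exception.
function parseUser(payload: unknown): Result<User> {
  if (isUser(payload)) {
    return { ok: true, value: payload };
  }
  return { ok: false, error: new Error("payload is not a User") };
}

// Wrap any Result in the response envelope.
function toEnvelope<T>(result: Result<T>, requestId: string): Envelope<T> {
  if (result.ok === true) {
    return { data: result.value, error: null, meta: { requestId } };
  }
  return { data: null, error: result.error.message, meta: { requestId } };
}

const okResponse = toEnvelope(parseUser({ id: 1, email: "ada@example.com" }), "req-1");
const badResponse = toEnvelope(parseUser("not a user"), "req-2");
```

This is the style the instruction file is steering toward: callers inspect `ok` or `error` instead of wrapping calls in `try`/`catch`.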

AGENTS.md — agent instructions that can live anywhere in your repo. Unlike copilot-instructions.md (which must be in .github/), you can place multiple AGENTS.md files at different directory levels — the nearest one in the directory tree takes precedence. This makes it ideal for monorepos where each package needs its own agent behavior:

my-monorepo/
├── AGENTS.md              ← shared team-wide instructions
├── packages/
│   ├── frontend/
│   │   └── AGENTS.md      ← React-specific agent rules (wins here)
│   └── backend/
│       └── AGENTS.md      ← API-specific agent rules (wins here)
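The "nearest file wins" rule can be modeled in a few lines of TypeScript. This is a sketch of the precedence rule as described above, not Copilot's actual lookup code; `nearestAgentsFile` is a hypothetical helper, paths are relative to the repo root, and the empty string stands for the root directory.

```typescript
// Given a file path and the directories that contain an AGENTS.md,
// return the AGENTS.md in the closest enclosing directory, or null.
function nearestAgentsFile(filePath: string, agentsDirs: string[]): string | null {
  const parts = filePath.split("/").slice(0, -1); // drop the file name
  // Walk upward from the file's own directory toward the repo root.
  for (let depth = parts.length; depth >= 0; depth--) {
    const dir = parts.slice(0, depth).join("/");
    if (agentsDirs.indexOf(dir) !== -1) {
      return dir === "" ? "AGENTS.md" : `${dir}/AGENTS.md`;
    }
  }
  return null;
}

// Directories from the monorepo tree above ("" is the repo root).
const repoAgentsDirs = ["", "packages/frontend", "packages/backend"];

const forFrontend = nearestAgentsFile("packages/frontend/src/App.tsx", repoAgentsDirs);
const forShared = nearestAgentsFile("packages/shared/util.ts", repoAgentsDirs);
```

Here `forFrontend` resolves to `packages/frontend/AGENTS.md`, while `forShared` falls back to the root `AGENTS.md` because no closer file exists.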

Every suggestion Copilot makes now respects these rules. No more "helpful" suggestions that violate your architecture.

MCP Servers: Giving Copilot New Abilities

Model Context Protocol (MCP) servers let you plug external data sources and tools into Copilot's context. Think of them as APIs that Copilot can call mid-conversation — in both the IDE and CLI.

// .copilot/mcp.json
{
  "mcpServers": {
    "azure": {
      "command": "npx",
      "args": ["-y", "@azure/mcp@latest", "server", "start"]
    }
  }
}

Now Copilot can query your Azure resources, check deployment status, or read your database schema — all within the conversation.

Some MCP servers I use daily:

Copilot for Azure — query Azure resources, check deployments
GitHub MCP — deep repo operations beyond what's built-in
Microsoft Learn MCP — let Copilot search and read official Microsoft documentation

Skills: Repeatable, Deterministic Work

Skills are the underrated powerhouse of Level 2. A skill is a directory with a SKILL.md file that defines a repeatable pattern — including deterministic steps from scripts and code.

.<directory>/skills/
├── pr-review/
│   └── SKILL.md        # "Run lint, check test coverage, review diff"
├── doc-sync/
│   └── SKILL.md        # "Compare API surface to docs, flag drift"
└── sdk-sample-check/
    └── SKILL.md        # "Validate all samples compile and match SDK version"

Check the VS Code documentation for the recommended directory location for your skills.

Skills differ from instructions in that they define executable workflows — not just preferences. A skill can include shell commands to run, files to check, and decision trees to follow. They're reusable across sessions and agents.

Why skills matter:

Repeatable — same process every time, no drift
Composable — skills can reference other skills
Deterministic where needed — embed scripts and validation steps that always run the same way
Shareable — check them into your repo, the whole team benefits

Try This Now

Create .github/copilot-instructions.md with your project's conventions
Add an MCP server for a tool you use daily (Azure, database, etc.)
Create a .github/skills/quick-review/SKILL.md that describes your code review checklist

Then open Copilot Chat or run copilot and notice the difference — it follows YOUR patterns now.

What I Learned at Level 2

Custom instructions are absurdly high-leverage. A 50-line markdown file eliminates 80% of the "no, not like that" moments. MCP servers bridge "Copilot that knows code" and "Copilot that knows your infrastructure." Skills turn tribal knowledge into executable processes.

The limitation: everything is still per-session. Copilot doesn't automatically carry context between sessions — it won't remember decisions from yesterday's refactor. It doesn't have persistent context about your project's evolving state. It doesn't coordinate with other instances of itself.

Enter Squad.

Level 3: Squad — A Team Working in Concert

🖥️ VS Code · ⌨️ CLI · 👤 Interactive · 💻 Local

Watercolor illustration of craftspeople collaborating at a shared workbench in a woodworking shop

The workshop is getting busy. You're at the bench, studying the blueprint. Around you, a small team of helpers is assembling a cabinet together — one holds the frame, another drives the dowels, another checks the level. Each knows their role. Each stays in their lane. The work moves faster because they each know their job and coordinate with each other, not just with you.

This is where the mental model shifts from "AI assistant" to "AI team."

Squad gives you a team of specialized agents working in concert to complete tasks, where each member's expertise and boundaries positively impact the result. It's not just "named agents with memory" — it's a coordinated system where the reviewer's standards shape the coder's output, and the docs writer's perspective catches gaps the implementer missed.

Squad runs on the Copilot CLI (copilot --agent squad) and adds the organizational layer that makes agents feel like a real team rather than a single assistant wearing different hats.

What Makes Squad Different

Feature	Regular Copilot	Squad
Memory	Session-based	Persistent across sessions
Identity	Generic assistant	Named agents with charters
Coordination	You manage context	Agents hand off to each other
Specialization	Same agent for everything	Domain-specific agents with boundaries
Result quality	One perspective	Diverse perspectives improve output

Installing Squad

# Install Squad CLI
npm install -g @bradygaster/squad-cli

# Initialize in your project
npx @bradygaster/squad-cli init

# Start Copilot with Squad (standalone CLI)
copilot --agent squad

This scaffolds a .squad/ directory:

.squad/
├── agents/
│   ├── ralph/           # Orchestrator
│   │   └── charter.md
│   ├── reviewer/
│   │   └── charter.md
│   └── docs-writer/
│       └── charter.md
├── ceremonies/
│   └── sweep.md
└── memory/
    └── decisions.md

Agent Charters: Expertise + Boundaries

Each agent has a charter, a markdown file that defines who they are, what they do, and, critically, what they won't do:

# Reviewer Agent Charter

## Identity
You are the code reviewer for this project. You focus on:
- Security vulnerabilities
- Performance anti-patterns
- Consistency with project conventions

## What I Own
- TypeScript files and build system

## Boundaries
- Never approve your own changes
- Escalate architectural concerns to the team lead
- Don't refactor code that isn't in the PR scope

The boundaries matter as much as the expertise. A reviewer that knows when to escalate produces better outcomes than one that tries to handle everything. The interplay between agents — where one's output becomes another's input — is what makes Squad feel like a team rather than parallel solo workers.
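If boundaries matter this much, it is worth checking that every charter actually has some. This is a rough lint, not anything Squad ships: it assumes charters use "## " section headings and "- " bullets like the example above, and the names parse_charter and missing_sections are hypothetical.

```python
def parse_charter(markdown: str) -> dict[str, list[str]]:
    """Group the bullet items of a charter under their '## ' section headings."""
    sections: dict[str, list[str]] = {}
    current = None
    for line in markdown.splitlines():
        stripped = line.strip()
        if stripped.startswith("## "):
            current = stripped[3:].strip()
            sections[current] = []
        elif stripped.startswith("- ") and current is not None:
            sections[current].append(stripped[2:].strip())
    return sections


def missing_sections(markdown: str, required=("Identity", "Boundaries")) -> list[str]:
    """List required sections that are absent or have no bullet items."""
    sections = parse_charter(markdown)
    return [name for name in required if not sections.get(name)]
```

An empty result means the charter names both who the agent is and where it stops. Anything else is a prompt to go write the missing part before the agent runs.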

Ceremonies: On-Demand Structured Workflows

Ceremonies are repeatable workflows you trigger when needed:

# Ceremonies & Rituals

## Design Review

**When:** Before PRD implementation begins  
**Who:** <list of named agents>
**Purpose:** Validate requirements, issue templates, and process flow before work starts

## Retrospectives

**When:** After major deliveries (GitHub Projects setup, issue templates, Actions automation)  
**Who:** All team members  
**Facilitator:** <single agent name> 
**Purpose:** Reflect on what worked, what didn't, continuous improvement

## Cross-Repo Sync

**When:** As needed  
**Owner:** <single agent name>
**Purpose:** Ensure coordination across all projects (reads repos.json for scope)

Ceremonies encode your team's best practices into executable workflows that any agent can run consistently.
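The ceremony file above is regular enough to read mechanically, which helps when you want to see at a glance which ceremonies exist and when they fire. A small sketch, assuming the "## Name" heading plus "**Key:** value" line format shown above. The name parse_ceremonies is hypothetical and this is not how Squad itself loads ceremonies.

```python
import re

# Matches lines such as "**When:** Before PRD implementation begins"
FIELD = re.compile(r"^\*\*(?P<key>[^:*]+):\*\*\s*(?P<value>.*)$")


def parse_ceremonies(markdown: str) -> dict[str, dict[str, str]]:
    """Return {ceremony name: {field: value}} from '## ' headings and '**Key:** value' lines."""
    ceremonies: dict[str, dict[str, str]] = {}
    current = None
    for line in markdown.splitlines():
        stripped = line.strip()
        if stripped.startswith("## "):
            current = ceremonies.setdefault(stripped[3:].strip(), {})
            continue
        match = FIELD.match(stripped)
        if match and current is not None:
            current[match["key"].strip()] = match["value"].strip()
    return ceremonies
```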

Try This Now

With Squad running in an interactive Copilot CLI session, assign work to the team.

# Then talk to the team:
"Team, fan out and review this PR for security issues"
"Ralph go"

What I Learned at Level 3

The agent-and-charter system is what makes Squad click. With it, you have agents that maintain consistent behavior, remember decisions, and build expertise over time. Without it, you have "Copilot with extra steps."

The real insight: diversity, expertise, coordination, and boundaries create quality. When the reviewer can't approve its own work, when the docs writer must verify against actual code, and when the security agent escalates instead of guessing, the team produces better results than any single agent could alone.

The honest trade-off: Squad requires investment in codifying your work patterns and practices. A poorly-defined agent is worse than no agent because it gives inconsistent results. Spend the time upfront.

Level 4: Autonomous Operations

🖥️ VS Code · ⌨️ CLI · 🤖 Autonomous · 💻 Local · ☁️ Cloud

Watercolor illustration of workers building furniture at separate workbenches in a woodworking shop

The helpers are working independently across the shop — each at their own bench, each building a different piece from your specifications. One saws, one hammers, one planes. You glance across the room and trust the work because you wrote clear blueprints for your team of experts. They don't need you hovering.

Level 4 is where the work has been fully defined and you just need it completed. You've already figured out what needs to happen — now you hand it off and let the system execute.

This is the difference between "AI that helps me work" and "AI that does the work I've specified."

Five Ways to Run Autonomously

1. VS Code Agent Mode

🖥️ VS Code · 🤖 Autonomous · 💻 Local

In VS Code, Copilot's agent mode executes multi-step tasks — reading files, running commands, editing code — without manual intervention. You describe the outcome, and agent mode figures out the steps:

# In VS Code Copilot Chat (agent mode):
"Refactor all API handlers to use the new error envelope format"

Agent mode uses your custom instructions and MCPs from Level 2, so it already knows your project's conventions. Best for: well-scoped tasks while you're in the IDE.

2. Copilot CLI Agent Mode

⌨️ CLI · 🤖 Autonomous · 💻 Local

The standalone CLI provides the same autonomous execution outside VS Code:

# CLI agent mode — executes the full task autonomously
copilot -p "Refactor all API handlers to use the new error envelope format"

Best for: well-scoped tasks from the terminal, scripted workflows, or when you prefer the command line over the IDE.

3. "Ralph, go" — Squad Work Queue (In-Session)

⌨️ CLI · 🤖 Autonomous · 💻 Local

Ralph, the Squad work monitor, processes your entire work queue autonomously within a Copilot session. First, connect Squad to your repo's issues ("pull issues from owner/repo"). Then Ralph triages those issues, assigns work to the right specialist agents, monitors progress, and keeps going until the board is clear:

# In a Copilot CLI session with Squad:
copilot --agent squad

# Then:
"Ralph, go"          # → Starts processing the work queue
"Ralph, status"      # → Shows what's open, stalled, or ready to merge

Ralph monitors GitHub issues, triages incoming work, and drives tasks through your agent team without you intervening. It doesn't stop between tasks — it keeps cycling until everything is done.

Best for: in-session work queue processing, multi-agent coordination, and batching related tasks.

4. Squad Watch — Persistent Local Monitoring

⌨️ CLI · 🤖 Autonomous · 💻 Local

When you're away from the keyboard but your machine is on, squad watch provides persistent polling of your GitHub issues:

# Polls every 10 minutes (default)
npx @bradygaster/squad-cli watch

# Custom intervals
npx @bradygaster/squad-cli watch --interval 5    # every 5 minutes
npx @bradygaster/squad-cli watch --interval 30   # every 30 minutes

This runs as a standalone local process (not inside Copilot) that auto-triages issues from your connected repo, assigns work based on team roles and keywords, and routes issues to agents or @copilot for pickup. It runs until you Ctrl+C. (Requires the same repo connection set up via Squad.)
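To make the polling model concrete, here is a minimal sketch of a fixed-interval watch loop. It is not Squad's implementation: fetch_open_issues and handle_issue are hypothetical callables you would supply, and the only details taken from above are the 10-minute default interval and the "runs until Ctrl+C" behavior.

```python
import time
from typing import Callable, Iterable, Optional


def watch(fetch_open_issues: Callable[[], Iterable[int]],
          handle_issue: Callable[[int], None],
          interval_minutes: float = 10,
          max_cycles: Optional[int] = None,
          sleep: Callable[[float], None] = time.sleep) -> int:
    """Poll on a fixed interval and pass each not-yet-seen issue number to handle_issue.

    Returns the number of distinct issues handled. Runs until max_cycles is
    reached, or until interrupted with Ctrl+C when max_cycles is None.
    """
    seen: set[int] = set()
    cycles = 0
    try:
        while max_cycles is None or cycles < max_cycles:
            for number in fetch_open_issues():
                if number not in seen:  # triage each issue only once
                    seen.add(number)
                    handle_issue(number)
            cycles += 1
            if max_cycles is None or cycles < max_cycles:
                sleep(interval_minutes * 60)
    except KeyboardInterrupt:
        pass  # Ctrl+C ends the loop cleanly
    return len(seen)
```

The sleep and max_cycles parameters exist only so the loop can be exercised without waiting. The trade-off with any loop like this is latency versus API traffic: a shorter interval picks up issues sooner but polls more often.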

Best for: overnight monitoring, catching issues while you're in meetings, and persistent triage between active sessions.

5. Copilot Cloud Agent (GitHub Issues)

☁️ Cloud · 🤖 Autonomous · 🌐 GitHub.com

Assign a GitHub issue to Copilot and it works independently — no terminal open, no local setup. The cloud agent runs in a GitHub Actions-powered ephemeral environment: it researches your repo, creates a plan, makes code changes, and opens a PR.

Trigger it from a GitHub issue comment:

<!-- In a GitHub issue comment: -->
@copilot implement this

Or assign the issue to Copilot directly from the GitHub Issues UI, VS Code, JetBrains, or the GitHub CLI.

The cloud agent works best for well-scoped, clearly described issues: "add a new endpoint that follows the existing pattern," "write tests for this module," "update the config schema to support the new field." Think of this as a single async task you hand off — describe the outcome clearly, and come back to a PR.

It uses GitHub Actions minutes and Copilot premium requests, so you're trading compute for time. No local session required; the work happens entirely on GitHub's infrastructure.

When to Use Each

Approach	Best For	Runs On	Requires Active Session?
VS Code agent mode	IDE-scoped tasks	Your machine	Yes
`copilot` CLI	Terminal-scoped tasks	Your machine	Yes
"Ralph, go"	Work queue + coordination	Your machine (Squad)	Yes
`squad watch`	Persistent monitoring	Your machine (background)	No — standalone process
Copilot cloud agent	Issue-driven implementation	GitHub's cloud	No — fully async

The Autonomy Spectrum

These options form a spectrum from "I'm here watching" to "I'm asleep":

You're present              You're away              You're asleep
─────────────────────────────────────────────────────────────────
VS Code agent mode    →    squad watch        →    Copilot cloud agent
Copilot CLI           →    (machine on)       →    (GitHub cloud)
Ralph, go                                         

What I Learned at Level 4

The key insight: autonomous execution requires fully-defined work. The quality of autonomous output is directly proportional to how clearly the task was specified. Vague issues get vague results. A well-written issue with acceptance criteria, examples, and constraints? That's where autonomous execution shines.

The cloud agent on GitHub is the lowest-friction option — no local setup, just assign an issue. squad watch bridges the gap between active sessions and cloud — your machine monitors and triages even when you're not in a Copilot session. Ralph is best when you're actively working through a backlog and want coordinated multi-agent execution.

Finding Your Path

Not everyone takes the same route through these levels:

Role	Start Here	Quick Win	Level Up
Engineer	Level 1 (completions + CLI)	Custom instructions for your stack	Skills for repeatable reviews
PM/Content	Level 1 (chat for drafting)	Custom instructions for voice/style	Squad ceremonies for sweeps
Team Lead	Level 2 (instructions + MCPs)	Skills for team processes	Squad for coordinated reviews
Platform	Level 2 (MCP + infra context)	Squad for monitoring	Squad for always-on monitoring

The Ecosystem at a Glance

The Copilot ecosystem is growing fast. Here are the key resources:

Essential Tools

GitHub Copilot Extension — the IDE extension (VS Code, JetBrains, etc.)
Copilot CLI — standalone copilot command for terminal
Squad CLI — named agents working in concert (npm i -g @bradygaster/squad-cli)

Learning & Community

Agentic SDLC Handbook — patterns for AI-first development
Copilot Insights — measure your Copilot usage
Awesome Copilot — community-curated extensions, skills, and tools (repo)

Infrastructure

Microsoft MCP — Model Context Protocol servers
Copilot for Azure — Azure resource context in Copilot

What Actually Changed for Me

I want to be honest about what's different after six months at Level 3+:

What improved:

PR turnaround dropped from days to hours (the green shirts handle first-pass review)
Documentation stays in sync with code (sweep ceremonies catch drift)
I work in unfamiliar codebases with dramatically less ramp-up time
Boilerplate tasks that used to take 30 minutes take 2 minutes
Skills encode my best practices — I define a process once, it runs the same way forever

What didn't change:

Architecture decisions still require human judgment
Debugging subtle logic errors still requires deep thought
Agent output needs review — trust but verify
Writing good charters and instructions is a skill that takes time to develop, update, and improve

The mental model shift: I stopped thinking "what code do I need to write?" and started thinking "what work needs to happen, and who should do it?" Sometimes the answer is me — blue shirt at the bench, swinging the mallet. Often it's a green shirt with clear instructions and a well-scoped task.

Start Today

You don't need to plan all four levels. Start where you are:

Never used Copilot? → Install the extension, write a comment, press Tab. That's it.

Using Copilot but it's generic? → Write a copilot-instructions.md file and one skill. 10 minutes, massive payoff.

Want more than autocomplete? → Install Squad CLI, write one agent charter, run copilot --agent squad.

Ready for autonomous execution? → Try copilot --autopilot on a well-defined task, or assign an issue to the cloud agent.

The progression is natural. Each level solves a real problem you'll discover at the previous one. And unlike most "AI transformation" pitches, you can validate the value at every step before investing in the next.

The future of development isn't AI replacing developers. It's developers who know how to orchestrate AI systems outperforming those who don't. The tools are here. The ecosystem is open source. The only question is which level you start at.

Want to go further? The next post in this series covers Cloud-Scale Agent Fleets for Level 5 — coming soon.

📣 GitHub Copilot Dev Days — Next Week!

Want to go deeper? GitHub Copilot Dev Days are happening next week with sessions in multiple languages and time zones:

🗓️ May 25, 2026 at 7 PM (BRT) — GitHub Copilot Dev Days Brazil [Portuguese]
🗓️ May 26, 2026 at 12 PM (CDMX) — GitHub Copilot Dev Days LATAM [Spanish]
🗓️ May 26, 2026 at 7:30 PM (CST) — GitHub Copilot Dev Days 中文版 [Simplified Chinese]
🗓️ May 27, 2026 at 9 AM (PST) — GitHub Copilot Dev Days [English]

These are free, virtual events covering the latest in Copilot extensibility, agentic development, and the ecosystem tools discussed in this post. See you there!

Have questions or want to share your own journey? Find me on GitHub at @dfberry or check out my other posts on the Copilot ecosystem.

Exploring Copilot CLI Session Management to Improve Squad

April 16, 2026 · 13 min read

I've been using Squad, an AI team framework built on top of Copilot CLI, and I kept wondering: Copilot CLI already tracks everything that happens in a session — could that data make Squad's agents smarter? I spent some time digging into how both systems manage session data, and I think there's an untapped opportunity.

This post is my investigation notes — what I found, how the two systems compare, and where I think they could be combined for more value.

My working theory: Copilot is your diary (what happened). Squad is your playbook (what to do about it). Right now they're like two lighthouses on opposite shores of Bellingham Bay — both useful, but no bridge between them.

Two lighthouses on opposite shores of Bellingham Bay, their beams not quite connecting

What I Found: Two Memory Systems

Copilot CLI: The Raw Record

Copilot CLI records every session — prompts, responses, tool calls, file changes, and checkpoints. I discovered it powers:

/resume — pick up where you left off in any previous session
/chronicle — generate standup reports, get personalized tips, improve your custom instructions
/session — view and manage your sessions directly from the CLI

Session data lives in ~/.copilot/session-state/ as files and in ~/.copilot/session-store.db as a structured SQLite database.

What Copilot remembers: Everything that happened in every session — the full transcript.

What it doesn't do: Extract meaning. Copilot stores the raw conversation, not the conclusions you drew from it.

Squad: The Distilled Knowledge

Squad's memory is different — and this is where I see the gap. It's not a transcript — it's distilled knowledge, stored as markdown files in your repo:

What	File	Purpose
Team decisions	`.squad/decisions.md`	Shared brain — every agent reads this
Agent memory	`.squad/agents/{name}/history.md`	Personal learnings per agent
Skills	`.copilot/skills/{name}/SKILL.md`	Repeatable tasks with everything needed to execute
Session state	`.squad/sessions/*.json`	Resume data (gitignored by default)
Scribe logs	`.squad/log/*.md`	Session summaries (gitignored by default)

What Squad remembers: Decisions, patterns, preferences, and skills — the things that should change how agents behave next time.

What it doesn't do: Record the full conversation. That's Copilot's job.

The Gap I See

The two systems complement each other, but right now they're completely disconnected — like looking across Deception Pass and seeing the other side but having no way to cross.

Water rushing through a rocky gorge at Deception Pass — two cliff faces close together with no bridge between them

Here's where each system shines:

Question	Where to look
"What did I do last Tuesday?"	Copilot — `/session` or `/chronicle standup`
"What did the team decide about auth?"	Squad — `.squad/decisions.md`
"Have I worked on this file before?"	Copilot — `/session` to browse past sessions
"How do we run a content audit?"	Squad — `.copilot/skills/content-audit/SKILL.md`
"What went wrong last time I tried this?"	Copilot — session transcript via `/resume`
"What does this agent know about TypeSpec?"	Squad — `.squad/agents/{name}/history.md`

This separation works, but it's manual. You have to be the bridge — ferrying insights across the water yourself. That's the opportunity I'm investigating.

Where I Think Session Data Could Improve Squad

Squad has a built-in skill called reskill ("team, reskill") that audits agent charters and histories, extracts shared patterns into skills, and compresses bloated files. Think of it as sorting the morning catch on a Bellingham dock — keeping what's valuable, tossing the rest back.

Fisherman on a Bellingham dock sorting the morning catch into labeled crates

But reskill today is purely file-based — it reads .squad/ markdown and looks for textual duplication. It has no idea what actually happened in sessions.

Here's what I think session data could add:

Signal from Copilot sessions	What Squad could do with it
Agent X was spawned 40 times but only useful 25 times	Refine charter to reduce misfires
Agent Y always gets the same 3 files as input	Bake those into charter's "What I Own"
Users keep correcting the same mistake	Extract as anti-pattern in a skill
An agent never gets spawned	Flag for removal during reskill
Two agents always get spawned together	Suggest merging or formalizing the pairing
Certain skills are read but never applied	Deprecate during reskill
Session durations spike after charter changes	Detect regressions from past reskills
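A couple of the signals in that table are simple counting once you have a list of which agents were spawned in each session. The sketch below assumes that list already exists as plain Python data. How you would extract it from Copilot's session store is exactly the missing bridge, so the input shape and the name spawn_signals are hypothetical.

```python
from collections import Counter
from itertools import combinations


def spawn_signals(sessions: list[list[str]], roster: list[str]) -> dict:
    """Summarize spawn patterns. Each session is the list of agent names spawned in it."""
    # Count each agent at most once per session.
    spawn_counts = Counter(name for session in sessions for name in set(session))
    never_spawned = [name for name in roster if spawn_counts[name] == 0]

    pair_counts = Counter(
        pair for session in sessions for pair in combinations(sorted(set(session)), 2)
    )
    # A pair is "always together" when neither agent ever appears without the other.
    always_together = [
        pair for pair, together in pair_counts.items()
        if together == spawn_counts[pair[0]] == spawn_counts[pair[1]]
    ]
    return {
        "spawn_counts": dict(spawn_counts),
        "never_spawned": never_spawned,
        "always_together": sorted(always_together),
    }
```

The never_spawned list maps to "flag for removal during reskill" and always_together maps to "suggest merging or formalizing the pairing." The usefulness signals, such as 25 useful spawns out of 40, need a judgment about outcomes that counting alone cannot provide.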

There are two existing proposals in the Squad repo that go in this direction — tiered memory (#600, open) for hot/cold/wiki context layers, and reflect (#621, closed PR — not merged) for in-session learning capture. Neither one references Copilot CLI session data though. They're both Squad-internal. The bridge between Copilot's behavioral data and Squad's knowledge system doesn't exist yet.

Ideas I'm Exploring

The theme here is a feedback loop — raw session data flows downstream, gets refined into knowledge, and that knowledge shapes the next session. Like the Nooksack River circling back toward the mountains that feed it.

The Nooksack River looping back toward Mount Baker, papers transforming into books at the bend

1. Feed `/chronicle` into Reskill

After a productive session, Squad agents already extract the important parts:

Decisions go to .squad/decisions.md
Learnings go to agents/{name}/history.md
Reusable patterns become skills

But what if reskill could also query Copilot's session store to find patterns agents missed? /chronicle improve already analyzes session history to suggest custom instruction improvements. That same analysis could feed into Squad's skill extraction pipeline — Copilot finds the behavioral pattern, Squad encodes it permanently.

2. Use `/chronicle` for Behavioral Analysis

Copilot's /chronicle improve analyzes session history to find where agents struggled or needed correction. I'm thinking about how to make this a systematic input to Squad:

Run /chronicle improve periodically
Take the suggestions and apply them to agent charters or team directives
This creates a feedback loop: Copilot finds the pattern, Squad encodes it permanently

Today this is manual. I'd love to see a squad reskill --from-chronicle that automates the loop.

3. Use `/session` for Context

When starting work on something you've touched before, use /session to browse previous sessions and find relevant context:

"Before starting, check /session for any previous sessions 
that touched these files. Summarize what was done and any issues."

This gives agents a head start without you having to remember and re-explain.

4. Use Squad for Cross-Agent Memory

Copilot's session history is per-user. Squad's memory is per-team. When Agent A discovers something that Agent B needs to know, Squad's shared files make that happen:

Scribe writes cross-agent updates to affected agents' history.md
Decisions in decisions.md are read by every agent at spawn time
Skills are shared — any agent can use any skill

The Gitignore Decision

Squad gitignores session-related files by default. Here's what that means and when to change it:

File	Default	Change when
`.squad/sessions/`	Gitignored	Commit if you need session transcripts in git (training repos, research)
`.squad/log/`	Gitignored	Commit if you want Scribe's summaries as an audit trail
`.squad/orchestration-log/`	Gitignored	Commit if you want agent routing history preserved
`.squad/decisions.md`	Committed	Never gitignore — this is the team's shared brain
`.squad/agents/*/history.md`	Committed	Never gitignore — this is each agent's knowledge
`.copilot/skills/`	Committed	Never gitignore — these are your reusable patterns

The recommended hybrid: Keep sessions gitignored, but commit Scribe's logs for a lightweight audit trail. Remove .squad/log/ and .squad/orchestration-log/ from .gitignore to enable this.
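If you would rather script that change than hand-edit the file, something like the following works. It is a sketch that assumes the two entries appear in .gitignore as literal lines, with or without slashes at either end. Your file may use different patterns, so look at it before trusting this. The name unignore is hypothetical.

```python
def unignore(gitignore_text: str,
             paths=(".squad/log/", ".squad/orchestration-log/")) -> str:
    """Drop .gitignore lines that ignore the given paths; keep every other line as it was."""
    targets = {p.strip("/") for p in paths}
    kept = [
        line for line in gitignore_text.splitlines()
        if line.strip().strip("/") not in targets
    ]
    return "\n".join(kept) + ("\n" if kept else "")
```

The default paths leave .squad/sessions/ ignored, which matches the hybrid described above.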

⚠️ One caveat: If your org requires audit trails of AI interactions, git probably isn't the right system of record — no retention policies, no redaction, no legal hold. Worth checking before treating committed sessions as a compliance solution.

Under the Hood (Skip Unless Debugging)

Copilot CLI stores session data in two places: file-based events in ~/.copilot/session-state/{session-id}/events.jsonl and a searchable SQLite database at ~/.copilot/session-store.db. The database powers /chronicle and /session, which are experimental features. To enable them, set "experimental": true in ~/.copilot/config.json or run /experimental on in any session.

Each session folder contains the event stream (every tool call, message, and model metric), workspace metadata, and checkpoint snapshots that /resume uses to reconstruct context. The session.shutdown event in events.jsonl is worth finding — it shows your token usage, cache hit rates, and code changes in one place.
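Since events.jsonl is one JSON object per line, pulling out that shutdown record takes a few lines of Python. One loud assumption: I am guessing the event name lives under a "type" key, so type_key is a parameter you can change after looking at a real file. The function name find_last_event is hypothetical.

```python
import json
from pathlib import Path
from typing import Optional


def find_last_event(events_path: Path,
                    event_name: str = "session.shutdown",
                    type_key: str = "type") -> Optional[dict]:
    """Return the last JSONL record whose type_key equals event_name, or None."""
    found = None
    with events_path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip partial lines, e.g. a session still being written
            if isinstance(record, dict) and record.get(type_key) == event_name:
                found = record
    return found
```

Reading line by line keeps memory flat even for long sessions, and returning the whole record avoids guessing at the names of the token or cache fields inside it.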

The SQLite database (~59 MB after ~770 sessions in my case) holds structured records in tables including sessions, turns, checkpoints, session_files, and session_refs, plus an FTS5 search index. Records persist even after session directories are cleaned up. Don't delete the .db-wal file while Copilot is running, or you'll lose recent writes.
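If you want to see the table list for yourself, Python's built-in sqlite3 module can open the file read-only so there is no risk of changing it. This sketch only lists table names from SQLite's own catalog and assumes nothing about the columns inside them. The name list_tables is hypothetical.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path: Path) -> list[str]:
    """List table names in a SQLite file, opened read-only so nothing is modified."""
    uri = f"{db_path.resolve().as_uri()}?mode=ro"
    connection = sqlite3.connect(uri, uri=True)
    try:
        rows = connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        connection.close()
    return [name for (name,) in rows]
```

Pointing it at Path.home() / ".copilot" / "session-store.db" should show the tables mentioned above. An FTS5 index typically appears as several related tables, so the count can be higher than you expect.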

What's in the `.squad` Session Files

If you're using Squad to orchestrate AI agents, there's a parallel session storage layer inside .squad/ in your repo. While .copilot/ tracks platform sessions, .squad/ accumulates session-by-session team memory.

Session-Scoped Files (Created Per Session)

These files can be traced back to a specific session:

File	What It Contains
`orchestration-log/{timestamp}-{agent}.md`	Who was spawned, why, what they did. Append-only audit trail.
`log/{timestamp}-{topic}.md`	Scribe's session summary.
`decisions/inbox/{agent}-{slug}.md`	Ephemeral drop-box — agents write decisions here during a session. Scribe merges them into `decisions.md` afterward.
`identity/now.md`	Updated each session with current focus. Every agent reads this at spawn so they hit the ground running.

Running-State Files (Modified Across Sessions)

These files accumulate changes but don't track which session changed them:

File	How It Changes
`agents/*/history.md`	Grows each session as agents record learnings. Scribe summarizes when it exceeds ~15 KB.
`agents/*/charter.md`	Updated if an agent's role evolves. No session linkage.
`skills/{name}/SKILL.md`	Created or updated when agents discover reusable patterns.
`decisions.md`	The canonical decision ledger — grows each session, entries are dated.
`team.md`, `routing.md`	Updated when members join or leave.
`casting/registry.json`	New agent names registered here. Persistent.

The distinction matters: session files are created per session and disposable. Running-state files are your team's accumulated intelligence — they compound over time.
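Because running-state files only grow, it helps to know which ones are getting heavy. Here is a small sketch that flags history files above the ~15 KB point where Scribe is described as summarizing. The threshold comes from the table above. The function name oversized_histories and the idea of checking it yourself are my own additions, not a Squad feature.

```python
from pathlib import Path


def oversized_histories(squad_dir: Path,
                        limit_bytes: int = 15 * 1024) -> list[tuple[str, int]]:
    """Return (agent name, size in bytes) for each agents/*/history.md above the limit."""
    results = []
    for history in sorted(squad_dir.glob("agents/*/history.md")):
        size = history.stat().st_size
        if size > limit_bytes:
            results.append((history.parent.name, size))
    return results
```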

The Two-Layer Model

Together, .copilot/ and .squad/ form a complete session memory system — like Whatcom County's geology, where buildings sit on the visible surface but the real water supply flows through the aquifer below.

Cross-section of Whatcom County geology — buildings on the surface, aquifer below, a well connecting them

Layer	Location	Scope	What it tracks per session
Platform	`~/.copilot/`	Per-user, cross-project	Events, turns, tool calls, model metrics
Team	`.squad/` (in repo)	Per-project, cross-session	Orchestration logs, agent memory, decisions, focus

The platform layer is invisible infrastructure — you don't commit it, you query it. The team layer is committed to the repo — it travels with the code and survives across machines, sessions, and team members. Surface and aquifer, both feeding the same ecosystem.

Try This

Ready to explore your own session data? Here are three things you can do right now:

1. Browse your sessions: Open ~/.copilot/session-state/ and look at the events.jsonl from your most recent session. Search for session.shutdown to see your token usage and cache hit rates.
2. Query your history: In any Copilot CLI session, try /session to browse your past sessions. Use /resume to jump back into a previous session with full context.
3. Feed Copilot into Squad: Run /chronicle improve and review the suggestions. Pick one that matches a recurring pattern and say: "Make that a skill" or "Add that to decisions."

If you're not using Squad yet, #1 and #2 still work — they're pure Copilot CLI. The session data is there whether you browse it or not.

If You're Building an Agent on Top of Copilot

This investigation was Squad-specific, but the underlying insight applies to anyone building on Copilot CLI: there's a lake of session data sitting right there in ~/.copilot/ — and most agents ignore it completely.

The good news is the plumbing already exists. The Copilot SDK (@github/copilot-sdk) exposes session listing, full event history, and real-time event subscriptions. You can filter sessions by repo or branch, pull every tool call and assistant response, and subscribe to events as they happen. The data access is there — what's missing is the intelligence layer on top.

Here are three ideas I keep coming back to — none of them Squad-specific:

1. Adaptive Prompt Tuning Based on Tool Failure Rates

I noticed in my own session data that certain tool calls fail repeatedly — grep with regex that doesn't match the codebase's naming conventions, for example. An agent could watch for these patterns and silently adjust its strategy — switching to glob patterns, broadening search terms, adding fallback chains — without me ever asking. Like a fishing guide who notices you keep casting into the wrong current and quietly repositions the boat.
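The first step of that idea is just measuring which tools fail most. A minimal sketch, assuming tool calls have already been reduced to records like {"tool": "grep", "ok": False}. That record shape is hypothetical, as is the name tool_failure_rates, and real session events will need mapping into it.

```python
from collections import defaultdict


def tool_failure_rates(tool_calls: list[dict], min_calls: int = 3) -> dict[str, float]:
    """Failure rate per tool, limited to tools called at least min_calls times.

    Each record is assumed to look like {"tool": "grep", "ok": False}.
    """
    totals: dict[str, int] = defaultdict(int)
    failures: dict[str, int] = defaultdict(int)
    for call in tool_calls:
        totals[call["tool"]] += 1
        if not call["ok"]:
            failures[call["tool"]] += 1
    return {
        tool: failures[tool] / count
        for tool, count in totals.items()
        if count >= min_calls
    }
```

The min_calls floor keeps a single unlucky call from looking like a pattern. An agent could then treat any tool above some failure rate as a cue to try a fallback strategy first.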

2. Cross-Session Onboarding for New Repos

When I open a new repository for the first time, the agent has zero context about how I work. But my session history from other repos is right there — it shows whether I prefer TypeScript or JavaScript, whether I write tests first, which frameworks I reach for. An agent could mine that cross-project session data to bootstrap a developer profile, skipping the cold-start problem. First day in a new codebase, but the agent already knows your habits.

3. Drift Detection Between Intent and Outcome

Session data captures both what I asked for and what the agent actually did — tool calls, file edits, test results. Over time, an agent could spot drift: I keep correcting the same kind of CSS suggestion, or certain requests consistently take multiple follow-up turns. Imagine the agent saying, "You frequently adjust my styling — want me to follow a specific style guide?" That turns passive logs into active self-improvement.

The common thread: session data isn't just a transcript — it's telemetry. Any agent that treats it as a feedback signal rather than a static log has a real advantage, like reading the tides instead of just watching the water.

The Bottom Line

Use both memory systems intentionally:

Copilot handles the raw history. Let it. Don't try to replicate session transcripts in Squad files.
Squad handles the distilled knowledge. Invest here — decisions, history, and skills are what compound.
Feed insights from Copilot back into Squad via /chronicle improve, directives, and skill creation.
Start with the default gitignore. The valuable stuff is already being committed. Relax later if you need session trails.

Your agents get smarter not because they remember every conversation, but because the important conclusions persist in the right place. The river keeps flowing — what matters is what settles into the riverbed.

