Before the robot builder touches anything, the owner pins a blueprint note on the wall: "Kitchen must fit a dishwasher. Keep the load-bearing beam. Finish under budget."
It's written down, it lives on site, and the robot — plus every inspector after it — works against that note, not against something said out loud last Tuesday that nobody can check. That note is PLAN.md. Now the real mechanics.
A task you can trust an agent with is a structured task
MS Learn is blunt about this: "When tasks are under-specified, agents can produce changes that look plausible but don't actually solve the underlying problem." The fix is to define every agent task as three things:
- Inputs — what the agent needs: the issue context, plus the constraints and boundaries (which folders it may touch, what it must never do).
- Outputs — what it must produce: a plan, a PR, and evidence (links to workflow runs).
- Success criteria — how the result is judged: checks pass, scans clean, the real problem actually resolved, scope matches intent.
A structured task (or task contract) is a task written as inputs + outputs + success criteria, so "done" is defined before the agent starts — not argued about after.
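A minimal sketch of that definition as a data structure, assuming a hypothetical `TaskContract` class (the name and fields are illustrative, not something MS Learn or GitHub defines). It only shows that a contract missing any of the three parts can be detected before the agent starts:

```python
from dataclasses import dataclass, field


@dataclass
class TaskContract:
    """Hypothetical shape for a structured task: inputs + outputs + success criteria."""
    inputs: list[str] = field(default_factory=list)
    outputs: list[str] = field(default_factory=list)
    success_criteria: list[str] = field(default_factory=list)

    def missing_parts(self) -> list[str]:
        """Return the names of any of the three parts that were left empty."""
        parts = {
            "inputs": self.inputs,
            "outputs": self.outputs,
            "success_criteria": self.success_criteria,
        }
        return [name for name, items in parts.items() if not items]

    def is_well_formed(self) -> bool:
        """A contract is well formed only when all three parts are present."""
        return not self.missing_parts()


# An under-specified task: it has context but no required outputs or success criteria.
vague = TaskContract(inputs=["issue link"])
print(vague.missing_parts())  # ['outputs', 'success_criteria']
```

The check is deliberately shallow: it confirms each part exists, not that the criteria are good ones.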
What a real task contract looks like
MS Learn's worked example — remediating a vulnerability — spells out all three parts:
- Inputs: a security alert / issue link; repo scope ("changes allowed under src/ and dependency files, but not infra/ unless explicitly requested"); constraints ("no workflow changes without platform review; no secrets introduced; no direct-to-main pushes").
- Outputs: a pull request containing "a structured plan (in PR description or .github/pull_request_template.md)", a bounded changeset (commits on an agent branch), and evidence links to the workflow run.
- Success criteria: required checks pass; the security signal is resolved; scope matches intent (no unexpected files changed); a rollback/escalation path is recorded for higher-risk changes.
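The "scope matches intent" criterion in this example can be checked mechanically. The sketch below is one way to do it, under stated assumptions: the dependency file names and the `.github/workflows/` prefix are illustrative choices, and `out_of_scope` is a hypothetical helper, not part of any GitHub tooling.

```python
# Scope taken from the worked example: src/ and dependency files are allowed,
# infra/ is not, and workflow changes need platform review.
ALLOWED_PREFIXES = ("src/",)
DEPENDENCY_FILES = {"package.json", "package-lock.json"}  # assumed examples
FORBIDDEN_PREFIXES = ("infra/", ".github/workflows/")     # assumed workflow location


def out_of_scope(changed_files: list[str]) -> list[str]:
    """Return every changed path that the task contract did not allow."""
    unexpected = []
    for path in changed_files:
        allowed = path.startswith(ALLOWED_PREFIXES) or path in DEPENDENCY_FILES
        forbidden = path.startswith(FORBIDDEN_PREFIXES)
        if forbidden or not allowed:
            unexpected.append(path)
    return unexpected


changed = ["src/app.py", "package-lock.json", "infra/main.tf"]
print(out_of_scope(changed))  # ['infra/main.tf']
```

An empty result means the changeset stayed inside the declared scope; anything else is a reason to stop and escalate rather than merge.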
A well-formed agent task names three things up front: its inputs (what the agent needs to start), its outputs (what it must produce — typically a plan, a PR, and evidence), and its success criteria (how the result gets judged — checks, scans, review outcomes). Skip any one and you can't tell whether the agent actually finished.
Our summary · grounded in MS Learn — Designing Agent Architecture & SDLC Integration, unit 3 · fetched 2026-05-30
Where the intent lives — and why it's a file
The intent doesn't float in your head. It's an artifact (same word as A0.1 — a recorded, openable file you can inspect later). It lives in writing, at three scopes:
- Issue: the per-job intent ("fix this one vulnerability"). → A0.1
- PLAN.md: the canonical "what good looks like" for the whole build (scope, constraints, and the definition of done).
- The agent's structured plan: rides in the PR description (or a .github/pull_request_template.md that forces it). It's also an output the agent must produce.
PLAN.md is just the most durable form of intent: a Markdown file, version-controlled, that you, a reviewer, or the next agent can open and diff. Intent becomes a file, not a feeling.
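Because the intent is a file, a script can inspect it. Here is a minimal sketch that checks a PLAN.md for the three things this lesson says it carries: scope, constraints, and the definition of done. The exact heading names are an assumption made for illustration; this lesson does not prescribe them.

```python
import re

# Assumed heading names; a real PLAN.md may word these differently.
REQUIRED_SECTIONS = ("Scope", "Constraints", "Definition of done")


def missing_sections(plan_text: str) -> list[str]:
    """Return required sections that have no matching Markdown heading."""
    headings = {
        match.group(1).strip().lower()
        for match in re.finditer(r"^#{1,6}[ \t]+(.+?)[ \t]*$", plan_text, flags=re.MULTILINE)
    }
    return [name for name in REQUIRED_SECTIONS if name.lower() not in headings]


sample = """# PLAN
## Scope
Agents may touch src/ only.
## Definition of done
Required checks pass and the issue's real problem is resolved.
"""
print(missing_sections(sample))  # ['Constraints']
```

The same idea works on a PR description: if the plan is a required output, its presence can be tested rather than assumed.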
This repo dogfoods exactly that: its own PLAN.md opens by stating what "done" means (exam, pass mark, scope) and cites its source. When you — or a Copilot agent assigned here — work an issue, you work against that file.
"CI passed" is necessary — not sufficient
If success criteria are vague, an agent can "complete the task" in a way that looks correct but misses the goal. MS Learn's example: the agent bumps a dependency, but the vulnerable version is still reachable through a transitive dependency. The checks are green and the problem is unsolved.
A green pipeline is necessary but not always sufficient. Write success criteria that name the real outcome — “vulnerability resolved” rather than just “tests passed” — so an agent can't call a job done while the actual problem survives.
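The transitive-dependency case can be sketched in a few lines. Everything here is hypothetical: the package names, the versions, and the assumption that you already have a fully resolved dependency list (direct and transitive) as `(name, version)` pairs. The point is only that "done" combines the green pipeline with a check on the real outcome.

```python
def vulnerable_versions_remaining(resolved, package, vulnerable):
    """Return the vulnerable versions of `package` still present in the resolved tree."""
    return sorted(
        {version for name, version in resolved if name == package and version in vulnerable}
    )


def task_done(checks_passed, resolved, package, vulnerable):
    """Green checks are necessary; the vulnerability actually being gone is also required."""
    return checks_passed and not vulnerable_versions_remaining(resolved, package, vulnerable)


# The agent bumped the direct dependency on libfoo to 2.0.0,
# but libbar still pulls in libfoo 1.2.0 transitively.
resolved = [("libfoo", "2.0.0"), ("libbar", "1.4.0"), ("libfoo", "1.2.0")]
vulnerable = {"1.2.0"}

print(vulnerable_versions_remaining(resolved, "libfoo", vulnerable))  # ['1.2.0']
print(task_done(True, resolved, "libfoo", vulnerable))  # False
```

With success criteria written as "tests passed", this run would count as finished. Written as "vulnerability resolved", it does not.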
Our summary · grounded in MS Learn — Designing Agent Architecture & SDLC Integration, unit 3 · fetched 2026-05-30
Prompt vs file — don't confuse them
The whole move of A0.2 is turning evaporating guidance into a stored artifact:
- Verbal prompt: lives in a chat window, gone tomorrow, can't be reviewed or diffed, doesn't survive the agent.
- Written intent (Issue / PLAN.md): stored forever, diffable, and it is the record of what "done" meant, so GitHub can store it (system of record) and gate against it (control plane).
And don't blur the two written forms: the Issue = one specific job (the work-order ticket); PLAN.md = the whole build's canonical brief (the blueprint note).
Common confusions (read these or get them wrong)
- "A structured task just means a clear prompt." No — it's specifically inputs + outputs + success criteria, written down before work starts.
- "If the tests pass, the agent did its job." Not necessarily — "CI passed" is necessary but not sufficient. Success criteria must name the real outcome.
- "The plan is just chatter in the PR." The plan is a required output and a stored artifact — it lives in the PR description or the pull-request template, where it's inspectable.
First — what's an "anti-pattern"?
You'll meet this word on the exam, so pin it down now. A pattern is a known good way of doing something — the move people keep recommending. An anti-pattern is the opposite: a way that looks reasonable, or feels easier in the moment, but is a known bad idea that bites you later. It's not a one-off slip — it's a recurring trap, common enough that it has earned a name.
Anti-pattern = the wrong move people keep making. (A pattern = the right move people keep recommending.)
For A0.2, the anti-pattern is simply the opposite of an inspectable plan: an agent that mixes planning and doing all at once, with no written plan anyone can open and review. You can't tell what it intended or check its reasoning — which kills the visibility A0.1 was built on. So when the exam shows you four options and three are sound design, the anti-pattern is the one that breaks the rule you just learned.
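One way to picture the good design is a run that refuses to execute until a plan has been written down. This is a toy sketch with a hypothetical `AgentRun` class, not how Copilot or any real agent is implemented:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AgentRun:
    """Toy model: planning and execution are separate steps, and the plan is recorded."""
    plan: Optional[str] = None  # text of the written plan, e.g. the PR description
    executed: bool = False

    def record_plan(self, text: str) -> None:
        """Store the plan so a reviewer can open it later."""
        if not text.strip():
            raise ValueError("plan must not be empty")
        self.plan = text

    def execute(self) -> None:
        """Refuse to act when there is no inspectable plan."""
        if self.plan is None:
            raise RuntimeError("no inspectable plan recorded; refusing to execute")
        self.executed = True


run = AgentRun()
try:
    run.execute()  # the anti-pattern: doing without a written plan
except RuntimeError as error:
    print(error)

run.record_plan("1. Bump the vulnerable dependency. 2. Re-run the security scan.")
run.execute()
print(run.executed)  # True
```

The anti-pattern corresponds to calling `execute()` first: nothing was written down, so there is nothing to review.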
The exam doesn't ask you to recite the triad; it tests it as design judgment. Expect items where clear success criteria and required checks are the good design, and mixing planning and execution with no inspectable plan is the anti-pattern. It's the same idea as the definition of done a reviewer checks against: scope, checks, review, and policy. The content is the same; only the wording is scenario-based. (The enforcement move, making a plan a required status check, is lesson 1.4 / 1.5.)
Phase A's hands-on build is to give gh-600-prep a real PLAN.md — its scope, the folders an agent may touch, and our definition of "done" — so a Copilot agent (or you) assigned an issue here works against a written brief, not a guess. We turn this exact lesson into a file in our own repo: intent becomes an artifact you can open and diff.