Same build site from lesson 1.3 — but this time, meet the corner-cutting contractor. You hired them to fix a leaking pipe. Here's how the job goes wrong, four ways:
- They start knocking out drywall before showing you any plan — you don't know what they intend until the wall's already open. (Planless execution.)
- They've got a master key to the whole building — electrical room, the safe — when they needed one bathroom. (Over-permissioned.)
- They work behind a tarp. You get a finished wall but no idea what pipe they used or why. (Hidden reasoning.)
- The lights still turn on, so you assume it's fine — never mind the new pipe is the wrong gauge and bursts next winter. (Blind trust — "CI passed, ship it.")
Six months later the ceiling leaks. Now you need the site logbook: permits, inspection sign-offs, photos, who approved what and when. If it ties every action to a date and a person, you can reconstruct what happened and fix it. If it doesn't — you're tearing open walls and guessing.
That logbook is traceability. Agent work needs the same: a trail, not just a finished diff. And one rule never changes — the contractor did the work, but you are still accountable for the building.
Part 1 · The four anti-patterns (and their fixes)
An agent with capability but no architecture acts too early, makes opaque changes, or skips validation — real risks to code quality, security, and stability. The module names four. Learn each as risk → what it looks like → fix:
| Anti-pattern | What it looks like | Fix (enforcement, not hope) |
|---|---|---|
| Planless execution | PR has a diff but no plan or rationale | Require a plan section via PR template; review before merge |
| Over-permissioned | Token/tools write broadly, access secrets | Least-privilege GITHUB_TOKEN (a temporary key that lets a workflow act on the repo); environment reviewers; restrict triggers |
| Hidden reasoning | Only the diff is visible — no assumptions, scope, decisions | Require a plan; link workflow runs; record decisions in PR comments |
| Blind trust | "CI passed, ship it" | Checks + CODEOWNERS + required reviews + risk-based approvals |
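
The "require a plan section" fix only counts as enforcement if something actually checks for it. A minimal sketch of such a check, assuming the PR body uses markdown headings and that the template asks for sections named Plan and Evidence (both names are hypothetical, and so is `missing_sections`):

```python
import re

# Hypothetical section names; a real PR template would define its own.
REQUIRED_SECTIONS = ("Plan", "Evidence")


def missing_sections(pr_body: str, required=REQUIRED_SECTIONS) -> list[str]:
    """Return required sections that are absent or have no content."""
    missing = []
    for name in required:
        # Find a heading line such as "## Plan", then capture everything
        # up to the next heading or the end of the text.
        pattern = rf"^#+[ \t]*{re.escape(name)}[ \t]*$(.*?)(?=^#+\s|\Z)"
        match = re.search(
            pattern, pr_body, flags=re.MULTILINE | re.DOTALL | re.IGNORECASE
        )
        if match is None or not match.group(1).strip():
            missing.append(name)
    return missing
```

A script like this could run as a required check and fail when the returned list is non-empty. It only proves a plan exists, not that the plan is any good; that part is still the reviewer's job.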
A few more the exam folds in here:
- Mixed planning + execution with no inspectable plan — reviewers see only the final diff and can't validate intent before impact. This is the canonical "architectural anti-pattern" answer.
- Unrestricted SDLC access (SDLC — Software Development Life Cycle: idea → code → test → ship) — treating an agent like a general developer makes it impossible to reason about, limit, or audit. Scope early to reduce blast radius (how much damage a misbehaving agent can do) — limit which directories it can touch.
- Bypassable safety — if an agent can skip required checks or merge without review, that's a workflow design failure, not a model problem.
- Under-specified tasks — vague success criteria let an agent "complete" a task that looks right but misses the goal (bumps a direct dependency but leaves the vuln reachable transitively).
- Self-approval / red-light tasks — don't rely on an agent alone when criteria are unclear, the task is irreversible/production-sensitive, it needs broad secrets, it would approve its own output, or human policy judgment is required.
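Scoping directories to shrink blast radius can be sketched as a check over the files a change touches. This is an illustration only: `ALLOWED_DIRS` and `out_of_scope` are hypothetical names, and a real setup would usually enforce the same idea through repository controls rather than a standalone script.

```python
from pathlib import PurePosixPath

# Hypothetical scope: the only directories this agent task may modify.
ALLOWED_DIRS = ("src/billing", "tests/billing")


def out_of_scope(changed_files, allowed_dirs=ALLOWED_DIRS):
    """Return changed paths that fall outside every allowed directory."""
    allowed = [PurePosixPath(d) for d in allowed_dirs]
    violations = []
    for name in changed_files:
        path = PurePosixPath(name)
        # ".." segments could escape an allowed directory, so reject them outright.
        if ".." in path.parts or not any(path.is_relative_to(d) for d in allowed):
            violations.append(name)
    return violations
```

The comparison is by path segment, so `src/billing_old/x.py` does not slip through just because its name starts with an allowed prefix. `is_relative_to` needs Python 3.9 or later.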
The single most important rule: guidance vs enforcement
Natural-language instructions ("don't touch production") are guidance — easily ignored. Only technical gates — tool allowlists, least-privilege permissions, required reviews, rulesets/branch protection, hooks — are enforcement. Treat "instructions not to edit" as guidance; treat tool allowlists and gates as enforcement.
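To make the difference concrete, here is a minimal sketch of a tool allowlist as a gate in code. The tool names and the `make_dispatcher` helper are hypothetical; the point is only that the refusal happens in the dispatcher, not in the prompt.

```python
from typing import Callable


class ToolNotAllowed(Exception):
    """Raised when the agent requests a tool outside its allowlist."""


def make_dispatcher(tools: dict[str, Callable[..., object]], allowlist: set[str]):
    """Return a call function that refuses any tool not explicitly allowed."""

    def call(name: str, *args, **kwargs):
        # The gate is code, not a prompt: no instruction text can bypass it.
        if name not in allowlist:
            raise ToolNotAllowed(name)
        return tools[name](*args, **kwargs)

    return call


# Hypothetical tools for illustration.
tools = {
    "read_file": lambda path: f"contents of {path}",
    "deploy_production": lambda: "deployed",
}
call = make_dispatcher(tools, allowlist={"read_file"})
```

Here `call("read_file", "README.md")` succeeds, while `call("deploy_production")` raises `ToolNotAllowed` no matter what the agent was or wasn't told.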
Part 2 · Traceability — proving what happened
Supervising an agent well takes more than a final diff: it takes a trail. The point isn't box-ticking compliance; it's operational understanding. When something fails, you need to know what changed, who approved it, what evidence existed, and what happened next. A full trail has six parts:
- Goal — stated intent (issue link or PR description)
- Plan — an inspectable plan (PR section or file)
- Changeset — a bounded set of branch + commits
- Evidence — automated: workflow runs and uploaded artifacts
- Human judgment — review and approval
- Outcome — a clear result: merge, revert, or escalation
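One way to picture the trail is as a record with one slot per element, plus a check for what is still missing. This is a sketch for study purposes; the `Trail` class and its field names are hypothetical and simply mirror the list above.

```python
from dataclasses import dataclass, fields


@dataclass
class Trail:
    goal: str = ""            # issue link or PR description
    plan: str = ""            # PR section or file
    changeset: str = ""       # branch plus commits
    evidence: str = ""        # workflow runs and uploaded artifacts
    human_judgment: str = ""  # review and approval
    outcome: str = ""         # merge, revert, or escalation

    def gaps(self) -> list[str]:
        """Names of trail elements that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]
```

A trail with only a goal and a changeset reports `plan`, `evidence`, `human_judgment`, and `outcome` as gaps, which is exactly the "finished diff, no trail" situation.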
Where the evidence lives — GitHub-native artifacts, not the agent's private logs or an external doc:
- PR timeline, commits and branch history
- Workflow runs and job logs; required checks and scan results
- Uploaded artifacts (test reports, logs) — durable; raw log output scrolls away
- Code scanning / secret scanning alerts
- Audit log events (org-level; availability depends on org/enterprise config)
Two practical rules: put an "Evidence" section in the PR linking runs and artifacts; and make every piece of evidence traceable to a specific workflow run and commit — so an audit can answer "what code state produced this?"
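Both rules can be sketched together: build the "Evidence" section from the run and commit identifiers so every line points at one code state. The function name and artifact names are hypothetical; the environment variable names are the defaults GitHub Actions sets inside a workflow run.

```python
def evidence_section(env: dict[str, str], artifacts: list[str]) -> str:
    """Build a markdown Evidence section pinned to one run and one commit."""
    run_url = (
        f"{env['GITHUB_SERVER_URL']}/{env['GITHUB_REPOSITORY']}"
        f"/actions/runs/{env['GITHUB_RUN_ID']}"
    )
    lines = [
        "## Evidence",
        f"- Commit: {env['GITHUB_SHA']}",
        f"- Workflow run: {run_url}",
    ]
    # One line per uploaded artifact, so a reviewer knows what to download.
    lines += [f"- Artifact: {name}" for name in artifacts]
    return "\n".join(lines)
```

Inside a workflow step you could pass `dict(os.environ)` as `env`. How the text then gets into the PR (comment, description edit) is left open here.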
Worked scenario: an agent's vulnerability fix passes CI but causes a regression (a change that breaks something that used to work) weeks later. A sufficient trail lets you answer: Was there a visible plan and scope? Were the right reviewers requested and did they approve? Did the checks match the risk? Can you reconstruct what happened? If yes, the system made the mistake understandable and the next one preventable, and that is the goal.
And the line that ties back to 1.3: agents change who performs the work, never who owns the outcome. Accountability stays with the humans who defined the task, set permissions, chose the controls, and approved the change.
The layered control model (how it all fits)
The study guide's synthesis — verified against the official control-plane list — is a ladder from soft to hard:
| Layer | Role | Strength |
|---|---|---|
| Instructions | guide model behaviour | soft — guidance only |
| Tool lists | limit capabilities | enforces what the agent can do |
| Hooks | intercept behaviour | enforce at tool-call time |
| Workflows | validate behaviour | objective checks |
| Rulesets / branch protection | enforce repo policy | hard gate on merge |
| Audit logs | record accountability | after-the-fact trail |
None of these are on by default: enabling required checks, rulesets, and branch protection is an admin task. The supervision model applies everywhere, but enforcement only happens when controls are turned on. And an agent's real power ≈ what its workflow token and tool credentials can do, so set the default GITHUB_TOKEN permissions to read-only and grant more only to the jobs that need it.
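The "read-only default, grant per job" idea can be sketched as a small audit over a workflow's permissions. Assumptions, stated loudly: the workflow is modelled as plain dicts rather than parsed YAML, permissions use the mapping form (scope to access level), and the job names are made up. A job-level permissions block replaces the workflow-level one rather than merging with it, which is what the sketch models.

```python
# Hypothetical workflow description as plain dicts
# (a real one would be parsed from the workflow YAML file).
workflow = {
    "permissions": {"contents": "read"},  # read-only default for every job
    "jobs": {
        "test": {},  # inherits the read-only default
        "open_pr": {
            "permissions": {"contents": "write", "pull-requests": "write"}
        },
    },
}


def write_grants(workflow: dict) -> dict[str, list[str]]:
    """Map each job to the scopes it can write, after applying the default."""
    default = workflow.get("permissions", {})
    result = {}
    for job, spec in workflow["jobs"].items():
        # A job-level block replaces the workflow-level block entirely.
        effective = spec.get("permissions", default)
        result[job] = sorted(s for s, level in effective.items() if level == "write")
    return result
```

Reading the result is the useful part: any job with a non-empty list is a place where the agent's real power exceeds read-only, so each entry should have a reason.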
The cert-language version
Agent architectures fail through recognizable anti-patterns: planless execution, over-permissioned agents, hidden reasoning, blind trust in automation, mixed planning/execution with no inspectable plan, and unrestricted SDLC access. The fixes are enforcement, not instruction — least-privilege tokens/tools, required reviews, CODEOWNERS, rulesets/branch protection, gates; "tell the agent to be careful" is not a control. Traceability demands a full trail — goal, plan, bounded changeset, automated evidence, human review, outcome — stored in GitHub-native artifacts and traceable to a specific run and commit. Agents change who does the work, never who is accountable.
Our summary · grounded in MS Learn — Foundations of Agentic AI (units 4–5) + Designing Agent Architecture & SDLC Integration (units 2–5, 7) · fetched 2026-05-31
Common confusions (read these or lose points)
- "I told the agent not to touch production, so it's safe." No — that's guidance. Enforcement = tool allowlists, permissions, required reviews, rulesets.
- "CI passed, so the change is good." No — blind trust. Checks only validate what they were built to detect. Add CODEOWNERS + reviews + risk-based approvals.
- "The diff is enough to review." No — a diff with no plan/assumptions is hidden reasoning. You need the trail.
- "Give the agent broad access so it doesn't get blocked." No — over-permissioned. Least privilege; scope directories to shrink blast radius.
- "An agent-generated plan proves it's safe." No — a plan is not validation.
- (Study guide suggests) context drift: the agent's assumptions diverge from reality (stale memory, repo changed mid-task, concurrent edits, missing handoff). Mitigate with durable handoff artifacts (files passed along so the next step/agent has the context, e.g. upload/download a file like review.md between jobs; this pattern is official-backed) rather than passing context only in chat. Treat the plan.md/review.md naming, the "hooks fail-open" claim, and the stalled-Copilot lab steps as study-guide framing, not official wording.
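A durable handoff can also carry enough information to detect drift. The sketch below is an assumption layered on top of the study guide's idea, not an official pattern: the file format (JSON), the function names, and the "compare the commit the notes were written against" check are all hypothetical.

```python
import json
from pathlib import Path


def write_handoff(path: Path, base_commit: str, notes: str) -> None:
    """Persist what the next step needs, plus the commit it was written against."""
    payload = {"base_commit": base_commit, "notes": notes}
    path.write_text(json.dumps(payload), encoding="utf-8")


def read_handoff(path: Path, current_commit: str) -> str:
    """Return the notes, or stop if the repo moved since they were written."""
    data = json.loads(path.read_text(encoding="utf-8"))
    if data["base_commit"] != current_commit:
        raise RuntimeError(
            f"context drift: handoff written at {data['base_commit']}, "
            f"repo now at {current_commit}"
        )
    return data["notes"]
```

The next job or agent reads the file instead of relying on chat history, and a mismatch between commits surfaces as an explicit failure rather than a silent wrong assumption.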