agenticlately · GH-600 Study Prep
PHASE A · LESSON 1.4

Plan → Act → Evaluate

Lesson 1.2 said an agent decides its own next step. This is how: a loop. It plans, it acts, it checks the result against real signals — and if it's not done, it goes round again. Get this loop and the rest of Domain 1 clicks into place.

~9 min read · 4 quiz questions · Tier 1 source cited
Story

You're back in the car with the driver from lesson 1.2 (that's the agent). This lesson is the sat-nav running in the dashboard.

You type a destination: "the airport." The sat-nav doesn't drive the whole trip blind. It runs a tight loop:

  1. Plan — it lays out a route. Here are the turns, ETA 25 minutes.
  2. Act — you drive the next leg. One segment, not the whole journey at once.
  3. Evaluate — it checks reality against the plan. On the expected road? Traffic? A closure?

If reality matches the plan, keep going. If it doesn't (an accident, a closed road), the sat-nav reroutes: back to plan, drive, check, again. The loop ends only when you've arrived (success) or it says "can't get there, take over" (escalate to a human).

That's an agent on a task: Plan → Act → Evaluate, looping until success or hand-off. The crucial bit — the sat-nav doesn't decide you've arrived because it feels confident; it checks the actual GPS signal. An agent must judge by real signals (tests, scans, reviews), not its own confidence.

The loop, in plain English

An agent doesn't make one decision and stop. It cycles — three phases, repeated:

1. Plan — the agent interprets the goal and works out the steps. In a good system the plan is not a hidden internal thought — it's a structured, reviewable artifact (a PR description, an issue, a checklist) a human can read. A strong plan states three things:
   - Scope: what will change, and what won't.
   - Success criteria: the signals that will show the work is done.
   - Rollback: how to undo the change if it goes wrong.

2. Act — the agent does the work in the repository: creates a branch, commits, opens or updates a PR, responds to review. This is deliberately bounded — everything on a branch and through the PR workflow, never a direct push to the default branch. The branch + PR are the guardrails (lesson 1.3).

3. Evaluate — the agent and its human supervisors judge the result using signals from GitHub: workflow runs and status checks (build/test/lint), code review feedback, and security signals (code scanning/SARIF — a standard file format security scanners use to report findings — secret scanning, dependency alerts).
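
The plan phase above calls for a structured, reviewable artifact. Here is a minimal sketch of one, assuming a hypothetical `Plan` dataclass and `render_pr_description` helper. Neither is a GitHub API; the field names simply mirror the scope, success criteria, and rollback parts named in this lesson.

```python
from dataclasses import dataclass, field


@dataclass
class Plan:
    """Hypothetical plan record; the field names are illustrative assumptions."""
    goal: str
    scope: list[str] = field(default_factory=list)
    success_criteria: list[str] = field(default_factory=list)
    rollback: str = ""

    def is_reviewable(self) -> bool:
        # A plan is only worth reviewing if all three parts are stated.
        return bool(self.scope and self.success_criteria and self.rollback.strip())


def render_pr_description(plan: Plan) -> str:
    """Render the plan as plain text that could be pasted into a PR description."""
    lines = [f"Goal: {plan.goal}", "", "Scope:"]
    lines += [f"- {item}" for item in plan.scope]
    lines += ["", "Success criteria:"]
    lines += [f"- {item}" for item in plan.success_criteria]
    lines += ["", f"Rollback: {plan.rollback}"]
    return "\n".join(lines)


plan = Plan(
    goal="Upgrade the vulnerable dependency",
    scope=["Dependency manifest and lockfile only"],
    success_criteria=["Required checks pass", "Vulnerability scan is clean"],
    rollback="Revert the PR",
)
print(render_pr_description(plan))
```

A plan with any of the three parts missing fails `is_reviewable()`, which is one way a system could refuse to move on to the act phase.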

The golden rule of evaluation

Evaluation must be grounded in system signals, not the agent's confidence. "It looks done to me" is not evaluation. "Required checks pass and the vulnerability scan is clean" is.

And the phase people forget: evaluation isn't the end. If checks fail or requirements aren't met, the loop continues — revise the plan, adjust the action, re-evaluate — until the outcome is acceptable or it's handed to a human.
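
The cycle described above can be sketched as a small loop. This is an illustration under stated assumptions, not a real agent framework: `run_loop`, its callables, and the `max_rounds` limit are all hypothetical names, and the signals are simulated rather than read from GitHub.

```python
from typing import Callable


def run_loop(
    plan: Callable[[str, list[str]], str],
    act: Callable[[str], None],
    evaluate: Callable[[], list[str]],
    goal: str,
    max_rounds: int = 3,
) -> str:
    """Plan, act, evaluate; repeat until signals are clean or rounds run out.

    `evaluate` returns the list of failing signals (empty means success).
    """
    failures: list[str] = []
    for _ in range(max_rounds):
        current_plan = plan(goal, failures)  # revise using last round's failures
        act(current_plan)                    # one bounded step of work
        failures = evaluate()                # objective signals, not confidence
        if not failures:
            return "success"
    return "escalate"                        # hand off to a human


# Simulated signals: the first evaluation fails, the second is clean.
signal_rounds = iter([["tests failed"], []])
actions: list[str] = []
outcome = run_loop(
    plan=lambda goal, failures: f"{goal} (addressing: {failures})",
    act=actions.append,
    evaluate=lambda: next(signal_rounds),
    goal="fix bug",
)
print(outcome, len(actions))  # success 2
```

The loop has exactly two exits, matching the lesson: clean signals (success) or an exhausted round budget (escalate).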

| Phase | What happens | GitHub artifact / signal |
| --- | --- | --- |
| Plan | interpret goal → scope, success criteria, rollback | Issue, PR description, Agents tab |
| Act | branch, commits, open/update PR, revise | Branch, commits, pull request |
| Evaluate | judge by objective signals; loop or escalate | Workflow runs, checks, reviews, security scans |

Evaluation can have teeth

When evaluation is made mandatory — rulesets / branch protection requiring checks to pass before merge — the loop's last step becomes an enforceable gate, not a polite suggestion.
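
A required check behaves like a boolean gate in front of the merge. The sketch below only models that rule in plain Python; it does not call GitHub. The check names, the `conclusions` mapping, and the `merge_allowed` function are hypothetical.

```python
def merge_allowed(required_checks: set[str], conclusions: dict[str, str]) -> bool:
    """Return True only if every required check reported 'success'.

    A check with no recorded conclusion counts as not passed, so the gate
    cannot be bypassed by never running it.
    """
    return all(conclusions.get(name) == "success" for name in required_checks)


required = {"build", "test", "code-scanning"}

# A required check is missing, so the gate stays closed.
print(merge_allowed(required, {"build": "success", "test": "success"}))  # False

# A failing check that is not required does not block the merge in this model.
print(merge_allowed(required, {
    "build": "success",
    "test": "success",
    "code-scanning": "success",
    "lint": "failure",
}))  # True
```

The second call shows why the choice of which checks are required matters: a check left off the required list is only a suggestion.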

Plan vs execution — and when a human validates

A reliable system keeps three things separate: planning (intent), execution (state change), validation (evidence). The cleanest way to say it: "Planning is reviewable intent. Execution changes state." Separation exists so a human can review intent before accepting impact.

That raises one design question. It isn't "is the work reviewed?" (it always is) but "when does the human's check happen relative to the code?"

| | Option A · Plan-first PR | Option B · Plan + execution (one PR) |
| --- | --- | --- |
| What | Plan approved before any code | Plan (in description) + code (in commits) together |
| Human validates… | before code exists | before merge (code already written) |
| Best for | High-risk: workflows, infra, auth, production; hard to reverse | Low/medium-risk: speed matters, easily reversible |
| Same GitHub controls? | Yes: checks, CODEOWNERS, branch protection | Yes |

Blueprint analogy

Option A = the builder shows you blueprints and waits for sign-off before lifting a hammer. Option B = they start framing while handing you the sketch — faster, fine for a garden shed, reckless for the foundation. Plan-first for large refactors, security, deployment/workflow, cross-repo, or multi-agent work.

Option B's only extra risk is at the proposal stage: the code is written before a human has validated the plan. GitHub's merge gates still stop unsafe code from shipping.
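
The choice between the two options can be expressed as a simple risk rule. This is a sketch, assuming a hypothetical `choose_review_mode` function and a set of area labels taken from the "Best for" row of the table. How a real system would detect which areas a change touches is not covered here.

```python
# Areas the lesson lists as high-risk and hard to reverse (labels are illustrative).
HIGH_RISK_AREAS = {"workflows", "infra", "auth", "production"}


def choose_review_mode(touched_areas: set[str]) -> str:
    """Pick plan-first when any touched area is high-risk, else plan + execution."""
    if touched_areas & HIGH_RISK_AREAS:
        return "plan-first"
    return "plan+execution"


print(choose_review_mode({"docs"}))          # plan+execution
print(choose_review_mode({"docs", "auth"}))  # plan-first
```

A single high-risk area is enough to switch modes, which keeps the rule conservative.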

The task contract: inputs, outputs, success criteria

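Before an agent ever runs, define the task as a contract:

   - Inputs: what the agent is given to work from.
   - Outputs: what it must produce.
   - Success criteria: how the result will be judged.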

"CI passed" is necessary but not sufficient

Make success reflect the real intent — "vulnerability resolved", not just "tests passed". And define success criteria before you give the agent tools — otherwise it can't know when to stop looping. A workflow can turn a success criterion into a required status check, so the PR can't merge until it passes — success enforced by the system, not assumed by the agent.
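
To see why "CI passed" alone is not enough, here is a minimal sketch of a task contract whose success criteria are checked against signals. `TaskContract`, the signal keys (`ci`, `open_alerts`), and the criterion names are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TaskContract:
    """Hypothetical contract: inputs, outputs, and checkable success criteria."""
    inputs: dict[str, str]
    outputs: list[str]
    success_criteria: dict[str, Callable[[dict], bool]]

    def unmet(self, signals: dict) -> list[str]:
        # Names of the criteria that the given signals do not satisfy.
        return [name for name, check in self.success_criteria.items()
                if not check(signals)]


contract = TaskContract(
    inputs={"issue": "Upgrade the vulnerable dependency"},
    outputs=["pull request"],
    success_criteria={
        "ci_passed": lambda s: s.get("ci") == "success",
        # Treat a missing alert count as unresolved rather than as clean.
        "vulnerability_resolved": lambda s: s.get("open_alerts", 1) == 0,
    },
)

# CI is green, but the alert is still open: the real intent is not met.
print(contract.unmet({"ci": "success", "open_alerts": 1}))  # ['vulnerability_resolved']
print(contract.unmet({"ci": "success", "open_alerts": 0}))  # []
```

An empty `unmet` list is the stop condition for the loop; anything else means another round or an escalation.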

Designing for failure (reliability)

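Agents will fail: they misread tasks, break tests, and conflict with existing behaviour. A reliable architecture assumes failure and builds recovery in. Four mechanisms:

   - Retries: bounded, never infinite.
   - Escalation: hand off to a human with what failed, what was tried, the evidence, and a suggested next step.
   - Rollback: a recorded way to undo the change.
   - Least privilege: the agent gets only the permissions the task needs.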
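
Bounded retries followed by escalation can be sketched as below. The `Escalation` record and `run_with_bounded_retry` are hypothetical names; the hand-off fields follow this lesson's description (what failed, what was tried, the evidence, a suggested next step), and the default of one retry is an assumption taken from the quiz answer in this lesson.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Escalation:
    """Hypothetical hand-off report given to a human."""
    what_failed: str
    what_was_tried: list[str]
    evidence: list[str]
    suggested_next_step: str


def run_with_bounded_retry(
    check_name: str,
    attempt: Callable[[int], tuple[bool, str]],
    max_retries: int = 1,
) -> Optional[Escalation]:
    """Return None on success, or an Escalation once retries are exhausted.

    `attempt(n)` makes attempt number n (0-based) and returns
    (check_passed, evidence), for example a pointer to the failing run.
    """
    tried: list[str] = []
    evidence: list[str] = []
    for n in range(max_retries + 1):  # the first attempt plus the retries
        passed, proof = attempt(n)
        tried.append(f"attempt {n + 1}")
        evidence.append(proof)
        if passed:
            return None
    return Escalation(
        what_failed=check_name,
        what_was_tried=tried,
        evidence=evidence,
        suggested_next_step="Human review of the failing check",
    )


# Simulated check that fails on every attempt.
report = run_with_bounded_retry("test", lambda n: (False, f"run {n + 1} failed"))
print(report.what_was_tried)  # ['attempt 1', 'attempt 2']
```

The function never merges past a failed gate and never loops forever: it either succeeds within the budget or produces a report a human can act on.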

The cert-language version

Agentic systems run a plan → act → evaluate lifecycle — a loop, not a single pass — iterating until success criteria are met or the work is escalated. The plan is a structured, reviewable artifact (scope, success criteria, rollback); action is bounded to branches and PRs; evaluation is grounded in system signals (checks, reviews, security scans), not the agent's confidence. Keep planning, execution, and validation separate; choose plan-first for high-risk work. Define the task as inputs / outputs / success criteria, and design for failure with retries, escalation, rollback, and least privilege.

Our summary · grounded in MS Learn — Foundations of Agentic AI (unit 3) + Designing Agent Architecture & SDLC Integration (units 3, 4, 7) · fetched 2026-05-31

Common confusions (read these or lose points)


Quiz · Lock it in

Q1 · multiple choice

In a plan → act → evaluate architecture, where should evaluation evidence primarily live?

Answer · C. Evaluation is evidence-based and must be inspectable by humans and other agents — so it lives in GitHub-native artifacts (workflow runs, checks, uploaded artifacts), not private logs, external docs, or the agent's own confidence.
Q2 · multiple choice

An agent is assigned to modify a production deployment workflow. Which approach fits best, and why?

Answer · B. Deployment / workflow / infra / auth changes are high-risk and hard to reverse, so plan-first is preferred: reviewers validate the intent before any code exists, minimising early exposure. C and D drop required human validation entirely.
Q3 · multiple choice

A required check on an agent's PR fails. Following the reliability pattern, what happens?

Answer · D. Bounded retries, then escalation: one retry, and on a second failure of the same required check the agent hands off to a human with what failed, what was tried, the evidence, and a suggested next step. Not infinite retries (C), not merging past a failed gate (A).
Q4 · explain back

An agent opens a PR and says "Done — I've fixed the vulnerability, looks good." In your own words, why is that not evaluation, what would real evaluation look like, and why is "CI passed" still not enough?

Suggested answer

"Looks good" is the agent's confidence, and evaluation must be grounded in system signals, not confidence. Real evaluation = the required status checks pass, the security scan is clean (the vulnerable version is actually replaced), the scope matches intent (no unexpected files changed), and a reviewer approves. And "CI passed" is necessary but not sufficient: tests passing doesn't prove the real goal was met — success criteria must reflect the actual intent ("vulnerability resolved"), and for risky changes a rollback path should be recorded. If anything fails, the loop continues or the agent escalates.


Unofficial study material. Not affiliated with, endorsed by, or sponsored by GitHub or Microsoft. “GH-600” and “GitHub” are trademarks of their respective owners, used for identification only.