Pictures · the course in one page

Key mechanisms, visualized

Diagrams of the concepts most often confused on the exam. Read the course weeks for depth; return here when you need a quick mental model.

Week 1 · Agentic architecture

The agentic loop

Agentic loop: model returns tool_use or end_turn Claude language model Tool your function End of turn return to caller stop_reason: tool_use tool_result stop_reason: end_turn
The agentic loop. Each turn, Claude returns a stop_reason. tool_use means "I need you to run this tool and feed the result back into a new turn." end_turn means the task is complete. The orchestration code you write is the loop around this signal — not the model itself.

→ Read more in Week 1: Agentic Architecture and Orchestration.

Week 1 · Multi-agent coordination

Coordinator and subagents

Coordinator with three subagents and explicit context passing Coordinator plans · routes · synthesizes Research agent narrow tools · read-only Writer agent drafts · structured output Verifier agent independent review explicit context structured result
Coordinator and subagents. Subagents do not inherit the coordinator's context. The coordinator must pass exactly what each subagent needs — no more, no less — and receives a structured result back. Over-provisioning a subagent with tools or context leads to role drift; under-provisioning leads to failed handoffs.

→ Read more in Week 1: Agentic Architecture and Orchestration.

Week 1 · Loop control

stop_reason as control surface

stop_reason: four branches and the system response each requires Inspect stop_reason on every response tool_use model wants tools run end_turn turn complete max_tokens output truncated stop_sequence stop string hit Continue loop execute tools, append results, repeat Return answer final text is ready for the user Continue / extend request continuation or raise budget Honor the boundary treat as planned stop, do not retry blindly The "read the assistant's prose" anti-pattern Collapses all four branches into a guess. The loop continues when it should stop, stops when it should continue, and silently mishandles truncation.
stop_reason as control surface. Four distinct API states, four distinct system responses. Driving the loop off explicit state preserves the branches; driving it off assistant text collapses them into one fragile guess.

→ Read more in Week 1, Lecture 1.1: The Agentic Loop.

Week 1 · Decomposition

Prompt chain vs adaptive decomposition

Decomposition strategies: prompt chain vs adaptive tree Prompt chaining fixed sequence, planned upfront Adaptive decomposition tree updates with evidence 1. analyze each file 2. summarize file findings 3. cross-file integration pass 4. final report Predictable steps · best when shape is known map the codebase high-risk modules dependency graph revise plan with findings re-delegate synthesize when stable Plan evolves · best when shape is uncertain
Two decomposition shapes. A chain commits to a sequence in advance and is correct when the workflow is predictable. An adaptive tree lets early findings reshape later steps and is correct when the work is exploratory. Picking the wrong shape is the common Week 1 trap — chains for open-ended problems lock out discovered evidence; adaptive plans for fixed pipelines add coordination overhead with no payoff.

→ Read more in Week 1, Lecture 1.5: Decomposition Strategies.

Week 1 · Review and verification

Independent review — what the reviewer sees

Independent review: what the reviewer sees in same-session vs independent setups Same-session review (reviewer in the generator's context) Independent review (fresh context, artifact only) user: original request generator: reasoning trace generator: justification of choice generator: produced artifact user: "now critically review your work" Reviewer ratifies. Defends the prior reasoning, fixes typos. system: review criteria user: the artifact only no generator reasoning, no draft history, no defense of choices "does this artifact meet the criteria?" Reviewer can disagree. No prior path to defend; opinion is fresh.
What the reviewer's context contains. Same-session review puts the generator's reasoning chain in front of the reviewer before it forms an opinion, which biases the reviewer toward ratifying. Independent review withholds the chain — only the artifact and criteria are present — and that absence is what makes real disagreement possible.

→ Read more in Week 1, Lecture 1.8: Independent Review.

Week 3 · Claude Code configuration

Configuration file map

Configuration file map: user scope vs project scope, by concern User scope ~/ — never travels with the repo Project scope in the repo — version-controlled Memory Commands Skills Settings MCP ~/.claude/CLAUDE.md ~/.claude/commands/ ~/.claude/skills/ ~/.claude/settings.json ~/.claude.json (user MCP entries) CLAUDE.md · .claude/rules/ .claude/commands/ .claude/skills/ .claude/settings.json (shared) .claude/settings.local.json (gitignored personal) .mcp.json (shared MCP servers) Scope is determined by file location, not by what is written inside it.
Configuration file map. Each row is one concern; each column is one scope. "Where should X live?" reduces to picking the right row and the right column. The most common mistake is putting MCP server config inside CLAUDE.md as prose, or treating .local as project-shared.

→ Read more in Week 3, Lecture 3.7: The Configuration File Map.

Week 3 · Execution mode

Plan mode decision

Plan mode decision tree: when planning earns its overhead Is there real planning to do? uncertainty · blast radius · multiple valid approaches YES NO Plan mode explore before editing Direct execution proceed straight to the change EXAMPLES • 40-file migration across packages • new auth design with two valid approaches • refactoring an unfamiliar subsystem • one-line config change with subtle blast radius EXAMPLES • rename a variable across one file • add a known config option • apply a routine refactor pattern • 200-line change with one obvious approach Wrong signal: task size alone. Right signal: is there something to plan?
Plan mode decision. The fork is uncertainty and blast radius, not line count. A one-line change with subtle design implications still wants planning; a 200-line routine refactor does not. The trap on the exam is picking by task size when the rubric is about whether anything is unresolved before the edit.

→ Read more in Week 3, Lecture 3.3: Plan Mode vs Direct Execution.

Week 3 · API patterns

Synchronous vs batch workflows

Synchronous vs batch: workflow shapes and what each costs you Synchronous something is waiting on this answer Batch (Message Batches API) ~50% off, up to 24-hour turnaround caller (CI / chat / service) request Claude (one request → one reply) response caller proceeds latency: seconds cost: full price caller submits N requests submit queue · processed up to 24 h later poll results retrieved in bulk latency: minutes to hours cost: ~50% discount The qualifier is latency tolerance, not volume. Pre-merge gates and chat are sync regardless of how few requests they make.
Synchronous vs batch. Synchronous fits when something downstream is actively waiting — CI gates, user-facing replies, request-response services. Batch fits when 24-hour turnaround is acceptable; the discount is real, but using batch on a blocking workload trades cost savings for stalled pipelines.

→ Read more in Week 3, Lecture 3.6: Synchronous, Asynchronous, and Batch.

Week 2 · Tool design and MCP

MCP lifecycle

MCP lifecycle: client, server, and tool Host (Claude app) MCP server Tool / resource tools/list tool schemas tools/call invoke function return value tool_result resources/* — read-only context, not an action call
MCP lifecycle. The host discovers tools via tools/list, invokes them via tools/call, and the server proxies to the underlying function. Resources are a separate, read-only channel — use them for context the model needs to see, not for actions it needs to take.

→ Read more in Week 2: Tool Design and MCP Integration.

Week 2 · Tool design and MCP

Resource versus tool turn shape

Resource versus Tool: turn shape comparison Catalog as TOOL (costs a round trip) Catalog as RESOURCE (in context at turn start) User asks for a specific service tool: list_services() Returns catalog [a, b, c, …] tool: invoke_service(b) 4 steps · 2 tool calls resource: services catalog [a, b, c, …] (read at turn start) User asks for a specific service tool: invoke_service(b) 2 steps · 1 tool call
Resource vs Tool. Exposing a read-only catalog as a tool forces an exploratory round trip before the real action. The same catalog as a resource sits in context at turn start — the model can go straight to the action. Same information, half the turns.

→ Read more in Week 2, Lecture 2.7: Resources versus Tools.

Week 2 · Tool design and MCP

Context economy — raw vs trimmed tool output

Context economy: raw tool dump versus trimmed output Raw tool dump Trimmed at the source Context window tool output raw rows · timestamps · trace IDs debug fields · unused metadata ~3,800 tokens room for reasoning cramped · ~1,200 tokens ⚠ noisy · costly · degrades long-horizon reasoning Context window tool output 5 fields · ~400 tokens room for reasoning open · ~4,600 tokens long-horizon plans fit multi-turn state fits ✓ signal-dense · cheap · reasoning breathes
Context economy. Tool output consumes the same budget as the system prompt. A raw dump crowds out room the model needs for reasoning, costs real money per turn, and often misleads the model with irrelevant tokens. Trim at the source — the tool is responsible for returning what the model actually needs.

→ Read more in Week 2, Lecture 2.9: Context Economy.

Week 2 · Tool design and MCP

Error taxonomy and recoveries

Error taxonomy: four categories and their correct recoveries Tool failed what kind? Transient timeout, 5xx, rate-limit Validation bad input, schema mismatch Permission unauthorized, forbidden Business policy, not-found, conflict Retry with backoff, same inputs Ask / revise clarify, correct, resubmit with fix Escalate to a principal with scope Explain surface the policy, don't retry blindly The "Operation failed" anti-pattern Collapses all four branches into one message. The coordinator has no basis to pick a recovery, so it either picks wrong or gives up.
Error taxonomy. Each branch has a different correct recovery. Structured errors preserve the branch so the coordinator can choose; a single generic "failed" message destroys that information.

→ Read more in Week 2, Lecture 2.3: Structured Error Reporting.

Week 4 · Structured output and validation

Validation layers — syntax, schema, semantic

Validation layers: syntax, schema, and semantic — what each catches and what each fix is SYNTAX parser-level — is this valid JSON? examples: malformed brackets, bad quotes, trailing commas recovery: engineering fix (prompt or parser) retry: never — the model will keep emitting the same shape SCHEMA shape-level — do the fields match? examples: missing required field, wrong type, extra property recovery: engineering fix (tool definition or prompt) retry: rarely — usually a definition issue, not a model error SEMANTIC logic-level — do the values make sense? examples: line items don't sum, impossible date, currency mismatch recovery: retry with feedback if source has truth, else escalate retry: yes, when source contains the answer tool_use with a JSON schema collapses syntax + schema into one API-enforced check. Semantic validation still has to be written in application code.
Validation layers. Three layers, three distinct fixes. Reporting "validation failed" without naming the layer forces the recovery code to either retry blindly or give up. Naming the layer lets the system pick the right response: engineering for the top two, retry-or-escalate for the bottom one.

→ Read more in Week 4, Lecture 4.6: Validation Layers.

Week 4 · Retry and recovery

Retry decision tree

Retry decision tree: when retry helps and when it doesn't Validation failed. Which layer? SYNTAX SCHEMA SEMANTIC Engineering fix prompt or parser bug; do not retry Engineering fix tool def or prompt; use tool_use + schema Is the truth in the source? YES NO Retry with feedback include source, failed output, and the specific check Escalate no retry will conjure absent information Wrong default: retry every failure. Right default: branch on the layer first.
Retry decision. Two layers (syntax, schema) are engineering bugs that retry cannot fix. The third layer (semantic) branches on whether the source has the answer — if yes, retry with explicit feedback; if no, escalate. Retrying everything wastes API calls on errors that no retry can resolve.

→ Read more in Week 4, Lecture 4.4: Retry With Error Feedback.

Week 4 · Review architecture

Multi-pass review — single vs split passes

Single-pass review vs multi-pass review: where attention concentrates Single pass 14 files in one prompt Multi-pass local per-file, then integration 14 files in one context window attention spreads thin and unevenly Uneven depth deep on a few files, shallow on the rest pass 1 — local, per file each file gets full attention in its own pass pass 2 — cross-file integration one pass over file-level findings Even depth consistent local + dedicated integration
Multi-pass review. A single 14-file pass spreads attention unevenly — some files are analyzed deeply, others skimmed. Splitting into a per-file local pass and a separate cross-file integration pass concentrates attention on one concern at a time, which is what the exam's "uneven depth" question is about.

→ Read more in Week 4, Lecture 4.5: Message Batches API and Multi-Pass Review.

Week 5 · Escalation

Resolve, clarify, or escalate

Resolve, clarify, or escalate: triggers and the right action for each Incoming request — what's the trigger? classify before acting RESOLVE policy is clear, identity is set CLARIFY ambiguous identity or input ESCALATE trigger condition met TRIGGERS • request is in-policy • identity verified • tools succeed • no contested data TRIGGERS • multiple customer matches • missing required field • ambiguous order reference → ask one question, don't guess TRIGGERS • explicit human request • policy gap or ambiguity • threshold breach • identity unresolved Wrong signal: escalate on user sentiment. Right signal: classify by trigger; calm users may need escalation, frustrated users often don't.
Resolve, clarify, or escalate. Three branches with explicit triggers each. The exam's recurring trap is treating user sentiment as the routing signal — the actual triggers are structural (explicit human request, policy gap, ambiguous identity), not emotional.

→ Read more in Week 5, Lecture 5.2: Escalation and Ambiguity.

Week 5 · Provenance

Provenance flow through synthesis

Provenance flow: flattened synthesis vs structured claim/source/date pipeline Flattened synthesis attribution lost in compression Structured provenance claim · source · date carried through 3 sources, claims + dates + URLs source A (2024) · source B (2022) · source C (2024) summarize as prose "the consensus seems to be X" (no attribution preserved) Final report "X is true." no source · no date · contested? unknown · unverifiable 3 sources, claims + dates + URLs source A (2024) · source B (2022) · source C (2024) structured handoff {claim, source, excerpt, date} one record per claim · attribution intact Final report "X (A 2024, C 2024); B (2022) said Y" established + contested · with dates consumer can evaluate the conflict
Provenance flow. Flattening claims into prose discards the very fields that let consumers evaluate the report — who said this, when, and whether anyone disagreed. Structured {claim, source, excerpt, date} records carry the attribution through synthesis intact, so contested findings stay visible instead of being silently resolved.

→ Read more in Week 5, Lecture 5.5: Provenance and Uncertainty.

Week 5 · Calibration

Aggregate vs stratified accuracy

Aggregate vs stratified accuracy: where the weak segment hides Aggregate accuracy one number, looks shippable Stratified accuracy per-segment, the weak slice surfaces 97% all documents Looks shippable "97% accurate" where's the failure mode? invisible until production Invoices (60% of volume) 99% Receipts (30%) 98% Foreign-currency invoices (7%) 95% Handwritten receipts (3%) 70% by segment Failure mode visible handwritten segment is broken route those to human review ship the rest
Aggregate vs stratified accuracy. Same system, same total. The aggregate number hides the broken segment; the stratified breakdown surfaces it. The exam's recurring trap is treating headline accuracy as the readiness signal — calibration and stratified review are how teams avoid shipping a system that looks good until it hits the use case it was always going to fail.

→ Read more in Week 5, Lecture 5.4: Human Review and Confidence Calibration.

Week 5 · Long-session context

Large codebase exploration — centralized vs distributed context

Large codebase exploration: everything-in-main-agent vs distributed context Everything in main agent context saturates · answers degrade Distributed context main coordinates · subagent explores MAIN AGENT doing coordination + discovery ... tool results accumulate Late-session answer "typical patterns in this kind of codebase…" specific knowledge blurred away MAIN AGENT coordinator · no tool results delegate EXPLORE SUBAGENT isolated context · verbose discovery summary back scratchpad.md within-session findings manifest.json crash-recovery state Late-session answer "class OrderRepo in src/orders/repo.ts…" specific knowledge preserved State lives outside the main conversation: subagent for discovery, scratchpad within-run, manifest across runs.
Large codebase context management. When the main agent does all the discovery itself, tool results accumulate in context and late-session answers degrade into vague pattern- matching. Delegating discovery to an Explore subagent, persisting working findings in a scratchpad, and exporting state to a manifest keeps the main conversation clean and the specific knowledge retrievable even after four hours of exploration.

→ Read more in Week 5, Lecture 5.6: Large Codebase Context Management.

Week 6 · Exam method

The exam method — failure → layer → fix

The exam method: failure → layer → fix, with explicit distractor rejection The method diagnose, then choose Wrong default pick by plausibility 1 What specifically failed? 2 Which layer owns it? 3 Smallest direct fix? 4 Reject distractors explicitly distractors solve the wrong layer 1 Read the scenario fast 2 Pick the most "correct-sounding" option 3 Move on 4 Walk into the distractor trap plausible answers are usually wrong layer
The exam method. Diagnose the failure, identify which layer owns it, choose the smallest direct fix, then explicitly reject the distractors by naming what they solve instead. The wrong-default process — pick the answer that sounds right — is exactly how the exam's distractors are built to catch you.

→ Read more in Week 6, Lecture 6.2: How To Read Scenario Questions.

Week 6 · Trap patterns

Trap pattern matrix

Trap pattern matrix: what each generic-sounding wrong answer solves and what it misses THE TRAP WHAT IT SOLVES WHAT IT MISSES "Add more prompting" routine drift deterministic enforcement "Larger context window" token-budget overflows attention quality, decomposition "Add more examples" format ambiguity missing structural control "Tighten the schema" shape errors semantic correctness "Drop partial results, rerun" simple error handling partial work is real evidence "Escalate on sentiment" user emotional state explicit triggers (request, policy) "Compress provenance" cleaner narrative attribution and contested findings "Same-session self-review" looks like verification reviewer ratifies prior reasoning
Trap pattern matrix. Eight generic-sounding wrong answers shown next to what they actually solve and what failure mode they miss. Every trap has the same shape — solves something nearby, misses the layer the scenario actually pointed at. Reading the question for what specifically failed eliminates them in seconds.

→ Read more in Week 6, Lecture 6.3: Common Trap Patterns.

Week 6 · Distinctions

Cross-week distinctions

Cross-week distinctions: paired contrasts the exam uses to build distractors DISTRACTOR INSTINCT STRUCTURAL ANSWER WEEK 1 — agentic prompt guidance deterministic enforcement assumed context inheritance explicit context passing same-session self-review independent reviewer (fresh context) WEEK 2 — tools / MCP over-provisioned tool access role-scoped tool restrictions "Operation failed" envelope typed error (transient/validation/...) WEEK 3 — Claude Code user-scope team standards project-scope (versioned) batch for blocking gates sync; latency tolerance decides WEEK 4 — prompts / structured output free-form JSON in prose tool_use with JSON schema tighter schema for math errors semantic validation in app code retry every failure retry semantic w/ source; engineer the rest WEEK 5 — context / reliability prose summary of exact values structured facts block sentiment-based escalation explicit triggers (request, policy gap) aggregate accuracy alone stratified by segment + calibration
Cross-week distinctions. The left column is the instinct the exam's distractors are built to exploit; the right column is the structural answer the course teaches. Naming the distinction quickly is what eliminates the wrong answer in seconds — that's the skill Week 6 is trying to build.

→ Read more in Week 6, Lecture 6.1: The Core Exam Distinctions.