Diagrams of the concepts most often confused on the exam. Read the
course weeks for depth; return here when you need a quick mental
model.
Week 1 · Agentic architecture
The agentic loop
The agentic loop. Each turn, Claude returns a
stop_reason. tool_use means "I need you to run
this tool and feed the result back into a new turn." end_turn
means the task is complete. The orchestration code you write is the loop
around this signal — not the model itself.
Coordinator and subagents. Subagents do not
inherit the coordinator's context. The coordinator must pass exactly what
each subagent needs — no more, no less — and receives a structured
result back. Over-provisioning a subagent with tools or context leads to
role drift; under-provisioning leads to failed handoffs.
stop_reason as control surface. Four distinct API
states, four distinct system responses. Driving the loop off explicit
state preserves the branches; driving it off assistant text collapses
them into one fragile guess.
Two decomposition shapes. A chain commits to a
sequence in advance and is correct when the workflow is predictable.
An adaptive tree lets early findings reshape later steps and is
correct when the work is exploratory. Picking the wrong shape is the
common Week 1 trap — chains for open-ended problems lock out
discovered evidence; adaptive plans for fixed pipelines add
coordination overhead with no payoff.
What the reviewer's context contains. Same-session
review puts the generator's reasoning chain in front of the reviewer
before it forms an opinion, which biases the reviewer toward
ratifying. Independent review withholds the chain — only the artifact
and criteria are present — and that absence is what makes real
disagreement possible.
Configuration file map. Each row is one concern;
each column is one scope. "Where should X live?" reduces to picking
the right row and the right column. The most common mistake is
putting MCP server config inside CLAUDE.md as prose,
or treating .local as project-shared.
Plan mode decision. The fork is uncertainty and
blast radius, not line count. A one-line change with subtle design
implications still wants planning; a 200-line routine refactor does
not. The trap on the exam is picking by task size when the rubric
is about whether anything is unresolved before the edit.
Synchronous vs batch. Synchronous fits when
something downstream is actively waiting — CI gates, user-facing
replies, request-response services. Batch fits when 24-hour
turnaround is acceptable; the discount is real, but using batch on
a blocking workload trades cost savings for stalled pipelines.
MCP lifecycle. The host discovers tools via
tools/list, invokes them via tools/call, and the
server proxies to the underlying function. Resources are a
separate, read-only channel — use them for context the model needs to
see, not for actions it needs to take.
Resource vs Tool. Exposing a read-only catalog as a
tool forces an exploratory round trip before the real action. The same
catalog as a resource sits in context at turn start — the
model can go straight to the action. Same information, half the turns.
Context economy. Tool output consumes the same budget
as the system prompt. A raw dump crowds out room the model needs for
reasoning, costs real money per turn, and often misleads the model
with irrelevant tokens. Trim at the source — the tool is responsible
for returning what the model actually needs.
Error taxonomy. Each branch has a different correct
recovery. Structured errors preserve the branch so the coordinator can
choose; a single generic "failed" message destroys that information.
Validation layers. Three layers, three distinct
fixes. Reporting "validation failed" without naming the layer
forces the recovery code to either retry blindly or give up. Naming
the layer lets the system pick the right response: engineering for
the top two, retry-or-escalate for the bottom one.
Retry decision. Two layers (syntax, schema) are
engineering bugs that retry cannot fix. The third layer (semantic)
branches on whether the source has the answer — if yes, retry with
explicit feedback; if no, escalate. Retrying everything wastes API
calls on errors that no retry can resolve.
Multi-pass review. A single 14-file pass spreads
attention unevenly — some files are analyzed deeply, others
skimmed. Splitting into a per-file local pass and a separate
cross-file integration pass concentrates attention on one concern
at a time, which is what the exam's "uneven depth" question is
about.
Resolve, clarify, or escalate. Three branches with
explicit triggers each. The exam's recurring trap is treating
user sentiment as the routing signal — the actual triggers are
structural (explicit human request, policy gap, ambiguous identity),
not emotional.
Provenance flow. Flattening claims into prose
discards the very fields that let consumers evaluate the report —
who said this, when, and whether anyone disagreed. Structured
{claim, source, excerpt, date} records carry the attribution
through synthesis intact, so contested findings stay visible
instead of being silently resolved.
Aggregate vs stratified accuracy. Same system,
same total. The aggregate number hides the broken segment; the
stratified breakdown surfaces it. The exam's recurring trap is
treating headline accuracy as the readiness signal — calibration
and stratified review are how teams avoid shipping a system that
looks good until it hits the use case it was always going to fail.
Large codebase exploration — centralized vs distributed context
Large codebase context management. When the main
agent does all the discovery itself, tool results accumulate in
context and late-session answers degrade into vague pattern-
matching. Delegating discovery to an Explore subagent, persisting
working findings in a scratchpad, and exporting state to a
manifest keeps the main conversation clean and the specific
knowledge retrievable even after four hours of exploration.
The exam method. Diagnose the failure, identify
which layer owns it, choose the smallest direct fix, then explicitly
reject the distractors by naming what they solve instead. The
wrong-default process — pick the answer that sounds right — is
exactly how the exam's distractors are built to catch you.
Trap pattern matrix. Eight generic-sounding wrong
answers shown next to what they actually solve and what failure
mode they miss. Every trap has the same shape — solves something
nearby, misses the layer the scenario actually pointed at. Reading
the question for what specifically failed eliminates them in
seconds.
Cross-week distinctions. The left column is the
instinct the exam's distractors are built to exploit; the right
column is the structural answer the course teaches. Naming the
distinction quickly is what eliminates the wrong answer in
seconds — that's the skill Week 6 is trying to build.