The ReAct Loop, From Scratch
“Implement a basic ReAct (Reasoning and Acting) agent with a calculator and a search tool, no framework.” — a live-coding question you will get, almost verbatim, in an AI-engineer interview.
This is the centerpiece of the bootcamp. Get the ReAct loop into your fingers and your vocabulary and most of the agent world becomes “ReAct plus a twist.”
Learning objectives
By the end you can:
- State what ReAct is (Reason + Act) and recite the exact loop from memory.
- Place ReAct on the 2×2 of {reasoning trace?} × {external action?} and explain why it beats CoT (Chain-of-Thought)-only and Act-only on multi-step, knowledge-seeking tasks.
- Explain why interleaving reduces hallucination (open vs closed loop).
- Build a text-parsing ReAct agent from scratch: system prompt → LLM (Large Language Model) call → parse `Thought`/`Action`/`Action Input`/`Final Answer` → dispatch a tool → splice an `Observation` back in → repeat, with a hard `max_steps` guard.
- Name the failure modes of text-based parsing and use them to motivate native tool calling (module 02).
1. What ReAct is
ReAct (“Reasoning + Acting”; Yao et al., 2022, arXiv 2210.03629; published at ICLR, the International Conference on Learning Representations, 2023) is a prompting pattern that makes a language model interleave reasoning and tool use in a single generation stream. The model alternates between:
- Thought — reason about the current state, plan the next move;
- Action — name a tool to call;
- Action Input — the arguments for that tool (we use JSON — JavaScript Object Notation);
- Observation — the tool’s result, which the framework appends;
…repeating until the model emits a Final Answer.
Not the JavaScript library. “ReAct” here is the agent-reasoning paper. Mixing them up in an interview is a small but avoidable own-goal.
It matters because reasoning and acting each have a weakness that the other covers: a bare LLM’s reasoning is ungrounded (it can confidently invent facts), and a tool by itself has no judgment about when to fire or how to read its output. ReAct lets reasoning steer the actions, and lets the actions’ results correct the reasoning.
2. The loop — cold enough to whiteboard
           ┌───────────────────────────────────────────┐
           │ system prompt: tools + the output grammar │
           └─────────────────────┬─────────────────────┘
                                 │
                    ┌────────────▼───────────┐
    question ─────► │ call the LLM with      │ ◄──── full scratchpad re-fed
                    │ question + scratchpad  │       (all prior T/A/O)
                    └────────────┬───────────┘
                                 │ model emits TEXT
                    ┌────────────▼───────────┐
                    │ parse the text:        │
                    │ Final Answer? ─────────────► return it ✓ (stop)
                    │ Action + Action Input? │
                    └────────────┬───────────┘
                                 │ yes
                    ┌────────────▼───────────┐
                    │ registry.dispatch(...) │  run the tool
                    └────────────┬───────────┘
                                 │ result
                    ┌────────────▼───────────┐
                    │ append:                │
                    │ "<model text>\n        │
                    │ Observation: <result>" │ ──► loop (until max_steps)
                    └────────────────────────┘

A worked offline trace (demo.py, scene 1):

    Question: How many more people live in France than in Paris?

    Thought: First I need the population of France.
    Action: search
    Action Input: {"query": "population of France"}
    Observation: The population of France is about 68000000.

    Thought: Now I need the population of Paris.
    Action: search
    Action Input: {"query": "population of Paris"}
    Observation: The population of Paris is about 2100000.

    Thought: Subtract Paris's population from France's: 68000000 - 2100000.
    Action: calculator
    Action Input: {"expression": "68000000 - 2100000"}
    Observation: 65900000

    Thought: I now know the final answer.
    Final Answer: About 65,900,000 more people live in France than in Paris.

Two non-obvious facts about the mechanics:
- The model never writes the `Observation:` line. The framework runs the tool and injects it. In a live setup we even pass a stop sequence (`stop=["\nObservation:"]`) so the model halts before it can hallucinate one.
- The entire scratchpad is re-fed every turn. The model is stateless; its only “memory” of step 1 is that we paste steps 1..n-1 back into step n’s prompt. This is why context grows linearly with steps (a real cost/scaling concern; see Pitfalls).
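Both mechanics fit in a few lines. The sketch below is illustrative only: `truncate_at_observation` and `build_prompt` are hypothetical helper names, not functions from react_agent.py. The first emulates the stop sequence for a backend that lacks one; the second re-feeds the full scratchpad.

```python
def truncate_at_observation(model_text: str) -> str:
    # Emulate a stop sequence: keep only the text before the first line
    # that starts with "Observation:" (the framework owns that line).
    padded = "\n" + model_text            # so a leading "Observation:" is caught too
    cut = padded.find("\nObservation:")
    return model_text if cut == -1 else padded[1:cut]


def build_prompt(question: str, scratchpad: list[str]) -> str:
    # The model is stateless, so every prior Thought/Action/Observation
    # is pasted back in on each turn.
    return "\n".join([f"Question: {question}", *scratchpad])


# The model overran and invented a result; we discard it and splice in the real one.
raw = (
    "Thought: I need the population of France.\n"
    "Action: search\n"
    'Action Input: {"query": "population of France"}\n'
    "Observation: The population of France is 12."
)
clean = truncate_at_observation(raw)
scratchpad = [clean, "Observation: The population of France is about 68000000."]
prompt = build_prompt("How many more people live in France than in Paris?", scratchpad)
```

Truncating on our side is a belt-and-braces measure: a stop sequence prevents the overrun in the first place, while truncation still protects the loop if a provider ignores or lacks the `stop` parameter.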
The from-scratch core is genuinely ~30 lines (react_agent.py::ReActAgent.run,
and the standalone solutions.py::run_react). If you can reproduce that loop on
a whiteboard, you can answer the most common agent live-coding prompt.
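For orientation, here is one way the loop could look when written out. This is a minimal sketch under stated assumptions, not the module's `ReActAgent.run` or `run_react`: the `TOOLS` dict, the two regexes, and the `scripted_llm` stand-in are all hypothetical, and `llm` is assumed to be any callable that maps a prompt string to the model's text.

```python
import json
import re
from typing import Callable

# Hypothetical one-tool registry backed by a canned lookup table.
TOOLS: dict[str, Callable[..., str]] = {
    "search": lambda query: {"population of France": "68000000"}.get(query, "no result"),
}

ACTION_RE = re.compile(r"Action:\s*(.+?)\s*\nAction Input:\s*(.+)", re.DOTALL)
FINAL_RE = re.compile(r"Final Answer:\s*(.+)", re.DOTALL)


def run_react(llm: Callable[[str], str], question: str, max_steps: int = 5) -> str:
    scratchpad: list[str] = []
    for _ in range(max_steps):  # hard guard against a runaway loop
        prompt = "\n".join([f"Question: {question}", *scratchpad])
        # Discard any Observation the model wrote itself.
        text = llm(prompt).split("\nObservation:")[0]

        final = FINAL_RE.search(text)
        if final:
            return final.group(1).strip()

        match = ACTION_RE.search(text)
        if match is None:
            observation = "ERROR: expected 'Action:' plus 'Action Input:', or 'Final Answer:'."
        else:
            name, raw_args = match.group(1), match.group(2).strip()
            try:
                observation = str(TOOLS[name](**json.loads(raw_args)))
            except KeyError:
                observation = f"ERROR: unknown tool {name!r}. Available: {sorted(TOOLS)}."
            except (json.JSONDecodeError, TypeError) as exc:
                observation = f"ERROR: bad Action Input ({exc})."
        # Splice the real Observation in after the model's own text.
        scratchpad.append(f"{text.strip()}\nObservation: {observation}")
    return "Stopped: max_steps reached without a Final Answer."


def scripted_llm(replies: list[str]) -> Callable[[str], str]:
    # Offline stand-in for a model: returns canned replies in order.
    it = iter(replies)
    return lambda prompt: next(it)


answer = run_react(
    scripted_llm([
        'Thought: look it up.\nAction: search\nAction Input: {"query": "population of France"}',
        "Thought: I now know the final answer.\nFinal Answer: About 68000000.",
    ]),
    "What is the population of France?",
)
```

Note that every failure path assigns `observation` rather than raising, so the loop only ever exits through a Final Answer or the `max_steps` cap.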
3. The 2×2: where ReAct sits
Cross “does it produce a reasoning trace?” with “can it take external actions?”:

                        no external action         external action
                      ┌──────────────────────────┬──────────────────────────┐
    reasoning trace   │ Chain-of-Thought (CoT)   │ ReAct                    │
                      │ reason-only, CLOSED loop │ reason + act, OPEN loop  │
                      ├──────────────────────────┼──────────────────────────┤
    no reasoning      │ plain LLM answer         │ Act-only / tool-calling  │
                      │ (single shot)            │ tools, no planning       │
                      └──────────────────────────┴──────────────────────────┘

- CoT (reason-only): great for self-contained math/logic, but it is a closed loop: every “fact” comes from the weights. A hallucinated intermediate value propagates and corrupts everything downstream.
- Act-only (tools, no reasoning): calls tools straight from the query, with no step to update the plan, judge whether it has enough, or synthesize across results. Brittle on anything multi-hop.
- ReAct: the Thought is a scratchpad for state-tracking and recovery; the Observation grounds the next Thought in reality. Combined CoT+ReAct beats either alone on multi-hop QA (Question Answering; HotpotQA) and fact verification (FEVER), and ReAct gave large absolute gains on interactive benchmarks (+34 percentage points in success rate on ALFWorld over imitation and reinforcement learning baselines).
ReAct vs native tool calling (module 02)
They are the same idea (reason, then call a tool, then read the result) at two different layers:
| | ReAct (this module) | Native tool calling (module 02) |
|---|---|---|
| How the tool call is expressed | plain text the model writes | a structured tool_calls / tool_use field the API (Application Programming Interface) returns |
| Who parses it | you (regex on the text) | the provider (validated against your JSON Schema) |
| Reasoning trace | explicit Thought: lines | the model’s internal/optional reasoning |
| Robustness | brittle (malformed text breaks the parser) | far sturdier: the API returns structured args (strict schema enforcement depends on the provider) |
| Portability | works on any text model, even ones with no tool API | needs a provider that supports tool calling |
ReAct is the concept; native tool calling is the productionized plumbing for
it. We build ReAct by hand first precisely so that when the API hands you a
clean tool_use block in module 02, you know exactly what messy problem it is
solving for you.
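To make the contrast concrete, the sketch below extracts the same call from both layers. The `structured_call` dict is hand-written for illustration and only loosely modelled on the kind of block a provider returns; its field names are assumptions and vary between providers.

```python
import json
import re

# Text ReAct: the call is prose, so we locate it with a regex and parse the JSON ourselves.
text_call = 'Thought: add them.\nAction: calculator\nAction Input: {"expression": "2 + 2"}'
m = re.search(r"Action:\s*(\S+)\s*\nAction Input:\s*(\{.*\})", text_call, re.DOTALL)
name_from_text = m.group(1)
args_from_text = json.loads(m.group(2))   # may raise on malformed model output

# Native tool calling: the provider hands back a structured object.
# Hypothetical shape; real field names depend on the provider.
structured_call = {"type": "tool_use", "name": "calculator", "input": {"expression": "2 + 2"}}
name_from_api = structured_call["name"]
args_from_api = structured_call["input"]  # already a dict, no regex and no json.loads
```

The regex and the `json.loads` call are exactly the layer that disappears when the provider emits the structure for you.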
4. Why interleaving reduces hallucination
This is the money explanation; have it ready.
CoT is a closed loop, ReAct is an open loop. In CoT the model reasons from its weights alone, so a fabricated intermediate fact (a wrong date, a made-up population) is never checked — it flows into every later step and poisons the answer. ReAct inserts a real Observation from a tool after each Thought, so the next Thought builds on verified external data instead of speculation. You are replacing “the model’s guess about X” with “X, actually looked up.”
Concretely: ask “What’s the population of France minus the population of Paris?” A CoT model might assert “France ≈ 70M, Paris ≈ 11M” (Paris metro vs city — a classic confusion) and compute confidently. The ReAct agent searches, gets the real numbers back as Observations, and the arithmetic is grounded.
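The size of the gap is easy to work out from the figures already quoted. The snippet below simply redoes the subtraction both ways; the "guessed" values are the hypothetical CoT assertions from the paragraph above, and the "observed" values are the ones from the offline trace.

```python
# Values recalled from the weights (the hypothetical CoT assertion above).
guessed = {"France": 70_000_000, "Paris": 11_000_000}
# Values returned by the search tool as Observations in the worked trace.
observed = {"France": 68_000_000, "Paris": 2_100_000}

cot_answer = guessed["France"] - guessed["Paris"]        # 59,000,000
react_answer = observed["France"] - observed["Paris"]    # 65,900,000
gap = react_answer - cot_answer                          # 6,900,000
relative_gap = gap / react_answer                        # roughly 0.10
```

The arithmetic itself is flawless in both cases; the roughly 10% error comes entirely from one ungrounded intermediate fact, which is the closed-loop failure in miniature.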
5. The prompt format
The system prompt does two jobs:
- Advertise the tools. The model only knows a tool exists from this text: its name, a natural-language description, and the shape of its JSON arguments. Description quality is the single biggest lever on whether the model picks the right tool.
- Teach the output grammar. Spell out the `Thought:` / `Action:` / `Action Input:` / `Observation:` / `Final Answer:` prefixes and include at least one worked example.
react_agent.py::ReActAgent.system_prompt builds exactly this. We include one
worked example (a one-shot exemplar). Few-shot examples make the grammar far
more reliable on smaller live models — but mind the 2024 brittleness paper
(below): keep exemplars representative of the real task, because the model leans
on exemplar-query similarity more than you would like.
The user turn each iteration is just Question: … followed by the running
scratchpad. The model continues the transcript from there.
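As a rough illustration of those two jobs, a prompt builder might look like the following. Everything here is an assumption made for the sketch: `ToolSpec`, `GRAMMAR`, `EXAMPLE`, `build_system_prompt`, and `build_user_turn` are hypothetical names, and the wording is not the text that `ReActAgent.system_prompt` actually produces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolSpec:
    name: str
    description: str   # natural-language description the model reads
    args_example: str  # the JSON argument shape, shown verbatim


GRAMMAR = """Use exactly this format:
Thought: <reason about what to do next>
Action: <one tool name>
Action Input: <JSON arguments>
Observation: <added by the system, never by you>
Repeat as needed, then finish with:
Thought: I now know the final answer.
Final Answer: <answer>"""

# One worked example (a one-shot exemplar).
EXAMPLE = """Example:
Question: What is 2 + 2?
Thought: I should compute this.
Action: calculator
Action Input: {"expression": "2 + 2"}
Observation: 4
Thought: I now know the final answer.
Final Answer: 4"""


def build_system_prompt(tools: list[ToolSpec]) -> str:
    # Job 1: advertise the tools. Job 2: teach the grammar, with an exemplar.
    lines = ["You can call these tools:"]
    for tool in tools:
        lines.append(f"- {tool.name}: {tool.description} Arguments: {tool.args_example}")
    return "\n\n".join(["\n".join(lines), GRAMMAR, EXAMPLE])


def build_user_turn(question: str, scratchpad: str) -> str:
    # The user turn is the question followed by the running scratchpad.
    return f"Question: {question}\n{scratchpad}".rstrip()


system_prompt = build_system_prompt([
    ToolSpec("calculator", "Evaluate an arithmetic expression.", '{"expression": "<string>"}'),
    ToolSpec("search", "Look up a fact.", '{"query": "<string>"}'),
])
```

Generating the tool list from the same specs that drive dispatch keeps the advertised names and the registered names from drifting apart.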
6. Brittleness of text parsing — and why module 02 exists
Everything that makes text-ReAct teachable also makes it fragile. The model’s “tool call” is just prose, and prose breaks:
- Malformed JSON in `Action Input` (`{expression: 2+2}`, trailing commas, single quotes) → `json.loads` throws.
- Tool-name hallucination (`Action: wikipedia` when only `search` exists).
- Format drift: the model forgets the `Action Input:` line, wraps the JSON in a Markdown code fence tagged json, or invents its own `Observation:`.
- No `Final Answer:` ever → the loop would spin forever without a guard.
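A parser can absorb most of these failures if it reports them as data instead of raising. The sketch below is one possible shape, not the module's `parse_react_output`: the `Step` dataclass, `parse_step` signature, and error wording are assumptions chosen for the example.

```python
import json
import re
from dataclasses import dataclass
from typing import Optional

TICKS = "`" * 3  # a Markdown code-fence marker, built indirectly
FENCE_RE = re.compile(rf"^{TICKS}(?:json)?\s*|\s*{TICKS}$")


@dataclass
class Step:
    final_answer: Optional[str] = None
    action: Optional[str] = None
    args: Optional[dict] = None
    error: Optional[str] = None  # set when the text could not be used


def parse_step(text: str, tool_names: set[str]) -> Step:
    # Format drift: drop any Observation the model invented.
    text = ("\n" + text).split("\nObservation:")[0]
    final = re.search(r"Final Answer:\s*(.+)", text, re.DOTALL)
    if final:
        return Step(final_answer=final.group(1).strip())
    action = re.search(r"^Action:\s*(.+?)\s*$", text, re.MULTILINE)
    if action is None:
        return Step(error="ERROR: no 'Action:' or 'Final Answer:' line found.")
    raw = re.search(r"^Action Input:\s*(.*)", text, re.MULTILINE | re.DOTALL)
    if raw is None:
        return Step(error="ERROR: missing 'Action Input:' line.")
    name = action.group(1)
    if name not in tool_names:  # tool-name hallucination
        return Step(error=f"ERROR: unknown tool {name!r}. Available: {sorted(tool_names)}.")
    payload = FENCE_RE.sub("", raw.group(1).strip())  # tolerate fenced JSON
    try:
        args = json.loads(payload)
    except json.JSONDecodeError as exc:  # malformed JSON
        return Step(error=f"ERROR: Action Input was not valid JSON ({exc.msg}).")
    if not isinstance(args, dict):
        return Step(error="ERROR: Action Input must be a JSON object.")
    return Step(action=name, args=args)


ok = parse_step('Thought: x\nAction: search\nAction Input: {"query": "a"}', {"search"})
bad_json = parse_step("Action: calculator\nAction Input: {expression: 2+2}", {"calculator"})
bad_tool = parse_step('Action: wikipedia\nAction Input: {"query": "a"}', {"search"})
```

The missing-`Final Answer:` case is the one failure a parser cannot fix on its own; that is the loop's job, via the step cap.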
The professional move is to never raise out of the loop. Turn every failure
into a descriptive Observation: ERROR … and feed it back so the model can
self-correct on the next turn (this is the text-ReAct equivalent of an
is_error: true tool result). demo.py scene 2 shows the agent recover from a
hallucinated tool name and then a JSON syntax error, all without crashing.

    import json

    # Malformed JSON becomes an error Observation instead of an exception.
    try:
        args = json.loads(step.action_input)
    except json.JSONDecodeError as exc:
        observation = f"ERROR: Action Input was not valid JSON ({exc.msg})."
    ...
    # An unknown tool name is reported back along with the valid choices.
    try:
        observation = registry.dispatch(step.action, args)
    except KeyError:
        observation = f"ERROR: unknown tool {step.action!r}. Available: {names}."

This fragility is the whole motivation for module 02 (native tool calling): let the provider emit a structured, schema-validated tool call so you delete the regex layer and a class of bugs with it. ReAct is the idea; native tool calling is the robust transport.
Brittle Foundations critique (arXiv 2405.13966, 2024). Much of ReAct’s apparent “reasoning” is driven by exemplar-query similarity, not by the content of the interleaving — it behaves a lot like approximate retrieval. On novel distributions, zero-shot ReAct is often more stable than few-shot. Good interview answer: “I treat ReAct as scaffolding, not intelligence — I’d validate it on held-out, distribution-shifted evals and consider zero-shot or a structured runtime (e.g. a graph) rather than trusting few-shot exemplars.”
Interview angle
What they’re probing and how to land it:
- “Why does ReAct reduce hallucination vs CoT?” → Open vs closed loop. A real Observation grounds each Thought; CoT propagates fabricated intermediates.
- “ReAct vs just calling tools?” → The Thought lets the model replan, judge sufficiency, and synthesize across observations. Act-only has no such step.
- “Implement ReAct from scratch.” → Sketch: system prompt (tools + grammar) → `for _ in range(max_steps)` → call LLM → parse `Final Answer` vs `Action`/`Action Input` → `json.loads` + `registry.dispatch` → append `Observation` → repeat. Mention the `max_steps` guard before they ask.
- “Failure modes?” → verifier stall / infinite loop (no guard), malformed JSON, tool-name hallucination, context explosion, prompt injection via Observations, few-shot brittleness.
- “When NOT ReAct?” → fixed-structure pipelines (Plan-and-Execute is cheaper and auditable), latency-critical paths, pure math (CoT alone can win), very novel query distributions (zero-shot / structured runtime).
- “$5/query, agent makes 50 tool calls — fix it.” → iteration cap + cost budget, model tiering, cache repeated lookups, inspect the trace for a verifier stall, and shorten the scratchpad (rolling summary).
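Several of the cost fixes in that last answer are small enough to sketch. The helpers below are hypothetical illustrations, not part of the module: `Budget`, `cached_search`, `is_stalled`, and `trim_scratchpad` are invented names, the search body is a placeholder, and the trimming shown is plain truncation standing in for a real rolling summary.

```python
from functools import lru_cache


class Budget:
    """Tracks steps and spend; charge() returns False once either cap is crossed."""

    def __init__(self, max_steps: int, max_cost_usd: float) -> None:
        self.max_steps, self.max_cost_usd = max_steps, max_cost_usd
        self.steps, self.cost_usd = 0, 0.0

    def charge(self, cost_usd: float) -> bool:
        self.steps += 1
        self.cost_usd += cost_usd
        return self.steps <= self.max_steps and self.cost_usd <= self.max_cost_usd


@lru_cache(maxsize=256)
def cached_search(query: str) -> str:
    # Placeholder body; a real tool would hit the network here, once per distinct query.
    return f"result for {query}"


def is_stalled(actions: list[tuple[str, str]], window: int = 3) -> bool:
    # Stall check: the last `window` (tool, input) pairs are all identical.
    return len(actions) >= window and len(set(actions[-window:])) == 1


def trim_scratchpad(entries: list[str], keep_last: int = 4) -> list[str]:
    # Crude stand-in for a rolling summary: keep recent steps, note what was dropped.
    if len(entries) <= keep_last:
        return entries
    dropped = len(entries) - keep_last
    return [f"[{dropped} earlier steps omitted]", *entries[-keep_last:]]
```

A loop would typically call `budget.charge(...)` once per LLM call and stop with a partial answer when it returns False, and would break out early when `is_stalled` fires rather than burning the remaining steps.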
Common pitfalls / gotchas
- No `max_steps` guard. The single most expensive agent bug. A verifier stall (the model repeating the same failing Action) with no cap produced a documented runaway (~11 days, ~$47K). Always bound iterations and cost.
- Trusting Observations. Tool output / retrieved text is untrusted input injected straight into your context, which makes it a prompt-injection vector (“ignore your instructions and …”). Never let an Observation silently change which tool runs.
- Crashing on bad model output. Wrap `json.loads` and `dispatch`; convert errors into `Observation: ERROR …` so the model recovers instead of the process dying.
- Letting the model write its own `Observation:`. Use a stop sequence and/or truncate the model text at the first `Observation:` before splicing in the real one; otherwise the model hallucinates results and reasons on fiction.
- Context explosion. The scratchpad grows linearly with each step, and because the whole scratchpad is re-fed every turn, total input tokens over a run grow roughly quadratically with the number of steps; long trajectories also blow the context window. Cap steps, summarize, or move to a structured runtime.
- `eval()` on `Action Input`. Never `eval` model-emitted strings: it’s remote code execution (OWASP, the Open Worldwide Application Security Project, LLM05). Our calculator parses an AST (Abstract Syntax Tree) and walks only whitelisted numeric/operator nodes (`react_agent.py::safe_eval`).
- Few-shot over-fitting. Generic exemplars on a domain task can underperform zero-shot. Make exemplars representative; evaluate on distribution shift.
- Confusing ReAct with React.js. It’s Reasoning + Acting.
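To show what "walk only whitelisted nodes" can mean in practice, here is a small AST-based evaluator. It is a sketch under assumptions, not the module's `safe_eval`: the name `safe_calculate`, the exact operator whitelist, and the error messages are choices made for this example.

```python
import ast
import operator

# Whitelisted operators only. Exponentiation is left out on purpose:
# an expression like 9**9**9 is a cheap denial of service.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_calculate(expression: str) -> float:
    """Evaluate +, -, *, / over numeric literals; reject everything else."""

    def walk(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return walk(node.body)
        # type(...) rather than isinstance, so booleans are rejected too.
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](walk(node.operand))
        raise ValueError(f"disallowed syntax: {type(node).__name__}")

    try:
        tree = ast.parse(expression, mode="eval")
    except SyntaxError as exc:
        raise ValueError(f"not a valid expression: {exc.msg}") from exc
    return walk(tree)


difference = safe_calculate("68000000 - 2100000")
```

Names, calls, attribute access, and subscripts never reach an evaluator because they hit the final `raise`. Inside an agent loop, the caller would catch `ValueError` (and `ZeroDivisionError`) and turn it into an error Observation, in keeping with the pitfall about crashing on bad model output.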
Key takeaways
- ReAct = interleave Thought → Action → Observation until Final Answer. The model emits text; the framework parses it, runs the tool, and splices the result back. Re-feed the whole scratchpad each turn.
- Interleaving beats CoT because it opens the loop — a real Observation grounds each Thought, so hallucinated intermediates don’t propagate.
- The loop is ~30 lines and needs a hard `max_steps` guard. Know it cold; it is the canonical agent live-coding question.
- Text parsing is brittle (malformed JSON, hallucinated tool names, format drift). Handle every failure as an error Observation, never a crash, and let that brittleness motivate native tool calling (module 02), where the provider returns structured, schema-validated tool calls.
- Treat ReAct as scaffolding, not intelligence. Watch exemplar-query similarity, validate on held-out distributions, and reach for structured runtimes when reliability matters.
Files in this module
- `react_agent.py`: the worked agent, comprising the tools (`calculator`, `search`), the parser (`parse_react_output`), and `ReActAgent` (system prompt + loop).
- `demo.py`: run `python demo.py` for an offline scripted trace; add `--live` to drive a real model via OpenRouter through the same loop.
- `test_react.py`: offline tests proving the worked code (parsing, dispatch, loop, max-steps termination, Final-Answer extraction, error recovery).
- `exercises.py`: your turn. Implement `parse_step`, `should_stop`, `run_react`.
- `solutions.py`: complete reference for the exercises.
- `practice_test.py`: run explicitly; red until you finish `exercises.py`.