The ReAct Loop, From Scratch

“Implement a basic ReAct (Reasoning and Acting) agent with a calculator and a search tool, no framework.” — a live-coding question you will get, almost verbatim, in an AI-engineer interview.

This is the centerpiece of the bootcamp. Get the ReAct loop into your fingers and your vocabulary and most of the agent world becomes “ReAct plus a twist.”

Learning objectives

By the end you can:

State what ReAct is (Reason + Act) and recite the exact loop from memory.
Place ReAct on the 2×2 of {reasoning trace?} × {external action?} and explain why it beats reasoning-only CoT (Chain-of-Thought) and Act-only approaches on multi-step, knowledge-seeking tasks.
Explain why interleaving reduces hallucination (open vs closed loop).
Build a text-parsing ReAct agent from scratch: system prompt → LLM (Large Language Model) call → parse Thought/Action/Action Input/Final Answer → dispatch a tool → splice an Observation back in → repeat, with a hard max_steps guard.
Name the failure modes of text-based parsing and use them to motivate native tool calling (module 02).

1. What ReAct is

ReAct (“Reasoning + Acting”, Yao et al., 2022, arXiv 2210.03629; ICLR — International Conference on Learning Representations — 2023) is a prompting pattern that makes a language model interleave reasoning and tool use in a single generation stream. The model alternates between:

Thought — reason about the current state, plan the next move;
Action — name a tool to call;
Action Input — the arguments for that tool (we use JSON — JavaScript Object Notation);
Observation — the tool’s result, which the framework appends;

…repeating until the model emits a Final Answer.

Not the JavaScript library. “ReAct” here is the agent-reasoning paper. Mixing them up in an interview is a small but avoidable own-goal.

It matters because an LLM and a tool have complementary weaknesses: a bare LLM’s reasoning is ungrounded (it can confidently invent facts), while a tool by itself has no judgment about when to fire or how to read its output. ReAct lets reasoning steer the actions, and lets the actions’ results correct the reasoning.

2. The loop — cold enough to whiteboard

            ┌─────────────────────────────────────────────┐
            │  system prompt: tools + the output grammar   │
            └─────────────────────────────────────────────┘
                              │
   question ───►  ┌───────────▼───────────┐
                  │   call the LLM with    │ ◄──── full scratchpad re-fed
                  │  question + scratchpad │       (all prior T/A/O)
                  └───────────┬───────────┘
                              │ model emits TEXT
                  ┌───────────▼───────────┐
                  │  parse the text:       │
                  │  Final Answer? ───────────► return it ✓ (stop)
                  │  Action + Action Input?│
                  └───────────┬───────────┘
                              │ yes
                  ┌───────────▼───────────┐
                  │ registry.dispatch(...) │  run the tool
                  └───────────┬───────────┘
                              │ result
                  ┌───────────▼───────────┐
                  │ append:                │
                  │ "<model text>\n        │
                  │  Observation: <result>"│  ──► loop (until max_steps)
                  └────────────────────────┘

A worked offline trace (demo.py, scene 1):

Question: How many more people live in France than in Paris?

Thought: First I need the population of France.
Action: search
Action Input: {"query": "population of France"}
Observation: The population of France is about 68000000.

Thought: Now I need the population of Paris.
Action: search
Action Input: {"query": "population of Paris"}
Observation: The population of Paris is about 2100000.

Thought: Subtract Paris's population from France's: 68000000 - 2100000.
Action: calculator
Action Input: {"expression": "68000000 - 2100000"}
Observation: 65900000

Thought: I now know the final answer.
Final Answer: About 65,900,000 more people live in France than in Paris.

Two non-obvious facts about the mechanics:

The model never writes the Observation: line. The framework runs the tool and injects it. In a live setup we even pass a stop sequence (stop=["\nObservation:"]) so the model halts before it can hallucinate one.
The entire scratchpad is re-fed every turn. The model is stateless; its only “memory” of step 1 is that we paste steps 1..n-1 back into step n’s prompt. This is why context grows linearly with steps (a real cost/scaling concern — see Pitfalls).
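If a stop sequence is unavailable, or the model slips past it, the same effect can be approximated after the fact. The sketch below trims model output at the first line that starts with Observation: before the real tool result is spliced in. The helper name cut_at_observation is hypothetical and is not taken from react_agent.py.

```python
import re

# A line that begins (after optional indentation) with the Observation prefix.
_OBSERVATION_LINE = re.compile(r"^[ \t]*Observation:", re.MULTILINE)


def cut_at_observation(model_text: str) -> str:
    """Return only the text the model wrote before its first Observation line."""
    match = _OBSERVATION_LINE.search(model_text)
    if match is None:
        return model_text
    return model_text[: match.start()].rstrip()


raw = (
    "Thought: First I need the population of France.\n"
    "Action: search\n"
    'Action Input: {"query": "population of France"}\n'
    "Observation: The population of France is 12.\n"  # invented by the model
    "Thought: So the answer is 12."
)
clean = cut_at_observation(raw)
print(clean)
```

Only the Thought, Action, and Action Input lines survive, so the next prompt carries the tool's real Observation rather than the invented one.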

The from-scratch core is genuinely ~30 lines (react_agent.py::ReActAgent.run, and the standalone solutions.py::run_react). If you can reproduce that loop on a whiteboard, you can answer the most common agent live-coding prompt.
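For orientation, here is one way the loop can be written. It is a minimal sketch, not the code in react_agent.py or solutions.py: the function name run_react_sketch, the two regexes, the toy tools, and the scripted stand-in for the LLM are all assumptions made for illustration.

```python
import json
import operator
import re
from typing import Callable, Dict

FINAL_RE = re.compile(r"Final Answer:\s*(.*)", re.DOTALL)
ACTION_RE = re.compile(r"Action:\s*(.+?)\s*\nAction Input:\s*(.+)", re.DOTALL)


def run_react_sketch(question: str, llm: Callable[[str], str],
                     tools: Dict[str, Callable[..., object]],
                     max_steps: int = 8) -> str:
    scratchpad = ""  # all prior Thought/Action/Observation text, re-fed each turn
    for _ in range(max_steps):  # hard guard: the loop cannot spin forever
        text = llm(f"Question: {question}\n{scratchpad}").strip()
        final = FINAL_RE.search(text)
        if final:
            return final.group(1).strip()
        step = ACTION_RE.search(text)
        if step is None:
            observation = "ERROR: expected Action + Action Input, or Final Answer."
        elif step.group(1) not in tools:
            observation = f"ERROR: unknown tool {step.group(1)!r}."
        else:
            try:
                args = json.loads(step.group(2))
                observation = str(tools[step.group(1)](**args))
            except Exception as exc:  # bad JSON or tool failure: report, do not crash
                observation = f"ERROR: {type(exc).__name__}: {exc}"
        scratchpad += f"{text}\nObservation: {observation}\n"
    return "Stopped: max_steps reached without a Final Answer."


# Toy tools and a scripted "model" that replays the worked trace above.
FACTS = {
    "population of France": "The population of France is about 68000000.",
    "population of Paris": "The population of Paris is about 2100000.",
}
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def search(query: str) -> str:
    return FACTS.get(query, "No result.")


def calculator(expression: str) -> int:
    m = re.fullmatch(r"\s*(\d+)\s*([-+*])\s*(\d+)\s*", expression)
    if m is None:
        raise ValueError("toy calculator only supports '<int> <op> <int>'")
    return OPS[m.group(2)](int(m.group(1)), int(m.group(3)))


SCRIPT = iter([
    'Thought: First I need the population of France.\nAction: search\n'
    'Action Input: {"query": "population of France"}',
    'Thought: Now I need the population of Paris.\nAction: search\n'
    'Action Input: {"query": "population of Paris"}',
    'Thought: Subtract.\nAction: calculator\n'
    'Action Input: {"expression": "68000000 - 2100000"}',
    'Thought: I now know the final answer.\n'
    'Final Answer: About 65,900,000 more people live in France than in Paris.',
])

answer = run_react_sketch(
    "How many more people live in France than in Paris?",
    lambda prompt: next(SCRIPT),
    {"search": search, "calculator": calculator},
)
print(answer)  # About 65,900,000 more people live in France than in Paris.
```

Swapping the lambda for a real model call is the only change needed to go live; the parsing, dispatch, and max_steps guard stay the same.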

3. The 2×2: where ReAct sits

Cross “does it produce a reasoning trace?” with “can it take external actions?”:

                       no external action        external action
                    ┌─────────────────────────┬─────────────────────────┐
   reasoning trace  │  Chain-of-Thought (CoT)  │        ReAct            │
                    │  reason-only, CLOSED loop │  reason + act, OPEN loop │
                    ├─────────────────────────┼─────────────────────────┤
   no reasoning     │     plain LLM answer     │   Act-only / tool-calling│
                    │   (single shot)          │   tools, no planning     │
                    └─────────────────────────┴─────────────────────────┘

CoT (reason-only): great for self-contained math/logic, but it is a closed loop — every “fact” comes from the weights. A hallucinated intermediate value propagates and corrupts everything downstream.
Act-only (tools, no reasoning): calls tools straight from the query, with no step to update the plan, judge whether it has enough, or synthesize across results. Brittle on anything multi-hop.
ReAct: the Thought is a scratchpad for state-tracking and recovery; the Observation grounds the next Thought in reality. Combined CoT+ReAct beats either alone on multi-hop QA (Question Answering) (HotpotQA) and fact verification (FEVER), and ReAct gave large gains on interactive benchmarks (about +34 points of absolute success rate on ALFWorld over imitation- and reinforcement-learning baselines).

ReAct vs native tool calling (module 02)

They are the same idea (reason, then call a tool, then read the result) at two different layers:

	ReAct (this module)	Native tool calling (module 02)
How the tool call is expressed	plain text the model writes	a structured `tool_calls` / `tool_use` field the API (Application Programming Interface) returns
Who parses it	you (regex on the text)	the provider (validated against your JSON Schema)
Reasoning trace	explicit `Thought:` lines	the model’s internal/optional reasoning
Robustness	brittle (malformed text breaks the parser)	arguments arrive as structured data; how strictly they are checked against your schema depends on the provider and mode
Portability	works on any text model, even ones with no tool API	needs a provider that supports tool calling

ReAct is the concept; native tool calling is the productionized plumbing for it. We build ReAct by hand first precisely so that when the API hands you a clean tool_use block in module 02, you know exactly what messy problem it is solving for you.

4. Why interleaving reduces hallucination

This is the money explanation; have it ready.

CoT is a closed loop; ReAct is an open loop. In CoT the model reasons from its weights alone, so a fabricated intermediate fact (a wrong date, a made-up population) is never checked: it flows into every later step and poisons the answer. ReAct inserts a real Observation from a tool after each Action, so the next Thought builds on verified external data instead of speculation. You are replacing “the model’s guess about X” with “X, actually looked up.”

Concretely: ask “What’s the population of France minus the population of Paris?” A CoT model might assert “France ≈ 70M, Paris ≈ 11M” (Paris metro vs city — a classic confusion) and compute confidently. The ReAct agent searches, gets the real numbers back as Observations, and the arithmetic is grounded.
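The size of the error is easy to put a number on. The snippet below uses only the figures already quoted in this module: the ungrounded guess from this section and the search results from the worked trace.

```python
# Ungrounded CoT guess: France ~70M, Paris ~11M (metro area, not the city).
cot_answer = 70_000_000 - 11_000_000

# Figures returned by the search tool in the worked trace.
react_answer = 68_000_000 - 2_100_000

gap = react_answer - cot_answer
print(cot_answer, react_answer, gap)  # 59000000 65900000 6900000
print(f"{gap / react_answer:.1%}")    # 10.5%
```

The CoT arithmetic is flawless; the answer is still off by about a tenth because one input was a guess.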

5. The prompt format

The system prompt does two jobs:

Advertise the tools. The model only knows a tool exists from this text: its name, a natural-language description, and the shape of its JSON arguments. Description quality is the single biggest lever on whether the model picks the right tool.
Teach the output grammar. Spell out the Thought:/Action:/Action Input:/Observation:/Final Answer: prefixes and include at least one worked example.

react_agent.py::ReActAgent.system_prompt builds exactly this. We include one worked example (a one-shot exemplar). Few-shot examples make the grammar far more reliable on smaller live models — but mind the 2024 brittleness paper (below): keep exemplars representative of the real task, because the model leans on exemplar-query similarity more than you would like.

The user turn each iteration is just Question: … followed by the running scratchpad. The model continues the transcript from there.
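A sketch of how the two prompt pieces might be assembled. The names ToolSpec, build_system_prompt, and build_user_turn, and the exact wording of the grammar and exemplar, are hypothetical; ReActAgent.system_prompt in react_agent.py may differ in detail.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ToolSpec:
    name: str
    description: str   # the text the model relies on to pick a tool
    args_example: str  # JSON showing the shape of the arguments


GRAMMAR = """Use exactly this format:
Thought: <your reasoning>
Action: <one tool name>
Action Input: <JSON arguments>
The system then adds an "Observation:" line. Never write one yourself.
When you know the answer:
Thought: I now know the final answer.
Final Answer: <answer>"""

# One-shot exemplar; keep it representative of the real task.
EXEMPLAR = """Example:
Question: What is 2 + 2?
Thought: I should compute this.
Action: calculator
Action Input: {"expression": "2 + 2"}
Observation: 4
Thought: I now know the final answer.
Final Answer: 4"""


def build_system_prompt(tools: List[ToolSpec]) -> str:
    lines = ["You can use these tools:"]
    lines += [f"- {t.name}: {t.description} Arguments: {t.args_example}" for t in tools]
    return "\n".join(lines) + "\n\n" + GRAMMAR + "\n\n" + EXEMPLAR


def build_user_turn(question: str, scratchpad: str) -> str:
    # The model continues the transcript from the end of the scratchpad.
    return f"Question: {question}\n{scratchpad}"


TOOLS = [
    ToolSpec("calculator", "Evaluate an arithmetic expression.", '{"expression": "2 + 2"}'),
    ToolSpec("search", "Look up a fact by keyword query.", '{"query": "population of France"}'),
]
system_prompt = build_system_prompt(TOOLS)
print(system_prompt)
```

Keeping tool descriptions in data rather than hard-coded prose makes it cheap to iterate on their wording, which is where most of the tool-selection quality comes from.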

6. Brittleness of text parsing — and why module 02 exists

Everything that makes text-ReAct teachable also makes it fragile. The model’s “tool call” is just prose, and prose breaks:

Malformed JSON in Action Input ({expression: 2+2}, trailing commas, single quotes) → json.loads raises json.JSONDecodeError.
Tool-name hallucination (Action: wikipedia when only search exists).
Format drift — the model forgets the Action Input: line, wraps JSON in ```json fences, or invents its own Observation:.
No Final Answer: ever → the loop would spin forever without a guard.
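A parser can absorb some of this drift and report the rest. The sketch below is an illustration, not the module's parse_react_output: ParsedStep and parse_step_tolerant are hypothetical names, and the choice of what to tolerate (code fences, an invented Observation) versus what to report (bad JSON, a missing Action Input) is an assumption.

```python
import json
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParsedStep:
    final_answer: Optional[str] = None
    action: Optional[str] = None
    args: Optional[dict] = None
    error: Optional[str] = None  # ready to use as the next Observation


# An opening fence (three backticks, optional "json") or a closing fence.
FENCE_RE = re.compile(r"^`{3}(?:json)?\s*|\s*`{3}$")


def parse_step_tolerant(text: str) -> ParsedStep:
    # Format drift: drop everything from a model-invented Observation onward.
    text = re.split(r"^Observation:", text, maxsplit=1, flags=re.MULTILINE)[0]

    final = re.search(r"^Final Answer:\s*(.*)", text, re.MULTILINE | re.DOTALL)
    if final:
        return ParsedStep(final_answer=final.group(1).strip())

    action = re.search(r"^Action:[ \t]*(.+)$", text, re.MULTILINE)
    if action is None:
        return ParsedStep(error="ERROR: no 'Action:' or 'Final Answer:' line found.")
    name = action.group(1).strip()

    raw = re.search(r"^Action Input:\s*(.*)", text, re.MULTILINE | re.DOTALL)
    if raw is None:
        return ParsedStep(action=name, error="ERROR: missing 'Action Input:' line.")

    payload = FENCE_RE.sub("", raw.group(1).strip())  # unwrap fenced JSON
    try:
        args = json.loads(payload)
    except json.JSONDecodeError as exc:
        return ParsedStep(
            action=name,
            error=f"ERROR: Action Input was not valid JSON ({exc.msg}).",
        )
    if not isinstance(args, dict):
        return ParsedStep(action=name, error="ERROR: Action Input must be a JSON object.")
    return ParsedStep(action=name, args=args)


bad = parse_step_tolerant("Thought: t\nAction: calculator\nAction Input: {expression: 2+2}")
print(bad.action, "|", bad.error)
```

The parser never raises on model output; every failure comes back as an error string the loop can hand to the model as an Observation.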

The professional move is to never raise out of the loop. Turn every failure into a descriptive Observation: ERROR … and feed it back so the model can self-correct on the next turn (this is the text-ReAct equivalent of an is_error: true tool result). demo.py scene 2 shows the agent recover from a hallucinated tool name and then a JSON syntax error, all without crashing.

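import json

# Excerpt from inside the loop body: `step` is the parsed model output,
# `registry` is the tool registry, `names` holds the registered tool names.
try:
    args = json.loads(step.action_input)
except json.JSONDecodeError as exc:
    # Bad JSON becomes feedback for the model, not an exception for the caller.
    observation = f"ERROR: Action Input was not valid JSON ({exc.msg})."
...
try:
    observation = registry.dispatch(step.action, args)
except KeyError:
    # A hallucinated tool name also becomes an Observation the model can read.
    observation = f"ERROR: unknown tool {step.action!r}. Available: {names}."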
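The same pattern can be packaged as one function that always returns a string. This is a self-contained sketch under assumptions: observe is a hypothetical name, and a plain dict of callables stands in for the module's registry.

```python
import json
from typing import Callable, Mapping


def observe(action: str, action_input: str,
            tools: Mapping[str, Callable[..., object]]) -> str:
    """Run one tool call and return the Observation text. Never raises."""
    try:
        args = json.loads(action_input)
    except json.JSONDecodeError as exc:
        return f"ERROR: Action Input was not valid JSON ({exc.msg})."
    if not isinstance(args, dict):
        return "ERROR: Action Input must be a JSON object."

    tool = tools.get(action)  # look up first, so a tool's own KeyError is not misread
    if tool is None:
        return f"ERROR: unknown tool {action!r}. Available: {', '.join(sorted(tools))}."

    try:
        return str(tool(**args))
    except Exception as exc:  # wrong argument names, tool bugs, timeouts, ...
        return f"ERROR: {action} failed: {type(exc).__name__}: {exc}"


demo_tools = {"search": lambda query: f"result for {query}"}
print(observe("wikipedia", '{"query": "x"}', demo_tools))
print(observe("search", "{query: x}", demo_tools))
print(observe("search", '{"query": "x"}', demo_tools))
```

Looking the tool up before calling it, instead of catching KeyError around the call, keeps an exception raised inside a tool from being reported as a hallucinated tool name.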

This fragility is the whole motivation for module 02 (native tool calling): let the provider emit a structured, schema-validated tool call so you delete the regex layer and a class of bugs with it. ReAct is the idea; native tool calling is the robust transport.

Brittle Foundations critique (arXiv 2405.13966, 2024). Much of ReAct’s apparent “reasoning” is driven by exemplar-query similarity, not by the content of the interleaving — it behaves a lot like approximate retrieval. On novel distributions, zero-shot ReAct is often more stable than few-shot. Good interview answer: “I treat ReAct as scaffolding, not intelligence — I’d validate it on held-out, distribution-shifted evals and consider zero-shot or a structured runtime (e.g. a graph) rather than trusting few-shot exemplars.”

Interview angle

What they’re probing, and how to land it:

“Why does ReAct reduce hallucination vs CoT?” → Open vs closed loop. A real Observation grounds each Thought; CoT propagates fabricated intermediates.
“ReAct vs just calling tools?” → The Thought lets the model replan, judge sufficiency, and synthesize across observations. Act-only has no such step.
“Implement ReAct from scratch.” → Sketch: system prompt (tools + grammar) → for _ in range(max_steps) → call LLM → parse Final Answer vs Action/Action Input → json.loads + registry.dispatch → append Observation → repeat. Mention the max_steps guard before they ask.
“Failure modes?” → verifier stall / infinite loop (no guard), malformed JSON, tool-name hallucination, context explosion, prompt injection via Observations, few-shot brittleness.
“When NOT ReAct?” → fixed-structure pipelines (Plan-and-Execute is cheaper and auditable), latency-critical paths, pure math (CoT alone can win), very novel query distributions (zero-shot / structured runtime).
“$5/query, agent makes 50 tool calls — fix it.” → iteration cap + cost budget, model tiering, cache repeated lookups, inspect the trace for a verifier stall, and shorten the scratchpad (rolling summary).
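For the last question, it helps to be able to sketch the guard itself. The class below is a hypothetical illustration (RunGuard and its fields are not part of this module, and the default limits are arbitrary): it combines an iteration cap, a cost budget, and a simple stall check that fires when the same Action and Action Input repeat.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class RunGuard:
    max_steps: int = 10
    max_cost_usd: float = 0.50
    repeat_limit: int = 3  # identical consecutive calls that count as a stall
    steps: int = 0
    cost_usd: float = 0.0
    calls: List[Tuple[str, str]] = field(default_factory=list)

    def record(self, action: str, action_input: str, step_cost_usd: float) -> Optional[str]:
        """Record one tool call; return a stop reason, or None to keep going."""
        self.steps += 1
        self.cost_usd += step_cost_usd
        self.calls.append((action, action_input))

        tail = self.calls[-self.repeat_limit:]
        if len(tail) == self.repeat_limit and len(set(tail)) == 1:
            return "stall: the same Action was repeated"
        if self.steps >= self.max_steps:
            return "step cap reached"
        if self.cost_usd >= self.max_cost_usd:
            return "cost budget reached"
        return None


guard = RunGuard()
for _ in range(5):
    reason = guard.record("search", '{"query": "population of France"}', 0.01)
    if reason is not None:
        break
print(guard.steps, reason)  # 3 stall: the same Action was repeated
```

Calling record once per loop iteration and stopping on the first non-None reason bounds both spend and wall-clock time, and the recorded calls double as the trace to inspect afterwards.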

Common pitfalls / gotchas

No max_steps guard. The single most expensive agent bug. A verifier stall (the model repeating the same failing Action) with no cap has reportedly produced a runaway lasting ~11 days and costing ~$47K. Always bound iterations and cost.
Trusting Observations. Tool output / retrieved text is untrusted input injected straight into your context — a prompt-injection vector (“ignore your instructions and …”). Never let an Observation silently change which tool runs.
Crashing on bad model output. Wrap json.loads and dispatch; convert errors into Observation: ERROR … so the model recovers instead of the process dying.
Letting the model write its own Observation:. Use a stop sequence and/or truncate the model text at the first Observation: before splicing in the real one — otherwise the model hallucinates results and reasons on fiction.
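Context explosion. The scratchpad grows linearly with steps, and because the whole scratchpad is re-sent on every call, the total prompt tokens over a run grow roughly quadratically: if each step adds about s tokens, step n sends about n·s, so N steps send about s·N(N+1)/2 in total (self-attention compute per call is also quadratic in context length). Long trajectories eventually overflow the context window. Cap steps, summarize, or move to a structured runtime.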
eval() on Action Input. Never eval model-emitted strings; it’s remote code execution (OWASP, the Open Worldwide Application Security Project, covers this under LLM05, Improper Output Handling, in its 2025 Top 10 for LLM applications). Our calculator parses an AST (Abstract Syntax Tree) and walks only whitelisted numeric/operator nodes (react_agent.py::safe_eval).
Few-shot over-fitting. Generic exemplars on a domain task can underperform zero-shot. Make exemplars representative; evaluate on distribution shift.
Confusing ReAct with React.js. It’s Reasoning + Acting.
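To make the eval() pitfall concrete, here is a minimal sketch of the whitelist-the-AST approach. It is an illustration only and is not the safe_eval in react_agent.py: the name safe_eval_sketch and the exact set of allowed operators are assumptions (exponentiation is left out here so that an expression such as 9**9**9 cannot tie up the process).

```python
import ast
import operator

_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_eval_sketch(expression: str):
    """Evaluate plain arithmetic by walking the AST; reject everything else."""

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        # Only int/float literals; type() rather than isinstance() excludes bool.
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](walk(node.operand))
        # Names, calls, attributes, subscripts, etc. all land here.
        raise ValueError(f"disallowed expression element: {type(node).__name__}")

    return walk(ast.parse(expression, mode="eval"))


print(safe_eval_sketch("68000000 - 2100000"))  # 65900000
print(safe_eval_sketch("-(2 + 3) * 4"))        # -20
try:
    safe_eval_sketch("__import__('os').system('ls')")
except ValueError as exc:
    print("rejected:", exc)  # rejected: disallowed expression element: Call
```

Nothing is ever executed as code: the string is parsed, and only nodes on the whitelist are interpreted, so a function call or attribute access fails before it can do anything. Inside the agent, the ValueError (or a SyntaxError from ast.parse) would be caught and returned as an error Observation.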

Key takeaways

ReAct = interleave Thought → Action → Observation until Final Answer. The model emits text; the framework parses it, runs the tool, and splices the result back. Re-feed the whole scratchpad each turn.
Interleaving beats CoT because it opens the loop — a real Observation grounds each Thought, so hallucinated intermediates don’t propagate.
The loop is ~30 lines and needs a hard max_steps guard. Know it cold; it is the canonical agent live-coding question.
Text parsing is brittle (malformed JSON, hallucinated tool names, format drift). Handle every failure as an error Observation, never a crash — and let that brittleness motivate native tool calling (module 02), where the provider returns structured, schema-validated tool calls.
Treat ReAct as scaffolding, not intelligence. Watch exemplar-query similarity, validate on held-out distributions, and reach for structured runtimes when reliability matters.

Files in this module

react_agent.py — the worked agent: tools (calculator, search), the parser (parse_react_output), and ReActAgent (system prompt + loop).
demo.py — python demo.py for an offline scripted trace; --live to drive a real model via OpenRouter through the same loop.
test_react.py — offline tests proving the worked code (parsing, dispatch, loop, max-steps termination, Final-Answer extraction, error recovery).
exercises.py — your turn: implement parse_step, should_stop, run_react.
solutions.py — complete reference for the exercises.
practice_test.py — run explicitly; red until you finish exercises.py.