V1 — Human-in-the-Loop
Insert mandatory human review and approval at defined decision boundaries before the agent proceeds — the agent blocks until a human approves, rejects, or modifies the plan.
Also Known As: HITL, Approval Gate, Human Checkpoint, Mandatory Review Gate. (V1 is distinct from — and in direct tension with — V2 Human-on-the-Loop; see Related Patterns.)
Classification: Category V — Reliability · Band V-A Safety and Security · the blocking oversight pattern — the agent cannot proceed past the checkpoint without a human verdict.
Intent
Make the agent halt at the boundary of any action whose cost-of-error exceeds the cost-of-delay, surface the planned action to a human in interpretable form, and resume only on an explicit verdict — so that irreversible, novel, or high-blast-radius actions never execute autonomously.
Motivation
Autonomous agent failure is the dominant production risk for agentic systems. The Composio AI Agent Report (2025) finds 88% of agent projects never reach production, and the most-cited cause is that fully autonomous behaviour in high-stakes contexts destroys value rather than creating it. The pattern that solves this — at the cost of latency — is to block the agent at chosen boundaries until a human approves the next step.
Naive alternatives all fail in characteristic ways. Trusting the model's own confidence score is unreliable: confident-but-wrong is the modal failure mode of capable LLMs. This is not a calibration quirk — token generation is stochastic sampling from a probability distribution, and high probability mass on a token is not equivalent to epistemic certainty; the model has no privileged access to the correctness of its own outputs (mechanism 7). Output-only guardrails (anti-pattern A5) catch a fraction of bad actions but miss the ones the model was trained or prompted to phrase acceptably. Logging without blocking (V14 alone) produces excellent post-incident forensics on damage that has already happened. A monitoring-only architecture (V2 Human-on-the-Loop) is correct for reversible routine actions but wrong for irreversible ones — by the time a human sees the alert, the email has been sent or the row has been deleted.
V1's unique contribution is that the agent cannot proceed. This is not a UX preference about how autonomous the agent feels. It is an architectural property tied to a specific class of actions: those whose blast radius exceeds what an after-the-fact correction can recover. Sending external communications, financial transactions, deleting data, modifying production systems, applying self-modifications to the agent's own principles or code — these are V1 territory by their reversibility profile, regardless of how reliable the agent has shown itself to be on adjacent tasks. The mapping is per-action, not per-agent: the same agent can be V1-gated on send_email and V2-monitored on draft_reply.
Applicability
Use V1 when:
- the action is irreversible — sending external communications, financial transactions, deleting data, modifying production systems, publishing public content;
- the action is novel — outside the agent's evaluated operating envelope (V16 Offline Eval coverage gap);
- the blast radius is high — error affects systems, users, or counterparties beyond the agent's own scope;
- a regulatory regime mandates human oversight (EU AI Act Article 14, sector-specific compliance);
- the action is self-modifying — required by H5 (Constitutional Self-Alignment) and H8 (Meta-Agent Self-Modification) with no exception;
- the agent itself has flagged uncertainty above a calibrated threshold.
Do not use V1 when:
- the action is reversible and routine — choose V2 Human-on-the-Loop, which monitors without blocking;
- latency would defeat the purpose — V2 with strong V14 logging and V17 monitoring covers low-blast-radius high-volume actions;
- the action is fully deterministic and policy-checked — V7 AgentSpec / Declarative Governance with PROHIBIT rules can enforce the constraint without human in the loop;
- the action is internal to the agent's reasoning — checkpointing every thought is theatre. Gate at the external action boundary, not at every reasoning step.
Decision Criteria
V1 is right when an autonomous error in this specific action type would cost more than the delay of waiting for a human verdict.
1. Reversibility test. Classify the action: can its effect be undone within the same session by another tool call? If NO, V1. If YES and the undo is cheap, V2 is acceptable. Threshold: an action whose reversal requires another party's cooperation (sending email, posting to public channels, executing a trade) is not reversible by the agent and is V1 territory.
2. Blast-radius test. Score the maximum harm of a wrong action on a 1–5 scale: (1) ephemeral session-internal, (2) wastes tokens or compute, (3) affects this user's local state, (4) affects external systems or counterparties, (5) regulatory, financial, or reputational damage. Score $\geq$ 4 $\to$ V1. Score $\leq$ 2 $\to$ V2 or V7 alone. Score 3 $\to$ V2 with V14 + V17.
3. Novelty test. Is the action covered by the V16 offline eval suite and within the V17 online quality envelope? If the action is outside the evaluated envelope, V1 is required regardless of reversibility — there is no calibration to trust. Threshold: if the action's parameters were not represented in the most recent eval pass, treat as novel.
4. Coverage by V7. Is there a deterministic policy rule that already governs this action via V7 AgentSpec? If V7 PROHIBIT covers it, V1 is not needed — the policy engine blocks unconditionally, because deterministic rule evaluation has no sampling variance (mechanism 7). If V7 PERMIT covers it but the human still wants discretion, V1 sits between PERMIT and execution.
5. Latency budget. What is the acceptable wait time for human verdict (seconds, minutes, hours)? If the budget is too tight for any human to respond, the action either needs to be V2-monitored with a hard V9 bound, or should not be automated at all — the question being asked is not whether to use V1 but whether to use an agent.
Quick test — V1 is the right pattern when:
- the action is irreversible (cannot be undone autonomously by the agent), and
- the blast radius is $\geq$ 4 or the action is novel (outside V16/V17 envelope), and
- no V7 deterministic rule already blocks the action, and
- the latency budget tolerates a human response.
If the action is reversible and routine, choose V2 Human-on-the-Loop. If the action is fully specifiable as a hard rule, choose V7 AgentSpec (deterministic, no human required). If the latency budget cannot tolerate any wait, reconsider whether the action should be automated at all — never silently downgrade V1 to V2 to avoid the wait. (This downgrade is the anti-pattern: see CRITICAL 2 in Appendix A.)
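The five tests can be written down as one routing function. This is a minimal Python sketch that assumes each action has already been profiled, either by hand or by the Gate. `ActionProfile`, `route`, and the 30-second `MIN_HUMAN_RESPONSE_S` floor are hypothetical names and values, not taken from any library. The sketch reads tests 1–3 conservatively: any one of them firing is enough to require V1, so it gates somewhat more than the quick test alone would.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    V1 = "V1"                  # block for an explicit human verdict
    V2 = "V2"                  # monitor without blocking (with V14 + V17)
    V7 = "V7"                  # a deterministic PROHIBIT rule already blocks it
    RECONSIDER = "RECONSIDER"  # no human can answer in time: question the automation


@dataclass(frozen=True)
class ActionProfile:
    name: str
    agent_reversible: bool   # test 1: cheaply undone by another tool call this session
    blast_radius: int        # test 2: score on the 1-5 rubric
    in_eval_envelope: bool   # test 3: represented in the latest V16/V17 eval pass
    v7_prohibited: bool      # test 4: covered by a V7 PROHIBIT rule
    latency_budget_s: float  # test 5: how long a verdict may take, in seconds


# Hypothetical floor: the fastest any reviewer could realistically respond.
MIN_HUMAN_RESPONSE_S = 30.0


def route(a: ActionProfile) -> Route:
    if not 1 <= a.blast_radius <= 5:
        raise ValueError("blast_radius must be scored on the 1-5 scale")
    if a.v7_prohibited:
        return Route.V7  # the policy engine blocks unconditionally; no human needed
    needs_v1 = (
        not a.agent_reversible        # test 1: irreversible by the agent
        or a.blast_radius >= 4        # test 2: external or regulatory harm
        or not a.in_eval_envelope     # test 3: novel, so no calibration to trust
    )
    if not needs_v1:
        return Route.V2               # reversible, score 1-3, inside the envelope
    if a.latency_budget_s < MIN_HUMAN_RESPONSE_S:
        return Route.RECONSIDER       # never silently downgrade V1 to V2
    return Route.V1
```

Under these rules `send_email` (irreversible, blast radius 4) routes to V1, and `draft_reply` (reversible, blast radius 2, inside the envelope) routes to V2. This is the per-action split described in Motivation. A tight latency budget returns `RECONSIDER` instead of falling through to V2.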
Structure
```text
Agent → planned action a
            │
            ▼
       [ Gate(a) ]  ← decides: V1, V2, or pass-through
            │
        gate = V1
            ▼
       [ Surface ]  ← human-readable plan + rationale + expected outcome
            │
            ▼
     [ Block & Wait ]  ← state checkpointed (V10); execution paused
            │
      human verdict
            │
   ┌────────┼────────┬─────────────┐
   ▼        ▼        ▼             ▼
APPROVE  REJECT   MODIFY       ESCALATE
   │    (+reason) (edits a)        │
   │        │        │             ▼
   │        ▼        ▼     higher authority
   │     re-plan  Gate(a')       gate
   ▼
execute a
   │
   ▼
(V14 logs verdict, prompt, plan, outcome)

MODIFY  → revised a' re-enters Gate(a'); it is never executed directly
Timeout → safe default = ABORT (never proceed)
```
Participants
| Participant | Owns | Input $\to$ Output | Must not |
|---|---|---|---|
| Checkpoint Gate | the decision whether this action needs V1 | planned action + context $\to$ V1 / V2 / pass-through | use model confidence as the sole signal — gate by action class (reversibility, blast radius, novelty), or it will rubber-stamp confident wrong actions. |
| Plan Surfacer | producing a human-readable representation of the planned action | tool-call payload + rationale $\to$ review artefact (action, why, expected outcome, alternatives) | surface raw JSON or opaque tool arguments — an unreviewable plan is V1 theatre. |
| Blocker | halting agent execution at the checkpoint | gate verdict (V1) $\to$ paused state via V10 | proceed on timeout — the safe default is always ABORT. |
| Human Reviewer | the verdict | review artefact $\to$ {APPROVE, REJECT+reason, MODIFY+edits, ESCALATE} | be presented with so many checkpoints they stop reading. The Gate's calibration is the Reviewer's protection. |
| Modification Channel | structured edits to the plan | reviewer edits $\to$ revised action a' | allow free-text edits that re-enter the agent unchecked — modifications must re-enter the same gate. |
| Escalation Router | routing to higher authority when first reviewer cannot decide | review artefact + escalation reason $\to$ next-level reviewer | be a dead-end — every escalation must terminate in an explicit verdict or a documented abort. |
| Audit Recorder | logging the verdict, prompt, plan, and outcome (delegated to V14) | every checkpoint event $\to$ immutable trace | omit the reason on REJECT — the reason is the training data for future gate calibration. |
Seven narrow responsibilities. The pattern's correctness lives in the Gate (right things get gated), the Surfacer (the human can actually review), and the Blocker (no execution without verdict). The Audit Recorder is the feedback channel that lets the Gate improve over time.
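The Human Reviewer's four verdicts and the Blocker's timeout can be modelled as a small typed record that cannot be built in an invalid state. That rules out a REJECT without a reason, a MODIFY without a revised action, and an ESCALATE without a target. This sketch assumes verdicts arrive as structured data rather than free text. `Kind` and `Verdict` are hypothetical names, and recording TIMEOUT as its own kind is a choice made here to match the skeleton later in this section.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Kind(Enum):
    APPROVE = "APPROVE"
    REJECT = "REJECT"
    MODIFY = "MODIFY"
    ESCALATE = "ESCALATE"
    TIMEOUT = "TIMEOUT"   # produced by the Blocker, never by a reviewer


@dataclass(frozen=True)
class Verdict:
    kind: Kind
    reason: Optional[str] = None             # mandatory on REJECT: Gate-calibration data
    revised_action: Optional[dict] = None    # mandatory on MODIFY: the edited plan a'
    escalation_target: Optional[str] = None  # mandatory on ESCALATE: next-level reviewer

    def __post_init__(self) -> None:
        # Reject malformed verdicts at construction time instead of downstream.
        if self.kind is Kind.REJECT and not (self.reason and self.reason.strip()):
            raise ValueError("REJECT must carry a reason")
        if self.kind is Kind.MODIFY and self.revised_action is None:
            raise ValueError("MODIFY must carry the revised action a'")
        if self.kind is Kind.ESCALATE and not self.escalation_target:
            raise ValueError("ESCALATE must name the next-level reviewer")
```

When validation sits in the type, the Audit Recorder's "never omit the reason on REJECT" rule holds for every verdict object that exists, not only for the ones a UI happened to check.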
Collaborations
The Agent generates a planned action and submits it to the Checkpoint Gate. The Gate classifies the action by its V1 / V2 / pass-through profile (reversibility, blast radius, novelty, V7 coverage). If V1 fires, the Plan Surfacer composes a human-readable artefact — what the action is, why the agent chose it, what outcome is expected, and what reversal looks like if applied wrongly — and the Blocker checkpoints the agent's state via V10 and halts execution. The Human Reviewer responds with one of four verdicts. APPROVE releases the original action to execution. REJECT returns the agent to re-plan, carrying the reviewer's reason as a constraint. MODIFY routes through the Modification Channel: the edited plan re-enters the Gate (it is not allowed to bypass it) and the new action is then surfaced for confirmation if its class has changed. ESCALATE routes the artefact to higher authority through the Escalation Router. On every verdict, the Audit Recorder writes the prompt, plan, verdict, reason, and downstream outcome to the V14 trace. On timeout — no verdict within the budget — the Blocker's safe default is ABORT and a V14 timeout event.
Consequences
Benefits
- Prevents catastrophic autonomous errors on the action classes where they would be most costly.
- Builds operator and user trust by making irreversibility explicit rather than implicit.
- Generates a high-quality calibration signal — every REJECT carries a reason that can refine the Gate and future agent training.
- Satisfies hard regulatory requirements (EU AI Act Article 14) for human oversight on high-risk actions.
- Provides a clean human escape hatch when the agent encounters an action outside its evaluated envelope.
Costs
- Adds latency on every gated action — typically seconds to minutes for routine review, longer for escalation.
- Requires a Surfacer good enough to make the plan reviewable in seconds, not a JSON dump.
- Operational cost of a human reviewer in the loop; bottleneck when checkpoint volume is high.
- Checkpointing infrastructure (V10) and audit logging (V14) are prerequisites — V1 without them loses work on every pause.
Risks and failure modes
- Automation bias — under time pressure, reviewers rubber-stamp every plan. Mitigation: track APPROVE-without-modification rate; if > 95%, the Gate is over-firing or the Surfacer is unreviewable.
- Checkpoint theatre — too many gates dull human attention until the one that mattered slides through. Mitigation: calibrate the Gate ruthlessly; demote any action class with repeated unmodified approvals to V2.
- Too few checkpoints — only the visible decisions are gated; the agent quietly executes the unrecorded ones. Mitigation: gate by action class (reversibility, blast radius), not by visibility.
- Silent V2 downgrade — teams under latency pressure relabel V1 actions as V2 to remove the block. This is the CRITICAL 2 anti-pattern (Appendix A). Mitigation: the V1/V2 boundary should require explicit governance review, not a runtime config flag.
- Timeout-to-proceed — defaulting to "proceed on no response" inverts the pattern. The safe default is always ABORT.
- Unsurfaceable plan — actions whose effect cannot be summarised for a human reviewer should be redesigned or refused, not waved through.
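Both the timeout-to-proceed risk and the malformed-reply case can be closed off in the Blocker itself. This stdlib-only sketch assumes reviewer responses arrive on a `queue.Queue`, which is a hypothetical transport. A real deployment would sit behind Slack, email, or a web UI. Silence and unrecognised replies both end in ABORT, and no code path turns "no response" into approval.

```python
import queue
import threading

VALID_VERDICTS = {"APPROVE", "REJECT", "MODIFY", "ESCALATE"}
ABORT = "ABORT"


def wait_for_human(responses: "queue.Queue[str]", timeout_s: float) -> str:
    """Block until a reviewer answers, or fail closed to ABORT."""
    try:
        answer = responses.get(timeout=timeout_s)  # blocks for at most timeout_s
    except queue.Empty:
        return ABORT                                # timeout: never proceed
    if answer not in VALID_VERDICTS:
        return ABORT                                # malformed reply: also never proceed
    return answer


if __name__ == "__main__":
    inbox: "queue.Queue[str]" = queue.Queue()
    # Simulate a reviewer who approves after 0.1 s.
    threading.Timer(0.1, inbox.put, args=("APPROVE",)).start()
    print(wait_for_human(inbox, timeout_s=2.0))     # APPROVE
    print(wait_for_human(inbox, timeout_s=0.1))     # ABORT: nobody answered
```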
Implementation Notes
- Gate by action class, not by model confidence. A confident-but-wrong action is exactly the class V1 exists to catch. The reversibility/blast-radius/novelty triple is the right gate input.
- The Surfacer is half the pattern. Plans must be reviewable in under 30 seconds: action, why, expected outcome, what reversal looks like. Raw tool-call JSON is not a review artefact.
- REJECT must carry a reason. A reason-less rejection trains nothing. Make the reason field mandatory and surface aggregated rejection reasons as a Gate-calibration signal.
- MODIFY must re-enter the Gate. Reviewer edits can change the action's class (a small modification can move it from V1 to V2 or vice versa). Never let a modification bypass the gate.
- Timeout defaults to ABORT, always. If the human cannot respond in time, the system does not proceed. If the latency budget is too tight for any human, the action is the wrong fit for V1 — choose V2 with V9 hard bounds, or refuse the automation.
- Pair with V10 (Checkpointing) and V14 (Trajectory Logging). Both are prerequisites, not optional add-ons. V10 saves state so the block doesn't lose work; V14 logs the verdict so the calibration loop closes.
- Track approval-rate-without-modification. > 95% means automation bias or Gate over-firing. < 50% means the agent's planning quality is the real problem and V1 is masking it.
- Demote and promote between V1 and V2 deliberately. When an action class accumulates a long approval history with no modifications, governance review can demote it to V2 with stricter V17 monitoring. When V2 monitoring catches near-misses, promote back to V1. The mapping is reviewed, not set-and-forget.
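The approval-rate heuristics in these notes (above 95% and below 50%) can be computed per action class from the V14 trace. This sketch assumes the trace can be read back as `(action_class, verdict_kind)` pairs, and the function names are hypothetical. The denominator counts only APPROVE, MODIFY, and REJECT, leaving out TIMEOUT and ESCALATE. That is a choice made for this sketch, not something the pattern fixes. The output is a flag for governance review and never an automatic demotion.

```python
from collections import defaultdict
from typing import Iterable, Optional, Tuple

# Only verdicts where a human actually judged the plan count toward the rate.
DECIDED = {"APPROVE", "MODIFY", "REJECT"}


def approval_rate(kinds: Iterable[str]) -> Optional[float]:
    """Share of decided verdicts that were APPROVE with no modification."""
    decided = [k for k in kinds if k in DECIDED]
    if not decided:
        return None
    return decided.count("APPROVE") / len(decided)


def diagnose(rate: Optional[float]) -> str:
    if rate is None:
        return "INSUFFICIENT_DATA"
    if rate > 0.95:
        return "FLAG_OVER_FIRING"   # automation bias or Gate over-firing
    if rate < 0.50:
        return "FLAG_PLANNING"      # V1 is masking poor agent planning
    return "HEALTHY"


def diagnose_by_class(log: Iterable[Tuple[str, str]]) -> dict:
    """Group (action_class, verdict_kind) pairs and flag each class for review."""
    by_class = defaultdict(list)
    for action_class, kind in log:
        by_class[action_class].append(kind)
    # Flags feed governance review; demotion to V2 is a human decision.
    return {c: diagnose(approval_rate(ks)) for c, ks in by_class.items()}
```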
Implementation Sketch
LLM = configured session (model + setup + per-call prompt); code = wiring.
Composition: V1 wraps any agent action that the Checkpoint Gate classifies as V1-required. It composes with V10 Checkpointing (state save before block), V14 Trajectory Logging (verdict audit), V7 AgentSpec (deterministic gate input), and V9 Bounded Execution (timeout cap). The Surfacer is a Signal-layer artefact (S6 Output Template, S5 Constraint Framing for what must be included). Required by H5 Constitutional Self-Alignment and H8 Meta-Agent Self-Modification for every principle / parameter change.
The chain:
| # | Step | Kind | Draws on |
|---|---|---|---|
| 1 | Agent plans next action | LLM | Agent session (outside V1) |
| 2 | Gate classifies the action: V1 / V2 / pass-through | LLM (or rule) | Gate session; V7 |
| 3 | Branch — if pass-through or V2, exit V1; else continue | code | |
| 4 | Surfacer composes the human-readable review artefact | LLM | Surfacer session; S6 |
| 5 | Checkpoint state (V10) and block execution | code | V10 |
| 6 | Present artefact to human; wait for verdict (bounded by timeout) | code | V9 |
| 7 | Branch on verdict — APPROVE / REJECT / MODIFY / ESCALATE / TIMEOUT | code | |
| 8 | On MODIFY: revised action re-enters at step 2 | code | |
| 9 | Record verdict, prompt, plan, outcome | code | V14 |
Skeleton — wiring only:
```python
def hitl_checkpoint(agent_state, planned_action):
    gate = Gate(planned_action, context=agent_state)          # LLM (or rule): V1 / V2 / PASS
    if gate.cls != "V1":
        return execute_or_monitor(planned_action, gate)       # exits to V2 or pass-through
    artefact = Surfacer(planned_action, agent_state)          # LLM: review artefact
    checkpoint_id = V10_save(agent_state)                     # code: checkpoint before block
    verdict = wait_for_human(                                 # code: bounded wait (V9)
        artefact,
        timeout=budget,
        on_timeout="TIMEOUT",                                 # a timeout never proceeds
    )
    V14_log(checkpoint_id, planned_action, artefact, verdict) # code: audit
    match verdict.kind:
        case "APPROVE":
            return execute(planned_action)
        case "REJECT":
            return return_to_agent(reason=verdict.reason)
        case "MODIFY":
            return hitl_checkpoint(agent_state, verdict.revised_action)  # re-enter gate
        case "ESCALATE":
            return route_to(verdict.escalation_target)
        case _:                                               # TIMEOUT or malformed verdict
            return abort_with_log()                           # safe default: ABORT
```
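To make the wiring concrete, here is a small runnable Python version with every collaborator stubbed out. It uses a rule-based Gate over a hypothetical tool list, a one-line Surfacer, an in-memory dict standing in for V10, a list standing in for the V14 trace, and a `reviewer` callback standing in for the human. All of these names are illustrative. The sketch adds one assumption that the skeleton does not make: a bound on MODIFY rounds, so that a reviewer and the Gate cannot loop forever.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass(frozen=True)
class Action:
    tool: str
    args: tuple = ()


@dataclass(frozen=True)
class Verdict:
    kind: str                          # APPROVE / REJECT / MODIFY / ESCALATE
    reason: str = ""
    revised: Optional[Action] = None
    target: str = ""


# Hypothetical enumerable action set, so the Gate can be a rule rather than an LLM.
V1_TOOLS = {"send_email", "delete_rows", "deploy_prod"}


def gate(action: Action) -> str:
    return "V1" if action.tool in V1_TOOLS else "V2"


def surface(action: Action) -> str:
    # Stand-in for the Surfacer session; a real artefact carries all template fields.
    return f"Action: {action.tool} {action.args!r}\nReversal: needs counterparty cooperation"


@dataclass
class Runtime:
    reviewer: Callable[[str], Optional[Verdict]]   # returns None when the wait times out
    checkpoints: dict = field(default_factory=dict)
    log: list = field(default_factory=list)
    executed: list = field(default_factory=list)


def hitl_checkpoint(rt: Runtime, state: dict, action: Action, rounds: int = 3) -> str:
    if rounds == 0:                                # hypothetical bound on MODIFY loops
        rt.log.append(("ABORT", action, "too many modification rounds"))
        return "ABORTED"
    if gate(action) != "V1":
        rt.executed.append(action)                 # V2 path: execute, monitored elsewhere
        return "EXECUTED_V2"
    artefact = surface(action)
    checkpoint_id = len(rt.checkpoints)
    rt.checkpoints[checkpoint_id] = dict(state)    # V10: save state before blocking
    verdict = rt.reviewer(artefact)
    kind = verdict.kind if verdict else "TIMEOUT"
    rt.log.append((checkpoint_id, action, artefact, kind))   # V14 audit record
    if kind == "APPROVE":
        rt.executed.append(action)
        return "EXECUTED_V1"
    if kind == "REJECT":
        return f"REPLAN: {verdict.reason}"
    if kind == "MODIFY" and verdict.revised is not None:
        return hitl_checkpoint(rt, state, verdict.revised, rounds - 1)  # re-enter the Gate
    if kind == "ESCALATE":
        return f"ESCALATED: {verdict.target}"
    return "ABORTED"                               # TIMEOUT or malformed: never proceed
```

The MODIFY path shows why re-entry matters. Suppose a reviewer turns `send_email` into `save_draft`. The edit changes the action's class, and it is the Gate, not the reviewer's edit, that decides the revised action now takes the V2 path.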
The LLM sessions:
| Session | Model | Setup — loaded once, before first call | Per-call prompt wraps |
|---|---|---|---|
| Gate | small fast generalist, or a deterministic rule engine when the action set is enumerable | role ("you classify whether a planned agent action requires blocking human review"); the reversibility / blast-radius / novelty rubric; the V7 PROHIBIT list to cross-check; output contract (one of V1, V2, PASS, with a one-sentence reason) | the planned action and the relevant context |
| Surfacer | capable generalist — review quality caps the value of the whole pattern | role ("you produce a human-readable review artefact for a planned agent action"); the output template (S6) — fields: action, why, expected outcome, what reversal looks like, alternatives considered; constraints (S5) — no raw JSON; $\leq$ 200 words; never omit the reversal section | the planned action, the rationale trace from the agent, and the relevant context |
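A deterministic post-check on the Surfacer's output can enforce the S5 constraints in the table before anything reaches a reviewer. This sketch assumes the artefact arrives as a dict keyed by the S6 template's field names. `check_artefact` and the JSON heuristic are hypothetical. An artefact that fails the check should be regenerated or refused, not shown.

```python
import json

# Field names follow the S6 template listed in the Surfacer row above.
REQUIRED_FIELDS = (
    "action",
    "why",
    "expected outcome",
    "what reversal looks like",
    "alternatives considered",
)
MAX_WORDS = 200


def looks_like_json(value: str) -> bool:
    """True if a field is a parseable JSON object or array, i.e. a raw payload dump."""
    s = value.strip()
    if not s or s[0] not in "{[":
        return False
    try:
        json.loads(s)
        return True
    except json.JSONDecodeError:
        return False


def check_artefact(fields: dict) -> list:
    """Return a list of constraint violations; an empty list means reviewable."""
    problems = []
    for name in REQUIRED_FIELDS:  # covers "never omit the reversal section"
        if not fields.get(name, "").strip():
            problems.append(f"missing field: {name}")
    if len(" ".join(fields.values()).split()) > MAX_WORDS:
        problems.append("over 200 words")
    for name, value in fields.items():
        if looks_like_json(value):
            problems.append(f"raw JSON in field: {name}")
    return problems
```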
Specialist-model note. No fine-tuned specialist is required, but two structural choices matter more than the choice of model. First, the Gate must be deterministic wherever it can be. An LLM Gate is subject to the same stochastic sampling failure as the agent it gates, and the Gate's failure mode is the pattern's failure mode. So when the action set is small and enumerable, a rule engine such as V7 AgentSpec is strictly better than an LLM Gate (mechanism 7). Second, the Surfacer benefits from the strongest available model: reviewability is the bottleneck, and its cost is paid once per checkpoint, not once per turn. For agents handling regulated actions (EU AI Act Article 14 high-risk), pair the Gate with V7 AgentSpec rather than relying on the LLM Gate alone.
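When the Gate has to be an LLM, its output contract (one of V1, V2, PASS, with a one-sentence reason) can at least be parsed strictly. Anything off-contract then fails closed to V1, so a malformed classification costs an extra review instead of producing an unreviewed action. The `LABEL: reason` line format and the fail-closed default are assumptions of this sketch. They reduce the sampling risk described above but do not remove it, which is why enumerable action sets still belong in a rule engine.

```python
import re

# Assumed contract: exactly one line, "V1|V2|PASS" then ":" or "-" then a reason.
_GATE_LINE = re.compile(r"^\s*(V1|V2|PASS)\s*[:\-]\s*(\S.*)$")


def parse_gate_output(raw: str) -> tuple:
    """Return (label, reason); anything off-contract fails closed to V1."""
    lines = [ln for ln in raw.strip().splitlines() if ln.strip()]
    if len(lines) == 1:
        m = _GATE_LINE.match(lines[0])
        if m:
            return m.group(1), m.group(2).strip()
    return "V1", "unparseable Gate output; failing closed to human review"
```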
Open-Source Implementations
- LangGraph `interrupt()` — github.com/langchain-ai/langgraph — the most direct V1 implementation in the major frameworks. The `interrupt()` function pauses graph execution at any node, surfaces a payload to the caller, and resumes only when re-invoked with `Command(resume=...)`. State persistence is built in. See docs.langchain.com/oss/python/langgraph/interrupts.
- HumanLayer — github.com/humanlayer/humanlayer — purpose-built for V1: turn any function call into a human-approval gate via Slack, email, or web UI. Companion to the 12-Factor Agents methodology.
- 12-Factor Agents — github.com/humanlayer/12-factor-agents — Factor 6 (Launch / Pause / Resume) and Factor 7 (Contact Humans With Tool Calls) are the canonical statement of the V1 design.
- AutoGen `UserProxyAgent` — github.com/microsoft/autogen — `human_input_mode="ALWAYS"` makes a user-proxy agent block on every message; `"TERMINATE"` blocks on termination conditions; `"NEVER"` disables V1.
- CrewAI human input — github.com/crewAIInc/crewAI — task-level `human_input=True` flag pauses agent execution on task completion for human review before continuing.
Known Uses
- Claude Code — file edit and command execution gated by an explicit per-action approval (deny / allow once / allow always per session) — V1 with operator-controlled promotion to pass-through within a session.
- Cursor — agent-mode edits gated by an apply/reject step before changes touch the user's working tree.
- Devin — long-running autonomous coding agent surfaces blocking checkpoints when actions touch external systems or production environments.
- Enterprise procurement and treasury agents — financial-transaction agents almost universally route over a defined threshold to a human approver; below threshold, V2-monitored.
- Email and CRM outreach agents — outbound message agents that draft autonomously but block on `send` until a human confirms — the canonical V1 split where drafting is V2 and sending is V1.
- Production deployment bots — release agents that can plan and stage a deploy autonomously but require human approval to promote to production.
Related Patterns
- Distinct from V2 Human-on-the-Loop — V1 blocks, V2 monitors. The choice is per-action by reversibility / blast radius / novelty, not per-agent by operational preference. (CRITICAL 2 in Appendix A.)
- Requires V10 Checkpointing — the agent must save state to wait for the human verdict; V1 without V10 loses work on every pause.
- Pairs with V14 Trajectory Logging — every verdict, reason, and outcome belongs in the audit trace; V14 is the calibration channel for the Gate.
- Pairs with V9 Bounded Execution — the wait-for-human step needs a timeout bound; the safe default is ABORT, not proceed.
- Composes with V7 AgentSpec — deterministic prohibitions are enforced by V7 without human review; V1 sits in the discretionary zone between V7 PERMIT and execution.
- Required by H5 Constitutional Self-Alignment — every proposed principle change must be V1-gated; no exception. (CRITICAL 7 in Appendix A.)
- Required by H8 Meta-Agent Self-Modification — any significant behavioural modification proposed by an agent about itself must be V1-gated.
- Tension with H6 Continuous Inner Monologue — autonomous background thinking that produces actions must route those actions through V1; H6 should produce insights, not autonomous actions, unless explicitly scoped and gated.
- Triggered by V17 Online Eval — quality drift detected in production fires V1 escalation for at-risk action classes.
- Pairs with S6 Output Template + S5 Constraint Framing — the Surfacer's review artefact is a Signal-layer construct with hard structural requirements.
Sources
- 12-Factor Agents (Dex Horthy / HumanLayer, 2024–25) — Factor 6 (Launch / Pause / Resume) and Factor 7 (Contact Humans With Tool Calls).
- Anthropic — Building Effective Agents (2024–25): checkpoints before irreversible actions as standard agent design.
- LangGraph documentation — `interrupt()` and `Command`-based resume for V1 implementation; the closest framework match to the pattern shown above.
- Composio AI Agent Report (2025) — 88% production-failure analysis, autonomous-behaviour failure as primary cause.
- EU AI Act (Regulation 2024/1689) Article 14 — mandatory human oversight requirements for high-risk AI systems.
- NIST AI Risk Management Framework (AI RMF 1.0) — human oversight as a first-class risk control.
- ISO/IEC 42001:2023 — AI Management System standard, human oversight clauses.