Helix Agents Framework Concepts

This document is the canonical reference for Helix Agents framework concepts. It describes current behavior (post-A.2 / A.3); for the v6 → v7 migration delta, see ../upgrade-guides/v6-to-v7-stateless-suspension.md.

For agents: Cross-runtime work (HITL, sub-agents, state store semantics) usually requires fetching this file in addition to the top-level CLAUDE.md. The pointer table in CLAUDE.md lists this file under "framework concepts deep-dive."

Session — The primary unit of agent conversation state. A session contains all messages, custom state, and checkpoints for a conversation. Sessions are identified by sessionId, which is the primary key for all state operations. Multiple runs can occur within a single session (e.g., after interrupts or when continuing a conversation).

Run — A single execution within a session. Each time an agent executes (via execute() or resume()), a new run is created with a unique runId. Runs track execution metadata like turn number, step count, status (running/completed/failed/interrupted), timing, and startSequence (the stream position when the run started, used for filtering chunks in multi-run scenarios). Use getCurrentRun(sessionId) to get the active run and listRuns(sessionId) to see run history. As of v7, AgentResult.status may also be 'suspended_client_tool' | 'suspended_awaiting_children' | 'suspended_step_partial' for HITL agents that paused mid-run; exhaustive switch statements must handle these three additional cases. The result.suspended field carries the routing info (toolCallIds, children, stepId) needed to drive resume.
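The exhaustiveness requirement above can be sketched as follows. This is a minimal illustration, not the framework's own type: AgentResultStatusSketch is a hypothetical local union, and its non-suspended members are assumed from the run statuses listed above. The never check makes the compiler flag any status a switch forgets.

```typescript
// Hypothetical local mirror of the status union; the real type comes from the framework.
type AgentResultStatusSketch =
  | "completed"
  | "failed"
  | "interrupted"
  | "suspended_client_tool"
  | "suspended_awaiting_children"
  | "suspended_step_partial";

// Returns true when the run paused at a HITL boundary and needs a resume.
function isSuspended(status: AgentResultStatusSketch): boolean {
  switch (status) {
    case "completed":
    case "failed":
    case "interrupted":
      return false;
    case "suspended_client_tool":
    case "suspended_awaiting_children":
    case "suspended_step_partial":
      return true;
    default: {
      // Compile-time guard: adding a new status without a case fails type-checking here.
      const unreachable: never = status;
      throw new Error(`Unhandled status: ${String(unreachable)}`);
    }
  }
}
```

A caller would typically branch on isSuspended(result.status) and then read result.suspended for the routing info.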

Multi-turn continuation — executor.execute(agent, msg, { sessionId }) is portable for both fresh sessions and follow-up turns. Calling execute() again with the same sessionId after the prior run reached completed starts a new run/turn that sees the full prior conversation. A still-running session rejects with AgentAlreadyRunningError (single-writer mutex). The portable contract floor is completed: continuation from interrupted / failed / paused is not part of the cross-runtime guarantee — use resume() / retry() instead. runtime-js additionally accepts a broader prior-state set on execute(); that is a JS-specific superset, not a cross-runtime promise. Per-runtime mechanics (GL #74): runtime-js / runtime-dbos create a new run within the session; runtime-temporal reuses the sessionId-derived workflow id (Temporal's default reuse policy starts the new run once the prior closes) and reactivates the 'ended' stream on continuation; CF-DO inherits the JS executor's continuation path. The single missing piece on runtime-temporal was stream reactivation — turn 1's completion ends the sessionId-keyed stream, so execute() must call reactivateStream(streamId) before the new run writes (mirrors retry()). The HTTP path uses this via POST /chat → the consumer's chatHandler (ai-sdk handle-chat-stream "continuing session" path → execute(sessionId)); agent-server's /start (startAgent) is fresh-start-only by design (remote-sub-agent protocol). CFW Workflows multi-turn is not yet supported — it needs the same reactivation plus a per-turn instance id (write-once ids prevent reusing the base id) and getCurrentRun-based handle reconstruction. Tracked in GitLab #109.
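The portable contract floor described above can be summarized as a small decision function. This is a sketch under stated assumptions: PriorStatus and ExecuteDecision are hypothetical names, undefined stands for a fresh session, and the function models only the cross-runtime guarantee (not the broader runtime-js superset).

```typescript
// Hypothetical status of the most recent run in the session; undefined = fresh session.
type PriorStatus = "running" | "completed" | "interrupted" | "failed" | "paused";

type ExecuteDecision =
  | { kind: "start-run" }
  | { kind: "reject"; error: "AgentAlreadyRunningError" }
  | { kind: "not-portable"; use: "resume() or retry()" };

// Models what a portable caller may expect from execute() with a given prior state.
function portableExecuteDecision(prior: PriorStatus | undefined): ExecuteDecision {
  if (prior === undefined || prior === "completed") {
    // Fresh session, or follow-up turn after a completed run.
    return { kind: "start-run" };
  }
  if (prior === "running") {
    // Single-writer mutex: a still-running session rejects.
    return { kind: "reject", error: "AgentAlreadyRunningError" };
  }
  // interrupted / failed / paused are outside the cross-runtime guarantee.
  return { kind: "not-portable", use: "resume() or retry()" };
}
```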

Agent — Created with defineAgent(). Has a system prompt, tools, output schema (Zod), LLM config, and max steps. The outputSchema auto-injects a __finish__ tool for structured output.

Agent registry replace API. Runtimes that resolve agents BY NAME (Temporal, Cloudflare Workflows) provide an AgentRegistry.replace(config) method for swapping the registered reference; it is used most commonly in tests that need per-call hooks on the same agent type. JS and DBOS read the agent reference inline and don't need this API. See ../runtimes/temporal.md for the full API description.

Tool — Created with defineTool(). Has a name, description, Zod parameters schema, and execute function. Tools receive a ToolContext with getState(), updateState() (Immer), emit(), and abortSignal.

Tool Execution Order — When the LLM returns multiple tool calls in one response, they execute in two phases: (1) regular tools run in parallel, and their state changes become visible (in memory for the JS runtime, committed to the store for Temporal/Cloudflare); (2) finishWith tools run sequentially, seeing the updated state from phase 1. This ensures finishWith tools always see the complete state from all other tools in the batch. In the JS runtime, sub-agents execute alongside regular tools in phase 1. In Temporal/Cloudflare, sub-agents execute after phase 2 as child workflows/instances. Sub-agents always execute even when a finishWith tool succeeds; completion is deferred until the sub-agents finish. Companion tools have their own phase (existing behavior, unchanged). v7 wraps this same phase logic inside the new runStepIterator (the step-iterator emits per-phase StepOutcomes), but the observable execution order is unchanged.

Sub-Agents (Ephemeral) — Created with createSubAgentTool(). Parent agents can delegate to child agents. The child runs to completion and returns the result as a tool result. Child sessions are isolated, but stream events flow to the parent's stream. Each sub-agent gets its own sessionId. This is the default mode (mode: 'ephemeral' on SubSessionRef). Dispatch failures (failed createSession, child workflow start failure, DO instance creation error) surface to the LLM via tool_error chunk + synthetic tool result message + subagent.dispatch_failed ERROR log. This behavior is consistent across runtime-js, runtime-temporal, and runtime-cloudflare DO; parent execution continues, so siblings in the same dispatch batch are unaffected.

Persistent Sub-Agents — Configured via persistentAgents on AgentConfig. Unlike ephemeral sub-agents, persistent children can receive follow-up messages and maintain state across multiple interactions. Two modes: blocking (parent waits for child to complete) and non-blocking (parent continues immediately, gets completion notification later). Persistent children are managed through auto-injected companion tools, not createSubAgentTool(). Each persistent child gets a stable session ID: {parentSessionId}-agent-{name}. Children can be named explicitly or auto-named ({agentType}-{counter}). State tracking uses SubSessionRef with mode: 'persistent' and name fields. D1StateStore requires V4 migration for the mode and name columns on __agents_sub_session_refs.
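The naming rules above reduce to two string templates. The helpers below are a sketch only: the function names are hypothetical, and the starting value of the auto-name counter is not stated in this document, so the example passes it in explicitly rather than assuming a base.

```typescript
// Stable session id for a persistent child: {parentSessionId}-agent-{name}.
function persistentChildSessionId(parentSessionId: string, childName: string): string {
  return `${parentSessionId}-agent-${childName}`;
}

// Auto-generated child name: {agentType}-{counter}.
// The counter's starting value is an assumption left to the caller.
function autoChildName(agentType: string, counter: number): string {
  return `${agentType}-${counter}`;
}

// Example: an auto-named child of type "researcher" under parent session "sess-1".
const exampleChildSessionId = persistentChildSessionId("sess-1", autoChildName("researcher", 1));
```

Because the id is a pure function of the parent session id and the child name, re-consulting a child by name always lands on the same session.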

Companion Tools — Auto-injected into parent agents that have persistentAgents configured. Up to six tools prefixed with companion__: spawnAgent (create and start a child), sendMessage (interrupt/resume a running child with a new message), listChildren (list all persistent children), getChildStatus (check a specific child's status and output), terminateChild (kill a running child) — these five are always injected. The sixth tool, waitForResult (blocks until a child completes), is conditionally injected only when at least one persistent agent has mode: 'blocking' configured. Companion tools are handled separately from regular tools in the execution loop.
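The injection rule above (five tools always, waitForResult only when some persistent agent is blocking) can be sketched as a pure function. PersistentAgentSketch and the 'non-blocking' literal are illustrative assumptions, and the ordering of the returned names is arbitrary.

```typescript
// Hypothetical, minimal view of one persistentAgents entry.
interface PersistentAgentSketch {
  mode: "blocking" | "non-blocking"; // 'non-blocking' literal is an assumption
}

const ALWAYS_INJECTED = [
  "spawnAgent",
  "sendMessage",
  "listChildren",
  "getChildStatus",
  "terminateChild",
] as const;

// Returns the companion tool names a parent would receive for this config.
function companionToolNames(persistentAgents: readonly PersistentAgentSketch[]): string[] {
  if (persistentAgents.length === 0) {
    return []; // no persistentAgents configured: nothing is injected
  }
  const names: string[] = ALWAYS_INJECTED.map((tool) => `companion__${tool}`);
  if (persistentAgents.some((agent) => agent.mode === "blocking")) {
    names.push("companion__waitForResult");
  }
  return names;
}
```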

Remote Sub-Agents — Created with createRemoteSubAgentTool(). Parent agents can delegate to agents running on a separate HTTP service. Uses HttpRemoteAgentTransport for communication via HTTP + SSE. The remote service hosts agents using AgentServer from @helix-agents/agent-server. Remote sub-agents are first-class constructs across all runtimes (JS, Temporal, Cloudflare) with stream proxying, SubSessionRef tracking with remote metadata, deterministic session IDs, crash recovery via transport.getStatus(), and interrupt propagation via transport.interrupt(). Each runtime routes remote sub-agent calls through a dedicated execution path separate from regular tool calls. Client-executed tools (execute: 'client') are not supported inside remote sub-agents — the browser-submitted result has no route across the HTTP boundary. The remote /status response exposes an awaitingClientTool flag, and each runtime's remote-dispatch path fails fast with RemoteSubAgentClientToolUnsupportedError (failureReason: 'client-tool-unsupported') instead of hanging — core executeRemoteSubAgentDispatch (Temporal), executeSingleRemoteSubAgentCall (JS), runExecuteRemoteSubAgent (DBOS), and RuntimeSteps.executeRemoteSubAgentCall (Cloudflare DO). Tracked for future support in GitLab #107.

Persistent-companion continuation + the `finish` heal (cross-runtime)

A persistent companion declares an outputSchema, so it always completes via the auto-injected __finish__ tool. Re-consulting a completed companion (companion__sendMessage / companion__spawnAgent re-using its name) no longer throws or deletes the session — it continues on the preserved child session (CAS completed → active, heal the dangling __finish__ if needed, append the consult, run a new turn with a fresh per-turn maxSteps budget). failed / terminated children still re-spawn fresh. The continuation primitive necessarily diverges per runtime (different orchestration substrates), but the eager __finish__ heal, the legacy reopen heal, and the per-turn stepCount reset are uniform. (For the heal mechanics themselves, see ./step-processing.md §The __finish__ history invariant; for the full companion guide, the Sub-Agents guide → Re-consulting a persistent companion.)

Runtime	Continuation primitive (completed child)	Eager `__finish__` heal	Legacy reopen heal	Per-turn `stepCount → 0`
JS (`runtime-js`)	In-process `continuePersistentChild` — reopen the preserved session and run a new turn in the same run loop.	✅	✅	✅
CF Durable Object	Child DO `/start` on the preserved session (drives the JS executor's existing-session continuation path).	✅	✅	✅
CF Workflows	Fresh write-once instance `agent__<type>__<childSession>__continue__<stepCount>__<toolCallId>`; consult carried as the instance's `newMessages`, appended exactly once by the `!isResumable` continuation branch (no CAS in the companion step).	✅	✅	✅
Temporal (`runtime-temporal`)	Activity-side store reopen (CAS + append consult inside `executeCompanionToolCall`) + `wf.startChild` with a `__continue__<stepCount>` workflow id.	✅	✅	✅
DBOS (`runtime-dbos`)	`startPersistentContinuation` → deterministic `${childSessionId}-continue-${toolCallId}` restart workflow (`DBOS.startWorkflow` dedupes on the id); consult carried as `initialMessage`, appended exactly once by the workflow body's checkpointed append.	✅	✅	✅

Known divergence — Cloudflare pre-heal-upgrade gap. The eager heal runs at completion on every runtime going forward, and the legacy reopen heal covers pre-existing dangling-__finish__ sessions on JS / Temporal / DBOS. On Cloudflare (DO or Workflows), a child that completed under a prior (pre-heal) SDK release may send a malformed transcript on its first re-consult; complete such a child once under the new release before re-consulting it. This is the only documented divergence; all five runtimes converge after one post-upgrade completion. (Replay-idempotency on the durable runtimes — Temporal / DBOS — comes from the deterministic continuation ids plus, for DBOS, the metadata.dbosWorkflowId == ${childSessionId}-continue-${toolCall.id} marker short-circuit, which makes both spawn-continue and sendMessage-continue safe under workflow-body re-execution.)
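The deterministic continuation ids in the table above are what make re-execution safe: starting the same id twice is a no-op. The sketch below builds the DBOS and CF Workflows ids from the formats given in the table and uses a toy in-memory starter to show the dedupe effect. ToyWorkflowStarter is hypothetical and only stands in for an orchestrator that dedupes on workflow id.

```typescript
// DBOS: ${childSessionId}-continue-${toolCallId}
function dbosContinuationWorkflowId(childSessionId: string, toolCallId: string): string {
  return `${childSessionId}-continue-${toolCallId}`;
}

// CF Workflows: agent__<type>__<childSession>__continue__<stepCount>__<toolCallId>
function cfwContinuationInstanceId(
  agentType: string,
  childSessionId: string,
  stepCount: number,
  toolCallId: string,
): string {
  return `agent__${agentType}__${childSessionId}__continue__${stepCount}__${toolCallId}`;
}

// Toy stand-in for an orchestrator that dedupes starts on the workflow id.
class ToyWorkflowStarter {
  private readonly started = new Set<string>();

  start(workflowId: string): "started" | "deduped" {
    if (this.started.has(workflowId)) {
      return "deduped"; // a replayed start with the same id does nothing
    }
    this.started.add(workflowId);
    return "started";
  }
}
```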

Stateless Suspension Model (v7) — At every HTTP request boundary, the runtime is free to die. The state store is the only durable thing across requests. When an agent reaches a HITL boundary (client-executed tool, approval-gated tool, sub-agent wait), the runtime writes SessionState.suspensionContext and emits the relevant stream chunk, then returns from the run. There is no in-memory waiter, no setTimeout promise, no DO hibernation guard. Resumption is driven by executor.resume() reading durable suspension context — never by signaling an in-memory waiter. This unblocks long pauses (~80% wall-time reduction on multi-minute HITL waits) and makes deadline semantics deterministic (deadlines measure durable clock time, not in-memory promise lifetime). v7.0 ships this model on all 4 runtimes: JS, Cloudflare DO, Temporal, and Cloudflare Workflows. The runtime-dbos runtime added by main uses its own DBOS-native DBOS.recv/DBOS.send primitives for HITL suspension (Postgres-backed workflow replay) and is not part of the unified suspensionContext model — see the runtime-dbos package docs for its specifics.

Client-Executed Tools (v7) — Created with defineTool({ execute: 'client' }). When the LLM calls such a tool, the runtime writes a pending entry to the session's pendingClientToolCalls map (durable state), emits a tool_start stream chunk, and returns from the run with RunOutcome.kind = 'suspended_client_tool' (which surfaces as AgentResult.status = 'suspended_client_tool'). The runtime does not block in memory. Consumers call executor.submitToolResult({ kind: 'client-tool-result', toolCallId, result | error }) (or the 'approval-response' variant for approval-gated tools; both share the same SubmitToolResult union). The submission writes the result to durable state, and a fresh run is then started via executor.resume(); the submission itself does not auto-resume the loop. The canonical cross-runtime signal for "awaiting client submission" is pendingClientToolCalls map presence; session-level SessionStatus remains 'active'. Mixing execute: 'client' with finishWith: true is rejected at defineTool time, as is mixing execute: 'client' with requireApproval. The framework maintains SessionState.clientToolCallOwnership on root sessions to route submissions to the owning sub-agent; SessionState.rootSessionId points each sub-agent at its root. Submissions always go against the ROOT sessionId. Per-runtime status:

runtime-js: durable state writes + executor.resume() on submission. No in-memory promise map. Process restart is safe — pending entries are recovered from the state store on the next request that touches the session. The runLoop polls stateStore.checkInterruptFlag (atomic check-and-clear) at the top of each step iteration, so durable interrupts written by other processes are observed immediately. This brings JS to parity with CF DO and CFW Workflows on cross-process interrupt semantics.
runtime-cloudflare (DO path): durable state writes via DOStateStore. The hibernation guard is removed in v7; DOs are free to evict during HITL waits. Deadlines are enforced at request time via findExpiredPending (no alarm subscriber).
runtime-temporal: Workflow exits cleanly on every HITL boundary (mirrors CFW Workflows). Suspension state is durable in the session store; the workflow returns AgentWorkflowResult { status: 'suspended_*' } and Temporal releases the workflow. executor.resume(sessionId) starts a NEW workflow instance with workflow ID ${prefix}__${agentType}__${sessionId}__resume-${N} (single-dash suffix; WorkflowIdReusePolicy.ALLOW_DUPLICATE). The new workflow's mode='resume' branch calls the applyResultsAndReload activity, which drains submitted client-tool results into messages, fires onMessage + afterTool hooks, synthesizes timeouts for expired deadlines, and drains completed sub-agent children via recordSubSessionResult. submitToolResult is durable-only — no Temporal signal is sent (the workflow has already exited). Sub-agents are child workflows started via wf.startChild; on parent suspension, in-flight children are marked failed:'parent_suspended' (mitigation #3) and re-spawned via the __resume-N workflow ID convention on parent's resume (γ-cascade, spec §5.2). Approval-gated tools share the same suspension primitive (tool_approval_request chunk + 'suspended_client_tool' status); approve / deny submissions are routed via the same durable submitToolResult flow.
runtime-cloudflare (Workflow path): Workflow returns early from runAgentWorkflow on HITL boundaries with status: 'suspended_*' and durable suspension state via commitSuspendedStep activity. executor.resume() starts a new workflow instance with mode: 'resume' that drains submissions via applyResultsAndReload and continues. Sub-agents cascade up — child suspensions propagate to parent's suspended_awaiting_children. γ-cascade re-spawn on parent resume (FU-A2-40, mirrors Temporal FU-A2-09): when the parent suspends mid-sub-agent dispatch, commitSuspendedStep marks each suspendedAwaitingChildren entry's child session as failed:'parent_suspended'. On resume, applyResultsAndReload surfaces those children via childrenToRespawn; the workflow body re-dispatches each via workflowBinding.create({ id: 'agent__<type>__<id>__respawn-<attempt>' }), polls the child's durable state until terminal, and records the outcome via recordSubSessionResult. A drain-clear step then resets the parent's suspension discriminators when fully resolved. Eliminates v6's billable wall-time during HITL waits (~80% reduction on multi-minute approvals).
runtime-dbos: uses its own DBOS-native suspension primitive (DBOS.recv(toolCallId) over Postgres-backed workflow replay) rather than the unified suspensionContext model. Both client-executed tools (dispatchClientTool) and approval-gated tools (dispatchApprovalGatedTool) share the same pendingClientToolCalls map + DBOS.recv wake. For approval-gates, approve (submitToolResult({ kind: 'approval-response', approved: true })) runs the tool with the original input; deny (approved: false) records a synthetic tool_error result ('Tool call was not approved by the user') and skips execute(). Function-form requireApproval is evaluated inside a @DBOS.step (ApprovalGateStep.evaluateApprovalGatePredicateStep) so its boolean is checkpointed in workflow history and the suspend-vs-run decision is replay-deterministic even for non-pure predicates; fail-closed semantics (throw → suspend) live inside the step so the checkpointed value is always a boolean (parity with runtime-temporal's activity-wrapped predicate evaluation; GL #111 Batch C / gap 5). onAgentSuspended / onAgentResumed fire for both client-tool and approval-gate suspend/resume cycles (the recv-driven auto-continue path wakes and resumes). Per-call hooks (agent.hooks field on AgentConfig, plus ExecuteOptions.hooks / ExecuteOptions.hookManager) fire on every execute() / resume() / retry() via the process-local HookManagerRegistry indexed by DBOS.workflowID (GL #111 Batch D / gap 4): executeImpl / resumeImpl / retryImpl compute the merged manager BEFORE DBOS.startWorkflow (constructor → agent.hooks → options.hooks order, mirroring runtime-js/src/js-agent-executor.ts:3065-3108 buildHookManager) and register under the new workflowId; HookStep / ExecuteToolStep resolve the manager on every invocation and fall back to the constructor-bound static on cross-worker recovery (documented split-brain risk mirroring Temporal/CF replaceAgent semantics at runtime-cloudflare/src/registry.ts:182-193). 
All five GL #111 gaps (1–5) are closed.
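Request-time deadline enforcement, as mentioned for the DO path above, amounts to scanning the durable pending map against the current clock whenever a request touches the session. The following is a sketch only: findExpiredPendingSketch is not the framework's findExpiredPending, and the entry shape (in particular deadlineAtMs) is a hypothetical stand-in.

```typescript
// Hypothetical shape of one pending client-tool entry; real fields may differ.
interface PendingCallSketch {
  toolCallId: string;
  toolName: string;
  deadlineAtMs?: number; // absolute wall-clock deadline; absent means no deadline
}

// Returns the pending entries whose deadline has elapsed at nowMs.
function findExpiredPendingSketch(
  pending: ReadonlyMap<string, PendingCallSketch>,
  nowMs: number,
): PendingCallSketch[] {
  const expired: PendingCallSketch[] = [];
  pending.forEach((entry) => {
    if (entry.deadlineAtMs !== undefined && entry.deadlineAtMs <= nowMs) {
      expired.push(entry);
    }
  });
  return expired;
}
```

Because the check compares a stored timestamp to the request-time clock, it needs no timer or alarm to stay alive between requests.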

Known limitation — doubly-nested HITL (ephemeral sub-agent calling a client-executed tool) is not supported. Single-level HITL works on every runtime: a top-level agent's client-executed (or approval-gated) tool suspends the run with 'suspended_client_tool' and resumes via submitToolResult. The doubly-nested case — where an ephemeral child sub-agent itself calls an execute: 'client' tool and is expected to suspend the parent with a routable sub-session id — is not a supported path. Observed behavior is that the parent run returns 'completed' rather than suspending; the child's client-tool dispatch does not propagate a suspension up through the sub-agent boundary. Agents that need inner-agent human approval should either (a) hoist the client-executed tool to the top-level agent, or (b) use a persistent sub-agent (which has its own stable session id and can be driven through follow-up turns) rather than an ephemeral one. The cross-runtime matrix scenario for this case (cross-runtime-subagent-resume-matrix-suspended.integ.test.ts, S6) is pinned as a negative-invariant test (FU-MATRIX-DOUBLY-NESTED-HITL): it asserts the parent returns 'completed' (not 'suspended_client_tool') and that exactly one subagent_start/subagent_end pair is emitted — so if a future change starts supporting nested HITL, the assertion fails and forces a conscious revisit. Tracked in GitLab #73 — currently documented-as-unsupported rather than scheduled for implementation.

Hook firing on the timeout path is consistent across all 4 runtimes. When a client-tool deadline elapses, every runtime appends a synthetic tool_error message, emits a tool_end chunk, records usage with success: false, and fires onMessage + afterTool hooks with the timeout payload. CFW Workflows is the reference implementation; runtime-js (and CF DO via runtime-js) and runtime-temporal were brought to parity in commit 799aeea77.

Lifecycle hook firing order is canonical across all 5 runtimes (DBOS achieved full parity in FU-DBOS-ONMESSAGE-ONSTATECHANGE). For the regular (and approval-gated approve) tool execution path, every runtime fires user-facing AgentHooks in the canonical sequence:

beforeTool → execute → onStateChange → onMessage → afterTool

onStateChange reflects the immediate state mutation from execute; onMessage surfaces the result-as-message; afterTool is universal cleanup with the full result payload. Pre-2026-05-02 each of runtime-js, runtime-temporal, and CFW Workflows fired in a different order (runtime-js fired onMessage AFTER afterTool; runtime-temporal fired onMessage BEFORE onStateChange); sub-projects #2 + #3 unified the order so portable hook code can rely on cross-runtime sequencing. Per-runtime regression guards live in:

packages/runtime-js/src/__tests__/js-agent-executor-hooks.test.ts (regular path) and approve-path-hooks.test.ts (approve drain path)
packages/runtime-temporal/src/__tests__/v7-activities-hooks.test.ts (regular path) and v7-approve-path-hooks.test.ts (approve drain path)
packages/runtime-cloudflare/src/__tests__/approve-path-hooks-do.test.ts (DO approve drain path)
packages/e2e/src/__tests__/approval-gate-hook-parity.integ.test.ts (cross-backend)

Implementation note: all multi-tool runtimes (runtime-js, Cloudflare DO, CFW Workflows, runtime-dbos, runtime-temporal) execute a step's regular server tools in PARALLEL. Each tool seeds from the same committed / pre-step base, and the merge CONTRACT is identical everywhere — append ops compose (all survive regardless of arrival order), replace ops are last-write-wins. The MECHANISM by which writes are merged differs by runtime: runtime-js and Cloudflare DO (which inherits the JS executor) merge IN MEMORY between tools within a single runLoop step — no durable staging. The durable runtimes — runtime-temporal, CFW Workflows, and runtime-dbos — instead STAGE each tool's writes durably (stageChanges) and promote them atomically at the step boundary (saveStateAndPromoteStaging), because the orchestrator is a replay-deterministic sandbox that can't hold cross-activity in-memory state. The per-runtime mechanics for keeping hooks in canonical order under parallelism differ:

runtime-js dispatches phase-1 tools via Promise.all. To preserve LLM-input ordering of state.messages while maintaining the canonical hook order, runServerTool (in packages/runtime-js/src/run-loop.ts) defers afterTool firing back to the iterator's collection loop in packages/core/src/orchestration/step-iterator.ts via ExecuteServerToolResult.deferredAfterTool. The iterator pushes the message → fires onMessage → fires the deferred afterTool per result, in input order. Cloudflare DO inherits this path via the JS executor.
runtime-temporal dispatches one executeToolActivity per regular server tool via Promise.all from the workflow body; each activity STAGES its writes via stageChanges. Because the workflow sandbox can't run user hooks, a dedicated fireDeferredToolHooks activity fires onMessage / afterTool in index order after the Promise.all settles; promotion reuses the existing commitStep / commitSuspendedStep (saveStateAndPromoteStaging over the staged rows). CFW Workflows uses the same staged-then-promote shape with its own per-tool helpers. runtime-dbos runs each tool as an independent durable step and defers onMessage / afterTool to the workflow body (see the DBOS note below).
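The merge contract described above (append ops compose, replace ops are last-write-wins) can be illustrated independently of any runtime. This is a minimal sketch: WriteOp and mergeToolWrites are hypothetical names, and the real staging format is not shown in this document.

```typescript
// Hypothetical write operations produced by parallel tools in one step.
type WriteOp =
  | { kind: "append"; key: string; items: unknown[] }
  | { kind: "replace"; key: string; value: unknown };

type CustomStateSketch = Record<string, unknown>;

// Applies ops in arrival order on top of the shared pre-step base.
function mergeToolWrites(
  base: CustomStateSketch,
  opsInArrivalOrder: readonly WriteOp[],
): CustomStateSketch {
  const merged: CustomStateSketch = { ...base };
  for (const op of opsInArrivalOrder) {
    if (op.kind === "append") {
      // Appends compose: every appended item survives, whatever the order.
      const current = merged[op.key];
      const existing: unknown[] = Array.isArray(current) ? current : [];
      merged[op.key] = [...existing, ...op.items];
    } else {
      // Replaces are last-write-wins: a later arrival overwrites an earlier one.
      merged[op.key] = op.value;
    }
  }
  return merged;
}
```

Reversing the arrival order changes which replace wins and the order of appended items, but never drops an appended item.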

DBOS hook firing order — full parity (FU-DBOS-ONMESSAGE-ONSTATECHANGE closed). DBOS fires the canonical hook sequence (beforeTool → execute → onStateChange → onMessage → afterTool) with the same post-Promise.all timing as JS / Temporal / CF. The post-2026-05 implementation:

beforeTool + onStateChange fire inside runExecuteTool (the @DBOS.step body), so they are checkpointed by the step boundary.
onMessage + afterTool are deferred to the workflow body (shared.ts) so they land AFTER the phase-1 Promise.all settles AND after each tool-result message has been appended to the conversation log. ToolStepResult.durationMs is the deferred-afterTool signal — present on results from runExecuteTool (server-tool / approve-resume), undefined on client-tool / approval-deny results (their afterTool is fired inline by dispatchClientTool / not fired at all on deny — cross-runtime contract).

The previous "timing caveat" divergence note has been resolved by this work. Tracing pipelines that compare DBOS to JS / Temporal / CF observe identical sequencing now.

Hook firing parity table:

Runtime	beforeTool	onStateChange	onMessage	afterTool	Promise.all timing
runtime-js	✓	✓	✓	✓ deferred	After Promise.all
runtime-temporal	✓	✓	✓	✓ deferred	After Promise.all
CFW DO (via `runtime-js`)	✓	✓	✓	✓ deferred	After Promise.all
CFW Workflows	✓	✓	✓	✓ inline	After Promise.all
runtime-dbos	✓	✓	✓	✓ deferred	After Promise.all

onAgentSuspended / onAgentResumed:

Runtime	client_tool reason	awaiting_children reason	Notes
runtime-js	✓	✓	Reference impl
runtime-temporal	✓	✓	Matches reference
CFW DO	✓	✓	Matches reference
CFW Workflows	✓	✓	Matches reference
runtime-dbos	✓	✓ (post v7.0-final)	Earlier v7 versions skipped `awaiting_children` due to a guard bug; fixed per second-round review P1.1

Resume runId semantics (cross-runtime)

When a session is resumed, the onAgentResumed hook fires with { runId, previousRunId, sessionId, resumedFromCheckpointId }. The relationship between runId and previousRunId differs by runtime:

JS / Temporal / Cloudflare / DBOS Branch 2 (fresh resume workflow): runId !== previousRunId. Each resume() call allocates a new run record via beginRun and starts a fresh workflow (or run loop) under the new runId.
DBOS Branch 1 (recv-driven in-place continuation): runId === previousRunId. The cross-runtime client-tool submit path detects an existing PENDING workflow blocked on DBOS.recv, returns a ResumedHandle wrapping the SAME workflow, and the workflow body wakes inside the existing dispatchClientTool call. No new workflow, no new beginRun, no new runId. The self-loop is the explicit signal that this is a recv-driven auto-continue, not a fresh resume (fire site: packages/runtime-dbos/src/workflows/shared.ts:1579-1601, where previousRunId: runId is set).

previousRunId value contract (uniform across all four runtimes): regardless of whether runId is fresh (JS / Temporal / CF / DBOS Branch 2) or a self-loop (DBOS Branch 1), previousRunId is always populated with the SUSPENDED run's runId. This holds on every runtime:

JS — js-agent-executor.ts resume() sets previousRunId: previousRun?.runId (the suspended run, captured via getCurrentRun at the top of resume() before the new run is created). It must NOT read state.runId at the fire site — loadAgentState(sessionId, runId) overwrites that to the fresh resume runId. (resumeLoop already used the correct previousRun?.runId; the public resume() path was repaired in the greptile-P2 fix-forward.)
Temporal — executor.resume() captures the suspended runId via getCurrentRun BEFORE allocating the resumed runId, threads it through AgentWorkflowInput.previousRunId → applyResultsAndReload → the onAgentResumed fire site. (Before the greptile-P2 fix-forward it fired previousRunId: undefined — repaired so the contract holds.)
Cloudflare — the resume workflow sets previousRunId: existingState.runId (the suspended run, from loadAgentState → getCurrentRun). (Before the fix-forward it mistakenly passed existingState.checkpointId — a checkpointId, not a runId — repaired.)
DBOS — Branch 2 threads previousRunId through the resume workflow input; Branch 1's recv-wake self-loop sets previousRunId: runId (which equals the suspended runId because resume continues in-place).

Hook consumer contract: Use previousRunId (always populated with the suspended runId on resume) as the linkage signal for span stitching, audit linkage, and tracing parent identification. Do NOT rely on runId !== previousRunId for resume detection — that fails on DBOS Branch 1. tracing-langfuse does this correctly today: packages/tracing-langfuse/src/langfuse-hooks.ts:521-522 uses previousRunId only as metadata, and parent-span linkage uses tracingContext.lastActiveSpanId / rootSpanId, not runId equality.
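A hook consumer that follows this contract might look like the sketch below. ResumedHookContextSketch mirrors the payload fields listed earlier in this section (optionality is an assumption), and SpanLink is a hypothetical output shape. The point is that linkage keys off previousRunId, while runId equality is recorded only as metadata.

```typescript
// Mirrors the onAgentResumed payload fields named in this section.
interface ResumedHookContextSketch {
  runId: string;
  previousRunId: string;
  sessionId: string;
  resumedFromCheckpointId?: string;
}

// Hypothetical linkage record for span stitching or audit trails.
interface SpanLink {
  sessionId: string;
  fromRunId: string; // always the suspended run
  toRunId: string;
  inPlaceContinuation: boolean; // true for a DBOS Branch 1 self-loop
}

function linkResumedRun(ctx: ResumedHookContextSketch): SpanLink {
  return {
    sessionId: ctx.sessionId,
    // Linkage uses previousRunId, which is populated on every runtime.
    fromRunId: ctx.previousRunId,
    toRunId: ctx.runId,
    // Informational only: do not use this to decide whether a resume happened.
    inPlaceContinuation: ctx.runId === ctx.previousRunId,
  };
}
```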

The cross-runtime e2e parity test (packages/e2e/src/__tests__/lifecycle-hooks-parity.integ.test.ts Scenario 1) enforces both contracts: (1) an unconditional resumedCtx.previousRunId === suspendedCtx.runId assertion that holds on every runtime (the previousRunId value contract above), and (2) a runtime-conditional runId assertion — DBOS asserts runId === previousRunId, other runtimes assert runId !== previousRunId. The runId-conditional closes FU-DBOS-RESUME-RUNID-CLIENT-TOOL via Option D (test relaxation + documented divergence) per the FU's own closure criterion #4; the unconditional previousRunId assertion was added in the greptile-P2 fix-forward and is what surfaced (and is now backed by) the Temporal + CF previousRunId repairs.

Hook customState reconcile after preStep capture (commit 8654a2686) — Hooks that fire AFTER the iterator captures preStepCustomState (assistant onMessage, phase-1/phase-2 beforeTool/afterTool) call hookContext.updateState, which mutates nextState.customState in-place via Immer. They do NOT contribute to the staging-changes pipeline that commitStep promotes. After commitStep returns, the iterator reconciles by writing the freshest nextState.customState back to the store as a follow-up saveState IF it differs from the just-committed value. Hook-less steps still do exactly one durable write per step; steps that mutate via post-snapshot hooks do two (commit + reconcile). Best-effort: a follow-up save failure logs a warning but doesn't fail the step.
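The commit-then-reconcile behavior above can be modeled with a toy store that counts durable writes. Everything here is a sketch: ToyStateStore and commitThenReconcile are hypothetical, the JSON.stringify comparison is an assumed (key-order-sensitive) equality check, and the first saveState call merely stands in for commitStep.

```typescript
type CustomStateSnapshot = Record<string, unknown>;

// Toy store that counts durable writes; saveState may throw in a real store.
interface ToyStateStore {
  committed: CustomStateSnapshot;
  writeCount: number;
  saveState(next: CustomStateSnapshot): void;
}

function createToyStore(initial: CustomStateSnapshot): ToyStateStore {
  const store: ToyStateStore = {
    committed: initial,
    writeCount: 0,
    saveState(next) {
      store.committed = next;
      store.writeCount += 1;
    },
  };
  return store;
}

// Commits the staged state, then writes a follow-up save only if hooks changed customState.
function commitThenReconcile(
  store: ToyStateStore,
  stagedState: CustomStateSnapshot,
  nextState: CustomStateSnapshot,
  warn: (message: string) => void,
): void {
  store.saveState(stagedState); // stands in for commitStep
  if (JSON.stringify(store.committed) === JSON.stringify(nextState)) {
    return; // hook-less step: exactly one durable write
  }
  try {
    store.saveState(nextState); // reconcile write
  } catch (err) {
    // Best-effort: log and continue rather than failing the step.
    warn(`customState reconcile failed: ${String(err)}`);
  }
}
```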

InMemoryStreamManager cursor rebase on cleanup (commit 65eaaf235) — InMemoryStreamManager tracks reader sequences as literal indices into the chunks array. cleanupToStep filters orphan-step chunks, shifting surviving chunks earlier. Active readers' currentSequence pointers are now rebased in cleanupToStep (via a Set<ReaderCursor> on the stream) so they observe chunks emitted just before cleanup (e.g. the run_interrupted boundary marker the runtime-js soft-interrupt path emits). Active createReader / createResumableReader consumers now correctly observe chunks that survived a cleanup — previously these were silently invisible to the reader.
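The rebase described above can be shown with a toy stream. This is not the InMemoryStreamManager implementation: ToyStream, the orphan flag, and removeOrphans are hypothetical stand-ins for cleanupToStep's filtering. Each reader cursor is an index into the chunks array, so after filtering it must be moved to the number of surviving chunks that preceded it.

```typescript
interface ChunkSketch {
  type: string;
  orphan: boolean; // hypothetical marker for chunks that cleanup will drop
}

// currentSequence is the array index of the next chunk the reader will read.
interface ReaderCursor {
  currentSequence: number;
}

class ToyStream {
  chunks: ChunkSketch[] = [];
  readonly cursors = new Set<ReaderCursor>();

  // Drops orphan chunks and rebases every active cursor onto the new indices.
  removeOrphans(): void {
    const keep = this.chunks.map((chunk) => !chunk.orphan);
    this.cursors.forEach((cursor) => {
      let survivorsBefore = 0;
      for (let i = 0; i < cursor.currentSequence && i < keep.length; i++) {
        if (keep[i]) survivorsBefore++;
      }
      cursor.currentSequence = survivorsBefore;
    });
    this.chunks = this.chunks.filter((_, i) => keep[i]);
  }

  // Returns the next unread chunk for this cursor, if any.
  read(cursor: ReaderCursor): ChunkSketch | undefined {
    const chunk = this.chunks[cursor.currentSequence];
    if (chunk !== undefined) cursor.currentSequence++;
    return chunk;
  }
}
```

Without the rebase, a reader positioned past the dropped chunks would point beyond the end of the shortened array and never see a surviving boundary marker such as run_interrupted.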

D1 saveStateAndPromoteStaging atomicity (commit 7509872e3) — D1StateStore.saveStateAndPromoteStaging now correctly surfaces concurrent-CAS losses as StaleStateError (was sometimes D1StateError, breaking retry paths that gate on instanceof StaleStateError). UNIQUE-constraint violations on __agents_messages.(session_id, sequence) are caught and re-thrown as StaleStateError. The trailing DELETE on __agents_staging runs OUTSIDE the atomic batch — only after the version-pinned UPDATE confirms changes==1. Observable: rare network failures between successful main commit and staging DELETE leave a stale staging row that's harmless and self-healing (next stageChanges upserts the same key).

Driving the agent loop after submitToolResult: In v7, submitToolResult is a durable write only — it does NOT auto-resume the agent loop. After submission, consumers continue the loop in one of two ways:

Use the framework's chat plumbing — handleChatStream (server) and useChat + useResumeClientTools (React) drive the resume internally. This is the recommended path for typical web app deployments.
Call executor.resume(agent, sessionId) explicitly — returns a new handle observing the resumed run. Use this when calling the executor directly (custom server, integration tests). The pattern: await executor.execute(...) returns 'suspended_*' → await executor.submitToolResult(...) writes durable result → const newHandle = await executor.resume(...) drives the loop forward → await newHandle.result() resolves with 'completed' (or whatever terminal state the resumed run reaches).

This separation is by design: v7 suspension is stateless, so submission and resumption can happen in different processes with no in-memory bridge between them.
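
The explicit path (execute, submit, resume, await the new handle) can be sketched as a small driver. The interfaces below are structural stand-ins written for this example; the submission payload is left generic because its exact shape is not specified in this section, and `runUntilSettled` and `answer` are hypothetical names.

```typescript
// Structural stand-ins so the sketch compiles on its own. The real executor,
// handle, and result types are richer.
type RunStatus =
  | 'completed'
  | 'failed'
  | 'interrupted'
  | 'suspended_client_tool'
  | 'suspended_awaiting_children'
  | 'suspended_step_partial';

interface ResultLike {
  status: RunStatus;
  suspended?: { toolCallIds?: string[] };
}
interface HandleLike {
  result(): Promise<ResultLike>;
}
interface ExecutorLike<Agent, Submission> {
  execute(agent: Agent, message: string, options: { sessionId: string }): Promise<HandleLike>;
  submitToolResult(submission: Submission): Promise<void>;
  resume(agent: Agent, sessionId: string): Promise<HandleLike>;
}

async function runUntilSettled<Agent, Submission>(
  executor: ExecutorLike<Agent, Submission>,
  agent: Agent,
  message: string,
  sessionId: string,
  answer: (toolCallId: string) => Promise<Submission>, // caller-supplied tool execution
): Promise<ResultLike> {
  let handle = await executor.execute(agent, message, { sessionId });
  let result = await handle.result();

  while (result.status === 'suspended_client_tool') {
    const toolCallIds = result.suspended?.toolCallIds ?? [];
    if (toolCallIds.length === 0) {
      throw new Error('suspended without any pending tool calls');
    }
    for (const toolCallId of toolCallIds) {
      // Durable write only: this does not restart the agent loop.
      await executor.submitToolResult(await answer(toolCallId));
    }
    // Explicit resume: a new handle observes the resumed run.
    handle = await executor.resume(agent, sessionId);
    result = await handle.result();
  }
  return result;
}
```

Because every step goes through durable state, the submit and resume calls in the loop body could equally be issued from different processes; the single function here is only for readability.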

Structured Logger events: client_tool.suspended, client_tool.submitted, client_tool.timeout, client_tool.aborted, client_tool.ownership_write_failed, client_tool.ownership_clear_failed, client_tool.ownership_retry, and client_tool.validation_failed. The runtime records CLIENT_TOOL_WAIT_MS_METRIC per call (now measured from the durable suspension write to the durable submission write); aggregate it via __agents_usage (Postgres/D1 column source_type = 'client_tool_wait_ms', with a per-tool breakdown via source_name). See ../guide/client-executed-tools.md for the full guide and ../upgrade-guides/v6-to-v7-stateless-suspension.md for v6→v7 migration steps, including the operator runbook for force-failing stuck calls.

Approval-Gated Tools (v7) — First-class HITL primitive on defineTool: pass requireApproval: true (always require approval) or requireApproval: (input, ctx) => boolean (function form, evaluated per call). When the gate matches, the runtime emits a tool_approval_request stream chunk with the parsed input and suspends with 'suspended_client_tool' (the same primitive carries both client-tool and approval flows; routing happens off the kind field of the submission, not the stream-chunk type). Resume by calling executor.submitToolResult({ kind: 'approval-response', toolCallId, approved, reason? }). On approved: true, the original execute() runs normally with the original input. On approved: false, the runtime emits tool_error ('Tool call was not approved by the user') and skips execute() entirely. The function form fails closed: an exception inside the evaluator is treated as requireApproval = true (this matches the Mastra precedent of failing safe by requiring approval rather than silently bypassing it). requireApproval is mutually exclusive with execute: 'client' and with finishWith: true; both combinations are rejected at defineTool time. All 4 v7-stateless runtimes handle approval flows the same way: the runtime suspends durably, submitToolResult({ kind: 'approval-response', ... }) records the decision, and a fresh run is then started via executor.resume() (or via the framework's chat plumbing). CFW Workflows now uses the same v7 stateless model: the workflow exits on an approval-gate match, and resume drains the approve/deny submission via applyResultsAndReload.
runtime-dbos also supports approval-gate suspension (GL #75), but via its own DBOS-native DBOS.recv primitive rather than the suspensionContext model: dispatchApprovalGatedTool suspends on a requireApproval match, approve runs execute() with the original input, deny emits a synthetic tool_error ('Tool call was not approved by the user') and skips execute(), and function-form requireApproval is evaluated inside a @DBOS.step so its boolean is checkpointed in workflow history (fail-closed semantics live inside the step; replay-deterministic suspend-vs-run decision even for non-pure predicates — GL #111 Batch C / gap 5). Per-call agent.hooks (and options.hooks / options.hookManager) fire on every workflow start via the process-local HookManagerRegistry indexed by DBOS.workflowID (GL #111 Batch D / gap 4 — cross-worker recovery falls back to the constructor-bound static, matching Temporal/CF replaceAgent split-brain semantics). All five GL #111 gaps (suspended.toolCallIds, exactly-once submit, client_tool_timeout tool_end, per-call hooks, checkpointed requireApproval) are closed.
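
The fail-closed rule and the approve/deny branch described above can be sketched in a few lines. `ApprovalContext`, `needsApproval`, and `applyApprovalResponse` are hypothetical names for this example; only the requireApproval forms and the denial message come from the text.

```typescript
// Hypothetical types for illustration only.
type ApprovalContext = { sessionId: string };
type RequireApproval<I> = boolean | ((input: I, ctx: ApprovalContext) => boolean);

// Decides whether a tool call must suspend for approval.
function needsApproval<I>(
  gate: RequireApproval<I> | undefined,
  input: I,
  ctx: ApprovalContext,
): boolean {
  if (gate === undefined) return false; // tool is not approval-gated
  if (typeof gate === 'boolean') return gate;
  try {
    return gate(input, ctx);
  } catch {
    // Fail closed: a throwing predicate is treated as "approval required".
    return true;
  }
}

type ApprovalOutcome =
  | { action: 'run_execute' }
  | { action: 'emit_tool_error'; message: string };

// Maps the submitted decision onto what happens to the original tool call.
function applyApprovalResponse(approved: boolean): ApprovalOutcome {
  return approved
    ? { action: 'run_execute' } // execute() runs with the original input
    : { action: 'emit_tool_error', message: 'Tool call was not approved by the user' };
}
```

Treating a throwing predicate as "approval required" means a bug in the predicate can only add an approval prompt, never remove one.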

Agent Server — The @helix-agents/agent-server package provides AgentServer for hosting agents over HTTP. Accepts any AgentExecutor implementation and exposes the following routes:

Executor routes (always wired): POST /start, POST /resume, GET /sse, GET /status, POST /interrupt, POST /abort, POST /submit-tool-result, GET /workspace.
Chat handler routes (wired when chatHandler is configured): POST /chat, GET /chat/{sessionId}/stream, POST /chat/{sessionId}/submit-tool-result, POST /chat/{sessionId}/interrupt, POST /chat/{sessionId}/abort. These layer on top of the executor and provide the canonical chat-style flow used by useChat + useResumeClientTools.

Transport adapters: createHttpAdapter() (generic), createExpressAdapter() (Express). Tracks active execution handles in memory for interrupt/abort — these only work on the same server instance that started execution (handles are lost on restart; sessions remain recoverable via resume). Fail-closed auth: the constructor throws if neither authenticate hook nor explicit allowUnauthenticated: true is configured.

v7 removed the v6 INTERRUPT_NOT_LOCAL 503 — interrupts are now durable writes (via stateStore.setInterruptFlag) picked up by the runLoop at the next checkpoint, regardless of which process holds the in-memory handle. HTTP clients no longer need to retry against the "owning" server.

Runtime — Executes the agent loop. JSAgentExecutor runs in-process; TemporalAgentExecutor uses Temporal workflows for durability; DBOSAgentExecutor uses DBOS Transact (Postgres-backed workflow replay) for durability and supports the same client-executed-tools surface as Temporal and Cloudflare.

Workspace and HITL runtime support — Two orthogonal capabilities, with overlapping (but not identical) runtime support:

Workspaces (agent.workspace): runs on JS runtime and Cloudflare Durable Object runtime (via @helix-agents/agent-server). Temporal, CF Workflows, and DBOS do not support workspaces (Temporal and CF Workflows fail-fast at run-start; DBOS silently passes workspaces: undefined).
HITL (client-executed tools, requireApproval): runs on all 5 runtimes — JS, Cloudflare Durable Object, Cloudflare Workflows, Temporal, and DBOS. The first 4 use the v7 stateless suspension model (durable-state-only suspension via SessionState.suspensionContext); DBOS uses its own DBOS-native DBOS.recv / DBOS.send primitives over Postgres-backed workflow replay (functionally equivalent but architecturally separate from the unified suspensionContext model).

Runtime	Workspaces	HITL	Notes
JS (`runtime-js`)	Full	Full (v7 stateless)	All providers; recommended for dev + non-DO production.
CF Durable Objects (via `agent-server`)	Full	Full (v7 stateless)	All providers; recommended for CF production.
Cloudflare Workflows (`runtime-cloudflare/src/workflow.ts`)	Fail-fast	Full (v7 stateless)	All providers; recommended for Cloudflare Workflows production. Workspaces remain unsupported (run-start fail-fast).
Temporal (`runtime-temporal`)	Fail-fast	Full (v7 stateless)	All providers; recommended for Temporal-backed production. Workspaces remain unsupported (run-start fail-fast).
DBOS (`runtime-dbos`)	Fail-fast	Full (DBOS-native)	Postgres-backed; recommended when consumers already use DBOS Transact. Workspaces remain unsupported (run-start fail-fast, same guard as Temporal / CF Workflows).

Per-turn run records (listRuns) — full parity. Every execute() and resume() turn writes exactly one run record on all runtimes, so listRuns(sessionId).runs.length grows by one per turn and the turn numbers are contiguous (1, 2, 3, …):

Runtime	Run record per `execute()`/`resume()` turn	Notes
JS	✓	Reference impl — `createRun` per turn (fresh + continuation).
Temporal	✓	Reached parity via FU-TEMPORAL-CONTINUATION-RUN-RECORD: the executor now hoists `createRun` out of the `if (!existingState)` guard so a continuation `execute()` records a turn.
CFW Workflows	✓	`createRun` per workflow instance (execute + `mode:'resume'`).
CF DO	✓	Via `runtime-js` run loop.
runtime-dbos	✓	One run record per workflow start (Branch 2 fresh resume); Branch 1 recv-wake continues the same run (see "Resume runId semantics" above).

The historical Temporal divergence (a continuation execute() on an existing/completed session skipped createRun, under-reporting turn count as 1 instead of N) is closed. See ./session-model.md §Session vs Run for the storage-model statement of this contract.

State Store — Persists session state (messages, custom state, checkpoints). Uses SessionStateStore interface with sessionId as the primary key. Implementations: InMemoryStateStore for dev, RedisStateStore for prod, PostgresStateStore for prod (works across all runtimes including Cloudflare Workers via Neon/Hyperdrive). All implementations guarantee atomic createSession() — concurrent calls for the same sessionId result in exactly one winner (others throw). This is the foundation for preventing duplicate execution across all runtimes.

Stream Manager — Handles real-time streaming of agent events. Implementations: InMemoryStreamManager (store-memory), RedisStreamManager (store-redis), DOStreamManager (runtime-cloudflare; binding-side manager that talks to the streaming Durable Object), and DurableObjectStreamManager + the StreamServer Durable Object (store-cloudflare; the DO-resident SQLite-backed implementation). The Cloudflare DO managers are the push transports that broadcast the truncated wire event for G4 (truncation surfacing).

LLM Adapter — Abstracts the LLM provider. VercelAIAdapter wraps the Vercel AI SDK. MockLLMAdapter for testing.

Checkpoint — Complete state snapshot saved after each step. Enables crash recovery, time-travel, and branching. Checkpoints are scoped to a session.

Lock Manager — Distributed coordination interface. Prevents concurrent execution of the same agent across processes. Implementations: NoOpLockManager, InMemoryLockManager, RedisLockManager, PostgresLockManager, DurableObjectLockManager.

Logger — All SDK components accept an optional Logger interface (info, warn, error, debug? methods) defined in core/src/types/logger.ts. Defaults to noopLogger (silent). Use consoleLogger for development. Compatible with pino, winston, and other structured logging libraries. Configured via constructor options on executors, state stores, adapters, and tracing hooks. Zero bare console.* calls exist in production source files — all logging goes through Logger.
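
As a rough illustration of the Logger shape, the sketch below wraps a logger with a component prefix while preserving the optional debug method. The `(message, meta?)` signature is an assumption (the text only names the four methods), and `silentLogger` and `withPrefix` are hypothetical helpers, not SDK exports.

```typescript
// Assumed signature: a message plus optional structured metadata.
type LogFn = (message: string, meta?: Record<string, unknown>) => void;

interface Logger {
  info: LogFn;
  warn: LogFn;
  error: LogFn;
  debug?: LogFn; // optional, so call sites use logger.debug?.(...)
}

// A silent default, in the spirit of noopLogger.
const silentLogger: Logger = {
  info() {},
  warn() {},
  error() {},
};

// Hypothetical helper: prefixes every message with a component name.
function withPrefix(base: Logger, prefix: string): Logger {
  const tag = (message: string): string => `[${prefix}] ${message}`;
  const wrapped: Logger = {
    info: (message, meta) => base.info(tag(message), meta),
    warn: (message, meta) => base.warn(tag(message), meta),
    error: (message, meta) => base.error(tag(message), meta),
  };
  // Only expose debug when the underlying logger has one.
  if (base.debug) {
    wrapped.debug = (message, meta) => base.debug?.(tag(message), meta);
  }
  return wrapped;
}
```

A wrapper like this would be passed wherever a component accepts a Logger in its constructor options.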

Tracing — @helix-agents/tracing-langfuse is the supported tracing adapter. As of v7, it seeds the Langfuse trace ID from sessionId (not runId) so that a single conversational session — which spans many runs once HITL boundaries are involved — appears as a single trace in the Langfuse UI. New onAgentResumed and onAgentSuspended hook handlers emit matching event spans inside the session-scoped trace, so you can visually see where the run paused and where it resumed. The legacy core/tracing/tracing-hooks.ts adapter is HITL-incompatible: it relies on an in-memory tracingStateMap that the stateless-suspension model cannot populate across process restarts, and v7 fail-fasts when requireApproval or client-executed tools are run with the legacy adapter. Use @helix-agents/tracing-langfuse (or implement the v7 hook interface in your own adapter) before upgrading.

Embedding Executor — Controls how vector embeddings are computed after a memory is saved. InlineEmbeddingExecutor (default) computes synchronously. BackgroundEmbeddingExecutor fires-and-forgets with maxConcurrency limit and shutdown() drain. Memories are saved with embeddingStatus: 'pending' and immediately FTS-searchable; the executor updates them to 'complete' once the embedding is computed. MemoryManager.processUnembeddedMemories() recovers orphaned pending memories (e.g., after embedding service failures). Configured via embeddingExecutor on MemoryConfig.
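
The fire-and-forget behaviour with a concurrency cap and a draining shutdown() can be modelled in a few dozen lines. This is a generic sketch of the technique, not the BackgroundEmbeddingExecutor source; `BackgroundExecutorSketch`, `EmbedTask`, and `submit` are hypothetical names.

```typescript
type EmbedTask = () => Promise<void>;

class BackgroundExecutorSketch {
  private readonly maxConcurrency: number;
  private readonly queue: EmbedTask[] = [];
  private active = 0;
  private idleWaiters: Array<() => void> = [];

  constructor(maxConcurrency: number) {
    this.maxConcurrency = maxConcurrency;
  }

  // Fire-and-forget: the caller does not await the embedding.
  submit(task: EmbedTask): void {
    this.queue.push(task);
    this.pump();
  }

  // Resolves once every queued and in-flight task has settled.
  shutdown(): Promise<void> {
    if (this.active === 0 && this.queue.length === 0) return Promise.resolve();
    return new Promise<void>((resolve) => {
      this.idleWaiters.push(resolve);
    });
  }

  private pump(): void {
    while (this.active < this.maxConcurrency && this.queue.length > 0) {
      const task = this.queue.shift();
      if (!task) break;
      this.active++;
      Promise.resolve()
        .then(task)
        .catch(() => {
          // Swallowed on purpose: in this model a failed task leaves the
          // memory at 'pending' for a later recovery pass.
        })
        .then(() => {
          this.active--;
          this.pump(); // start the next queued task, if any
          if (this.active === 0 && this.queue.length === 0) {
            const waiters = this.idleWaiters;
            this.idleWaiters = [];
            waiters.forEach((resolve) => resolve());
          }
        });
    }
  }
}
```

In this model a failed task never rejects into the caller, which matches the description of memories staying 'pending' until processUnembeddedMemories() picks them up.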

Skills (Progressive Disclosure)

Skills give an agent a library of specialized capabilities (workflows, runbooks, reference protocols) without paying for all of them on every request. They implement Anthropic's Agent Skills pattern — 3-level progressive disclosure — on Helix's existing append-only / cacheable substrate. Configured via AgentConfig.skills (see the design spec).

The 3-level model

Level 1 — catalog (always resident). Every skill's name + description (~tens of tokens each) is rendered into a ## Skills section appended to the system prompt. The catalog is deterministically sorted by name so it is byte-stable within a session — it lives in the cached prefix and never invalidates it. The fragment is produced by generateSkillsSystemPromptFragment(metadata[]); an empty skill set produces '' (total no-op). The catalog carries a load-bearing guardrail ("A Skill is NOT a tool — call load_skill"), because models otherwise try to invoke skill names as if they were tools.
Level 2 — body (loaded on demand). The full skill body is loaded by the auto-injected load_skill tool, which returns the body as the tool result (immediate — the model acts on it in the same continuation, no wasted round-trip). Because tool results only ever append to history, this is append-only and cache-safe. load_skill also emits an informational skill_loaded custom stream event ({ name }, surfaced as a data-skill_loaded AI-SDK event) that consumers MAY render but are not required to handle.
Level 3 — resource files (read on demand). Bundled reference/script/asset files are read by the auto-injected read_skill_file tool, which supports optional startLine/endLine ranges and a path-traversal guard. The skills feature discloses content; it does not execute code — Level-3 "scripts" are readable, but running them is delegated to the agent's own shell/workspace tools. This keeps the feature decoupled from any execution environment.
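
The byte-stability requirement on the Level-1 catalog comes down to deterministic ordering. The sketch below is a hypothetical stand-in for generateSkillsSystemPromptFragment: the function name, the line format, and the guardrail wording are placeholders chosen for illustration.

```typescript
interface SkillMetadataSketch {
  name: string;
  description: string;
}

// Hypothetical stand-in; the real fragment's exact layout is not shown in this document.
function renderCatalogSketch(skills: readonly SkillMetadataSketch[]): string {
  // An empty skill set is a total no-op for the system prompt.
  if (skills.length === 0) return '';

  // Plain code-unit comparison rather than localeCompare, so the order does
  // not depend on the host's locale settings.
  const sorted = [...skills].sort((a, b) =>
    a.name < b.name ? -1 : a.name > b.name ? 1 : 0,
  );

  return [
    '## Skills',
    'A Skill is NOT a tool. Call load_skill to load one.', // placeholder guardrail wording
    ...sorted.map((skill) => `- ${skill.name}: ${skill.description}`),
  ].join('\n');
}
```

Two providers that return the same skills in different orders produce the same string, which is what keeps the fragment inside the cached prefix.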

`SkillProvider` and the two providers

Skills resolve to plain data behind a small async interface:

interface SkillProvider {
  listSkills(): Promise<SkillMetadata[]>; // Level 1
  getSkill(name: string): Promise<Skill | null>; // Level 2
  readResource(name: string, path: string, range?: ReadResourceRange): Promise<string | null>; // Level 3
}
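
A custom provider only has to satisfy these three methods. The sketch below backs the interface with an in-memory map; the `Skill`, `SkillMetadata`, and `ReadResourceRange` shapes are assumptions made so the example is self-contained (the framework's types may carry more fields), `mapSkillProvider` is a hypothetical name, and treating line ranges as 1-based and inclusive is also an assumption.

```typescript
// Assumed shapes for illustration.
interface SkillMetadata {
  name: string;
  description: string;
}
interface Skill extends SkillMetadata {
  body: string;
  resources?: Record<string, string>; // hypothetical: path -> file content
}
interface ReadResourceRange {
  startLine?: number;
  endLine?: number;
}

interface SkillProvider {
  listSkills(): Promise<SkillMetadata[]>; // Level 1
  getSkill(name: string): Promise<Skill | null>; // Level 2
  readResource(name: string, path: string, range?: ReadResourceRange): Promise<string | null>; // Level 3
}

function mapSkillProvider(skills: Skill[]): SkillProvider {
  const byName = new Map<string, Skill>();
  skills.forEach((skill) => byName.set(skill.name, skill));

  return {
    async listSkills() {
      // Level 1 exposes only name + description, never the body.
      return skills.map(({ name, description }) => ({ name, description }));
    },
    async getSkill(name) {
      return byName.get(name) ?? null;
    },
    async readResource(name, path, range) {
      const content = byName.get(name)?.resources?.[path];
      if (content === undefined) return null;
      if (!range) return content;
      // Assumption: line numbers are 1-based and inclusive.
      const lines = content.split('\n');
      const start = (range.startLine ?? 1) - 1;
      const end = range.endLine ?? lines.length;
      return lines.slice(start, end).join('\n');
    },
  };
}
```

Returning null for unknown names or paths, rather than throwing, follows the nullable return types in the interface.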

Two providers ship in v1:

inCodeSkillProvider (in @helix-agents/core) — the "plain data" mode: skills are TypeScript data bundled with the agent. Dependency-free and Workers-safe (no node:fs). Use this on Cloudflare Workers.
fileSystemSkillProvider (in @helix-agents/skill-fs) — reads Anthropic-format SKILL.md directories (<root>/<skill-name>/SKILL.md with YAML frontmatter + markdown body, plus optional references/, scripts/, and assets/ subdirectories). Node only (node:fs/promises + yaml); not usable on Cloudflare Workers.

A third delivery path, the build-time bake (@helix-agents/skill-cli), is not a runtime provider: it resolves remote skill packages at BUILD time and feeds the result back through inCodeSkillProvider (see below).

Remote skill packages (build-time bake)

@helix-agents/skill-cli lets you author skills as remote packages — a git repo, or a Claude plugin marketplace — yet still ship them through the Workers-safe in-code provider. Its helix-skills sync command resolves each manifest source, pins it at a version (tag/ref/sha), and bakes the selected skills into a generated SkillDefinition[] TypeScript module plus a committed lockfile (one sha256 integrity hash per manifest entry; --check fails CI on drift). You import { skills } from the generated module and pass it to defineAgent({ skills }), which routes through inCodeSkillProvider.

Because resolution happens entirely at BUILD time and the output is plain in-code data, the baked skills inherit the in-code provider's full cross-runtime support — including Cloudflare Workers — with zero runtime fetch and no node:fs. The CLI itself is Node-only and is never imported by the agent at runtime; only its generated output is. The SKILL.md parsing it shares with the filesystem provider lives in core as parseSkillFile (pure, Workers-safe). See the Skills guide → Loading remote skill packages and the @helix-agents/skill-cli reference.

Configuration

Set AgentConfig.skills to either a SkillProvider or an array of in-code SkillDefinitions (sugar for inCodeSkillProvider). When present, the framework appends the catalog to the system prompt and auto-injects load_skill + read_skill_file into the tool list (via shared buildEffectiveTools). The two tool names are reserved (RESERVED_TOOL_NAMES) — user tools cannot shadow them, and skill names are [a-z0-9-] (no underscores) so they can never collide with the tool names either. Sub-agents do NOT inherit a parent's skills — a sub-agent uses skills only if its own config declares them (matching how sub-agents already scope their own tools/state).
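
The naming rules above can be checked mechanically. In this sketch `RESERVED_TOOL_NAMES_SKETCH` is a local stand-in for the framework's RESERVED_TOOL_NAMES, and `isValidSkillName` and `findReservedToolNames` are hypothetical helpers; only the two tool names and the [a-z0-9-] character set come from the text.

```typescript
// Local stand-in listing the two auto-injected tool names.
const RESERVED_TOOL_NAMES_SKETCH: readonly string[] = ['load_skill', 'read_skill_file'];

// Skill names: lowercase letters, digits, and hyphens only. Underscores are
// excluded, so a skill name can never equal a reserved tool name.
const SKILL_NAME_PATTERN = /^[a-z0-9-]+$/;

function isValidSkillName(name: string): boolean {
  return SKILL_NAME_PATTERN.test(name);
}

// Returns the user tool names that would shadow an auto-injected tool.
function findReservedToolNames(userToolNames: readonly string[]): string[] {
  return userToolNames.filter((name) => RESERVED_TOOL_NAMES_SKETCH.indexOf(name) !== -1);
}
```

Both reserved names contain an underscore, so the character-set rule alone is enough to keep the skill and tool namespaces apart.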

Minimal in-code example:

const agent = defineAgent({
  name: 'assistant',
  systemPrompt: 'You are a helpful assistant.',
  llmConfig: { model },
  skills: [
    {
      name: 'pdf-processing',
      description:
        'Extract text and tables from PDFs, fill forms, merge documents. Use when working with PDF files.',
      body: '# PDF processing\n…full instructions…',
    },
  ],
});

Filesystem example (one line):

import { fileSystemSkillProvider } from '@helix-agents/skill-fs';
// …
skills: fileSystemSkillProvider({ roots: ['./skills'] }),

Preloaded skills

AgentConfig.preloadSkills?: string[] injects the named skills' full bodies into the system prompt on every step — they are always in context, with no load_skill call needed. The bodies render as an ### Active Skills block inside the same deterministically-sorted, cache-stable fragment as the catalog (so the preloaded set is fixed per session and the prefix stays byte-stable). Preloaded skills also appear in the loadable <available_skills> catalog marked loaded="true" — a static marker (cache-safe, decided at config time) that tells the model not to reload them, while load_skill remains a recovery path (e.g. if history compaction later drops the system-prompt-injected body, load_skill can re-fetch it).

Each name must resolve in the agent's provider; unknown names warn-and-skip at resolution time (one bad name never crashes the agent or breaks the rest). Sub-agents do NOT inherit a parent's preloadSkills — same scoping as skills and tools. There is no token budget on preloaded bodies (future work, same as the catalog).

defineAgent({
  /* … */ skills: [
    /* … */
  ],
  preloadSkills: ['deploy-runbook'],
});
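
The warn-and-skip rule for unknown names can be sketched as a pure function. `SkillBody` and `resolvePreloadedSkills` are hypothetical names, and the warning text is a placeholder; the real resolution goes through the agent's provider rather than a prebuilt map.

```typescript
interface SkillBody {
  name: string;
  body: string;
}

function resolvePreloadedSkills(
  available: ReadonlyMap<string, SkillBody>,
  preloadSkills: readonly string[],
  warn: (message: string) => void,
): SkillBody[] {
  const resolved: SkillBody[] = [];
  for (const name of preloadSkills) {
    const skill = available.get(name);
    if (!skill) {
      // One bad name is reported and skipped; the rest still load.
      warn(`preloadSkills: unknown skill "${name}" skipped`);
      continue;
    }
    resolved.push(skill);
  }
  // Sort by name so the rendered block is byte-stable across steps.
  return resolved.sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0));
}
```

Sorting the resolved set, rather than keeping config order, is one way to keep the injected block identical on every step regardless of how the array was written.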

Cache-safety contract

Skill loading is purely additive by construction:

The Level-1 catalog is a stable, sorted system-prompt fragment that is NEVER annotated with per-skill loaded-state (no "✓ loaded" marks) — annotating it would make the cached prefix volatile and bust the cache on every load. "Already loaded" handling lives entirely in the load_skill result, never in the catalog.
Level-2 bodies and Level-3 resources arrive append-only as tool results, covered by anthropicCache's breakpoints (the system anchor + the latest turn's tool-result batch). The skills feature required no changes to anthropicCache / applyCacheStrategies / the Vercel adapter.

The only inherited caveat (identical to memory injection): on the turn a body lands, the latest-turn breakpoint sits on/after it, so that one breakpoint won't cache-hit across that turn boundary; the system anchor still does.

Cross-runtime support matrix

All logic lives in shared core (buildEffectiveTools, buildMessagesForLLM, the skill-injection helpers), plus a one-line per-run catalog-resolution hook at each runtime's buildMessagesForLLM call site (the same place memory retrieval is resolved — where IO / non-determinism is allowed). The tools therefore work on all runtimes; the catalog string is threaded on JS / Temporal / Cloudflare / DBOS.

Runtime	In-code provider	Filesystem provider
JS	✅	✅
Temporal	✅	✅ (resolve catalog + tool reads in activities)
DBOS	✅	⚠️ (catalog resolved in the workflow body — see note)
Cloudflare DO / Workflows	✅	❌ (no `node:fs`; use in-code or a future workspace/D1 provider)

fileSystemSkillProvider works wherever node:fs exists — i.e. NOT Cloudflare Workers; use the in-code provider there. On Temporal the provider's filesystem IO runs inside the per-step activity (where IO is allowed), never in workflow code. On DBOS, however, the catalog is resolved in @DBOS.workflow-body code (not yet wrapped in a @DBOS.step), so a fileSystemSkillProvider would run replay-sensitive node:fs IO in workflow code — prefer the in-code provider or build-time-baked skills on DBOS (the @helix-agents/skill-cli bake path makes baking ergonomic). In-code catalogs are deterministic data and need no special handling. (Tracked as FU-SKILL-DBOS-CATALOG-IN-WORKFLOW-BODY — wrap the DBOS resolution in a checkpointed step to mirror Temporal.)

Known limitations (v1)

Re-loading a skill returns the body again. ToolContext exposes no transcript access, so the load_skill tool cannot dedup on its own — v1 returns the body on every call (correct + cache-safe; rarely triggered because the model sees its own prior load_skill results). collectLoadedSkillNames(messages) ships as the building block for programmatic dedup, but the dispatch-layer short-circuit is deferred.
fileSystemSkillProvider traversal guard is lexical. It rejects resolved paths that escape the skill dir but does NOT follow symlinks, so a symlink placed inside a skill directory is not checked against its target. This is safe for operator-provisioned skill directories.
fs staleness re-scan detects root-entry add/remove, not in-place edits. Editing an existing skill's files in place won't be picked up until restart or a touch of the root.
DBOS resolves an async provider in the workflow body. This is fine for the in-code provider (deterministic data); a provider that fetches at runtime would require moving catalog resolution into a @DBOS.step(). A runtime-fetching remote provider is still deferred (see the design spec future work). Build-time remote loading has already shipped via @helix-agents/skill-cli, which sidesteps this entirely by baking remote packages down to in-code data (deterministic, no per-step IO).
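
To make "lexical" in the traversal-guard limitation concrete, here is a sketch of a purely string-based check. It is not the provider's code: `resolveInsideSkillDir` is a hypothetical helper that normalizes a requested path without touching the filesystem, which is exactly why symlinks are outside its reach.

```typescript
// Normalizes "." and ".." segments and rejects anything that would leave the
// skill directory. Returns the cleaned relative path, or null when rejected.
function resolveInsideSkillDir(relativePath: string): string | null {
  // Reject absolute paths (POSIX root, backslash root, or Windows drive prefix).
  if (/^([\/\\]|[A-Za-z]:)/.test(relativePath)) return null;

  const segments: string[] = [];
  for (const part of relativePath.split(/[\/\\]+/)) {
    if (part === '' || part === '.') continue;
    if (part === '..') {
      if (segments.length === 0) return null; // would escape the skill dir
      segments.pop();
      continue;
    }
    segments.push(part);
  }
  return segments.length > 0 ? segments.join('/') : null;
}
```

A check like this accepts `references/link.md` even if `link.md` is a symlink to a file elsewhere on disk; catching that would need a filesystem-level resolution step, which this sketch deliberately omits.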
