
Client-Executed Tools

Guide for declaring tools that execute in the browser (or any non-runtime party), and receiving their results back into the agent loop.

v7 model: stateless suspension

Major change in v7

The client-tool model was redesigned in v7. The legacy in-memory wait (promise + setTimeout) is gone; the runtime now suspends durably and resumes only when an explicit executor.resume() reads SessionState.suspensionContext. If you are upgrading from v6, read the v6 to v7 migration guide end-to-end before deploying.

When the LLM emits a client-tool call, the runtime:

  1. Validates the input against inputSchema and emits a tool_start stream chunk.
  2. Persists a pending entry in SessionState.pendingClientToolCalls (keyed by toolCallId) and writes the suspension context to the state store atomically with the step's other state changes (saveStateAndPromoteStaging).
  3. Returns RunOutcome.suspended_client_tool from the run loop. The in-flight JavaScript loop dies; nothing waits in memory.
  4. handle.result() resolves with status: 'suspended_client_tool'. The chat handler closes the SSE stream cleanly.

Resumption is explicit:

  1. The client submits via submitToolResult({ kind: 'client-tool-result', ... }). The submission writes the result into pendingClientToolCalls[toolCallId].response.
  2. The chat handler (or your code) calls executor.resume({ sessionId }).
  3. The new run reads the suspension context, processes the result as a tool-result message, and continues the loop.

Per design §10.7, this is the same machinery used by approval gates — both primitives share pendingClientToolCalls + suspensionContext, discriminated by the submission's kind field.

Security

The POST /submit-tool-result endpoint accepts a sessionId + toolCallId and submits a tool result into the LLM's context. It MUST be gated by authentication — anyone who can reach the endpoint can inject arbitrary data into an agent's reasoning.

This also applies to the other endpoints exposed by AgentServer (/start, /resume, /sse, /status, /interrupt, /abort) — none of them are safe to expose publicly without auth.

  • Session-bound tokens: issue a short-lived HMAC or JWT when starting a session. The token encodes the authorized sessionId; on each submit, reject any request whose body's sessionId doesn't match the token. (A minimal sketch follows this list.)
  • Server-issued session IDs: don't let clients choose sessionId values. Generate them server-side, keyed to authenticated user identity.
  • Reverse proxy: deploy behind a proxy (Cloudflare Access, Envoy, nginx + OAuth2 proxy) that enforces your auth before requests reach the agent server.
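A minimal sketch of the session-bound token half of that pattern, using Node's built-in crypto. issueSessionToken and the verifyToken consumed by the authenticate hook below are illustrative names, not framework exports; it assumes sessionIds never contain '.':

ts
import { createHmac, timingSafeEqual } from 'node:crypto';

const SECRET = process.env.SESSION_TOKEN_SECRET!;

// Issued once at session start and handed to the client.
export function issueSessionToken(sessionId: string, ttlMs = 15 * 60_000): string {
  const expiresAt = Date.now() + ttlMs;
  const payload = `${sessionId}.${expiresAt}`;
  const sig = createHmac('sha256', SECRET).update(payload).digest('base64url');
  return `${payload}.${sig}`;
}

// Verified on every submit; returns the authorized sessionId or null.
export function verifyToken(token: string): { sessionId: string } | null {
  const [sessionId, expiresAt, sig] = token.split('.');
  if (!sessionId || !expiresAt || !sig) return null;
  if (Number(expiresAt) < Date.now()) return null;
  const expected = createHmac('sha256', SECRET)
    .update(`${sessionId}.${expiresAt}`)
    .digest('base64url');
  const a = Buffer.from(sig);
  const b = Buffer.from(expected);
  return a.length === b.length && timingSafeEqual(a, b) ? { sessionId } : null;
}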

Using the authenticate hook

AgentServer accepts an authenticate hook. The hook is invoked before every endpoint handler runs. Return true to accept, false for a generic 401, or { error, status? } for a custom rejection (e.g., 403 for "authenticated but not authorized for this session").

ts
import { AgentServer } from '@helix-agents/agent-server';

const server = new AgentServer({
  executor,
  agents: { /* ... */ },
  stateStore,
  streamManager,
  authenticate: async (req, operation) => {
    const token = String(req.headers.authorization ?? '').replace(
      /^Bearer\s+/,
      ''
    );
    if (!token) return { error: 'missing_auth', status: 401 };

    const session = await verifyToken(token); // your impl
    if (!session) return { error: 'invalid_token', status: 401 };

    // Session-bound check: the body's sessionId must match the token's.
    if (operation === 'submit-tool-result') {
      const body = req.body as { sessionId?: string };
      if (body?.sessionId !== session.sessionId) {
        return { error: 'session_mismatch', status: 403 };
      }
    }

    return true;
  },
});

When the hook is not configured, the server logs a WARN at startup to make misconfigurations visible. Do not rely on the warning alone — gate the endpoints explicitly.

Rate limiting

The framework does not ship rate limiting for /submit-tool-result. Even with authentication, a bad actor (or a buggy client) can flood the endpoint. Gate it upstream:

  • Node / Express / Fastify: use express-rate-limit or @fastify/rate-limit with a per-session key (derived from the authenticate hook or from the request's sessionId body field). Recommended baseline: 60 submits/min per session, matching ordinary client-tool pacing. A sketch follows this list.
  • Cloudflare Workers: Cloudflare rate-limiting rules at the Worker level, OR a sub-request to a dedicated rate-limit Durable Object keyed on sessionId. Under CF DO, the per-session fetch path naturally serializes (each sessionId → single DO instance → single alarm/request queue), which provides a coarse rate limit for free but doesn't bound total QPS across shared infrastructure.
  • Body size: the framework's Content-Length gate (413 Payload Too Large) defends against full-body allocation DoS, but transport-level body limits (e.g. express.json({ limit: '2mb' })) still provide defense-in-depth and cheaper rejection before the framework runs.
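A hedged sketch of the Express variant. Per-session keying assumes JSON body parsing runs before the limiter so req.body.sessionId is available; option names follow express-rate-limit v7:

ts
import express from 'express';
import rateLimit from 'express-rate-limit';

const app = express();
app.use(express.json({ limit: '2mb' })); // transport-level body cap, defense-in-depth

const submitLimiter = rateLimit({
  windowMs: 60_000, // 1 minute
  limit: 60, // baseline: 60 submits/min per session
  keyGenerator: (req) => (req.body?.sessionId as string) ?? req.ip ?? 'unknown',
});

app.post('/submit-tool-result', submitLimiter, (req, res) => {
  // ...delegate to the agent server's submit handler.
});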

Service identification in logs

If you run multiple AgentServer instances in a fleet (per-service, per-region, etc.), set AgentServerConfig.serviceName to disambiguate log messages. The unauthenticated-mode warning (and future server-level logs) include the service name so on-call can triage which deployment is misconfigured without grepping hostname-per-process.
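For example:

ts
const server = new AgentServer({
  // ...executor, agents, stateStore, streamManager, authenticate...
  serviceName: 'api-v2', // tags every server-level log event for this deployment
});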

When to use

Some tool calls can't or shouldn't run on the server:

  • Browser-only capabilities — manipulating live editors, the DOM, indexedDB, wallet signing, etc.
  • Client-side state — tools that need access to the user's local data.
  • Human-in-the-loop (future) — approval dialogs, parameter editing, manual result entry. Today, client-executed tools provide the primitive that future HITL patterns will build on.

Declaring a client tool

Use execute: 'client' on defineTool:

typescript
import { defineTool } from '@helix-agents/core';
import { z } from 'zod/v4';

export const editContentTool = defineTool({
  name: 'editContent',
  description: 'Apply structured edits to the current document.',
  inputSchema: z.object({
    edits: z.array(
      z.object({
        selector: z.string(),
        replacement: z.string(),
      })
    ),
  }),
  outputSchema: z.object({
    applied: z.number().int().min(0),
    failed: z.number().int().min(0),
    newVersionId: z.string().optional(),
  }),
  execute: 'client',
});

Mixing execute: 'client' with finishWith: true is rejected at defineTool time.

Wire the client (v7)

v7 deletes the bespoke HelixChatTransport and uses AI SDK v6's native DefaultChatTransport plus two helpers from @helix-agents/ai-sdk:

  • prepareHelixChatRequest({ api, resumeFromSequence, existingMessageId }) — the request builder you pass to DefaultChatTransport's prepareSendMessagesRequest hook. Drives stream-close-and-reopen resume.
  • useResumeClientTools({ chat, toolHandlers }) — the React hook that watches for tool-{name} parts in state: 'input-available', invokes your handler, and posts the result via chat.addToolOutput.
tsx
'use client';

import { useMemo } from 'react';
import { useChat } from '@ai-sdk/react';
import { DefaultChatTransport } from 'ai';
import {
  prepareHelixChatRequest,
  useResumeClientTools,
} from '@helix-agents/ai-sdk/react';

// Hooks must run inside a component. `snapshot` comes from your resume
// bootstrap (server-rendered prop or fetch); it's absent on a fresh session.
export function ChatView({
  snapshot,
}: {
  snapshot?: { streamSequence?: number; existingMessageId?: string };
}) {
  const transport = useMemo(
    () =>
      new DefaultChatTransport({
        api: '/api/chat',
        prepareSendMessagesRequest: prepareHelixChatRequest({
          api: '/api/chat',
          resumeFromSequence: snapshot?.streamSequence,
          existingMessageId: snapshot?.existingMessageId,
        }),
      }),
    [snapshot]
  );

  const chat = useChat({ transport });

  useResumeClientTools({
    chat,
    toolHandlers: {
      editContent: async (input, { toolCallId, abortSignal }) => {
        // Server validated `input` against the tool's inputSchema; safe to use.
        return await runEditsOnClient(input, { signal: abortSignal }); // your impl
      },
    },
    onError: (err, { toolName, toolCallId }) => {
      console.warn('client tool failed', toolName, toolCallId, err);
    },
  });

  // ...render chat.messages here.
  return null;
}

The hook replaces ~280 LOC of manual dispatcher scaffolding (the processedToolCallsRef + seededForSessionRef + 500ms forceUpdate pattern v6 customers maintained). Failures are surfaced to AI SDK automatically via chat.addToolOutput({ state: 'output-error', ... }), so the LLM sees a normal failed tool result on the next step.

Approval-gated tools (requireApproval) emit tool-approval-request parts that useResumeClientTools intentionally skips — those route through chat.addToolApprovalResponse(...). See the approval gates guide for the approval UX.

See the canonical reference at examples/client-tools/.

Direct API (without Vercel AI SDK)

typescript
import { JSAgentExecutor } from '@helix-agents/runtime-js';

const executor = new JSAgentExecutor(stateStore, streamManager, llmAdapter, {
  logger: myLogger, // optional
  agentRegistry: { [myAgent.name]: myAgent }, // enables output-schema validation on submit
});
const handle = await executor.execute(agent, { message: 'edit it' }, { sessionId });

// Wait for the run to suspend at the client-tool boundary.
const result = await handle.result();
if (result.status === 'suspended_client_tool') {
  // Submit the client-side result (kind discriminator was added in v7).
  await executor.submitToolResult({
    kind: 'client-tool-result',
    sessionId: handle.sessionId,
    toolCallId: result.suspended.toolCallIds[0],
    result: { applied: 3, failed: 0 },
  });

  // Explicitly resume — submission no longer wakes an in-memory waiter.
  const resumed = await executor.resume({ sessionId: handle.sessionId });
  await resumed.result();
}

The kind discriminator selects the variant. For client tool results, kind: 'client-tool-result' is optional (it's the default — v6 callers that submit { sessionId, toolCallId, result } without kind still parse correctly). For approval-gate decisions, kind: 'approval-response' is required along with the approvalId from the tool_approval_request chunk.

Wire agentRegistry so the framework validates client-submitted results against each tool's outputSchema. The registry accepts a Record<agentType, AgentConfig>, a Map, or a (agentType) => AgentConfig | undefined callback for dynamic resolution. Without it, submit-side validation is skipped — clients can submit arbitrary shapes and the LLM may receive malformed data.
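For dynamic resolution, the callback form might look like this (myAgentCatalog is an illustrative Map you maintain, not a framework API):

ts
const executor = new JSAgentExecutor(stateStore, streamManager, llmAdapter, {
  // Resolve agent configs lazily; return undefined for unknown types.
  agentRegistry: (agentType) => myAgentCatalog.get(agentType),
});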

Routing invariant

Consumers never submit against a sub-agent's sessionId. Always use the root sessionId. The framework maintains SessionState.clientToolCallOwnership on root sessions and routes the submission to the owning sub-agent transparently.

If your agent tree is:

  • Root session parent-abc
  • Sub-agent session child-xyz (with rootSessionId: 'parent-abc')

...and the sub-agent's LLM emits a client tool call tc-42, the client still submits with:

typescript
executor.submitToolResult({
  kind: 'client-tool-result',
  sessionId: 'parent-abc', // ALWAYS the root
  toolCallId: 'tc-42',
  result: ...,
});

The framework resolves clientToolCallOwnership['tc-42'] → 'child-xyz' and writes the response into the owning sub-agent's pendingClientToolCalls durable map. The next executor.resume({ sessionId: 'parent-abc' }) cascades down to the suspended child via the same ownership table. (For runtime-dbos, which uses workflow-native messaging, the dispatch is DBOS.send to the owner's workflow ID resolved from clientToolCallOwnership — see the runtime-dbos client tools doc.)

Per-runtime durability (v7)

| Property | runtime-js | runtime-cloudflare (DO) | runtime-temporal | runtime-cloudflare (Workflow) | runtime-dbos |
| --- | --- | --- | --- | --- | --- |
| HITL support in v7.0 | Full | Full | Full | Full | Full (DBOS-native) |
| Durable across restart | Yes (state store) | Yes (DO SQLite + state store) | Yes (state store; workflow exits on HITL) | Yes (workflow replay + state store) | Yes (workflow replay) |
| Max practical wait | State-store TTL | State-store TTL | State-store TTL (workflow exits on HITL) | State-store TTL (workflow exits on pause) | Days+ |
| Submit routing | submitToolResult + resume | submitToolResult + resume | submitToolResult (durable) + resume (new instance) | submitToolResult + resume (new instance) | DBOS.send (idempotent) |
| Hibernation guard | n/a | Removed in v7 (DOs free to evict) | n/a (workflow exits on HITL) | n/a (workflow exits on pause) | n/a |

In v7, the in-memory wait is gone on every runtime. runtime-js is durable across process restarts (state lives in the state store, not in the JS heap). The Cloudflare DO hibernation guard that v6 used to keep DOs awake during pauses is removed; alarm-driven deadline enforcement happens in findExpiredPending at request time.

Temporal uses the v7 stateless model: the workflow exits at HITL boundaries with status: 'suspended_*' and durable suspension state via commitSuspendedStep. submitToolResult is a durable write only — no Temporal signal is sent (the workflow has already exited). executor.resume() starts a NEW workflow instance with workflow ID suffix __resume-${N}; the new workflow's mode='resume' branch calls applyResultsAndReload to drain submitted results into the message log, fire onMessage + afterTool hooks, and proceed.

Cloudflare Workflows uses the v7 stateless model: the workflow exits at HITL boundaries with status: 'suspended_*' and durable suspension state via commitSuspendedStep. executor.resume() starts a new workflow instance with mode: 'resume' that drains queued submissions via applyResultsAndReload and continues. This eliminates billable wall-time during HITL waits (~80% reduction on multi-minute approvals).

DBOS specifics

runtime-dbos uses DBOS Transact's native messaging primitives — DBOS.recv for the suspension and DBOS.send for the wake-up — backed by the dbos.workflow_events table in Postgres. There is no separate timer / alarm subsystem: the recv deadline is a workflow-body argument, and the platform handles both the suspend and the timeout. runtime-dbos is the only v7.0 HITL-capable runtime that does not participate in the unified suspensionContext model — it suspends inside the workflow body, not at request boundaries.

  • Recv re-arms naturally on replay. When the DBOS process restarts mid-wait, the workflow body re-runs from the top; deterministic @DBOS.step calls return cached results, and the await DBOS.recv(toolCallId, deadlineSec) re-arms exactly where it was. Buffered messages from DBOS.send are consumed by the re-armed recv. No runtime_restarted error path is needed.
  • Submit idempotency is platform-level. DBOSAgentExecutor.submitToolResult passes the toolCallId as the idempotencyKey argument to DBOS.send, so duplicate submits are deduplicated by DBOS itself. The framework's pendingClientToolCalls[toolCallId].submittedAt check is defense-in-depth.
  • Persistent mode shares the workflow. Persistent agents use DBOS.recv('inbox') to drive turns and DBOS.recv(toolCallId) to wait for client-tool results. Both run in the same long-lived workflow on different topics — they don't interfere, and idle-TTL hibernation works on either.
  • Sub-agent ownership routing dispatches to a workflow ID. routeSubmitToolResult reads clientToolCallOwnership[toolCallId] from the root session, resolves the owner's workflow ID via resolveOwnerWorkflowId (deterministic for standard mode; from state.metadata.dbosWorkflowId for persistent mode), and calls DBOS.send to that workflow.
  • Interrupt during a wait. interrupt() cancels the workflow; the recv rejects. Cleanup runs in lifecycle/interrupt.ts (clearPendingClientToolsOnInterrupt, called immediately after DBOS.cancelWorkflow), NOT from the workflow body's catch path — once DBOS.cancelWorkflow has fired, every @DBOS.step() call from inside the workflow throws via DBOS's checkIfCanceled rejection, so the workflow body cannot reach clearPendingClientToolCallStep. The cleanup talks directly to the state store (loadState/saveState) and clears both pendingClientToolCalls (on owner) and clientToolCallOwnership (on root). Late submits after cancellation return unknown_tool_call, not already_completed. See packages/runtime-dbos/docs/client-tools.md#limitations for the rationale.

For the full set of races and recovery paths, see packages/runtime-dbos/docs/race-conditions.md and the DBOS-specific guide at packages/runtime-dbos/docs/client-tools.md.

Timeouts

Every suspension has a deadline. Configure per-tool:

typescript
defineTool({
  name: 'editContent',
  // ...
  execute: 'client',
  clientToolTimeoutMs: 60_000, // 60 seconds
});

Fallback chain: per-tool → AgentConfig.clientToolTimeoutMs → DEFAULT_CLIENT_TOOL_TIMEOUT_MS (5 minutes).

On timeout, the runtime emits a tool_end chunk with error: 'client_tool_timeout'. The LLM sees a normal failed tool result and can react (retry, give up, report to the user).
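The per-agent fallback sits on the agent config; a minimal example:

typescript
const approvalAgent = defineAgent({
  // ...
  clientToolTimeoutMs: 120_000, // 2 minutes for any client tool without its own timeout
});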

HTTP contract

In v7, the canonical chat-handler route is POST /chat/{id}/submit-tool-result (registered by AgentServer's chat handler middleware). The legacy executor-level POST /submit-tool-result route still works but does not emit chat-specific resume cookies.

Content-Type: application/json

{
  "kind": "client-tool-result",
  "sessionId": "root-session-id",
  "toolCallId": "tc_123",
  "result": {...}
}

Or for errors:

{
  "kind": "client-tool-result",
  "sessionId": "root-session-id",
  "toolCallId": "tc_123",
  "error": "user_cancelled"
}

For approval-gate responses:

{
  "kind": "approval-response",
  "sessionId": "root-session-id",
  "toolCallId": "tc_123",
  "approvalId": "<runId>::tc_123",
  "approved": true
}

Responses:

  • 200 { "status": "accepted" } — submission recorded.
  • 200 { "status": "already_completed" } — tool already had a result (idempotent).
  • 404 { "status": "unknown_tool_call" } — no pending call matches.
  • 400 { "error": "...", "code": "INVALID_REQUEST", "details": [...] } — payload failed the envelope schema (missing sessionId / toolCallId, bad JSON, etc).
  • 400 { "error": "...", "code": "INVALID_RESULT", "toolName": "...", "toolCallId": "...", "issues": "..." } — the submitted result didn't match the owning tool's outputSchema. The error message includes the Zod failure detail so the client can surface it to the user or retry. Error-submissions (error set, result undefined) skip this validation — they're always accepted regardless of schema.
  • 411 { "error": "length_required", "code": "LENGTH_REQUIRED" } — request had no Content-Length header and a non-empty Transfer-Encoding header (typically chunked). Consumers MUST declare Content-Length on submits so the framework's body-size gate can enforce the configured cap before parsing. See the Security section above for rate-limit / body-size guidance.
  • 413 { "error": "payload_too_large", "code": "PAYLOAD_TOO_LARGE" } — declared Content-Length exceeded 4× the configured maxResultBytes (default 1 MB × 4 = 4 MB). Raise the limit via createSubmitToolResultSchema({ maxResultBytes: ... }) and pass the resulting schema as AgentServerConfig.submitToolResultSchema. Typed client-side error: HelixPayloadTooLargeError from @helix-agents/ai-sdk.

Programmatic error classification

Consumers handling tool-result errors should prefer the exported type guards over string-matching on errorDetail.code:

ts
import { isKnownErrorCode, isClientToolErrorCode } from '@helix-agents/core';

if (!result.success && result.errorDetail?.code) {
  if (isClientToolErrorCode(result.errorDetail.code)) {
    // 'client_tool_timeout' | 'aborted' | 'runtime_restarted' | 'disposed' | 'validation_failed'
    handleClientToolFailure(result);
  } else if (isKnownErrorCode(result.errorDetail.code)) {
    // Framework error code (provider_*, tool_*, state_*, etc.)
    handleFrameworkError(result);
  } else {
    // Extension code — custom integration or future addition
    handleCustomError(result);
  }
}

The guards narrow result.errorDetail.code: ErrorCode | (string & {}) to the specific unions, so downstream switch statements get compile-time exhaustiveness checks. KNOWN_ERROR_CODES is derived from the ErrorCode union via a compile-time completeness assertion — new codes added to the framework automatically land in the Set.

Durability of already_completed

After this release, already_completed is a durable signal, not best-effort. The framework persists a completedClientToolCalls marker on the root session whenever ownership is cleared (i.e., when a client tool completes). Duplicate submits within AgentConfig.completedTombstoneRetentionMs (default 24h) return already_completed regardless of:

  • The runtime restarting (Node process crash, deploy)
  • The Cloudflare Durable Object being evicted
  • The Temporal worker being replaced
  • The in-memory completed cache (10-min TTL) expiring

Past the retention horizon, duplicates surface as unknown_tool_call (HTTP 404). Both responses should be treated identically by clients — both indicate "the framework does not need this submission; treat as no-op." The framework's canonical helix-chat-transport.ts already gates on !res.ok && res.status !== 404 and applies this no-op pattern; consumers writing custom transports should mirror it.
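A custom transport might apply that pattern like so (a hedged sketch; fetch with a string body sets Content-Length automatically, which also satisfies the 411 gate described above):

ts
async function submitClientToolResult(body: {
  sessionId: string;
  toolCallId: string;
  result?: unknown;
  error?: string;
}): Promise<void> {
  const res = await fetch('/submit-tool-result', {
    method: 'POST',
    headers: { 'content-type': 'application/json' },
    body: JSON.stringify({ kind: 'client-tool-result', ...body }),
  });
  if (!res.ok && res.status !== 404) {
    // Genuine failure (auth, validation, payload size) — surface it.
    throw new Error(`submit failed: ${res.status} ${await res.text()}`);
  }
  // 200 accepted, 200 already_completed, 404 unknown_tool_call → all no-ops here.
}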

Configuring retention

ts
const myAgent = defineAgent({
  // ...
  completedTombstoneRetentionMs: 60 * 60_000, // 1 hour
  // Defaults to 24h. Generous for any reasonable browser-retry /
  // service-worker-retry / deploy-rollover window. Lower it if you
  // have unusual storage or forensic constraints.
});

Storage cost

Markers are persisted in the root session's completedClientToolCalls JSON map (Postgres JSONB / D1 + DO SQLite TEXT / Redis hash field). Each marker entry is a {toolCallId, completedAt} pair, ~50-100 bytes serialized. Markers are reaped opportunistically on the next clearOwnership (the next time a client tool completes on the same session). Steady-state size is bounded by retention × concurrent-call-rate.

Migration

Schema migrations run automatically on first startup of the new runtime. They are additive (new column with default NULL); old code reading the new column ignores it, and old rows return completedClientToolCalls: undefined, which falls through to today's behavior. No data backfill required. Rolling deploys are safe in either direction.

  • Postgres: migration v5 adds completed_client_tool_calls JSONB.
  • Cloudflare D1: migration V8 adds completed_client_tool_calls TEXT.
  • Cloudflare DO SQLite: migration v4 adds completed_client_tool_calls TEXT.
  • Redis: new hash field completedClientToolCalls. No migration needed — Redis hash fields are dynamic.
  • In-memory: passthrough, no schema concept.

Known limitations

  • Remote sub-agents cannot emit client tool calls in v1 — the submit path has no route to the remote server's pending state. Future work.
  • runtime-js durability is best-effort; orphan calls resolve with runtime_restarted on resume after a process restart. For durable client-tool waits, use runtime-temporal, runtime-dbos, or runtime-cloudflare.
  • Mid-execute suspend (ctx.suspend()) is not part of the public API in v1. Tools may opt into client execution only via execute: 'client' (the "whole body is a suspend" shape). The internal plumbing is designed to permit ctx.suspend() as a future addition without breaking changes.
  • Error message localization: humanizeClientToolError() emits English text that's delivered verbatim to the LLM. Consumers running agents in non-English contexts will see English error messages in transcripts. There is no injection point for custom humanizers today — if your deployment needs localized error text, intercept the tool-result message in an afterTool / onMessage hook and rewrite the error field before it reaches the next LLM turn. Full humanizer injection is planned for a later release.

Operating in Production

Runbook for diagnosing and remediating client-executed tool issues in production. All runtimes emit the same structured Logger events, so the diagnostic flow is runtime-agnostic unless noted.

Monitoring

The framework emits structured Logger events (info / warn / error via the configured Logger) and OpenTelemetry spans via tracing-langfuse. Wire a Logger (pino, winston, or any compatible structured logger) into every JSAgentExecutor, TemporalAgentExecutor, DBOSAgentExecutor, Cloudflare executor, and AgentServer so the events below land in your log aggregator.

Log-level configuration. client_tool.suspended and client_tool.submitted are emitted at debug level (they fire on every client-tool call — 2× per call — and would flood at high volume under default info-level loggers). Timeout / abort / ownership-retry events remain at warn / error. To debug a specific "why is this tool stuck?" scenario, raise the logger's level to debug:

ts
// pino
const logger = pino({ level: process.env.HELIX_LOG_LEVEL ?? 'info' });
// Then deploy with HELIX_LOG_LEVEL=debug for a narrow window.

If your deployment can't restart-to-debug, consider adding a wrapper logger that demotes info → debug only for client_tool.* events while leaving the rest at info. Full per-event log-level control is planned for a later release.
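A sketch of such a wrapper, assuming the framework's Logger shape is debug/info/warn/error(message, data?) (the import location is an assumption):

ts
import type { Logger } from '@helix-agents/core'; // assumed export location

// Demote client_tool.* info events to debug; pass everything else through.
function demoteClientToolInfo(base: Logger): Logger {
  return {
    ...base,
    info: (message, data) =>
      message.startsWith('client_tool.')
        ? base.debug(message, data)
        : base.info(message, data),
  };
}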

Multi-service deployments. Set AgentServerConfig.serviceName to a stable identifier (e.g. "api-v2", "worker-pool-west"). AgentServer wraps the logger via withContext({ service: serviceName }) so every live telemetry event carries the tag. Fleet-wide aggregators can then filter on service to triage which deployment is emitting timeouts.

Key log events to watch:

  • client_tool.suspended (debug) — tool call suspended awaiting submit; payload includes sessionId, runId, toolCallId, toolName, deadlineAt.
  • client_tool.submitted (debug) — submit landed; payload includes sessionId, toolCallId, hasError (true when the client submitted error instead of result).
  • client_tool.timeout (warn) — deadline expired without a submit; payload includes sessionId, toolCallId, deadlineAt.
  • client_tool.aborted (warn) — agent interrupted/aborted while waiting.
  • client_tool.ownership_write_failed / client_tool.ownership_clear_failed (error) — OCC (optimistic-concurrency) retry budget exhausted while writing the ownership map on the root session. Usually indicates sustained contention or a state-store outage.
  • client_tool.ownership_retry (debug) — per-attempt OCC retry; turn on debug logging if you need to correlate retries with a failure.
  • client_tool.send_event_failed / client_tool.signal_failed (error) — Cloudflare Workflow sendEvent, Temporal signal, or DBOS DBOS.send failed after the status check passed. Submit payload was persisted, but the waiting runtime did not wake up.
  • client_tool.validation_failed (warn) — client submitted a malformed result that failed the tool's outputSchema. Only emitted when the executor was configured with an agentRegistry (otherwise submit-side validation is skipped).
  • client_tool.submit_unknown (warn) — H5: submit couldn't be routed to a pending call. Payload includes sessionId, toolCallId, and a reason discriminator: root_state_not_found (session doesn't exist), ownership_map_absent (no ownership map on root), or ownership_missing_toolcall (ownership exists but this toolCallId has no entry). A trickle is normal during shard rebalancing; persistent volume indicates mis-routing (see completed cache scoping below).
  • client_tool.clear_failed_exhausted (error) — N5: OCC retry budget exhausted while clearing a pending entry. Entry is stranded in state; next orphan drain may emit a duplicate tool-result message. Manual cleanup may be needed.
  • client_tool.orphan_drain_clear_entry_failed (debug) — V4: per-iteration pending-entry clear failed during orphan drain. Each failure is recorded at debug level; the drain emits a single warn-level summary (orphan_drain_clear_entries_summary, BB7) at the end of the loop listing the total count + first-error for aggregation. Persistent occurrences indicate a state-store outage or OCC thrash.
  • client_tool.orphan_drain_clear_entries_summary (warn) — BB7 (round 29): summary line emitted when any per-iteration clears failed during a single drain. Payload carries failedCount, firstToolCallId, firstError. Replaces the previous one-warn-per-failure pattern that flooded logs during sustained state-store outages.
  • client_tool.orphan_drain_bulk_clear_failed (error) — Z4 + BB7: the post-loop bulk clear of the pending map failed. The per-iteration clears remain authoritative; the next resume will drain any leftover entries (the BB6 idempotency guard prevents duplicate tool-result messages). Operator action:
    1. Check state-store health — connection pool saturation, CPU, error rate.
    2. If the error is StaleStateError, a concurrent writer is modifying the session. Look for a runaway retry loop or a second runtime attached to the same sessionId.
    3. Cross-reference the orphan_drain_clear_entries_summary event for the same drain — co-occurrence means the store is degraded for this session across both paths.
    4. Sustained >1% of drains over 5 min — page the state-store on-call.
  • client_tool.orphan_drain_clear_ownership_failed (warn) — H3: clearing the root-session clientToolCallOwnership entry for a drained tool failed. Ownership entry may leak on the root session; the next submit against the same toolCallId will return unknown_tool_call (or re-associate if a new pending call registers).
  • client_tool.orphan_recovery_started / client_tool.orphan_resolved (warn) — S1: emitted when a JS process restart leaves pending calls that must be drained with runtime_restarted. orphan_recovery_started fires once per drain with the count; orphan_resolved fires once per entry with outcome: 'inherited' or the error string.
  • submit_tool_result.body_too_large (warn) — N4: Cloudflare DO rejected a submit with HTTP 413 because the declared Content-Length exceeded 4× the active result-byte limit. Indicates either a misbehaving client or a need to raise the schema's maxResultBytes via createSubmitToolResultSchema({ maxResultBytes }).
  • submit_tool_result.schema_limits_unreadable (debug) — X1: the consumer-supplied submitToolResultSchema could not expose its SUBMIT_SCHEMA_LIMITS stamp, so the framework fell back to DEFAULT_SUBMIT_MAX_RESULT_BYTES. Fires at most once per Durable Object lifetime (AA2 warn-once guard) to avoid flooding. Action: rebuild the schema via createSubmitToolResultSchema({...}) or re-stamp the symbol after wrapping; or, if default limits are intentional, ignore.

Privacy note: log payloads carry sessionId and toolCallId — stable identifiers that can be joined with user-identifying state. Treat them as PII-adjacent when forwarding to third-party log aggregators; configure field-level scrubbing as appropriate.

AA10 + BB17 (submitted error strings) — the framework does NOT auto-redact: error strings passed via submitToolResult({ error }) are persisted to conversation history as tool-result messages and forwarded to the LLM on the next step so it can react (retry, ask the user, give up). They are also emitted on the stream as the error field of the tool_end chunk.

There is no framework-level scrubber. There is no configuration flag to enable one. Sanitization is entirely the consumer's responsibility. If your tool error path could surface secrets or PII (stack traces, auth tokens, email addresses, internal IDs, database error messages containing row-level data), you MUST sanitize before calling submitToolResult (a minimal sanitizer sketch follows the list below). Keep in mind where the string travels:

  • Assume the string will be seen by the LLM. Phrasing the error conversationally ("the user cancelled the confirmation dialog") helps the model recover gracefully; machine-readable codes (confirm_cancelled) are opaque to it. The framework's built-in codes (client_tool_timeout, aborted, runtime_restarted, disposed, validation_failed) are mapped to humanized paragraphs via humanizeClientToolError(); user-supplied errors pass through verbatim.
  • The string is persisted to conversation history. Session snapshots, conversation exports, fine-tuning-dataset dumps, and anything reading getAllMessages(sessionId) will carry it.
  • The string is written to the stream. Anything listening to the stream (the browser, the caller UI, a relay service) sees it in the clear.
  • The string is logged. client_tool.orphan_resolved and tool_end payloads carry the humanized text; the raw machine code (if one of the built-ins) is preserved in the errorCode field for aggregation.
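A minimal sanitizer sketch, shown against the direct executor API. toSafeToolError is an assumed helper, and executor, input, sessionId, toolCallId, and runEditsOnClient are stand-ins for your own plumbing:

ts
// Map raw client-side failures to short, conversational, secret-free strings.
function toSafeToolError(err: unknown): string {
  if (err instanceof DOMException && err.name === 'AbortError') {
    return 'the user cancelled the operation before it finished';
  }
  // Never forward raw messages by default — they can carry tokens, file paths,
  // stack traces, or row-level data from downstream errors.
  return 'the client-side tool failed; it may be worth retrying';
}

try {
  const result = await runEditsOnClient(input);
  await executor.submitToolResult({
    kind: 'client-tool-result',
    sessionId,
    toolCallId,
    result,
  });
} catch (err) {
  await executor.submitToolResult({
    kind: 'client-tool-result',
    sessionId,
    toolCallId,
    error: toSafeToolError(err),
  });
}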

Metrics: CLIENT_TOOL_WAIT_MS_METRIC records the wall-clock duration of each client-tool wait, keyed per tool call at resolve time. The metric stores type: 'client_tool_wait_ms' (→ source_type column) with the owning tool's name in source_name. The data JSON column's value field holds the durationMs.

K2 outcome counters (wired in M2 remediation round): one-per-occurrence counters for the three most common failure modes. Aggregate with SUM(value) (or COUNT(*)) for rate dashboards / alerts.

  • client_tool_timeout — deadline elapsed before a submit arrived.
  • client_tool_abort — session was interrupted or aborted while suspended.
  • client_tool_runtime_restarted — orphan drained after a process restart (runtime-js) or DO eviction (Cloudflare DO).
  • schema_limits_fallback (CC5/BB15, Cloudflare DO only) — submit landed with submitToolResultSchema that did not expose its SUBMIT_SCHEMA_LIMITS stamp; the Content-Length gate fell back to DEFAULT_SUBMIT_MAX_RESULT_BYTES (1 MB × 4). Batched in-memory and flushed at thresholds / alarms to avoid per-submit INSERT pressure (CC8). The associated log event submit_tool_result.schema_limits_unreadable fires at most once per DO lifetime; this counter is the continuous fleet signal.
  • usage_store_factory_fallback (CC5, Cloudflare DO only) — consumer-supplied usageStore factory was unavailable and the resolve fell back to the internal DOUsageStore. Fires on both the initial failure AND every subsequent resolve within the negative-cache window (5s default, configurable via DurableObjectAgentConfig.usageStoreFactoryNegativeCacheMs). The associated warn event usage_store.consumer_factory_failed fires at most once per healthy→failing transition per DO lifetime; this counter is the continuous fleet signal.

Each counter uses the same source_type / source_name convention as CLIENT_TOOL_WAIT_MS_METRIC: source_name is the tool name, data.value is 1. Example aggregations:

sql
-- Postgres: per-tool timeout rate (timeouts / total waits) over the last hour
WITH waits AS (
  SELECT source_name AS tool_name, COUNT(*) AS total
  FROM __agents_usage
  WHERE source_type = 'client_tool_wait_ms'
    AND timestamp > (extract(epoch from now()) * 1000 - 3600000)
  GROUP BY source_name
),
timeouts AS (
  SELECT source_name AS tool_name, SUM((data->>'value')::int) AS timed_out
  FROM __agents_usage
  WHERE source_type = 'client_tool_timeout'
    AND timestamp > (extract(epoch from now()) * 1000 - 3600000)
  GROUP BY source_name
)
SELECT w.tool_name,
       w.total AS total_waits,
       COALESCE(t.timed_out, 0) AS timeouts,
       ROUND(100.0 * COALESCE(t.timed_out, 0) / w.total, 2) AS timeout_pct
FROM waits w
LEFT JOIN timeouts t USING (tool_name)
ORDER BY timeout_pct DESC;

Cardinality guidance: per-tool breakdowns via source_name are stored row-by-row in the usage table and aggregated by your SQL layer — low cost. If you export these counters to a time-series system (Datadog, Prometheus), aggregate AT TYPE LEVEL (drop source_name) unless your tool set is small (<50 tools) — otherwise per-tool cardinality can cost more than the signal is worth.

Other alert-worthy thresholds (tune to your baseline):

  • rate(client_tool_timeout[5m]) > 0.1 per session — sustained timeout rate above 10% usually indicates a tool or client pathology.
  • rate(client_tool.submit_unknown[5m]) > 1 per session — above ~1/min per session suggests mis-routed submits.
  • rate(client_tool.clear_failed_exhausted[1h]) > 0 — any occurrence is worth investigating (state inconsistency risk).

Example aggregations (assumes the executor is wired with a store-backed UsageStore):

sql
-- Postgres: p50 / p95 wait per tool over the last hour
SELECT source_name AS tool_name,
       percentile_cont(0.5) WITHIN GROUP (ORDER BY (data->>'value')::float) AS p50_ms,
       percentile_cont(0.95) WITHIN GROUP (ORDER BY (data->>'value')::float) AS p95_ms,
       COUNT(*) AS wait_count
FROM __agents_usage
WHERE kind = 'custom'
  AND source_type = 'client_tool_wait_ms'
  AND timestamp > (extract(epoch from now()) * 1000 - 3600000)
GROUP BY source_name
ORDER BY p95_ms DESC;
sql
-- D1 (SQLite): average wait per tool over the last hour
SELECT source_name AS tool_name,
       AVG(CAST(json_extract(data, '$.value') AS REAL)) AS avg_ms,
       COUNT(*) AS wait_count
FROM __agents_usage
WHERE kind = 'custom'
  AND source_type = 'client_tool_wait_ms'
  AND timestamp > (strftime('%s', 'now') * 1000 - 3600000)
GROUP BY source_name
ORDER BY avg_ms DESC;

Traces: when tracing-langfuse is wired up, the client-tool wait is captured as a span nested under the tool-execution span. Use the toolName and toolCallId span attributes to correlate with the Logger events above.

Diagnosing a timeout

Symptom: user reports "the agent said my tool timed out" or the LLM reports an error: 'client_tool_timeout' tool result on the transcript.

  1. Look up the session's logs for client_tool.timeout. Capture toolCallId and deadlineAt.
  2. Cross-reference with client_tool.suspended for the same toolCallId to see sessionId, runId, and the original deadline. The wall-clock wait is deadlineAt - suspended.timestamp.
  3. Check whether the client ever submitted:
    • No client_tool.submitted for that toolCallId before the timeout → the client never submitted. Investigate the browser / client-side error state, authentication, or network path.
    • client_tool.submitted arrived after the timeout → submit landed past the deadline. Either raise clientToolTimeoutMs (per-tool or per-agent) or fix the client-side latency.
  4. If submit arrived on time but the wait still timed out, look for:
    • client_tool.send_event_failed / client_tool.signal_failed — the submit reached the state store but the waiting runtime never woke up.
    • client_tool.ownership_write_failed — the pending entry was never fully registered; the submit returned unknown_tool_call.

Remediation:

  • Per-tool: bump clientToolTimeoutMs on the defineTool config.
  • Per-agent fallback: bump AgentConfig.clientToolTimeoutMs.
  • Global fallback: DEFAULT_CLIENT_TOOL_TIMEOUT_MS (5 minutes) is intentional; override per-agent rather than globally.

runtime_restarted in tool results

What it means: the JS runtime process restarted while a client-tool call was pending. The in-memory promise map was lost; on resume() the runtime drains every pending call with error: 'runtime_restarted' so the LLM can react rather than stalling forever.

Operator action:

  • A burst of runtime_restarted errors immediately after a deploy or process restart is expected for active runtime-js sessions.
  • For short-lived agents that tolerate a restart, no action is needed — the LLM sees a failed tool result and typically retries or gives up gracefully.
  • For long-running agents that wait on human approval (seconds to hours), move to runtime-temporal (fully durable via workflow history), runtime-dbos (fully durable via Postgres-backed workflow replay), or runtime-cloudflare (durable via DO state + alarms, or Workflow waitForEvent). These runtimes survive process restarts without losing the pending call.

Inspecting the ownership map

Use case: a submit repeatedly returns unknown_tool_call and you need to tell whether the ownership map is empty, stale, or corrupted.

Pending client-tool calls are persisted under the owning session (sessionId the tool executes in — may be a sub-agent). The routing table lives on the root session as clientToolCallOwnership: Record<toolCallId, sessionId>. Consumers always submit against the root sessionId and the framework looks up the owner.

sql
-- Postgres
SELECT session_id,
       client_tool_call_ownership,
       pending_client_tool_calls
FROM __agents_states
WHERE session_id = 'your-root-session-id';
sql
-- D1 (Cloudflare)
SELECT session_id,
       client_tool_call_ownership,
       pending_client_tool_calls
FROM __agents_states
WHERE session_id = 'your-root-session-id';

client_tool_call_ownership is JSON: keys are toolCallIds, values are the owning sessionId (root itself for top-level tool calls, or a sub-agent session). If a toolCallId your client is submitting for isn't in the map:

  • Client submitted too early. The register-pending + ownership writes race with the tool_start chunk on the stream. Retry with small backoff (see the sketch after this list).
  • The tool call already resolved. Framework returns already_completed (200). Idempotent — the first submit won.
  • State was cleared. The session ended, was branched, or the runtime crashed before persisting ownership. Submit will return unknown_tool_call; no action needed unless the client is stuck looping.
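For the first case, a small client-side backoff loop is usually enough (a hedged sketch; it stops on 2xx and treats a persistent 404 as a no-op, per the durability guidance above):

ts
async function submitWithBackoff(body: object, attempts = 4): Promise<void> {
  for (let i = 0; i < attempts; i++) {
    const res = await fetch('/submit-tool-result', {
      method: 'POST',
      headers: { 'content-type': 'application/json' },
      body: JSON.stringify(body),
    });
    if (res.ok) return; // accepted or already_completed
    if (res.status !== 404) throw new Error(`submit failed: ${res.status}`);
    await new Promise((r) => setTimeout(r, 250 * 2 ** i)); // 250ms, 500ms, 1s, 2s
  }
  // Still unknown_tool_call after backoff — state was cleared; treat as a no-op.
}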

For Cloudflare DOs, inspect the DO's SQLite via the wrangler CLI or a diagnostic endpoint — each DO's state is isolated.

Ownership write retry semantics

When a sub-agent dispatches a client-executed tool call, the framework writes a (toolCallId → owningSessionId) entry into the root session's clientToolCallOwnership map. This write is performed via compare-and-swap on the SessionState's version — concurrent writes from sibling sub-agents are serialized.

The implementation retries up to 3 attempts on transient CAS conflicts (writeOwnership, writeBothPendingAndOwnership, clearOwnership) and up to 5 attempts on clearPending. On exhaustion:

  • The exhaustion event is logged at ERROR level. Event names emitted by packages/runtime-js/src/client-tool-resolver.ts:
    • client_tool.ownership_write_failed — terminal failure of writeOwnership or writeBothPendingAndOwnership.
    • client_tool.ownership_clear_failed — terminal failure of clearOwnership.
    • client_tool.clear_failed_exhausted — terminal failure of the pending-entry clear path that runs inside waitForResult's resolve closure (the stranded entry will be retried by the next orphan drain; manual cleanup may be required if it persists).
    • client_tool.ownership_retry / client_tool.clear_pending_retry — debug-level per-attempt logs that precede the terminal error log when retries are happening.
  • The exception is rethrown to the caller. Sub-agent dispatch fails with the original error.

This is the fail-loud path — the framework does not silently drop ownership writes. If you observe the error log without a corresponding caller-visible exception, that's a bug; please file an issue.

Operationally: ownership-write exhaustion almost always indicates a distributed-lock contention issue at the state-store layer (e.g., Redis WATCH/MULTI conflicts under high concurrency, Postgres serialization failures, D1 OCC version churn). Diagnose by looking at the state-store's CAS conflict rate.

When a persistent state-store outage breaks idempotency

Setup: the orphan-drain path has three layers of defense against a mid-drain crash:

  1. Per-iteration clearPendingEntry (V4) — delete the pending entry immediately after its tool-result message lands.
  2. Post-loop bulk-clear (Z4) — sweep any entry that survived (1).
  3. Per-toolCallId idempotency guard on append (BB6, round 29) — if an entry persists to the next drain despite (1) and (2), skip the tool-result append + counter-recording when conversation history already carries a matching tool-result.

Symptom: during a sustained state-store outage (pool exhausted, network partition, replica lag), client_tool.orphan_drain_clear_entries_summary and client_tool.orphan_drain_bulk_clear_failed both fire. The next resume's drain sees leftover entries, the BB6 idempotency guard engages (the skippedAppend: true field on client_tool.orphan_resolved confirms it), and the session resumes cleanly — though the operator sees noisier logs than usual.

When it breaks: if all three layers fail simultaneously (rare, but possible if the store recovers between the broken write and the read that finds the old entry), you can end up with:

  • the assistant message carries a tool_call with toolCallId=X
  • conversation history has a tool_result with toolCallId=X AND a second tool_result with toolCallId=X (possible only if BB6 somehow failed to see the first — a store-read-inconsistency case).

Providers will reject the session at the next LLM call with an error like "invalid tool_call_id X" or "duplicate tool_call_id X".

Manual recovery:

sql
-- Postgres: find sessions with duplicate tool-result messages by toolCallId.
-- Adjust the messages-join per your schema (__agents_messages table).
SELECT session_id, tool_call_id, COUNT(*)
FROM __agents_messages
WHERE role = 'tool'
  AND tool_call_id IS NOT NULL
GROUP BY session_id, tool_call_id
HAVING COUNT(*) > 1
ORDER BY 1, 2;

Then, for each offending session, delete the duplicate rows (keep one) and reset pending_client_tool_calls to {}:

sql
-- Postgres: delete all but the lowest-id duplicate for a given pair.
DELETE FROM __agents_messages m
WHERE role = 'tool'
  AND session_id = 'YOUR_SESSION'
  AND tool_call_id = 'YOUR_TOOLCALL_ID'
  AND message_id > (
    SELECT MIN(message_id) FROM __agents_messages
    WHERE role = 'tool'
      AND session_id = 'YOUR_SESSION'
      AND tool_call_id = 'YOUR_TOOLCALL_ID'
  );

UPDATE __agents_states
SET pending_client_tool_calls = '{}'::jsonb
WHERE session_id = 'YOUR_SESSION';

For Cloudflare DO sessions, use the DO's SQLite directly via wrangler d1 execute or a diagnostic endpoint.

Prevention: the BB6 idempotency guard closes the common failure path. If you see duplicate tool-results in practice AFTER the BB6 fix shipped, it means the state store returned stale reads — escalate to the store's on-call.

Force-failing a stuck call

Use case: a pending client-tool call will never receive a submit (browser closed, device offline, client bug) and the deadline is too far out to wait.

Submit an error directly — the framework treats it the same as a client-reported error:

ts
if (!executor.submitToolResult) {
  throw new Error('Executor does not support client-executed tools');
}
await executor.submitToolResult({
  sessionId: rootSessionId, // ALWAYS the root
  toolCallId: stuckToolCallId,
  error: 'manually_failed_by_operator',
});

The pending call resolves with the provided error string, the LLM sees a normal failed tool result on the next step, and the agent can react or fail gracefully. Error-submissions skip outputSchema validation, so any string works.

Over HTTP (same effect):

POST /submit-tool-result
Content-Type: application/json

{ "sessionId": "root-session-id", "toolCallId": "tc_123", "error": "manually_failed_by_operator" }

Upgrading from v6

For the v6 → v7 client-tool model rewrite (durable suspension, the kind discriminator on SubmitToolResult, the deletion of HelixChatTransport, the new useResumeClientTools hook), see the canonical v6 to v7 migration guide. The notes below cover earlier (pre-v6) breaking changes that still apply.

Upgrading from pre-client-tool versions

If you're upgrading an existing Helix Agents deployment, most of the changes are additive and no migration is required. The numbered items below MAY require code changes:

1. AgentServer fail-closed auth (BREAKING)

AgentServer now refuses to start at construction time if neither an authenticate hook nor allowUnauthenticated: true is configured. This was introduced to prevent an accidentally-unauthenticated /submit-tool-result endpoint — which is an LLM-context data-injection surface.

Before (silently insecure):

ts
new AgentServer({
  agents: { agent1 },
  // no authenticate, no allowUnauthenticated — endpoints were open
});

After — pick one:

ts
// Option A: wire an authenticate hook.
new AgentServer({
  agents: { agent1 },
  authenticate: async ({ request, operation }) => {
    // return { ok: true, principal } or { ok: false, error, status }
  },
});

// Option B: explicit opt-in if you gate upstream (reverse proxy, API gateway).
new AgentServer({
  agents: { agent1 },
  allowUnauthenticated: true, // emits a warn log + console.warn fallback on startup
});

If you're gating upstream (reverse proxy, API gateway, service mesh) and intentionally running AgentServer unauthenticated, pass allowUnauthenticated: true to acknowledge the posture. On startup you'll see a warn log + a console.warn fallback (the latter catches operators running the default noopLogger).

See the Security section above for the three recommended auth patterns.

2. Reserved tool name prefixes (BREAKING in edge cases)

defineTool() now rejects tool names starting with subagent__ or companion__. These prefixes are reserved for framework-internal dispatch (sub-agent routing, companion tool protocol). A user-defined tool with a reserved prefix would be misrouted and fail at dispatch time — the check moves the failure to defineTool time so it's caught in tests.

If your code has defineTool({ name: 'subagent__foo', ... }) or defineTool({ name: 'companion__bar', ... }), rename it. The reserved list is exported as RESERVED_TOOL_NAME_PREFIXES from @helix-agents/core for consumer linter rules.
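If you want to catch reserved names in CI before defineTool ever runs, a small guard over your tool list works (a sketch; myTools is your own collection, and the export is assumed to be an array of prefix strings):

ts
import { RESERVED_TOOL_NAME_PREFIXES } from '@helix-agents/core';
import { myTools } from './tools'; // your tool collection

for (const tool of myTools) {
  const hit = RESERVED_TOOL_NAME_PREFIXES.find((p) => tool.name.startsWith(p));
  if (hit) {
    throw new Error(`tool "${tool.name}" uses reserved prefix "${hit}"`);
  }
}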

Framework-internal constructors (createSubAgentTool, createRemoteSubAgentTool, companion-tool factories) bypass defineTool and continue to produce tools with these prefixes as designed.

3. withContext return shape (BEHAVIOR CHANGE — narrow impact)

withContext(logger, bindings) previously returned the native .child() result directly when wrapping a pino/winston/bunyan logger. It now returns a thin adapter that translates the helix (message, data) call signature to pino's (obj, msg) convention so structured data isn't silently dropped by pino's printf path.

Consumer assertions like expect(wrapped).toBe(pinoChild) will fail. Consumer code that inspected pino-specific fields on the wrapper (wrapped.bindings, wrapped.level) no longer finds them — access the underlying pino logger directly for those needs.

Log output through the wrapper is unchanged (modulo the fix — structured data now reaches the final log record correctly).

4. Consumer usageStore factory is now cached per-session on Cloudflare DO (BEHAVIOR CHANGE)

On the Cloudflare DO runtime, DurableObjectAgentConfig.usageStore factories are now invoked exactly once per session on a given DO instance, not per call site. Non-idempotent factories (e.g. one that constructs a new DB pool per call) no longer leak N pools per session.

If your factory intentionally returned a fresh store per call (e.g. for per-call pool-reset semantics), move that construction to a wider scope. Configure the negative-cache window for factory failures via DurableObjectAgentConfig.usageStoreFactoryNegativeCacheMs (default 5s).


Scaling Limitations

The framework's client-tool path is designed for correctness first; a few known scaling cliffs exist today. If your workload approaches any of these limits, please file an issue — several are straightforward to lift with additional indexing or pagination, but haven't been exercised yet.

clientToolCallOwnership grows unbounded per root session (R3.S1)

Each pending client-tool call writes an entry into the root session's clientToolCallOwnership map. Entries are removed on submit / timeout / abort via clearOwnership, so the map size tracks concurrent pending calls — not historical call volume. A session that issues thousands of serial client-tool calls stays at O(1) owned entries, but a session that holds thousands of simultaneously pending client-tool calls (e.g. a fan-out to many human approvers at once) will grow the map linearly. The map is serialized as JSON in a single row/column, so at ~10K+ simultaneous pending calls the JSON read/write becomes the dominant cost. Practical threshold: under 1K concurrent pending calls per root, well-supported; 10K+ will work but with degraded write latency.

pendingClientToolCalls per owning session (R3.S2)

Same shape as above but on the owning session (the one whose runtime is suspended on the wait). In JS/Temporal/DBOS this is typically the root; in the DO path sub-agent DOs own their own pending entries. Same scaling envelope applies per DO / per session. If you expect deep fan-outs, batch client-tool calls at the agent layer so any single session holds only O(few) pending calls at a time.

runtime-js in-memory promise map (R3.S3)

JsClientToolResolver holds a process-local Map<toolCallId, resolver> tracking pending waits on the current process. This map is bounded by the process's concurrent pending calls across all sessions it owns. At process restart the map is lost and the runtime drains waits with runtime_restarted. For JS-runtime deployments this is the primary scale-out limit: a single process can comfortably hold tens of thousands of pending waits, but if you need multi-process durability, prefer runtime-temporal, runtime-dbos, or runtime-cloudflare.

completed cache is per-resolver-instance, not per-process (K3)

The JsClientToolResolver.completed LRU/TTL cache (default 10k entries / 10 min) is scoped to a single resolver instance, which today means a single JSAgentExecutor. Processes that run multiple executors (per-tenant isolation, sharded worker pools) get a separate cache per executor. The fast-path coherence guarantee — "a submit for session S finds the live waiter armed for S::toolCallId" — applies ONLY when the submitToolResult() call lands on the same executor instance that armed the waiter.

Routing constraint: if you shard JSAgentExecutor instances, route all submits for a given sessionId to the same shard. Cross-shard submits fall through to the state-store slow path, which is correct but slower and can produce unknown_tool_call during the submit-before-register window (the pre-submit-stub mechanism handles sub-agent cases, but not the primary race on root sessions).

Monitor: client_tool.submit_unknown events with reason: 'ownership_missing_toolcall' indicate submits landing on the wrong shard. A steady trickle is expected during normal shard rebalances; persistent volume suggests mis-routing.

Alarm-scheduler deadline recomputation is O(n) per wake (R3.S4)

AlarmScheduler.reschedule() and fireExpired() walk every subscriber (heartbeat, interrupt-poll, client-tool-deadlines) and every pending client-tool entry within client-tool-deadlines.compute(). For a DO with hundreds of simultaneously-pending client-tool calls each alarm wake performs that many durable reads. Practical threshold is unlikely to matter (clients rarely have hundreds of concurrent deadlines), but it is something to watch if you push a single DO toward fan-out at that scale. The fix, when needed, is to maintain a durable secondary index keyed by deadline so compute() can fetch MIN(deadline) in O(1).
