Checkpoints

Checkpoints are complete snapshots of agent state saved after each step. They enable time-travel debugging, crash recovery, and branching execution.

What are Checkpoints?

A checkpoint contains:

Complete agent state (messages, custom state, step count)
Agent status at that point
Timestamp of creation
Unique versioned ID

Checkpoints are created automatically after each step completes. They're stored in your state store alongside the agent state.

Checkpoint IDs

Checkpoint IDs follow a versioned format:

cpv1-{sessionId}-s{stepCount}-t{timestamp}-{random6hex}

Example: cpv1-session-abc123-s5-t1703123456789-a1b2c3

The format includes:

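cpv1 - Version prefix for forward compatibility
sessionId - The session this checkpoint belongs to
s{stepCount} - Step count when the checkpoint was created (s5 in the example)
t{timestamp} - Creation time in milliseconds since the Unix epoch
random6hex - Six-character random hex suffix for uniqueness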
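
Session IDs may themselves contain hyphens (as in the example above), so splitting an ID on `-` is not reliable. The sketch below shows one way to separate the fields with a regular expression anchored on the fixed trailing segments. It only illustrates the format and is not the library's implementation: `splitCheckpointId` and `CheckpointIdParts` are hypothetical names, and the pattern assumes the random suffix is lowercase hex.

```typescript
// Hypothetical shape for the parsed fields; not the library's own type.
interface CheckpointIdParts {
  version: number;
  sessionId: string;
  stepCount: number;
  timestamp: number;
  random: string;
}

// Anchoring on the trailing -s…-t…-{hex} segments lets the greedy (.+) group
// absorb any hyphens inside the session ID.
const CHECKPOINT_ID_PATTERN = /^cpv(\d+)-(.+)-s(\d+)-t(\d+)-([0-9a-f]{6})$/;

function splitCheckpointId(id: string): CheckpointIdParts | null {
  const match = CHECKPOINT_ID_PATTERN.exec(id);
  if (!match) return null;
  const [, version, sessionId, stepCount, timestamp, random] = match;
  return {
    version: Number(version),
    sessionId,
    stepCount: Number(stepCount),
    timestamp: Number(timestamp),
    random,
  };
}
```

Returning `null` for anything that does not match keeps callers from acting on a partially parsed ID.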

Listing Checkpoints

Get all checkpoints for a session:

typescript

const checkpoints = await stateStore.listCheckpoints(sessionId);

for (const meta of checkpoints.items) {
  console.log(`Step ${meta.stepCount}: ${meta.id}`);
  console.log(`  Status: ${meta.status}`);
  console.log(`  Created: ${new Date(meta.timestamp)}`);
}

Pagination

For sessions with many steps, use pagination:

typescript

// First page
const page1 = await stateStore.listCheckpoints(sessionId, {
  limit: 10,
  offset: 0,
});

console.log(`Showing ${page1.items.length} of ${page1.total}`);

// Next page
if (page1.hasMore) {
  const page2 = await stateStore.listCheckpoints(sessionId, {
    limit: 10,
    offset: 10,
  });
}
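
To walk every checkpoint without hard-coding offsets, the paging loop can be wrapped in an async generator. This is a minimal sketch, assuming only the `items` and `hasMore` fields and the `limit`/`offset` options shown above; `iterateCheckpoints` and the local interface names are hypothetical and not part of the library.

```typescript
// Minimal local types mirroring the fields used in the examples above.
interface CheckpointMetaLike {
  id: string;
  stepCount: number;
}

interface CheckpointPage {
  items: CheckpointMetaLike[];
  total: number;
  hasMore: boolean;
}

interface CheckpointLister {
  listCheckpoints(
    sessionId: string,
    options?: { limit?: number; offset?: number },
  ): Promise<CheckpointPage>;
}

// Hypothetical helper: yields every checkpoint, fetching one page at a time.
async function* iterateCheckpoints(
  store: CheckpointLister,
  sessionId: string,
  pageSize = 10,
): AsyncGenerator<CheckpointMetaLike> {
  let offset = 0;
  while (true) {
    const page = await store.listCheckpoints(sessionId, { limit: pageSize, offset });
    yield* page.items;
    // Stop on the last page, or on an empty page to avoid looping forever.
    if (!page.hasMore || page.items.length === 0) return;
    offset += page.items.length;
  }
}
```

A caller could then write `for await (const meta of iterateCheckpoints(stateStore, sessionId)) { ... }` and stop early with `break`, in which case no further pages are requested.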

Retrieving a Checkpoint

Get a specific checkpoint with full state:

typescript

// Get latest checkpoint
const latest = await stateStore.getLatestCheckpoint(sessionId);
if (latest) {
  console.log(`Latest at step ${latest.stepCount}`);
  console.log('Messages:', latest.state.messages.length);
  console.log('Custom state:', latest.state.customState);
}

// Get specific checkpoint
const checkpoint = await stateStore.getCheckpoint(checkpointId);
if (checkpoint) {
  console.log('State:', checkpoint.state);
}

Time-Travel

Resume from any checkpoint to "time-travel" to that state:

typescript

// List checkpoints
const checkpoints = await stateStore.listCheckpoints(sessionId);

// Pick an earlier checkpoint (e.g., step 3)
const targetCheckpoint = checkpoints.items.find((c) => c.stepCount === 3);

if (targetCheckpoint) {
  // Resume from that checkpoint
  const newHandle = await handle.resume({
    mode: 'from_checkpoint',
    checkpointId: targetCheckpoint.id,
  });

  // Agent continues from step 3
  const result = await newHandle.result();
}

Use Cases

Debugging - Replay from a specific step to understand behavior
Branching - Fork execution from a historical point
Rollback - Undo recent steps if something went wrong
What-if analysis - Try different inputs from the same state
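
For rollback and what-if analysis, the exact target step may not have a checkpoint in the list you are holding (for example, if older entries have expired or only one page was fetched). A helper that picks the newest checkpoint at or before a given step makes target selection more forgiving. This is a hypothetical sketch: `findCheckpointAtOrBefore` is not a library function, and it assumes only that listed items carry `id` and `stepCount`.

```typescript
// Only the fields needed for selection; matches the listing examples above.
interface StepMeta {
  id: string;
  stepCount: number;
}

// Returns the checkpoint with the highest stepCount that does not exceed
// targetStep, or undefined when every checkpoint is later than targetStep.
function findCheckpointAtOrBefore<T extends StepMeta>(
  items: readonly T[],
  targetStep: number,
): T | undefined {
  let best: T | undefined;
  for (const item of items) {
    if (item.stepCount > targetStep) continue;
    if (!best || item.stepCount > best.stepCount) best = item;
  }
  return best;
}
```

The function does not rely on the list being sorted, so it works regardless of the order in which a store returns checkpoints.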

Checkpoint Metadata

CheckpointMeta is a lightweight view for listing:

typescript

interface CheckpointMeta {
  id: string; // Checkpoint ID
  sessionId: string; // Session this checkpoint belongs to
  stepCount: number; // Step count when created
  timestamp: number; // Creation time (ms since epoch)
  status: AgentStatus; // Status at checkpoint time
}

The full Checkpoint includes the complete state plus recovery coordination fields:

typescript

interface Checkpoint<TState, TOutput> {
  id: string;
  sessionId: string; // Session this checkpoint belongs to
  stepCount: number;
  timestamp: number;
  state: AgentState<TState, TOutput>; // Full agent state
  messageCount: number; // Message count at checkpoint (for recovery coordination)
  streamSequence: number; // Stream sequence at checkpoint (for resumption)
}

Recovery Coordination Fields

The messageCount and streamSequence fields enable coordinated recovery after crashes or interrupts:

messageCount: Number of messages at this checkpoint. Used to truncate orphaned messages that were created after the checkpoint but before a crash.
streamSequence: Stream position at this checkpoint. Used to resume streaming from the correct position and clean up orphaned stream chunks.

These fields ensure that messages, stream chunks, and checkpoints stay synchronized during crash recovery. When resuming from a checkpoint, the runtime uses these values to:

Truncate messages beyond messageCount (removing orphaned messages)
Clean up stream chunks beyond the checkpoint's step (removing orphaned chunks)
Resume streaming from the correct sequence position
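
The three recovery steps above can be sketched as a pure function over in-memory arrays. This is an illustration of the idea under stated assumptions, not the runtime's implementation: the `StoredMessage`, `StoredChunk`, and `RecoveryPoint` shapes and the `reconcileWithCheckpoint` name are hypothetical, and it assumes each stream chunk records the step that produced it.

```typescript
// Hypothetical in-memory shapes; real stores persist these differently.
interface StoredMessage {
  content: string;
}

interface StoredChunk {
  step: number; // assumed: the step that produced this chunk
  sequence: number; // assumed: position of the chunk in the stream
}

interface RecoveryPoint {
  stepCount: number;
  messageCount: number;
  streamSequence: number;
}

interface RecoveredData {
  messages: StoredMessage[];
  chunks: StoredChunk[];
  resumeFromSequence: number;
}

function reconcileWithCheckpoint(
  messages: readonly StoredMessage[],
  chunks: readonly StoredChunk[],
  checkpoint: RecoveryPoint,
): RecoveredData {
  return {
    // Keep only the messages that existed when the checkpoint was written.
    messages: messages.slice(0, checkpoint.messageCount),
    // Drop chunks produced by steps after the checkpoint.
    chunks: chunks.filter((chunk) => chunk.step <= checkpoint.stepCount),
    // Streaming picks up at the position recorded in the checkpoint.
    resumeFromSequence: checkpoint.streamSequence,
  };
}
```

Deriving all three results from the same checkpoint is what keeps messages, chunks, and stream position consistent with each other after recovery.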

Storage Considerations

Size

Each checkpoint stores the complete agent state, including all messages. Storage can grow significantly for agents with:

Long conversations
Large custom state
Many steps

Plan your retention accordingly.
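
Because every checkpoint repeats the full message history, total checkpoint storage for a session grows roughly quadratically with step count when each step adds a similar amount of data. The back-of-the-envelope estimate below illustrates this. It assumes a constant number of bytes added per step and a fixed per-checkpoint overhead, both simplifications, and `estimateCheckpointBytes` is a hypothetical helper rather than a library function.

```typescript
// Rough estimate of total bytes held by all checkpoints in one session.
function estimateCheckpointBytes(
  steps: number,
  bytesPerStep: number,
  overheadBytes = 0,
): number {
  // Checkpoint k holds the data from steps 1..k, so the state total is
  // bytesPerStep * (1 + 2 + ... + steps) = bytesPerStep * steps * (steps + 1) / 2.
  const cumulativeState = (bytesPerStep * steps * (steps + 1)) / 2;
  return cumulativeState + overheadBytes * steps;
}
```

Under these assumptions, 100 steps that each add about 2,000 bytes come to 2,000 × 5,050 = 10,100,000 bytes of checkpoint data, compared with 200,000 bytes for the final state alone.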

Retention

Configure TTL for automatic cleanup:

typescript

// Redis store with 7-day retention
const stateStore = new RedisStateStore({
  host: 'localhost',
  ttl: 86400 * 7, // 7 days in seconds
});

Cleanup

For manual cleanup, delete a session and its checkpoints:

typescript

// Delete a session's data (including checkpoints)
await stateStore.deleteSession(sessionId);

Checkpoint Parsing

Parse checkpoint IDs to extract components:

typescript

import { parseCheckpointId, generateCheckpointId } from '@helix-agents/core';

// Parse an existing ID
const parsed = parseCheckpointId('cpv1-session-123-s5-t1703123456789-a1b2c3');
if (parsed) {
  console.log(parsed.version); // 1
  console.log(parsed.sessionId); // 'session-123'
  console.log(parsed.stepCount); // 5
  console.log(parsed.timestamp); // 1703123456789
  console.log(parsed.random); // 'a1b2c3'
}

// Generate a new ID
const newId = generateCheckpointId('session-456', 10);
// Returns: 'cpv1-session-456-s10-t{timestamp}-{random}'

Stream Events

A checkpoint_created event is emitted each time a checkpoint is saved:

typescript

for await (const chunk of stream) {
  if (chunk.type === 'checkpoint_created') {
    console.log(`Checkpoint saved: ${chunk.checkpointId}`);
    console.log(`At step: ${chunk.stepCount}`);
  }
}
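
Recording these events while streaming gives you a step-to-checkpoint index without a separate listing call, which can be handy for a rollback control in a UI. The sketch below is hypothetical: `collectCheckpointIds` and `ChunkLike` are illustrative names, and it assumes only the `type`, `checkpointId`, and `stepCount` fields shown above.

```typescript
// Loose chunk shape: only the checkpoint_created fields shown above are assumed.
interface ChunkLike {
  type: string;
  checkpointId?: string;
  stepCount?: number;
}

// Consumes the stream and returns a map from step count to checkpoint ID.
async function collectCheckpointIds(
  stream: AsyncIterable<ChunkLike>,
): Promise<Map<number, string>> {
  const byStep = new Map<number, string>();
  for await (const chunk of stream) {
    if (
      chunk.type === 'checkpoint_created' &&
      chunk.checkpointId !== undefined &&
      chunk.stepCount !== undefined
    ) {
      byStep.set(chunk.stepCount, chunk.checkpointId);
    }
  }
  return byStep;
}
```

Note that this helper drains the stream, so in practice the same bookkeeping would usually live inside the loop that also handles the other chunk types.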

StateStore Methods

Checkpoint-related StateStore methods:

Method	Description
`getCheckpoint(checkpointId)`	Get full checkpoint by ID
`getLatestCheckpoint(sessionId)`	Get most recent checkpoint
`listCheckpoints(sessionId, options?)`	List checkpoint metadata with pagination

These methods are implemented by all state stores (Memory, Redis, Cloudflare D1).

Best Practices

1. Use Checkpoints for Debugging

When an agent behaves unexpectedly:

typescript

// List checkpoints to find where things went wrong
const checkpoints = await stateStore.listCheckpoints(sessionId);

for (const cp of checkpoints.items) {
  const full = await stateStore.getCheckpoint(cp.id);
  console.log(`Step ${cp.stepCount}: ${full?.state.messages.length} messages`);
}

2. Implement Rollback UI

Let users undo agent actions:

typescript

async function rollbackToStep(sessionId: string, targetStep: number) {
  const checkpoints = await stateStore.listCheckpoints(sessionId);
  const target = checkpoints.items.find((c) => c.stepCount === targetStep);

  if (!target) {
    throw new Error(`No checkpoint at step ${targetStep}`);
  }

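  const handle = await executor.getHandle(agent, sessionId);
  if (!handle) {
    // Fail loudly instead of silently returning undefined
    throw new Error(`No handle found for session ${sessionId}`);
  }
  return handle.resume({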
    mode: 'from_checkpoint',
    checkpointId: target.id,
  });
}

3. Monitor Storage Growth

Track checkpoint storage for capacity planning:

typescript

const checkpoints = await stateStore.listCheckpoints(sessionId);
console.log(`Session has ${checkpoints.total} checkpoints`);

// For Redis, check memory usage
// For D1, check row counts

Next Steps

Interrupt and Resume - Use checkpoints for pause/resume
Distributed Coordination - Checkpoints in multi-process deployments
State Management - Understanding agent state
