Checkpoints
Checkpoints are complete snapshots of agent state saved after each step. They enable time-travel debugging, crash recovery, and branching execution.
What are Checkpoints?
A checkpoint contains:
- Complete agent state (messages, custom state, step count)
- Agent status at that point
- Timestamp of creation
- Unique versioned ID
Checkpoints are created automatically after each step completes. They're stored in your state store alongside the agent state.
Checkpoint IDs
Checkpoint IDs follow a versioned format:
cpv1-{sessionId}-s{stepCount}-t{timestamp}-{random6hex}
Example: cpv1-session-abc123-s5-t1703123456789-a1b2c3
The format includes:
- cpv1 - Version prefix for forward compatibility
- sessionId - The session this checkpoint belongs to
- s5 - Step count when created
- t... - Timestamp in milliseconds
- random6hex - Random six-digit hex suffix for uniqueness
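To make the format concrete, here is a minimal sketch that splits an ID into its components with a regular expression. The names `splitCheckpointId`, `CheckpointIdParts`, and `CHECKPOINT_ID_PATTERN` are hypothetical and exist only for this illustration; the library's own parser may work differently.

```typescript
// Hypothetical helper for illustration only; not part of the library.
interface CheckpointIdParts {
  version: number;
  sessionId: string;
  stepCount: number;
  timestamp: number;
  random: string;
}

// The session ID may itself contain hyphens, so it is matched greedily and
// the fixed-shape suffix (-s{n}-t{ms}-{6 hex}) is anchored to the end.
const CHECKPOINT_ID_PATTERN = /^cpv(\d+)-(.+)-s(\d+)-t(\d+)-([0-9a-f]{6})$/;

function splitCheckpointId(id: string): CheckpointIdParts | null {
  const match = CHECKPOINT_ID_PATTERN.exec(id);
  if (!match) return null;
  const [, version, sessionId, stepCount, timestamp, random] = match;
  return {
    version: Number(version),
    sessionId,
    stepCount: Number(stepCount),
    timestamp: Number(timestamp),
    random,
  };
}

const parts = splitCheckpointId('cpv1-session-abc123-s5-t1703123456789-a1b2c3');
console.log(parts);
```

Anchoring the suffix to the end of the string is what lets a session ID such as `session-abc123` keep its own hyphens intact.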
Listing Checkpoints
Get all checkpoints for a session:
const checkpoints = await stateStore.listCheckpoints(sessionId);
for (const meta of checkpoints.items) {
  console.log(`Step ${meta.stepCount}: ${meta.id}`);
  console.log(` Status: ${meta.status}`);
  console.log(` Created: ${new Date(meta.timestamp)}`);
}
Pagination
For sessions with many steps, use pagination:
// First page
const page1 = await stateStore.listCheckpoints(sessionId, {
  limit: 10,
  offset: 0,
});
console.log(`Showing ${page1.items.length} of ${page1.total}`);
// Next page
if (page1.hasMore) {
  const page2 = await stateStore.listCheckpoints(sessionId, {
    limit: 10,
    offset: 10,
  });
}
Retrieving a Checkpoint
Get a specific checkpoint with full state:
// Get latest checkpoint
const latest = await stateStore.getLatestCheckpoint(sessionId);
if (latest) {
  console.log(`Latest at step ${latest.stepCount}`);
  console.log('Messages:', latest.state.messages.length);
  console.log('Custom state:', latest.state.customState);
}
// Get specific checkpoint
const checkpoint = await stateStore.getCheckpoint(checkpointId);
if (checkpoint) {
  console.log('State:', checkpoint.state);
}
Time-Travel
Resume from any checkpoint to "time-travel" to that state:
// List checkpoints
const checkpoints = await stateStore.listCheckpoints(sessionId);
// Pick an earlier checkpoint (e.g., step 3)
const targetCheckpoint = checkpoints.items.find((c) => c.stepCount === 3);
if (targetCheckpoint) {
  // Resume from that checkpoint
  const newHandle = await handle.resume({
    mode: 'from_checkpoint',
    checkpointId: targetCheckpoint.id,
  });
  // Agent continues from step 3
  const result = await newHandle.result();
}
Use Cases
- Debugging - Replay from a specific step to understand behavior
- Branching - Fork execution from a historical point
- Rollback - Undo recent steps if something went wrong
- What-if analysis - Try different inputs from the same state
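For rollback or branching, an exact match on step count may not always exist, for instance if retention has already removed some checkpoints (an assumption about your deployment, not documented behavior). The sketch below picks the most recent checkpoint at or before a target step from a list of metadata. `findCheckpointAtOrBefore` and `CheckpointMetaLike` are hypothetical names introduced for this example.

```typescript
// Minimal local copy of the metadata fields this helper needs.
interface CheckpointMetaLike {
  id: string;
  stepCount: number;
  timestamp: number;
}

// Hypothetical helper: returns the most recent checkpoint whose step count
// does not exceed targetStep, or undefined if every checkpoint is later.
function findCheckpointAtOrBefore<T extends CheckpointMetaLike>(
  items: readonly T[],
  targetStep: number,
): T | undefined {
  let best: T | undefined;
  for (const item of items) {
    if (item.stepCount > targetStep) continue;
    if (
      !best ||
      item.stepCount > best.stepCount ||
      (item.stepCount === best.stepCount && item.timestamp > best.timestamp)
    ) {
      best = item;
    }
  }
  return best;
}

const sample = [
  { id: 'cp-a', stepCount: 1, timestamp: 1000 },
  { id: 'cp-b', stepCount: 2, timestamp: 2000 },
  { id: 'cp-c', stepCount: 5, timestamp: 5000 },
];
console.log(findCheckpointAtOrBefore(sample, 3)?.id); // 'cp-b'
```

The returned `id` can then be passed as `checkpointId` to `handle.resume` with `mode: 'from_checkpoint'`.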
Checkpoint Metadata
CheckpointMeta is a lightweight view for listing:
interface CheckpointMeta {
  id: string; // Checkpoint ID
  sessionId: string; // Session this checkpoint belongs to
  stepCount: number; // Step count when created
  timestamp: number; // Creation time (ms since epoch)
  status: AgentStatus; // Status at checkpoint time
}
The full Checkpoint includes the complete state plus recovery coordination fields:
interface Checkpoint<TState, TOutput> {
  id: string;
  sessionId: string; // Session this checkpoint belongs to
  stepCount: number;
  timestamp: number;
  state: AgentState<TState, TOutput>; // Full agent state
  messageCount: number; // Message count at checkpoint (for recovery coordination)
  streamSequence: number; // Stream sequence at checkpoint (for resumption)
}
Recovery Coordination Fields
The messageCount and streamSequence fields enable coordinated recovery after crashes or interrupts:
- messageCount: Number of messages at this checkpoint. Used to truncate orphaned messages that were created after the checkpoint but before a crash.
- streamSequence: Stream position at this checkpoint. Used to resume streaming from the correct position and clean up orphaned stream chunks.
These fields ensure that messages, stream chunks, and checkpoints stay synchronized during crash recovery. When resuming from a checkpoint, the runtime uses these values to:
- Truncate messages beyond messageCount (removing orphaned messages)
- Clean up stream chunks beyond the checkpoint's step (removing orphaned chunks)
- Resume streaming from the correct sequence position
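The steps above can be sketched roughly as follows. This is an illustration of the idea, not the runtime's actual recovery code: `StreamChunkLike`, `RecoveryPoint`, and `reconcileAfterCrash` are hypothetical names, and the sketch assumes that each stream chunk carries a numeric `sequence` and that `streamSequence` is the last sequence covered by the checkpoint (inclusive).

```typescript
// Hypothetical shapes for illustration; the runtime's real types may differ.
interface StreamChunkLike {
  sequence: number; // Assumed: monotonically increasing position in the stream
  data: string;
}

interface RecoveryPoint {
  messageCount: number;
  streamSequence: number; // Assumed: last sequence included in the checkpoint
}

function reconcileAfterCrash<M>(
  messages: readonly M[],
  chunks: readonly StreamChunkLike[],
  point: RecoveryPoint,
) {
  return {
    // Drop messages written after the checkpoint but before the crash.
    messages: messages.slice(0, point.messageCount),
    // Drop stream chunks emitted after the checkpoint.
    chunks: chunks.filter((chunk) => chunk.sequence <= point.streamSequence),
    // Under the inclusive assumption, streaming continues at the next position.
    resumeFromSequence: point.streamSequence + 1,
  };
}

const recovered = reconcileAfterCrash(
  ['user: hi', 'assistant: hello', 'assistant: (orphaned)'],
  [
    { sequence: 1, data: 'hel' },
    { sequence: 2, data: 'lo' },
    { sequence: 3, data: '(orphaned)' },
  ],
  { messageCount: 2, streamSequence: 2 },
);
console.log(recovered.messages.length, recovered.chunks.length, recovered.resumeFromSequence);
```

Keeping both counters on the checkpoint means messages and stream chunks can be trimmed to the same point without consulting any other record.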
Storage Considerations
Size
Each checkpoint stores the complete agent state, including all messages. For agents with:
- Long conversations
- Large custom state
- Many steps
storage can grow significantly, because every checkpoint repeats the full message history up to that step. Plan your retention accordingly.
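To get a feel for the growth, the following sketch estimates the serialized size of a state object and sums it over several steps. It assumes the state is stored as JSON, which may not match how your state store encodes data, so treat the numbers as rough. `estimateSerializedBytes` is a hypothetical helper, not a library function.

```typescript
// Hypothetical helper: approximate size of a value when serialized as UTF-8 JSON.
function estimateSerializedBytes(value: unknown): number {
  const json = JSON.stringify(value) ?? '';
  return new TextEncoder().encode(json).length;
}

// Simulate a session where each step appends one message. Every checkpoint
// contains all messages so far, so cumulative storage grows faster than the
// conversation itself.
const messages: { role: string; content: string }[] = [];
let totalBytes = 0;
for (let step = 1; step <= 5; step++) {
  messages.push({ role: 'assistant', content: `reply for step ${step}` });
  const size = estimateSerializedBytes({ messages, customState: {}, stepCount: step });
  totalBytes += size;
  console.log(`step ${step}: ~${size} bytes (cumulative ~${totalBytes})`);
}
```

Running the same estimate against a real checkpoint's `state` gives a per-checkpoint figure that you can multiply by the expected number of steps and sessions.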
Retention
Configure TTL for automatic cleanup:
// Redis store with 7-day retention
const stateStore = new RedisStateStore({
  host: 'localhost',
  ttl: 86400 * 7, // 7 days in seconds
});
Cleanup
For manual cleanup, delete a session and its checkpoints:
// Delete a session's data (including checkpoints)
await stateStore.deleteSession(sessionId);
Checkpoint Parsing
Parse checkpoint IDs to extract components:
import { parseCheckpointId, generateCheckpointId } from '@helix-agents/core';
// Parse an existing ID
const parsed = parseCheckpointId('cpv1-session-123-s5-t1703123456789-a1b2c3');
if (parsed) {
  console.log(parsed.version); // 1
  console.log(parsed.sessionId); // 'session-123'
  console.log(parsed.stepCount); // 5
  console.log(parsed.timestamp); // 1703123456789
  console.log(parsed.random); // 'a1b2c3'
}
// Generate a new ID
const newId = generateCheckpointId('session-456', 10);
// Returns: 'cpv1-session-456-s10-t{timestamp}-{random}'
Stream Events
A checkpoint_created event is emitted when checkpoints are saved:
for await (const chunk of stream) {
  if (chunk.type === 'checkpoint_created') {
    console.log(`Checkpoint saved: ${chunk.checkpointId}`);
    console.log(`At step: ${chunk.stepCount}`);
  }
}
StateStore Methods
Checkpoint-related StateStore methods:
| Method | Description |
|---|---|
| getCheckpoint(checkpointId) | Get full checkpoint by ID |
| getLatestCheckpoint(sessionId) | Get most recent checkpoint |
| listCheckpoints(sessionId, options?) | List checkpoint metadata with pagination |
These methods are implemented by all state stores (Memory, Redis, Cloudflare D1).
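For tests or local experiments, the three methods can be approximated with a small in-memory stand-in. This is a sketch under stated assumptions, not one of the library's stores: `InMemoryCheckpointStore`, `save`, `listAllCheckpoints`, and the trimmed `Meta` and `FullCheckpoint` types are hypothetical, the `status` field is omitted, and results are assumed to be ordered by ascending step count.

```typescript
// Trimmed, hypothetical types for this sketch only.
interface Meta { id: string; sessionId: string; stepCount: number; timestamp: number }
interface FullCheckpoint extends Meta { state: unknown }
interface Page<T> { items: T[]; total: number; hasMore: boolean }

class InMemoryCheckpointStore {
  private readonly byId = new Map<string, FullCheckpoint>();

  // Hypothetical write path, only needed to populate the sketch.
  save(checkpoint: FullCheckpoint): void {
    this.byId.set(checkpoint.id, checkpoint);
  }

  async getCheckpoint(checkpointId: string): Promise<FullCheckpoint | null> {
    return this.byId.get(checkpointId) ?? null;
  }

  async getLatestCheckpoint(sessionId: string): Promise<FullCheckpoint | null> {
    const all = this.forSession(sessionId);
    return all[all.length - 1] ?? null;
  }

  async listCheckpoints(
    sessionId: string,
    options: { limit?: number; offset?: number } = {},
  ): Promise<Page<Meta>> {
    const all = this.forSession(sessionId);
    const offset = options.offset ?? 0;
    const limit = options.limit ?? all.length;
    // Return metadata only, without the full state.
    const items = all.slice(offset, offset + limit).map((c) => ({
      id: c.id,
      sessionId: c.sessionId,
      stepCount: c.stepCount,
      timestamp: c.timestamp,
    }));
    return { items, total: all.length, hasMore: offset + items.length < all.length };
  }

  // Assumed ordering: ascending by step count.
  private forSession(sessionId: string): FullCheckpoint[] {
    return Array.from(this.byId.values())
      .filter((c) => c.sessionId === sessionId)
      .sort((a, b) => a.stepCount - b.stepCount);
  }
}

// Walks every page using the limit/offset/hasMore contract shown earlier.
async function listAllCheckpoints(
  store: InMemoryCheckpointStore,
  sessionId: string,
  pageSize = 10,
): Promise<Meta[]> {
  const collected: Meta[] = [];
  let offset = 0;
  while (true) {
    const page = await store.listCheckpoints(sessionId, { limit: pageSize, offset });
    collected.push(...page.items);
    if (!page.hasMore || page.items.length === 0) break;
    offset += page.items.length;
  }
  return collected;
}
```

Advancing `offset` by the number of items actually returned, rather than by `pageSize`, keeps the loop correct even if a store returns short pages.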
Best Practices
1. Use Checkpoints for Debugging
When an agent behaves unexpectedly:
// List checkpoints to find where things went wrong
const checkpoints = await stateStore.listCheckpoints(sessionId);
for (const cp of checkpoints.items) {
  const full = await stateStore.getCheckpoint(cp.id);
  console.log(`Step ${cp.stepCount}: ${full?.state.messages.length} messages`);
}
2. Implement Rollback UI
Let users undo agent actions:
async function rollbackToStep(sessionId: string, targetStep: number) {
  const checkpoints = await stateStore.listCheckpoints(sessionId);
  const target = checkpoints.items.find((c) => c.stepCount === targetStep);
  if (!target) {
    throw new Error(`No checkpoint at step ${targetStep}`);
  }
  const handle = await executor.getHandle(agent, sessionId);
  return handle?.resume({
    mode: 'from_checkpoint',
    checkpointId: target.id,
  });
}
3. Monitor Storage Growth
Track checkpoint storage for capacity planning:
const checkpoints = await stateStore.listCheckpoints(sessionId);
console.log(`Session has ${checkpoints.total} checkpoints`);
// For Redis, check memory usage
// For D1, check row counts
Next Steps
- Interrupt and Resume - Use checkpoints for pause/resume
- Distributed Coordination - Checkpoints in multi-process deployments
- State Management - Understanding agent state