Vercel AI SDK Adapter
The Vercel AI SDK adapter (@helix-agents/llm-vercel) connects Helix Agents to any LLM provider supported by the Vercel AI SDK. This is the recommended adapter for most applications.
When to Use
Good fit:
- Production applications
- Multiple provider support needed
- Streaming responses required
- Using OpenAI, Anthropic, Google, or other major providers
Not ideal for:
- Unit testing (use MockLLMAdapter instead)
- Custom/private LLM APIs not in Vercel AI SDK
Installation
npm install @helix-agents/llm-vercel ai
Also install provider packages for your chosen models:
# OpenAI
npm install @ai-sdk/openai
# Anthropic
npm install @ai-sdk/anthropic
# Google
npm install @ai-sdk/google
Basic Usage
import { defineAgent } from '@helix-agents/core';
import { VercelAIAdapter } from '@helix-agents/llm-vercel';
import { JSAgentExecutor } from '@helix-agents/runtime-js';
import { InMemoryStateStore, InMemoryStreamManager } from '@helix-agents/store-memory';
import { openai } from '@ai-sdk/openai';
// Create adapter
const adapter = new VercelAIAdapter();
// Create executor
const executor = new JSAgentExecutor(
new InMemoryStateStore(),
new InMemoryStreamManager(),
adapter
);
// Define agent with Vercel AI SDK model
const agent = defineAgent({
name: 'assistant',
systemPrompt: 'You are a helpful assistant.',
llmConfig: {
model: openai('gpt-4o'),
temperature: 0.7,
},
});
Supported Providers
The Vercel AI SDK supports many providers:
| Provider | Package | Example Model |
|---|---|---|
| OpenAI | @ai-sdk/openai | openai('gpt-4o') |
| Anthropic | @ai-sdk/anthropic | anthropic('claude-sonnet-4-20250514') |
| Google | @ai-sdk/google | google('gemini-1.5-pro') |
| Cohere | @ai-sdk/cohere | cohere('command-r-plus') |
| Mistral | @ai-sdk/mistral | mistral('mistral-large-latest') |
| Amazon Bedrock | @ai-sdk/amazon-bedrock | Various models |
| Azure OpenAI | @ai-sdk/azure | Azure-hosted models |
See the Vercel AI SDK documentation for the full list.
Configuration
Model Configuration
const agent = defineAgent({
name: 'my-agent',
systemPrompt: 'You are a helpful assistant.',
llmConfig: {
// Required: The model to use
model: openai('gpt-4o'),
// Generation parameters
temperature: 0.7, // 0-2, higher = more creative
maxOutputTokens: 4096, // Maximum tokens to generate
topP: 0.95, // Nucleus sampling
topK: 40, // Top-k sampling
// Penalties
presencePenalty: 0, // Reduce repetition of topics
frequencyPenalty: 0, // Reduce repetition of tokens
// Control
stopSequences: ['END'], // Stop generation at these sequences
seed: 12345, // For deterministic outputs
// Reliability
maxRetries: 3, // Retry on transient failures
// import { anthropicCache } from '@helix-agents/core';
// Prompt caching is opt-in. Supply a provider-specific strategy from
// `@helix-agents/core` that matches your model (here: Anthropic).
cache: anthropicCache({ ttl: '1h' }),
},
});
Provider-Specific Options
Important: Reasoning features require AI SDK provider packages v3+:
- @ai-sdk/openai@^3.0.0
- @ai-sdk/anthropic@^3.0.0
Earlier v2.x versions use specificationVersion: "v2", which triggers compatibility mode in AI SDK v6 and strips reasoning features.
Enable features specific to certain providers:
// OpenAI o-series reasoning
const reasoningAgent = defineAgent({
name: 'reasoning-agent',
systemPrompt: 'Solve complex problems step by step.',
llmConfig: {
model: openai('o1'),
providerOptions: {
openai: {
reasoningSummary: 'detailed',
reasoningEffort: 'high',
},
},
},
});
// Anthropic extended thinking
const thinkingAgent = defineAgent({
name: 'thinking-agent',
systemPrompt: 'Think through problems carefully.',
llmConfig: {
model: anthropic('claude-sonnet-4-20250514'),
providerOptions: {
anthropic: {
thinking: {
type: 'enabled',
budgetTokens: 10000,
},
},
},
},
});
Dynamic Configuration
Override LLM config based on agent state:
const agent = defineAgent({
name: 'adaptive-agent',
stateSchema: z.object({
complexity: z.enum(['simple', 'complex']),
stepCount: z.number(),
}),
llmConfig: {
model: openai('gpt-4o-mini'),
temperature: 0.5,
},
llmConfigOverride: (customState, stepCount) => {
// Use more powerful model for complex tasks
if (customState.complexity === 'complex') {
return {
model: openai('gpt-4o'),
temperature: 0.2,
maxOutputTokens: 8192,
};
}
// Increase temperature over time for variety
if (stepCount > 5) {
return { temperature: 0.8 };
}
return {};
},
});
Prompt Caching
Prompt caching reduces cost and latency by reusing cached prompt prefixes across LLM calls. It is opt-in and provider-specific: you choose a cache strategy that matches your model and set it on llmConfig.cache. When cache is unset, no caching is applied. The framework performs no provider detection — pick the helper that matches your model.
import { anthropicCache } from '@helix-agents/core';
const agent = defineAgent({
name: 'cached-agent',
systemPrompt: 'You are a helpful assistant with detailed instructions...',
llmConfig: {
model: anthropic('claude-sonnet-4-20250514'),
cache: anthropicCache({ ttl: '1h' }),
},
});
Shipped strategies
| Helper | Provider | What it does |
|---|---|---|
| anthropicCache({ ttl }) | Anthropic (Claude) | Places cache_control markers on the system prompt, the tool definitions, and a rolling pair of conversation breakpoints. ttl is '<N>m' / '<N>h' (default '1h'), passed through to the provider for validation. |
| openaiCache() | OpenAI (GPT-4o, o-series, …) | Sets providerOptions.openai.promptCacheKey from the session ID, giving repeated requests in a session cache affinity. |
| xaiCache() | xAI / Grok | Sets the x-grok-conv-id header from the session ID for conversation-level cache routing. |
Google / Gemini needs no helper — it uses implicit prefix caching server-side, so there is nothing to annotate. There is intentionally no googleCache().
Each helper only does its provider's thing; the framework never detects providers, so the helper you choose must match the model you configured.
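To make the table concrete, here is a minimal sketch of what the OpenAI and xAI helpers are described as producing. It is not the shipped implementation: the `CacheRequestSketch`, `CacheResultSketch`, and `CacheStrategySketch` types and both `...Sketch` functions are hypothetical local stand-ins, and using the session ID verbatim as the cache key is an assumption.

```typescript
// Hypothetical local stand-ins for the types exported from @helix-agents/core.
// Only the fields mentioned in this section are modeled.
interface CacheRequestSketch {
  context: { sessionId: string };
}

interface CacheResultSketch {
  providerOptions?: Record<string, Record<string, unknown>>;
  headers?: Record<string, string>;
}

type CacheStrategySketch = (req: CacheRequestSketch) => CacheResultSketch;

// Mirrors the described behavior of openaiCache(): a request-level option
// keyed on the session. Assumption: the session ID is used as-is.
function openaiCacheSketch(): CacheStrategySketch {
  return ({ context }) => ({
    providerOptions: { openai: { promptCacheKey: context.sessionId } },
  });
}

// Mirrors the described behavior of xaiCache(): a header carrying the session ID.
function xaiCacheSketch(): CacheStrategySketch {
  return ({ context }) => ({
    headers: { 'x-grok-conv-id': context.sessionId },
  });
}

const request: CacheRequestSketch = { context: { sessionId: 'session-123' } };
const openaiResult = openaiCacheSketch()(request);
const xaiResult = xaiCacheSketch()(request);

console.log(openaiResult, xaiResult);
```

Neither sketch inspects the model, which is why pairing a helper with the wrong provider would only add options that provider ignores.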
How anthropicCache places breakpoints
anthropicCache({ ttl }) marks, up to Anthropic's four-breakpoint limit:
- The last system message (caches the system prompt).
- The last tool definition (caches the tool schema).
- The end of the most recent turn — including tool-result turns — so the next step reads the entire prior prefix from cache.
- The end of the previous turn — a rolling second anchor. Anthropic only looks back ~20 content blocks from a breakpoint to find a prior cache entry, so a single tool-heavy turn (many parallel tool calls + results) could push the latest-turn breakpoint out of range and silently re-process the whole history. The previous-turn anchor sits where the prior request's breakpoint landed, keeping the history cached regardless of how large the latest turn is.
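The lookback argument in the last bullet can be checked with a small model. This is a deliberately simplified sketch: positions are plain content-block indices, the 20-block window is taken from the approximate figure above, and `findsPriorCacheEntry` is a hypothetical helper, not part of any library. Real provider behavior has more nuance.

```typescript
// Approximate lookback window described above. This is an assumption of the
// model, not a constant exported by any library.
const LOOKBACK_BLOCKS = 20;

/**
 * Returns true if at least one breakpoint can reach the cache entry written
 * by the previous request: the entry must end at or before the breakpoint
 * and within LOOKBACK_BLOCKS content blocks of it.
 */
function findsPriorCacheEntry(breakpoints: number[], priorEntryEnd: number): boolean {
  return breakpoints.some(
    (bp) => bp >= priorEntryEnd && bp - priorEntryEnd <= LOOKBACK_BLOCKS,
  );
}

// The previous request ended, and wrote a cache entry, at content block 30.
const priorEntryEnd = 30;
// A tool-heavy turn then appended 25 blocks, so the latest turn ends at block 55.
const latestTurnEnd = priorEntryEnd + 25;

// Only the latest-turn breakpoint: 55 - 30 = 25 blocks, outside the window.
const singleAnchor = findsPriorCacheEntry([latestTurnEnd], priorEntryEnd);
// Latest-turn breakpoint plus the previous-turn anchor at block 30.
const rollingPair = findsPriorCacheEntry([latestTurnEnd, priorEntryEnd], priorEntryEnd);

console.log({ singleAnchor, rollingPair }); // { singleAnchor: false, rollingPair: true }
```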
Composing strategies
cache also accepts an array of strategies, applied in order — useful for layering a provider strategy with your own:
llmConfig: {
model: anthropic('claude-sonnet-4-20250514'),
cache: [anthropicCache(), myCustomStrategy],
}
Writing a custom strategy
A CacheStrategy is a pure function (CacheRequest) => CacheResult. It can annotate messages and tools (returned in messages / tools) or add request-level options (providerOptions / headers). The CacheStrategy, CacheRequest, and CacheResult types are exported from @helix-agents/core, as is applyCacheStrategies, the provider-agnostic function the runtimes use to fold strategies together:
import type { CacheStrategy } from '@helix-agents/core';
const tagConversation: CacheStrategy = ({ context }) => ({
headers: { 'x-my-conv-id': context.sessionId },
});
Cache Token Tracking
Cache hit/miss metrics flow through the standard token usage pipeline:
// In afterLLMCall hook
hooks: {
afterLLMCall: (payload, ctx) => {
if (payload.usage) {
console.log(`Prompt tokens: ${payload.usage.promptTokens}`);
console.log(`Cached tokens: ${payload.usage.cachedTokens}`); // Cache hits
console.log(`Cache writes: ${payload.usage.cacheWriteTokens}`); // New cache entries
}
},
}
Cache tokens also appear in:
- Stream chunks: step_end chunks include cachedTokens and cacheWriteTokens in their usage field
- Usage tracking: The TokenCounts rollup includes cached and cacheWrite fields
- Langfuse tracing: Mapped to cache_read_input_tokens and cache_creation_input_tokens
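One way to use these numbers is to derive a cache hit ratio per call. The sketch below is illustrative only: `UsageSketch` and `cacheHitRatio` are hypothetical names, and it assumes promptTokens already includes the cached tokens, which may not hold for every provider.

```typescript
// Hypothetical shape mirroring the usage fields shown in the hook above.
interface UsageSketch {
  promptTokens: number;
  cachedTokens?: number;
  cacheWriteTokens?: number;
}

/**
 * Fraction of prompt tokens served from cache.
 * Assumption: promptTokens is the total prompt size, cached tokens included.
 */
function cacheHitRatio(usage: UsageSketch): number {
  if (usage.promptTokens <= 0) return 0; // avoid dividing by zero
  return (usage.cachedTokens ?? 0) / usage.promptTokens;
}

const ratio = cacheHitRatio({
  promptTokens: 2000,
  cachedTokens: 1500,
  cacheWriteTokens: 200,
});

console.log(`Cache hit ratio: ${ratio}`); // Cache hit ratio: 0.75
```

A ratio that stays near zero across the steps of one session would suggest the strategy does not match the configured model.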
Streaming
The adapter supports real-time streaming:
// Streaming happens automatically in execute()
const handle = await executor.execute(agent, 'Research AI agents');
// Get the stream
const stream = await handle.stream();
if (stream) {
for await (const chunk of stream) {
switch (chunk.type) {
case 'text_delta':
process.stdout.write(chunk.delta);
break;
case 'thinking':
console.log('[Thinking]', chunk.content);
break;
case 'tool_start':
console.log(`[Tool: ${chunk.toolName}]`);
break;
}
}
}
Chunk Mapping
The adapter maps Vercel AI SDK stream parts to framework chunks:
| Vercel AI SDK | Framework | Notes |
|---|---|---|
| text-delta | text_delta | Generated text tokens |
| reasoning-delta | thinking | Reasoning/thinking content |
| tool-input-start | tool_start | Tool call begins |
| tool-call | tool_start | Complete tool call |
| tool-result | tool_end | Tool result |
| error | error | Generation error |
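The table can be read as a plain lookup. The following sketch encodes only the type names listed above. `CHUNK_TYPE_MAP` and `mapPartType` are hypothetical names, and returning undefined for unlisted part types is an assumption, since this page does not say how the adapter treats them.

```typescript
type FrameworkChunkType = 'text_delta' | 'thinking' | 'tool_start' | 'tool_end' | 'error';

// One entry per row of the mapping table.
const CHUNK_TYPE_MAP = new Map<string, FrameworkChunkType>([
  ['text-delta', 'text_delta'],
  ['reasoning-delta', 'thinking'],
  ['tool-input-start', 'tool_start'],
  ['tool-call', 'tool_start'],
  ['tool-result', 'tool_end'],
  ['error', 'error'],
]);

// Assumption: part types not in the table map to undefined (skipped).
function mapPartType(sdkPartType: string): FrameworkChunkType | undefined {
  return CHUNK_TYPE_MAP.get(sdkPartType);
}

console.log(mapPartType('reasoning-delta')); // thinking
console.log(mapPartType('tool-call')); // tool_start
```

Note that two SDK part types map to tool_start, so a consumer that counts tool calls should key on the tool call ID rather than on the chunk type alone.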
Thinking/Reasoning Content
Both Anthropic and OpenAI support reasoning features:
Anthropic Extended Thinking
const agent = defineAgent({
name: 'claude-thinker',
llmConfig: {
model: anthropic('claude-sonnet-4-20250514'),
providerOptions: {
anthropic: {
thinking: {
type: 'enabled',
budgetTokens: 10000, // Token budget for thinking
},
},
},
},
});
// Thinking content streams via 'thinking' chunks
for await (const chunk of stream) {
if (chunk.type === 'thinking') {
console.log('[Claude thinking...]', chunk.content);
}
}
OpenAI Reasoning
const agent = defineAgent({
name: 'o1-reasoner',
llmConfig: {
model: openai('o1'),
providerOptions: {
openai: {
reasoningSummary: 'detailed', // or 'concise'
reasoningEffort: 'high', // or 'medium', 'low'
},
},
},
});
Message Conversion
The adapter converts framework messages to Vercel AI SDK format:
Framework → Vercel AI SDK
// Framework format
const messages: Message[] = [
{ role: 'system', content: 'You are helpful.' },
{ role: 'user', content: 'Hello' },
{
role: 'assistant',
content: 'I will search for that.',
toolCalls: [{ id: 'tc1', name: 'search', arguments: { q: 'test' } }],
},
{
role: 'tool',
toolCallId: 'tc1',
toolName: 'search',
content: JSON.stringify({ results: [] }),
},
];
// Automatically converted to Vercel AI SDK ModelMessage[]
The conversion handles:
- System, user, and assistant messages
- Tool calls in assistant messages
- Tool results in tool messages
- Mixed text + tool call content
The adapter also coerces non-object tool-call inputs to objects — at ingestion, in stream-chunk mapping, and during message conversion (which heals any string arguments already persisted in durable stores before this guarantee existed). This complements the runtime-agnostic coercion in core's planStepProcessing(); together they ensure a tool input is never replayed as a non-object tool_use.input (which the provider rejects). Structured output is left to core's schema-aware repair, so non-object outputSchemas are preserved. See Robust tool inputs.
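As a rough illustration of what coercing a tool input to an object can look like, consider the sketch below. It is not the adapter's actual logic: `coerceToolInput` is a hypothetical function, and both the JSON-parsing step and the empty-object fallback are assumptions made for the example.

```typescript
type ToolInput = Record<string, unknown>;

// True for non-null, non-array objects.
function isPlainObject(value: unknown): value is ToolInput {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

/**
 * Returns an object for any raw tool input.
 * - Objects pass through unchanged.
 * - Strings are parsed as JSON and kept if they decode to an object
 *   (this covers string arguments persisted in a store).
 * - Everything else falls back to {} (assumed fallback, not confirmed).
 */
function coerceToolInput(raw: unknown): ToolInput {
  if (isPlainObject(raw)) return raw;
  if (typeof raw === 'string') {
    try {
      const parsed: unknown = JSON.parse(raw);
      if (isPlainObject(parsed)) return parsed;
    } catch {
      // Not valid JSON; fall through to the fallback below.
    }
  }
  return {};
}

console.log(coerceToolInput('{"q":"test"}')); // { q: 'test' }
console.log(coerceToolInput('not json')); // {}
```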
Tool Conversion
Framework tools (with Zod schemas) are converted to Vercel AI SDK tools:
// Framework tool
const searchTool = defineTool({
name: 'search',
description: 'Search the web',
inputSchema: z.object({
query: z.string(),
limit: z.number().optional(),
}),
execute: async (input, ctx) => {
// ...
},
});
// Automatically converted to Vercel AI SDK tool format
// The Zod schema is passed directly (AI SDK 5.x supports Zod)
Error Handling
The adapter returns LLM errors as a result value instead of throwing:
const adapter = new VercelAIAdapter({
logger: console, // Optional: log warnings
});
// Errors are returned as ErrorStepResult, not thrown
const result = await adapter.generateStep(input);
if (result.type === 'error') {
console.error('LLM error:', result.error.message);
// Framework handles this appropriately
}
Retry Configuration
Configure retries for transient failures:
const agent = defineAgent({
llmConfig: {
model: openai('gpt-4o'),
maxRetries: 5, // Retry up to 5 times on transient errors
},
});
Logger Integration
Pass a custom logger for debug output:
import { VercelAIAdapter } from '@helix-agents/llm-vercel';
const logger = {
debug: (msg: string) => console.debug(`[DEBUG] ${msg}`),
info: (msg: string) => console.info(`[INFO] ${msg}`),
warn: (msg: string) => console.warn(`[WARN] ${msg}`),
error: (msg: string) => console.error(`[ERROR] ${msg}`),
};
const adapter = new VercelAIAdapter({ logger });
Complete Example
import { defineAgent, defineTool } from '@helix-agents/core';
import { JSAgentExecutor } from '@helix-agents/runtime-js';
import { InMemoryStateStore, InMemoryStreamManager } from '@helix-agents/store-memory';
import { VercelAIAdapter } from '@helix-agents/llm-vercel';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';
// Create adapter
const adapter = new VercelAIAdapter();
// Define tool
const searchTool = defineTool({
name: 'web_search',
description: 'Search the web for information',
inputSchema: z.object({
query: z.string().describe('Search query'),
}),
outputSchema: z.object({
results: z.array(z.string()),
}),
execute: async (input) => {
// Simulate search
return { results: [`Result for: ${input.query}`] };
},
});
// Define agent
const ResearchAgent = defineAgent({
name: 'researcher',
description: 'Researches topics using web search',
systemPrompt: `You are a research assistant.
Use the web_search tool to find information.
Summarize your findings clearly.`,
tools: [searchTool],
outputSchema: z.object({
summary: z.string(),
sources: z.array(z.string()),
}),
llmConfig: {
model: openai('gpt-4o'),
temperature: 0.3,
maxOutputTokens: 2048,
},
});
// Create executor
const executor = new JSAgentExecutor(
new InMemoryStateStore(),
new InMemoryStreamManager(),
adapter
);
// Execute
async function main() {
const handle = await executor.execute(
ResearchAgent,
'What are the latest developments in AI agents?'
);
// Stream output
const stream = await handle.stream();
if (stream) {
for await (const chunk of stream) {
if (chunk.type === 'text_delta') {
process.stdout.write(chunk.delta);
}
}
}
// Get result
const result = await handle.result();
console.log('\n\nResult:', result.output);
}
main();
Limitations
Model-Specific Features
Not all features work with all models:
- Thinking/reasoning: Only Anthropic Claude and OpenAI o-series
- Tool calling: Most models, but check provider docs
- JSON mode: Provider-specific implementation
Token Counting
The adapter does not estimate token counts before a request is sent; it only reports the usage the provider returns. Use provider SDKs directly for token estimation.
Image/Multimodal
The framework supports file uploads (images, PDFs, etc.) via the files field in AgentInput. Files are converted to ContentPart[] alongside the text message and passed to the LLM. This works across all runtimes (JS, Temporal, Cloudflare).
await executor.execute(
agent,
{
message: 'Describe this image',
files: [
{
data: base64EncodedData,
mediaType: 'image/png',
filename: 'screenshot.png', // optional
},
],
},
{ sessionId }
);
Next Steps
- LLM Overview - Understanding the adapter interface
- Custom Adapters - Building your own adapter
- Streaming - Real-time streaming deep dive