Tracing & Observability

Tracing provides visibility into agent execution for debugging, performance analysis, and cost tracking. Helix Agents integrates with Langfuse for comprehensive LLM observability.

Overview

Tracing captures:

  • Agent Runs - Full execution lifecycle with timing and status
  • LLM Calls - Model, tokens, latency, prompts and responses
  • Token Tracking - Standard, reasoning, and cached token usage
  • Tool Executions - Arguments, results, and timing
  • Sub-Agent Calls - Nested traces with parent-child relationships
  • Metadata - User attribution, session grouping, custom tags

Why Trace?

  1. Debugging - Understand why an agent behaved a certain way
  2. Performance - Identify slow LLM calls or inefficient tool usage
  3. Cost Tracking - Monitor token usage across users and features
  4. Quality - Evaluate agent outputs and improve prompts
  5. Compliance - Audit trail of LLM interactions

Runtime Compatibility

Tracing is fully stateless — no shared in-memory state between hooks. This means it works identically across all runtimes:

  • JS Runtime (JSAgentExecutor) — hooks run in-process
  • Temporal Runtime (TemporalAgentExecutor) — hooks run as activities, potentially on different worker pods
  • Cloudflare Runtime — hooks run in Workers/Durable Objects

No configuration changes are needed when switching runtimes. Define hooks once, use them everywhere.
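Statelessness means a hook cannot rely on module-level variables surviving between calls; on Temporal, `onStart` and `onEnd` may run on different worker pods. Everything a hook needs must travel in its arguments. A minimal sketch of the contrast (the hook shapes here are simplified stand-ins, not the actual SDK signatures):

```typescript
// Illustrative only: simplified hook shapes, not the @helix-agents/sdk API.
interface HookContext {
  runId: string;
  startedAt: number; // carried in the context, not in worker memory
}

// BAD: closes over mutable module state; breaks when hooks
// run on different worker pods.
let inFlightStart = 0;
const statefulHook = {
  onStart: () => { inFlightStart = Date.now(); },
  onEnd: () => Date.now() - inFlightStart, // wrong pod => garbage value
};

// GOOD: derives everything from the context it is handed.
const statelessHook = {
  onStart: (ctx: HookContext): HookContext => ({ ...ctx, startedAt: Date.now() }),
  onEnd: (ctx: HookContext): number => Date.now() - ctx.startedAt,
};
```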

Quick Start

1. Install the Package

bash
npm install @helix-agents/tracing-langfuse @langfuse/tracing @langfuse/otel @opentelemetry/api @opentelemetry/sdk-trace-base

2. Set Up Langfuse

Create a Langfuse account and get your API keys:

bash
# .env
LANGFUSE_PUBLIC_KEY=pk-lf-...
LANGFUSE_SECRET_KEY=sk-lf-...

3. Add Hooks to Your Agent

typescript
import { createLangfuseHooks } from '@helix-agents/tracing-langfuse';
import { defineAgent, JSAgentExecutor } from '@helix-agents/sdk';

// Create hooks (auto-reads credentials from env)
const { hooks, flush } = createLangfuseHooks();

// Use with agent
const agent = defineAgent({
  name: 'my-agent',
  hooks,
  systemPrompt: 'You are a helpful assistant.',
  llmConfig: { model: { provider: 'openai', name: 'gpt-4o' } },
});

// Execute (executor options depend on your runtime setup)
const executor = new JSAgentExecutor({ /* ... */ });
const handle = await executor.execute(agent, 'Hello!');
const result = await handle.result;

// Flush in serverless (optional in long-running processes)
await flush();

4. View Traces in Langfuse

Open your Langfuse dashboard to see:

  • Trace timeline with all observations
  • Token usage and costs
  • Latency breakdown
  • Error details

Configuration

Basic Options

typescript
const { hooks } = createLangfuseHooks({
  // Credentials (optional if using env vars)
  publicKey: 'pk-lf-...',
  secretKey: 'sk-lf-...',
  baseUrl: 'https://cloud.langfuse.com', // or self-hosted URL

  // Version tag for filtering
  release: '1.0.0',

  // Default tags for all traces
  defaultTags: ['production', 'v2'],

  // Default metadata for all traces
  defaultMetadata: {
    service: 'chat-api',
    team: 'platform',
  },

  // Step grouping
  groupByStep: true, // Group observations by agent step (default: true)

  // Environment label
  environment: 'production',

  // Debug logging
  debug: false,
});

Data Capture Options

Control what data is sent to Langfuse:

typescript
const { hooks } = createLangfuseHooks({
  // Agent state snapshots (may be large)
  includeState: false,

  // Full conversation messages (may contain PII)
  includeMessages: false,

  // Tool arguments (default: true)
  includeToolArgs: true,

  // Tool results (may be large)
  includeToolResults: false,

  // LLM prompts (default: true)
  includeGenerationInput: true,

  // LLM responses (default: true)
  includeGenerationOutput: true,
});

Privacy

For production systems handling PII, consider disabling includeMessages, includeGenerationInput, and includeGenerationOutput to avoid logging sensitive user data.
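For example, a privacy-safe configuration for a PII-sensitive deployment might look like this (a sketch; the option names are the Data Capture Options above, and the values are one reasonable choice, not a requirement):

```typescript
// Privacy-safe capture options: keep timing and token data, drop payloads.
// One reasonable choice for PII-sensitive systems, not a requirement.
const privacySafeOptions = {
  includeState: false,
  includeMessages: false,          // no raw conversation content
  includeToolArgs: false,          // tool args often echo user input
  includeToolResults: false,
  includeGenerationInput: false,   // no prompts
  includeGenerationOutput: false,  // no completions
};
```

Pass these options to createLangfuseHooks alongside your credentials; traces still record latency, token usage, and errors.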

Metadata & Tagging

Metadata enables filtering and attribution in Langfuse.

Passing Metadata at Execution

typescript
await executor.execute(agent, input, {
  // User attribution
  userId: 'user-123',

  // Session grouping (e.g., conversation threads)
  sessionId: 'conversation-456',

  // Tags for filtering
  tags: ['premium', 'mobile'],

  // Custom key-value metadata
  metadata: {
    environment: 'production',
    region: 'us-west-2',
    feature: 'chat',
  },
});

Using the Context Builder

For better ergonomics, use the fluent builder:

typescript
import { tracingContext } from '@helix-agents/tracing-langfuse';

const context = tracingContext()
  .user('user-123')
  .session('conversation-456')
  .tags('premium', 'mobile')
  .environment('production')
  .version('1.0.0')
  .metadata('region', 'us-west-2')
  .build();

await executor.execute(agent, input, context);

Typed Metadata

For common metadata patterns, use typed interfaces:

typescript
import { createTracingMetadata } from '@helix-agents/tracing-langfuse';

const metadata = createTracingMetadata({
  environment: 'production',
  version: '1.0.0',
  service: 'chat-api',
  region: 'us-west-2',
  tier: 'premium',
  source: 'mobile',
});

await executor.execute(agent, input, { metadata });

Token Tracking

Token usage is tracked automatically on every LLM generation. Beyond standard input/output tokens, the integration captures extended token types:

  • Reasoning tokens - Used by models with chain-of-thought reasoning (OpenAI o1/o3, Claude with extended thinking)
  • Cached tokens - Served from prompt cache (Anthropic prompt caching, OpenAI cached context)
  • Cache write tokens - Tokens written to create new cache entries (Anthropic cache_creation_input_tokens)

These are sent to Langfuse in the v4 usageDetails format:

  Framework field     Langfuse field                 Description
  promptTokens        input                          Input tokens
  completionTokens    output                         Output tokens
  totalTokens         total                          Total tokens
  reasoningTokens     reasoning_tokens               Reasoning/thinking tokens
  cachedTokens        cache_read_input_tokens        Tokens served from cache
  cacheWriteTokens    cache_creation_input_tokens    Tokens written to cache

No configuration is needed — when your LLM reports these token types in its usage response, they automatically appear in Langfuse. This enables accurate cost tracking for reasoning models and prompt caching.
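The mapping in the table can be expressed as a small conversion function (a sketch of the translation only, not the package's internal code; field names follow the table):

```typescript
// Sketch of the framework-usage -> Langfuse v4 usageDetails translation.
// Field names follow the mapping table above; not the package internals.
interface FrameworkUsage {
  promptTokens: number;
  completionTokens: number;
  totalTokens: number;
  reasoningTokens?: number;
  cachedTokens?: number;
  cacheWriteTokens?: number;
}

function toUsageDetails(u: FrameworkUsage): Record<string, number> {
  const details: Record<string, number> = {
    input: u.promptTokens,
    output: u.completionTokens,
    total: u.totalTokens,
  };
  // Extended token types appear only when the provider reported them.
  if (u.reasoningTokens !== undefined) details.reasoning_tokens = u.reasoningTokens;
  if (u.cachedTokens !== undefined) details.cache_read_input_tokens = u.cachedTokens;
  if (u.cacheWriteTokens !== undefined) details.cache_creation_input_tokens = u.cacheWriteTokens;
  return details;
}
```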

Prompt Caching

Enable prompt caching with caching: 'auto' in your agent's llmConfig. Cache read tokens reduce cost (typically 90% discount), while cache write tokens have a small surcharge on the first request. See the Prompt Caching guide for details.
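To see why cache reads matter for cost, here is an illustrative estimate. The 90% read discount and 25% write surcharge below are example rates in the style of Anthropic's pricing; substitute your provider's actual numbers:

```typescript
// Illustrative input-cost estimate for a cached prompt. Rates are
// EXAMPLES only (reads at 10% of base input price, writes at 125%);
// check your provider's pricing before relying on these numbers.
function estimateInputCost(
  uncachedTokens: number,
  cacheReadTokens: number,
  cacheWriteTokens: number,
  pricePerToken: number,
): number {
  return (
    uncachedTokens * pricePerToken +
    cacheReadTokens * pricePerToken * 0.1 +  // 90% discount on reads
    cacheWriteTokens * pricePerToken * 1.25  // 25% surcharge on writes
  );
}

// First request: 10k prompt tokens written to cache, $3 per MTok
const first = estimateInputCost(500, 0, 10_000, 3 / 1_000_000);
// Follow-up request: the same 10k tokens served from cache
const later = estimateInputCost(500, 10_000, 0, 3 / 1_000_000);
```

The write surcharge makes the first request slightly more expensive, but every follow-up that hits the cache costs a fraction of the uncached price.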

Step Grouping

The groupByStep option (default: true) groups observations under step spans, making it easy to see what happened in each iteration of the agent loop:

typescript
const { hooks } = createLangfuseHooks({
  groupByStep: true, // default
});

With groupByStep: true:

mermaid
graph TB
    subgraph Trace ["trace: my-agent"]
        subgraph Step1 ["span: step-1"]
            G1["generation: llm.generation<br/><i>model: gpt-4o, tokens: 1234</i>"]
            T1["span: tool:search"]
            T2["span: tool:calculate"]
        end
        subgraph Step2 ["span: step-2"]
            G2["generation: llm.generation"]
            SA["span: agent:sub-agent"]
        end
    end

With groupByStep: false, all observations are flat under the root trace:

mermaid
graph TB
    subgraph Trace ["trace: my-agent"]
        G1["generation: llm.generation"]
        T1["span: tool:search"]
        T2["span: tool:calculate"]
        G2["generation: llm.generation"]
        SA["span: agent:sub-agent"]
    end

    G1 --> T1 --> T2 --> G2 --> SA

Step grouping is useful for multi-step agents where you want to clearly see the boundary between each LLM call and its resulting tool executions.

Trace Hierarchy

  • Trace - Root container, represents the full agent run
  • Generation - LLM call with model, tokens, timing
  • Span - Tool or sub-agent execution

Trace Update Consolidation

Trace-level attributes (name, input, output, userId, sessionId, tags) are updated only on the root span of each agent trace. This avoids redundant updateTrace calls from child spans (generations, tool spans) that would send duplicate data to Langfuse. The result is cleaner traces and fewer API calls.

If you use the onAgentTraceCreated lifecycle hook to set trace metadata, those attributes apply to the root span and are inherited by all child observations in Langfuse.

Lifecycle Hooks

Customize observations with lifecycle hooks:

onAgentTraceCreated

Called when the root trace is created:

typescript
const { hooks } = createLangfuseHooks({
  onAgentTraceCreated: ({ runId, agentName, hookContext, updateTrace }) => {
    // Add environment info
    updateTrace({
      metadata: {
        nodeVersion: process.version,
        environment: process.env.NODE_ENV,
      },
    });
  },
});

onGenerationCreated

Called when an LLM generation is created. Use for logging, metrics, or side effects. The updateGeneration callback is a no-op in the current architecture; use extractAttributes for per-observation metadata instead.

typescript
const { hooks } = createLangfuseHooks({
  onGenerationCreated: ({ model, observationId }) => {
    const provider = model?.includes('gpt') ? 'openai' : 'anthropic';
    console.log(`Generation ${observationId} using ${provider}`);
  },
});

onToolCreated

Called when a tool span is created. Use for logging, metrics, or side effects. The updateTool callback is a no-op in the current architecture; use extractAttributes for per-observation metadata instead.

typescript
const { hooks } = createLangfuseHooks({
  onToolCreated: ({ toolName, toolCallId }) => {
    const category = toolName.startsWith('db_') ? 'database' : 'external';
    console.log(`Tool ${toolCallId}: ${toolName} (${category})`);
  },
});

onObservationEnding

Called before any observation ends:

typescript
const { hooks } = createLangfuseHooks({
  onObservationEnding: ({ type, observationId, durationMs, success, error }) => {
    if (!success) {
      console.error(`${type} failed after ${durationMs}ms:`, error);
    }
  },
});

Custom Attribute Extraction

Extract attributes from hook context for all observations:

typescript
const { hooks } = createLangfuseHooks({
  extractAttributes: (context) => ({
    stepCount: String(context.stepCount),
    hasParent: String(!!context.parentSessionId),
    // Access execution metadata
    region: context.metadata?.region,
  }),
});

Sub-Agent Tracing

Sub-agents automatically inherit tracing context:

typescript
const researchAgent = defineAgent({
  name: 'researcher',
  // ... config
});

const orchestrator = defineAgent({
  name: 'orchestrator',
  hooks, // Langfuse hooks
  tools: [
    createSubAgentTool({
      name: 'research',
      agent: researchAgent,
      description: 'Delegate research tasks',
    }),
  ],
});

In Langfuse, you'll see:

mermaid
graph TB
    subgraph Trace ["trace: orchestrator"]
        G1["generation: llm.generation"]
        subgraph SubAgent ["span: agent:researcher"]
            SG["generation: llm.generation"]
            ST["span: tool:search"]
        end
    end

    G1 --> SubAgent
    SG --> ST

Sub-agents inherit userId, sessionId, tags, and metadata from the parent.
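Conceptually, the inherited context is a merge of parent attribution with the sub-agent's own additions. A sketch of those merge semantics (not the SDK's actual implementation):

```typescript
// Sketch of sub-agent context inheritance semantics; illustrative
// only, not the SDK's merge code.
interface TraceContext {
  userId?: string;
  sessionId?: string;
  tags: string[];
  metadata: Record<string, unknown>;
}

function inheritContext(parent: TraceContext, child: Partial<TraceContext>): TraceContext {
  return {
    // Attribution falls through from the parent unless the child overrides it
    userId: child.userId ?? parent.userId,
    sessionId: child.sessionId ?? parent.sessionId,
    // Tags and metadata are layered: parent values plus child additions
    tags: [...new Set([...parent.tags, ...(child.tags ?? [])])],
    metadata: { ...parent.metadata, ...child.metadata },
  };
}
```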

Serverless Considerations

Langfuse batches events and sends them asynchronously. In serverless environments, flush before the function returns:

typescript
// AWS Lambda / Vercel / Cloudflare Workers
export async function handler(event) {
  const { hooks, flush } = createLangfuseHooks();

  const agent = defineAgent({ hooks, ... });
  const executor = new JSAgentExecutor({ ... });

  const handle = await executor.execute(agent, event.message);
  const result = await handle.result;

  // IMPORTANT: Flush before returning
  await flush();

  return { statusCode: 200, body: JSON.stringify(result) };
}

For graceful shutdown in long-running processes:

typescript
const { hooks, shutdown } = createLangfuseHooks();

process.on('SIGTERM', async () => {
  await shutdown(); // Flushes and closes
  process.exit(0);
});

OpenTelemetry Integration

The Langfuse integration uses the v4 SDK (@langfuse/tracing + @langfuse/otel), built on OpenTelemetry. The package creates a locally-owned BasicTracerProvider with a LangfuseSpanProcessor to export traces. This provider never touches global OTEL state, so it coexists safely with other OTEL integrations in the same process.

Configure the span processor via top-level options:

typescript
const { hooks } = createLangfuseHooks({
  publicKey: 'pk-lf-...',
  secretKey: 'sk-lf-...',
  flushAt: 512,
  flushInterval: 5,
  exportMode: 'batched',
});

Self-Hosted Langfuse

To use a self-hosted Langfuse instance:

typescript
const { hooks } = createLangfuseHooks({
  baseUrl: 'https://langfuse.your-company.com',
  publicKey: 'pk-...',
  secretKey: 'sk-...',
});

Or via environment variables:

bash
LANGFUSE_BASEURL=https://langfuse.your-company.com
LANGFUSE_PUBLIC_KEY=pk-...
LANGFUSE_SECRET_KEY=sk-...

Troubleshooting

Traces Not Appearing

  1. Check credentials: Ensure LANGFUSE_PUBLIC_KEY and LANGFUSE_SECRET_KEY are set
  2. Enable debug mode: createLangfuseHooks({ debug: true })
  3. Flush in serverless: Call await flush() before function returns
  4. Check network: Verify connectivity to cloud.langfuse.com
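Step 1 can be automated with a quick sanity check at startup (a sketch; the env var names follow the .env example earlier in this guide):

```typescript
// Quick credential sanity check before creating hooks. Env var names
// match the .env example in the Quick Start; the helper itself is a
// sketch, not part of the package.
function checkLangfuseEnv(
  env: Record<string, string | undefined> = process.env,
): string[] {
  const required = ['LANGFUSE_PUBLIC_KEY', 'LANGFUSE_SECRET_KEY'];
  return required.filter((key) => !env[key]?.trim());
}

const missing = checkLangfuseEnv();
if (missing.length > 0) {
  console.warn(`Langfuse tracing will not export; missing: ${missing.join(', ')}`);
}
```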

Missing Metadata

Metadata must be passed at execute() time, not in agent definition:

typescript
// WRONG: Agent definition doesn't support execution metadata
const agent = defineAgent({
  metadata: { userId: '123' }, // This won't work!
});

// CORRECT: Pass at execution time
await executor.execute(agent, input, {
  userId: '123',
  metadata: { custom: 'value' },
});

High Memory Usage

If tracing increases memory usage:

  1. Disable state capture: includeState: false
  2. Disable message capture: includeMessages: false
  3. Disable result capture: includeToolResults: false
