"We already use LangSmith for tracing — do we really need an audit trail too?"
I hear this question constantly. The answer is yes, but not because LangSmith is bad — it's excellent at what it does. The issue is that observability and audit trails solve fundamentally different problems. Using an observability tool as your audit trail is like using your car's speedometer as proof you didn't speed. It shows you what's happening, but it's not evidence.
Observability answers: "What happened, so I can fix it?"
It's an engineering tool. You're debugging a failed agent run, profiling token usage, optimizing prompt latency, or tracking error rates. You need fast search, detailed traces, and interactive exploration. The audience is your engineering team.
An audit trail answers: "What happened, so I can prove it?"
It's a compliance tool. A regulator asks for evidence of your agent's decision-making process. An auditor wants proof that records weren't altered. A lawyer needs the chain of events for a liability investigation. The audience is external — people who don't trust you by default and need verifiable evidence.
These are different requirements, different users, and different standards of evidence.
Tools like LangSmith, Datadog, Splunk, and New Relic are built for operational visibility: fast trace search, latency and token dashboards, error tracking, and interactive exploration of agent runs.
This is genuinely valuable. If your agent is slow, broken, or producing bad outputs, observability tools help you find and fix the problem fast.
Here's where the gap appears:
Observability platforms store traces in a standard database. Engineers with access can modify or delete records. There's no cryptographic guarantee that a trace you're looking at today is identical to what was recorded last month.
For debugging, this doesn't matter — you're looking at recent data to fix current problems. For compliance, it's disqualifying. The EU AI Act Article 12 requires logs that maintain integrity over the system's lifetime. A hash-chained audit trail provides mathematical proof that records are unaltered. Standard observability storage doesn't.
Can you run a verification function that proves every record in your LangSmith project is exactly as it was when recorded? No. Can you demonstrate this verification to a regulator in real time? No.
Hash chain verification is a specific capability: walk the chain, recompute every SHA-256 hash, confirm they match. If any record was modified, you see exactly which one. This is the difference between "we have logs" and "we have logs we can prove are genuine."
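That verification walk can be sketched in a few lines. This is a minimal illustration of the technique, not the AgentTraceHQ implementation; the record shape, hash ordering, and all-zero genesis value are assumptions:

```typescript
import { createHash } from "crypto";

interface AuditRecord {
  payload: string;  // serialized event data
  prevHash: string; // hash of the previous record (genesis value for the first)
  hash: string;     // SHA-256 over prevHash + payload
}

const GENESIS = "0".repeat(64);

function computeHash(payload: string, prevHash: string): string {
  return createHash("sha256").update(prevHash + payload).digest("hex");
}

// Append a record, linking it to the previous record's hash
function append(chain: AuditRecord[], payload: string): AuditRecord[] {
  const prevHash = chain.length ? chain[chain.length - 1].hash : GENESIS;
  return [...chain, { payload, prevHash, hash: computeHash(payload, prevHash) }];
}

// Walk the chain, recompute every hash, and return the index of the
// first tampered record, or -1 if the chain is intact
function verify(chain: AuditRecord[]): number {
  let prevHash = GENESIS;
  for (let i = 0; i < chain.length; i++) {
    const rec = chain[i];
    if (rec.prevHash !== prevHash || rec.hash !== computeHash(rec.payload, rec.prevHash)) {
      return i;
    }
    prevHash = rec.hash;
  }
  return -1;
}
```

Because each record's hash covers the previous record's hash, editing any record breaks every link after it; `verify` pinpoints exactly where the break starts.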
When your SOC 2 auditor asks for processing integrity evidence, you need a report — not a dashboard login. When the EU AI Act requires conformity assessment documentation, you need structured reports mapping your data to specific articles and requirements.
Observability tools give you dashboards and data exports. Compliance requires formatted reports that map to specific frameworks: EU AI Act Article 12 logging, SOC 2 Trust Service Criteria, ISO 27001 controls. These are different output formats for different audiences.
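To make the "data export vs. framework-mapped report" distinction concrete, here is a sketch of shaping raw events into an Article 12-style report object. The record and field names are illustrative assumptions, not a real AgentTraceHQ or EU AI Act schema; consult the actual Article 12 text for required log content:

```typescript
interface AuditEvent {
  timestamp: string; // ISO 8601
  type: "input" | "model_call" | "tool_call" | "output";
  detail: string;
}

// Illustrative report shape: a period of use, the ordered event log,
// and an integrity attestation — fields chosen for this sketch only
interface Article12Report {
  systemId: string;
  period: { from: string; to: string };
  events: AuditEvent[];
  integrityVerified: boolean;
}

function buildArticle12Report(
  systemId: string,
  events: AuditEvent[],
  integrityVerified: boolean
): Article12Report {
  // Sort chronologically so the report reads as a timeline
  const sorted = [...events].sort((a, b) => a.timestamp.localeCompare(b.timestamp));
  return {
    systemId,
    period: {
      from: sorted[0]?.timestamp ?? "",
      to: sorted[sorted.length - 1]?.timestamp ?? "",
    },
    events: sorted,
    integrityVerified,
  };
}
```

The point is the shape of the output: an auditor receives a self-describing document scoped to a framework's requirements, not a dashboard query they have to interpret themselves.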
LangSmith's trace replay is great for debugging — you can see what happened and diagnose issues. But forensic replay for compliance needs additional properties: tamper-evident records, verifiable integrity, and the ability to demonstrate both to a third party.
Debugging replay asks "what went wrong?" Forensic replay asks "what exactly happened, and can we prove it in a legal proceeding?"
Article 43 of the EU AI Act requires conformity assessment for high-risk AI systems. This isn't a checkbox — it's a structured process demonstrating compliance with Articles 9-15. An observability dashboard doesn't map to this process. A compliance-grade audit trail with report generation does.
This is the key point: you don't choose between observability and audit trails. You use both.
LangSmith or Datadog handles your engineering workflow: debugging failed runs, profiling latency and token usage, tracking error rates.
AgentTraceHQ handles your compliance workflow: hash-chained records, integrity verification, and reports mapped to frameworks like the EU AI Act and SOC 2.
They can run simultaneously. The AgentTraceHQ SDK adds a callback handler alongside your existing LangSmith tracing — both capture events from the same agent execution without interfering with each other.
```typescript
import { AgentTraceHQ, LangChainHandler } from "@agenttracehq/sdk";

// LangSmith auto-traces when LANGCHAIN_TRACING_V2=true
const athq = new AgentTraceHQ({
  apiKey: process.env.AGENTTRACEHQ_API_KEY,
  agentId: "my-agent",
});

const auditHandler = new LangChainHandler(athq);

// Both LangSmith and AgentTraceHQ trace the same execution
const result = await agentExecutor.invoke(
  { input: userQuery },
  { callbacks: [auditHandler] } // LangSmith traces via env var, ATHQ via handler
);
```
| Scenario | Observability | Audit Trail | Both |
|---|---|---|---|
| Debugging a failed agent run | Yes | - | - |
| Optimizing prompt latency | Yes | - | - |
| A/B testing agent behavior | Yes | - | - |
| SOC 2 Type II audit | - | Yes | - |
| EU AI Act compliance | - | Yes | - |
| Customer disputes about agent decisions | - | Yes | - |
| Regulator investigation | - | Yes | - |
| Production AI agents in regulated industry | - | - | Yes |
| Enterprise customer requiring compliance evidence | - | - | Yes |
| Pre-compliance startup in early development | Yes | - | - |
Using observability as your audit trail feels like it works — until it doesn't. The failure mode is specific and predictable:
The cost of adding an audit trail from the start is minimal — 5 minutes to set up, a few lines of code per agent. The cost of retrofitting under regulatory pressure is weeks of engineering time, potential fines, and possible loss of enterprise customers during the gap.
Keep LangSmith for debugging. Keep Datadog for monitoring. Add AgentTraceHQ for the compliance layer that those tools weren't designed to provide.
Every trace is hash-chained, verifiable, and exportable to the compliance framework you need. It runs alongside your existing observability tools — no replacement, no migration, no disruption.
Start free at agenttracehq.com — add compliance-grade audit trails alongside your existing observability stack.