Claude Code Observability Stack: Visualize Token Spend with Grafana

🎣 Hook

“What gets measured gets improved.” Yet most AI-assisted dev workflows remain opaque. One day you’re pair-programming with Claude; the next, your token bill looks suspiciously high and nobody can explain why.

The Claude Code Observability Stack solves that by turning hidden costs and performance quirks into clear Grafana dashboards—vendor-neutral, open source, and deployable in minutes.


🏗️ Why we built it

In our consulting work with Seed-to-Series B teams, two questions surface again and again:

  1. Are we actually faster with Claude, or just busier?
  2. Where did last night’s $412 in token spend originate?

Answering them requires real telemetry (sessions, tokens, cost, tool usage, latency) made visible to both engineers and finance. The open-source tooling we needed didn’t exist, so we built it.


🛠️ What’s inside the repo

Layer             Tech                       Purpose
Telemetry ingest  OpenTelemetry Collector    Unified metrics & logs
Metrics store     Prometheus                 Time-series powerhouse
Log store         Loki                       Structured event search
Visualization     Grafana                    Pre-wired dashboard
DX helpers        Makefile + Docker Compose  make up → full stack

MIT-licensed, ~50 MB container images—the stack is live in under 90 seconds on a laptop.


🚀 Quick start

git clone https://github.com/ColeMurray/claude-code-otel.git
cd claude-code-otel
make up            # Stack online: Grafana → :3000, Prometheus → :9090

Point Claude Code to the collector:

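export CLAUDE_CODE_ENABLE_TELEMETRY=1
export OTEL_METRICS_EXPORTER=otlp        # send metrics over OTLP
export OTEL_LOGS_EXPORTER=otlp           # send events/logs over OTLP
export OTEL_EXPORTER_OTLP_PROTOCOL=grpc  # 4317 is the OTLP gRPC port
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
claude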

Open Grafana and watch data flow; the dashboard refreshes every 30 seconds by default.
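If the dashboard stays empty, it helps to confirm that the services are reachable before debugging the exporter settings. The following is a minimal sketch, assuming the default ports from the quick start and the standard health endpoints of Grafana (/api/health) and Prometheus (/-/ready). The helper names check_endpoint and check_stack are hypothetical and not part of the repo.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not shipped with the repo.

# Return 0 if the URL responds without an HTTP error (status below 400)
# within 5 seconds; return non-zero otherwise.
check_endpoint() {
  curl -fs --max-time 5 -o /dev/null "$1"
}

# Probe Grafana and Prometheus on the quick-start ports and print one
# status line per service. Returns 1 if either probe fails.
check_stack() {
  local rc=0
  if check_endpoint "http://localhost:3000/api/health"; then
    echo "grafana: ok"
  else
    echo "grafana: unreachable"
    rc=1
  fi
  if check_endpoint "http://localhost:9090/-/ready"; then
    echo "prometheus: ok"
  else
    echo "prometheus: unreachable"
    rc=1
  fi
  return "$rc"
}
```

Running check_stack after make up should print one line per service. If both report ok and the dashboard is still empty, the telemetry variables in the Claude Code shell are the next thing to check.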


📊 Dashboard tour

  • 💰 Cost & Usage – spend by model, token type, and time window.
  • 🔧 Tool Performance – frequency and success of each Claude tool.
  • ⚡ Latency & Errors – surface slowdowns and HTTP issues quickly.
  • 📝 Productivity Metrics – commits, PRs, lines added/removed.
  • 🔍 Event Logs – jump from a metric spike straight to the log entry in Loki.

Queries follow OTel best practices (low-cardinality labels, efficient rates), so the stack scales to hundreds of active sessions.
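To make "low-cardinality labels, efficient rates" concrete, here is a rough sketch of a spend-per-model query sent to the Prometheus HTTP API. It is not copied from the shipped dashboard. The metric name claude_code_cost_usage_USD_total is an assumption about how the collector exposes the cost counter, and the function names are hypothetical, so check the metric explorer on :9090 for the names your deployment actually uses.

```shell
#!/usr/bin/env bash
# Assumed metric name; override COST_METRIC if your Prometheus shows a
# different one.
COST_METRIC="${COST_METRIC:-claude_code_cost_usage_USD_total}"

# Build a PromQL expression for spend per model over a time window.
# Grouping by "model" alone keeps cardinality low: no session or user labels.
cost_by_model_query() {
  local window="${1:-1h}"
  printf 'sum by (model) (increase(%s[%s]))' "$COST_METRIC" "$window"
}

# Send an instant query to the Prometheus HTTP API.
# Requires curl and a running stack; prints the raw JSON response.
query_prometheus() {
  local base_url="${PROMETHEUS_URL:-http://localhost:9090}"
  curl -sS --get "$base_url/api/v1/query" --data-urlencode "query=$1"
}
```

With the stack running, query_prometheus "$(cost_by_model_query 24h)" returns the last day's spend grouped by model as JSON. Aggregating away per-session labels at query time is what keeps a panel like this cheap as the number of sessions grows.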


Key extras

  • Cardinality toggles – drop session IDs or account UUIDs with env vars.
  • Multi-exporter support – ship metrics to Prometheus and Datadog if needed.
  • Privacy guardrails – prompt text is redacted by default; enable only when audits demand it.
  • MDM-friendly – organization-wide settings via JSON, perfect for larger enterprises.
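
As a sketch of the cardinality and privacy switches above: the variable names below are the ones we believe Claude Code's monitoring documentation uses, so treat them as assumptions and verify them against your installed version before relying on them.

```shell
# Cardinality toggles (assumed names; verify against your Claude Code version).
export OTEL_METRICS_INCLUDE_SESSION_ID=false    # drop the session ID label from metrics
export OTEL_METRICS_INCLUDE_ACCOUNT_UUID=false  # drop the account UUID label from metrics

# Prompt text stays redacted unless logging is explicitly switched on.
# Leave this commented out unless an audit requires prompt content.
# export OTEL_LOG_USER_PROMPTS=1
```

Set these in the same shell as the telemetry variables from the quick start, before launching claude.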

Early-stage outcomes

Day  Result
1    Finance sees spend by model; cost conversations become data-driven.
3    Alerting on token spikes halts runaway jobs in minutes.
5    Product demonstrates that Claude sessions correlate with a 23% uptick in merged PRs.

About us

We’re a fractional-CTO and AI product studio that prefers observability over guesswork. Rather than billing by the hour, we deliver outcomes: faster releases, predictable costs, happier engineers. Tools like this stack make those outcomes measurable—and repeatable.

Need help integrating it with Kubernetes or mapping cost to cost centers? Let’s talk.


Get the code

github.com/ColeMurray/claude-code-otel

Because if you can’t see your AI workflows, you can’t scale them.