Claude Code Observability Stack: Visualize Token Spend with Grafana

🎣 Hook

“What gets measured gets improved.” Yet most AI-assisted dev workflows remain opaque. One day you’re pair-programming with Claude; the next, your token bill looks suspiciously high and nobody can explain why.

The Claude Code Observability Stack solves that by turning hidden costs and performance quirks into clear Grafana dashboards—vendor-neutral, open source, and deployable in minutes.

🏗️ Why we built it

In our consulting work with seed-stage through Series B teams, two questions come up again and again:

Are we actually faster with Claude, or just busier?
Where did last night’s $412 in token spend originate?

Answering them requires real telemetry (sessions, tokens, cost, tool usage, latency) that both engineers and finance can see. The open-source tooling we needed didn’t exist, so we built it.

🛠️ What’s inside the repo

| Layer | Tech | Purpose |
|---|---|---|
| Telemetry ingest | OpenTelemetry Collector | Receives metrics and logs over OTLP |
| Metrics store | Prometheus | Time-series metrics storage |
| Log store | Loki | Structured event search |
| Visualization | Grafana | Pre-wired dashboard |
| DX helpers | Makefile + Docker Compose | `make up` → full stack |

MIT-licensed, ~50 MB container images—the stack is live in under 90 seconds on a laptop.

🚀 Quick start

```shell
git clone https://github.com/ColeMurray/claude-code-otel.git
cd claude-code-otel
make up            # Stack online: Grafana → :3000, Prometheus → :9090
```
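Before pointing Claude Code at the stack, it can help to confirm the services are actually answering. A minimal sketch, assuming the default ports from `make up` and the standard health endpoints of Prometheus (`/-/ready`) and Grafana (`/api/health`); `wait_for` is a hypothetical helper, not something shipped with the repo:

```shell
# wait_for URL [TRIES]: poll URL once per second until it responds without
# an HTTP error, or give up after TRIES attempts (default 30).
# Hypothetical helper for illustration; not part of claude-code-otel.
wait_for() {
  url=$1
  tries=${2:-30}
  i=0
  while [ "$i" -lt "$tries" ]; do
    # -f: treat HTTP 4xx/5xx as failure, -s: silent, -o: discard the body
    if curl -fs -o /dev/null "$url" 2>/dev/null; then
      echo "ready: $url"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "timed out: $url" >&2
  return 1
}
```

For example, `wait_for http://localhost:9090/-/ready && wait_for http://localhost:3000/api/health` should return once both services are up.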

Point Claude Code to the collector:

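```shell
export CLAUDE_CODE_ENABLE_TELEMETRY=1
export OTEL_METRICS_EXPORTER=otlp         # export metrics over OTLP
export OTEL_LOGS_EXPORTER=otlp            # export events/logs over OTLP
export OTEL_EXPORTER_OTLP_PROTOCOL=grpc   # 4317 is the standard OTLP/gRPC port
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
claude
```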

Open Grafana and watch data flow; the dashboard refreshes every 30 seconds by default.
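Claude Code batches telemetry before sending it: per its monitoring documentation, metrics are exported every 60 seconds by default (`OTEL_METRIC_EXPORT_INTERVAL`, in milliseconds) and logs every 5 seconds (`OTEL_LOGS_EXPORT_INTERVAL`). If the first panels stay empty during a demo, shortening the metrics interval in the same shell before launching `claude` should make data show up sooner. A sketch; the 10-second value is an arbitrary choice for testing, not a recommendation from the repo:

```shell
# Export metrics every 10 s instead of the 60 s default (values are milliseconds).
export OTEL_METRIC_EXPORT_INTERVAL=10000
# Log events keep their 5 s default; set explicitly here only for visibility.
export OTEL_LOGS_EXPORT_INTERVAL=5000
```

For everyday use, leaving the defaults in place keeps export traffic lower.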

📊 Dashboard tour

💰 Cost & Usage – spend by model, token type, and time window.
🔧 Tool Performance – frequency and success of each Claude tool.
⚡ Latency & Errors – surface slowdowns and HTTP issues quickly.
📝 Productivity Metrics – commits, PRs, lines added/removed.
🔍 Event Logs – jump from a metric spike straight to the log entry in Loki.
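The Cost & Usage panels are plain PromQL, so the same numbers can be pulled from Prometheus’s HTTP API (`/api/v1/query`) for a finance report or a cron job. A minimal sketch, assuming the collector’s Prometheus exporter exposes Claude Code’s `claude_code.cost.usage` counter as `claude_code_cost_usage_USD_total` with a `model` label; confirm the exact metric name in Prometheus before relying on it. `PROM_URL`, `spend_by_model_query`, and `spend_by_model` are hypothetical names:

```shell
# Base URL of the Prometheus started by `make up` (override via the environment).
PROM_URL=${PROM_URL:-http://localhost:9090}

# Build a PromQL expression for USD spend per model over a window (e.g. 24h, 7d).
# ASSUMPTION: the cost counter is exported as claude_code_cost_usage_USD_total.
spend_by_model_query() {
  window=${1:-24h}
  printf 'sum by (model) (increase(claude_code_cost_usage_USD_total[%s]))' "$window"
}

# Send the query to Prometheus's instant-query endpoint and print the raw JSON.
# -G turns the --data-urlencode payload into a URL query string.
spend_by_model() {
  curl -fsS -G "$PROM_URL/api/v1/query" \
    --data-urlencode "query=$(spend_by_model_query "${1:-24h}")"
}
```

Running `spend_by_model 7d` would return one series per model; using `increase()` over a counter keeps the result correct across collector restarts.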

Queries follow OTel and Prometheus best practices (low-cardinality labels, `rate()` and `increase()` over counters), so the stack scales to hundreds of active sessions.

Key extras

Cardinality toggles – drop session IDs or account UUIDs with env vars.
Multi-exporter support – ship metrics to Prometheus and Datadog if needed.
Privacy guardrails – prompt text is redacted by default; enable only when audits demand it.
MDM-friendly – organization-wide settings via JSON, perfect for larger enterprises.
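The cardinality and privacy switches map onto environment variables that Claude Code reads at startup. A sketch, assuming the variable names from Anthropic’s Claude Code monitoring documentation (`OTEL_METRICS_INCLUDE_SESSION_ID`, `OTEL_METRICS_INCLUDE_ACCOUNT_UUID`, `OTEL_LOG_USER_PROMPTS`); verify them against the version you run. For MDM rollouts, the same values can go in the `env` block of a managed settings JSON file. `write_managed_settings` is a hypothetical helper and `collector.example.internal` a placeholder host:

```shell
# Cardinality toggles: drop per-session and per-account labels from metrics.
export OTEL_METRICS_INCLUDE_SESSION_ID=false
export OTEL_METRICS_INCLUDE_ACCOUNT_UUID=false

# Privacy: prompt text stays redacted unless OTEL_LOG_USER_PROMPTS=1.
# Keep it unset outside of audits.
unset OTEL_LOG_USER_PROMPTS

# Hypothetical helper: write an organization-wide settings file whose "env"
# block turns on telemetry for every Claude Code user on the machine.
write_managed_settings() {
  target=$1
  cat > "$target" <<'EOF'
{
  "env": {
    "CLAUDE_CODE_ENABLE_TELEMETRY": "1",
    "OTEL_METRICS_EXPORTER": "otlp",
    "OTEL_LOGS_EXPORTER": "otlp",
    "OTEL_EXPORTER_OTLP_PROTOCOL": "grpc",
    "OTEL_EXPORTER_OTLP_ENDPOINT": "http://collector.example.internal:4317",
    "OTEL_METRICS_INCLUDE_SESSION_ID": "false"
  }
}
EOF
}
```

The target path is OS-specific; on Linux the Claude Code docs list `/etc/claude-code/managed-settings.json` at the time of writing, so check the settings documentation for your platform and write it with sufficient privileges.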

Early-stage outcomes

| Day | Result |
|---|---|
| 1 | Finance sees spend by model; cost conversations become data-driven. |
| 3 | Alerting on token spikes halts runaway jobs within minutes. |
| 5 | The product team shows that Claude sessions correlate with a 23% uptick in merged PRs. |
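The Day 3 alerting would normally live in a Prometheus alerting rule or a Grafana alert, but the core is simply a token rate compared against a budget. A minimal shell sketch of that logic, assuming token usage is exported as `claude_code_token_usage_tokens_total` (check the name in Prometheus) and using a hypothetical 50,000 tokens/minute limit; `over_threshold` and `check_spike` are illustrative names, not part of the repo:

```shell
# PromQL for tokens per minute across all sessions, averaged over 5 minutes.
# ASSUMPTION: the token counter is exported as claude_code_token_usage_tokens_total.
TOKEN_RATE_QUERY='sum(rate(claude_code_token_usage_tokens_total[5m])) * 60'

# over_threshold OBSERVED LIMIT: succeed (exit 0) when OBSERVED > LIMIT.
# awk is used because POSIX shell arithmetic cannot compare decimals.
over_threshold() {
  awk -v v="$1" -v t="$2" 'BEGIN { exit !((v + 0) > (t + 0)) }'
}

# check_spike OBSERVED [LIMIT]: print a status line; return 1 on a spike so a
# calling script or cron wrapper can stop the offending job.
check_spike() {
  observed=$1
  limit=${2:-50000}
  if over_threshold "$observed" "$limit"; then
    echo "ALERT: ${observed} tokens/min exceeds ${limit}"
    return 1
  fi
  echo "ok: ${observed} tokens/min"
}
```

The observed value would come from running `TOKEN_RATE_QUERY` against `/api/v1/query`; a real deployment is better served by an alerting rule with a `for:` duration so brief bursts don’t page anyone.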

About us

We’re a fractional-CTO and AI product studio that prefers observability over guesswork. Rather than billing by the hour, we deliver outcomes: faster releases, predictable costs, happier engineers. Tools like this stack make those outcomes measurable—and repeatable.

Need help integrating it with Kubernetes or mapping cost to cost centers? Let’s talk.

Get the code

github.com/ColeMurray/claude-code-otel

Because if you can’t see your AI workflows, you can’t scale them.