You Can't Improve What You Can't See
If you've been running RLM in production, you've probably asked yourself some version of these questions: How much am I spending per query? Which model is giving me the best bang for my buck? Why did that one request take 12 seconds?
Until now, answering those questions meant cobbling together your own logging. We wanted something better — something that ships with RLM out of the box.
Introducing --metrics
Enabling observability is a single flag:
```shell
rlm "Summarize this document" -f report.pdf --metrics
```

That's it. RLM will start collecting detailed metrics for every query and persisting them to ~/.rlm/metrics.json by default. Every execution captures:
- Token usage — input and output tokens per query
- Cost — estimated USD cost based on the model's pricing
- Latency — total execution time in milliseconds
- Iterations — how many REPL cycles the recursive engine ran
- Context size — bytes of context processed
- Model — which provider and model handled the request
- Status — success or failure, with error details when things go wrong
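Because the default store is plain JSON, you can script against it directly. Here's a minimal sketch that totals estimated spend per model. Note that the record shape (the `model` and `cost` field names and so on) is an assumption for illustration, not RLM's documented schema:

```typescript
import { readFileSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";

// Assumed record shape -- illustrative only, not RLM's documented schema.
interface MetricRecord {
  model: string;
  cost: number;      // estimated USD per query
  latencyMs: number; // total execution time
  success: boolean;
}

// Sum estimated cost per model across all recorded queries.
function costByModel(records: MetricRecord[]): Record<string, number> {
  const totals: Record<string, number> = {};
  for (const r of records) {
    totals[r.model] = (totals[r.model] ?? 0) + r.cost;
  }
  return totals;
}

// Read the default metrics store and report per-model spend.
function reportFromFile(path = join(homedir(), ".rlm", "metrics.json")) {
  const records: MetricRecord[] = JSON.parse(readFileSync(path, "utf8"));
  return costByModel(records);
}
```

Run it with npx tsx against your own metrics file; if you've set RLM_METRICS_FILE, point reportFromFile at that path instead.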
You also get a --metrics-port option if you want the metrics API server to run on a custom port (default is 3001).
Configuration
RLM's metrics system is configurable through environment variables:
```shell
# Where metrics are stored (default: ~/.rlm/metrics.json)
RLM_METRICS_FILE=/path/to/metrics.json

# Protect your metrics API with an API key
RLM_METRICS_API_KEY=your-secret-key

# Redact query text for privacy compliance (stores a SHA-256 hash instead)
RLM_REDACT_QUERIES=true

# Maximum queries to retain (default: 10,000)
RLM_MAX_HISTORY=10000
```

The RLM_REDACT_QUERIES option is worth calling out: if you're processing sensitive documents, you can hash query text so the metrics system never stores the raw input. You still get full performance and cost data without the privacy concern.
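To make the redaction behavior concrete: with RLM_REDACT_QUERIES=true, the idea is that only a SHA-256 digest of the query text gets persisted. A sketch of that transformation (the helper name is illustrative, not an RLM internal):

```typescript
import { createHash } from "node:crypto";

// Persist a SHA-256 hex digest instead of the raw query text.
function redactQuery(query: string): string {
  return createHash("sha256").update(query, "utf8").digest("hex");
}
```

Hashing is deterministic, so repeated queries still collapse to the same digest and you can count duplicates without ever storing the underlying text.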
The Metrics API
When metrics are enabled, RLM exposes a REST API that you can query directly or point other tools at:
```shell
# Health check
curl http://localhost:3001/api/metrics/health

# Aggregated stats for the last 24 hours
curl "http://localhost:3001/api/metrics/stats?period=day"

# Recent queries filtered by model
curl "http://localhost:3001/api/metrics/queries?model=gpt-5.2&limit=10"

# Find failed queries
curl "http://localhost:3001/api/metrics/queries?success=false"
```

The stats endpoint returns aggregated data broken down by model, so you can compare cost and volume across providers at a glance. Filtering supports time ranges, cost thresholds, tags, and text search across queries.
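If you're calling the API programmatically rather than with curl, it helps to build filter URLs in one place. A sketch using only the query parameters shown above (model, success, limit); any other filters the API may support are not assumed here:

```typescript
// Filters drawn from the curl examples above; the API may accept more.
interface QueryFilter {
  model?: string;
  success?: boolean;
  limit?: number;
}

// Build a /api/metrics/queries URL from a base address and filter options.
function queriesUrl(base: string, filter: QueryFilter = {}): string {
  const params = new URLSearchParams();
  if (filter.model !== undefined) params.set("model", filter.model);
  if (filter.success !== undefined) params.set("success", String(filter.success));
  if (filter.limit !== undefined) params.set("limit", String(filter.limit));
  const qs = params.toString();
  return `${base}/api/metrics/queries${qs ? `?${qs}` : ""}`;
}
```

For example, fetch(queriesUrl("http://localhost:3001", { model: "gpt-5.2", limit: 10 })) reproduces the filtered-by-model curl call above.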
You can also run the metrics server standalone, decoupled from any CLI execution:
```shell
npx tsx src/metrics/server.ts --port 3001
```

This is useful when you have multiple RLM instances writing to the same metrics store and want a single API to query them all.
Storage Backends
Metrics default to a JSON file, which works well for local development and single-instance deployments. For production workloads, RLM also supports SQLite with indexed columns for fast queries at scale. The SQLite backend uses WAL mode for concurrent reads, so your dashboard won't block your CLI.
RLM Dashboard
Numbers in a JSON file are useful. Charts are better.
We built RLM Dashboard as a dedicated observability frontend for RLM. It connects to your metrics API and gives you a real-time view of everything happening across your deployments.
What You Get
Query Analytics — A live feed of every query hitting your RLM instances. Search, filter by model or status, and drill into individual executions to see token counts, iteration steps, and timing. Export your query history as CSV or JSON when you need to dig deeper.
Cost Tracking — Daily cost trends, per-model breakdowns, and cost-by-day-of-week analysis so you can spot patterns and anomalies. Set budget alerts with daily or monthly thresholds and get notified before you blow past them.
Performance Monitoring — Latency percentiles (p50, p95, p99) for SLA tracking, throughput metrics, and a model comparison table so you can make data-driven decisions about which models to use where.
Content Health — Token usage patterns over time, context size distribution, and prompt pattern detection that classifies your queries into categories like summarization, extraction, and analysis.
Multi-Instance Support — Connect multiple RLM instances from a single dashboard. Monitor health, toggle instances on and off, and compare performance across deployments.
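Percentiles deserve a quick definition, since implementations differ. A sketch using the nearest-rank method (one common convention; the dashboard's exact method isn't specified here): p95 is the smallest sample with at least 95% of observations at or below it.

```typescript
// Nearest-rank percentile over a list of latency samples (milliseconds).
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) throw new Error("no samples");
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

const latencies = [95, 120, 150, 220, 310, 340, 480, 610, 2100, 12000];
const [p50, p95, p99] = [50, 95, 99].map((p) => percentile(latencies, p));
```

Note how a single 12-second outlier dominates p95 and p99 while leaving p50 untouched, which is exactly why tail percentiles matter for SLA tracking.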
Getting Started
```shell
git clone https://github.com/hampton-io/RLM-Dashboard.git
cd RLM-Dashboard
npm install
cp .env.example .env.local
npx prisma migrate dev
npm run dev
```

Point the dashboard at your RLM metrics endpoint in Settings, and you'll start seeing data immediately. One-click Vercel deployment is also supported if you want it hosted.
Built With
The dashboard runs on Next.js with Prisma and SQLite by default (PostgreSQL for production), uses Recharts for visualizations, and includes NextAuth.js for access control with role-based permissions. It's the same stack we use across our other tools, so it should feel familiar if you've worked with any of our projects.
Why This Matters
Recursive language models are powerful, but they're also unpredictable in ways that traditional API calls aren't. A single query might take 2 iterations or 8. Context sizes vary by orders of magnitude. Costs compound across models and providers. Without observability, you're flying blind.
With --metrics and the RLM Dashboard, you get the visibility to optimize your usage, catch problems early, and make informed decisions about model selection and architecture.
Try It
Enable metrics on your next RLM run:
```shell
rlm "Your query" -f document.txt --metrics
```

Then set up the RLM Dashboard to visualize it all. Both are open source and ready to use.