Into:TrajectoryRL
Making AI agents cheaper, safer, and smarter.
As of Jun 4, 10:37 UTC
A subnet that optimizes how AI agents work. Miners compete to make agents cheaper, safer, and more reliable by optimizing their decision-making policies. One demonstration cut agent operating costs from $12,300/month to $900/month, a 93% reduction, through trajectory optimization alone.
What is TrajectoryRL
TrajectoryRL is a subnet where miners compete to optimize AI agent policies. When an AI agent performs a task (browsing the web, writing code, managing files), it makes a series of decisions called a "trajectory." TrajectoryRL rewards miners who find better trajectories: sequences of decisions that are faster, cheaper, and more reliable.
The simple version: Imagine an AI assistant that takes 20 steps to book a flight, costing $5 in API calls. A TrajectoryRL miner figures out how to do it in 6 steps for $0.30. The miner who finds the most efficient path wins.
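To make the arithmetic concrete, here is a minimal Python sketch that computes the savings for the flight-booking example and for the monthly cost figures quoted earlier. The `Trajectory` class and `reduction` helper are hypothetical names used only for illustration; they are not part of TrajectoryRL.

```python
from dataclasses import dataclass


@dataclass
class Trajectory:
    """Hypothetical summary of one agent run: step count and API cost."""
    steps: int
    cost_usd: float


def reduction(before: float, after: float) -> float:
    """Fractional reduction going from `before` to `after`."""
    return (before - after) / before


# Numbers taken from the flight-booking example above.
baseline = Trajectory(steps=20, cost_usd=5.00)
optimized = Trajectory(steps=6, cost_usd=0.30)

print(f"steps cut by {reduction(baseline.steps, optimized.steps):.0%}")
print(f"cost cut by {reduction(baseline.cost_usd, optimized.cost_usd):.0%}")

# The monthly figures from the demonstration: $12,300 down to $900.
print(f"monthly cost cut by {reduction(12_300, 900):.0%}")
```

In the flight example the step count falls by 70% and the cost by 94%; the monthly figures work out to roughly 93%.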
Centralized equivalent: Think of it as automated consulting for AI operations. Companies like McKinsey optimize business processes; TrajectoryRL optimizes AI agent processes, but through competitive benchmarking rather than billable hours.
How it works:
- Miners upload "policy packs" containing optimized agent configurations (prompt engineering, multi-LLM routing, skill injection) to any public HTTP endpoint and commit metadata on-chain. No server required, no uptime needed.
- Validators evaluate policy packs using ClawBench, a deterministic scenario suite with fixed fixtures. Evaluation runs in two phases: Phase 1 checks pack integrity, and Phase 2 scores trajectory quality using an LLM as judge against natural-language criteria. Rewards are winner-take-all, with a first-mover advantage and NCD similarity detection to prevent copying.
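The copy-detection step can be sketched as follows, assuming "NCD" refers to normalized compression distance, a standard similarity measure built on a general-purpose compressor. This is an illustration, not the subnet's actual code: the use of `zlib`, the 0.3 threshold, and the function names `ncd` and `eligible` are all assumptions.

```python
import zlib


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance, with zlib as the (assumed) compressor.

    Values near 0 mean the inputs are nearly identical; values near 1 mean
    they share little compressible structure.
    """
    cx = len(zlib.compress(x, 9))
    cy = len(zlib.compress(y, 9))
    cxy = len(zlib.compress(x + y, 9))
    return (cxy - min(cx, cy)) / max(cx, cy)


def eligible(submissions, threshold: float = 0.3):
    """Apply a first-mover rule to (miner, pack_bytes) pairs.

    `submissions` is assumed to be sorted by commit time. A pack is dropped
    if it is closer than `threshold` (a hypothetical value) to any earlier
    pack that was kept.
    """
    kept = []
    for miner, pack in submissions:
        if all(ncd(pack, earlier) >= threshold for _, earlier in kept):
            kept.append((miner, pack))
    return [miner for miner, _ in kept]
```

Under a rule like this, a miner who re-uploads someone else's pack with little or no change falls below the threshold against the earlier commit and is excluded, while a genuinely different pack still competes on its score.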
Why This Matters