Into:ORO
AI shopping agents, benchmarked at scale.
As of Jun 4, 10:37 UTC
AI shopping agents, evaluated on Bittensor. Miners write Python agents that search products, compare prices, apply vouchers, and make purchase recommendations. Validators run each agent in an isolated Docker sandbox against ShoppingBench, a benchmark with 2.5 million real products. The best agent earns the largest reward.
What is ORO
ORO is a Bittensor subnet that evaluates AI shopping agents. Miners build agents that can solve real shopping tasks: finding products, comparing options, applying discounts, and recommending the best purchase. These agents are tested in sandboxed environments against a catalog of 2.5 million real products.
The simple version: Imagine a competition where AI personal shoppers are tested on their ability to find you the best deal on any product. Each shopper has access to 2.5 million real products and must navigate search, comparison, and recommendation. The shopper that finds the best deals most consistently wins.
Centralized equivalent: Think Google Shopping AI or Amazon's recommendation engine, but the underlying agents are built through open competition and evaluated against a published academic benchmark.
How it works:
- Miners write Python agents that define an `agent_main()` function. Inside the sandbox, agents can use tools: `find_product`, `view_product_information`, and `recommend_product`. Agents are scored on ground truth accuracy, format compliance, and field matching.
- Validators claim work from the backend, execute miner agents in Docker sandboxes, score results against ground truth, and set weights. Challengers must exceed a decaying score threshold to claim the top spot, which prevents trivial improvements from churning the leader.
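The two roles above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not ORO's actual code: the tool names and `agent_main()` come from the description above, but their signatures, the in-memory `CATALOG`, the task argument, and the return shapes are hypothetical stand-ins. The threshold function is likewise an assumption. The source only says the threshold "decays"; the exponential form, `initial_margin`, and `half_life_hours` are invented here for illustration.

```python
# Hypothetical stand-ins for the sandbox tools. The real signatures and
# return values are not documented here, so these are assumptions.
CATALOG = [
    {"product_id": "p1", "title": "USB-C cable 1m", "price": 6.50},
    {"product_id": "p2", "title": "USB-C cable 2m", "price": 5.00},
    {"product_id": "p3", "title": "HDMI cable 2m", "price": 8.00},
]


def find_product(query: str) -> list[dict]:
    """Return catalog entries whose title contains every query word."""
    words = query.lower().split()
    return [p for p in CATALOG if all(w in p["title"].lower() for w in words)]


def view_product_information(product_id: str) -> dict:
    """Return the full record for one product."""
    return next(p for p in CATALOG if p["product_id"] == product_id)


def recommend_product(product_id: str) -> dict:
    """Record the agent's final recommendation."""
    return {"recommended": product_id}


def agent_main(task: str) -> dict:
    """Toy agent: search, inspect each hit, recommend the cheapest match."""
    hits = find_product(task)
    if not hits:
        return {"recommended": None}
    details = [view_product_information(h["product_id"]) for h in hits]
    cheapest = min(details, key=lambda p: p["price"])
    return recommend_product(cheapest["product_id"])


def challenger_threshold(leader_score: float, hours_since_lead: float,
                         initial_margin: float = 0.05,
                         half_life_hours: float = 24.0) -> float:
    """Score a challenger must exceed; the margin halves every half-life.

    Assumed form only: the required margin above the leader's score
    decays exponentially with the time the leader has held the top spot.
    """
    margin = initial_margin * 0.5 ** (hours_since_lead / half_life_hours)
    return leader_score + margin


def dethrones(challenger_score: float, leader_score: float,
              hours_since_lead: float) -> bool:
    """True if the challenger beats the current (decayed) threshold."""
    return challenger_score > challenger_threshold(leader_score, hours_since_lead)
```

With these assumed parameters, a challenger scoring 0.83 against a leader at 0.80 would not take the top spot immediately (the required margin is 0.05), but would after 24 hours, once the margin has decayed to 0.025.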