IntoTAO
SN117

Tests how smart AI models really are by making them play real games

Forget abstract leaderboards. BrainPlay benchmarks AI models by putting them head-to-head in real games, turning dry evaluation metrics into something anyone can watch, understand, and actually compare.

// LLM benchmarks, but make them games

// WHAT_IS_THIS

BrainPlay is a Bittensor subnet that evaluates language models through competitive gameplay. Rather than scoring models on abstract math benchmarks, it runs them through games like Codenames, 20 Questions, and Super Mario, producing results that are both technically meaningful and visually interpretable.

The simple version: It's like Elo chess ratings for AI, except the match is a game of Codenames or a Mario speedrun instead of a chess game.

Centralized equivalent: No direct equivalent. The closest is LMSYS Chatbot Arena, which also compares models head-to-head, but that's centralized, relies on human votes rather than game outcomes, and doesn't reward participants.

How it works:

  • Miners deploy language models to Targon (SN4), a serverless inference layer. Their model is their entry.
  • Validators create game rooms, assign pairs of miners to compete, score outcomes, and set weights based on performance.
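The validator flow above can be sketched roughly as follows. This is an illustrative reconstruction, not BrainPlay's actual code: `run_round` and `play_game` are hypothetical names, and `play_game` stands in for whatever match logic a real validator runs against the miners' Targon endpoints.

```python
import itertools

def run_round(miner_endpoints, games, play_game):
    """Pair miners head-to-head, play one game per pair, and return
    normalized per-miner scores suitable for setting on-chain weights.

    play_game(game, uid_a, uid_b) -> winning uid (placeholder for the
    real match logic that queries both miners' serverless endpoints).
    """
    scores = {uid: 0.0 for uid in miner_endpoints}
    uids = sorted(miner_endpoints)
    game_cycle = itertools.cycle(games)      # rotate through the live games
    for a, b in zip(uids[::2], uids[1::2]):  # sequential head-to-head pairs
        winner = play_game(next(game_cycle), a, b)
        scores[winner] += 1.0                # winner takes the point
    total = sum(scores.values()) or 1.0      # avoid divide-by-zero when no games ran
    return {uid: s / total for uid, s in scores.items()}
```

A real validator would repeat this across many rounds and games before averaging scores into weights; this sketch shows only the pairing-and-normalizing shape of one round.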
Research snapshot from April 29, 2026. Live metrics are in the sidebar.
// WHY_THIS_MATTERS
  • The problem it solves: Standard benchmarks like MMLU or HellaSwag are hard to interpret. A jump from 72.3% to 73.1% accuracy tells you almost nothing about how a model actually behaves. Game outcomes are comparatively intuitive.
  • The opportunity: A shared, decentralized benchmark layer where results come from adversarial competition rather than curated test sets that models can overfit. That's a real issue in model evaluation today.
  • The Bittensor advantage: Running many instances of many models in parallel requires serious infrastructure, and centralized providers charge per query. Bittensor's incentive layer makes that scale economically feasible.
  • Traction signals: 410 commits across 6 contributors, with the most recent on April 20, 2026, indicating active development. Little social presence appears in current data, suggesting the project is building before marketing.

// FULL_ANALYSIS

Category: Other: LLM Benchmarking and Evaluation | Centralized Competitor: LMSYS Chatbot Arena

The AI benchmarking space is crowded with static test sets. BrainPlay's bet is that game-based, adversarial evaluation is more robust: models can't overfit to a fixed dataset if the competition is dynamic and the games are diverse. That's a real problem in model evaluation today, where optimizing directly for benchmark scores is common practice.

Mechanism:

BrainPlay v2.0 integrates with Targon (SN4) as its inference backbone. Miners don't run their own servers. Instead, they deploy models via Targon's TVM (Targon Virtual Machine) and receive a serverless endpoint. Validators create shared game rooms and query those endpoints to run matches. Both miners and validators require a Targon API key, and miners need sufficient Targon credits to participate.
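A validator's query to a miner's serverless endpoint might look something like the sketch below. The URL, payload fields, and header shape are hypothetical illustrations of an authenticated inference call, not Targon's documented API.

```python
import json
import urllib.request

def build_match_request(endpoint_url, api_key, prompt):
    """Assemble (but do not send) an authenticated inference request
    against a miner's serverless endpoint. Payload fields are illustrative."""
    payload = json.dumps({"prompt": prompt, "max_tokens": 256}).encode()
    return urllib.request.Request(
        endpoint_url,
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",  # Targon API key per the text
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

The point is the trust shape, not the exact fields: the validator holds the API key and drives the match, while the miner's model is reachable only as an endpoint.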

Three games are currently live, per the official repo: Codenames (language, social deduction), 20 Questions (language, logical inference), and SuperMario (vision, policy control). Codenames and 20 Questions run through the LLM weight group (mechid 0); SuperMario runs through the vision weight group (mechid 1). Emissions are split equally between the two groups.
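The equal split between the two weight groups can be modeled as below. This is a sketch under the assumption that each group's half is distributed pro-rata to miner scores within that group; the names are illustrative, not taken from the repo.

```python
GROUP_SHARE = {0: 0.5, 1: 0.5}  # mechid 0 = LLM games, mechid 1 = vision games

def split_emissions(total_emission, group_scores):
    """Distribute emissions: half to each mechid group, pro-rata to
    scores within a group. group_scores: {mechid: {uid: score}}."""
    payouts = {}
    for mechid, share in GROUP_SHARE.items():
        scores = group_scores.get(mechid, {})
        total = sum(scores.values())
        if total == 0:
            continue  # no valid games in this group: nothing distributed
        for uid, s in scores.items():
            payouts[uid] = payouts.get(uid, 0.0) + total_emission * share * s / total
    return payouts
```

For example, if only the LLM group has results, at most half of the round's emission is paid out; the vision group's share goes undistributed.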

Reward logic: winning teams receive a normalized score of 1.0; losing teams score proportionally lower. When no valid games complete in a round, the validator publishes burn weights for that group rather than preserving stale miner scores. This keeps the scoring system clean but also means no emissions reach miners during inactive periods.
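The reward rule reads as a sketch like the following. The source does not specify exactly how losers score "proportionally lower," so the loser's ratio is taken as a given input here, and `burn_uid` is a hypothetical placeholder for wherever burn weights are directed.

```python
def round_weights(results, burn_uid=0):
    """results: list of (winner_uid, loser_uid, loser_ratio) tuples,
    where loser_ratio in [0, 1) is the loser's proportional score.
    Returns normalized weights; with no valid games, burns everything."""
    if not results:
        return {burn_uid: 1.0}  # burn weights: stale miner scores are not preserved
    scores = {}
    for winner, loser, loser_ratio in results:
        scores[winner] = scores.get(winner, 0.0) + 1.0          # winners score 1.0
        scores[loser] = scores.get(loser, 0.0) + loser_ratio    # losers score lower
    total = sum(scores.values())
    return {uid: s / total for uid, s in scores.items()}
```

The empty-results branch is the behavior the text describes for inactive periods: all weight goes to burn, so no emissions reach miners.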

Price has gained 16.5% over 7 days against a backdrop of modest net inflows (~289 TAO over 7 days). Market cap stands at approximately 7,097 TAO. The root proportion is 0.32, meaning roughly two-thirds of the liquidity pool reflects organic demand rather than protocol subsidy. That's a reasonable signal of genuine staker interest for a subnet at this stage.

The most notable data point right now: on-chain data shows zero active miners. With no miners deploying models, validators produce burn weights and no benchmarks run. Given recent commit activity (April 20, 2026), this looks like a build-first phase rather than stagnation, but it's worth watching.


// RISK_FACTORS
Risks assessed as of April 29, 2026. Conditions may have changed.
  • No active miners: On-chain data shows zero active miners. Without participants deploying models to Targon, no benchmarks are being produced. This is the subnet's most pressing operational gap.
  • Targon dependency: Miners require both a Targon API key and sufficient Targon credits. This creates a secondary cost barrier on top of standard Bittensor registration, which may limit participation relative to subnets with fewer prerequisites.
  • Thin liquidity: Root in pool is approximately 2,889 TAO. Slippage is a real consideration for any meaningful position change.
  • Early community: No meaningful Twitter or social activity was found in current data. The project appears to be pre-marketing, which limits visibility and discovery.
// LIVE_DATA
Price: 0.00000 TAO
24h: -1.21%
7d: +11.32%
30d: +22.96%
Market Cap: 0.00 TAO
Emission: 0.00%
Liquidity: 3.2K TAO
Holders: 0