ReadyAI
SN33
Analyses conversations to extract structured knowledge and insights from dialogue
A decentralized data structuring pipeline. Validators pull raw text, build a full ground-truth tag set, then split it into windows for miners to process with LLMs. The miner whose tag embeddings sit closest to the ground-truth vectors wins. Cleaned outputs flow to vector databases and Hugging Face datasets.
// Raw data in, structured data out.
ReadyAI is an open-source subnet that turns raw, unstructured text into AI-ready structured data. Miners run text windows through an LLM and return semantic tags, annotations, and embeddings. Validators have already tagged the full data source themselves, so they can score each miner by how closely the miner's tag embeddings cluster around their ground-truth tag vectors.
The simple version: You have a pile of raw text: podcast transcripts, support tickets, scraped documents. ReadyAI runs that text through a network of LLM-powered miners and hands you back a tagged, semantically labeled dataset you can drop into a vector database or use for fine-tuning.
Centralized equivalent: Think Scale AI, Labelbox, or Snorkel AI, but with LLM-based annotation across a competitive miner pool instead of a managed human workforce.
How it works:
- Miners receive a data window from a validator, run it through their chosen LLM (GPT-4o by default; Anthropic, OpenRouter, and Chutes are supported), and return tags, annotations, and tag embeddings for that window. A CPU with 8 GB RAM and 20 GB disk is enough; no GPU is required.
- Validators pull a raw data source, generate a complete ground-truth tag set for the full source, break it into fractal windows for miners, and score returns by cosine distance between miner tag embeddings and the ground-truth tag neighborhood. The scored, structured outputs are published to vector stores and Hugging Face.
- The problem it solves: Most AI applications need structured, tagged data. Producing it with human annotators is slow and inconsistent, and producing it with a single LLM behind an API is expensive and opaque. ReadyAI turns that job into a verifiable, incentive-aligned pipeline.
- The opportunity: Validators can run the pipeline against their own proprietary data sources, with a recommended minimum of 50,000 items, and resell access. The same network can structure conversations, documents, scraped web content, or survey responses.
- The Bittensor advantage: The cosine-distance scoring rule is a clean objective function. Validators are not handing out subjective ratings; they are measuring vector similarity to a ground truth they computed themselves.
- Traction signals: 1,155 commits across 20 contributors, last push the day before writing. 230 active miners on-chain. Recent work includes an llms.txt Reference MCP server and improvements to empty-tag handling.
Category: Data Scraping and Archival | Centralized Competitor: Scale AI, Labelbox, Snorkel AI, Appen
ReadyAI is run by Afterparty, with David Fields listed as co-founder, and ships from the afterpartyai/bittensor-conversation-genome-project repo. The product surface is readyai.ai and the public outputs land on the ReadyAi Hugging Face org.
Mechanism:
A validator starts by pulling raw text from a source it controls and tagging that source in full. That full pass is the ground truth. The validator then slices the source into smaller fractal windows and hands a window to each of three miners. Each miner runs the window through an LLM and returns the same shape of output the validator produced for the full source: a set of semantic tags, annotations, and a tag embedding vector per tag.
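The slicing and hand-off step can be sketched as follows. This is a minimal illustration, not the subnet's code: the source does not define how "fractal windows" are cut, so fixed-size overlapping windows stand in for them, and the sketch assumes each window is sent to three randomly sampled miners. The names `split_into_windows` and `assign_miners`, and all default parameters, are hypothetical.

```python
import random
from typing import Dict, List, Optional


def split_into_windows(lines: List[str], window_size: int = 5, overlap: int = 1) -> List[List[str]]:
    """Slice a source into overlapping windows.

    Assumption: fixed-size overlapping windows are a stand-in for the
    subnet's "fractal windows", whose exact construction is not described.
    """
    if window_size <= 0 or not 0 <= overlap < window_size:
        raise ValueError("need window_size > 0 and 0 <= overlap < window_size")
    step = window_size - overlap
    windows: List[List[str]] = []
    for start in range(0, len(lines), step):
        windows.append(lines[start:start + window_size])
        if start + window_size >= len(lines):
            break  # the last window already reaches the end of the source
    return windows


def assign_miners(
    window_count: int,
    miner_uids: List[int],
    miners_per_window: int = 3,
    seed: Optional[int] = None,
) -> Dict[int, List[int]]:
    """Pick distinct miners for each window (hypothetical random sampling)."""
    rng = random.Random(seed)
    return {i: rng.sample(miner_uids, miners_per_window) for i in range(window_count)}


if __name__ == "__main__":
    source = [f"line {i}" for i in range(12)]
    windows = split_into_windows(source)
    print(len(windows), assign_miners(len(windows), list(range(10)), seed=0))
```

The overlap keeps a little shared context between neighbouring windows; whether the real pipeline does this is not stated in the source.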
Scoring is geometric. The validator compares each miner's tag embeddings to the ground-truth tag vector neighborhood and computes cosine distances. The final per-miner score is a weighted mix: top 3 unique tag mean (55%), overall mean (25%), median (10%), and top single score (10%). Penalties apply for missing shared tags, insufficient unique tags, and low-quality outputs. The signal rewards miners whose tags are both close to the ground truth and genuinely novel in coverage.
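The weighted mix can be written out as a short sketch. It uses the four weights stated above and expresses closeness as cosine similarity (cosine distance is one minus this value, so higher is better here). Several things are assumptions rather than the subnet's confirmed behavior: a tag's score is taken as its best similarity to any ground-truth vector, "unique" tags are passed in as a set by the caller, and the penalties are left out entirely. All function names are hypothetical.

```python
import math
from statistics import mean, median
from typing import Dict, Sequence, Set

# Weights as stated in the text; they sum to 1.0.
WEIGHTS = {"top3_unique_mean": 0.55, "overall_mean": 0.25, "median": 0.10, "top_single": 0.10}


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def tag_score(tag_vec: Sequence[float], ground_truth_vecs: Sequence[Sequence[float]]) -> float:
    """Assumed stand-in for 'neighborhood' scoring: best match against ground truth."""
    return max((cosine_similarity(tag_vec, g) for g in ground_truth_vecs), default=0.0)


def miner_score(tag_scores: Dict[str, float], unique_tags: Set[str]) -> float:
    """Combine per-tag scores with the stated weights (penalties omitted)."""
    if not tag_scores:
        return 0.0  # an empty tag set earns nothing in this sketch
    all_scores = list(tag_scores.values())
    top_unique = sorted((s for t, s in tag_scores.items() if t in unique_tags), reverse=True)[:3]
    top3_mean = mean(top_unique) if top_unique else 0.0
    return (
        WEIGHTS["top3_unique_mean"] * top3_mean
        + WEIGHTS["overall_mean"] * mean(all_scores)
        + WEIGHTS["median"] * median(all_scores)
        + WEIGHTS["top_single"] * max(all_scores)
    )
```

Because more than half the weight sits on the three best unique tags, a miner in this sketch gains more from a few strong, distinctive tags than from many mediocre ones.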
LLM provider choice is part of the strategy. GPT-4o is the default, with native overrides for Anthropic, OpenRouter (defaulting to deepseek/deepseek-chat via Chutes provider), and Chutes directly. Both miners and validators need an OpenAI API key by default plus a Weights and Biases key, which is the operational cost floor of running on this subnet.
Recent commits show the team building outward from the core pipeline. An llms.txt Reference MCP server lets LLM agents query a searchable index of ReadyAI's reference data through the Model Context Protocol. There is also a user-requested task bundles flow and tighter handling of miners that return empty tag sets. The repo has 1,155 commits over its lifetime with 20 contributors, and four authors have well over 100 commits each, so the bus-factor risk is materially lower than for a single-author project.
On the market side the picture is mixed. Price sits at 0.00750 TAO with a market cap around 33,000 TAO and 17,905 TAO of root in the pool. Root proportion is 0.18, indicating the pool is mostly organic. Taoflow emission share is currently 0% with a 7-day net flow of +692 TAO, so flow has turned positive recently but the EMA-smoothed share has not yet caught up. Price action is +7.5% on the week, -9.6% on the month, -26% on 90 days, with 230 miners still active on-chain.
- No current emission share: Taoflow share is 0%. Net 7-day flow turned positive (+692 TAO), but until the EMA crosses into positive territory, no portion of daily TAO emissions accrues here.
- LLM cost floor: Both miners and validators require LLM API access. Margin is bounded by the spread between LLM call cost and TAO rewards, so any pricing change at OpenAI, Anthropic, or the configured providers passes directly into mining economics.
- Validator ground truth quality: The scoring system trusts the validator's own tagging pass as ground truth. A validator using a weaker LLM or a poorly chosen source weakens the signal it sends to miners.
- Crowded category: Centralized data labeling has well-funded incumbents and the open-source dataset and synthetic data spaces move fast. ReadyAI's edge is verifiable, on-chain provenance plus low cost per token, not a moat in the underlying tagging task.
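The LLM cost floor described above reduces to simple arithmetic. The sketch below only illustrates that spread; every input is a value the operator would have to supply, the function names are hypothetical, and it ignores hosting, registration, and Weights and Biases costs.

```python
def daily_margin_usd(
    llm_calls_per_day: float,
    cost_per_call_usd: float,
    tao_earned_per_day: float,
    tao_price_usd: float,
) -> float:
    """Reward value minus LLM spend for one day, in USD."""
    revenue = tao_earned_per_day * tao_price_usd
    llm_cost = llm_calls_per_day * cost_per_call_usd
    return revenue - llm_cost


def break_even_cost_per_call_usd(
    llm_calls_per_day: float,
    tao_earned_per_day: float,
    tao_price_usd: float,
) -> float:
    """Highest per-call LLM price at which the day's margin is still zero or better."""
    if llm_calls_per_day <= 0:
        raise ValueError("llm_calls_per_day must be positive")
    return tao_earned_per_day * tao_price_usd / llm_calls_per_day
```

If a provider raises its per-call price above the break-even figure, the margin turns negative unless rewards or the TAO price rise to match.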
Into the next one.