Into:ReadyAI
Raw data in, structured data out.
As of · Jun 4, 10:37 UTC
A decentralized data structuring pipeline. Validators pull raw text, build a full ground-truth tag set, then split it into windows for miners to process with LLMs. The miner whose tag embeddings sit closest to the ground-truth vectors wins. Cleaned outputs flow to vector databases and Hugging Face datasets.
What is ReadyAI
ReadyAI is an open-source subnet that turns raw, unstructured text into AI-ready structured data. Miners run text windows through an LLM and return semantic tags, annotations, and embeddings. Validators have already tagged the full data source themselves, so they can score each miner by how closely the miner's tag embeddings cluster around their ground-truth tag vectors.
The simple version: You have a pile of raw text, such as podcast transcripts, support tickets, or scraped documents. ReadyAI runs that text through a network of LLM-powered miners and hands you back a tagged, semantically labeled dataset you can drop into a vector database or use for fine-tuning.
Centralized equivalent: Think Scale AI, Labelbox, or Snorkel AI, but with LLM-based annotation across a competitive miner pool instead of a managed human workforce.
How it works:
- Miners receive a data window from a validator, run it through their chosen LLM (GPT-4o by default; Anthropic, OpenRouter, and Chutes are supported), and return tags, annotations, and tag embeddings for that window. A CPU-only machine with 8 GB RAM and 20 GB of disk is enough; no GPU is needed.
- Validators pull a raw data source, generate a complete ground-truth tag set for the full source, break it into fractal windows for miners, and score returns by cosine distance between miner tag embeddings and the ground-truth tag neighborhood. The scored, structured outputs are published to vector stores and Hugging Face.
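To make the scoring step concrete, here is a minimal sketch of ranking miners by how close their tag embeddings sit to a ground-truth tag set. It is an illustration under stated assumptions, not ReadyAI's validator code: the function names, the toy 2-dimensional vectors, and the "best match per tag, then average" aggregation are all hypothetical, and it uses cosine similarity (higher is closer) rather than distance.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def score_miner(miner_embeddings, ground_truth_embeddings):
    """Hypothetical aggregation: for each miner tag, take its best match
    among the ground-truth tags, then average those best matches."""
    if not miner_embeddings or not ground_truth_embeddings:
        return 0.0
    best_matches = [
        max(cosine_similarity(m, g) for g in ground_truth_embeddings)
        for m in miner_embeddings
    ]
    return sum(best_matches) / len(best_matches)


def rank_miners(submissions, ground_truth_embeddings):
    """Return (miner_id, score) pairs sorted from highest to lowest score."""
    scores = {
        miner_id: score_miner(embeddings, ground_truth_embeddings)
        for miner_id, embeddings in submissions.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data: two ground-truth tag vectors and two miners (all values made up).
ground_truth = [[1.0, 0.0], [0.0, 1.0]]
submissions = {
    "miner_a": [[1.0, 0.0], [0.0, 2.0]],  # both tags align with a ground-truth tag
    "miner_b": [[1.0, 1.0]],              # sits halfway between the two tags
}
ranking = rank_miners(submissions, ground_truth)
print(ranking[0][0])  # miner_a
```

In this toy run, miner_a ranks first because each of its tag vectors points in the same direction as a ground-truth tag, while miner_b's single tag falls between them. A real validator would use embeddings with many more dimensions and may aggregate differently.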