# Into: Data Universe

230 miners scrape and store social media data across X, Reddit, and YouTube. Validators verify what they collect through random sampling and credibility scoring. The output is what Macrocosmos calls the world's largest open-source social media dataset, queryable through a no-code product called Gravity.

// Web-scale scraping, queryable by anyone.

---

> New to Bittensor? Start here. Experienced users can skip to the full analysis.

### What is Data Universe?

Data Universe (SN13) is Macrocosmos's data-scraping subnet on Bittensor. Miners continuously pull fresh posts from X and Reddit, plus YouTube transcripts, and store them in a structured format. Validators check sample slices to confirm the data is real, fresh, and not just a duplicate of what every other miner is holding.

**The simple version:** Imagine a decentralized version of a web-scraping firm where hundreds of independent operators race to collect the freshest, most diverse social media content, and only get paid for data nobody else has and that the market actually asked for.

**Centralized equivalent:** Think Bright Data, Apify, or Common Crawl, but with miner incentives tied to demand from a no-code product (Gravity) instead of one-off enterprise contracts.

**How it works:**
- **Miners** scrape DataEntities from X, Reddit, and YouTube, organize them into time and label buckets, and report a MinerIndex to validators. They also upload to S3-compatible storage for public dataset access.
- **Validators** pull each miner's MinerIndex, randomly sample data for correctness, and score miners on data value (freshness, desirability, scarcity) multiplied by credibility raised to the 2.5 power. Weights get set roughly every 20 minutes.
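
To see why the 2.5 exponent matters, here is a minimal sketch of the scoring rule as described above. The function name, the example numbers, and the assumption that credibility lives in the range 0 to 1 are illustrative, not taken from the SN13 codebase.

```python
CREDIBILITY_EXPONENT = 2.5  # from the description above


def miner_score(data_value: float, credibility: float) -> float:
    """Score = data value multiplied by credibility raised to the 2.5 power.

    Assumption: credibility is a number between 0 and 1.
    """
    if not 0.0 <= credibility <= 1.0:
        raise ValueError("credibility must be between 0 and 1")
    return data_value * credibility ** CREDIBILITY_EXPONENT


# Hypothetical miners: one reports a smaller store honestly, the other
# claims twice the value but fails enough checks to sit at 0.6 credibility.
honest = miner_score(100.0, 1.0)
inflated = miner_score(200.0, 0.6)
```

Under these made-up numbers the inflated miner ends up with a lower score than the honest one, because the exponent shrinks the credibility multiplier faster than the extra claimed value can compensate.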

---

### Why This Matters

- **The problem it solves:** Fresh social media data is expensive, locked behind paid APIs, and concentrated in a handful of vendors. Smaller AI teams and researchers get priced out.
- **The opportunity:** Every model training pipeline, every sentiment dashboard, every trend-detection product needs a constant feed of structured social data. Macrocosmos claims the macrocosmosai dataset holds 55 billion rows.
- **The Bittensor advantage:** The Gravity product turns user queries into miner incentives directly. Demand routes to supply through emissions, not through procurement contracts.
- **Traction signals:** 2,070 commits across 12+ contributors, with the most recent push four days ago. SN13 data is reportedly consumed by SN44 (Score, sports analytics) and SN64 (Chutes, via Squad.ai) for downstream products.

---

## Full Analysis

**Category:** Data Scraping and Archival | **Centralized Competitor:** Bright Data, Apify, ScraperAPI, Common Crawl

Macrocosmos is one of the larger operators on Bittensor, running five subnets and treating Data Universe as the raw-data tier of a broader stack. Gravity sits on top as a no-code query tool, Nebula visualizes the data on a 3D plane with sentiment overlays, and Mission Commander is an agentic chatbot (built on SN1, Apex) that helps users phrase scraping jobs. There is also an MCP integration that wires the dataset into Claude and Cursor.

**Mechanism:**

Each miner scrapes from the supported DataSources and groups entries into DataEntityBuckets keyed by source, time bucket, and a DataLabel (like a stock ticker or subreddit). The full set is the miner's MinerIndex. Validators periodically request that index and store a local copy, then sample data to verify it actually exists at the original source and matches what was claimed. A separate S3 storage validation runs roughly every six hours and checks for duplicates, job-match alignment, and scraper-verifiable authenticity.
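
The bucket layout is easier to picture as a data structure. The sketch below is a simplified model, not the subnet's actual classes: the field names, the hours-since-epoch time bucket, and the size-weighted sampling are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass, field


@dataclass(frozen=True)
class BucketId:
    """Key for one bucket: source, time bucket, and label."""
    source: str       # e.g. "x", "reddit", "youtube"
    time_bucket: int  # assumption: hours since the Unix epoch
    label: str        # e.g. a stock ticker or a subreddit


@dataclass
class MinerIndex:
    """Summary a miner reports: bucket key -> bytes claimed in that bucket."""
    buckets: dict = field(default_factory=dict)

    def add(self, bucket: BucketId, size_bytes: int) -> None:
        self.buckets[bucket] = self.buckets.get(bucket, 0) + size_bytes

    def total_bytes(self) -> int:
        return sum(self.buckets.values())


def sample_bucket(index: MinerIndex, rng: random.Random) -> BucketId:
    """Pick one bucket to spot-check.

    Assumption: larger claimed buckets are proportionally more likely
    to be picked. The real validator's sampling policy may differ.
    """
    ids = list(index.buckets)
    weights = [index.buckets[b] for b in ids]
    return rng.choices(ids, weights=weights, k=1)[0]


# Hypothetical index with two buckets from the same hour.
index = MinerIndex()
index.add(BucketId("reddit", 490_000, "r/bittensor"), 2_000)
index.add(BucketId("x", 490_000, "$TAO"), 8_000)
picked = sample_bucket(index, random.Random(0))
```

A validator would then fetch the actual entries for `picked` and compare them against the original source, which is the step this sketch leaves out.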

Data value is not flat. Fresh data is worth more, with linear decay over 30 days and zero value after that. Data that matches active Gravity user requests gets up to a 5x multiplier, while unspecified labels score at 30% of baseline. Data that many miners already hold is worth less per unit. Credibility, tracked as an exponential moving average of validation outcomes, is then applied as a multiplier raised to the 2.5 power, which makes misrepresentation strictly worse than honest reporting of smaller stores. The team publishes a live dashboard at sn13-dashboard.api.macrocosmos.ai showing what the network currently holds.
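
The value factors above can be sketched in a few lines. The 30-day window, the 5x cap, and the 30% default come from the description; everything else (function names, the split-among-holders scarcity rule, and the EMA smoothing factor of 0.15) is a placeholder assumption rather than the subnet's real parameters.

```python
MAX_AGE_DAYS = 30.0         # value decays linearly to zero at 30 days
DEFAULT_LABEL_WEIGHT = 0.3  # unspecified labels score at 30% of baseline
MAX_DEMAND_WEIGHT = 5.0     # requested labels earn at most a 5x multiplier


def freshness(age_days: float) -> float:
    """Linear decay from 1.0 for brand-new data to 0.0 at 30 days and beyond."""
    return max(0.0, 1.0 - age_days / MAX_AGE_DAYS)


def label_weight(label: str, demand: dict) -> float:
    """Weight for a label; `demand` maps requested labels to multipliers."""
    return min(demand.get(label, DEFAULT_LABEL_WEIGHT), MAX_DEMAND_WEIGHT)


def bucket_value(size_bytes: int, age_days: float, label: str,
                 demand: dict, holders: int) -> float:
    """Value of one bucket.

    Assumption: scarcity is modeled by splitting value evenly among the
    miners holding the same bucket. The real rule may be different.
    """
    base = size_bytes * freshness(age_days) * label_weight(label, demand)
    return base / max(1, holders)


def update_credibility(current: float, passed: bool, alpha: float = 0.15) -> float:
    """Exponential moving average of validation outcomes (alpha is assumed)."""
    outcome = 1.0 if passed else 0.0
    return alpha * outcome + (1.0 - alpha) * current


# Hypothetical demand table: one label requested at the maximum weight.
demand = {"$TAO": 5.0}
requested = bucket_value(1_000, 15.0, "$TAO", demand, holders=2)
unrequested = bucket_value(1_000, 0.0, "misc", demand, holders=1)
```

With these illustrative inputs, a 15-day-old requested bucket shared by two miners still outscores a brand-new bucket nobody asked for, which is the behavior the incentive design is aiming at.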

On the market side, the alpha token trades around 0.00734 TAO with a market cap of roughly 38,392 TAO and 22,640 TAO of depth in the pool. The 30-day price is down about 8% and 7-day net flow is negative at -213 TAO. Under the November 2025 Taoflow model, subnets with negative net staking flows receive no share of network emissions, and SN13's current emission share is 0%. The product surface is live and miners and validators continue to operate; emission share should return only once staking flows turn positive again.
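
The flow-gating rule can be illustrated with a toy calculation. This is a deliberately simplified sketch: it assumes emissions are split in proportion to positive net flow, which is not confirmed by anything above, and the two comparison subnets and their flows are invented for the example. Only the -213 TAO figure for SN13 comes from the text.

```python
def emission_shares(net_flows: dict) -> dict:
    """Give zero share to subnets with negative net flow.

    Assumption: the rest split emissions in proportion to their positive
    flow. The actual Taoflow weighting is likely more involved.
    """
    positive = {name: max(0.0, flow) for name, flow in net_flows.items()}
    total = sum(positive.values())
    if total == 0:
        return {name: 0.0 for name in net_flows}
    return {name: flow / total for name, flow in positive.items()}


# SN13's flow is from the text; the other two subnets are hypothetical.
flows = {"SN13": -213.0, "subnet_a": 300.0, "subnet_b": 100.0}
shares = emission_shares(flows)
```

In this toy setup SN13 lands at a zero share regardless of how large the other subnets' flows are, matching the 0% figure quoted above.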

---

### Risk Factors

- **Taoflow position:** With 7-day net flow at -213 TAO, the subnet is currently outside the emission band. Operators keep running on credibility and product utility, but on-chain rewards resume only when staking inflows recover.
- **Platform dependency:** Scraping X, Reddit, and YouTube means API and ToS changes at those platforms can directly affect what miners can collect, and miners carry their own GDPR and compliance exposure on collected data.
- **Off-chain product surface:** The dataset's value largely flows through Macrocosmos's Gravity, Nebula, and Mission Commander products. A pivot or pricing change at the company level would reshape demand routing into the subnet.
- **Competition:** Centralized scrapers like Bright Data and Apify still serve most enterprise data needs, and several other Bittensor subnets work in adjacent data and archival roles.
- **Multi-subnet operator:** Macrocosmos runs five subnets, which is an operational strength but also a concentration risk: a strategy shift at the parent organization affects SN13 directly.

---

Into the next one.