Into: Data Universe
Web-scale scraping, queryable by anyone.
As of · Jun 4, 10:37 UTC
230 miners scrape and store social media data across X, Reddit, and YouTube. Validators verify what they collect through random sampling and credibility scoring. The output is what Macrocosmos calls the world's largest open-source social media dataset, queryable through a no-code product called Gravity.
What is Data Universe
Data Universe (SN13) is Macrocosmos's data-scraping subnet on Bittensor. Miners continuously pull fresh posts from X and Reddit, along with YouTube transcripts, and store them in a structured format. Validators check sample slices to confirm the data is real, fresh, and not just a duplicate of what every other miner is holding.
The simple version: Imagine a decentralized version of a web-scraping firm where hundreds of independent operators race to collect the freshest, most diverse social media content, and only get paid for data nobody else has and that the market actually asked for.
Centralized equivalent: Think Bright Data, Apify, or Common Crawl, but with miner incentives tied to demand from a no-code product (Gravity) instead of one-off enterprise contracts.
How it works:
- Miners scrape DataEntities from X, Reddit, and YouTube, organize them into time and label buckets, and report a MinerIndex to validators. They also upload to S3-compatible storage for public dataset access.
- Validators pull each miner's MinerIndex, randomly sample data for correctness, and score miners on data value (freshness, desirability, scarcity) multiplied by credibility raised to the 2.5 power. Weights get set roughly every 20 minutes.
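The scoring rule in the last bullet can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the subnet's actual implementation: the text only says that data value reflects freshness, desirability, and scarcity, and that it is multiplied by credibility raised to the 2.5 power. The `Bucket` fields, the linear freshness decay, the 30-day age cap, the `1 / holders` scarcity term, and the choice to multiply the three factors together are all hypothetical.

```python
from dataclasses import dataclass

# Exponent taken from the description above; everything else is assumed.
CREDIBILITY_EXPONENT = 2.5


@dataclass(frozen=True)
class Bucket:
    """One time-and-label bucket from a miner's index (field names are hypothetical)."""
    source: str          # e.g. "x", "reddit", "youtube"
    label: str           # e.g. a hashtag or subreddit
    age_hours: float     # how old the bucket's data is
    size_bytes: int      # how much data the bucket holds
    desirability: float  # assumed demand weight for this label
    holders: int         # assumed count of miners holding the same bucket


def bucket_value(bucket: Bucket, max_age_hours: float = 720.0) -> float:
    """Assumed value model: size scaled by freshness, desirability, and scarcity."""
    # Linear decay from 1.0 (brand new) to 0.0 (at or past the age cap).
    freshness = max(0.0, 1.0 - bucket.age_hours / max_age_hours)
    # The more miners hold the same bucket, the less each copy is worth.
    scarcity = 1.0 / max(1, bucket.holders)
    return bucket.size_bytes * freshness * bucket.desirability * scarcity


def miner_score(buckets: list[Bucket], credibility: float) -> float:
    """Data value multiplied by credibility raised to the 2.5 power."""
    if not 0.0 <= credibility <= 1.0:
        raise ValueError("credibility is assumed to lie in [0, 1]")
    data_value = sum(bucket_value(b) for b in buckets)
    return data_value * credibility ** CREDIBILITY_EXPONENT


fresh_unique = Bucket("reddit", "r/bittensor", age_hours=0.0,
                      size_bytes=1000, desirability=1.0, holders=1)
full_trust = miner_score([fresh_unique], credibility=1.0)
half_trust = miner_score([fresh_unique], credibility=0.5)
```

Whatever the exact value model, the exponent makes credibility dominate: because 0.5 raised to the 2.5 power is about 0.177, a miner whose credibility drops from 1.0 to 0.5 keeps under a fifth of the score for the same data.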