# Into: Vocence

Describe a voice in plain words (say, a calm middle-aged man reading slowly and warmly), and Vocence asks AI miners to produce it, then pays them according to how closely they hit the brief.

// Prompt a voice, score the result

---

> New to Bittensor? Start here. Experienced users can skip to the full analysis.

### What is Vocence?

Vocence is a Bittensor subnet built around voice AI. Miners build text-to-speech models that take a line of text plus a written description of how it should sound, and return spoken audio. Validators send the same prompts to every miner and grade how well each one matches the request.

**The simple version:** It is like commissioning a voice actor by describing them in a sentence, except the actors are AI models competing to match your description most closely.

**Centralized equivalent:** Think ElevenLabs or OpenAI's text-to-speech, but run as an open competition where many models bid to read your text instead of one company's fixed API.

**How it works:**
- **Miners** train and deploy prompt-driven text-to-speech models that take text plus a voice instruction (tone, emotion, accent, pace, age) and return a spoken audio file.
- **Validators** send the same evaluation prompts to every miner, then score the returned audio on three things: whether it says the right words, whether it sounds clean, and whether the voice matches the requested traits.

---

### Why This Matters

- **The problem it solves:** Most high-quality voice synthesis sits behind closed APIs, and steering voice traits usually means picking from a fixed set of preset voices. Vocence aims for open models you control with plain-language instructions instead.
- **The opportunity:** Voice is becoming the interface for AI agents, assistants, audiobooks, and accessibility tools. A steerable, open voice layer that anyone can build on is broadly useful, not tied to one vendor's roadmap.
- **The Bittensor advantage:** A standing, scored competition keeps every miner's model directly comparable and pushes quality up without a single gatekeeper. Models stay open and reproducible, run through a shared evaluation pipeline.
- **Traction signals:** Early. The Q1 PromptTTS competition is live, the public repository saw commits through late May 2026, and founder @knakamor recently sat for a recorded interview on decentralized voice AI. On-chain participation is still thin, which is normal for a subnet roughly two months into its life.

---

## Full Analysis

**Category:** Image/Video/Audio Generation | **Centralized Competitor:** ElevenLabs, OpenAI TTS, Play.ht

Voice synthesis has moved fast, but the best models are mostly closed and the control surface is narrow: you get a menu of preset voices, not a dial you can describe in words. Vocence is trying to build the opposite, an open marketplace where the unit of competition is how faithfully a model can turn a written voice brief into audio. The subnet registered on April 14, 2026 and is still inside its network immunity period.

**Mechanism:**

The work and the scoring are sourced from the subnet's own repository. The current focus (Q1) is Prompt-based Text-to-Speech, or PromptTTS: a miner receives a prompt such as "a calm middle-aged male voice with a warm tone, speaking slowly and clearly" along with the text to read, and returns a WAV file. Every model runs through a standardized wrapper and exposes a single `/speak` endpoint, so outputs stay directly comparable across miners. Miners publish their model on Hugging Face and deploy it on Chutes, another Bittensor subnet, for compute, with Hippius used for storage.
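As a rough illustration of that request and response shape, here is a minimal sketch. The source only confirms that a miner receives text plus a voice prompt at `/speak` and returns a WAV file; the JSON field names `text` and `instruction`, and every helper name below, are assumptions made for illustration rather than the subnet's actual schema.

```python
import io
import json
import wave


def build_speak_payload(text: str, instruction: str) -> str:
    """Serialize a hypothetical /speak request body (field names are assumed)."""
    if not text.strip() or not instruction.strip():
        raise ValueError("both text and instruction are required")
    return json.dumps({"text": text, "instruction": instruction})


def wav_duration_seconds(audio_bytes: bytes) -> float:
    """Parse WAV bytes and return the clip length; raises wave.Error if not a WAV."""
    with wave.open(io.BytesIO(audio_bytes), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def make_silent_wav(seconds: float, rate: int = 16000) -> bytes:
    """Stand-in for a miner response: mono 16-bit silence of the given length."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()


payload = build_speak_payload(
    "Welcome back.",
    "a calm middle-aged male voice with a warm tone, speaking slowly and clearly",
)
fake_response = make_silent_wav(1.5)
print(payload)
print(wav_duration_seconds(fake_response))
```

Because every miner answers the same kind of request with the same kind of file, a validator can run identical checks on each response, which is what keeps outputs comparable.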

Validators pull the list of registered miners, call each one's `/speak` endpoint with evaluation prompts, and run a shared scoring pipeline measuring content correctness, audio quality, and prompt adherence. Weight setting uses what the repo calls global consensus scoring: each validator reads the most recent evaluation window from every active validator, aggregates those results with stake-weighted scoring, then applies the subnet's winner-take-all and threshold rule before setting weights on chain. The team describes Q1 PromptTTS as the baseline, with the roadmap expanding to speech-to-text, speech-to-speech, and voice cloning.
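The weight-setting flow can be sketched in a few lines, under loud assumptions. The source does not give the aggregation formula or the threshold rule, so this sketch assumes a stake-weighted mean per miner and a rule that hands all weight to the top miner only if its score clears a fixed threshold. All function names, data shapes, and numbers here are hypothetical.

```python
def aggregate_scores(reports: dict, stakes: dict) -> dict:
    """Stake-weighted mean score per miner across validators.

    reports: {validator: {miner: score}}, stakes: {validator: stake}.
    Validators with no positive stake are ignored (an assumption).
    """
    totals, weight_sums = {}, {}
    for validator, scores in reports.items():
        stake = stakes.get(validator, 0.0)
        if stake <= 0:
            continue
        for miner, score in scores.items():
            totals[miner] = totals.get(miner, 0.0) + stake * score
            weight_sums[miner] = weight_sums.get(miner, 0.0) + stake
    return {miner: totals[miner] / weight_sums[miner] for miner in totals}


def winner_take_all(scores: dict, threshold: float) -> dict:
    """Give all weight to the top miner if it clears the threshold, else none."""
    weights = {miner: 0.0 for miner in scores}
    if scores:
        best = max(scores, key=scores.get)
        if scores[best] >= threshold:
            weights[best] = 1.0
    return weights


reports = {
    "validator_1": {"miner_a": 0.8, "miner_b": 0.6},
    "validator_2": {"miner_a": 0.4, "miner_b": 0.9},
}
stakes = {"validator_1": 3.0, "validator_2": 1.0}

consensus = aggregate_scores(reports, stakes)
print(consensus)
print(winner_take_all(consensus, threshold=0.5))
```

In this toy case the higher-staked validator's view dominates, so `miner_a` wins even though the smaller validator preferred `miner_b`. That is the practical effect of stake weighting, whatever the exact formula in the repo turns out to be.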

On development, the public GitHub repository (vocence-78/vocence, the repo declared in the subnet's on-chain identity) shows 83 commits, with the most recent on May 26, 2026, according to a live API check. The public contributor graph lists one contributor, and Python is the primary language. On the market side, the alpha token trades around 0.00662 TAO with roughly 1,213 TAO of depth in its pool, and the subnet's emission share has sat below one percent, around 0.4 to 0.9 percent. Net staking flows over the past week have been negative, which under Bittensor's flow-based Taoflow model puts downward pressure on future emission share. The pool's root proportion is high (about 0.63), a heuristic that usually points to a younger subnet whose price has not yet settled into organic demand.

---

### Risk Factors

- **Execution:** Vocence is a young team with a broad roadmap and a narrow live product. Only the PromptTTS task is running today, the deeper voice capabilities (STT, speech-to-speech, cloning) are still ahead, and the public repository shows a single contributor on its graph. Delivering the full stack is unproven.
- **Competition:** Voice AI is crowded. Well-funded closed providers like ElevenLabs and OpenAI set a high quality bar, and capable open models exist outside Bittensor. Vocence has to be good enough that an open, decentralized market is worth the friction.
- **Market:** The alpha token is down roughly 24 percent over the past week and 37 percent over the past month, and recent net staking flows have turned negative, so emission share is small and under pressure.
- **Concentration:** A Gini coefficient of 0.51 across the top stake positions points to moderately concentrated stake distribution. Large positions could meaningfully move pool dynamics.
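For readers unfamiliar with the concentration metric, here is a minimal sketch of how a Gini coefficient can be computed from a list of stake positions. It uses the standard sorted-rank formula. The sample positions are made up, and how the 0.51 figure above was actually derived (which positions, which data source) is not stated in the source.

```python
def gini(values: list) -> float:
    """Gini coefficient of non-negative values: 0 is perfectly equal,
    values near 1 mean one position holds nearly everything."""
    ordered = sorted(values)
    n = len(ordered)
    total = sum(ordered)
    if n == 0 or total == 0:
        return 0.0
    # Sorted-rank form: sum((2i - n - 1) * x_i) / (n * sum(x)), with i starting at 1.
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(ordered, start=1))
    return weighted / (n * total)


# Hypothetical stake positions, for illustration only.
print(gini([100, 100, 100, 100]))  # evenly spread stake
print(gini([0, 0, 0, 400]))        # one position holds everything
```

A reading around 0.5 sits between those two extremes: stake is neither evenly spread nor held by a single position.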

---

Another subnet, unpacked. Into the next one.
