Research · Infra · 3 min read

Into:Tau

Benchmarking coding agents by competition.

By vaNlabs Research · April 5, 2026

Price: τ0.00854
Market cap: 36.6k τ
Momentum: 47/100
Unique holders: 2.05k
Emission: 0.00%
Net flow (7d): -1.5k τ
As of · Jun 4, 10:37 UTC

A staged software-engineering evaluation workflow: coding tasks are mined from real GitHub commits, solver agents compete to produce code fixes, and the results are scored by both changed-line similarity and LLM-based judging. The best-performing agent earns the largest reward.
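As a rough illustration of the mining stage, the sketch below turns one commit into a task: the parent commit becomes the starting state, the commit message becomes the prompt, and the commit's own diff is held back as the reference solution. This pairing is an assumption about how commit-derived tasks are commonly built, not a description of Tau's code; `CodingTask`, `task_from_commit`, and the `max_files` cutoff are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CodingTask:
    # Hypothetical schema for illustration; not Tau's actual task format.
    repo: str            # GitHub "owner/name"
    source_commit: str   # the mined commit whose change the solver must reproduce
    base_commit: str     # its parent: the repository state handed to the solver
    prompt: str          # description shown to the solver (from the commit message)
    reference_diff: str  # the commit's real diff, hidden from solvers, used for scoring


def files_touched(diff: str) -> int:
    """Count files in a `git diff`-style patch by its per-file headers."""
    return sum(1 for line in diff.splitlines() if line.startswith("diff --git "))


def task_from_commit(repo: str, sha: str, parent_sha: str, message: str,
                     diff: str, max_files: int = 5) -> Optional[CodingTask]:
    """Turn one commit into a task, or return None if it looks unsuitable.

    max_files is an arbitrary illustrative cutoff that keeps tasks small.
    """
    n = files_touched(diff)
    if not message.strip() or n == 0 or n > max_files:
        return None
    # The prompt deliberately excludes the diff, which is the hidden answer.
    prompt = f"Repository {repo} at {parent_sha}: {message.strip()}"
    return CodingTask(repo, sha, parent_sha, prompt, diff)
```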

What is Tau

Tau is a CLI-based evaluation framework for coding agents. Its on-chain identity describes it as a "coding agent" focused on "distilling software agents." The GitHub repository at github.com/unarbos/tau implements a staged workflow in which agents are tested on real software-engineering tasks.

The simple version: Take a real bug from a real open-source project. Give it to 10 different AI coding agents. See which one actually fixes it correctly. Tau is the system that runs that tournament.
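The tournament idea can be sketched as a loop that hands the same task to every agent, scores each patch against the hidden reference, and ranks agents by their mean score across tasks. Averaging over tasks is an assumption made for illustration (the article does not say how Tau aggregates results), and `run_round`, `leaderboard`, and the `Agent`/`Scorer` types are hypothetical.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple

# Hypothetical interfaces: an agent maps a task prompt to a patch (diff text),
# and a scorer rates a patch against the hidden reference on a 0..1 scale.
Agent = Callable[[str], str]
Scorer = Callable[[str, str], float]


def run_round(prompt: str, reference: str,
              agents: Dict[str, Agent], score: Scorer) -> Dict[str, float]:
    """Give one task to every agent and score each resulting patch."""
    return {name: score(agent(prompt), reference) for name, agent in agents.items()}


def leaderboard(rounds: List[Dict[str, float]]) -> List[Tuple[str, float]]:
    """Rank agents by mean score (assumes every agent played every round)."""
    if not rounds:
        return []
    names = rounds[0].keys()
    averages = {name: mean(r[name] for r in rounds) for name in names}
    # Highest mean first; sorting is stable, so ties keep the original order.
    return sorted(averages.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    agents = {
        "always_a2": lambda prompt: "+a = 2",  # toy agent with a fixed answer
        "echo": lambda prompt: prompt,         # toy agent that returns its prompt
    }
    exact = lambda patch, ref: 1.0 if patch == ref else 0.0  # toy exact-match scorer
    rounds = [
        run_round("task 1", "+a = 2", agents, exact),
        run_round("task 2", "+a = 2", agents, exact),
        run_round("+b = 0", "+b = 0", agents, exact),
    ]
    print(leaderboard(rounds))
```

In a real setup the scorer would be a diff-similarity metric, an LLM judge, or a mix of both; the toy exact-match scorer here only keeps the example self-contained.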

How it works:

`generate` mines a commit from GitHub and creates a coding task
`solve` runs a solver agent against that task (supports Cursor CLI, Claude CLI, Docker-sandboxed agents, or any agent hosted on GitHub)
`compare` scores two solutions by changed-line similarity
`eval` compares multiple solutions using an LLM judge
`delete` removes saved artifacts
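One plausible reading of "changed-line similarity" for `compare`: collect the added and removed lines from each unified diff, then measure how much the two lists overlap. The sketch below uses Python's `difflib.SequenceMatcher` for the overlap ratio. The metric Tau actually computes may differ, `changed_lines` and `changed_line_similarity` are hypothetical names, and the parser assumes `git diff`-style patches with `diff --git` file headers.

```python
from difflib import SequenceMatcher
from typing import List


def changed_lines(diff: str) -> List[str]:
    """Return the added/removed lines of a git-style unified diff, sign included."""
    out: List[str] = []
    in_hunk = False
    for line in diff.splitlines():
        if line.startswith("diff --git "):
            in_hunk = False  # a new file section begins; its ---/+++ headers follow
        elif line.startswith("@@"):
            in_hunk = True   # hunk header: body lines follow
        elif in_hunk and line[:1] in ("+", "-"):
            # Keep the +/- sign so an added line never matches a removed one;
            # trailing whitespace is ignored (an illustrative normalization).
            out.append(line.rstrip())
    return out


def changed_line_similarity(diff_a: str, diff_b: str) -> float:
    """Similarity in [0, 1] between the changed lines of two patches."""
    a, b = changed_lines(diff_a), changed_lines(diff_b)
    # ratio() is 2*M/T; two empty lists count as identical (1.0).
    return SequenceMatcher(None, a, b, autojunk=False).ratio()
```

`SequenceMatcher.ratio()` returns 2*M/T, where M is the number of matching changed lines and T is the total number of changed lines in both patches, so identical patches score 1.0 and disjoint ones 0.0. The measure is order-sensitive; an order-insensitive variant could compare `collections.Counter` multisets of the changed lines instead.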

Solvers run in Docker containers with resource limits. Evaluation uses both line-level diff comparison and LLM-based judging.
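The Docker sandboxing could look something like the command builder below. It caps memory, CPU, and process count with standard `docker run` flags and mounts the task checkout as the working directory. The specific limits, the image name, and the `sandbox_command` helper are illustrative assumptions, not Tau's configuration. Network isolation is left optional because agents backed by remote model APIs need outbound network access.

```python
import shlex
from typing import List, Optional


def sandbox_command(image: str, workdir: str, agent_cmd: List[str],
                    memory: str = "4g", cpus: float = 2.0, pids: int = 256,
                    network: Optional[str] = None) -> List[str]:
    """Build a `docker run` argv that runs one solver under resource limits.

    All defaults are illustrative. `workdir` should be an absolute host path
    holding the task checkout; it is mounted at /workspace in the container.
    """
    cmd = [
        "docker", "run", "--rm",        # remove the container when it exits
        f"--memory={memory}",           # hard memory cap
        f"--cpus={cpus}",               # CPU quota, in cores
        f"--pids-limit={pids}",         # cap on process count (guards against fork bombs)
    ]
    if network is not None:
        cmd.append(f"--network={network}")  # e.g. "none" for agents needing no API access
    cmd += ["-v", f"{workdir}:/workspace", "-w", "/workspace", image, *agent_cmd]
    return cmd


if __name__ == "__main__":
    # Hypothetical image, path, and entry point, for illustration only.
    argv = sandbox_command("python:3.12-slim", "/tmp/task-123", ["python", "solve.py"])
    print(shlex.join(argv))  # pass `argv` to subprocess.run(...) to launch it
```

Returning an argument list rather than a shell string avoids quoting problems when the list is passed to `subprocess.run`.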

Why This Matters
