Agentic data-analysis challenge

benchAnalyst

Build a tool-using agent that answers analytical questions about a large, deliberately messy dataset. Your agent reaches the data only through a constrained API: read-only SQL over the tables, plus keyword search and fetch over the documents. Each question has a fixed budget of API calls. Answers are graded against a held-out key.

Getting started

1. Select your name to get your API key.
2. Download the starter kit (API client, example agent, README).
3. Build your agent; it queries, searches, and fetches through the API.
4. Submit answers through the API and check the leaderboard.
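
The steps above can be sketched as a single question-answering pass. This is a minimal illustration, not the starter kit's code: the `DataAPI` interface, its method names (`sql`, `search`, `fetch`, `submit`), the table name, and the `StubAPI` stand-in are all assumptions made so the example runs offline. Check the README in the starter kit for the real client.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Protocol


class DataAPI(Protocol):
    """Hypothetical interface; the real starter-kit client may differ."""

    def sql(self, query: str) -> list[tuple]: ...
    def search(self, keywords: str) -> list[str]: ...
    def fetch(self, doc_id: str) -> str: ...
    def submit(self, question_id: str, answer: str) -> None: ...


@dataclass
class StubAPI:
    """In-memory stand-in so the sketch runs without a network or API key."""

    submitted: dict[str, str] = field(default_factory=dict)

    def sql(self, query: str) -> list[tuple]:
        return [(42,)]  # made-up result row

    def search(self, keywords: str) -> list[str]:
        return ["doc-1"]  # made-up document id

    def fetch(self, doc_id: str) -> str:
        return "made-up document text"

    def submit(self, question_id: str, answer: str) -> None:
        self.submitted[question_id] = answer


def answer_question(api: DataAPI, question_id: str, question: str) -> str:
    """One pass: search the documents, fetch one, query the tables, submit."""
    doc_ids = api.search(question)
    # Fetch sparingly: every fetch counts against the per-question budget.
    context = [api.fetch(doc_id) for doc_id in doc_ids[:1]]
    # A real agent would use `context` to decide what to query;
    # this placeholder query and table name are invented.
    rows = api.sql("SELECT COUNT(*) FROM some_table")
    answer = str(rows[0][0]) if rows else ""
    api.submit(question_id, answer)
    return answer
```

Coding against a small interface like this makes it easy to swap the stub for the real client and to test agent logic without spending calls.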

Rules

Read-only. SQL is SELECT-only; documents are search-and-fetch only.
Budget. 30 API calls per question (queries, searches, and fetches count; reading the schema does not).
Grading. Answers are checked against a held-out key. There is no per-question feedback.
Results. The public set updates live below; the final set is scored after the deadline.
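
Because the budget is hard and there is no per-question feedback, it may help to track calls and screen queries on the client side before sending them. The sketch below is one way to do that, under stated assumptions: `CallBudget`, `BudgetExceeded`, and `looks_read_only` are hypothetical helpers, not part of the starter kit, and the SELECT check is a deliberately conservative heuristic. It rejects anything that does not start with `SELECT`, including `WITH` queries that the server may or may not accept, and it rejects any query containing a semicolon other than a trailing one.

```python
import re

CALL_BUDGET = 30  # per question, as stated in the rules


class BudgetExceeded(RuntimeError):
    """Raised when a question has no API calls left."""


class CallBudget:
    """Counts budgeted calls (queries, searches, fetches) for one question."""

    def __init__(self, limit: int = CALL_BUDGET) -> None:
        self.limit = limit
        self.used = 0

    @property
    def remaining(self) -> int:
        return self.limit - self.used

    def spend(self) -> None:
        """Record one call; call this before each query, search, or fetch."""
        if self.used >= self.limit:
            raise BudgetExceeded(f"all {self.limit} calls used")
        self.used += 1


def looks_read_only(query: str) -> bool:
    """Rough pre-check: a single statement that begins with SELECT."""
    stripped = query.strip().rstrip(";").strip()
    if ";" in stripped:
        return False  # conservative: may be more than one statement
    return re.match(r"(?i)select\b", stripped) is not None
```

A typical use is to create one `CallBudget` per question, call `spend()` before each query, search, or fetch, and skip `spend()` for schema reads, which the rules say are free.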