TinyFish has launched Bigset, an open-source multi-agent system that turns a plain-language sentence into a structured dataset pulled from the live web. You describe what you want, and Bigset infers the schema, sends autonomous agents to research it on real web pages, verifies their findings against sources, deduplicates, and hands back a clean table you can export as CSV or XLSX. You can set a refresh cadence anywhere from every 30 minutes to weekly, and the agents rerun on that schedule, so the dataset stays current without anyone touching a script.
The work is split across two agent roles. An orchestrator agent does breadth-first discovery, identifying which rows belong in the dataset and where on the web to find them, then dispatches sub-agents to fill each one. The orchestrator holds no write access of its own. Each sub-agent researches a single entity under a tight budget of 6 tool calls, pulls real data via TinyFish Search and Fetch, and inserts one verified row with its source URLs and a record of how the data was found.
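
To make that division of labor concrete, here is a minimal Python sketch of the pattern described above. It is not Bigset's actual code: the names `ToolBudget`, `run_sub_agent`, and `run_orchestrator` are hypothetical, and `search`, `fetch`, and `insert` are stand-in callables rather than the real TinyFish Search and Fetch APIs. The only detail taken from the announcement is the budget of 6 tool calls per entity and the fact that the orchestrator never writes rows itself.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

MAX_TOOL_CALLS = 6  # per-entity budget quoted in the announcement


class BudgetExceeded(RuntimeError):
    """Raised when a sub-agent asks for more tool calls than it is allowed."""


class ToolBudget:
    """Counts tool calls and refuses any call past the limit."""

    def __init__(self, limit: int = MAX_TOOL_CALLS) -> None:
        self.limit = limit
        self.used = 0

    def spend(self) -> None:
        if self.used >= self.limit:
            raise BudgetExceeded(f"budget of {self.limit} tool calls exhausted")
        self.used += 1


@dataclass
class Row:
    entity: str
    values: Dict[str, str]
    sources: List[str]  # URLs the values were taken from
    tool_calls: int     # simple record of how much work the row took


def run_sub_agent(
    entity: str,
    search: Callable[[str], List[str]],      # hypothetical: query -> URLs
    fetch: Callable[[str], Dict[str, str]],  # hypothetical: URL -> fields found
    insert: Callable[[Row], None],           # write access lives here only
) -> Row:
    """Research one entity within the budget, then insert exactly one row."""
    budget = ToolBudget()
    budget.spend()  # the search itself counts as one tool call
    urls = search(entity)

    values: Dict[str, str] = {}
    sources: List[str] = []
    for url in urls:
        try:
            budget.spend()
        except BudgetExceeded:
            break  # stop researching; keep whatever was confirmed so far
        found = fetch(url)
        if found:
            for name, value in found.items():
                values.setdefault(name, value)  # first confirmed value wins
            sources.append(url)

    row = Row(entity, values, sources, budget.used)
    insert(row)
    return row


def run_orchestrator(entities: Iterable[str], dispatch: Callable[[str], Row]) -> int:
    """Dispatch one sub-agent per entity; the orchestrator never calls insert."""
    dispatched = 0
    for entity in entities:
        dispatch(entity)
        dispatched += 1
    return dispatched
```

In this sketch the orchestrator receives only a `dispatch` callable, so it has no way to touch the table directly, which mirrors the write-access split the article describes.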

Sub-agents are instructed never to fabricate values, to leave fields blank when they cannot be confirmed, and to reject duplicate primary keys automatically. The orchestrator runs until the dataset reaches its row target, building faster as it learns where the data lives.
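
Those three rules (no fabricated values, blanks for anything unconfirmed, no duplicate primary keys) can be sketched as a small in-memory table. This is an illustrative sketch only, not Bigset's storage layer: the `Dataset` class, its method names, and the choice to compare keys case-insensitively after trimming whitespace are all assumptions made for the example.

```python
from typing import Dict, List, Optional


class DuplicateKey(ValueError):
    """Raised when a row's primary key is already present in the dataset."""


class Dataset:
    """Hypothetical table that enforces the insertion rules described above."""

    def __init__(self, columns: List[str], primary_key: str, row_target: int) -> None:
        if primary_key not in columns:
            raise ValueError("primary_key must be one of the columns")
        self.columns = columns
        self.primary_key = primary_key
        self.row_target = row_target
        self._rows: Dict[str, Dict[str, str]] = {}

    def insert(self, findings: Dict[str, Optional[str]]) -> None:
        """Insert one row; unconfirmed fields (None or missing) stay blank."""
        key = findings.get(self.primary_key)
        if not key:
            raise ValueError("the primary key itself must be confirmed")
        # Assumption: keys are compared case-insensitively, ignoring
        # surrounding whitespace. The real matching rule is not documented here.
        normalized = key.strip().casefold()
        if normalized in self._rows:
            raise DuplicateKey(key)
        # Never invent a value: anything not confirmed becomes an empty string.
        self._rows[normalized] = {c: (findings.get(c) or "") for c in self.columns}

    @property
    def complete(self) -> bool:
        """True once the dataset has reached its row target."""
        return len(self._rows) >= self.row_target

    def rows(self) -> List[Dict[str, str]]:
        return list(self._rows.values())
```

An orchestrator loop in this style would keep dispatching sub-agents until `complete` becomes true, treating a `DuplicateKey` error as a signal that the entity was already covered.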

Bigset is licensed under AGPL-3.0 and runs self-hosted through Docker, with schema inference on Claude Sonnet 4.6 and the agent roles on Qwen3.7-max by default, all routed through OpenRouter and configurable per role. The team is candid that the project is experimental: a dataset takes 2 to 5 minutes to build, it works best on topics with public web data, and the free tier covers 2,500 row operations per month. It ships with 9 curated public datasets covering AI companies hiring, GPU prices, model pricing, and top open-source repositories, browsable without an account.
Test it out for yourself on TinyFish!
TinyFish, the Palo Alto-based company behind the platform, is backed by $47 million in Series A funding led by ICONIQ. It counts Google, DoorDash, and Amazon among its enterprise clients and has processed more than 40 million agent operations. Bigset is built directly on TinyFish Search and Fetch, the same web infrastructure underneath the company's enterprise agent products. It arrives as the open-source answer to proprietary natural-language dataset tools, with no per-seat pricing, no domain restrictions, and full pipeline ownership for anyone who runs it themselves.