
BrowseComp: OpenAI’s Brutally Hard Benchmark for AI Browsing Agents
OpenAI has open-sourced BrowseComp, a benchmark that tests whether AI agents can find obscure but verifiable facts scattered across the internet. It is intentionally hard: most models fail unless they can reason, persist, and browse like a human researcher.