June 05, 2026
Retrieval-augmented agents are increasingly the interface to large organizational and public knowledge bases, yet most still treat retrieval as a black box: they issue exploratory queries, inspect returned snippets, and iteratively reformulate until useful evidence emerges. This resembles how a newcomer searches an unfamiliar database rather than how an expert navigates it with strong priors about terminology, constraints, and likely evidence, and it leads to unnecessary retrieval rounds, increased latency, and poor recall. We introduce the Superintelligent Retrieval Agent (SIRA), which defines superintelligence in retrieval as the ability to compress multi-round exploratory search into a single corpus-discriminative retrieval action. SIRA does not merely ask which terms are relevant to the query; it asks which terms are likely to separate the desired evidence from corpus-level confusers. On the corpus side, an LLM enriches each document offline with missing search vocabulary; on the query side, it predicts evidence vocabulary omitted by the query; and corpus statistics are used as tool calls to filter out proposed terms that are absent, overly common, or unlikely to create retrieval margin. The final retrieval step is a single weighted BM25 call combining the original query with the validated expansion. Across ten BEIR benchmarks, SIRA achieves the strongest average retrieval performance in our comparison, outperforming dense retrievers, learned sparse retrievers, and LLM-based search-agent baselines while using no relevance labels or retriever fine-tuning. On downstream question answering, SIRA's retrieval-only answer coverage exceeds that of recent RL-trained agentic QA systems on NQ and HotpotQA. Finally, we introduce BrowseComp-Wikipedia, a hard-search benchmark of 232 BrowseComp-derived queries grounded in a 25,587,229-document English Wikipedia index. Even without index-time LLM document enrichment, using only grounded Wikipedia categories as corpus-visible structure, SIRA outperforms multi-round Perplexity agents at every retrieval budget, reaching 9.70% Recall@1, 15.27% Recall@10, and 36.14% Recall@100. These results show that one well-formed, corpus-grounded lexical retrieval action can outperform substantially more expensive multi-round search while remaining interpretable, training-free, and efficient.
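The query-side pipeline described in the abstract (propose expansion terms, validate them against corpus statistics, then issue one weighted BM25 call) can be illustrated with a minimal sketch. This is not the authors' implementation. The class `TinyBM25`, the functions `validate_expansion` and `retrieve`, the toy corpus `DOCS`, the document-frequency cutoff `max_df_ratio`, and the `expansion_weight` value are all hypothetical names and settings chosen for illustration. The LLM proposal step is stubbed as a plain list of terms, and only the "absent" and "overly common" filters are modeled; the retrieval-margin criterion is not.

```python
import math
from collections import Counter


def tokenize(text):
    # Whitespace tokenizer; a real system would use a proper analyzer.
    return text.lower().split()


class TinyBM25:
    """Toy in-memory BM25 index (illustrative, not the paper's retriever)."""

    def __init__(self, docs, k1=1.2, b=0.75):
        self.docs = [Counter(tokenize(d)) for d in docs]
        self.lengths = [sum(c.values()) for c in self.docs]
        self.n = len(self.docs)
        self.avg_len = sum(self.lengths) / self.n
        # Document frequency: number of documents containing each term.
        self.df = Counter(t for c in self.docs for t in c)
        self.k1, self.b = k1, b

    def idf(self, term):
        df = self.df.get(term, 0)
        return math.log(1 + (self.n - df + 0.5) / (df + 0.5))

    def score(self, weighted_terms, doc_id):
        tf, dl = self.docs[doc_id], self.lengths[doc_id]
        total = 0.0
        for term, weight in weighted_terms.items():
            f = tf.get(term, 0)
            if f == 0:
                continue
            norm = f * (self.k1 + 1) / (
                f + self.k1 * (1 - self.b + self.b * dl / self.avg_len)
            )
            total += weight * self.idf(term) * norm
        return total


def validate_expansion(index, proposed, max_df_ratio=0.5):
    """Drop proposed terms that are absent from the corpus or too common.

    The 0.5 cutoff is an assumed value, not one taken from the source.
    """
    kept = []
    for term in proposed:
        df = index.df.get(term, 0)
        if df == 0 or df / index.n > max_df_ratio:
            continue
        kept.append(term)
    return kept


def retrieve(index, query, proposed, expansion_weight=0.5, top_k=3):
    """Single weighted BM25 call: original query plus validated expansion."""
    weights = {t: 1.0 for t in tokenize(query)}
    for term in validate_expansion(index, proposed):
        weights.setdefault(term, expansion_weight)
    scored = [(index.score(weights, i), i) for i in range(index.n)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for s, i in scored[:top_k] if s > 0]


# Toy corpus, invented for this example.
DOCS = [
    "aspirin reduces fever and pain",
    "ibuprofen is an anti-inflammatory drug for pain",
    "the heart pumps blood through arteries",
    "acetylsalicylic acid inhibits platelet aggregation and pain",
]
```

With this toy corpus, a stubbed proposal of `["acetylsalicylic", "pain", "cyclooxygenase"]` for the query `"aspirin mechanism"` keeps only `acetylsalicylic`: `cyclooxygenase` never occurs in the corpus and `pain` occurs in three of the four documents. The expanded query then reaches the fourth document, which the original query alone would miss, while the original terms keep their higher weight.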
Written by
Zeyu Yang
Qi Ma
Jason Chen
Anshumali Shrivastava
Publisher
arXiv
