June 05, 2026
Retrieval-augmented agents are increasingly the interface to large organizational and public knowledge bases, yet most still treat retrieval as a black box: they issue exploratory queries, inspect returned snippets, and iteratively reformulate until useful evidence emerges. This resembles how a newcomer searches an unfamiliar database rather than how an expert navigates it with strong priors about terminology, constraints, and likely evidence, leading to unnecessary retrieval rounds, increased latency, and poor recall. We introduce Superintelligent Retrieval Agent (SIRA), which defines superintelligence in retrieval as the ability to compress multi-round exploratory search into a single corpus-discriminative retrieval action. SIRA does not merely ask what terms are relevant to the query; it asks which terms are likely to separate the desired evidence from corpus-level confusers. On the corpus side, an LLM enriches each document offline with missing search vocabulary; on the query side, it predicts evidence vocabulary omitted by the query; and corpus statistics are used as tool calls to filter proposed terms that are absent, overly common, or unlikely to create retrieval margin. The final retrieval step is a single weighted BM25 call combining the original query with the validated expansion. Across ten BEIR benchmarks, SIRA achieves the strongest average retrieval performance in our comparison, outperforming dense retrievers, learned sparse retrievers, and LLM-based search-agent baselines while using no relevance labels or retriever fine-tuning. On downstream question answering, SIRA's retrieval-only answer coverage exceeds recent RL-trained agentic QA systems on NQ and HotpotQA. Finally, we introduce BrowseComp-Wikipedia, a hard-search benchmark of 232 BrowseComp-derived queries grounded in a 25,587,229-document English Wikipedia index. 
Even without index-time LLM document enrichment, using only grounded Wikipedia categories as corpus-visible structure, SIRA outperforms multi-round Perplexity agents at every retrieval budget, reaching 9.70% Recall@1, 15.27% Recall@10, and 36.14% Recall@100. These results show that one well-formed, corpus-grounded lexical retrieval action can outperform substantially more expensive multi-round search while remaining interpretable, training-free, and efficient.
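The query-side step described above (filter proposed expansion terms against corpus statistics, then issue one weighted BM25 call) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy corpus, the hardcoded `proposed` list standing in for LLM output, the `max_df_ratio` threshold, the `expansion_weight` value, and every function and class name are assumptions made for illustration. The sketch covers only the "absent" and "overly common" filters and does not attempt the retrieval-margin check mentioned in the abstract.

```python
import math
from collections import Counter


def tokenize(text):
    # Whitespace tokenizer; a real system would use a proper analyzer.
    return text.lower().split()


class BM25Index:
    """Tiny in-memory BM25 index that accepts per-term query weights."""

    def __init__(self, docs, k1=1.2, b=0.75):
        self.k1, self.b = k1, b
        self.doc_tfs = [Counter(tokenize(d)) for d in docs]
        self.doc_lens = [sum(tf.values()) for tf in self.doc_tfs]
        self.n_docs = len(docs)
        self.avg_len = sum(self.doc_lens) / self.n_docs
        # Document frequency: number of documents containing each term.
        self.df = Counter(t for tf in self.doc_tfs for t in tf)

    def idf(self, term):
        n = self.df.get(term, 0)
        # Non-negative IDF variant (the form used by Lucene's BM25).
        return math.log(1 + (self.n_docs - n + 0.5) / (n + 0.5))

    def search(self, weighted_terms, top_k=3):
        scored = []
        for doc_id, tf in enumerate(self.doc_tfs):
            norm = self.k1 * (1 - self.b + self.b * self.doc_lens[doc_id] / self.avg_len)
            score = sum(
                weight * self.idf(term) * tf[term] * (self.k1 + 1) / (tf[term] + norm)
                for term, weight in weighted_terms.items()
                if tf[term] > 0
            )
            scored.append((score, doc_id))
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        # Return only documents that matched at least one query term.
        return [doc_id for score, doc_id in scored[:top_k] if score > 0]


def validate_expansion(index, proposed, max_df_ratio=0.5):
    """Drop proposed terms that are absent from the corpus or too common."""
    kept = []
    for term in proposed:
        n = index.df.get(term, 0)
        if n == 0:
            continue  # absent: cannot match any document
        if n / index.n_docs > max_df_ratio:
            continue  # overly common: adds little discrimination
        kept.append(term)
    return kept


def retrieve(index, query, proposed, expansion_weight=0.5, top_k=3):
    """Single weighted BM25 call: original query plus validated expansion."""
    weighted = {term: 1.0 for term in tokenize(query)}
    for term in validate_expansion(index, proposed):
        weighted.setdefault(term, expansion_weight)
    return index.search(weighted, top_k)


docs = [
    "aspirin is a drug that relieves mild pain and fever",
    "acetylsalicylic acid is a drug that inhibits platelet aggregation",
    "ibuprofen is a drug that relieves pain and inflammation",
    "heavy rain flooded the river town",
]
index = BM25Index(docs)

# Hypothetical expansion terms standing in for LLM-predicted evidence vocabulary.
proposed = ["acetylsalicylic", "platelet", "drug", "thrombocyte"]

baseline = index.search({t: 1.0 for t in tokenize("aspirin clots")})
expanded = retrieve(index, "aspirin clots", proposed)
print(baseline, expanded)
```

In this toy corpus the unexpanded query matches only the document that literally contains "aspirin", while the validated expansion also reaches the document phrased in terms of "acetylsalicylic acid" and "platelet". The term "drug" is filtered out because it appears in three of four documents, and "thrombocyte" because it appears in none.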
Written by
Anshumali Shrivastava
Jason Chen
Qi Ma
Zeyu Yang
Publisher
arXiv
