May 22, 2020
Neural personalized recommendation is the cornerstone of a wide collection of cloud services and products, constituting significant compute demand of cloud infrastructure. Thus, improving the execution efficiency of recommendation directly translates into infrastructure capacity saving. In this paper, we propose DeepRecSched, a recommendation inference scheduler that maximizes latency-bounded throughput by taking into account characteristics of inference query size and arrival patterns, model architectures, and underlying hardware systems. By carefully optimizing task versus data-level parallelism, DeepRecSched improves system throughput on server class CPUs by 2× across eight industry-representative models. Next, we deploy and evaluate this optimization in an at-scale production datacenter which reduces end-to-end tail latency across a wide variety of recommendation models by 30%. Finally, DeepRecSched demonstrates the role and impact of specialized AI hardware in optimizing system level performance (QPS) and power efficiency (QPS/watt) of recommendation inference. In order to enable the design space exploration of customized recommendation systems shown in this paper, we design and validate an end-to-end modeling infrastructure, DeepRecInfra. DeepRecInfra enables studies over a variety of recommendation use cases, taking into account at-scale effects, such as query arrival patterns and recommendation query sizes, observed from a production datacenter, as well as industry-representative models and tail latency targets.
Written by
Udit Gupta
Samuel Hsia
Vikram Saraph
Xiaodong Wang
Brandon Reagen
Gu-Yeon Wei
Hsien-Hsin S. Lee
David Brooks
Publisher
International Symposium on Computer Architecture (ISCA)
June 05, 2026
Anshumali Shrivastava, Jason Chen, Qi Ma, Zeyu Yang
June 05, 2026
May 26, 2026
Valentin Wyart, Huy V. Vo, Jean Remi King, Josephine Raugel, Jérémy Rapin, Marc Szafraniec, Max Seitzer, Patrick Labatut, Piotr Bojanowski
May 26, 2026
May 19, 2026
Alvin W. M. Tan, Nicolas Hamilakis, Manel Khentout, Sho Tsuji, Balázs Kégl, Michael C. Frank, Angel Villar Corrales, Charles-Eric Saint-James, Dongyan Lin, Emmanuel Dupoux, Jiayi Shen, Juan Pino, Mahi Luthra, Martin Gleize, Phillip Rust, Rashel Moritz, Sheila Krogh-Jespersen, Surya Parimi, Tom Fizycki, Vanessa Stark, Yosuke Higuchi, Youssef Benchekroun
May 19, 2026
May 17, 2026
Alexandre Rezende, Rohit Patel, Steven McClain
May 17, 2026
October 31, 2019
Peng-Jen Chen, Jiajun Shen, Matt Le, Vishrav Chaudhary, Ahmed El-Kishky, Guillaume Wenzek, Myle Ott, Marc’Aurelio Ranzato
October 31, 2019
October 27, 2019
Zhuoyuan Chen, Demi Guo, Tong Xiao, Saining Xie, Xinlei Chen, Haonan Yu, Jonathan Gray, Kavya Srinet, Haoqi Fan, Jerry Ma, Charles R. Qi, Shubham Tulsiani, Arthur Szlam, Larry Zitnick
October 27, 2019
April 25, 2020
Yilun Du, Joshua Meier, Jerry Ma, Rob Fergus, Alexander Rives
April 25, 2020
June 11, 2019
Yuandong Tian, Jerry Ma, Qucheng Gong, Shubho Sengupta, Zhuoyuan Chen, James Pinkerton, Larry Zitnick
June 11, 2019

Our approach
Latest news
Foundational models