DeepRecSys: A System for Optimizing End-To-End At-Scale Neural Recommendation Inference

May 22, 2020

Abstract

Neural personalized recommendation is the cornerstone of a wide collection of cloud services and products and accounts for a significant share of cloud infrastructure compute demand. Improving the execution efficiency of recommendation therefore translates directly into infrastructure capacity savings. In this paper, we propose DeepRecSched, a recommendation inference scheduler that maximizes latency-bounded throughput by taking into account the characteristics of inference query sizes and arrival patterns, model architectures, and underlying hardware systems. By carefully balancing task-level against data-level parallelism, DeepRecSched improves system throughput on server-class CPUs by 2× across eight industry-representative models. Next, we deploy and evaluate this optimization in an at-scale production datacenter, where it reduces end-to-end tail latency across a wide variety of recommendation models by 30%. Finally, DeepRecSched demonstrates the role and impact of specialized AI hardware in optimizing system-level performance (QPS) and power efficiency (QPS/watt) of recommendation inference. To enable the design space exploration of customized recommendation systems shown in this paper, we design and validate an end-to-end modeling infrastructure, DeepRecInfra. DeepRecInfra enables studies over a variety of recommendation use cases, taking into account at-scale effects observed in a production datacenter, such as query arrival patterns and recommendation query sizes, as well as industry-representative models and tail latency targets.
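The trade-off between task-level and data-level parallelism can be illustrated with a toy model. The sketch below is not the DeepRecSched algorithm from the paper. It assumes a hypothetical linear cost per request, charges every request as a full batch, measures the latency of a single query served in isolation rather than tail latency under load, and sweeps candidate batch sizes exhaustively. All function names, parameters, and constants are made up for illustration.

```python
import math


def request_latency_ms(batch_size, fixed_ms=1.0, per_item_ms=0.05):
    """Hypothetical cost model: fixed per-request overhead plus a linear per-item cost."""
    return fixed_ms + per_item_ms * batch_size


def evaluate(query_size, batch_size, cores):
    """Return (latency_ms, throughput_qps) for one batch-size choice.

    Assumptions (illustrative only): each request is charged as a full
    batch, and the requests of one query run in waves of at most `cores`
    parallel requests.
    """
    requests = math.ceil(query_size / batch_size)  # task-level parallelism
    waves = math.ceil(requests / cores)
    per_request = request_latency_ms(batch_size)   # data-level parallelism
    latency_ms = waves * per_request
    core_time_ms = requests * per_request          # total CPU time per query
    throughput_qps = cores * 1000.0 / core_time_ms
    return latency_ms, throughput_qps


def best_batch_size(query_size, cores, latency_target_ms, candidates):
    """Pick the candidate with the highest throughput that meets the latency target.

    Returns None if no candidate meets the target.
    """
    best, best_qps = None, 0.0
    for b in candidates:
        latency_ms, qps = evaluate(query_size, b, cores)
        if latency_ms <= latency_target_ms and qps > best_qps:
            best, best_qps = b, qps
    return best


if __name__ == "__main__":
    candidates = [2 ** k for k in range(11)]  # 1, 2, 4, ..., 1024
    print(best_batch_size(1000, 8, 10.0, candidates))
```

With these made-up constants, a 1000-item query on 8 cores and a 10 ms target selects a batch size of 128. Larger batches amortize the fixed overhead better but exceed the latency target, while smaller batches meet it only by spending more total CPU time per query, which lowers throughput.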

AUTHORS

Udit Gupta

Samuel Hsia

Vikram Saraph

Xiaodong Wang

Brandon Reagen

Gu-Yeon Wei

Hsien-Hsin S. Lee

David Brooks

Carole-Jean Wu

Publisher

International Symposium on Computer Architecture (ISCA)
