Research

DeepRecSys: A System for Optimizing End-To-End At-Scale Neural Recommendation Inference

May 22, 2020

Abstract

Neural personalized recommendation is the cornerstone of a wide collection of cloud services and products, constituting significant compute demand of cloud infrastructure. Thus, improving the execution efficiency of recommendation directly translates into infrastructure capacity saving. In this paper, we propose DeepRecSched, a recommendation inference scheduler that maximizes latency-bounded throughput by taking into account characteristics of inference query size and arrival patterns, model architectures, and underlying hardware systems. By carefully optimizing task versus data-level parallelism, DeepRecSched improves system throughput on server class CPUs by 2× across eight industry-representative models. Next, we deploy and evaluate this optimization in an at-scale production datacenter which reduces end-to-end tail latency across a wide variety of recommendation models by 30%. Finally, DeepRecSched demonstrates the role and impact of specialized AI hardware in optimizing system level performance (QPS) and power efficiency (QPS/watt) of recommendation inference. In order to enable the design space exploration of customized recommendation systems shown in this paper, we design and validate an end-to-end modeling infrastructure, DeepRecInfra. DeepRecInfra enables studies over a variety of recommendation use cases, taking into account at-scale effects, such as query arrival patterns and recommendation query sizes, observed from a production datacenter, as well as industry-representative models and tail latency targets.

Download the Paper

AUTHORS

Written by

Udit Gupta

Samuel Hsia

Vikram Saraph

Xiaodong Wang

Brandon Reagen

Gu-Yeon Wei

Hsien-Hsin S. Lee

David Brooks

Carole-Jean Wu

Publisher

International Symposium on Computer Architecture (ISCA)

Related Publications

November 27, 2022

Core Machine Learning

Neural Attentive Circuits

Nicolas Ballas, Bernhard Schölkopf, Chris Pal, Francesco Locatello, Li Erran, Martin Weiss, Nasim Rahaman, Yoshua Bengio

November 27, 2022

November 27, 2022

Near Instance-Optimal PAC Reinforcement Learning for Deterministic MDPs

Andrea Tirinzoni, Aymen Al Marjani, Emilie Kaufmann

November 27, 2022

November 16, 2022

NLP

Memorization Without Overfitting: Analyzing the Training Dynamics of Large Language Models

Kushal Tirumala, Aram H. Markosyan, Armen Aghajanyan, Luke Zettlemoyer

November 16, 2022

November 10, 2022

Computer Vision

Learning State-Aware Visual Representations from Audible Interactions

Unnat Jain, Abhinav Gupta, Himangi Mittal, Pedro Morgado

November 10, 2022

April 08, 2021

Responsible AI

Integrity

Towards measuring fairness in AI: the Casual Conversations dataset

Caner Hazirbas, Joanna Bitton, Brian Dolhansky, Jacqueline Pan, Albert Gordo, Cristian Canton Ferrer

April 08, 2021

April 30, 2018

The Role of Minimal Complexity Functions in Unsupervised Learning of Semantic Mappings | Facebook AI Research

Tomer Galanti, Lior Wolf, Sagie Benaim

April 30, 2018

April 30, 2018

Computer Vision

NAM – Unsupervised Cross-Domain Image Mapping without Cycles or GANs | Facebook AI Research

Yedid Hoshen, Lior Wolf

April 30, 2018

December 11, 2019

Speech & Audio

Computer Vision

Hyper-Graph-Network Decoders for Block Codes | Facebook AI Research

Eliya Nachmani, Lior Wolf

December 11, 2019

Help Us Pioneer The Future of AI

We share our open source frameworks, tools, libraries, and models for everything from research exploration to large-scale production deployment.