May 22, 2020
Personalized recommendation systems leverage deep learning models and account for the majority of data center AI cycles. Their performance is dominated by memory-bound sparse embedding operations whose irregular memory access patterns make them fundamentally hard to accelerate. This paper proposes a lightweight, commodity-DRAM-compliant, near-memory processing solution to accelerate personalized recommendation inference. An in-depth characterization of production-grade recommendation models shows that embedding operations with high model-, operator- and data-level parallelism saturate memory bandwidth, limiting recommendation inference performance. We propose RecNMP, which provides a scalable solution to improve system throughput and supports a broad range of sparse embedding models. RecNMP is specifically tailored to production environments with heavy co-location of operators on a single server. Several hardware/software co-optimization techniques, such as memory-side caching, table-aware packet scheduling, and hot entry profiling, are studied, providing up to 9.8× memory latency speedup over a highly optimized baseline. Overall, RecNMP offers 4.2× throughput improvement and 45.8% memory energy savings.
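To make the sparse embedding operation concrete, the sketch below shows a gather-and-sum lookup in plain Python. It is an illustration only, not the paper's implementation: the function name `pooled_embedding_lookup`, the `indices`/`lengths` layout, and the toy table are all assumptions chosen for clarity. Each lookup reads a data-dependent row and performs very little arithmetic per byte fetched, which is the kind of access pattern the abstract describes as memory-bound.

```python
def pooled_embedding_lookup(table, indices, lengths):
    """Gather rows of `table` by `indices` and sum them per bag.

    `lengths[i]` is the number of consecutive entries in `indices`
    that belong to bag i (hypothetical layout, assumed for this sketch).
    """
    dim = len(table[0])
    pooled = []
    pos = 0
    for n in lengths:
        acc = [0.0] * dim
        for idx in indices[pos:pos + n]:
            row = table[idx]  # irregular, data-dependent row access
            for d in range(dim):
                acc[d] += row[d]  # one add per element read
        pooled.append(acc)
        pos += n
    return pooled


# Toy table: 8 rows, 2 columns; row r holds [r, 10 * r].
table = [[float(r), float(r) * 10.0] for r in range(8)]
indices = [1, 5, 7, 0, 2]  # rows to gather
lengths = [3, 2]           # bag 0 uses 3 indices, bag 1 uses 2
out = pooled_embedding_lookup(table, indices, lengths)
print(out)
```

In a production model the table would hold millions of rows, so consecutive lookups rarely touch nearby memory; the toy sizes here are only meant to show the shape of the computation.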
Written by
Liu Ke
Udit Gupta
Benjamin Youngjae Cho
David Brooks
Vikas Chandra
Utku Diril
Amin Firoozshahian
Kim Hazelwood
Bill Jia
Hsien-Hsin S. Lee
Bert Maher
Dheevatsa Mudigere
Maxim Naumov
Martin Schatz
Mikhail Smelyanskiy
Xiaodong Wang
Brandon Reagen
Mark Hempstead
Xuan Zhang
Publisher
International Symposium on Computer Architecture (ISCA)