Cheetah: Optimizing and Accelerating Homomorphic Encryption for Private Inference

March 01, 2021


As the application of deep learning continues to grow, so does the amount of data used to make predictions. Traditionally, big-data deep learning was constrained by computing performance and off-chip memory bandwidth; a new constraint has now emerged: privacy. One solution is homomorphic encryption (HE). Applying HE to the client-cloud model allows cloud services to perform inference directly on the client’s encrypted data. While HE can meet privacy constraints, it introduces enormous computational challenges and remains impractically slow in current systems. This paper introduces Cheetah, a set of algorithmic and hardware optimizations for server-side HE DNN inference that approaches real-time speeds. Cheetah proposes HE-parameter tuning and operator scheduling optimizations, which together deliver a 79× speedup over the state of the art. However, this still falls short of real-time inference speeds by almost four orders of magnitude. Cheetah further proposes an accelerator architecture that, when combined with the algorithmic optimizations, bridges the remaining performance gap. We evaluate several DNNs and show that privacy-preserving HE inference for ResNet50 can be done at near real-time performance with an accelerator that dissipates 30 W and occupies 545 mm^2 in a 5 nm process.
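To make the idea of "inference directly on the client's encrypted data" concrete, the sketch below computes an encrypted dot product, the core of a linear layer, on the server side. It is an illustration only and not Cheetah's method: the abstract does not name an HE scheme, so textbook Paillier (which is additively homomorphic) is swapped in as a stand-in. The key size is a toy one chosen for readability, and the names `encrypt`, `decrypt`, and `encrypted_dot` are hypothetical.

```python
import math
import secrets

# Toy Paillier key built from two known Mersenne primes.
# ASSUMPTION: illustration only; this key size is far too small to be secure.
P = 2**31 - 1
Q = 2**61 - 1
N = P * Q                     # public modulus
N2 = N * N                    # ciphertexts live modulo N^2
G = N + 1                     # standard simple choice of generator
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P-1, Q-1), private
MU = pow(LAM, -1, N)          # inverse of LAM mod N (valid because G = N + 1)


def encrypt(m: int) -> int:
    """Client side: encrypt an integer m with 0 <= m < N."""
    while True:
        r = secrets.randbelow(N - 1) + 1   # random r in [1, N-1]
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m, N2) * pow(r, N, N2)) % N2


def decrypt(c: int) -> int:
    """Client side: recover the plaintext using the private values LAM and MU."""
    u = pow(c, LAM, N2)
    return ((u - 1) // N * MU) % N


def encrypted_dot(enc_x, weights):
    """Server side: return Enc(sum(w_i * x_i)) without ever decrypting.

    Multiplying ciphertexts adds plaintexts; raising a ciphertext to a
    plaintext power w multiplies its plaintext by w.
    Weights are assumed to be non-negative integers in this sketch.
    """
    acc = encrypt(0)
    for c, w in zip(enc_x, weights):
        acc = (acc * pow(c, w, N2)) % N2
    return acc


# Client encrypts its input; server applies its plaintext weights.
x = [3, 1, 4]
w = [2, 7, 1]
enc_result = encrypted_dot([encrypt(v) for v in x], w)
print(decrypt(enc_result))  # 3*2 + 1*7 + 4*1 = 17
```

Even in this toy form, each multiply-accumulate becomes a modular exponentiation over large integers, which hints at why HE inference is so much slower than plaintext inference and why parameter tuning, scheduling, and hardware acceleration matter.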



Written by

Hsien-Hsin Sean Lee

David Brooks

Vincent Lee

Brandon Reagen

Gu-Yeon Wei

Wooseok Choi

Yeongil Ko


International Symposium on High-Performance Computer Architecture

Research Topics

Systems Research
