Quasi-hyperbolic momentum and Adam for deep learning

February 11, 2019

Abstract

Momentum-based acceleration of stochastic gradient descent (SGD) is widely used in deep learning. We propose the quasi-hyperbolic momentum algorithm (QHM) as an extremely simple alteration of momentum SGD, averaging a plain SGD step with a momentum step. We describe numerous connections to and identities with other algorithms, and we characterize the set of two-state optimization algorithms that QHM can recover. Finally, we propose a QH variant of Adam called QHAdam, and we empirically demonstrate that our algorithms lead to significantly improved training in a variety of settings, including a new state-of-the-art result on WMT16 EN-DE. We hope that these empirical results, combined with the conceptual and practical simplicity of QHM and QHAdam, will spur interest from both practitioners and researchers. Code is immediately available.
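The abstract describes QHM as an average of a plain SGD step and a momentum step. A minimal sketch of one such update is shown below. It assumes an exponentially averaged (dampened) momentum buffer, and the function names, the list-based parameter representation, and the hyperparameter values are all illustrative choices, not taken from this page. Consult the paper and the released code for the authoritative formulation.

```python
from typing import List, Tuple


def qhm_step(
    theta: List[float],
    grad: List[float],
    buf: List[float],
    lr: float = 0.1,    # step size (illustrative value)
    beta: float = 0.9,  # momentum decay (illustrative value)
    nu: float = 0.7,    # weight on the momentum step (illustrative value)
) -> Tuple[List[float], List[float]]:
    """Apply one quasi-hyperbolic momentum update (sketch, see assumptions above)."""
    # Momentum buffer: exponential moving average of past gradients.
    new_buf = [beta * b + (1.0 - beta) * g for b, g in zip(buf, grad)]
    # Update direction: weighted average of the raw gradient (plain SGD step)
    # and the momentum buffer (momentum step).
    new_theta = [
        t - lr * ((1.0 - nu) * g + nu * m)
        for t, g, m in zip(theta, grad, new_buf)
    ]
    return new_theta, new_buf


def minimize_quadratic(steps: int = 300) -> float:
    """Toy demo: minimize f(x) = 0.5 * x**2, whose gradient is x."""
    theta, buf = [5.0], [0.0]
    for _ in range(steps):
        grad = [theta[0]]
        theta, buf = qhm_step(theta, grad, buf)
    return theta[0]
```

In this sketch, setting `nu=0` reduces the update to plain SGD, and setting `nu=1` reduces it to SGD with an exponentially averaged momentum buffer, which is one way to see QHM as an interpolation between the two.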

AUTHORS

Written by

Jerry Ma

Denis Yarats

Publisher

ICLR

Research Topics

Computer Vision
