November 25, 2020
Over the last decade, a single algorithm has changed many facets of our lives: Stochastic Gradient Descent (SGD). In the era of ever-decreasing loss functions, SGD and its various offspring have become the go-to optimization tool in machine learning and are a key component of the success of deep neural networks (DNNs). While SGD is guaranteed to converge to a local optimum (under loose assumptions), in some cases it may matter which local optimum is found, and this is often context dependent. Examples frequently arise in machine learning, from shape versus texture features to ensemble methods and zero-shot coordination. In these settings, there are desired solutions which SGD on ‘standard’ loss functions will not find, since it instead converges to the ‘easy’ solutions. In this paper, we present a different approach. Rather than following the gradient, which corresponds to a locally greedy direction, we instead follow the eigenvectors of the Hessian. By iteratively following and branching amongst the ridges, we effectively span the loss surface to find qualitatively different solutions. We show both theoretically and experimentally that our method, called Ridge Rider (RR), offers a promising direction for a variety of challenging problems.
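To make the idea of following Hessian eigenvectors concrete, here is a minimal sketch on a toy two-dimensional loss. It is an illustration under stated assumptions, not the paper's implementation: the loss, the analytic Hessian, the fixed step size, the stopping rule (stop once the curvature along the ridge is no longer negative), and the helper names `follow_ridge` and `branch` are all hypothetical choices made for this example. Starting from a saddle point, it branches along both signs of each negative-curvature eigenvector and, at every step, re-selects the eigenvector best aligned with the current direction.

```python
import numpy as np

def loss(p):
    # Toy loss (assumed for illustration): saddle at the origin, minima at x = +/-1.
    x, y = p
    return 0.25 * x**4 - 0.5 * x**2 + 0.5 * y**2

def hessian(p):
    # Analytic Hessian of the toy loss above.
    x, _ = p
    return np.array([[3.0 * x**2 - 1.0, 0.0],
                     [0.0, 1.0]])

def follow_ridge(start, direction, step=0.05, max_steps=200):
    # Follow one ridge: at each step, pick the Hessian eigenvector most
    # aligned with the current direction and move along it. Stop when the
    # curvature along that eigenvector is no longer negative.
    p = np.asarray(start, dtype=float)
    d = np.asarray(direction, dtype=float)
    for _ in range(max_steps):
        eigvals, eigvecs = np.linalg.eigh(hessian(p))
        overlaps = eigvecs.T @ d
        k = int(np.argmax(np.abs(overlaps)))
        if eigvals[k] >= 0.0:
            break
        # Eigenvectors are defined up to sign; keep the orientation consistent.
        d = np.sign(overlaps[k]) * eigvecs[:, k]
        p = p + step * d
    return p

def branch(start):
    # Branch along both signs of every negative-curvature eigenvector.
    eigvals, eigvecs = np.linalg.eigh(hessian(start))
    ends = []
    for k in np.where(eigvals < 0.0)[0]:
        for sign in (1.0, -1.0):
            ends.append(follow_ridge(start, sign * eigvecs[:, k]))
    return ends

saddle = np.zeros(2)
ends = branch(saddle)
for end in ends:
    print(end, loss(end))
```

On this toy loss, the gradient at the saddle is zero, so plain gradient descent started there would not move. The two branches instead end on opposite sides of the saddle, each at a lower loss and in a different basin.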
Written by
Jack Parker-Holder
Luke Metz
Cinjon Resnick
Hengyuan Hu
Adam Lerer
Alistair Letcher
Alex Peysakhovich
Aldo Pacchiano
Jakob Foerster
Research Topics
Reinforcement Learning