December 06, 2018
Optimization algorithms that leverage gradient covariance information, such as variants of natural gradient descent (Amari, 1998), offer the prospect of yielding more effective descent directions. For models with many parameters, the covariance matrix they are based on becomes gigantic, making them inapplicable in their original form. This has motivated research into both simple diagonal approximations and more sophisticated factored approximations such as KFAC (Heskes, 2000; Martens & Grosse, 2015; Grosse & Martens, 2016). In the present work we draw inspiration from both to propose a novel approximation that is provably better than KFAC and amendable to cheap partial updates. It consists in tracking a diagonal variance, not in parameter coordinates, but in a Kronecker-factored eigenbasis, in which the diagonal approximation is likely to be more effective. Experiments show improvements over KFAC in optimization speed for several deep network architectures.
Publisher
NIPS
Research Topics
October 04, 2024
Bandhav Veluri, Benjamin Peloquin, Bokai Yu, Hongyu Gong, Shyam Gollakota
October 04, 2024
October 03, 2024
David Dale, Marta R. Costa-jussa
October 03, 2024
September 26, 2024
Belen Alastruey, Gerard I. Gállego, Marta R. Costa-jussa
September 26, 2024
September 05, 2024
Chunting Zhou, Lili Yu, Arun Babu, Kushal Tirumala, Michihiro Yasunaga, Leonid Shamis, Jacob Kahn, Luke Zettlemoyer, Omer Levy, Xuezhe Ma
September 05, 2024
Foundational models
Latest news
Foundational models