CORE MACHINE LEARNING

Learning-Rate-Free Learning by D-Adaptation

June 13, 2023

Abstract

D-Adaptation is an approach to automatically setting the learning rate which asymptotically achieves the optimal rate of convergence for minimizing convex Lipschitz functions, with no back-tracking or line searches, and no additional function value or gradient evaluations per step. Our approach is the first hyper-parameter free method for this class without additional multiplicative log factors in the convergence rate. We present extensive experiments for SGD and Adam variants of our method, where the method automatically matches hand-tuned learning rates across more than a dozen diverse machine learning problems, including large-scale vision and language problems. An open-source implementation is available.

Download the Paper

AUTHORS

Written by

Aaron Defazio

Konstantin Mishchenko

Publisher

ICML

Research Topics

Core Machine Learning

Related Publications

November 20, 2024

NLP

CORE MACHINE LEARNING

Llama Guard 3-1B-INT4: Compact and Efficient Safeguard for Human-AI Conversations

Igor Fedorov, Kate Plawiak, Lemeng Wu, Tarek Elgamal, Naveen Suda, Eric Smith, Hongyuan Zhan, Jianfeng Chi, Yuriy Hulovatyy, Kimish Patel, Zechun Liu, Yangyang Shi, Tijmen Blankevoort, Mahesh Pasupuleti, Bilge Soran, Zacharie Delpierre Coudert, Rachad Alao, Raghuraman Krishnamoorthi, Vikas Chandra

November 20, 2024

November 14, 2024

NLP

CORE MACHINE LEARNING

A Survey on Deep Learning for Theorem Proving

Zhaoyu Li, Jialiang Sun, Logan Murphy, Qidong Su, Zenan Li, Xian Zhang, Kaiyu Yang, Xujie Si

November 14, 2024

November 06, 2024

THEORY

CORE MACHINE LEARNING

The Road Less Scheduled

Aaron Defazio, Alice Yang, Harsh Mehta, Konstantin Mishchenko, Ahmed Khaled, Ashok Cutkosky

November 06, 2024

August 16, 2024

THEORY

REINFORCEMENT LEARNING

Dual Approximation Policy Optimization

Zhihan Xiong, Maryam Fazel, Lin Xiao

August 16, 2024

Help Us Pioneer The Future of AI

We share our open source frameworks, tools, libraries, and models for everything from research exploration to large-scale production deployment.