March 13, 2024
Training Large Language Models (LLMs) presents significant memory challenges, predominantly due to the growing size of weights and optimizer states. Common memory-reduction approaches, such as low-rank adaptation (LoRA), add a trainable low-rank matrix to the frozen pre-trained weight in each layer, reducing trainable parameters and optimizer states. However, such approaches typically underperform training with full-rank weights in both the pre-training and fine-tuning stages, since they limit the parameter search to a low-rank subspace and alter the training dynamics; they may also require a full-rank warm start. In this work, we propose Gradient Low-Rank Projection (GaLore), a training strategy that allows full-parameter learning but is more memory-efficient than common low-rank adaptation methods such as LoRA. Our approach reduces memory usage by up to 65.5% in optimizer states while maintaining both efficiency and performance for pre-training LLaMA 1B and 7B architectures on the C4 dataset with up to 19.7B tokens, and for fine-tuning RoBERTa on GLUE tasks. Our 8-bit GaLore further reduces optimizer memory by up to 82.5% and total training memory by 63.3%, compared to a BF16 baseline. Notably, we demonstrate, for the first time, the feasibility of pre-training a 7B model on consumer GPUs with 24GB memory (e.g., NVIDIA RTX 4090) without model parallelism, checkpointing, or offloading strategies.
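The abstract does not spell out how the optimizer-state saving arises, so the following is only a back-of-the-envelope sketch under stated assumptions: an Adam-style optimizer that keeps two moment tensors per weight, and a projected variant that keeps those moments for a rank-r projection of the gradient plus one projection matrix. The function names, the layer shape, and the rank are hypothetical and chosen for illustration; they are not the paper's configuration and are not meant to reproduce its reported percentages.

```python
def adam_state_elems_full(m: int, n: int) -> int:
    """Adam stores a first and a second moment, each shaped like the m x n weight."""
    return 2 * m * n


def adam_state_elems_projected(m: int, n: int, r: int) -> int:
    """Assumed scheme: moments are kept for an r x n projected gradient,
    plus one m x r projection matrix (projecting along the smaller side)."""
    if m > n:
        m, n = n, m  # project along the smaller dimension
    projector = m * r      # the projection matrix itself
    moments = 2 * r * n    # first and second moments of the projected gradient
    return projector + moments


def reduction(m: int, n: int, r: int) -> float:
    """Fraction of optimizer-state elements saved relative to full-rank Adam."""
    full = adam_state_elems_full(m, n)
    proj = adam_state_elems_projected(m, n, r)
    return 1.0 - proj / full


# Hypothetical layer shape and rank, for illustration only.
m, n, r = 4096, 4096, 1024
print(f"full-rank state elements: {adam_state_elems_full(m, n)}")
print(f"projected state elements: {adam_state_elems_projected(m, n, r)}")
print(f"reduction: {reduction(m, n, r):.1%}")
```

Under these assumptions the saving depends only on the ratio of the rank to the layer dimensions, which is why a smaller rank trades optimizer memory against how much of the gradient the projection retains.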
Publisher
arXiv
Research Topics
Core Machine Learning