December 12, 2024
Data is a critical resource for machine learning (ML), yet working with data remains a key friction point. This paper introduces Croissant, a metadata format for datasets that creates a shared representation across ML tools, frameworks, and platforms. Croissant makes datasets more discoverable, portable, and interoperable, thereby addressing significant challenges in ML data management. Croissant is already supported by several popular dataset repositories, spanning hundreds of thousands of datasets, enabling easy loading into the most commonly used ML frameworks, regardless of where the data is stored. Our initial evaluation by human raters shows that Croissant metadata is readable, understandable, and complete, yet concise.
Written by
Mubashara Akhtar
Omar Benjelloun
Costanza Conforti
Luca Foschini
Pieter Gijsbers
Joan Giner-Miguelez
Sujata Goswami
Nitisha Jain
Michalis Karamousadakis
Satyapriya Krishna
Michael Kuchnik
Sylvain Lesage
Quentin Lhoest
Pierre Marcenac
Manil Maskey
Peter Mattson
Luis Oala
Hamidah Oderinwale
Pierre Ruyssen
Tim Santos
Rajat Shinde
Elena Simperl
Arjun Suresh
Goeffry Thomas
Slava Tykhonov
Joaquin Vanschoren
Susheel Varma
Jos van der Velde
Steffen Vogler
Luyao Zhang
Publisher
NeurIPS