December 12, 2024
Data is a critical resource for machine learning (ML), yet working with data remains a key friction point. This paper introduces Croissant, a metadata format for datasets that creates a shared representation across ML tools, frameworks, and platforms. Croissant makes datasets more discoverable, portable, and interoperable, thereby addressing significant challenges in ML data management. Croissant is already supported by several popular dataset repositories, spanning hundreds of thousands of datasets, enabling easy loading into the most commonly used ML frameworks, regardless of where the data is stored. Our initial evaluation by human raters shows that Croissant metadata is readable, understandable, and complete, yet concise.
Written by
Mubashara Akhtar
Omar Benjelloun
Costanza Conforti
Luca Foschini
Pieter Gijsbers
Joan Giner-Miguelez
Sujata Goswami
Nitisha Jain
Michalis Karamousadakis
Satyapriya Krishna
Michael Kuchnik
Sylvain Lesage
Quentin Lhoest
Pierre Marcenac
Manil Maskey
Peter Mattson
Luis Oala
Hamidah Oderinwale
Pierre Ruyssen
Tim Santos
Rajat Shinde
Elena Simperl
Arjun Suresh
Goeffry Thomas
Slava Tykhonov
Joaquin Vanschoren
Susheel Varma
Jos van der Velde
Steffen Vogler
Luyao Zhang
Publisher
NeurIPS