August 01, 2022
Maintainable, high-quality, rapidly built, and scalable ML datasets have been fundamental to multiple AI production applications that we have worked on. How have we gone about building these ML datasets in a systematic way? Our approach has included defining a set of operational metrics for ML data. Our framework for organizing those metrics focuses on our goals: time to launch, effect on model performance, properties of the data, data quality, and tracking dataset and historical changes. In each area, we have defined more detailed metrics and created operational processes to track them. Through disciplined tracking, we have seen ML dataset improvements translate into ML performance improvements in diverse examples.
Written by
Anoop Sinha
Gunveer Gujral
Liz Jenkins
Nicolas Scheffer
Publisher
ICML 2022 Workshop on DataPerf
Research Topics
Core Machine Learning
Foundational models