August 24, 2014
Counting the number of distinct elements in a large dataset is a common task in web applications and databases. This problem is difficult in limited memory settings where storing a large hash table table is intractable. This paper advances the state of the art in probabilistic methods for estimating the number of distinct elements in a streaming setting. New streaming algorithms are given that provably beat the “optimal” errors for Min-count and HyperLogLog while using the same sketch.
This paper also contributes to the understanding and theory of probabilistic cardinality estimation introducing the concept of an area cutting process and the martingale estimator. These ideas lead to theoretical analyses of both old and new sketches and estimators and show the new estimators are optimal for several streaming settings while also providing accurate error bounds that match those obtained via simulation.
Furthermore, the area cutting process provides a geometric intuition behind all methods for counting distinct elements which are not affected by duplicates. This intuition leads to a new sketch, Discrete Max-count, and the analysis of a class of sketches, self-similar area cutting decompositions that have attractive properties and unbiased estimators for both streaming and non-streaming settings.
Together, these contributions lead to multi faceted advances in sketch construction, cardinality and error estimation, the theory, and intuition for the problem of approximate counting of distinct elements for both the streaming and non-streaming cases.
April 17, 2025
Ansong Ni, Ruta Desai, Yang Li, Xinjie Lei, Dong Wang, Ramya Raghavendra, Gargi Ghosh, Daniel Li (FAIR), Asli Celikyilmaz
April 17, 2025
April 16, 2025
Paul McVay, Sergio Arnaud, Ada Martin, Arjun Majumdar, Krishna Murthy Jatavallabhula, Phillip Thomas, Ruslan Partsey, Daniel Dugas, Abha Gejji, Alexander Sax, Vincent-Pierre Berges, Mikael Henaff, Ayush Jain, Ang Cao, Ishita Prasad, Mrinal Kalakrishnan, Mike Rabbat, Nicolas Ballas, Mido Assran, Oleksandr Maksymets, Aravind Rajeswaran, Franziska Meier
April 16, 2025
April 14, 2025
Yeongmin Kim, Sotiris Anagnostidis, Yuming Du, Edgar Schoenfeld, Jonas Kohler, Markos Georgopoulos, Albert Pumarola, Ali Thabet, Artsiom Sanakoyeu
April 14, 2025
March 24, 2025
Wassim (Wes) Bouaziz, Nicolas Usunier, El Mahdi El Mhamdi
March 24, 2025
April 08, 2021
Caner Hazirbas, Joanna Bitton, Brian Dolhansky, Jacqueline Pan, Albert Gordo, Cristian Canton Ferrer
April 08, 2021
April 30, 2018
Tomer Galanti, Lior Wolf, Sagie Benaim
April 30, 2018
April 30, 2018
Yedid Hoshen, Lior Wolf
April 30, 2018
December 11, 2019
Eliya Nachmani, Lior Wolf
December 11, 2019
Foundational models
Our approach
Latest news
Foundational models