Research

NLP

On the Idiosyncrasies of the Mandarin Chinese Classifier System

June 16, 2019

Abstract

While idiosyncrasies of the Chinese classifier system have been a richly studied topic among linguists (Adams and Conklin, 1973; Erbaugh, 1986; Lakoff, 1986), not much work has been done to quantify them with statistical methods. In this paper, we introduce an information-theoretic approach to measuring idiosyncrasy; we examine how much the uncertainty in Mandarin Chinese classifiers can be reduced by knowing semantic information about the nouns that the classifiers modify. Using the empirical distribution of classifiers from the parsed Chinese Gigaword corpus (Graff et al., 2005), we compute the mutual information (in bits) between the distribution over classifiers and distributions over other linguistic quantities. We investigate whether semantic classes of nouns and adjectives differ in how much they reduce uncertainty in classifier choice, and find that it is not fully idiosyncratic; while there are no obvious trends for the majority of semantic classes, shape nouns reduce uncertainty in classifier choice the most.

Download the Paper

Related Publications

February 06, 2025

NLP

Brain-to-Text Decoding: A Non-invasive Approach via Typing

Jarod Levy, Mingfang (Lucy) Zhang, Svetlana Pinet, Jérémy Rapin, Hubert Jacob Banville, Stéphane d'Ascoli, Jean Remi King

February 06, 2025

February 06, 2025

NLP

From Thought to Action: How a Hierarchy of Neural Dynamics Supports Language Production

Mingfang (Lucy) Zhang, Jarod Levy, Stéphane d'Ascoli, Jérémy Rapin, F.-Xavier Alario, Pierre Bourdillon, Svetlana Pinet, Jean Remi King

February 06, 2025

November 16, 2022

NLP

Memorization Without Overfitting: Analyzing the Training Dynamics of Large Language Models

Kushal Tirumala, Aram H. Markosyan, Armen Aghajanyan, Luke Zettlemoyer

November 16, 2022

October 31, 2022

NLP

Autoregressive Search Engines: Generating Substrings as Document Identifiers

Fabio Petroni, Giuseppe Ottaviano, Michele Bevilacqua, Patrick Lewis, Scott Yih, Sebastian Riedel

October 31, 2022

April 30, 2018

NLP

Speech & Audio

Identifying Analogies Across Domains | Facebook AI Research

Yedid Hoshen, Lior Wolf

April 30, 2018

November 01, 2018

NLP

Computer Vision

Non-Adversarial Unsupervised Word Translation | Facebook AI Research

Yedid Hoshen, Lior Wolf

November 01, 2018

December 02, 2018

NLP

Computer Vision

One-Shot Unsupervised Cross Domain Translation | Facebook AI Research

Sagie Benaim, Lior Wolf

December 02, 2018

June 30, 2019

NLP

Variational Training for Large-Scale Noisy-OR Bayesian Networks | Facebook AI Research

Geng Ji, Dehua Cheng, Huazhong Ning, Changhe Yuan, Hanning Zhou, Liang Xiong, Erik B. Sudderth

June 30, 2019

Help Us Pioneer The Future of AI

We share our open source frameworks, tools, libraries, and models for everything from research exploration to large-scale production deployment.