SPEECH & AUDIO

The emergence of number and syntax units in LSTM language models

March 18, 2019

Abstract

Recent work has shown that LSTMs trained on a generic language modeling objective capture syntax-sensitive generalizations such as long-distance number agreement. However, we have no mechanistic understanding of how they accomplish this remarkable feat. Some have conjectured that it depends on heuristics that do not truly take hierarchical structure into account. We present here a detailed study of the inner mechanics of number tracking in LSTMs at the single-neuron level. We discover that long-distance number information is largely managed by two "number units". Importantly, the behaviour of these units is partially controlled by other units independently shown to track syntactic structure. We conclude that LSTMs are, to some extent, implementing genuinely syntactic processing mechanisms, paving the way to a more general understanding of grammatical encoding in LSTMs.
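To make "single-neuron level" concrete, here is a minimal sketch of one LSTM step in plain Python with an optional ablation of chosen hidden units, one common way to test what an individual unit contributes. It is an illustration under stated assumptions, not the authors' code: the gate order (input, forget, candidate, output), the names `lstm_step` and `ablate`, the toy weights, and the choice to ablate by zeroing a unit's cell state are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    # Plain matrix-vector product, one dot product per row.
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def lstm_step(
    x: Vector, h: Vector, c: Vector,
    W: Matrix, U: Matrix, b: Vector,
    ablate: Sequence[int] = (),
) -> Tuple[Vector, Vector]:
    """One LSTM step with hidden size n = len(h).

    W is 4n x d, U is 4n x n, b has length 4n; rows are stacked in the
    ASSUMED gate order input, forget, candidate, output.
    Units listed in `ablate` have their cell state zeroed (hypothetical
    ablation scheme), which also forces their hidden output to zero.
    """
    n = len(h)
    z = [wx + uh + bias for wx, uh, bias in zip(matvec(W, x), matvec(U, h), b)]
    i = [sigmoid(v) for v in z[0:n]]
    f = [sigmoid(v) for v in z[n:2 * n]]
    g = [math.tanh(v) for v in z[2 * n:3 * n]]
    o = [sigmoid(v) for v in z[3 * n:4 * n]]
    c_new = [f[k] * c[k] + i[k] * g[k] for k in range(n)]
    for k in ablate:
        c_new[k] = 0.0  # silence unit k: whatever it stored is removed
    h_new = [o[k] * math.tanh(c_new[k]) for k in range(n)]
    return h_new, c_new


# Toy usage with hypothetical weights: hidden size 2, input size 1.
W = [[0.5] for _ in range(8)]
U = [[0.0, 0.0] for _ in range(8)]
b = [0.0] * 8
h, c = lstm_step([1.0], [0.0, 0.0], [0.0, 0.0], W, U, b)
h_abl, c_abl = lstm_step([1.0], [0.0, 0.0], [0.0, 0.0], W, U, b, ablate=[1])
```

Because unit 1 is zeroed in both the cell state and the hidden output, the next step receives no trace of it, while unit 0 is unaffected. Comparing a model's behaviour with and without such an intervention is what lets an analysis attribute a function, such as carrying number, to individual units.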


AUTHORS

Marco Baroni

Théo Desbordes

Dieuwke Hupkes

Germán Kruszewski

Stanislas Dehaene

Yair Lakretz

Publisher

NAACL