NLP

Neurons in Large Language Models: Dead, N-gram, Positional

June 25, 2024

Abstract

We analyze a family of large language models in such a lightweight manner that can be done on a single GPU. Specifically, we focus on the OPT family of models ranging from 125m to 66b parameters and rely only on whether an FFN neuron is activated or not. First, we find that the early part of the network is sparse and represents many discrete features. Here, many neurons (more than 70% in some layers of the 66b model) are "dead", i.e. they never activate on a large collection of diverse data. At the same time, many of the alive neurons are reserved for discrete features and act as token and n-gram detectors. Interestingly, their corresponding FFN updates not only promote next token candidates as could be expected, but also explicitly focus on removing the information about triggering them tokens, i.e., current input. To the best of our knowledge, this is the first example of mechanisms specialized at removing (rather than adding) information from the residual stream. With scale, models become more sparse in a sense that they have more dead neurons and token detectors. Finally, some neurons are positional: them being activated or not depends largely (or solely) on position and less so (or not at all) on textual data. We find that smaller models have sets of neurons acting as position range indicators while larger models operate in a less explicit manner.

Download the Paper

AUTHORS

Written by

Elena Voita

Javier Ferrando Monsonis

Christoforos Nalmpantis

Publisher

ARR, then ACL

Related Publications

May 14, 2025

HUMAN & MACHINE INTELLIGENCE

SPEECH & AUDIO

Emergence of Language in the Developing Brain

Linnea Evanson, Christine Bulteau, Mathilde Chipaux, Georg Dorfmüller, Sarah Ferrand-Sorbets, Emmanuel Raffo, Sarah Rosenberg, Pierre Bourdillon, Jean Remi King

May 14, 2025

April 25, 2025

RESEARCH

NLP

ReasonIR: Training Retrievers for Reasoning Tasks

Rulin Shao, Qiao Rui, Varsha Kishore, Niklas Muennighoff, Victoria Lin, Daniela Rus, Bryan Kian Hsiang Low, Sewon Min, Scott Yih, Pang Wei Koh, Luke Zettlemoyer

April 25, 2025

April 17, 2025

HUMAN & MACHINE INTELLIGENCE

CONVERSATIONAL AI

Collaborative Reasoner: Self-improving Social Agents with Synthetic Conversations

Ansong Ni, Ruta Desai, Yang Li, Xinjie Lei, Dong Wang, Ramya Raghavendra, Gargi Ghosh, Daniel Li (FAIR), Asli Celikyilmaz

April 17, 2025

April 04, 2025

NLP

CORE MACHINE LEARNING

Multi-Token Attention

Olga Golovneva, Tianlu Wang, Jason Weston, Sainbayar Sukhbaatar

April 04, 2025

Help Us Pioneer The Future of AI

We share our open source frameworks, tools, libraries, and models for everything from research exploration to large-scale production deployment.