May 14, 2021
Every day, we’re inundated with a constant stream of information — most of which we’ll forget. Sure, you can probably remember what you had for breakfast this morning, but what about last year? We often take for granted the ability to forget mundane, day-to-day details to make room for valuable moments that matter in our long-term memory bank. Chances are, you’ll always remember that time your significant other surprised you with heart-shaped pancakes or your favorite bakery on your first trip to Paris.
Unlike human memory, most neural networks process information indiscriminately. At a small scale, this works. But the attention mechanisms that current AI models use to focus selectively on parts of their input struggle with ever-larger quantities of information, like long-form books or videos, and incur unsustainable computational costs.
As a step toward achieving humanlike memory in machines, we’re announcing a novel method in deep learning: Expire-Span, a first-of-its-kind operation that equips neural networks with the ability to forget at scale. It has an order of magnitude more memory capacity than standard attention models. It works by first predicting which information is most relevant to the task at hand. Based on the context, Expire-Span then assigns an expiration date to each piece of information, much like the expiration date on a bottle of milk. When the date has passed, the information gradually expires from the AI system. Intuitively, more relevant information is retained longer, while irrelevant information expires more quickly. With more memory space, AI systems can process information at drastically larger scales.

Say we task an AI agent with navigating to find a yellow door. Some models, like standard Transformers, indiscriminately memorize all the information presented at each timestep on the way to the door.
Because the agent receives the task ("find the yellow door") in the first frame, Expire-Span lets it forget the unnecessary information it processes on the way to the door and remember only that first frame containing the task description. This method sets a new state of the art on a widely used benchmark for character-level language modeling, and it improves efficiency across several long-context tasks in language modeling, reinforcement learning, object collision, and algorithmic tasks.
The main challenge with forgetting in AI is that it’s a discrete operation, meaning you either forget or not — there is no in-between. Optimizing such discrete operations is really hard, which is why most systems process information indiscriminately and incur heavy computational costs. Previous approaches to this problem often focus on compression, so information that’s far in the past is compressed to be smaller. While this allows the model to extend to longer ranges in the past, compression yields blurry versions of memory.
With Expire-Span, AI systems can gradually forget irrelevant information. Because the forgetting is gradual rather than all-or-nothing, the otherwise discrete operation becomes one the model can optimize efficiently during training.
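The key trick can be sketched in a few lines of NumPy. This is an illustrative simplification, not our implementation: instead of a hard keep/drop decision, each memory gets a 0-to-1 mask that decays linearly over a short "ramp" after its predicted span runs out, so gradients can flow back into the span predictor. The function name and the ramp length are invented for this sketch.

```python
import numpy as np

def soft_expire_mask(remaining, ramp=16.0):
    """Soft version of the discrete question 'has this memory expired?'.

    remaining: array of (expire_span - age) values, one per memory.
    The mask is 1 while a memory is within its span, 0 once it is more
    than `ramp` steps past it, and falls linearly in between. That linear
    region is what makes the operation differentiable.
    """
    return np.clip(1.0 + remaining / ramp, 0.0, 1.0)

# A memory still within its span is fully kept ...
print(soft_expire_mask(np.array([5.0])))    # [1.]
# ... one far past its span is fully forgotten ...
print(soft_expire_mask(np.array([-40.0])))  # [0.]
# ... and one inside the ramp region is partially kept.
print(soft_expire_mask(np.array([-8.0])))   # [0.5]
```

A hard cutoff would have zero gradient everywhere; the linear ramp gives the span predictor a learning signal exactly for the memories whose fate is currently being decided.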
Picture a neural network presented with a sequence of words, images, or video frames. Each time a new piece of information arrives, Expire-Span calculates an expiration value for each hidden state, which determines how long that information is preserved as a memory. This gradual decay of some information is key to keeping important information without blurring it. And because the mechanism is learnable, the model can adjust the span size as needed. Expire-Span predicts each expiration based on context learned from data and influenced by surrounding memories. For example, if the model is trained on a word prediction task, it can learn to remember rare words such as names but forget very common filler words such as "the" and "of." By looking at previous contextual content, it predicts whether something can be forgotten, and by learning from its mistakes over time, Expire-Span figures out what information is important.
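The flow above can be sketched end to end in NumPy: predict a span for each hidden state, turn remaining span into a soft mask, and use the mask to down-weight attention to expired memories. This is a hedged, single-head simplification; in the real model this happens per attention head inside a Transformer, and the names here (`expire_span_attention`, `w`, `b`, `max_span`, `ramp`) are invented for the sketch.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def expire_span_attention(hidden, query, w, b, t, max_span=1024, ramp=16.0):
    """Attend over past hidden states, down-weighting expired memories.

    hidden: (n, d) past hidden states; hidden[i] was written at timestep i.
    query:  (d,) current query vector.
    w, b:   parameters of the span predictor (learned in the real model).
    t:      current timestep.
    """
    n, d = hidden.shape
    # Predicted lifespan of each memory, in timesteps: sigmoid(...) * max_span.
    spans = max_span / (1.0 + np.exp(-(hidden @ w + b)))
    ages = t - np.arange(n)          # how long ago each memory was written
    remaining = spans - ages
    # Soft 0..1 mask: 1 within span, linear decay over `ramp` steps after it.
    mask = np.clip(1.0 + remaining / ramp, 0.0, 1.0)
    # Standard dot-product attention, reweighted by the mask and renormalized.
    attn = softmax(hidden @ query / np.sqrt(d)) * mask
    return attn / (attn.sum() + 1e-9)
```

Because the span is a function of the hidden state itself, a memory holding a rare name can earn a long span while one holding a filler word expires almost immediately, and the whole pipeline stays differentiable.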
Expire-Span scales to tens of thousands of pieces of information, can retain fewer than a thousand bits of it, and still outperforms alternative methods with higher efficiency. In our tests on several very long-context tasks, including language modeling and tracking moving objects, Expire-Span was both more efficient and faster than previous models.
Expire-Span is inspired by the way humans retain memories. Our brains naturally make room for important knowledge by providing easy access for recollection rather than becoming overwhelmed with every detail. Similarly, Expire-Span helps AI keep data that’s useful for a given task and forget the rest. The impressive scalability and efficiency of Expire-Span have exciting implications for one day achieving a wide range of difficult, humanlike AI capabilities that otherwise would not be possible.
While this is currently research, we could see the Expire-Span method used in future real-world applications that benefit from AI that forgets nonessential information. One day, Expire-Span could also help people more easily retain the information that matters most to them across these kinds of long-range tasks.
Of course, human memories are highly complex. While Expire-Span focuses on memories of past experiences, there are many other types of human memory. Semantic memory, for instance, stores general, factual information. As a next step in our research toward more humanlike AI systems, we’re studying how to incorporate different types of memories into neural networks, so that in the long term we can bring AI even closer to humanlike memory, with the ability to learn much faster than current systems. We believe Expire-Span is an important, exciting advancement toward such future AI-powered innovations.