Launching the NetHack Challenge at NeurIPS 2021

June 9, 2021

Recent advances in reinforcement learning (RL) have been fuelled by simulation environments such as games like StarCraft II, Dota 2 or Minecraft. However, this progress came at substantial computational costs, often requiring running thousands of GPUs in parallel for a single experiment, while also falling short of leading to RL methods that can be transferred to more real-world problems outside of these games. We need environments that are complex, highlighting open research problems in RL such as generalization to a large variability of observations and vast richness of entities and their dynamics, while also allowing extremely fast simulation at low computation costs. To this end, Facebook AI open-sourced the NetHack Learning Environment (NLE) last year. This year as part of a NeurIPS 2021 competition, we are proud to launch the NetHack Challenge—the most accessible grand challenge for AI research—with our partner and co-organizer AIcrowd.

NetHack is frequently referred to as one of the hardest games in the world. Winning a game of NetHack requires long-term planning in an incredibly unforgiving environment. Once a player’s character dies (often in unexpected ways), the game starts from scratch in an entirely new dungeon. Successfully completing the game as an expert player takes on average 25-50x more steps than an average StarCraft II game. Players’ interactions with objects and the environment are extremely complex, so success often hinges on solving problems in creative or surprising ways and on consulting external knowledge sources like the NetHack Wiki. However, since NetHack is terminal-based, we can simulate it extremely fast, training agents for over 1.2 billion steps a day using only two GPUs. In other words, the NetHack Challenge hits a sweet spot: it tests the abilities of state-of-the-art AI methods in a complex and rich environment without the need to run experiments on a supercomputer.
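To put the 1.2 billion steps a day figure in more familiar units, the short calculation below converts it to steps per second. It is only a back-of-the-envelope sketch: it assumes throughput is constant over a full 24-hour day and that the two GPUs contribute equally, neither of which is stated in this post.

```python
# Figures quoted in the post.
STEPS_PER_DAY = 1.2e9
NUM_GPUS = 2

# Assumption: training runs at a constant rate for a full 24-hour day.
SECONDS_PER_DAY = 24 * 60 * 60

steps_per_second = STEPS_PER_DAY / SECONDS_PER_DAY
# Assumption: both GPUs contribute equally to the total.
steps_per_second_per_gpu = steps_per_second / NUM_GPUS

print(f"{steps_per_second:,.0f} steps/s overall")          # 13,889 steps/s overall
print(f"{steps_per_second_per_gpu:,.0f} steps/s per GPU")  # 6,944 steps/s per GPU
```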

The competition runs from early June through October 15, and the winners will be announced at NeurIPS in December.

A decades-old game that is ideal for AI research

NetHack is a visually simple but complex and incredibly difficult “dungeon crawler” adventure game that has been under active development since the 1980s. It remains popular with a large and diverse community of players, and it is entirely free to play.

These examples of NetHack levels showcase the game’s diversity and complexity as well as the variety of challenges the agent may encounter. These range from randomized mazes to more structured challenges, like large rooms full of monsters and traps, towns and forts, and hazards such as kraken-infested waters. They are rendered using more visually engaging sprites on top of the more “bare-metal” ASCII observation layer that aficionados usually play the game under.

NetHack is rife with challenges for state-of-the-art RL. Partial observation makes exploration essential. Procedural generation and “permadeath” make the cost of failure significant. Agents cannot reset or interfere with the environment, which rules out methods that depend on such access, like Monte Carlo Tree Search (underpinning agents such as AlphaZero for Go and chess) or Go-Explore for Montezuma’s Revenge. Dealing with the ever-changing observations of a stochastic and rich game world calls for new techniques that have a better chance of scaling to real-world settings with high degrees of variability.

This table compares NetHack with other standard RL benchmarks based on how fast environments run in simulation. Running speed is measured by the number of steps per second (SPS), which roughly corresponds to how many interactions (receiving observations, making decisions, and taking actions) with the environment an artificial agent will be able to have during training and evaluation.
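As a rough illustration of how an SPS figure can be obtained, the sketch below times a fixed number of environment steps and divides the step count by the elapsed wall-clock time. The `CountingEnv` class and the `measure_sps` helper are hypothetical stand-ins written for this example; they are not part of the NetHack Learning Environment, and a real measurement would also include the agent's decision-making time.

```python
import time
from typing import Callable


def measure_sps(step_fn: Callable[[], None], num_steps: int) -> float:
    """Call step_fn num_steps times and return the achieved steps per second."""
    start = time.perf_counter()
    for _ in range(num_steps):
        step_fn()
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very coarse timers.
    return num_steps / max(elapsed, 1e-9)


class CountingEnv:
    """Hypothetical stand-in environment: each step only increments a counter."""

    def __init__(self) -> None:
        self.steps_taken = 0

    def step(self) -> None:
        self.steps_taken += 1


env = CountingEnv()
sps = measure_sps(env.step, 10_000)
print(f"{sps:,.0f} steps per second")
```

To time a real environment, one would pass a function that performs one full interaction (choose an action, step the environment, reset when the episode ends) in place of `env.step`.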

Although 2D sprite-based or isometric views of the game exist, true aficionados often use the original interface: a world composed entirely of ASCII characters representing different objects and features (wands, weapons, armor, potions, spellbooks, walls, creatures, etc.). The happy byproduct of this unusual contrast of complex gameplay and basic visuals is that using the game to train RL agents is 15x faster than even the decade-old Atari benchmark. Furthermore, NetHack can be used to test the limits of even more recent state-of-the-art deep RL methods: it runs 50-100x faster than challenges of comparable difficulty while providing a higher degree of complexity.

Even with its simple ASCII visuals, every NetHack game is different, testing the generalization limits of current state-of-the-art approaches. In order to master the game, human players often have to consult external resources such as the NetHack Wiki to identify critical strategies or discover new paths forward.

How the NetHack Challenge will help advance AI

The NetHack Challenge invites entrants — using any method they please — to develop agents that can reliably either beat the game or (in the more likely scenario) achieve as high a score as possible. In doing so, the challenge aims to achieve three things:

  1. Yield a broad head-to-head comparison of methods on NetHack and develop new benchmarks for future research.

  2. Showcase the suitability of the NetHack Learning Environment as a setting for groundbreaking RL research in both industry labs and academia at a comparably low computational cost.

  3. Provide a stage for a showdown between neural and symbolic methods in sequential decision-making problems. While deep neural networks form the basis of most contemporary RL agents, there is a large community of NetHack bot developers who will be invited to submit entries to the competition for direct comparison with neural agents.

Achieving these objectives will lay the groundwork for a series of follow-up competitions in this setting, focusing on specific aspects of the learning problem (leveraging expert play or external knowledge, obtaining a particular level of performance with low resource usage, etc.). It will also help shed light on classes of training methods and modeling approaches that are capable of dealing with complex, highly varied environments and a high cost of errors (i.e., having to restart from scratch if your character is killed). Many real-world and industrial problems, such as navigation, share these characteristics. Consequently, making progress in NetHack is making progress toward RL in a wider range of applications.

Challenge details

The challenge asks participants to produce, by whatever means they wish, agents capable of playing the full game of NetHack. No restrictions are placed on how the agent is trained (and participants are welcome to use other techniques besides machine learning if they choose). Contestants will use their own hardware and evaluation will be performed in a controlled setting — thanks to our partner and co-organizer, AIcrowd — where the candidate agents will play a number of games, each with a randomly drawn character role and fantasy race. For a given set of evaluation episodes for an agent, the average number of episodes where the agent completes the game will be computed, along with the median in-game end-of-episode score. Entries will be ranked by average number of wins and, if tied, by median score. Contestants are encouraged, but in no way required, to incorporate this scoring mechanism into the training process of their agent (where relevant).
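The ranking rule described above can be sketched in a few lines. This is an illustrative reading, not the organizers' evaluation code: it assumes the "average number of wins" means the fraction of evaluation episodes in which the agent completes the game, and the `Episode` type, the `summarize` and `rank` helpers, and the agent names and scores are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean, median
from typing import Dict, List, Tuple


@dataclass
class Episode:
    won: bool   # True if the agent completed the game in this episode
    score: int  # in-game end-of-episode score


def summarize(episodes: List[Episode]) -> Tuple[float, float]:
    """Return (win rate, median score) for one agent's evaluation episodes."""
    win_rate = mean([1.0 if e.won else 0.0 for e in episodes])
    median_score = median([e.score for e in episodes])
    return win_rate, median_score


def rank(entries: Dict[str, List[Episode]]) -> List[str]:
    """Order agents by win rate, breaking ties by median score (best first)."""
    return sorted(entries, key=lambda name: summarize(entries[name]), reverse=True)


# Made-up results for three hypothetical agents.
entries = {
    "agent_a": [Episode(False, 1200), Episode(False, 800), Episode(False, 950)],
    "agent_b": [Episode(False, 400), Episode(False, 2000), Episode(False, 1000)],
    "agent_c": [Episode(True, 50000), Episode(False, 300), Episode(False, 100)],
}
ranking = rank(entries)
print(ranking)  # ['agent_c', 'agent_b', 'agent_a']
```

In this made-up data, `agent_c` ranks first because it is the only agent with a win, even though its median score is the lowest; `agent_a` and `agent_b` are tied on wins, so the median score decides between them.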

There will be three competition tracks:

  1. Best overall agent, awarded to the best-performing agent in the competition. All submitted agents qualify for this track.

  2. Best agent not using a neural network, awarded to the best-performing agent not using a neural network or significantly similar modeling technique. This includes, most prominently, agents that are not underpinned by parametric models like deep neural networks.

  3. Best agent from an academic/independent team, awarded to the best-performing agent produced by a team predominantly led by non-industry-affiliated researchers.

Contestants need not submit to any specific track but will automatically be ranked in any and all tracks for which their submission qualifies. The top-performing teams for each track will be invited to submit method videos to the NeurIPS competition event, as well as invited to participate in the writing of a post-competition report.
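The automatic track assignment can be pictured as a simple eligibility check. The sketch below is only an illustration of the rules listed above: the function name, its two boolean parameters, and the track labels are hypothetical, and it does not try to capture how the organizers judge borderline cases such as "significantly similar" modeling techniques or "predominantly led" teams.

```python
from typing import List


def eligible_tracks(uses_neural_network: bool, academic_or_independent: bool) -> List[str]:
    """Return every track a submission would be ranked in under the stated rules."""
    # All submitted agents qualify for the overall track.
    tracks = ["best overall agent"]
    if not uses_neural_network:
        tracks.append("best agent not using a neural network")
    if academic_or_independent:
        tracks.append("best agent from an academic/independent team")
    return tracks


# A symbolic bot written by an independent team is ranked in all three tracks.
print(eligible_tracks(uses_neural_network=False, academic_or_independent=True))
```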

New ways to advance AI collaboratively

RL and other subfields of AI advance more quickly when researchers can easily compare their results and learn from one another’s work. Facebook AI has leveraged this open approach to spur progress in other research areas, such as with the Deepfake Detection Challenge, the Hateful Memes Challenge, and the Habitat Challenge. We are excited to see how others approach the NetHack Challenge, and we look forward to sharing results and insights.

Details about the competition, along with links to resources, tutorials, and submission instructions, will appear on the official website when ready.

Written By

Research Scientist

Research Scientist

Eric Hambro

Research Engineer

Heinrich Küttler

Research Engineer