FLORES Benchmarking

FLORES is a benchmark dataset for machine translation between English and low-resource languages.


One of the most exciting recent trends in Natural Language Processing is training a single system on multiple languages at once. A multilingual machine translation system is capable of translating a sentence into several languages, translating sentences from several languages into a given language, or any combination thereof.

This greatly simplifies system development and deployment, since a single model serves all language pairs instead of one model per pair. It also tends to improve translation quality on low-resource pairs, which benefit from transfer from related, higher-resource languages trained in the same model.

However, evaluation of this kind of translation system has been hindered by the lack of high-quality benchmarks and the absence of a standardized evaluation process.

The goal of FLORES is to rally the community around low-resource multilingual machine translation by providing a realistic benchmark together with a fair and rigorous evaluation process.

How FLORES Works

The validation and test data come from the FLORES-101 evaluation benchmark, a high-quality dataset that enables evaluation of multilingual machine translation (MMT) systems in more than a hundred languages. Because every sentence is aligned across all languages, it supports many-to-many evaluation: any language can serve as the source or the target.
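To illustrate why full alignment matters, here is a minimal sketch of how aligned sentences turn N languages into N×(N−1) evaluable translation directions. The language codes and sentences below are toy placeholders, not taken from the actual dataset.

```python
from itertools import permutations

# Toy sample: one sentence aligned across three languages.
# (Codes and text are illustrative only, not real FLORES-101 data.)
aligned = {
    "eng": "The cat sleeps.",
    "fra": "Le chat dort.",
    "deu": "Die Katze schläft.",
}

# Because each sentence is aligned across all languages, every ordered
# pair of languages is a valid translation direction: N * (N - 1) pairs.
directions = list(permutations(aligned, 2))

print(len(directions))  # 3 languages -> 6 directions
for src, tgt in directions:
    print(f"{src}->{tgt}: {aligned[src]!r} => {aligned[tgt]!r}")
```

With 101 fully aligned languages, the same construction yields 101 × 100 = 10,100 translation directions from a single test set, which is what makes many-to-many evaluation possible without collecting separate bitext for each pair.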

The training data comes from the publicly available OPUS repository, which contains data of varying quality from a variety of domains. In-domain monolingual Wikipedia data will also be provided for each language.

The actual evaluation will be performed on a dedicated server where you can upload your code.

Important Dates

· Release of dev and dev-test data: June 2021
· Evaluation server opening: June 4, 2021
· Online submissions available: July 15, 2021
· Notification of results: August 15, 2021


There will be two submission periods. During the first, you submit your code to the evaluation server, where it is scored on a hidden test set; during the second and final period, your code is evaluated on a different hidden test set. Submitted code must fit within specified memory and compute limits.

Submissions are now open. Learn more about the submission process here. If you have any questions, please contact us at

Submit to the FLORES Benchmark