MUSE (Multilingual Unsupervised and Supervised Embeddings) is a Python library for developing and evaluating cross-lingual word embeddings for natural language processing. It helps researchers and developers bring their AI technologies to new languages faster.
MUSE takes a different approach to multilingual natural language processing. Rather than relying on language-specific training data or intermediary translation to classify text, it uses multilingual word embeddings, which place words from many languages in a shared vector space, so that a model trained on one language can be applied to others.
MUSE is compatible with fastText and provides large-scale, high-quality bilingual dictionaries for training and evaluation. It runs on CPU or GPU, with Python 2 or 3.
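To make the idea of training across languages concrete, here is a toy sketch of a classifier that is fit on English words only and then applied to Spanish words, relying on both languages sharing one vector space. The 2-D vectors, the `shared_space` table, and the `classify` helper are all invented for illustration; they are not MUSE data or part of the MUSE API.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy vectors (assumption): made-up 2-D embeddings standing in for
# English and Spanish words that have already been aligned.
shared_space = {
    ("en", "good"): [0.9, 0.1],
    ("en", "bad"): [0.1, 0.9],
    ("es", "bueno"): [0.85, 0.2],
    ("es", "malo"): [0.15, 0.8],
}

# "Training" uses English only: one prototype vector per label.
prototypes = {
    "positive": shared_space[("en", "good")],
    "negative": shared_space[("en", "bad")],
}

def classify(lang, word):
    """Return the label whose English prototype is closest to the word."""
    vec = shared_space[(lang, word)]
    return max(prototypes, key=lambda label: cosine(vec, prototypes[label]))

print(classify("es", "bueno"))
print(classify("es", "malo"))
```

No Spanish labels are used at any point; the Spanish words are classified correctly only because their vectors sit near their English counterparts.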
Clone MUSE, then download the monolingual and cross-lingual word embedding evaluation datasets.
cd ./MUSE/
./data/get_evaluation.sh
Download monolingual word embeddings.
# English fastText Wikipedia embeddings
curl -Lo data/wiki.en.vec https://s3-us-west-1.amazonaws.com/fasttext-vectors/wiki.en.vec
# Spanish fastText Wikipedia embeddings
curl -Lo data/wiki.es.vec https://s3-us-west-1.amazonaws.com/fasttext-vectors/wiki.es.vec
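The downloaded .vec files use the fastText text format: a header line with the vocabulary size and vector dimension, followed by one word and its vector per line. The following is a minimal sketch for inspecting such a file in plain Python. `load_vec` and `max_words` are hypothetical names chosen for this example, not part of MUSE or fastText.

```python
import io

def load_vec(stream, max_words=None):
    """Read fastText text-format vectors from an open text stream.

    Returns (words, vectors): a list of words and a parallel list of
    float lists. `max_words` optionally caps how many entries are read.
    """
    header = stream.readline().split()
    dim = int(header[1])  # header is "<vocabulary size> <dimension>"
    words, vectors = [], []
    for line in stream:
        # Split from the right so the last `dim` fields are the vector
        # and everything before them is the word.
        parts = line.rstrip().rsplit(" ", dim)
        if len(parts) != dim + 1:
            continue  # skip blank or malformed lines
        words.append(parts[0])
        vectors.append([float(x) for x in parts[1:]])
        if max_words is not None and len(words) >= max_words:
            break
    return words, vectors

# Example with an in-memory file; for the real data, pass instead
# io.open("data/wiki.en.vec", encoding="utf-8").
demo = io.StringIO(u"2 3\nhello 0.1 0.2 0.3\nworld 0.4 0.5 0.6\n")
demo_words, demo_vectors = load_vec(demo)
```

The full Wikipedia embedding files are large, so capping the vocabulary with `max_words` keeps memory use manageable while experimenting.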
Review the documentation to familiarize yourself with the MUSE dictionaries and word embeddings.
Experiment with supervised and unsupervised training.
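As background for the supervised setting, the MUSE project describes it as learning a mapping from the source to the target embedding space with Procrustes alignment over bilingual dictionary pairs. The sketch below shows one such alignment step in NumPy under simplifying assumptions: synthetic random data and a single closed-form solve. It is not MUSE's implementation, and `procrustes` is a hypothetical helper name.

```python
import numpy as np

def procrustes(X, Y):
    """Orthogonal W minimizing ||X.dot(W) - Y||_F.

    X, Y: (n_pairs, dim) arrays whose i-th rows are the source and
    target vectors of the i-th dictionary pair.
    """
    U, _, Vt = np.linalg.svd(X.T.dot(Y))
    return U.dot(Vt)

# Synthetic check (assumption: random data, not real embeddings).
rng = np.random.RandomState(0)
X = rng.randn(50, 4)                  # stand-in "source" vectors
Q, _ = np.linalg.qr(rng.randn(4, 4))  # hidden orthogonal map
Y = X.dot(Q)                          # stand-in "target" vectors

W = procrustes(X, Y)
print(np.allclose(X.dot(W), Y))       # mapped source vectors match targets
```

Constraining the mapping to be orthogonal preserves distances and angles within the source space, so the monolingual structure of the embeddings is left intact by the alignment.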