Wav2letter is an end-to-end Automatic Speech Recognition (ASR) system for researchers and developers to transcribe speech.
Wav2letter implements the architectures proposed in "Wav2Letter: an End-to-End ConvNet-based Speech Recognition System" and "Letter-Based Speech Recognition with Gated ConvNets". It provides pre-trained models for the LibriSpeech dataset so developers can start transcribing speech right away.
If you plan to train on a CPU, install Intel MKL; for training on a GPU, install the NVIDIA CUDA Toolkit. Install LuaJIT and LuaRocks to build the packages, KenLM if you plan to use the decoder, and OpenMPI plus TorchMPI for multi-GPU training.
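Before building, it helps to point your environment at the prerequisites. The paths below are assumptions for a typical Linux setup; adjust them to wherever MKL, CUDA, and LuaJIT are actually installed on your machine.

```shell
# Paths below are assumptions; adjust to your installation.
export MKL_ROOT=/opt/intel/mkl       # Intel MKL (CPU training)
export CUDA_HOME=/usr/local/cuda     # NVIDIA CUDA Toolkit (GPU training)
export PATH="$HOME/usr/bin:$PATH"    # assuming LuaJIT + LuaRocks under $HOME/usr
export LD_LIBRARY_PATH="$CUDA_HOME/lib64:$LD_LIBRARY_PATH"
```

With these set, `luarocks` and the build steps below can locate the toolchain without further configuration.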
Install Torch and Torch packages.
```
luarocks install torch
luarocks install cudnn # for GPU support
luarocks install cunn  # for GPU support
```
Install wav2letter packages.
```
git clone https://github.com/facebookresearch/wav2letter.git
cd wav2letter
cd gtn && luarocks make rocks/gtn-scm-1.rockspec && cd ..
cd speech && luarocks make rocks/speech-scm-1.rockspec && cd ..
cd torchnet-optim && luarocks make rocks/torchnet-optim-scm-1.rockspec && cd ..
cd wav2letter && luarocks make rocks/wav2letter-scm-1.rockspec && cd ..
# Assuming here you got KenLM in $HOME/kenlm
# And only if you plan to use the decoder:
cd beamer && KENLM_INC=$HOME/kenlm luarocks make rocks/beamer-scm-1.rockspec && cd ..
```
Download pre-trained models and iterate on them or build and train new models.
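As a sketch only, training on a preprocessed LibriSpeech dataset might look like the following. The flag names and paths here are assumptions based on typical usage of the repository's `train.lua`; check its actual options before running.

```
# Illustrative sketch: flags and paths are assumptions, not verified options.
mkdir ~/experiments
luajit ~/wav2letter/train.lua --train \
  -rundir ~/experiments -runname librispeech_baseline \
  -archdir ~/wav2letter/arch -arch arch.librispeech \
  -datadir ~/librispeech-proc \
  -train train-clean-100 -valid dev-clean -test test-clean \
  -gpu 1
```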