Mozilla Developing Whisperfile For Local Audio-To-Text Translation
[Mozilla] 5 Hours Ago
- Reference: 0001487051
- News link: https://www.phoronix.com/news/Mozilla-Whisperfile
The Mozilla Ocho group leads "innovation and experiments" at Mozilla. Following its work on [1]Llamafile, which packages a large language model as a single file that runs across different hardware and software, the group's newest effort is Whisperfile for easy audio-to-text translation.
Whisperfile is a new initiative for turning audio into text. As the name implies, it is built around OpenAI's Whisper model for local audio transcription and language translation. Whisperfile is based on the Whisper.cpp sources and can also translate non-English audio into English as part of the transcription process.
Whisperfiles bundle the model weights and run on Linux, Windows, macOS, FreeBSD, OpenBSD, and NetBSD systems. They currently work on both x86_64 and AArch64.
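As a rough illustration of what running a Whisperfile locally might look like from a script, here is a minimal Python sketch. It is not taken from the Whisperfile documentation: the file names are hypothetical, and the `-f` (input file) and `-tr` (translate to English) flags are assumed to carry over from the Whisper.cpp command-line tool, so check the binary's `--help` output before relying on them.

```python
import subprocess


def build_whisperfile_command(whisperfile, audio, translate=False):
    """Return the argument list for one transcription run."""
    # Assumption: -f and -tr are inherited from the whisper.cpp CLI.
    # Confirm them against the whisperfile's own --help output.
    cmd = [str(whisperfile), "-f", str(audio)]
    if translate:
        # Ask for English output when the audio is in another language.
        cmd.append("-tr")
    return cmd


def transcribe(whisperfile, audio, translate=False):
    """Run the whisperfile and return whatever it writes to stdout."""
    cmd = build_whisperfile_command(whisperfile, audio, translate)
    # check=True raises CalledProcessError if the binary exits non-zero.
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Hypothetical file names, used only to show the resulting command.
    print(build_whisperfile_command("./whisper.llamafile", "talk.wav", translate=True))
```

Because the weights are bundled into the executable, the sketch passes no separate model path; that too is an assumption about how a given Whisperfile was built.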
Those wanting to learn more about Whisperfile can do so via [2]Mozilla/whisperfile on Hugging Face.
Longtime followers may also recall that Mozilla previously developed [3]DeepSpeech as an open-source, offline speech-to-text engine. DeepSpeech leveraged TensorFlow and Baidu's Deep Speech research paper. Sadly, DeepSpeech development was halted following earlier Mozilla layoffs, and its GitHub repository hasn't seen any commits in three years.
For those interested, I will soon be running some [4]Whisperfile benchmarks across various CPUs.
[1] https://www.phoronix.com/search/Llamafile
[2] https://huggingface.co/Mozilla/whisperfile
[3] https://www.phoronix.com/search/DeepSpeech
[4] https://openbenchmarking.org/test/pts/whisperfile-1.0.0