FFmpeg 8 Can Now Subtitle Your Videos on the Fly (theregister.com)
(Thursday August 28, 2025 @11:30PM (msmash)
from the pushing-the-limits dept.)
- Reference: 0178919010
- News link: https://news.slashdot.org/story/25/08/28/1844210/ffmpeg-8-can-now-subtitle-your-videos-on-the-fly
- Source link: https://www.theregister.com/2025/08/28/ffmpeg_8_huffman/
FFmpeg 8.0 brings GPU-accelerated video encoding via Vulkan -- and can [1]now subtitle your videos automatically using integrated speech recognition. From a report:
> At the start of the week, the FFmpeg project released its eighth major version. It's codenamed "Huffman" after the Huffman code algorithm, which was invented in 1952, making it one of the oldest lossless compression algorithms.
>
> [...] The changelog lists 30 significant changes, of which the top new feature is integrating Whisper. This means whisper.cpp, which is Georgi Gerganov's entirely local and offline version of OpenAI's Whisper automatic speech recognition model. The bottom line is that FFmpeg can now automatically subtitle videos for you.
[1] https://www.theregister.com/2025/08/28/ffmpeg_8_huffman/
If it is as good as Youtube, I'll pass (Score:5, Insightful)
by thesjaakspoiler ( 4782965 )
Youtube's automatic subtitling is a piece of junk.
Re: (Score:2)
by Zontar_Thing_From_Ve ( 949321 )
> Youtube's automatic subtitling is a piece of junk.
Yeah, it really is. I'd have modded you up, but no points.
Re: (Score:2)
by vbdasc ( 146051 )
It makes many mistakes, perhaps too many, and this can't be denied, but it can still be a lifesaver for people with poor hearing or a poor command of spoken English.
Re: (Score:2)
by Valgrus Thunderaxe ( 8769977 )
Try turning off your audio and relying on just their generated CCs. It's just a bunch of incomprehensible nonsense.
Not on Wayland (Score:1)
by kurt_cordial ( 6208254 )
The last fs I trusted was Windows 10. We had airport sysadmin delete the regex hotfix. Not worth recommending! I don't know why people insist on stupid commentary.
Shit (Score:2)
Looks like ffmpeg is the latest enshittification victim.
Re:Shit in a divacup (Score:1)
I am sure as HELL not going to manually subtitle the [1]rsilvergun sextape [xhamster.com].
I watched the whole thing and would never try to transcribe it. Dude literally drinks his top's piss out of a glass.
[1] https://xhamster.com/videos/give-me-your-piss-and-fuck-my-throat-until-you-cum-on-my-face-xhYQmc9
Re: (Score:2)
At least it's local and offline.
Re:Shit (Score:4, Interesting)
Not entirely.
Whisper actually works rather well in several specific use cases, and fails spectacularly in others. You need to know this in advance:
- Whisper is roughly 90% accurate at transcription and translation
- Whisper absolutely does not know what to do with silence and will randomly inject "subtitled by (fansub group, netflix, etc)" into silence
- Whisper does not really understand singing well
- Whisper does not understand code-switching (e.g. switching between English and Japanese in the same context window)
- Whisper understands zero onomatopoeia, just like all ASR systems.
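The silence problem in the list above can be partly cleaned up after the fact. Here is a minimal sketch, assuming the cues have already been parsed into (start, end, text) tuples. The pattern and the function name are hypothetical, not anything Whisper or FFmpeg provides, and a real dialogue line that happens to say "subtitled by" would be dropped too:

```python
import re

# Hypothetical denylist of credit-style phrases; extend it for your own material.
CREDIT_PATTERN = re.compile(
    r"\b(subtitle[sd]?|caption(s|ed)?|transcri(bed|ption))\s+by\b",
    re.IGNORECASE,
)


def drop_credit_cues(cues):
    """Return the cues whose text does not look like an injected credit line.

    Each cue is a (start_seconds, end_seconds, text) tuple.
    """
    return [cue for cue in cues if not CREDIT_PATTERN.search(cue[2])]


# Made-up example cues, not real Whisper output.
cues = [
    (0.0, 2.5, "Welcome back to the show."),
    (2.5, 9.0, "Subtitled by some fansub group"),
    (9.0, 11.0, "Let's get started."),
]
cleaned = drop_credit_cues(cues)
```

This only catches the credit-style injections; it does nothing for other kinds of hallucinated text.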
With that said, it is not useful or reliable for:
1. Fansubbing, especially anything adult. It can only understand words, not onomatopoeia. So when it stumbles into a scene where someone goes "ah!" it has zero context for it. The result is actually pretty silly, and often turns sex scenes in R-rated and unrated media into a series of random gibberish words that begin with the same sound. Likewise, children playing and women giggling often get turned into a series of nonsense words, sometimes sexually charged ones.
2. Transcription of podcasts. Sorry bub, your average podcaster has a shitty microphone, and Whisper cannot subtitle when multiple people are speaking over each other, especially when people use Zoom or Discord for a multi-party call. If you want to use it to transcribe a podcast, record each participant separately and merge the results.
3. Anything with poor audio. ASR technology is often built on a corpus of bad data that elevates profanity when it tries to guess words it cannot understand. So it's more likely to produce racist language: "trigger" becomes the same word with an n, a word that isn't even in the audio. So your input source must be professional grade, or its word error rate will be higher and it will favor profanity or racist language over less frequent but more obvious words.
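The per-participant approach from point 2 could be sketched roughly like this. It assumes every recording started at the same moment, so the timestamps share one clock, and that each track has already been transcribed into (start, end, text) tuples. The function name and the sample data are made up for illustration:

```python
def merge_transcripts(per_speaker):
    """Merge per-speaker cue lists into one timeline ordered by start time.

    per_speaker maps a speaker label to a list of (start, end, text) tuples.
    Returns a list of (start, end, speaker, text) tuples.
    """
    merged = [
        (start, end, speaker, text)
        for speaker, cues in per_speaker.items()
        for start, end, text in cues
    ]
    # Stable sort: cues with equal start times keep their input order.
    merged.sort(key=lambda cue: cue[0])
    return merged


# Made-up example tracks, one per participant.
tracks = {
    "alice": [(0.0, 2.0, "Hi, welcome."), (5.0, 7.0, "Go on.")],
    "bob": [(1.5, 4.5, "Thanks for having me.")],
}
timeline = merge_transcripts(tracks)
```

Overlapping speech stays as overlapping cues here instead of being mashed into one garbled line, which is the whole point of recording separately.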
I doubt most people will use this in practice anyway, as whisper.cpp is insanely slow unless it is running on a 16GB NVIDIA GPU.