Extracting subtitles

geraldholdsworth
Posts: 1406
Joined: Tue Nov 04, 2014 9:42 pm
Location: Inverness, Scotland
Contact:

Extracting subtitles

Post by geraldholdsworth »

Hi all,

I've got a VOB file which plays in VLC and I can display subtitles (in a variety of languages). But, I want these subtitles, in the appropriate languages, in a text file with timecode markers. This is not from a commercial DVD, so these subtitles won't be online.

Anyone got any ideas?
BTW, I'm using a Mac.

Cheers,

Gerald.
Gerald Holdsworth, CTS-D
Extron Authorised Programmer
https://www.geraldholdsworth.co.uk
https://www.reptonresourcepage.co.uk
Twitter @radiogezza
sP1d3r
Posts: 769
Joined: Fri Aug 23, 2019 2:40 am
Contact:

Re: Extracting subtitles

Post by sP1d3r »

geraldholdsworth wrote: Sat Feb 17, 2024 12:54 pm
I want these subtitles, in the appropriate languages, in a text file with timecode markers.
Called a transcript, I believe.
Boydie
Posts: 770
Joined: Sat Oct 24, 2015 9:25 am
Location: Sunny Wigan
Contact:

Re: Extracting subtitles

Post by Boydie »

Ffmpeg? Don’t know if it preserves the timecodes, but there must be an option…

https://trac.ffmpeg.org/wiki/ExtractSubtitles

VLC can allegedly do it as well. Apparently Handbrake can too.
geraldholdsworth
Posts: 1406
Joined: Tue Nov 04, 2014 9:42 pm
Location: Inverness, Scotland
Contact:

Re: Extracting subtitles

Post by geraldholdsworth »

Tried Handbrake - couldn't work out how to get it to do it.

Couldn't figure out how to install ffmpeg. Then I found out I can do it through homebrew. Still won't touch the subtitles. It claims that there isn't any...but there is. Apparently there are 16 languages, but I only need 6 of them (well, the client says we don't need Japanese, so that makes it 5).
Boydie
Posts: 770
Joined: Sat Oct 24, 2015 9:25 am
Location: Sunny Wigan
Contact:

Re: Extracting subtitles

Post by Boydie »

Is the VOB all you have, or do you have a DVD (either physical or iso)?
It seems not all subtitle data on DVDs is stored in the VOB; some is stored in other files. Players such as VLC seem to be able to cope without the missing data, but transcoders are apparently more fussy.

If you have the “DVD”, maybe try transcoding the relevant title(s) into individual container file(s) such as mkv (remembering to include the subtitles). Hopefully ffmpeg will have more luck in finding the subtitle data in the mkv than vob.
Vivoii
Posts: 1
Joined: Sun Feb 18, 2024 10:03 am
Contact:

Re: Extracting subtitles

Post by Vivoii »

There are various tools for subtitle extraction but if you want them as a text file you’ll need optical character recognition (OCR) software as well.

The best tool I’ve used for both extraction and OCR is Subtitle Edit (freeware). It’s for Windows and Linux, but I’ve used it on a Mac, running within Crossover (an app that lets you run Windows software without installing a virtual machine).

There’s a guide here: https://iamscum.wordpress.com/guides/ocr/
sP1d3r
Posts: 769
Joined: Fri Aug 23, 2019 2:40 am
Contact:

Re: Extracting subtitles

Post by sP1d3r »

This relies on Windows programs to extract the subtitles, but they'd subsequently need converting into a transcript, which may not be possible:

https://www.youtube.com/watch?v=ru1aDajSe9g
sP1d3r
Posts: 769
Joined: Fri Aug 23, 2019 2:40 am
Contact:

Re: Extracting subtitles

Post by sP1d3r »

Perhaps converting it into a YouTube video and then using YouTube's transcription would work.
scruss
Posts: 653
Joined: Sun Jul 01, 2018 4:12 pm
Location: Toronto
Contact:

Re: Extracting subtitles

Post by scruss »

geraldholdsworth wrote: Sat Feb 17, 2024 11:56 pm
Couldn't figure out how to install ffmpeg. Then I found out I can do it through homebrew. Still won't touch the subtitles. It claims that there isn't any...but there is.
What does ffmpeg -i yourfile.vob say about subtitles? If it can't find subtitle text streams, then your client's video file has the subtitles "burned in" as image frames with no explicit text. Those require the OCR tricks described above, and lots and lots of proof-reading. This can be expensive and slow to get right.
geraldholdsworth
Posts: 1406
Joined: Tue Nov 04, 2014 9:42 pm
Location: Inverness, Scotland
Contact:

Re: Extracting subtitles

Post by geraldholdsworth »

Boydie wrote: Sun Feb 18, 2024 1:23 am
Is the VOB all you have, or do you have a DVD (either physical or iso)?
I've got a ripped copy of the DVD (so I've got the IFO and BUP files too, in a VIDEO_TS folder). I can get hold of the physical DVD.

I've got my colleague working on it too. He's managed to extract the English subtitles, so far, into an SRT file, which I can use. Just need the other five languages.
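For reference, an SRT file is just the "text file with timecode markers" asked for at the top of the thread: a running number, a start and end time written as HH:MM:SS,mmm, the text, then a blank line. A minimal sketch of writing one in Python follows; the `Cue` class and the function names are made up for illustration, and the cue text would still have to come from OCR or some other extraction step.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    """One subtitle entry (hypothetical helper type, not from any library)."""
    start_ms: int  # cue start, in milliseconds from the start of the video
    end_ms: int    # cue end, in milliseconds from the start of the video
    text: str      # subtitle text, may contain line breaks


def srt_timestamp(ms: int) -> str:
    """Format a millisecond offset as an SRT timestamp, HH:MM:SS,mmm."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def to_srt(cues: list[Cue]) -> str:
    """Render cues as SRT text: number, time range, text, blank separator."""
    blocks = []
    for number, cue in enumerate(cues, start=1):
        times = f"{srt_timestamp(cue.start_ms)} --> {srt_timestamp(cue.end_ms)}"
        blocks.append(f"{number}\n{times}\n{cue.text}\n")
    # Joining with a newline leaves one blank line between consecutive cues.
    return "\n".join(blocks)
```

One file per language keeps things simple, since an SRT file carries no language information of its own.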
geraldholdsworth
Posts: 1406
Joined: Tue Nov 04, 2014 9:42 pm
Location: Inverness, Scotland
Contact:

Re: Extracting subtitles

Post by geraldholdsworth »

scruss wrote: Sun Feb 18, 2024 4:30 pm
What does ffmpeg -i yourfile.vob say about subtitles? If it can't find subtitle text streams, then your client's video file has the subtitles "burned in" as image frames with no explicit text. Those require the OCR tricks described above, and lots and lots of proof-reading. This can be expensive and slow to get right.
It says:

Input #0, mpeg, from 'Urquhart_Castle_Cinema.VOB':
  Duration: 00:08:38.02, start: 0.060000, bitrate: 8971 kb/s
  Stream #0:0[0x1bf]: Data: dvd_nav_packet
  Stream #0:1[0x1e0]: Video: mpeg2video (Main), yuv420p(tv, bottom first), 720x576 [SAR 16:15 DAR 4:3], 6000 kb/s, 25 fps, 25 tbr, 90k tbn
    Side data:
      cpb: bitrate max/min/avg: 6000000/0/0 buffer size: 1835008 vbv_delay: N/A
  Stream #0:2[0x85]: Audio: ac3, 48000 Hz, 5.1(side), fltp, 448 kb/s
  Stream #0:3[0x84]: Audio: ac3, 48000 Hz, 5.1(side), fltp, 448 kb/s
  Stream #0:4[0x83]: Audio: ac3, 48000 Hz, 5.1(side), fltp, 448 kb/s
  Stream #0:5[0x82]: Audio: ac3, 48000 Hz, 5.1(side), fltp, 448 kb/s
  Stream #0:6[0x81]: Audio: ac3, 48000 Hz, 5.1(side), fltp, 448 kb/s
  Stream #0:7[0x80]: Audio: ac3, 48000 Hz, 5.1(side), fltp, 448 kb/s
I know that there are 6 (at least) audio streams, one for each language. VLC can display the subtitles, in the different languages.
scruss
Posts: 653
Joined: Sun Jul 01, 2018 4:12 pm
Location: Toronto
Contact:

Re: Extracting subtitles

Post by scruss »

No text subtitles found in your ffmpeg output, alas. Your colleague's SRT must've used the OCR method.
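A rough way to double-check that reading of the listing is to tally the stream types ffmpeg reports. DVD subtitle tracks are bitmap subpictures, which ffmpeg would normally list as "Subtitle: dvd_subtitle" when it detects them; the listing above shows only Data, Video and Audio. The sketch below assumes the report has been saved as text (ffmpeg prints it to stderr), and the regular expression and function name are illustrative guesses at the line format rather than anything official.

```python
import re
from collections import Counter

# Matches lines such as "  Stream #0:2[0x85]: Audio: ac3, 48000 Hz, ..."
# The optional "(xxx)" group allows for a language tag, e.g. "#0:2[0x85](eng)".
STREAM_LINE = re.compile(
    r"Stream #\d+:\d+(?:\[0x[0-9a-fA-F]+\])?(?:\([^)]*\))?: (\w+):"
)


def count_stream_types(ffmpeg_report: str) -> Counter:
    """Count streams by type (Video, Audio, Subtitle, Data) in `ffmpeg -i` text."""
    return Counter(m.group(1) for m in STREAM_LINE.finditer(ffmpeg_report))
```

If `count_stream_types(report)["Subtitle"]` comes back as 0, ffmpeg did not detect any subtitle stream in the portion of the file it examined, which is not quite the same as the file containing none.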
rmbrowngr
Posts: 620
Joined: Sat Jan 13, 2018 12:46 pm
Location: Dionysos, Greece
Contact:

Re: Extracting subtitles

Post by rmbrowngr »

I’ve used Subtitle Edit to OCR subtitles: https://www.nikse.dk/subtitleedit. It’s free and quite useful.
Richard B
Acorn Electrons issue 4 and 6, MRB, Plus 1 x2, Plus 3, AP6 x2, AP5, Pegasus 400, BeebSCSI, Gotek, Raspberry Pi Co-processor, GoSDC MBE.
BBC B+ 64K (128K upgraded) with Dual OS, Raspberry Pi Co-processor and Gotek.
geraldholdsworth
Posts: 1406
Joined: Tue Nov 04, 2014 9:42 pm
Location: Inverness, Scotland
Contact:

Re: Extracting subtitles

Post by geraldholdsworth »

scruss wrote: Tue Feb 20, 2024 3:34 am
No text subtitles found in your ffmpeg output, alas. Your colleague's SRT must've used the OCR method
VLC greys out the Subtitle Track sub menu until the narrator starts speaking. Then I get a choice of about 16 tracks, and can change the font, colour, size, etc. which makes me believe they're in there, just not at the start.
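If the tracks really do begin only once the narration starts, one possible explanation for the earlier listing is that ffmpeg examines just the beginning of a file when it lists streams, so a stream whose first packet arrives later can be missed. Raising the `-probesize` and `-analyzeduration` options is the usual thing to try. A sketch in Python that builds such a call with ffprobe (which is installed alongside ffmpeg); the function names and the "500M" value are assumptions for illustration, not tested against this file.

```python
import subprocess


def subtitle_probe_command(path: str, window: str = "500M") -> list[str]:
    """Build an ffprobe call that scans further into the file for subtitle streams."""
    return [
        "ffprobe",
        "-v", "error",               # suppress the banner and info messages
        "-probesize", window,        # how many bytes to read while probing
        "-analyzeduration", window,  # how many microseconds of input to analyse
        "-select_streams", "s",      # report subtitle streams only
        "-show_entries", "stream=index,codec_name",
        "-of", "csv=p=0",            # one "index,codec" line per stream
        path,
    ]


def list_subtitle_streams(path: str) -> list[str]:
    """Run the probe; requires ffprobe to be on the PATH."""
    result = subprocess.run(
        subtitle_probe_command(path),
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()
```

Calling `list_subtitle_streams("Urquhart_Castle_Cinema.VOB")` would return one line per subtitle stream found, or an empty list if none turn up even with the larger window.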