Extracting DVD Subtitles
Software name: Avidemux
If you want to extract subtitles from a DVD, it helps to understand a little about how they work. On a DVD, subtitles are stored in VOB files along with the main video and audio streams. We call them all streams here to mark the difference between a self-contained file and a stream: several streams can be included in one file. The subtitles you see on a DVD are streams of images that appear one after the other, and each stream holds a different language. When we extract these subtitle streams, the handiest format to save them in is a text file that records the timecode at which each piece of text appears. A text-based subtitle file, unlike an image-based one, is easy to edit and translate, and you can easily send it over the internet or put it on a website for others to download. To create a text-based subtitle file, we first need to extract the subtitle images from the DVD to two files: an .idx file and a .sub file (together known as VobSub).
We can then convert those files into a single text-based subtitle file. There are many subtitle formats, but Avidemux uses a widely compatible one with the '.srt' extension. Note: the screenshots in the following explanation come from both Ubuntu (Linux) and Windows. Avidemux works well on both, and the interface looks the same apart from a few color differences.
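The '.srt' format is plain text: each subtitle (cue) is a running number, a line of the form 'start --> end' with times written as hours:minutes:seconds,milliseconds, then one or more lines of text, with a blank line between cues. As an illustration only, this minimal Python sketch writes cues in that layout. The function names and the choice to store times as milliseconds are assumptions made for the example; none of this is Avidemux code.

```python
# Hypothetical helpers for illustration; not part of Avidemux.

def format_timecode(ms: int) -> str:
    """Convert a time in milliseconds to the SRT form HH:MM:SS,mmm."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def format_srt(cues) -> str:
    """Build SRT text from (start_ms, end_ms, lines) tuples, numbering cues from 1."""
    blocks = []
    for index, (start_ms, end_ms, lines) in enumerate(cues, start=1):
        timing = f"{format_timecode(start_ms)} --> {format_timecode(end_ms)}"
        blocks.append("\n".join([str(index), timing, *lines]))
    # Cues are separated by a blank line; the file ends with a newline.
    return "\n\n".join(blocks) + "\n"
```

For example, format_srt([(10991, 13991, ["Man: Mick Jagger"])]) produces a cue numbered 1 running from 00:00:10,991 to 00:00:13,991 with the text "Man: Mick Jagger".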
Extracting to an idx / VobSub file
From the Tools menu, select 'VOB' and then 'VobSub'.
You should then see a window asking you to browse for three things:
Finding the VOB Files
Locating the IFO file
Select where to save the VobSub files
Saving your files
When you have found or selected all three, click 'OK' to close the window. A window then tells you how long the process will take. When it is complete, you will have a new .idx file and a new .sub file, saved in the directory you chose for the .idx file. In my case I saved them to the desktop.
Making the '.srt' File
Now we want to convert the .idx and .sub files into a single '.srt' file. From the top menu, click 'Tools' and then 'OCR (VobSub -> Srt)'. You should see a window titled 'MiniOCR'. Click the 'Open' button under 'VobSub', and a window called 'VobSub Settings' appears. Click 'Select .idx', browse for the .idx file you created in the 'Extracting to an idx / VobSub file' section, select it and click 'Open'. You return to the 'VobSub Settings' window.
If the DVD has subtitles in more than one language, they should be listed in the 'Select Language' drop-down box. Select the language you want to create a subtitle file for and click 'OK' to return to the 'MiniOCR' window.
Now choose where on your computer to save the target .srt file. Click the 'Save' button in the 'Output srt' section. A window asks you to choose a folder for the .srt file: browse to the right place, type a name for the file in the box at the top, make sure the name ends in '.srt', and click 'Save'.
With the input and output files set, you can start converting the subtitle images into text. This process is called OCR (Optical Character Recognition). Click 'Start OCR' and the OCR window opens. The OCR process needs you to tell it what the characters (letters, numbers and symbols) in the subtitles are.
It displays a character from the image subtitle, and you then tell the application which text character it is. Avidemux shows you a phrase and one character from that phrase. Type that character in the empty text field: it is more accurate for you to specify exactly what each character is than for the application to guess. Where it says 'Current Glyph Text:' and shows an image of a character, enter that character with the keyboard in the box below and click 'OK'. Capitalization matters, so type capital and lower-case letters exactly as shown. This step is also very unforgiving at the moment: there is no undo option, so don't get it wrong! Sometimes two characters will be selected at once; enter both characters and confirm as before.
This may seem slow, but once you have entered all the characters and numbers the program should fly through the subtitles. You should be able to process a 90-minute film in 5 to 10 minutes. When you are finished, the '.srt' file you saved will contain the timecodes and subtitle text. You can open it with a text editor, and it should look something like this:

1
00:00:10,991 --> 00:00:13,991
Man: Mick Jagger

2
00:00:18,565 --> 00:00:21,565
- Mick Jagger
- Thank you

3
00:00:32,479 --> 00:00:35,479
- Man: Mick Jagger.
- ( police radio squelch )

4
00:01:04,778 --> 00:01:06,011
Man: one minute! one minute!
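If you want to check or process the finished file with a script, the layout above is easy to read back. This minimal Python sketch parses SRT text into cues with start and end times in milliseconds. It assumes well-formed input like the sample above (number line, timing line, text lines, blank line between cues) and is not part of Avidemux; the `Cue` class and the `parse_srt` name are hypothetical.

```python
import re
from dataclasses import dataclass

# Matches a timing line such as "00:00:10,991 --> 00:00:13,991".
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


@dataclass
class Cue:
    """One subtitle entry (hypothetical structure for illustration)."""
    index: int
    start_ms: int
    end_ms: int
    text: list[str]


def _to_ms(hours: str, minutes: str, seconds: str, millis: str) -> int:
    """Convert HH, MM, SS, mmm strings to a total number of milliseconds."""
    return ((int(hours) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + int(millis)


def parse_srt(content: str) -> list[Cue]:
    """Parse SRT text into Cue objects, assuming well-formed input."""
    # Drop a byte-order mark, normalize Windows line endings, trim outer blank lines.
    content = content.lstrip("\ufeff").replace("\r\n", "\n").strip()
    cues = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", content):
        lines = block.splitlines()
        if len(lines) < 2:
            continue  # too short to be a cue; skip it
        match = TIMING.match(lines[1].strip())
        if match is None:
            raise ValueError(f"unexpected timing line: {lines[1]!r}")
        parts = match.groups()
        cues.append(Cue(int(lines[0]), _to_ms(*parts[:4]), _to_ms(*parts[4:]), lines[2:]))
    return cues
```

To read a saved file, something like parse_srt(open("movie.srt", encoding="utf-8").read()) should work, assuming the file is UTF-8 encoded; "movie.srt" is a placeholder name. From the parsed cues you could, for example, check that each cue's end_ms is later than its start_ms.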