WebVTT Captioning & Subtitle Support

Looking to add closed captions or subtitles to your video content? Need WebVTT captioning support?

While closed captions are often associated with aiding the deaf and hard of hearing, their benefit to video creators goes beyond this. In this article, learn about captions, why you should invest in them and how viewing habits are changing in a way that increases their use. Also learn about the WebVTT format for subtitles and captions, how to create WebVTT files and how to convert other subtitle formats to WebVTT. The article closes by discussing how to add captions to your videos on IBM’s video streaming and enterprise video platform.

Note that in addition to support for WebVTT, IBM Watson Media also has support for automated closed captions through converting video speech to text utilizing IBM Watson.

What are closed captions?

Sometimes cited as simply CC, closed captions are optional text that can be displayed as part of the video. The text can be a transcript of the dialogue but can also include cues for audio-only events, like something that occurs off screen. They are often used to help those who are deaf or hard of hearing enjoy and understand video material. That said, a growing number of online viewers watch content fully muted, and captions are a natural fit for this audience as well.

To learn more about captions, the reasons for using them and regulations behind them, check out our What is Closed Captioning article.

Closed captioning VODs

Once content owners begin to formulate a strategy for captioning their VOD (video on-demand) files, this will often involve associating each video with a caption file. These caption files contain both text and time stamps, which denote when each caption or subtitle should appear on and disappear from the screen.
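To make the role of the time stamps concrete, here is a minimal Python sketch that is not tied to any particular caption format or player. The `Cue` class and `active_cue` function are hypothetical names used only for illustration, and times are assumed to be in seconds from the start of the video.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Cue:
    start: float  # seconds into the video when the text appears
    end: float    # seconds into the video when the text disappears
    text: str


def active_cue(cues: Sequence[Cue], position: float) -> Optional[Cue]:
    """Return the cue that should be on screen at `position`, if any."""
    for cue in cues:
        # A cue is visible from its start time up to, but not including, its end time.
        if cue.start <= position < cue.end:
            return cue
    return None


cues = [
    Cue(0.0, 12.0, "This building was built in 1954."),
    Cue(13.05, 15.8, "For its 50th anniversary, though, it was remodeled."),
]
```

With these two cues, a playback position of 5 seconds falls inside the first cue, while a position of 12.5 seconds falls in the gap between them, so nothing would be shown.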

There are a variety of different caption file formats to choose from. This can sometimes lead to content owners asking “which caption file format is the best?”. Features across the different formats are very similar, though, and it can often come down to what provider you’re using. In the case of IBM’s video streaming services, WebVTT (Web Video Text Tracks) is suggested.

What is WebVTT?

Sometimes abbreviated as VTT, after its .vtt file extension, WebVTT is a caption format for displaying timed text tied to an HTML5 video through the track element. Due to general mobile compatibility, and being a W3C standard, WebVTT has emerged as a popular file format for closed captions.

In terms of structure, a VTT file could literally be created in a text editor, although there are more efficient ways to do it. Here is a sample of what might be in a VTT file.

WEBVTT
Kind: captions
Language: en

00:00:00.000 --> 00:00:12.000
This building was built in 1954.

00:00:13.050 --> 00:00:15.800
For its 50th anniversary, though, it was remodeled.

That’s it. If all someone had were two captions for their video, that would be the entire text file, which could then be saved with a “.vtt” extension and uploaded as the captions for that VOD. So while it’s possible to create a subtitle file by editing a text file directly, there are more efficient methods.
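For those comfortable with a little scripting, the same file can also be generated rather than typed. The following Python sketch is one possible approach under stated assumptions: `format_timestamp` and `build_vtt` are hypothetical helper names, cues are given as (start seconds, end seconds, text) tuples, and the output begins with the `WEBVTT` line that the W3C specification requires at the top of every WebVTT file.

```python
def format_timestamp(seconds: float) -> str:
    """Format a number of seconds as a WebVTT timestamp, HH:MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def build_vtt(cues) -> str:
    """Build WebVTT text from (start_seconds, end_seconds, text) tuples."""
    lines = ["WEBVTT", ""]  # signature line, then a blank line before the first cue
    for start, end, text in cues:
        lines.append(f"{format_timestamp(start)} --> {format_timestamp(end)}")
        lines.append(text)
        lines.append("")  # a blank line separates one cue from the next
    return "\n".join(lines)


vtt_text = build_vtt([
    (0.0, 12.0, "This building was built in 1954."),
    (13.05, 15.8, "For its 50th anniversary, though, it was remodeled."),
])
```

Saving `vtt_text` to a file with a .vtt extension, using UTF-8 encoding, would give a caption file equivalent to the hand-written sample.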

Creating WebVTT files: DIY

There are a variety of programs and closed captioning software that aid in the creation of WebVTT files. The following could be considered manual, or do it yourself (DIY), as they require someone to watch and caption the content. For a more automatic approach, there are paid services that will accept a video file and do the transcription work for you. Assuming you’re looking to do this yourself, below are some programs to assist you.

Subtitle Edit (Windows)
This free software lets you play back a video file while creating the subtitles/captions. This makes it easy to sync the captions and see how the content will look with them. It also includes the ability to convert between formats. So if you have captions in a format like PAC (*.pac, binary), they can be opened here and exported as WebVTT. In fact, the software touts compatibility across 200 different subtitle and caption formats.

Gaupol (Windows)
Another free program, Gaupol allows content creators to preview the video while creating captions. The key benefit is removing some of the headache of manually pausing a video and switching over to a notepad or another program where the captions are being written. It also gives an all-important preview of the video with the subtitles, to proof for accuracy. Support for WebVTT was added to this program recently, in April.

Subtitle Workshop (Windows)
This free program lets you view the video while subtitles are being added, which simplifies the process. Undo and redo are also available to make editing easier. The subtitles can then be saved in a variety of formats, including VTT files. The program has not been updated since 2013 and is largely considered discontinued, although it still works in its current state.

There are additional programs that can be used to create subtitle files as well, such as Aegisub (Windows, Mac or UNIX) or Jubler (Windows or Mac). However, they create subs in a different format, like SRT (SubRip Text, aka SubRip Subtitles), and would need to be converted to WebVTT.

Convert SRT to WebVTT online

There can be scenarios where a content owner is generating or already has a lot of captions in a different format, most often SRT. Short for SubRip Text, and sometimes called SubRip Subtitles, this caption format has a slightly different structure. While WebVTT is set up as “timestamp” then “caption”, SRT goes caption/subtitle number, timestamp and then the caption. It also uses a comma, rather than a period, before the milliseconds in the timestamp. An example would be:

1
00:00:02,000 --> 00:00:13,400
This building was built in 1954.
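The three-part layout of an SRT entry (number, timestamp, caption) can be illustrated with a short Python sketch. The function name `parse_srt_block` is hypothetical, and the sketch assumes a well-formed block of exactly the shape described above.

```python
def parse_srt_block(block: str):
    """Split one SRT caption block into (number, start, end, text)."""
    lines = block.strip().splitlines()
    number = int(lines[0])  # first line: the caption number
    # Second line: start and end timestamps separated by "-->".
    start, end = [part.strip() for part in lines[1].split("-->")]
    text = "\n".join(lines[2:])  # remaining lines: the caption text
    return number, start, end, text


sample = """1
00:00:02,000 --> 00:00:13,400
This building was built in 1954."""

number, start, end, text = parse_srt_block(sample)
```

Note that the timestamps come back with the SRT comma intact; changing that comma to a period is the main step in producing WebVTT.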

Thankfully, there are numerous ways to convert these files to a WebVTT format without having to manually go through and change each line of caption information. While desktop programs such as the previously mentioned Subtitle Edit can do this, many converters are online, cloud-based tools.

SubRip to WebVTT Converter (Online)
With a very simple layout, this online solution allows content owners to upload an SRT file. With the click of a “Convert me, pleeeaaase!” button, the service then prompts them to download the captions converted to a WebVTT format.

Rev (Online)
This online tool allows you to upload one or more SRT caption files. After a successful upload, the user can choose from a variety of output formats, including WebVTT. The only downside is that entering an email address is required, as the converted files are emailed to the user.

WebVTT.org (Online)
This cloud-based solution involves pasting the SRT text in directly. It then outputs the text in a WebVTT format, which has to be copied and pasted into another program, like Notepad, to create the file. How the conversion is achieved is deceptively simple, as most of the change comes from replacing the comma in each timestamp with a period. The SRT numbering is left in place, although this shouldn’t stop the file from working. It’s also important to have a blank line between each caption in the SRT file. Without it, the solution can accidentally clump multiple captions together.
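The comma-to-period approach is simple enough to sketch in a few lines of Python. This is an illustration in the spirit of that technique, not the code behind any of the tools listed here; `srt_to_vtt` and `TIMING` are hypothetical names. It leaves the SRT numbers in place, where WebVTT treats a line before the timestamps as an optional cue identifier, and it relies on the SRT text already having a blank line between captions.

```python
import re

# Matches an SRT timing line such as "00:00:02,000 --> 00:00:13,400".
TIMING = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3}) --> (\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt_text: str) -> str:
    """Convert SRT text to WebVTT by swapping timestamp commas for periods."""
    # Normalise Windows line endings so the blank lines between captions survive.
    body = srt_text.replace("\r\n", "\n").strip()
    # Only commas inside timing lines change; commas in caption text are kept.
    body = TIMING.sub(r"\1.\2 --> \3.\4", body)
    # A WebVTT file starts with the WEBVTT line followed by a blank line.
    return "WEBVTT\n\n" + body + "\n"


srt = """1
00:00:02,000 --> 00:00:13,400
This building was built in 1954.

2
00:00:14,000 --> 00:00:16,500
For its 50th anniversary, though, it was remodeled.
"""

vtt = srt_to_vtt(srt)
```

Restricting the replacement to timing lines, rather than replacing every comma in the file, keeps punctuation in the captions themselves untouched.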

Using services to create caption files

Captioning files under a do it yourself approach can be very time consuming. Organizations with vast libraries of content will likely look to outsource this process, or tap into ways to utilize speech-to-text to achieve faster results. This is where services come into play, as they can be paid to create these caption files for you. Below are a few services that offer this capability.

This service provides two methods for generating captions. One is to give them a video file and a transcript, at which point they can produce captions with timestamps for the moment that the dialogue should appear. Alternatively, they can be given a video file to first create the transcripts themselves before then creating the captions for it.

Although Amara has a community-based offering, there is a professional services approach as well. In this scenario, content owners fill out a form and then send a link to the video where it is already hosted. So in this case, a content owner might upload the video to IBM Watson Media first and then send a channel link to Amara. Afterwards, Amara will create a subtitle file that can be used. This particular service also specializes in offering multiple languages.

This service has a straightforward business model. One minute of video costs $1, so a 40 minute video would cost $40 to caption. The process involves uploading a video file, their team going to work on it, and then receiving a caption file, such as a .vtt file, in return.

Adding closed captions to videos on IBM Watson Media

Equipped with caption files, the next step is to associate them with video content on a service provider, such as IBM’s video streaming offerings.

WebVTT Captioning: Adding Captions

This involves first uploading the video to an account. Afterwards, someone has to check that video and then click “edit”. This will present a number of options, one of which is called “Closed Captions”.

This view will show previously uploaded captions and present the ability to add more captions or upload one if none are present.

When adding captions, content owners are given options to designate a language, set the track as the default and enter an email address that will be notified when the file is fully processed. In reference to languages, virtually any language is supported and multiple caption language tracks can be associated with the same video. The actual closed caption/subtitle file must be in a WebVTT format (.vtt). If it’s in a different format, it needs to be converted first, as outlined above.

After a WebVTT file is uploaded, the user will be taken back to the overall Closed Captions screen for that particular asset. They should be able to see the new language and file shown in this menu, with a notation that it’s “processing”. When done, this will change to “published”.

Once uploaded and published, content owners can later download the WebVTT files. One use case is when someone notices a typo or error: the file can be downloaded, edited and uploaded again. Settings for the captions can also be changed, including switching the language or setting a track as the default caption option. Users can also unpublish caption options.

When setting a default, this means that the caption/subtitle track will display automatically when the video is loaded. As a result, only one track can be set as the default. If it’s desired that a viewer has to select an option, the current default caption can have its settings modified and be removed as a default.
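The "only one default" rule can be modeled in a few lines of Python. This is purely an illustrative sketch and does not represent IBM’s platform or any API it offers; `CaptionTrack`, `set_default` and `clear_default` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class CaptionTrack:
    language: str
    is_default: bool = False


def set_default(tracks, language):
    """Mark one track as the default and clear the flag on every other track."""
    if all(track.language != language for track in tracks):
        raise ValueError(f"no caption track for {language!r}")
    for track in tracks:
        track.is_default = (track.language == language)


def clear_default(tracks):
    """Remove the default so the viewer has to pick a track themselves."""
    for track in tracks:
        track.is_default = False


tracks = [CaptionTrack("English (Canada)", True), CaptionTrack("French (Canada)")]
set_default(tracks, "French (Canada)")
```

After the call, the French track is the only default. Calling `clear_default(tracks)` would leave no default at all, which corresponds to requiring the viewer to select a track.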

Closed caption user experience

When successfully published, closed captions are easy to access on the viewer’s side. A button that shows “CC” will be added to the player controls. This button can be used to toggle the captions on or off. When enabled, captions will appear inside the video player.

The “CC” button can also be used to select from a series of captions if they are available, with an emphasis on different languages.

Closed caption settings

Once enabled, white text will appear near the bottom of the screen. For improved visibility, the text is surrounded by partially transparent black so as not to interfere with elements inside the video track. Viewers can also alter the appearance of these captions, using closed caption settings found in the player.

Supported languages for captions

IBM’s video streaming solutions support a wide range of caption languages right now. These can be listed not just by language, but also sometimes with regions denoted for slight differences in language. For example, options exist for “French (Canada)” vs. “French (Belgium)”. Below is the complete list, as it presently stands, available from the drop down used to select a language and associate it with your VTT file.

  • Abkhazian
  • Afar
  • Afrikaans
  • Albanian
  • Amharic
  • Arabic (Egypt)
  • Armenian
  • Assamese
  • Aymara
  • Azerbaijani
  • Bangla
  • Bashkir
  • Basque
  • Belarusian
  • Bihari
  • Bislama
  • Bosnian
  • Breton
  • Bulgarian
  • Catalan
  • Chinese (China)
  • Chinese (Hong Kong)
  • Chinese (Simplified)
  • Chinese (Singapore)
  • Chinese (Taiwan)
  • Chinese (Traditional)
  • Corsican
  • Croatian
  • Czech
  • Danish
  • Dutch (Belgium)
  • Dutch (Netherlands)
  • Dzongkha
  • English (Canada)
  • English (Ireland)
  • Esperanto
  • Estonian
  • Faroese
  • Fijian
  • Finnish
  • French (Belgium)
  • French (Canada)
  • French (France)
  • French (Switzerland)
  • Galician
  • Georgian
  • German (Austria)
  • German (Germany)
  • German (Switzerland)
  • Greek
  • Greenlandic (Kalaallisut)
  • Guarani
  • Gujarati
  • Hausa
  • Hebrew
  • Hindi
  • Hindi (Phonetic)
  • Hungarian
  • Icelandic
  • Igbo
  • Indonesian
  • Interlingua
  • Interlingue
  • Inuktitut
  • Inupiaq
  • Irish
  • Javanese
  • Kannada
  • Kashmiri
  • Kazakh
  • Khmer
  • Kinyarwanda
  • Kurdish
  • Kyrgyz
  • Lao
  • Latin
  • Latvian
  • Lingala
  • Lithuanian
  • Luxembourgish
  • Macedonian
  • Malagasy
  • Malay
  • Malayalam
  • Maltese
  • Maori
  • Marathi
  • Moldavian
  • Mongolian
  • Myanmar (Burmese)
  • Nauru
  • Navajo
  • Nepali
  • Norwegian
  • Occitan
  • Odia
  • Oromo
  • Pashto
  • Persian
  • Persian (Afghanistan)
  • Persian (Iran)
  • Polish
  • Portuguese
  • Portuguese (Portugal)
  • Punjabi
  • Quechua
  • Romanian
  • Romansh
  • Rundi
  • Russian
  • Russian (Phonetic)
  • Samoan
  • Sango
  • Sanskrit
  • Scottish Gaelic
  • Serbian
  • Serbian (Cyrillic)
  • Serbian (Latin)
  • Serbo-Croatian
  • Shona
  • Sindhi
  • Sinhala
  • Slovak
  • Slovenian
  • Somali
  • Southern Sotho
  • Spanish (Latin America)
  • Spanish (Mexico)
  • Spanish (Spain)
  • Sundanese
  • Swahili
  • Swati
  • Swedish
  • Tagalog
  • Tajik
  • Tamil
  • Tatar
  • Telugu
  • Thai
  • Tibetan
  • Tigrinya
  • Tongan
  • Tsonga
  • Tswana
  • Turkish
  • Turkmen
  • Twi
  • Ukrainian
  • Urdu
  • Uzbek
  • Vietnamese
  • Volapük
  • Welsh
  • Western Frisian
  • Wolof
  • Xhosa
  • Yiddish
  • Yoruba
  • Zulu


Closed captions play an important role for video content. Not only do they broaden the potential audience for content and answer possible legal considerations, but they also address changing viewing habits, with many viewers muting video content while they watch. WebVTT is a popular method for adding either captions or subtitles to content. IBM Watson Media has also enabled content owners to add WebVTT captions to their VOD (video on-demand) assets.

Want to begin using captions and subtitles on your own VODs?

Get Started with a Free Trial of IBM’s video streaming offering