API: Training AI for Video Analysis & Caption Automation

Curious about the artificial intelligence capabilities of the IBM Watson Media solutions for managing video content, but looking for a way to integrate them into existing workflows?

APIs are available to integrate IBM Watson Video Enrichment and IBM Watson Captioning into other applications, such as existing dashboards and interfaces. This includes both generating metadata with the artificial intelligence and managing the training of the AI so it is better attuned to a use case. In addition, the APIs are launching with new features, some of them currently unavailable elsewhere.

If you are interested in putting these APIs to use, contact us to learn more.

APIs for Watson Video Enrichment and Watson Captioning

APIs for programmatic access have always been a key feature of the IBM Watson Media solutions. With them, organizations and content owners can take the robust, AI-driven features of the solutions and bring them into other applications. Use cases range from managing captions generated through speech-to-text to video analysis. Examples of the latter include object detection, transcription, and keyword and entity extraction, among many others.

With the use of the APIs, these capabilities can be replicated in a custom dashboard or other applications. As a result, users can manage assets and generated metadata inside existing workflows without having to sign into the IBM Watson Video Enrichment or IBM Watson Captioning dashboards.


Training AI: language model

The APIs for IBM Watson Captioning and IBM Watson Video Enrichment also cover language model training. This encompasses powerful capabilities for training the artificial intelligence to improve accuracy, such as AI vocabulary training via a REST API.

They also cover the automated learning process, which is applied to generated speech-to-text content that is manually edited in IBM Watson Captioning. In this process, Watson automatically changes the way it captions future content based on those edits, and users can approve or reject the learned concepts, such as new words Watson might have picked up.


No stored metadata or video content with APIs

For some organizations, a requirement might be to ensure that video assets, like movies, are stored in as few places as possible. In particular, this can be a consideration for controlling leaks or minimizing the risk that content could fall into unintended hands. As a result, a positive aspect of using the APIs for IBM Watson Video Enrichment is that a customer’s metadata and video content are not stored in the IBM solution.

This is one of several benefits of using the API rather than the dashboard to manage IBM Watson Media solutions. Below are a few others, some of them new features that are being introduced through the API first.


Automatic punctuation

Previously, IBM Watson Captioning and IBM Watson Video Enrichment did not punctuate content automatically, but were set up to manage when something should be capitalized. This included starting every new caption file with a capitalized word, capitalizing nouns and adding capitalization when the user edited the text to introduce a period. Doing more than that, such as adding an exclamation mark, required the user to manually edit the generated text.
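
To make the capitalization rules concrete, here is a minimal Python sketch of two of them: capitalizing the first word of a caption file and capitalizing the word that follows sentence-ending punctuation. It is only an illustration and not how Watson implements the behavior. The function name capitalize_sentences is hypothetical, and noun capitalization is left out because it requires language analysis.

```python
import re

def capitalize_sentences(text):
    """Capitalize the first letter of the text and of any word after '.', '?' or '!'.

    Illustrative only: a hypothetical helper, not part of any IBM API.
    """
    return re.sub(
        r"(^|[.?!]\s+)([a-z])",
        lambda m: m.group(1) + m.group(2).upper(),
        text,
    )

# A caption where an editor has just introduced a period after "back".
edited = capitalize_sentences("welcome back. today we review the results")
print(edited)
```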

Doing more was always on the roadmap, though, and today content owners can take advantage of automatic punctuation. This allows Watson to add periods, question marks, exclamation marks and commas as appropriate in the content. With punctuation added, the captions and transcripts are more readable and natural-looking.


Profanity detection

To help determine how mature content might be, a new category for offensive language has been created for analyzed content. It can grade how offensive the language is, and results can be filtered according to individual preferences. For example, words can be separated into obscenities, indecencies and profanities.

To elaborate, an example of an obscenity could be a character saying “ass”, “butt” or “barenaked”. An indecency could be an implied violent act, like a character saying “kill you”. Finally, a profanity is very strong, offensive language such as swear words. In addition to listing examples, each term is reported with how many times it came up. For example, “butt” might have come up six times in the content.

The ultimate goal with this feature is to assist in designating a rating for the content. For example, if someone came across a piece of content that mentioned “butt” numerous times but nothing else offensive, it might be designated as immature content but still accessible to children. In contrast, something with “butt” mentioned multiple times in combination with strong sexual language could quickly be flagged as inappropriate for children. Each instance is also given a timestamp. As a result, when visual context is needed to verify intent or tone, the content owner can watch that specific segment to validate it.
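
As a rough illustration of how this output could be used in a custom dashboard, the Python sketch below groups detections by category and term, then tallies counts and timestamps. The record shape (term, category, time_sec) and the function name summarize_detections are assumptions made for this example. They are not the documented response format of the IBM Watson Video Enrichment API.

```python
from collections import defaultdict

def summarize_detections(detections):
    """Group detections by category and term, tallying counts and timestamps.

    The input shape is hypothetical: a list of dicts with the keys
    "term", "category" and "time_sec".
    """
    summary = defaultdict(dict)
    for item in detections:
        entry = summary[item["category"]].setdefault(
            item["term"], {"count": 0, "timestamps": []}
        )
        entry["count"] += 1
        entry["timestamps"].append(item["time_sec"])
    return dict(summary)

# Made-up sample data for illustration.
sample = [
    {"term": "butt", "category": "obscenity", "time_sec": 12.5},
    {"term": "butt", "category": "obscenity", "time_sec": 48.0},
    {"term": "kill you", "category": "indecency", "time_sec": 95.2},
]
report = summarize_detections(sample)
print(report["obscenity"]["butt"])
```

A reviewer could then jump to each timestamp in the list to check the visual context before settling on a rating.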

This feature is currently only available through the API, but is coming to the dashboard version of IBM Watson Video Enrichment soon.


Epilepsy trigger monitoring

Epilepsy is the fourth most common neurological disorder in the United States. It can affect people of all ages and is characterized by unpredictable seizures, although it can cause other health problems as well. One potential trigger for an epileptic seizure is visual in nature and can be caused by exposure to:

  • High intensity flashing lights
  • Strobe lights
  • Quick saturation changes

These conditions can induce photosensitive epileptic seizures or other health issues for viewers, such as migraines. Consequently, it is good practice to warn viewers before they start watching content that could trigger photosensitive conditions. This can include a warning message before the content starts, or a note in the metadata that is accessible to end viewers, so that they can avoid such content in search results or see the warning in descriptions.

As a result, IBM Watson Video Enrichment can now scan content to detect these potential health hazards. It can then flag movies and videos that meet these criteria so the content owner can decide on an appropriate next step.
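
To give a feel for what detecting flashing content can involve, here is a deliberately simplified Python sketch. It is not Watson's detection method. It assumes a list of per-frame average brightness values between 0 and 1 has already been extracted from the video, and it flags any one-second window with more than a chosen number of sharp brightness changes. The function name find_flash_windows and the thresholds min_delta and max_changes are illustrative assumptions.

```python
def find_flash_windows(luminance, fps, min_delta=0.2, max_changes=3):
    """Return start times (in seconds) of one-second windows with too many
    sharp frame-to-frame brightness changes.

    Hypothetical helper; thresholds are illustrative, not clinical guidance.
    """
    # Frames whose brightness differs sharply from the previous frame.
    sharp = [
        i for i in range(1, len(luminance))
        if abs(luminance[i] - luminance[i - 1]) >= min_delta
    ]
    flagged = []
    # Check consecutive, non-overlapping one-second windows.
    for start in range(0, len(luminance), fps):
        changes = sum(1 for i in sharp if start <= i < start + fps)
        if changes > max_changes:
            flagged.append(start / fps)
    return flagged

# Made-up clip at 10 frames per second: one strobing second, one calm second.
strobe = [0.1, 0.9] * 5
calm = [0.5] * 10
flagged = find_flash_windows(strobe + calm, fps=10)
print(flagged)
```

The flagged start times could feed a warning message or a metadata note for the affected video.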

This feature is currently only available through the API. However, it’s also coming to the dashboard version of IBM Watson Video Enrichment in the future.



The introduction of APIs to manage IBM Watson Media content potentially simplifies the creation of custom models, and the use of existing ones, to manage assets on a per-video basis. This ranges from complex visual use cases, such as video analysis and epilepsy trigger detection, to the configuration of speech-to-text language models via a REST API.

If you want to start using these APIs to manage your own content in unique ways, contact us to learn more. If you are looking for more information on how this technology is shaping and improving the management of video content, download the white paper Captioning Goes Cognitive: A New Approach to an Old Challenge to see how it’s improving captioning.