The Future of Closed Captioning with AI

For all the great strides that live streaming video has made in the 21st century, the captioning process has remained largely stuck in the past. Humans still do the heavy lifting, manually typing captions word by word. Captioning pre-recorded video can take up to 10 times longer than the video itself, and the challenge is even greater with live video, which offers no time for review.

It’s not only clunky and labor-intensive; it can also be costly. In fact, many companies cite budget constraints as one of the top barriers to captioning.

But for full-service video production companies like Suite Spot, manual captioning, arduous as the process may be, remains the quickest and most accurate way to meet clients’ captioning needs.

That may soon change, though, according to Suite Spot Co-Founder Adam Drescher. Automated captioning technology is maturing fast, he notes, and may even be poised to disrupt the entire video industry in the near future. Case in point: IBM’s video streaming and enterprise video streaming offerings recently introduced the ability to convert speech in video to text through IBM Watson.

In captioning, quality is priority one

One reason that agencies like Suite Spot still rely primarily on human transcriptionists instead of automated solutions is quality. Inefficient and costly as the manual process may be, it ensures a high degree of accuracy that AI can’t yet provide.

But it can be a challenge to find reliable, highly skilled transcriptionists available for hire exactly when they’re needed.

“There’s often a supply-and-demand problem with transcriptionists, especially in a busy broadcast season like March Madness for NCAA basketball,” says Drescher. “The best resources are usually booked up well in advance to handle national broadcast projects.”

Hiring talent to manage captioning work in-house might seem like a logical solution, but it doesn’t always make good business sense.

“It’s not core to what we do at our company,” Drescher says. “We’d need to have a certain volume of captioning work coming in every day to justify hiring full-time resources with this highly specialized skill set.”

Regulations also add to the pressure to deliver top-quality captions. Producers of live, near-live and pre-recorded broadcast video content that is posted online, including video clips, must meet Federal Communications Commission (FCC) requirements for accuracy, timing, completeness and placement of closed captioning. (Consumer-generated media is excluded from FCC regulations.)

How AI could be a game changer

Once machine-intelligence technology can provide high-quality, highly accurate automated transcriptions, Drescher believes it could shake up the captioning space in several key areas:

  • Support multiple languages.
    The manual process for captioning videos in multiple languages can be a logistical nightmare, Drescher says. “Having the option to provide a live, translated closed captioning feed would be amazing,” he adds. “It would allow broadcasters to deliver live and syndicated content to audiences around the world, with captions in viewers’ native languages that read like natural conversation.”
  • Deliver on-demand capabilities.
    “This is where automated captioning needs to grow to really disrupt the current market and expand captioning out to multiple channels,” Drescher says. “Closed captioning on demand must be seamless in terms of functionality for users on the broadcast side as well as for viewers.”
  • Make video more searchable online.
    Content discoverability is a significant and growing hurdle as companies produce more streaming video content. A recent report found that 79% of executives surveyed are frustrated by the inability to find relevant information within video archives. “If closed captioning text was part of a video’s data, and searchable, it would allow people to find the specific moment or content they want without having to watch the whole video,” Drescher says. “It would give more relevance to video content, and help increase its shelf life.”
Whether captions are created automatically with emerging AI tools like Watson or the old-fashioned way, Drescher stresses that providing a high-quality, seamless viewer experience is key for any organization that produces streaming video content, live or otherwise.

“Closed captioning is an important tool for broadcast, especially because today’s viewers have so much content and so many screens constantly vying for their attention,” he says. Facebook alone serves more than 8 billion video views per day, yet viewers watch nearly 85% of those videos without sound. “The more information that you can provide to people about why your video content is relevant to them, the more you can increase your viewership.”

Note: the image for this blog post comes from the Watson+TED site, which allows users to ask a question and be served relevant video content from TED Talks.