Ogg Theora Cook Book

Web Video Accessibility

When we talk about Web video here, we explicitly refer to video published in Ogg Theora/Vorbis format inside a Web browser that supports the HTML5 video element.

Accessibility of video refers to several different aspects of usability of video, depending on what user group we are looking at. So, before diving into the different aspects and how they can be supported, we list the user groups and their specific requirements.

Accessibility user groups

  1. Non-native speakers: when watching a video in a foreign language, it can be difficult or impossible to follow the spoken content. Subtitles were invented for this purpose. Subtitles are time-aligned transcriptions of the spoken words in a video, translated into languages other than the original. Alternatively, an audio track in the viewer's native language can be created to replace the original audio track. This is called dubbing.
  2. Deaf or hard-of-hearing (HoH): when a HoH person is watching a video, it is difficult or impossible to follow because the sounds and spoken words are perceived poorly or not at all. Captions address this need: they are time-aligned transcriptions of the spoken words and of the noises, sound effects, music and other sounds in a video.
  3. Blind or vision-impaired (VI): when a VI person is trying to "watch" a video, two problems arise. First, without special help it is not possible to interact with the video player controls. Second, it is impossible to follow the video because none of the visual content is translated into signals that a VI person can perceive. It is therefore important, first, to make the video controls accessible and, second, to provide a time-aligned description of the visual channel. Two senses can be used to replace the visual channel: hearing and touch. To provide an aural representation of the visual content, we can either create a spoken audio description (AD) as an additional audio track for the video, or create a textual audio description (TAD) that a screen reader will read out in a time-aligned manner. The TAD can also be output to a braille device, so that a VI person can perceive the visual channel through touch.

Accessible HTML5 video controls

A key accessibility challenge for browser vendors with the HTML5 video element is to make the default controls accessible through the keyboard. The HTML5 video element provides an attribute called controls which requests the browser to create default controls on top of the video.

Here is what the current specification says:

“This user interface should include features to begin playback, pause playback, seek to an arbitrary position in the content (if the content supports arbitrary seeking), change the volume, and show the media content in manners more suitable to the user (e.g. full-screen video or in an independent resizable window).”

In Firefox 3.5, the controls attribute currently creates the following controls:

  • play/pause button (toggles between the two)
  • slider for current playback position and seeking (also displays how much of the video has currently been downloaded)
  • duration display
  • roll-over button for volume on/off and to display slider for volume
  • AFAIK, fullscreen is not currently implemented

Further, the HTML5 specification prescribes that if the controls attribute is not present, “user agents may provide controls to affect playback of the media resource (e.g. play, pause, seeking, and volume controls), but such features should not interfere with the page’s normal rendering. For example, such features could be exposed in the media element’s context menu.”

In Firefox 3.5, this has been implemented with a right-click context menu, which contains:

  • play/pause toggle
  • mute/unmute toggle
  • show/hide controls toggle

When the controls are being displayed, there are keyboard shortcuts to control them:

  • space bar toggles between play and pause
  • left/right arrow winds video back/forward by 5 sec
  • CTRL+left/right arrow winds video back/forward by 60 sec
  • HOME+left/right jumps to beginning/end of video
  • when focused on the volume button, up/down arrow increases/decreases volume.
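A page that draws its own controls could offer similar shortcuts from script. The following is a minimal sketch, not Firefox's implementation: the function names seekTarget and handleKey are hypothetical, the key names are the modern KeyboardEvent.key values, and the 5 and 60 second steps are taken from the list above.

```javascript
// Hypothetical helper: computes the playback position after a key press.
// Left arrow seeks back, right arrow seeks forward; CTRL enlarges the step.
function seekTarget(currentTime, duration, key, ctrlKey) {
  const step = ctrlKey ? 60 : 5; // seconds, as in the shortcut list
  let target;
  if (key === "ArrowLeft") target = currentTime - step;
  else if (key === "ArrowRight") target = currentTime + step;
  else return currentTime; // other keys do not seek
  // Clamp so we never seek before the start or past the end.
  return Math.min(Math.max(target, 0), duration);
}

// Applies a key press to any object that behaves like an HTML5 media
// element (paused, currentTime, duration, play(), pause()).
function handleKey(media, key, ctrlKey) {
  if (key === " ") {
    // Space bar toggles between play and pause.
    if (media.paused) media.play();
    else media.pause();
    return;
  }
  media.currentTime = seekTarget(media.currentTime, media.duration, key, ctrlKey);
}
```

In a browser, this could be wired up with something like video.addEventListener("keydown", function (e) { handleKey(video, e.key, e.ctrlKey); }), assuming the video element is able to receive keyboard focus.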

To make these controls accessible to VI users, Firefox exposes them to screen readers using MSAA or AT-SPI. For now, this requires the screen reader to be in focus mode. Exposure through iSimpleDOM interfaces on Windows (http://www.marcozehe.de/2009/06/11/exposure-of-audio-and-video-elements-to-assistive-technologies/) is still in development. Once in focus mode, the keyboard shortcuts listed above make the video controls accessible.

Providing video accessibility data

As described above, accessibility for a particular video is provided through creating additional data that accompanies the original video. A fully accessible video may consist of all of the following:

  • original video track
  • original audio track
  • audio tracks that contain dubs in foreign languages
  • captions in all languages (which also covers the need for subtitles)
  • audio tracks that contain spoken audio descriptions in all languages
  • textual audio descriptions in all languages

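All of the mentioned data that provides accessibility to video is time-aligned with the original video. It can be provided in two different ways: in-line, as part of the video resource itself, or out-of-band, as separate resources that accompany the video.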

Publishing accessible video on the Web

The current HTML5 specification does not contain explicit means to publish, style, and position accessibility data for audio and video. The suggestion is to use in-line accessibility data and have the video decoder deal with it, and to use JavaScript where out-of-band accessibility data is necessary. Work is in progress to improve this situation. The idea is to expose accessibility data to the Web browser in the same manner, independent of whether the data originates in-line or out-of-band.

Several demos have been made with out-of-band subtitles, captions, and audio descriptions and the HTML5 video tag.

These are all implemented using JavaScript, so you can learn from them. This Cook Book also contains a more detailed introduction to Jan Gerber's JavaScript library for subtitle support (see the section on Publishing).
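To give an idea of what such a script has to do, here is a minimal sketch of the two core steps: parsing an out-of-band SRT file into cues and finding the cue that applies at a given playback time. It is not taken from any of the demos or from Jan Gerber's library; the function names are hypothetical and the input is assumed to be well-formed SRT.

```javascript
// Converts an SRT timestamp such as "00:01:02,500" to seconds.
function srtTimeToSeconds(stamp) {
  const match = /^(\d+):(\d{2}):(\d{2})[,.](\d{3})$/.exec(stamp.trim());
  if (!match) throw new Error("Bad SRT timestamp: " + stamp);
  const [, h, m, s, ms] = match;
  return Number(h) * 3600 + Number(m) * 60 + Number(s) + Number(ms) / 1000;
}

// Parses SRT text into an array of { start, end, text } cues.
// Assumption: each cue is a counter line, a time range line, then text,
// and cues are separated by one or more blank lines.
function parseSrt(srt) {
  return srt
    .replace(/\r/g, "")
    .trim()
    .split(/\n{2,}/)
    .filter((block) => block.includes("-->")) // skip blocks without a time range
    .map((block) => {
      const lines = block.split("\n");
      // lines[0] is the cue counter, lines[1] the time range.
      const [start, end] = lines[1].split("-->").map((t) => srtTimeToSeconds(t));
      return { start, end, text: lines.slice(2).join("\n") };
    });
}

// Returns the text of the cue active at `time` (in seconds), or "" if none.
function activeCueText(cues, time) {
  const cue = cues.find((c) => time >= c.start && time < c.end);
  return cue ? cue.text : "";
}
```

In a browser, a script could call activeCueText(cues, video.currentTime) from a listener on the video element's timeupdate event and write the result into an element positioned over the video.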

Silvia Pfeiffer's demo includes a proposal for how to associate out-of-band accessibility data with videos through a new HTML5 tag. This specification is continuing to evolve and is expected to eventually lead to native browser support of time-aligned accessibility data. A similar proposal has been made by Greg Millam from Google (http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2009-February/018600.html). Such a specification may look as follows:

<video src="../elephant.ogv" poster="elephant.png" controls>
  <itext lang="fr" type="text/srt" src="../elephant.fr.srt" category="SUB"></itext>
  <itext lang="en" type="text/srt" src="../elephant.en.srt" category="CC"></itext>
  <itext lang="en" type="text/srt" src="../audiodesc.srt" category="TAD"></itext>
</video>

Note: this is an example proposal only, which is not currently supported natively by any browser.
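Until browsers support such markup natively, a script has to decide which of the listed resources to load. The following sketch shows one possible selection rule, assuming the attributes of each itext element have already been copied into plain objects. The function name pickTrack and the idea of a preferred-language list are assumptions made for illustration and are not part of the proposal.

```javascript
// Example data mirroring the attributes of the itext elements above.
const exampleTracks = [
  { lang: "fr", type: "text/srt", src: "../elephant.fr.srt", category: "SUB" },
  { lang: "en", type: "text/srt", src: "../elephant.en.srt", category: "CC" },
  { lang: "en", type: "text/srt", src: "../audiodesc.srt", category: "TAD" }
];

// Hypothetical helper: returns the first track of the wanted category whose
// language appears in preferredLangs (checked in order of preference),
// or null if no track matches.
function pickTrack(tracks, category, preferredLangs) {
  for (const lang of preferredLangs) {
    const hit = tracks.find((t) => t.category === category && t.lang === lang);
    if (hit) return hit;
  }
  return null;
}
```

For example, pickTrack(exampleTracks, "CC", ["de", "en"]) returns the English caption entry, because no German captions are listed. In a browser, the track objects could be built by reading the attributes of the itext children of the video element with getAttribute.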