The goals of the Semantic Video Annotation use case (a.k.a. the multimedia use case) are, first, to derive business-specific requirements from a typical scenario in the Polymedia context. This leads to the second goal: to show how IKS technology can add value to our existing video CMS tool, part of KIT cosmos, by empowering it with semantic capabilities. A third goal is to highlight the technical difficulties affecting the integration of the desired functionality into our commercial, closed-source product.
We started by designing a use case that fits industrial needs, beginning with the editing phase and completing it by exploring ways to exploit the newly introduced semantic capabilities through a player. Below is an overview of the scenario.
A journalist in the editorial office of a website dealing with cinema uses video annotation software to add semantic tags to the various scenes of a movie, for example the location where a scene was shot or the actors on screen (see figure below). Once finished, the edited video, including the annotations, is saved in the internal repository.
The journalist who reviews the annotated movie, or any end user, can then play back the video (using a compatible player), watching the multimedia stream while additional information linked to the semantic metadata is displayed in real time, for example pictures of an actor present in a scene or details about the location where the scene was shot (see figure below).
With reference to the architecture for semantic CMSs provided by IKS, we envisage the need to implement semantic annotation in a representation format suitable for both the editor and the semantic player, integrating a set of services (which we named Video Semantic Annotation Services) that provide storage and retrieval capabilities corresponding to what the reference diagram labels “persistence” and “knowledge repository”:
Video Semantic Annotation metadata and services have been developed as an open-source addition to the Stack. After an evaluation phase considering different metadata formats (Media Fragments, Event Ontology, Ontology for Media Resources), including a proprietary one by Polymedia, we decided to adopt the open-source popcorn.js framework: it closely matches the features we envisaged as requirements, and it is already supported by OSS player components and related plug-ins.
In its XML grammar, the popcorn.js framework manages two categories of data, which reflects our need to handle both static information linked to the entire video (e.g. movie title, director, cast) and dynamic information linked to a specific part of the video (e.g. actors’ names, locations). In detail, the popcorn.js format provides:
The Manifest section, which is optional and can contain the description of:
- People (identifies one specific person involved with the video)
- Places (identifies one specific place involved with the video)
- Attributions (identifies one attribution item, e.g. credits)
- Articles (identifies one article related to the video)
- Transcripts (identifies one transcript item)
The Timelines section, which is mandatory and can contain:
- Subtitles (subtitle to display)
- Footnotes (footnotes to display in a div)
- Captions (popup captions to show on the video or in a target div)
- Lowerthirds (lower third captions to show on the video)
- Locations (Google Maps views of significance to show for the video)
- Resources (contains other plug-ins, e.g. Twitter, Wikipedia, Flickr, etc.)
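To make this structure concrete, a minimal annotation file might look like the sketch below. The element and attribute names are illustrative only, derived from the categories listed above rather than from the exact popcorn.js grammar:

```xml
<popcorn>
  <!-- Static information linked to the entire video -->
  <manifest>
    <people id="actor1" name="John Doe" />
    <places id="loc1" name="Colosseum, Rome" />
  </manifest>
  <!-- Dynamic information linked to specific video segments -->
  <timeline>
    <subtitle in="5" out="10" text="Opening scene" />
    <flickr in="12" out="20" tags="Colosseum" target="flickr-div" />
  </timeline>
</popcorn>
```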
The concept of “Resources” is very important, since it refers to plug-ins running on the player that are triggered automatically when the annotated video is parsed. A plug-in is referenced by name as an XML tag, and all its options follow as XML attributes; the in and out attributes in the XML correspond to the start and end options of the plug-in. The popcorn.js framework provides a set of existing plug-ins that can be extended with custom ones; we took advantage of both.
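The mapping from XML attributes to plug-in options can be sketched as follows. This is an illustration of the idea, not the actual popcorn.js parser code:

```javascript
// Illustrative sketch (not popcorn.js internals): turn an XML
// element's attributes into the options object a plug-in receives,
// renaming the "in"/"out" attributes to "start"/"end".
function attributesToOptions(attrs) {
  const options = {};
  for (const [name, value] of Object.entries(attrs)) {
    if (name === "in") options.start = value;
    else if (name === "out") options.end = value;
    else options[name] = value;
  }
  return options;
}

// A tag like <flickr in="12" out="20" tags="Colosseum" />
// yields { start: "12", end: "20", tags: "Colosseum" }.
const opts = attributesToOptions({ in: "12", out: "20", tags: "Colosseum" });
```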
Architecture and Integration
Given the above considerations, the high-level architecture below describes how the various actors and components interact.
The core components are:
- Video Editor – performs video annotation by adding semantic tags
- Video Semantic Annotation Services – RESTful web service layer providing access and management services for video annotations
- Video Player – plays back annotated content, making use of the semantic information
Both the editor and the player integrate with IKS through the video semantic annotation services.
Once the video is properly annotated, clicking the save button will:
- Store metadata related to the editing phase within the Polymedia CMS internal DB Server.
- Trigger the Video Semantic Annotation services in order to store the annotated video segments as key/value pairs.
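A key/value representation of an annotated segment might look like the sketch below. The key scheme is hypothetical, since the source does not specify the exact format used by the services:

```javascript
// Hypothetical sketch of how an annotated segment could be stored
// as a key/value pair by the Video Semantic Annotation Services
// (the key scheme is illustrative only).
function segmentToKeyValue(videoId, segment) {
  const key =
    videoId + "/" + segment.in + "-" + segment.out + "/" + segment.plugin;
  return { key: key, value: JSON.stringify(segment.options) };
}

const pair = segmentToKeyValue("movie-42", {
  plugin: "flickr",
  in: 12,
  out: 20,
  options: { tags: "Colosseum", target: "flickr-div" }
});
// pair.key is "movie-42/12-20/flickr"
```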
On the other side, the semantic player’s typical workflow involves two steps:
In the startup phase, the player retrieves the video metadata by invoking the Video Semantic Annotation services and then opens a connection to the Polymedia Streaming Server for the video content referenced by the metadata file.
In the playback phase, the player plays the streamed video content, parses the metadata file in real time, and triggers the configured plug-ins to produce the enriched output.
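The triggering logic of the playback phase can be sketched in plain JavaScript. This is a simplified simulation of what a player does on each time update, not the actual popcorn.js code:

```javascript
// Simplified sketch of plug-in cue dispatching: each cue carries a
// plug-in name plus "in"/"out" times (seconds); start/end events
// fire as playback crosses those boundaries.
function makeCueDispatcher(cues) {
  const active = new Set();
  return function onTimeUpdate(currentTime) {
    const events = [];
    for (const cue of cues) {
      const inside = currentTime >= cue.in && currentTime < cue.out;
      if (inside && !active.has(cue)) {
        active.add(cue);
        events.push({ type: "start", plugin: cue.plugin });
      } else if (!inside && active.has(cue)) {
        active.delete(cue);
        events.push({ type: "end", plugin: cue.plugin });
      }
    }
    return events;
  };
}

// Example: a Flickr cue active from second 12 to 20.
const dispatch = makeCueDispatcher([{ plugin: "flickr", in: 12, out: 20 }]);
dispatch(5);  // no events – before the segment
dispatch(13); // one "start" event – entered the segment
dispatch(21); // one "end" event – left the segment
```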
Live Demo and Webcast
The video below provides an overview of the Video Editor implementing the following features:
- Video Segment Manual Annotation
- Annotation forms customised for plug-ins (BoxIt, Flickr, VIE plug-in)
- Integration with Video Semantic Annotation Services
The demo is also available online. Since it is based on a production environment, access is limited: if you’re interested, please contact me.
The video below provides an overview of the Semantic Player implementing the following features:
- Video playback, including instant timeline browsing
- Integrated one existing plug-in from the popcorn.js framework (Flickr)
- Developed and integrated two custom plug-ins for demo purposes (BoxIt, VIE plug-in)
- Integration with Video Semantic Annotation Services and Polymedia Video Streaming Server
The current prototype provides manual annotation on the editor side. Polymedia plans to evaluate the possibility of adding semi-automatic annotation through the Stanbol services provided by IKS. Depending on the effort required, the desired additional features are:
- Integrate speech-to-text functionality – the goal is to produce a textual alternative to the available multimedia content that could be semantically processed by Stanbol
- Integrate Stanbol services into the video-import workflow – to extract the main concepts from the speech-to-text output
- Enable semi-automatic annotation – this could be done by injecting the concepts extracted by Stanbol as suggested tags within the Polymedia CMS video editor
Improving the player UI is another step we would like to take. At the current stage the front end is minimal, and we would like something nicer to present during roadshow activities in 2012.