Adopting Linked Media principles for Stanbol Entityhub

The Web is full of Entities (and of sites providing them). But what the hell are Entities? Glad you asked. When we say Entities, we mean data structures that refer to “things” in the real world, such as people. Here are some publicly available Entity datasets, just to dazzle you: 3 million+ entries from wikipedia.org; 800,000 albums and 500,000 artists maintained by musicbrainz.org; 1.5 million titles and nearly 1 million actors available as Entities via imdb.com; 7 million GIS entries via geonames.org; and 30 million defined paths with over 350 million nodes available from linkedGeoData.org and openstreetmap.org. If you want more numbers to impress your friends with, check out the Linked Data Cloud site. The most common Entities are concepts referring to Persons, Organizations, Places, Events, and Artifacts.

What to do with all this Linked Data?

The Apache Stanbol component Entityhub provides an infrastructure to manage such Entities from local and referenced Sites. In turn, it gives a CMS the infrastructure to build new applications that combine data from different sources, improve search and discovery, or support Entity tagging, to name just a few possibilities.

However, Stanbol is concerned not only with linking data but also with interlinking the web of documents and the web of data. Enter the Salzburg Research proposal to extend the Linked Data principles to support content as well. The first implementation of the Linked Media proposal is based on the Kiwi2/Linked Media Framework.

A proposal to improve the RESTful API of the Entityhub according to Linked Media principles is now available. The first results will be demonstrated at the IKS Paris Community Workshop, so book your seat NOW!

More details about the Entityhub:

The Stanbol Entityhub provides out-of-the-box support for commonly used protocols such as Linked Data, and it also allows extensions to work with sites that do not support these standards. It also supports local caches. Such a cache can store some or all of the information of a referenced site and is typically used when one needs to work offline, to increase query performance, or to support queries that would not be possible or feasible using the services provided by the referenced site directly.
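To make the caching idea concrete, here is a minimal sketch in Python of a read-through cache placed in front of a referenced site. It is not the Stanbol implementation or API: `Entity`, `CachedSite`, `fetch_remote` and `find_cached` are hypothetical names, and the remote site is modeled as a plain callable. Once an entity has been fetched it is served locally, so it stays available offline, and the local copy can answer queries (here, a simple property match) that the remote service might not offer.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Entity:
    uri: str
    properties: Dict[str, str]


class CachedSite:
    """Hypothetical read-through cache in front of a referenced site (not the Stanbol API)."""

    def __init__(self, fetch_remote: Callable[[str], Optional[Entity]], offline: bool = False) -> None:
        self._fetch_remote = fetch_remote  # stands in for the remote site's lookup service
        self._cache: Dict[str, Entity] = {}
        self.offline = offline

    def get(self, uri: str) -> Optional[Entity]:
        # Serve from the local cache first: faster, and it works without a connection.
        if uri in self._cache:
            return self._cache[uri]
        if self.offline:
            return None  # never seen before and the remote site is unreachable
        entity = self._fetch_remote(uri)
        if entity is not None:
            self._cache[uri] = entity  # keep a local copy for later (offline) use
        return entity

    def find_cached(self, prop: str, value: str) -> List[Entity]:
        """Query the local copy, e.g. when the remote site offers no such query."""
        return [e for e in self._cache.values() if e.properties.get(prop) == value]
```

Because the remote site is only a callable here, the same class could wrap a Linked Data dereferencer or a custom adapter for a non-standard site; that choice is left open in this sketch.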

In short, the Entityhub provides RESTful services to work with the Entities used by a CMS. This includes services to:

  1. import entities: Import Entities from referenced Sites and configure which properties should be imported. If necessary, define several mappings to merge information originating from different referenced Sites.
  2. manage entities: The Entityhub defines states for both imported entities and entity mappings. The proposed state allows adding entities to the Entityhub even if they need some kind of approval before they can be used in daily operations. The active state is used for entities that have been approved by the organization. In addition, entities can be marked as deprecated or removed. Default states are configurable per referenced site (e.g. entities imported from geonames.org are automatically approved, i.e. state: active).
  3. work with entities: Search for Entities based on name and language, type, value ranges … This makes it possible to suggest entities while the user is typing, to look up Entities for tags and/or URLs, and to retrieve information about Entities mentioned in content managed by the CMS. It also provides the knowledge needed to enable Dynamic Semantic Publishing.
  4. single access point: The Entityhub allows querying for and retrieving entities defined by any of the referenced sites. Developers do not need to understand the different services, query languages and data formats; the Entityhub provides a single access point and query language for all of them. This functionality is typically important for the semantic lifting of content, but it may also be useful for manual tagging of content if users are allowed to suggest or import new Entities from any of the referenced sites.
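The four services listed above can be illustrated with a small in-memory model. This is a hypothetical sketch, not the Entityhub's RESTful API: the class and method names, the `SiteConfig` fields and the `urn:example:` identifiers are all invented for illustration, and entity mappings between sites are left out. It shows per-site import filters, per-site default states (so one site's imports can start out active while another's start out proposed), and a single `find` call that searches the labels of every referenced site.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional, Set, Tuple


class State(Enum):
    """The entity states described above."""
    PROPOSED = "proposed"
    ACTIVE = "active"
    DEPRECATED = "deprecated"
    REMOVED = "removed"


@dataclass
class Entity:
    uri: str
    site: str
    labels: Dict[str, str]      # language code -> label
    properties: Dict[str, str]  # only the properties configured for import
    state: State


@dataclass
class SiteConfig:
    records: Dict[str, dict]            # uri -> {"labels": {...}, "properties": {...}}
    default_state: State = State.PROPOSED
    fields: Optional[Set[str]] = None   # None means "import every property"


class EntityHub:
    """Toy, in-memory model of the Entityhub services; not the Stanbol API."""

    def __init__(self) -> None:
        self.sites: Dict[str, SiteConfig] = {}
        self.entities: Dict[str, Entity] = {}

    def add_site(self, name: str, config: SiteConfig) -> None:
        self.sites[name] = config

    def find(self, prefix: str, lang: str) -> List[Tuple[str, str]]:
        """Single access point: case-insensitive label-prefix search over every site."""
        prefix = prefix.lower()
        hits = []
        for site_name, cfg in self.sites.items():
            for uri, rec in cfg.records.items():
                label = rec["labels"].get(lang, "")
                if label.lower().startswith(prefix):
                    hits.append((site_name, uri))
        return sorted(hits)

    def import_entity(self, site_name: str, uri: str) -> Entity:
        """Copy an entity locally, keeping only the configured properties."""
        cfg = self.sites[site_name]
        rec = cfg.records[uri]
        props = {k: v for k, v in rec["properties"].items()
                 if cfg.fields is None or k in cfg.fields}
        entity = Entity(uri, site_name, dict(rec["labels"]), props, cfg.default_state)
        self.entities[uri] = entity
        return entity

    def set_state(self, uri: str, state: State) -> None:
        """Approve, deprecate or remove a locally managed entity."""
        self.entities[uri].state = state

    def active(self) -> List[str]:
        return sorted(u for u, e in self.entities.items() if e.state is State.ACTIVE)
```

With one site configured with `default_state=State.ACTIVE` (as in the geonames.org example above), an imported entity is immediately active, while an entity imported from a site that keeps the default `PROPOSED` state only shows up in `active()` after `set_state` approves it.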

 
