Adding knowledge to JCR/CMIS content repositories

This article describes how to update JCR or CMIS content repositories with external RDF datasets. The goal is to integrate existing knowledge structures (e.g. hierarchies) into the native repository of a content management system in a simple way.

In today’s digital age, more and more organizations are publishing and sharing their data as “linked data” [1]. This linked data is published as RDF datasets and shared via the Linked Open Data cloud. According to the latest statistics [2], the Linked Open Data cloud includes 30 billion RDF triples. This open data is a valuable information source for content repositories. It contains many hierarchies that can be used to classify the documents in a content repository, as well as entities that represent actual content.

We decided to implement a new feature in the CMS Adapter component of Apache Stanbol to exploit these large sets of RDF data on the web. This new feature allows content management system users and developers to define bidirectional mappings between the content repository and external RDF data. The main purpose of the RDF bridging feature of the CMS Adapter is to create a hierarchical structure in the content repository or to update the existing one. Currently, this feature supports only content repositories that implement the JCR or CMIS specification.

For this blog post, I use a sample dataset [3] derived from one of the DBpedia datasets. DBpedia is a community effort that makes an RDFized version of Wikipedia available on the Web. More specifically, the example dataset has the http://dbpedia.org/resource/Category:Vertebrates entity as its root, and the other members of the dataset are descendants of the root down to a depth of 4 levels. Larger and smaller example datasets are available at [4]. As the root entity implies, the data mainly covers vertebrate species, and this category hierarchy will be reflected in the repository. Please note that the example dataset contains additional http://www.w3.org/2004/02/skos/core#narrower properties on top of the original data obtained from DBpedia, because the default implementation of the RDF bridge mechanism processes child relations in the provided RDF data. Here is a sample from the dataset:

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:j.0="http://www.w3.org/2004/02/skos/core#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" >
  ...
  <rdf:Description rdf:about="http://dbpedia.org/resource/Category:Electric_fish">
    <rdf:type rdf:resource="http://www.w3.org/2004/02/skos/core#Concept"/>
    <j.0:broader rdf:resource="http://dbpedia.org/resource/Category:Electricity"/>
    <j.0:broader rdf:resource="http://dbpedia.org/resource/Category:Fish_sorted_by_adaptation"/>
    <rdfs:label xml:lang="en">Electric fish</rdfs:label>
    <j.0:narrower rdf:resource="http://dbpedia.org/resource/Category:Strongly_electric_fish"/>
    <j.0:narrower rdf:resource="http://dbpedia.org/resource/Category:Weakly_electric_fish"/>
  </rdf:Description>
  ...
  <rdf:Description rdf:about="http://dbpedia.org/resource/Category:Weakly_electric_fish">
    <j.0:broader rdf:resource="http://dbpedia.org/resource/Category:Electric_fish"/>
    <rdfs:label xml:lang="en">Weakly electric fish</rdfs:label>
    <rdf:type rdf:resource="http://www.w3.org/2004/02/skos/core#Concept"/>
  </rdf:Description>
  ...
</rdf:RDF>
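
To make the role of the narrower properties more concrete, here is a minimal Python sketch of how such child relations could be walked to derive a folder hierarchy. It is not the RDF bridge implementation of the CMS Adapter. The function names read_concepts and folder_paths are hypothetical, and using rdfs:label as the folder name (falling back to the last part of the URI when an entity has no description) is an assumption made for illustration only.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Two entities taken from the sample dataset shown above.
SAMPLE = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:j.0="http://www.w3.org/2004/02/skos/core#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdf:Description rdf:about="http://dbpedia.org/resource/Category:Electric_fish">
    <rdfs:label xml:lang="en">Electric fish</rdfs:label>
    <j.0:narrower rdf:resource="http://dbpedia.org/resource/Category:Strongly_electric_fish"/>
    <j.0:narrower rdf:resource="http://dbpedia.org/resource/Category:Weakly_electric_fish"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://dbpedia.org/resource/Category:Weakly_electric_fish">
    <rdfs:label xml:lang="en">Weakly electric fish</rdfs:label>
  </rdf:Description>
</rdf:RDF>"""


def read_concepts(rdf_xml):
    """Map each described URI to its label and its skos:narrower children."""
    root = ET.fromstring(rdf_xml)
    concepts = {}
    for desc in root.findall(f"{{{RDF}}}Description"):
        uri = desc.get(f"{{{RDF}}}about")
        children = [
            n.get(f"{{{RDF}}}resource")
            for n in desc.findall(f"{{{SKOS}}}narrower")
        ]
        concepts[uri] = {
            "label": desc.findtext(f"{{{RDFS}}}label"),
            "children": children,
        }
    return concepts


def folder_paths(concepts, uri, parent="", seen=None):
    """Follow narrower links depth-first and return one path per entity."""
    seen = set() if seen is None else seen
    if uri in seen:  # guard against cycles in the category graph
        return []
    seen.add(uri)
    node = concepts.get(uri, {"label": None, "children": []})
    # Assumption: fall back to the URI fragment when no label is available.
    name = node["label"] or uri.rsplit(":", 1)[-1].replace("_", " ")
    path = f"{parent}/{name}"
    paths = [path]
    for child in node["children"]:
        paths.extend(folder_paths(concepts, child, path, seen))
    return paths


if __name__ == "__main__":
    concepts = read_concepts(SAMPLE)
    root_uri = "http://dbpedia.org/resource/Category:Electric_fish"
    for p in folder_paths(concepts, root_uri):
        print(p)
```

Only the narrower direction is followed in this sketch, which is why the example dataset needs those properties in addition to the broader links that come from DBpedia.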

To test this new feature, the first step is to build and run the full launcher of Apache Stanbol as described at [5]. After that, the Web interface of the CMS Adapter, which exposes the RDF mapping functionality, is available at http://localhost:8080/cmsadapter/map with the default run configuration of Apache Stanbol. Furthermore, the configuration of the RDFBridge implementation can be adjusted through the Apache Stanbol CMS Adapter Default RDF Bridge Configurations entry in the Configuration panel of the Apache Felix Web Console, which is deployed under http://localhost:8080/system/console/configMgr.

The following image shows the RDF bridge configuration properties with their default values, which are the ones applied in the example presented in this blog post. These configurations select the resources to process from the RDF data and specify both the name of the object to be created in the repository and the children of the selected resources. They also apply when generating RDF from the content repository. Detailed explanations of the configurations can be seen in the image below.

The following figures show how the Web interface of the CMS Adapter is used to update a content repository based on RDF data. The first image shows the configuration needed to connect to a content repository that supports either JCR or CMIS.

The next screenshot shows the different ways in which RDF data can be submitted.


The last image shows the populated Nuxeo DM content repository.


Because the CMIS specification does not allow adding custom properties to content repository objects, the metadata of each object is stored in a separate file. This file is named after the object with a “_metadata” extension and is placed at the same level of the hierarchy as the object itself. As an example, the generated metadata of the “Mesozoic birds” folder is shown below.

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:j.0="http://www.w3.org/2004/02/skos/core#"
    xmlns:j.1="http://www.apache.org/stanbol/cms#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" >
  <rdf:Description rdf:about="http://dbpedia.org/resource/Category:Mesozoic_birds">
    <j.0:narrower rdf:resource="http://dbpedia.org/resource/Category:Cretaceous_birds"/>
    <j.0:narrower rdf:resource="http://dbpedia.org/resource/Category:Jurassic_birds"/>
    <j.0:broader rdf:resource="http://dbpedia.org/resource/Category:Prehistoric_birds"/>
    <rdfs:label xml:lang="en">Mesozoic birds</rdfs:label>
    <rdf:type rdf:resource="http://www.w3.org/2004/02/skos/core#Concept"/>
    <rdf:type rdf:resource="http://www.apache.org/stanbol/cms#CMSObject"/>
    <j.1:parentRef rdf:resource="http://dbpedia.org/resource/Category:Prehistoric_birds"/>
    <j.1:name rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Mesozoic birds</j.1:name>
    <j.1:path rdf:datatype="http://www.w3.org/2001/XMLSchema#string">/rdfmaptest/Vertebrates/Birds/Extinct birds/Prehistoric birds/Mesozoic birds</j.1:path>
  </rdf:Description>
</rdf:RDF>
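
As an illustration of how such a metadata file could be consumed, the following Python sketch reads the CMS-specific properties back from the generated RDF. The helper names read_cms_properties and metadata_file_name are hypothetical and not part of Apache Stanbol, and building the file name by appending “_metadata” directly to the object name is my reading of the convention described above, not a confirmed detail of the implementation.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
CMS = "http://www.apache.org/stanbol/cms#"

# Shortened copy of the generated metadata shown above.
METADATA = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:j.1="http://www.apache.org/stanbol/cms#">
  <rdf:Description rdf:about="http://dbpedia.org/resource/Category:Mesozoic_birds">
    <j.1:parentRef rdf:resource="http://dbpedia.org/resource/Category:Prehistoric_birds"/>
    <j.1:name rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Mesozoic birds</j.1:name>
    <j.1:path rdf:datatype="http://www.w3.org/2001/XMLSchema#string">/rdfmaptest/Vertebrates/Birds/Extinct birds/Prehistoric birds/Mesozoic birds</j.1:path>
  </rdf:Description>
</rdf:RDF>"""


def read_cms_properties(rdf_xml):
    """Return the name, path and parent reference of the first described object."""
    root = ET.fromstring(rdf_xml)
    desc = root.find(f"{{{RDF}}}Description")
    parent = desc.find(f"{{{CMS}}}parentRef")
    return {
        "uri": desc.get(f"{{{RDF}}}about"),
        "name": desc.findtext(f"{{{CMS}}}name"),
        "path": desc.findtext(f"{{{CMS}}}path"),
        "parent": parent.get(f"{{{RDF}}}resource") if parent is not None else None,
    }


def metadata_file_name(object_name):
    """Assumed naming convention: object name followed by '_metadata'."""
    return object_name + "_metadata"


if __name__ == "__main__":
    props = read_cms_properties(METADATA)
    print(props["name"], "->", props["path"])
    print(metadata_file_name(props["name"]))
```

The path property mirrors the folder hierarchy created in the repository, while parentRef keeps the link to the RDF resource of the parent category.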

I look forward to improving this feature based on your comments and feedback.

References: