In general terms, the Stanbol CMS Adapter (previously named Ontology Generator) transforms an existing content structure (e.g. a category hierarchy or folder structure) into a Semantic Web compliant schema (an ontology). For systems that support JCR or CMIS, this is done by defining bridges; for other content management systems, it is done manually. In this blog post, I will try to explain how the mapping works in both cases.
A small remark before going into details: the current implementation of CMS Adapter can be followed in the source code in the Apache Stanbol project SVN. Build and run instructions are explained in the README file located in the root directory of CMS Adapter. The README also contains some documentation about bridges. Further documentation about CMS Adapter can be found on the wiki page.
First, for systems that support JCR or CMIS, the appropriate and easy way to extract semantics into an ontology is to define bridges and submit them through the related RESTful service. In this way, CMS Adapter works in online mode: it connects to the repository, automatically lifts the existing type definitions in the system as ontology classes, and then executes all bridges.
At a deeper level, mapping operations are handled by processors in CMS Adapter. For the time being, there are processors implementing the Processor interface, which mainly handle the ObjectTypes and Objects of the generic model explained below. It is also possible to implement new processors for the specific needs of a content management system.
There are 4 types of bridges in CMS Adapter:
- Concept Bridge: It is used to transform a repository object hierarchy into a class hierarchy in the ontology. The target repository items are selected with a query specified in the bridge definition.
- Instance Bridge: It is used to transform repository items into individuals in the ontology. A query parameter is also specified for this type of bridge to select the repository items.
- Subsumption Bridge: It is used to create subclass-superclass relations between classes in the ontology. Parent classes are again created for the repository items selected with the specified query, and child classes are selected through a specified property. This bridge can also be used inside a Concept Bridge.
- Property Bridge: This bridge is not supposed to be defined standalone; it is used within a Concept or Instance Bridge. With a Property Bridge, properties that carry semantics can be transformed into the ontology.
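To make the nesting rules of the four bridge types easier to picture, here is a minimal Python sketch of them as plain data classes. This is not the CMS Adapter API: the class and field names, the optional query on the nested Subsumption Bridge, and the query string in the usage example are all my own assumptions for illustration. The real bridge definition format is documented in the README.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PropertyBridge:
    # Hypothetical field: name of the repository property to lift.
    # A Property Bridge is never defined standalone, only nested.
    predicate_name: str


@dataclass
class SubsumptionBridge:
    # Hypothetical field: property whose values select the child classes.
    predicate_name: str
    # Query selecting the parent items. Assumed optional here for the
    # case where the bridge is nested inside a Concept Bridge.
    query: Optional[str] = None


@dataclass
class ConceptBridge:
    # Query selecting the repository items that become ontology classes.
    query: str
    subsumption_bridges: List[SubsumptionBridge] = field(default_factory=list)
    property_bridges: List[PropertyBridge] = field(default_factory=list)


@dataclass
class InstanceBridge:
    # Query selecting the repository items that become individuals.
    query: str
    property_bridges: List[PropertyBridge] = field(default_factory=list)


# Usage: a Concept Bridge with one nested bridge of each allowed kind.
# The query syntax below is made up for the example.
concept_bridge = ConceptBridge(
    query="/categories/%",
    subsumption_bridges=[SubsumptionBridge(predicate_name="child")],
    property_bridges=[PropertyBridge(predicate_name="relatedCategory")],
)
```

The point of the sketch is only the containment: Concept Bridges may hold Subsumption and Property Bridges, Instance Bridges may hold Property Bridges, and a Property Bridge carries no query of its own.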
The bridges themselves are executed by these processors: for the time being, there are four core processors implementing the Processor interface.
Before continuing with other content management systems, it is helpful to describe the repository content model used by CMS Adapter. Taking the JCR and CMIS specifications into account, we defined a generic model, shown in the figure below, that can be used by any repository. ObjectType is the model element that represents the types of content repository items, and the Object element represents the actual repository items. In our generic model, the ContentObject and ClassificationObject elements are added to distinguish repository items that hold actual content (ContentObject) from repository items that are used to classify other repository items (ClassificationObject).
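The generic model described above can be sketched roughly as follows. This is a simplified Python illustration, not the actual CMS Adapter model classes: the field names, the PropertyDefinition helper and the string type labels are assumptions I made to keep the example short.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PropertyDefinition:
    name: str
    # Assumed type labels such as "STRING" or "REFERENCE".
    property_type: str


@dataclass
class ObjectType:
    # Describes the type of repository items, including its properties.
    name: str
    property_definitions: List[PropertyDefinition] = field(default_factory=list)


@dataclass
class RepositoryObject:
    # An actual repository item ("Object" in the generic model).
    name: str
    object_type: str
    properties: Dict[str, str] = field(default_factory=dict)


@dataclass
class ContentObject(RepositoryObject):
    """A repository item that holds actual content."""


@dataclass
class ClassificationObject(RepositoryObject):
    """A repository item that is used to classify other items."""


# Usage: one item of each kind, named after the samples in this post.
sports = ClassificationObject(name="Sports", object_type="NewsCategory")
article = ContentObject(
    name="SportsArticle",
    object_type="NewsArticleItem",
    properties={"categorizedBy": "Sports"},
)
```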
For content management systems that do not support the JCR or CMIS specifications, each step is executed manually. In the first step, object types are submitted together with their property definitions. Then the actual repository items are submitted. The submitted objects can include both classification objects and content objects. Sample definitions of object types and objects are provided in the attachments as a simple example.
Let's go over those samples. When the ObjectTypes are submitted, NewsCategory and NewsArticleItem are each transformed into an ontology class. Their property definitions are transformed into either a datatype property or an object property, depending on the property type. You can see the datatype and object properties created in this step in the figures below.
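The first step can be pictured with a small Python sketch that turns an object type into a class plus one property declaration per property definition. Which property types end up as object properties is an assumption on my side (here: reference-like types), and the property definitions passed in the usage example are illustrative rather than copied from the attachment. The triples are plain string tuples, not output of the real store component.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Assumption: these type labels mark references to other repository items,
# so they are lifted as object properties. Everything else is treated as
# a literal-valued datatype property.
REFERENCE_TYPES = {"REFERENCE", "PATH"}


def lift_object_type(type_name: str, property_defs: Dict[str, str]) -> List[Triple]:
    """Lift one object type into a class and its property declarations."""
    triples: List[Triple] = [(type_name, "rdf:type", "owl:Class")]
    for prop_name, prop_type in property_defs.items():
        if prop_type.upper() in REFERENCE_TYPES:
            kind = "owl:ObjectProperty"
        else:
            kind = "owl:DatatypeProperty"
        triples.append((prop_name, "rdf:type", kind))
    return triples


# Usage with illustrative property definitions for NewsArticleItem.
article_type_triples = lift_object_type(
    "NewsArticleItem",
    {"title": "STRING", "relatedItem": "REFERENCE"},
)
```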
In the next step, objects are submitted. The Objects attachment includes two classification objects and two content objects, so two classes and two individuals are created in the ontology. Thanks to the sameCategory property defined in the Sports classification object, the WorldSports class is set as an equivalent class of the Sports class in the ontology. If we look at the properties of the SportsArticle content object: for the title property, a datatype property assertion is created on the SportsArticle individual with the value of the title property; for the categorizedBy property, the Sports class is set as a type of the SportsArticle individual; and for the relatedItem property, an object property assertion is created with SportsArticle2 as its value. The generated ontology is stored using the store component of Stanbol and, with default settings, can be seen under http://localhost:8080/ontology. Here are some screenshots of the generated ontology elements.
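To summarize the object step, here is a rough Python sketch of the mapping just described. It hard-codes the four sample property names, which the real processors certainly do not do, and it represents objects as simple dictionaries with a made-up "kind" key. The title value is a placeholder, since the attachment content is not reproduced in this post.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def lift_objects(objects: List[Dict]) -> List[Triple]:
    """Map classification objects to classes and content objects to individuals."""
    triples: List[Triple] = []
    for obj in objects:
        name = obj["name"]
        props = obj.get("properties", {})
        if obj["kind"] == "classification":
            triples.append((name, "rdf:type", "owl:Class"))
            if "sameCategory" in props:
                # sameCategory becomes an equivalent class axiom.
                triples.append((name, "owl:equivalentClass", props["sameCategory"]))
        else:
            triples.append((name, "rdf:type", "owl:NamedIndividual"))
            if "categorizedBy" in props:
                # The classifying class becomes a type of the individual.
                triples.append((name, "rdf:type", props["categorizedBy"]))
            if "title" in props:
                # Datatype property assertion (literal value).
                triples.append((name, "title", props["title"]))
            if "relatedItem" in props:
                # Object property assertion (value is another individual).
                triples.append((name, "relatedItem", props["relatedItem"]))
    return triples


# Usage: two classification objects and two content objects, as in the sample.
sample_objects = [
    {"name": "Sports", "kind": "classification",
     "properties": {"sameCategory": "WorldSports"}},
    {"name": "WorldSports", "kind": "classification"},
    {"name": "SportsArticle", "kind": "content",
     "properties": {"title": "placeholder title",  # hypothetical value
                    "categorizedBy": "Sports",
                    "relatedItem": "SportsArticle2"}},
    {"name": "SportsArticle2", "kind": "content"},
]
object_triples = lift_objects(sample_objects)
```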