Introducing Apache Stanbol Reasoning Services

Data represented in RDF can benefit from inference engines, which exploit the semantics of the vocabularies in use to generate new statements, and thus new knowledge. Vocabularies can describe their terms by means of RDFS and OWL so that automatic reasoning systems can work on them. Both RDFS [1] and OWL (in its first [2] and second [3] versions) include terms with a predefined meaning, and reasoning services make that meaning explicit.

In this article I introduce the Apache Stanbol (Incubating) Reasoners module, which gives RDF-enabled systems a set of reasoning services backed by automatic inference engines.

The module implements a common API for reasoning services, so that different reasoners and configurations can be plugged in side by side.

Currently the module includes OWLApi-based and Jena-based abstract services, with concrete implementations for the Jena RDFS, OWL and OWLMini reasoners and for HermiT.

Within the IKS 5.1 release the Reasoners module exposes a REST endpoint at /reasoners with the following preloaded services:

  • /rdfs, which is based on the Jena RDFS reasoner and supports almost all of the RDFS entailments described by the RDF Core working group [1]
  • /owl, a Jena reasoner configured to support OWL (with a few limitations [4])
  • /owlmini, another Jena configuration that partially supports OWL (see [5])

In addition, it is also possible to use the HermiT [6] reasoner to exploit the full OWL2 specification.

To enable HermiT as the OWL2 reasoner, you can download and install it from the Stanbol (Incubating) svn repository. The steps are the following:

$ svn co https://svn.apache.org/repos/asf/incubator/stanbol/trunk/reasoners/hermit stanbol-hermit
$ cd stanbol-hermit
# change sling.url to match the console path of your IKS release instance
$ mvn install -PinstallBundle -Dsling.url=http://localhost:8080/system/console

The HermiT reasoning service is now available in the list of active reasoning services.

Each reasoner can be accessed with one of three tasks:

  • check: to perform a consistency check. The service returns HTTP status 200 if the data is consistent and 204 otherwise. In the current implementation the response does not explain why the input is inconsistent; this feature is on our “todo” list.
  • classify: to materialize all inferred rdf:type statements.
  • enrich: to materialize all inferred statements.
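A client of the check task has to read the verdict from the status code alone. The following is a minimal Python sketch of that mapping, assuming only the two status codes described above; the function name is illustrative and not part of Stanbol.

```python
def interpret_check(status_code: int) -> bool:
    """Map the HTTP status of a `check` call to a consistency verdict."""
    if status_code == 200:
        return True   # the input is consistent
    if status_code == 204:
        return False  # the input is inconsistent (no explanation is returned)
    # Any other status is outside what the article describes.
    raise ValueError(f"unexpected status from the check task: {status_code}")
```

Raising on any other status keeps transport or server errors from being mistaken for a reasoning result.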

For example:

  • /reasoners/owl/check exposes the Jena OWL service with the task check, or
  • /reasoners/owl2/classify exposes the HermiT service with the task classify
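The path pattern is /reasoners/{service}/{task}. As a sketch, a small Python helper can compose such URLs. The base URL is the one used in this article's curl examples and should be adjusted to your instance; the helper name is an assumption made for illustration.

```python
from urllib.parse import urlencode

BASE = "http://localhost:8080/stanbol"  # assumption: local instance, as in the curl examples
TASKS = {"check", "classify", "enrich"}


def reasoner_url(service, task, base=BASE, **params):
    """Compose the URL of a reasoning service for one of the three tasks."""
    if task not in TASKS:
        raise ValueError(f"unknown task: {task}")
    url = f"{base}/reasoners/{service}/{task}"
    if params:
        # Query parameters such as url=... are percent-encoded here.
        url += "?" + urlencode(params)
    return url


print(reasoner_url("owl", "check"))
print(reasoner_url("owl2", "classify"))
```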

To show how the endpoint behaves, we can use the curl command-line utility to ask the Jena OWL reasoning service to materialize all inferences produced by loading the FOAF ontology:

$ curl -H "Accept: text/n3" "http://localhost:8080/stanbol/reasoners/owl/enrich?url=http://xmlns.com/foaf/0.1/"

The above example performs a GET asking for a text/n3 representation of the result. For example, the equivalence of foaf:Agent and dc:Agent results in the following rdfs:subClassOf statements for the foaf:Person type:

[...]
<http://xmlns.com/foaf/0.1/Person>
      a       <http://www.w3.org/2002/07/owl#Thing> ,
              <http://www.w3.org/2002/07/owl#Class> ,
              <http://www.w3.org/2000/01/rdf-schema#Resource> ,
              <http://www.w3.org/2000/01/rdf-schema#Class> ;
      <http://www.w3.org/2000/01/rdf-schema#label>
              "Person" ;
      <http://www.w3.org/2000/01/rdf-schema#subClassOf>
              <http://xmlns.com/foaf/0.1/Person> ,
              <http://purl.org/dc/terms/Agent> ,
              <http://xmlns.com/foaf/0.1/Agent> ,
              <http://www.w3.org/2002/07/owl#Thing> ,
              <http://www.w3.org/2000/01/rdf-schema#Resource> ,
              <http://www.w3.org/2000/10/swap/pim/contact#Person> ,
              <http://www.w3.org/2003/01/geo/wgs84_pos#SpatialThing> ;
      <http://www.w3.org/2002/07/owl#disjointWith>
              <http://xmlns.com/foaf/0.1/Organization> ,
              <http://xmlns.com/foaf/0.1/Project> ;
[...]

The behaviour is the same if we use the POST method with the header Content-type: application/x-www-form-urlencoded. In addition, if the target parameter is provided, the service saves the output to the given graph in the triple store and does not return the RDF stream.
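The POST variant can be sketched in Python with the standard library. This only builds the request and does not send it. It assumes that the form fields carry the same names as the GET parameters (url, plus the optional target); the graph name passed as target in the usage note is hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: a local instance, as in the curl example above.
ENDPOINT = "http://localhost:8080/stanbol/reasoners/owl/enrich"


def build_enrich_request(source_url, target=None):
    """Build (but do not send) a form-encoded POST to the enrich task."""
    fields = {"url": source_url}
    if target is not None:
        # With a target graph the service stores the result instead of returning it.
        fields["target"] = target
    body = urlencode(fields).encode("ascii")
    return Request(
        ENDPOINT,
        data=body,
        method="POST",
        headers={
            "Accept": "text/n3",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )
```

Calling build_enrich_request("http://xmlns.com/foaf/0.1/", target="example-graph") yields a request that can be passed to urllib.request.urlopen against a running instance.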

Let’s look at some examples of the differences between the available reasoners (to try the example ontology snippets you can download them from here).

The first snippet uses rdfs:subClassOf to declare that any Article is a Document, which is in turn a ContentItem.

<!-- item_1 is an Article -->
<ex:Article rdf:about="http://www.example.org/reasoners/item_1"/>

<!-- An article is a kind of Document -->
<rdf:Description rdf:about="http://www.example.org/reasoners/Article">
	<rdfs:subClassOf rdf:resource="http://www.example.org/reasoners/Document"/>
</rdf:Description>

<!-- A document is a kind of content item -->
<rdf:Description rdf:about="http://www.example.org/reasoners/Document">
	<rdfs:subClassOf rdf:resource="http://www.example.org/reasoners/ContentItem"/>
</rdf:Description>

download it

Giving it to the /rdfs reasoning service, we obtain the inferred statements that item_1 is also a Document and a ContentItem. Another feature of RDFS is the definition of the domain and range of a property.

<!-- Both enridaga and alexdma are authors of item_1 -->
<rdf:Description rdf:about="http://www.example.org/reasoners/enridaga">
	<ex:author rdf:resource="http://www.example.org/reasoners/item_1"/>
</rdf:Description>
<rdf:Description rdf:about="http://www.example.org/reasoners/alexdma">
	<ex:author rdf:resource="http://www.example.org/reasoners/item_1"/>
</rdf:Description>

<!-- ex:author wants a person as subject, and a content-item as object -->
<rdf:Description rdf:about="http://www.example.org/reasoners/author">
	<rdfs:domain rdf:resource="http://www.example.org/reasoners/Person"/>
	<rdfs:range rdf:resource="http://www.example.org/reasoners/ContentItem"/>
</rdf:Description>

download it
In this case we obtain that both enridaga and alexdma are Persons (the declared domain of ex:author), and that item_1 is a ContentItem. RDFS semantics is also supported by the other reasoners; the /rdfs service is the least “expressive” of the four.
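To make the two RDFS examples concrete, here is a small Python sketch that applies four RDFS entailment rules (rdfs2, rdfs3, rdfs9 and rdfs11) to the triples of both snippets until nothing new appears. It is a naive illustration of the semantics, not how the Jena RDFS reasoner is implemented, and the short names stand in for the full example URIs.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
DOMAIN = "rdfs:domain"
RANGE = "rdfs:range"


def rdfs_closure(triples):
    """Apply rdfs2, rdfs3, rdfs9 and rdfs11 until a fixpoint is reached."""
    inferred = set(triples)
    while True:
        new = set()
        for s, p, o in inferred:
            for s2, p2, o2 in inferred:
                if p == SUBCLASS and p2 == SUBCLASS and o == s2:
                    new.add((s, SUBCLASS, o2))   # rdfs11: subClassOf is transitive
                if p == RDF_TYPE and p2 == SUBCLASS and o == s2:
                    new.add((s, RDF_TYPE, o2))   # rdfs9: instances move up the hierarchy
                if p2 == DOMAIN and p == s2:
                    new.add((s, RDF_TYPE, o2))   # rdfs2: the subject gets the domain type
                if p2 == RANGE and p == s2:
                    new.add((o, RDF_TYPE, o2))   # rdfs3: the object gets the range type
        if new <= inferred:
            return inferred
        inferred |= new


DATA = {
    ("item_1", RDF_TYPE, "Article"),
    ("Article", SUBCLASS, "Document"),
    ("Document", SUBCLASS, "ContentItem"),
    ("enridaga", "author", "item_1"),
    ("alexdma", "author", "item_1"),
    ("author", DOMAIN, "Person"),
    ("author", RANGE, "ContentItem"),
}

for triple in sorted(rdfs_closure(DATA) - DATA):
    print(triple)
```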

The following snippet will work with /owl, /owlmini and /owl2 (but not with /rdfs):

<!-- ogrisel, enridaga and alexdma are developers -->
<ex:Developer rdf:about="#enridaga" />
<ex:Developer rdf:about="#ogrisel" />
<ex:Developer rdf:about="#alexdma" />

<!-- We know:
#alexdma #workedTogheter #enridaga and #ogrisel
-->
<rdf:Description rdf:about="#alexdma">
<workedTogheter rdf:resource="#ogrisel"/>
<workedTogheter rdf:resource="#enridaga"/>
</rdf:Description>

<!-- #workedTogheter is an owl:SymmetricProperty (well, this is an example...) -->
<owl:SymmetricProperty rdf:about="#workedTogheter"/>
<!-- #workedTogheter is also an owl:TransitiveProperty (well, this is an example...) -->
<owl:TransitiveProperty rdf:about="#workedTogheter"/>

download it
The OWL vocabulary introduces logical capabilities, allowing more complex inferences to be produced. In the above example we state that alexdma workedTogheter with enridaga and ogrisel. Since we declare the property workedTogheter to be “Symmetric” and “Transitive”, the result will include the following:

  • enridaga workedTogheter alexdma (symmetric)
  • ogrisel workedTogheter alexdma (symmetric)
  • ogrisel workedTogheter enridaga (transitive)
  • enridaga workedTogheter ogrisel (transitive)
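The effect of the two property axioms can be reproduced with a naive closure over the asserted pairs. The Python sketch below illustrates the semantics only and says nothing about how the Jena or HermiT reasoners compute it; the names are shortened from the snippet.

```python
def symmetric_transitive_closure(pairs):
    """Close a set of (subject, object) pairs under symmetry and transitivity."""
    closure = set(pairs)
    while True:
        # Symmetry: if a relates to b, then b relates to a.
        new = {(b, a) for a, b in closure}
        # Transitivity: a-b and b-d give a-d.
        new |= {(a, d) for a, b in closure for c, d in closure if b == c}
        if new <= closure:
            return closure
        closure |= new


# The two asserted workedTogheter statements from the snippet.
ASSERTED = {("alexdma", "ogrisel"), ("alexdma", "enridaga")}

for pair in sorted(symmetric_transitive_closure(ASSERTED) - ASSERTED):
    print(pair)
```

The closure also contains self-pairs such as alexdma with alexdma, which follow logically from combining the two axioms.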

The next snippet is inconsistent. This means that the OWL-based reasoners will not return any inference, but a 204 HTTP response:

<!-- enridaga is a person -->
<ex:Person rdf:about="http://www.example.org/reasoners/enridaga" />

<!-- Persons and Organizations are disjoint -->
<owl:Class rdf:about="http://www.example.org/reasoners/Person" />
<owl:Class rdf:about="http://www.example.org/reasoners/Organization">
	<owl:disjointWith rdf:resource="http://www.example.org/reasoners/Person" />
</owl:Class>

<!-- A Public Limited Company is a kind of Company, which is a kind of Organization -->
<owl:Class rdf:about="http://www.example.org/reasoners/PublicLimitedCompany">
	<rdfs:subClassOf rdf:resource="http://www.example.org/reasoners/Company" />
</owl:Class>
<owl:Class rdf:about="http://www.example.org/reasoners/Company">
	<rdfs:subClassOf rdf:resource="http://www.example.org/reasoners/Organization" />
</owl:Class>

<!-- enridaga cannot be a Public Limited Company -->
<ex:PublicLimitedCompany rdf:about="http://www.example.org/reasoners/enridaga" />

download it
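Why the snippet is inconsistent can be traced by hand: enridaga is asserted to be a Person and a PublicLimitedCompany, the latter is an Organization through two subclass steps, and Organization is disjoint with Person. A Python sketch of that check, using shortened names and covering only subclass and disjointness axioms (an illustration, not the reasoners' algorithm):

```python
def superclasses(cls, subclass_of):
    """Return cls together with every class reachable via subClassOf."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(subclass_of.get(current, ()))
    return seen


def find_clashes(types, subclass_of, disjoint):
    """List (individual, classA, classB) where both disjoint classes apply."""
    clashes = []
    for individual, asserted in types.items():
        all_types = set()
        for cls in asserted:
            all_types |= superclasses(cls, subclass_of)
        for a, b in disjoint:
            if a in all_types and b in all_types:
                clashes.append((individual, a, b))
    return clashes


TYPES = {"enridaga": {"Person", "PublicLimitedCompany"}}
SUBCLASS_OF = {"PublicLimitedCompany": ["Company"], "Company": ["Organization"]}
DISJOINT = [("Organization", "Person")]

print(find_clashes(TYPES, SUBCLASS_OF, DISJOINT))
```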
The /owlmini service implements the OWL language with more limitations than /owl (both are based on the Jena rule-based reasoner, as said before).

The following example shows the use of class restrictions, in particular the usage of owl:someValuesFrom:

<!-- john is a developer, but we don't know anything else -->
<ex:Developer rdf:about="#john">
</ex:Developer>

<!-- a #SoftwareCompany is a kind of #Organization -->
<owl:Class rdf:about="#SoftwareCompany">
	<rdfs:subClassOf rdf:resource="#Organization" />
</owl:Class>

<!-- #Developers #worksAt some #SoftwareCompany (they are not the only ones, 
	which is why we use rdfs:subClassOf) -->
<owl:Class rdf:about="#Developer">
	<rdfs:subClassOf>
		<owl:Restriction>
			<owl:onProperty rdf:resource="#worksAt" />
			<owl:someValuesFrom rdf:resource="#SoftwareCompany" />
		</owl:Restriction>
	</rdfs:subClassOf>
</owl:Class>

<!-- An #Employee is anyone who #worksAt some kind of #Organization (owl:equivalentClass) -->
<owl:Class rdf:about="#Employee">
	<owl:equivalentClass>
		<owl:Restriction>
			<owl:onProperty rdf:resource="#worksAt" />
			<owl:someValuesFrom rdf:resource="#Organization" />
		</owl:Restriction>
	</owl:equivalentClass>
</owl:Class>

download it

We expect an OWL reasoner to state that John is an Employee. This example does not work with /rdfs (which ignores the OWL semantics), nor with /owlmini, because the Jena OWL Mini reasoner omits the forward entailments for owl:someValuesFrom restrictions (see [4]). It works correctly with the /owl service.
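The chain of reasoning behind "John is an Employee" is: every Developer works at some SoftwareCompany, every SoftwareCompany is an Organization, and anyone who works at some Organization is an Employee. The Python sketch below hard-codes exactly that pattern for the three axioms of the snippet. It is a hand-written illustration under the assumption of an acyclic class hierarchy, not a general OWL reasoner, and the dictionary names are invented for the example.

```python
SUBCLASS_OF = {"SoftwareCompany": {"Organization"}}
# Necessary condition: every member has some `property` value in `filler`.
NECESSARY = {"Developer": ("worksAt", "SoftwareCompany")}
# Definition: anything with some `property` value in `filler` is a member.
DEFINED = {"Employee": ("worksAt", "Organization")}


def is_subclass(sub, sup):
    """True if `sub` equals `sup` or reaches it through SUBCLASS_OF."""
    if sub == sup:
        return True
    return any(is_subclass(parent, sup) for parent in SUBCLASS_OF.get(sub, ()))


def inferred_classes(asserted):
    """Classes entailed for an individual asserted to be in one class."""
    result = {asserted}
    if asserted in NECESSARY:
        prop, filler = NECESSARY[asserted]
        for cls, (def_prop, def_filler) in DEFINED.items():
            # The implied value satisfies the definition if the property
            # matches and the implied filler is at least as specific.
            if prop == def_prop and is_subclass(filler, def_filler):
                result.add(cls)
    return result


print(sorted(inferred_classes("Developer")))
```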

The /owl service supports most of the semantics of OWL. The HermiT reasoner is based on OWLApi and is an example of a DL reasoner. It fully covers OWL and OWL2, which introduces a lot of interesting features. Here is an example:

<!-- any employee must have some features: firstname, familyname, email 
	and worksAt (in one of the allowed places) -->
<owl:Class rdf:about="#Employee">
	<owl:equivalentClass>
		<owl:Class>
			<owl:intersectionOf rdf:parseType="Collection">
				<rdf:Description rdf:about="#Person" />
				<owl:Restriction>
					<owl:onProperty rdf:resource="#firstname" />
					<owl:someValuesFrom rdf:resource="&rdfs;Literal" />
				</owl:Restriction>
				<owl:Restriction>
					<owl:onProperty rdf:resource="#familyname" />
					<owl:someValuesFrom rdf:resource="&rdfs;Literal" />
				</owl:Restriction>
				<owl:Restriction>
					<owl:onProperty rdf:resource="#email" />
					<owl:someValuesFrom rdf:resource="&rdfs;Literal" />
				</owl:Restriction>
				<!-- -->
				<!-- Let's say that Employee can work only in #Rome , #Catania and 
					#Bologna -->
				<owl:Restriction>
					<owl:onProperty rdf:resource="#worksAt" />
					<owl:someValuesFrom>
						<owl:Class>
							<owl:oneOf rdf:parseType="Collection">
								<owl:Thing rdf:about="#Rome" />
								<owl:Thing rdf:about="#Catania" />
								<owl:Thing rdf:about="#Bologna" />
							</owl:oneOf>
						</owl:Class>
					</owl:someValuesFrom>
				</owl:Restriction>
			</owl:intersectionOf>
		</owl:Class>
	</owl:equivalentClass>
</owl:Class>

<owl:DatatypeProperty rdf:about="#firstname" />
<owl:DatatypeProperty rdf:about="#familyname" />
<owl:DatatypeProperty rdf:about="#email" />

<!-- #worksAt has range #Place -->
<owl:ObjectProperty rdf:about="#worksAt">
	<rdfs:range rdf:resource="#Place" />
</owl:ObjectProperty>

<!-- all the following places are distinct (no synonyms here) -->
<owl:AllDifferent>
	<owl:distinctMembers rdf:parseType="Collection">
		<owl:Thing rdf:about="#Rome" />
		<owl:Thing rdf:about="#Catania" />
		<owl:Thing rdf:about="#Bologna" />
		<owl:Thing rdf:about="#Moricone" />
	</owl:distinctMembers>
</owl:AllDifferent>


<!-- enridaga, to be an employee, must fulfill the restrictions defined 
	for the class #Employee. -->
<Person rdf:about="#enridaga">
	<!-- If you comment out one of the next 4 statements, #enridaga will no 
		longer result as #Employee. -->
	<firstname>Enrico</firstname>
	<familyname>Daga</familyname>
	<email>enridaga@example.org</email>
	<worksAt rdf:resource="#Catania" />

	<!-- If you uncomment the two statements below you will obtain an inconsistency, 
		because #Moricone is not an allowed place for developers -->
	<!-- <worksAt rdf:resource="#Moricone" /> <rdf:type rdf:resource="#Employee" 
		/> -->
</Person>

download it

The above differences depend on the semantics supported by each reasoner and on its implementation, which may limit the power of the system in favour of better efficiency (this is the case for the Jena /owlmini implementation, which is more efficient than the corresponding /owl). If you need to work with RDFS semantics and don’t need OWL for your inferences, just use the RDFS service.

As a last example, let me use real data from data.CNR.it, the Linked Data access point to CNR public data. I can get the information available about me by dereferencing my URI, add a link to the CNR ontology to my RDF file, and POST the whole to the OWL2 reasoning service. (The CNR ontology is a network of ontologies connected with owl:imports statements; here we say only that it is a quite large model which covers a lot of aspects of the organization.)

Input would look like this:

<?xml version="1.0" encoding="utf-8" ?>
<rdf:RDF
 xmlns="http://www.cnr.it/ontology/cnr/individuo/unitaDiPersonaleInterno/MATRICOLA11472"
 xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
 xmlns:owl="http://www.w3.org/2002/07/owl#"
 xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
>
	<owl:Ontology rdf:about="">
		<owl:imports rdf:resource="http://www.cnr.it/ontology/cnr/cnr.owl"/>
	</owl:Ontology>
	<rdf:Description rdf:about="http://www.cnr.it/ontology/cnr/individuo/unitaDiPersonaleInterno/MATRICOLA11472">
		<rdf:type rdf:resource="http://www.cnr.it/ontology/cnr/personale.owl#UnitaDiPersonaleInterno"/>
	</rdf:Description>
	<rdf:Description rdf:about="http://www.cnr.it/ontology/cnr/individuo/rapportoConCNR/MATRICOLA11472">
		<n0pred:rapportoConPersona rdf:resource="http://www.cnr.it/ontology/cnr/individuo/unitaDiPersonaleInterno/MATRICOLA11472"/>
	</rdf:Description>
	<rdf:Description rdf:about="http://www.cnr.it/ontology/cnr/individuo/unitaDiPersonaleInterno/MATRICOLA11472">
		<n0pred:nome rdf:datatype="http://www.w3.org/2001/XMLSchema#string">ENRICO</n0pred:nome>
	</rdf:Description>
	<rdf:Description rdf:about="http://www.cnr.it/ontology/cnr/individuo/unitaDiPersonaleInterno/MATRICOLA11472">
		<rdfs:label xml:lang="it">DOTT. ENRICO DAGA</rdfs:label>
	</rdf:Description>
	<rdf:Description rdf:about="http://data.cnr.it/dataset/">
		<void:exampleResource rdf:resource="http://www.cnr.it/ontology/cnr/individuo/unitaDiPersonaleInterno/MATRICOLA11472"/>
	</rdf:Description>
	<rdf:Description rdf:about="http://www.cnr.it/ontology/cnr/individuo/unitaDiPersonaleInterno/MATRICOLA11472">
		<n0pred:cognome rdf:datatype="http://www.w3.org/2001/XMLSchema#string">DAGA</n0pred:cognome>
	</rdf:Description>
	<rdf:Description rdf:about="http://www.cnr.it/ontology/cnr/individuo/strutturaGestionale/UO000.411">
		<n0pred:haAfferente rdf:resource="http://www.cnr.it/ontology/cnr/individuo/unitaDiPersonaleInterno/MATRICOLA11472"/>
	</rdf:Description>
	<rdf:Description rdf:about="http://www.cnr.it/ontology/cnr/individuo/unitaDiPersonaleInterno/MATRICOLA11472">
		<n0pred:personaInRapporto rdf:resource="http://www.cnr.it/ontology/cnr/individuo/rapportoConCNR/MATRICOLA11472"/>
	</rdf:Description>
	<rdf:Description rdf:about="http://www.cnr.it/ontology/cnr/individuo/unitaDiPersonaleInterno/MATRICOLA11472">
		<n0pred:afferisceA rdf:resource="http://www.cnr.it/ontology/cnr/individuo/strutturaGestionale/UO000.411"/>
	</rdf:Description>
</rdf:RDF>

The REST call would look like this:

curl -X POST -H "Accept: application/rdf+xml" -H "Content-type: multipart/form-data" -F file=@enrico.rdf "http://localhost:8080/stanbol/reasoners/owl2/enrich/" >enrico.inferred.owl

In the resulting RDF we can see some interesting inferences, including:

<Thing rdf:about="http://www.cnr.it/ontology/cnr/individuo/strutturaGestionale/UO000.411">
	<suddivisioni:haAfferente rdf:resource="http://www.cnr.it/ontology/cnr/individuo/unitaDiPersonaleInterno/MATRICOLA11472"/>

which states that the office I belong to (the afferisceA relation) has me as member (the haAfferente relation), or

<Thing rdf:about="http://www.cnr.it/ontology/cnr/individuo/unitaDiPersonaleInterno/MATRICOLA11472">
	<rdf:type rdf:resource="http://www.cnr.it/ontology/cnr/personale.owl#UnitaDiPersonaleInterno"/>
	<rdf:type rdf:resource="http://www.cnr.it/ontology/cnr/persone.owl#Agente"/>
	<rdf:type rdf:resource="http://www.cnr.it/ontology/cnr/persone.owl#Persona"/>

which states that, being a UnitaDiPersonaleInterno (“internal employer unit” – we have a quite bureaucratic language ;) ) I am also Agente and Persona.

The current implementation also accepts additional input sources, so that the data can be combined with ontologies loaded in Ontonet and with rules configured in the Rules module.
The parameters to use are:

  • scope // the ID of an Ontonet scope
  • session // The ID of an Ontonet session
  • recipe // The ID of a recipe from the Rules module (works only with OWLApi based services)
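As a sketch, the three parameters can be assembled into a query string in Python while enforcing the recipe restriction on the client side. It assumes that they are passed as ordinary request parameters like url, and that /owl2 (HermiT) is the only OWLApi-based service among those described in this article. The IDs in the usage note are hypothetical.

```python
from urllib.parse import urlencode

# Assumption: per this article, HermiT (/owl2) is OWLApi-based, while
# /rdfs, /owl and /owlmini are Jena-based.
OWLAPI_SERVICES = {"owl2"}


def input_query(service, scope=None, session=None, recipe=None):
    """Build the query string for the optional Ontonet and Rules inputs."""
    if recipe is not None and service not in OWLAPI_SERVICES:
        raise ValueError("recipe works only with OWLApi based services")
    candidates = (("scope", scope), ("session", session), ("recipe", recipe))
    # Leave out every parameter that was not supplied.
    params = {name: value for name, value in candidates if value is not None}
    return urlencode(params)
```

For example, input_query("owl2", scope="myscope", recipe="myrecipe") produces a string that can be appended after "?" to a /reasoners/owl2/enrich URL.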

Currently there are some limitations related to performance, which degrades as the input data grows, in some cases dramatically. To address this and the other open issues, we are working in three directions:

  • Improving the code to better deal with input preparation, when the service must merge input from different sources (for example an Ontonet scope and a remote url);
  • Adding information to the consistency check service to explain the reason for an inconsistency by including, for example, the set of unsatisfiable classes;
  • Supporting long-running operations, so that a process can be started from a REST call and then polled for its progress through a dedicated endpoint.

Beyond addressing the weaknesses mentioned above, one major improvement would be to extend the input sources to support, for example, a whole graph in the store, together with the possibility of scheduling reasoning jobs that update the inferred graph regularly. Another direction I am interested in is the possibility of configuring input modifiers, which prune the input ontology to focus the reasoning on a specific inference (or on a set of inferences), or on a specific individual.

There is still some work to do on this aspect; feedback and comments are very welcome!

References:

4 Comments

  1. Thanks for this post Enrico, this is very interesting. It would be great to re-use some of this in the Stanbol documentation for the reasoners entry point (maybe using a simple English-language example with a smaller RDF than the CNR ontology).

    Here are two questions:

    1- For the “check” operation, if the data is inconsistent the HTTP code is 204, but does the body of the response contain some explanation, or maybe a minimal subset of the triples that exhibits the first encountered inconsistency?

    2- Could you please give a short comparative description of each of the available reasoners (the links to the references do not seem to work)? It would be nice to have a single example used on each reasoner that is able to emphasize the differences in their capabilities.

    • Thank you Olivier for your feedback. Yes, I will bring the basics of this article in the documentation soon.
      Regarding your questions:
      1 – No, actually it doesn’t :( . This is a mandatory functionality that we need to add (I have added this note in the post);
      2 – I am preparing some snippets, will update this post soon.

      By the way, links are now fixed ;)

      Thanks for the feedback!

  2. Excellent! Thank you very much for giving the details. I now have a much better view of the capabilities of each.