Running UIMA Engines in Stanbol using the same JVM

Introduction

This post is a follow-up to my previous posts about using UIMA with Apache Stanbol (see https://blog.iks-project.eu/uima-apache-stanbol-integration/ and https://blog.iks-project.eu/uima-apache-stanbol-integration-2/). As I explained in those posts, running UIMA engines and Apache Stanbol in the same Java Virtual Machine (JVM) is not easy. The root of the problem is that both systems try to do something similar, but in different ways: in certain settings, they both rely on custom class loading controlled at run time. In UIMA, you can define your own type system for the annotations, and a Java class is generated for every element of the type system. When a UIMA aggregate runs, all of these classes are naturally needed. Therefore, if you are designing a larger system that relies on many UIMA components, you either have to have all the classes available at compile time, or you need a special class loader that loads the classes at run time. UIMA has its own class loader implementation for this purpose, and it works great: it lets you load and instantiate classes from a pear file at any time.
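
UIMA's pear class loader and Felix's bundle class loaders contain far more logic than can be shown here. The following is only a minimal plain-JDK sketch of the underlying point: whether a class can be found depends on which class loader you ask. An isolated URLClassLoader with no URLs and the bootstrap loader as its parent sees JDK classes, but not the application's own classes. The class and method names are made up for illustration.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Sketch (plain JDK, not UIMA's or Felix's loaders): class visibility depends
// on the class loader that is asked, not only on what exists on disk.
class ClassLoaderDemo {

    // Returns true if className can be resolved through the given loader.
    static boolean visibleFrom(ClassLoader loader, String className) {
        try {
            Class.forName(className, false, loader); // resolve without initialising
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    // No URLs and a null parent (the bootstrap loader): this loader sees JDK
    // classes but nothing from the application class path.
    static URLClassLoader isolatedLoader() {
        return new URLClassLoader(new URL[0], null);
    }

    public static void main(String[] args) throws Exception {
        String self = ClassLoaderDemo.class.getName();
        try (URLClassLoader isolated = isolatedLoader()) {
            System.out.println("own loader sees " + self + ": "
                    + visibleFrom(ClassLoaderDemo.class.getClassLoader(), self));
            System.out.println("isolated loader sees " + self + ": "
                    + visibleFrom(isolated, self));
            System.out.println("isolated loader sees java.lang.String: "
                    + visibleFrom(isolated, "java.lang.String"));
        }
    }
}
```

The same class file can therefore be "there" for one component and missing for another, which is exactly the kind of mismatch that appears when UIMA's loader and the OSGi framework's loaders meet in one JVM.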

Stanbol's architecture is built on top of Apache Felix, an OSGi implementation. A core principle of OSGi is modularization: any OSGi-compatible module (a bundle, packaged as a jar file) can be installed or removed at run time. This also means loading classes at run time from jar files that were not present at system startup, so Felix has its own class loading mechanism as well. Moreover, the visibility of packages is controlled: an OSGi bundle has to declare which packages it exports, which it keeps private and which it imports.
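
To make the export/import rule concrete, here is a toy model (a hypothetical sketch, far simpler than the real Felix resolver, which also handles versions, optional imports and uses-constraints): a bundle's imports are satisfied only if some installed bundle exports each imported package. A package kept private, like those listed under Private-Package in the pom later in this post, can never satisfy another bundle's import.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Toy model of OSGi package wiring; names and structure are illustrative only.
class WiringCheck {

    static final class Bundle {
        final String name;
        final Set<String> exports;
        final Set<String> imports;

        Bundle(String name, List<String> exports, List<String> imports) {
            this.name = name;
            this.exports = new TreeSet<>(exports);
            this.imports = new TreeSet<>(imports);
        }
    }

    // Packages that importer imports but that no installed bundle exports.
    static Set<String> unresolvedImports(Bundle importer, List<Bundle> installed) {
        Set<String> missing = new TreeSet<>(importer.imports);
        for (Bundle b : installed) {
            missing.removeAll(b.exports);
        }
        return missing;
    }

    public static void main(String[] args) {
        // org.apache.uima.annotator is private in the UIMA bundle, so it is not exported.
        Bundle uima = new Bundle("uima",
                List.of("org.apache.uima", "org.apache.uima.jcas", "org.apache.uima.examples.tagger"),
                List.of());
        Bundle engine = new Bundle("enhancement-engine",
                List.of(),
                List.of("org.apache.uima", "org.apache.uima.examples.tagger", "org.apache.uima.annotator"));
        System.out.println(unresolvedImports(engine, List.of(uima)));
    }
}
```

Running main prints the one import that cannot be wired, [org.apache.uima.annotator]; in a real framework such a bundle would fail to resolve.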

The vision I had at the beginning of this work was that Stanbol admins should be able to get a provided UIMA pear package working with the help of an OSGi Stanbol bundle, also provided, without any compiling or bundling. You can now see why this cannot be achieved in the same JVM without serious hacking: if you want to execute a UIMA Analysis Engine (AE) in a Stanbol Enhancement Engine (EE), the packages and classes of the AE need to be imported by the EE bundle, which means that, at the very least, you have to re-bundle the EE once you know which UIMA packages you will need. The corresponding packages also need to be exported, so the UIMA components have to be converted into an OSGi bundle. This is the approach I describe in this post.

Alternatively, you could create a single custom bundle of your own that contains everything needed, from the UIMA components to the OSGi services. I think this requires serious skills, though.

These issues are the reason the Remote UIMA client was created: it communicates through HTTP REST with a standalone UIMA SimpleServlet that can load a pear file. The method was described in my previous posts, and I recommend that approach for most cases.

This post is for those who are familiar with creating bundles, or who really need to get rid of the overhead of HTTP REST communication to achieve maximum performance.

Bundling a UIMA engine

The first thing to do is to convert everything UIMA-related into a bundle. This includes the uima-core jars, the UIMA AE sources or jars, and optionally resource and configuration files.

In these steps we rely on the description at http://felix.apache.org/site/creating-bundles-using-bnd.html, which details how to turn your jars into bundles.

  1. Create an empty directory
  2. Put the following pom.xml into the directory:
    <project
      xmlns="http://maven.apache.org/POM/4.0.0"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
    
      <modelVersion>4.0.0</modelVersion>
      <groupId>org.apache.felix.commons</groupId>
      <artifactId>uima-core-osgi</artifactId>
      <name>${project.artifactId} bundle</name>
      <description>
        This bundle simply wraps uima-core-${project.version}.jar.
      </description>
      <version>2.3.1</version>
      <packaging>bundle</packaging>
    
      <organization>
        <name>Apache Felix Project</name>
        <url>http://felix.apache.org/</url>
      </organization>
    
      <dependencies>
        <!-- These are not the Maven Central coordinates of uima-core
             (org.apache.uima:uimaj-core:2.3.1). They presumably refer to the jar
             installed into the local repository, e.g. with mvn install:install-file. -->
        <dependency>
          <groupId>org.apache</groupId>
          <artifactId>org.apache.uima</artifactId>
          <version>2.3.1</version>
          <type>jar</type>
        </dependency>
      </dependencies>
    
      <build>
        <plugins>
          <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
            <configuration>
              <source>1.6</source>
              <target>1.6</target>
            </configuration>
          </plugin>
          <plugin>
            <groupId>org.apache.felix</groupId>
            <artifactId>maven-bundle-plugin</artifactId>
            <extensions>true</extensions>
            <configuration>
              <instructions>
                <Bundle-SymbolicName>${project.artifactId}</Bundle-SymbolicName>
                <Export-Package>
                  org.apache.uima,
                  org.apache.uima.analysis_engine,
                  org.apache.uima.analysis_component,
                  org.apache.uima.jcas.tcas,
                  org.apache.uima.jcas,
                  org.apache.uima.cas,
                  org.apache.uima.analysis_engine.metadata,
                  org.apache.uima.resource,
                  org.apache.uima.resource.metadata,
                  org.apache.uima.util,
                  org.apache.uima.cas.impl,
                  org.apache.uima.analysis_engine.annotator,
                  org.apache.uima.cas.admin,
                  org.apache.uima.cas.text,
                  org.apache.uima.cas_data,
                  org.apache.uima.collection,
                  org.apache.uima.collection.metadata,
                  org.apache.uima.examples,
                  org.apache.uima.flow,
                  org.apache.uima.internal.util,
                  org.apache.uima.internal.util.rb_trees,
                  org.apache.uima.internal.util.text,
                  org.apache.uima.jcas.cas,
                  org.apache.uima.pear.tools,
                  org.apache.uima.pear.util,
                  org.apache.uima.search,
                  org.apache.uima.uimacpp,
    
                  org.apache.uima.examples.tagger
                </Export-Package>
                <Private-Package>
                  org.apache.uima.analysis_engine.asb,
                  org.apache.uima.analysis_engine.asb.impl,
                  org.apache.uima.analysis_engine.impl,
                  org.apache.uima.analysis_engine.impl.compatibility,
                  org.apache.uima.analysis_engine.metadata.impl,
                  org.apache.uima.analysis_engine.service.impl,
                  org.apache.uima.resource.impl,
                  org.apache.uima.collection.base_cpm,
                  org.apache.uima.collection.impl,
                  org.apache.uima.resource.metadata.impl,
                  org.apache.uima.resource.service.impl,
                  org.apache.uima.search.impl,
                  org.apache.uima.flow.impl,
                  org.apache.uima.impl,
                  org.apache.uima.jcas.impl,
                  org.apache.uima.util.impl,
    
                  org.apache.uima.annotator,
                  org.apache.uima.examples.tagger.trainAndTest
                </Private-Package>
              </instructions>
            </configuration>
          </plugin>
        </plugins>
      </build>
    
    </project>
  3. This pom file was created for the Hidden Markov Model based tagger (HMMTagger) that is trained on the Brown corpus and ships with the UIMA examples.
    In the <Export-Package> section, delete the last item (org.apache.uima.examples.tagger) and add the main package of your UIMA engine.
  4. In the <Private-Package> section, delete the last two items (org.apache.uima.annotator, org.apache.uima.examples.tagger.trainAndTest) and add every package of your UIMA AE that does not need to be visible externally.
  5. Put uima-core-2.3.1.jar in the root of this directory.
  6. If your UIMA AE is in a jar file, put that here too. If you have source files, put them under src/main/java. In the unlikely case that you only have compiled classes, put them into target/classes.
  7. Run mvn package.
  8. Now you have your bundle in the target directory!
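
Before installing the bundle, it can be worth checking which headers maven-bundle-plugin actually generated. The following is a small sketch using only java.util.jar; the default jar path is an assumption based on the artifactId and version in the pom above.

```java
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

// Prints the OSGi headers of a built bundle so you can verify that the
// packages you expect are exported (and private ones are not).
class BundleHeaders {

    static Attributes mainAttributes(String jarPath) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            Manifest mf = jar.getManifest();
            if (mf == null) {
                throw new IOException("no META-INF/MANIFEST.MF in " + jarPath);
            }
            return mf.getMainAttributes(); // in-memory copy, usable after close
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed default location; pass another path as the first argument.
        String path = args.length > 0 ? args[0] : "target/uima-core-osgi-2.3.1.jar";
        Attributes attrs = mainAttributes(path);
        for (String header : new String[] {"Bundle-SymbolicName", "Export-Package", "Import-Package"}) {
            System.out.println(header + ": " + attrs.getValue(header));
        }
    }
}
```

Note that bnd usually adds version and uses: directives to the Export-Package entries, so the printed value is longer than the list in the pom.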

You can download the bundle that contains the HMMTagger from here: http://pedia2.sztaki.hu/stanbol/bundles/uima-core-osgi-2.3.1.jar

The folder that contains the pom project is here: http://pedia2.sztaki.hu/stanbol/sources/uima-bundle-manual/

Modifying the UIMALocal Bundle

In the previous section we created a bundle that exports the necessary UIMA packages. I have written another bundle, UIMALocal, that can be configured to load the UIMA engine described in a descriptor XML file, use the engine to analyse Stanbol content, turn the UIMA annotations into CasLight annotations and store them in the ContentItem. These can then be converted to RDF triples using the UIMAToTriples bundle, which is also used in the UIMARemote setting described here: https://blog.iks-project.eu/uima-apache-stanbol-integration-2/

UIMALocal was specifically designed for using UIMA engines in a configurable way. Ideally, you would not have to make any modifications at all, only deploy and configure the bundle. However, as I explained at the beginning of the post, this cannot be achieved: the bundle manifest has to import the packages that contain the specific UIMA AE types. Unfortunately, for this minor thing you have to make a small modification to the UIMALocal project and re-bundle it.

  1. Download the UIMALocal project from here: http://pedia2.sztaki.hu/stanbol/sources/UIMALocal.tar.gz
  2. The tar file contains a NetBeans project, so you can open it in NetBeans, but you should also be able to make the modifications manually and then run mvn package.
  3. Edit the hu.sztaki.uimalocal.Imports class: import at least one class from every package of your UIMA project. When you re-package the project, this results in corresponding entries in the Import-Package header of the bundle manifest.
  4. Run mvn package. This compiles the project and builds the bundle.
  5. The resulting bundle is now ready to be deployed.
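
For an AE with many packages, picking "one class per package" by hand gets tedious. Below is a hypothetical helper (not part of UIMALocal) that takes fully qualified top-level class names and prints a field you could paste into the Imports class. One detail worth knowing: bnd computes Import-Package from the compiled bytecode, and an import statement that nothing uses leaves no trace in the .class file, so the sketch references each chosen class through a class literal. The field name UIMA_AE_CLASSES and the example class names are made up.

```java
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: one referenced class per package, so that every
// package shows up in the bytecode that bnd analyses.
class ImportsHelper {

    static String packageOf(String className) {
        int dot = className.lastIndexOf('.');
        return dot < 0 ? "" : className.substring(0, dot);
    }

    // First class seen for each package, ordered by package name.
    static Collection<String> onePerPackage(List<String> classNames) {
        Map<String, String> firstByPackage = new TreeMap<>();
        for (String cn : classNames) {
            String pkg = packageOf(cn);
            // Classes in the unnamed package cannot be referenced from hu.sztaki.uimalocal.
            if (!pkg.isEmpty()) {
                firstByPackage.putIfAbsent(pkg, cn);
            }
        }
        return firstByPackage.values();
    }

    // Java source for a field that references one class literal per package.
    static String referencesField(List<String> classNames) {
        StringBuilder sb = new StringBuilder("static final Class<?>[] UIMA_AE_CLASSES = {\n");
        for (String cn : onePerPackage(classNames)) {
            sb.append("    ").append(cn).append(".class,\n");
        }
        return sb.append("};\n").toString();
    }

    public static void main(String[] args) {
        // Illustrative class names, not a listing of any real AE jar.
        List<String> classes = List.of(
                "com.example.ae.MyAnnotator",
                "com.example.ae.Helper",
                "com.example.ae.types.Token");
        System.out.print(referencesField(classes));
    }
}
```

Java allows the trailing comma in an array initializer, which keeps the generator simple.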

Alternatively, if you do not want to do any of the above, you can unzip UIMALocal-0.2.0.jar, edit META-INF/MANIFEST.MF and add the packages of your specific UIMA bundle to the Import-Package header. In this case you have to take care of the proper line breaking of Java manifest files: a line cannot be longer than 72 bytes, and a longer value continues on the next line, which must start with a single space.
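
Rather than wrapping the lines by hand, you can let java.util.jar.Manifest do it. This sketch reads existing manifest text, appends packages to Import-Package and writes it back; Manifest.write produces lines of at most 72 bytes with space-prefixed continuation lines. Unzipping and re-packing the jar are left out, and the method name addImports is an assumption of this sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

// Appends packages to Import-Package and lets the JDK handle the 72-byte limit.
class ManifestWrap {

    static String addImports(String manifestText, String... packages) throws IOException {
        // Manifest text must end with a newline to be parsed correctly.
        Manifest mf = new Manifest(new ByteArrayInputStream(manifestText.getBytes(StandardCharsets.UTF_8)));
        Attributes main = mf.getMainAttributes();
        String existing = main.getValue("Import-Package");
        StringBuilder sb = new StringBuilder(existing == null ? "" : existing);
        for (String p : packages) {
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(p);
        }
        main.putValue("Import-Package", sb.toString());
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        mf.write(out); // wraps long headers into continuation lines
        return out.toString("UTF-8");
    }

    public static void main(String[] args) throws IOException {
        String original = "Manifest-Version: 1.0\nBundle-SymbolicName: UIMALocal\n"
                + "Import-Package: org.osgi.framework\n";
        System.out.print(addImports(original,
                "org.apache.uima", "org.apache.uima.examples.tagger", "org.apache.uima.jcas.tcas"));
    }
}
```

Parsing the result again with new Manifest(...) joins the continuation lines, so the Import-Package value round-trips unchanged.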

Installing and Configuring the bundles

At this point the following steps remain:

  1. Install the uima-core-osgi bundle, which now contains uima-core plus the classes of your UIMA AE.
  2. Install the UIMALocal bundle.
  3. Configure the UIMALocal bundle. Its configuration is somewhat similar to that of UIMARemote:
    source name: this is set as the source name of the annotations coming out of this bundle. You can refer to these annotations by this source name in the UIMAToTriples configuration.
    UIMA descriptor file path: the location of the UIMA descriptor file to load.
    ContentPart Uri Reference: the UriRef under which every annotation is stored in the ContentItem.
    Supported Mime Types: the mime type this UIMA adapter should support.
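
To picture how these four settings fit together, here is a hypothetical model of them. The field names are illustrative and are not UIMALocal's real configuration property keys. The accepts method shows one plausible way an enhancement engine could decide whether a content item matches the supported mime types; ignoring parameters such as "; charset=UTF-8" is an assumption of this sketch, not documented UIMALocal behaviour.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical model of the UIMALocal settings listed above.
final class UimaLocalSettings {
    final String sourceName;          // referenced later in the UIMAToTriples config
    final String descriptorPath;      // UIMA descriptor XML to load
    final String contentPartUri;      // where the annotations are stored in the ContentItem
    final List<String> supportedMimeTypes;

    UimaLocalSettings(String sourceName, String descriptorPath,
                      String contentPartUri, List<String> supportedMimeTypes) {
        this.sourceName = sourceName;
        this.descriptorPath = descriptorPath;
        this.contentPartUri = contentPartUri;
        this.supportedMimeTypes = List.copyOf(supportedMimeTypes);
    }

    // True if content with this Content-Type should be passed to the engine.
    // Parameters after ';' are ignored; comparison is case-insensitive.
    boolean accepts(String contentType) {
        String base = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        for (String supported : supportedMimeTypes) {
            if (supported.trim().toLowerCase(Locale.ROOT).equals(base)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        UimaLocalSettings s = new UimaLocalSettings(
                "example-source", "/path/to/descriptor.xml",   // placeholder values
                "urn:example:uima-annotations", List.of("text/plain"));
        System.out.println(s.accepts("text/plain; charset=UTF-8"));
        System.out.println(s.accepts("text/html"));
    }
}
```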

Here is an Example Configuration:

A corresponding UIMAToTriples configuration reads the annotations provided by UIMALocal and keeps the noun-typed words (posTag = n.*), which are turned into RDF in the output.
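
The filter posTag = n.* is a regular expression over the tagger's part-of-speech tags. The following sketch illustrates the rule in isolation; treating it as a whole-string match (Matcher.matches) is an assumption about UIMAToTriples, and the Token class and sample tags are illustrative only. With the Brown tagset used by the HMMTagger, tags starting with "n" are the noun tags, such as "nn", "nns" and "np".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Sketch of a "posTag = n.*" filter: keep a token only if its whole POS tag
// matches the regular expression.
class PosTagFilter {

    static final class Token {
        final String text;
        final String posTag;

        Token(String text, String posTag) {
            this.text = text;
            this.posTag = posTag;
        }
    }

    static List<Token> keepMatching(List<Token> tokens, String posRegex) {
        Pattern p = Pattern.compile(posRegex);
        List<Token> kept = new ArrayList<>();
        for (Token t : tokens) {
            if (t.posTag != null && p.matcher(t.posTag).matches()) {
                kept.add(t);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Token> tokens = List.of(
                new Token("The", "at"), new Token("engines", "nns"),
                new Token("run", "vb"), new Token("Stanbol", "np"));
        for (Token t : keepMatching(tokens, "n.*")) {
            System.out.println(t.text + "/" + t.posTag);
        }
    }
}
```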

 

Trying it out

The configuration above is the one that is deployed here right now: http://pedia2.sztaki.hu:9090/enhancer

If you enter an English text, you will get RDF graphs like this:

<urn:enhancement-de51d84d-8e96-d6c0-2b0c-2cfa3ec6c1c0>
      a       <http://fise.iks-project.eu/ontology/TextAnnotation> , <http://fise.iks-project.eu/ontology/Enhancement> ;
      <http://fise.iks-project.eu/ontology/end>
              "674"^^<http://www.w3.org/2001/XMLSchema#int> ;
      <http://fise.iks-project.eu/ontology/extracted-from>
              <urn:content-item-sha1-82082a61141a7914cc68b08180f8992ebe660302> ;
      <http://fise.iks-project.eu/ontology/start>
              "665"^^<http://www.w3.org/2001/XMLSchema#int> ;
      <http://purl.org/dc/terms/created>
              "2012-08-28T21:34:41.734Z"^^<http://www.w3.org/2001/XMLSchema#dateTime> ;
      <http://purl.org/dc/terms/creator>
              "hu.sztaki.uimatotriples.UIMAToTriples"^^<http://www.w3.org/2001/XMLSchema#string> ;
      <http://purl.org/dc/terms/type>
              <Noun> ;
      <sso:posTag> "nns" .
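
The fise:start and fise:end values locate the annotated word in the submitted text. The sketch below recovers that span, assuming the offsets are Java char offsets with an exclusive end (the same convention as UIMA's begin/end); please verify this against your own output before relying on it. SelectedText is a made-up name.

```java
// Sketch: recover the annotated span from the plain text of the content item,
// assuming fise:start/fise:end are char offsets with an exclusive end.
class SelectedText {

    static String select(String plainText, int start, int end) {
        if (start < 0 || end > plainText.length() || start > end) {
            throw new IllegalArgumentException("offsets " + start + ".." + end + " out of range");
        }
        return plainText.substring(start, end);
    }

    public static void main(String[] args) {
        String text = "Apache Stanbol runs UIMA engines.";
        int start = text.indexOf("engines");
        System.out.println(select(text, start, start + "engines".length()));
    }
}
```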

I hope you will enjoy using UIMALocal (in cases where UIMARemote does not fit your needs).
