Apache Stanbol Stress Test Tool

The Apache Stanbol Enhancer provides a RESTful API for extracting information from parsed content. Every time a user sends a document, Stanbol applies its semantic engines to analyze the content. The extracted information is represented as RDF and returned in the response to the enhancement request. The following figure illustrates this idea.

The Stanbol Enhancer applies Semantic Engines to parsed Content and returns extracted Information as RDF

But what happens if many users want to enhance their documents at the same time, or if you need to batch process a large number of documents? The new Apache Stanbol Stress Test Tool lets you test exactly that by sending multiple concurrent enhancement requests to your Stanbol server.

This tool is useful for Stanbol users who want to check whether their Stanbol installation can cope with the expected number of requests. It can also be used to check and optimize specific Enhancement Chain configurations. For developers, it is a core tool for stress testing and optimizing enhancement engine implementations. Finally, we expect it to be handy for reporting, reproducing and fixing multi-threading issues such as STANBOL-669.

Usage

This utility is implemented as an Apache Stanbol integration test. For detailed information on how to use and configure it, please have a look at its documentation.

To use the tool, you first need to check out and build Stanbol. After that, change to the {stanbol-source}/integration-tests directory and call

    mvn -o test -Dtest.server.url={stanbol-server} -Dtest=MultiThreadedTest

This will run the default tests against the default Enhancement Chain of the {stanbol-server}. You can also customize the test by passing additional parameters as '-D{parameter}={value}' arguments:

  • stanbol.it.multithreadtest.chain: The name of the Enhancement Chain to test
  • stanbol.it.multithreadtest.data: The data used for testing. See the documentation for supported formats.
  • stanbol.it.multithreadtest.threads: The number of concurrent requests (default 5)
  • stanbol.it.multithreadtest.requests: The maximum number of requests (default 500)

So a typical call to the tool might look like

    mvn -o test -Dtest=MultiThreadedTest \
        -Dstanbol.it.multithreadtest.data=/stanbol/test/data/stanbol-test-data.txt.gz \
        -Dstanbol.it.multithreadtest.requests=10000 \
        -Dstanbol.it.multithreadtest.threads=20 \
        -Dstanbol.it.multithreadtest.chain=myChain \
        -Dtest.server.url=http://www.example.org:8080/stanbol
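
To see how the server behaves as concurrency grows, a call like the one above can be wrapped in a small loop. The following is only a sketch and not part of the tool: the function names stress_cmd and sweep, the STANBOL_URL variable and the fixed request count of 1000 are assumptions made for illustration. Only the -D parameters are taken from the list above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Stanbol): prints the Maven command line
# for one stress-test run. STANBOL_URL and its default are assumptions.
STANBOL_URL="${STANBOL_URL:-http://localhost:8080}"

stress_cmd() {
    # $1 = number of concurrent threads, $2 = maximum number of requests
    printf 'mvn -o test -Dtest=MultiThreadedTest -Dtest.server.url=%s -Dstanbol.it.multithreadtest.threads=%s -Dstanbol.it.multithreadtest.requests=%s\n' \
        "$STANBOL_URL" "$1" "$2"
}

sweep() {
    # Print one command per thread count given as argument.
    # The request count of 1000 is an arbitrary illustrative value.
    for n in "$@"; do
        stress_cmd "$n" 1000
    done
}
```

Calling sweep 5 10 20 prints three commands without running anything, so you can review them first. To execute them, run sweep 5 10 20 | sh from the {stanbol-source}/integration-tests directory.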

Running a Test

If you run the test as described in the previous section you will get results similar to those in the following listing.

First you get some information about the configuration the test is using.

   -------------------------------------------------------
    T E S T S
   -------------------------------------------------------
   Running org.apache.stanbol.enhancer.it.MultiThreadedTest
       0 StanbolTestBase - test.server.url is set: not starting server jar (http://localhost:8080)
      15 MultiThreadedTest - Read Testdata from '10k_long_abstracts_en.nt.bz2'
      16 MultiThreadedTest -  ... init via Classpath
      17 MultiThreadedTest -   - InputStream: BufferedInputStream
     202 MultiThreadedTest -   - Media-Type: text/rdf+nt
    3070 MultiThreadedTest - Testing default Enhancement Chain

In this case the test uses the included default dataset of 10,000 DBpedia.org abstracts, compressed with bzip2 and encoded as N-Triples. If you provide your own test data, this information lets you check that the tool reads the data correctly.

The next section of the log provides information about the initialization of the connection to the configured Stanbol server. The tool will wait up to 180 seconds for the server to become available.

    3071 StanbolTestBase - Will wait up to 180 seconds for server to become ready
    3635 StanbolTestBase - Got expected content for all configured requests, server is ready
    3651 MultiThreadedTest - Enhancement engines checked for '/enhancer?executionmetadata=true', all present

After the initialization, the log provides updates about the ongoing test. This lets you track the progress of long-running tests.

    3680 MultiThreadedTest - Start Multi Thread testing of max. 500 requests using 5 threads
    3688 MultiThreadedTest - Iterate over values of property http://dbpedia.org/ontology/abstract
    3703 MultiThreadedTest -   ... sent 0 Requests (0 finished, 1 pending, 0 failed
    4076 MultiThreadedTest -   ... sent 100 Requests (2 finished, 99 pending, 0 failed
   10109 MultiThreadedTest -   ... sent 200 Requests (101 finished, 100 pending, 0 failed
   15213 MultiThreadedTest -   ... sent 300 Requests (201 finished, 100 pending, 0 failed
   20784 MultiThreadedTest -   ... sent 400 Requests (301 finished, 100 pending, 0 failed
   25879 MultiThreadedTest - > All 500 requests sent!
   25880 MultiThreadedTest -   ... wait for all requests to complete
   28881 MultiThreadedTest -   ... 449 finished, 51 pending, 0 failed
   31883 MultiThreadedTest -   ... 500 finished, 0 pending, 0 failed

After all tests are completed, the tool provides statistics about the performance of the Stanbol Enhancer, the Enhancement Chain and each configured Enhancement Engine.

   31883 MultiThreadedTest - Multi Thread testing of 500 requests (failed: 0) using 5 threads completed
   31883 MultiThreadedTest - Statistics:
   31883 MultiThreadedTest - Chain:
   31883 MultiThreadedTest -   Round Trip Time (Server + Transfer + Client):
   31883 MultiThreadedTest -      max: 1140ms | min: 33ms | avr: 265ms over 500 requests
   31883 MultiThreadedTest -   processing time (server side)
   31883 MultiThreadedTest -      max: 1109ms | min: 23ms | avr: 241ms over 500 requests
   31884 MultiThreadedTest - Enhancement Engines
   31884 MultiThreadedTest -   dbpediaLinking: max: 325ms | min: 0ms | avr: 34ms over 500 requests
   31884 MultiThreadedTest -   entityhubExtraction: max: 527ms | min: 4ms | avr: 96ms over 500 requests
   31884 MultiThreadedTest -   langid: max: 74ms | min: 7ms | avr: 19ms over 500 requests
   31884 MultiThreadedTest -   ner: max: 535ms | min: 1ms | avr: 87ms over 500 requests
   31884 MultiThreadedTest -   tika: max: 51ms | min: 0ms | avr: 4ms over 500 requests
   Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 31.952 sec
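
If you save this output to a file, the per-engine averages are easy to pull out for comparison between runs. The following sketch assumes that the log lines have exactly the layout shown in the listing above. The function name engine_averages is hypothetical and not part of the tool.

```shell
#!/bin/sh
# Hypothetical helper: lists "average-ms engine-name" for every engine line,
# slowest engine first. Reads the named log files, or stdin if none are given.
engine_averages() {
    awk '
        # Engine lines look like:
        #   31884 MultiThreadedTest -   ner: max: 535ms | min: 1ms | avr: 87ms over 500 requests
        # The chain statistics have no "name:" field before "max:", so they are skipped.
        $2 == "MultiThreadedTest" && $5 == "max:" && $11 == "avr:" {
            name = $4; sub(/:$/, "", name)   # strip the trailing colon
            avg  = $12; sub(/ms$/, "", avg)  # strip the unit
            print avg, name
        }
    ' "$@" | sort -rn
}
```

For the run shown above, engine_averages would list entityhubExtraction (96) and ner (87) first, followed by dbpediaLinking (34), langid (19) and tika (4), which shows at a glance where most of the processing time goes.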

If you run this tool to validate the performance of the Stanbol Enhancer and/or specific Enhancement Engines, you should warm up the Stanbol Enhancer first. This gives the JVM time to optimize the execution of Java bytecode and lets Apache Solr warm up its caches. In addition, Stanbol uses lazy initialization, so the first request will take considerably longer than later ones.
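
One simple way to do this is to run the test twice and only look at the statistics of the second run. The sketch below illustrates the idea. The function names, the MVN and STANBOL_URL variables and the request counts in the usage example are assumptions for illustration, not something the tool prescribes.

```shell
#!/bin/sh
# Hypothetical wrapper: one discarded warm-up run, then the measured run.
MVN="${MVN:-mvn}"                                    # can be overridden for a dry run
STANBOL_URL="${STANBOL_URL:-http://localhost:8080}"  # assumed default

run_test() {
    # $1 = maximum number of requests for this run
    "$MVN" -o test -Dtest=MultiThreadedTest \
        -Dtest.server.url="$STANBOL_URL" \
        -Dstanbol.it.multithreadtest.requests="$1"
}

warm_then_measure() {
    # $1 = warm-up requests, $2 = measured requests
    # The output of the warm-up run is thrown away; abort if it fails.
    run_test "$1" > /dev/null || return 1
    run_test "$2"
}
```

Run from the {stanbol-source}/integration-tests directory, warm_then_measure 200 500 would first send up to 200 requests whose statistics are ignored and then perform the actual measurement with up to 500 requests. How many warm-up requests are enough depends on your setup.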

Further Information

The full documentation of this tool is available on the Apache Stanbol website. If you have any questions, feel free to ask on stanbol-dev@incubator.apache.org.
