The Apache Stanbol Enhancer provides a RESTful API for extracting information from parsed content. Every time a user sends a document, Stanbol applies its semantic engines to analyze the content. The extracted information is represented as RDF and returned in the response to the enhancement request. The following figure illustrates this idea.

The Stanbol Enhancer applies Semantic Engines to parsed Content and returns extracted Information as RDF
But what happens if a lot of users want to enhance their documents, or if one needs to batch process a lot of documents? The new Apache Stanbol Stress Test Tool lets you test exactly that by sending multiple concurrent enhancement requests to your Stanbol server.
This tool is useful for Stanbol users who want to check whether their Stanbol installation can cope with the expected number of requests. It can also be used to check and optimize specific Enhancement Chain configurations. For developers, this tool is central to stress testing and optimizing enhancement engine implementations. Finally, we expect that it will also be handy for reporting, replicating and fixing multi-threading related issues such as STANBOL-669.
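To make the idea of concurrent enhancement requests more concrete, here is a minimal Python sketch. It is not the tool's implementation, only an illustration of the pattern: a fixed pool of worker threads posts texts to the Enhancer and the finished and failed requests are counted. The endpoint URL `http://localhost:8080/enhancer`, the `text/plain` and `text/turtle` media types and the function names `enhance` and `stress` are assumptions made for this example.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor, as_completed

# Assumed endpoint of a locally running Stanbol server; adjust to your setup.
ENHANCER_URL = "http://localhost:8080/enhancer"


def enhance(text, url=ENHANCER_URL, timeout=60):
    """POST plain text to the Enhancer and return the RDF response body."""
    request = urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain", "Accept": "text/turtle"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


def stress(texts, threads=5, send=enhance):
    """Send every text through `send` using `threads` worker threads.

    Returns a (finished, failed) tuple of request counts.
    """
    finished = failed = 0
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = [pool.submit(send, text) for text in texts]
        for future in as_completed(futures):
            # A request counts as failed if `send` raised any exception.
            if future.exception() is None:
                finished += 1
            else:
                failed += 1
    return finished, failed
```

With a server running at the assumed URL, a call such as `stress(["Paris is the capital of France."] * 20, threads=5)` would send 20 requests over 5 threads. The `send` parameter exists so that the request function can be swapped out, for example for a stub when no server is available.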
Usage
This utility is implemented as an Apache Stanbol Integration Test. For detailed information on how to use and configure it, please have a look at its documentation. To use the tool you need to check out and build Stanbol first. After that, change to the {stanbol-source}/integration-tests directory and call
mvn -o test -Dtest.server.url={stanbol-server} -Dtest=MultiThreadedTest
This will run the default tests against the default Enhancement Chain of the {stanbol-server}. You can also customize the utilities by providing additional parameters using ‘-D{parameter}={value}’ as arguments:
stanbol.it.multithreadtest.chain: The name of the Enhancement Chain to test
stanbol.it.multithreadtest.data: The data used for testing. See the documentation for supported formats.
stanbol.it.multithreadtest.threads: The number of concurrent requests (default 5)
stanbol.it.multithreadtest.requests: The maximum number of requests (default 500)
So a typical call to the tool might look like
mvn -o test -Dtest=MultiThreadedTest \
-Dstanbol.it.multithreadtest.data=/stanbol/test/data/stanbol-test-data.txt.gz \
-Dstanbol.it.multithreadtest.requests=10000 \
-Dstanbol.it.multithreadtest.threads=20 \
-Dstanbol.it.multithreadtest.chain=myChain \
-Dtest.server.url=http://www.example.org:8080/stanbol
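If you want to repeat such a call with different settings, for example to compare thread counts, it can help to assemble the argument list in a script. The following Python sketch only builds the Maven command from the parameters listed above and prints it. The helper name `build_command` is hypothetical, and actually running the command (for instance with `subprocess.run` from the integration-tests directory) is left out.

```python
PREFIX = "stanbol.it.multithreadtest."


def build_command(server_url, **params):
    """Return the mvn argument list for one stress test run.

    Keyword arguments (chain, data, threads, requests) are mapped to
    -Dstanbol.it.multithreadtest.{name}={value}. Parameters set to None
    are skipped, so the tool falls back to its defaults for them.
    """
    command = ["mvn", "-o", "test", "-Dtest=MultiThreadedTest"]
    for name, value in params.items():
        if value is not None:
            command.append(f"-D{PREFIX}{name}={value}")
    command.append(f"-Dtest.server.url={server_url}")
    return command


# Print one command per thread count to compare configurations.
for threads in (5, 10, 20):
    print(" ".join(build_command(
        "http://www.example.org:8080/stanbol",
        chain="myChain", requests=10000, threads=threads)))
```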
Running a Test
If you run the test as described in the previous section you will get results similar to those in the following listing.
First you get some information about the configuration the test is using.
-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running org.apache.stanbol.enhancer.it.MultiThreadedTest
0 StanbolTestBase - test.server.url is set: not starting server jar (http://localhost:8080)
15 MultiThreadedTest - Read Testdata from '10k_long_abstracts_en.nt.bz2'
16 MultiThreadedTest - ... init via Classpath
17 MultiThreadedTest - - InputStream: BufferedInputStream
202 MultiThreadedTest - - Media-Type: text/rdf+nt
3070 MultiThreadedTest - Testing default Enhancement Chain
In this case the test uses the included default dataset of 10000 DBpedia.org abstracts, encoded as N-Triples and compressed with bzip2. If you provide your own test data, this information allows you to check that the data is read correctly by the tool.
The next section of the log provides information about the initialization of the connection to the configured Stanbol server. The tool will wait up to 180 seconds for the server to become available.
3071 StanbolTestBase - Will wait up to 180 seconds for server to become ready
3635 StanbolTestBase - Got expected content for all configured requests, server is ready
3651 MultiThreadedTest - Enhancement engines checked for '/enhancer?executionmetadata=true', all present
After the initialization, the log provides updates about the ongoing test. This lets you track the progress of long-running tests.
3680 MultiThreadedTest - Start Multi Thread testing of max. 500 requests using 5 threads
3688 MultiThreadedTest - Iterate over values of property http://dbpedia.org/ontology/abstract
3703 MultiThreadedTest - ... sent 0 Requests (0 finished, 1 pending, 0 failed
4076 MultiThreadedTest - ... sent 100 Requests (2 finished, 99 pending, 0 failed
10109 MultiThreadedTest - ... sent 200 Requests (101 finished, 100 pending, 0 failed
15213 MultiThreadedTest - ... sent 300 Requests (201 finished, 100 pending, 0 failed
20784 MultiThreadedTest - ... sent 400 Requests (301 finished, 100 pending, 0 failed
25879 MultiThreadedTest - > All 500 requests sent!
25880 MultiThreadedTest - ... wait for all requests to complete
28881 MultiThreadedTest - ... 449 finished, 51 pending, 0 failed
31883 MultiThreadedTest - ... 500 finished, 0 pending, 0 failed
After all tests are completed the Tool provides statistics about the performance of the Stanbol Enhancer, the Enhancement Chain and each configured Enhancement Engine.
31883 MultiThreadedTest - Multi Thread testing of 500 requests (failed: 0) using 5 threads completed
31883 MultiThreadedTest - Statistics:
31883 MultiThreadedTest - Chain:
31883 MultiThreadedTest - Round Trip Time (Server + Transfer + Client):
31883 MultiThreadedTest - max: 1140ms | min: 33ms | avr: 265ms over 500 requests
31883 MultiThreadedTest - processing time (server side)
31883 MultiThreadedTest - max: 1109ms | min: 23ms | avr: 241ms over 500 requests
31884 MultiThreadedTest - Enhancement Engines
31884 MultiThreadedTest - dbpediaLinking: max: 325ms | min: 0ms | avr: 34ms over 500 requests
31884 MultiThreadedTest - entityhubExtraction: max: 527ms | min: 4ms | avr: 96ms over 500 requests
31884 MultiThreadedTest - langid: max: 74ms | min: 7ms | avr: 19ms over 500 requests
31884 MultiThreadedTest - ner: max: 535ms | min: 1ms | avr: 87ms over 500 requests
31884 MultiThreadedTest - tika: max: 51ms | min: 0ms | avr: 4ms over 500 requests
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 31.952 sec
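If you want to compare several runs, the per-engine statistics can be pulled out of the log with a few lines of script. The following Python sketch assumes the line format shown in the listing above (`{engine}: max: ..ms | min: ..ms | avr: ..ms over .. requests`). The names `engine_stats` and `slowest_engine` are made up for this example and are not part of the tool.

```python
import re

# Matches engine lines such as
# "31884 MultiThreadedTest - ner: max: 535ms | min: 1ms | avr: 87ms over 500 requests".
# The chain lines ("- max: 1140ms | ...") have no engine name and do not match.
ENGINE_LINE = re.compile(
    r"-\s+(?P<engine>\w+):\s+max:\s+(?P<max>\d+)ms\s+\|\s+min:\s+(?P<min>\d+)ms"
    r"\s+\|\s+avr:\s+(?P<avr>\d+)ms\s+over\s+(?P<requests>\d+)\s+requests"
)

# Lines copied from the statistics listing above.
SAMPLE = """\
31883 MultiThreadedTest - processing time (server side)
31883 MultiThreadedTest - max: 1109ms | min: 23ms | avr: 241ms over 500 requests
31884 MultiThreadedTest - Enhancement Engines
31884 MultiThreadedTest - dbpediaLinking: max: 325ms | min: 0ms | avr: 34ms over 500 requests
31884 MultiThreadedTest - entityhubExtraction: max: 527ms | min: 4ms | avr: 96ms over 500 requests
31884 MultiThreadedTest - langid: max: 74ms | min: 7ms | avr: 19ms over 500 requests
31884 MultiThreadedTest - ner: max: 535ms | min: 1ms | avr: 87ms over 500 requests
31884 MultiThreadedTest - tika: max: 51ms | min: 0ms | avr: 4ms over 500 requests
"""


def engine_stats(log_text):
    """Return {engine: {"max", "min", "avr", "requests"}} with integer values."""
    stats = {}
    for match in ENGINE_LINE.finditer(log_text):
        stats[match["engine"]] = {
            key: int(match[key]) for key in ("max", "min", "avr", "requests")
        }
    return stats


def slowest_engine(stats):
    """Return the name of the engine with the highest average time."""
    return max(stats, key=lambda name: stats[name]["avr"])


stats = engine_stats(SAMPLE)
print(slowest_engine(stats))                     # entityhubExtraction
print(sum(s["avr"] for s in stats.values()))     # 240
```

For the run shown here the five engine averages add up to 240ms, close to the 241ms average server-side processing time reported for the chain.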
If you run this tool to measure the performance of the Stanbol Enhancer or of specific Enhancement Engines, you should warm up the Stanbol Enhancer first. This gives the JVM time to optimize the execution of Java bytecode and lets the caches of Apache Solr warm up. In addition, Stanbol uses lazy initialization, so the first request will need a lot of additional time.
Further Information
The full documentation of this tool is available on the Apache Stanbol webpage. If you have any questions, feel free to ask on stanbol-dev@incubator.apache.org.