Using Stanbol in Alfresco: Semantic Search and Knowledge Management

At Alfresco DevCon 2012 in Berlin, Zaizi presented “Alfresco & the Semantic Web”. The presentation focused on how semantic technologies can be used to enable the discovery of information within an Alfresco repository and improve the user experience.

The agenda of the presentation:

  • Introduction to Semantic Web and Semantic Technologies: I introduced the Semantic Web and Linked Data, explained the difference between these two concepts, and discussed the importance of describing data.
  • IKS Project: Zaizi were one of the early adopters of IKS within Alfresco. I also explained the different IKS projects: IKS, Apache Stanbol and VIE.
  • ECKM – Enterprise Content and Knowledge Management: This is the internal project name at Zaizi for the integration of ECM with Semantic Web technologies. The functionalities presented were language detection, entity extraction, semantic annotations and intelligent search.

I got great feedback on this presentation. Below, I have documented all the functionalities I demonstrated so you can also enjoy the “Alfresco & the Semantic Web” demo. I’ll write a separate post explaining Semantic Web technologies and the IKS technologies.

Language detection

International companies with offices worldwide create documents in multiple languages. With the language detection functionality, they can auto-detect the language in which each document is written so that it can be indexed correctly by the search engine. The Apache Solr search engine can use this information to index content correctly, build facets by language and allow multilingual search. This enables us to provide much the same functionality as Google’s “Pages in English” filter.

Some documents even have more than one language (e.g. a contract in two languages).
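As a rough illustration of faceting by language, here is a minimal Python sketch that builds a Solr query URL. The `q`, `fq`, `facet`, `facet.field` and `wt` parameters are standard Solr request parameters, but the core URL and the `language` field name are placeholders I made up for this example, not the names used in the demo.

```python
from urllib.parse import urlencode

# Assumptions (not from the presentation): the Solr core URL and the
# "language" field name are placeholders for illustration only.
SOLR_SELECT_URL = "http://localhost:8983/solr/alfresco/select"
LANGUAGE_FIELD = "language"


def build_language_query(text, language=None):
    """Return a Solr select URL that facets results by detected language.

    If `language` is given, results are also restricted to that language
    with a filter query, similar to a "pages in English" option.
    """
    params = [
        ("q", text),
        ("facet", "true"),
        ("facet.field", LANGUAGE_FIELD),
        ("wt", "json"),
    ]
    if language is not None:
        params.append(("fq", f"{LANGUAGE_FIELD}:{language}"))
    return f"{SOLR_SELECT_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_language_query("contract", language="en"))
```

A document written in two languages would simply carry two values in a multi-valued language field, so it shows up under both facets.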

Entity extraction

The metadata for a document or piece of content holds only a small percentage of the useful information within that document. A lot of information is held within the unstructured body of the document. With the entity extraction functionality, we can identify useful data within the content and annotate the document with additional metadata. Entities can be names of people, places or organisations mentioned within the body of the document, but also custom entities from specific knowledge domains. Using this extracted information, we can help users navigate and discover documents in new ways through richer user interfaces.
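To give an idea of what sending content for entity extraction can look like, here is a minimal Python sketch that posts plain text to an Apache Stanbol enhancer endpoint. The address below assumes a Stanbol instance running locally on port 8080, and the helper names are my own; this is not the code of the Alfresco integration shown in the demo.

```python
import urllib.request

# Assumption: a Stanbol instance reachable at this address; adjust as needed.
STANBOL_ENHANCER_URL = "http://localhost:8080/enhancer"


def build_enhancer_request(text, url=STANBOL_ENHANCER_URL):
    """Build (but do not send) a POST request asking Stanbol to enhance text."""
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        headers={
            "Content-Type": "text/plain; charset=utf-8",
            # Ask for the enhancement results serialised as JSON.
            "Accept": "application/json",
        },
        method="POST",
    )


def enhance(text):
    """Send the request and return the raw response body as a string."""
    with urllib.request.urlopen(build_enhancer_request(text)) as response:
        return response.read().decode("utf-8")
```

Calling `enhance("Paris is the capital of France.")` against a running server returns the enhancement results, which can then be mapped to metadata on the document.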

Semantic Annotations

This functionality uses VIE to write HTML documents and annotate them with entities from DBpedia.

Intelligent Search

Intelligent Search aims to add features to search engines that improve the search experience, such as filtering results with semantic facets and suggesting disambiguated terms. With facets, users can refine their searches to get what they are looking for based on the underlying knowledge in their content. With term disambiguation, the system can guess what the user is searching for and suggest more specific search terms. Using Stanbol disambiguation engines, we can properly annotate documents that contain ambiguous concepts and entities. Because each concept is identified by its semantic category, the user can then filter the results by selecting the correct meaning for their query.
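The idea of filtering by meaning can be sketched in a few lines of Python. Everything here is hypothetical: the sample annotations, the category names and the function names are invented for illustration and do not come from Stanbol or from the demo.

```python
from collections import defaultdict

# Hypothetical annotations, as they might be stored on documents after
# enhancement: (document id, entity label, semantic category).
ANNOTATIONS = [
    ("doc-1", "Paris", "Place"),
    ("doc-2", "Paris", "Person"),
    ("doc-3", "Paris", "Place"),
    ("doc-4", "Berlin", "Place"),
]


def suggest_meanings(term, annotations):
    """Map each semantic category of `term` to the documents that use it."""
    meanings = defaultdict(list)
    for doc_id, label, category in annotations:
        if label.lower() == term.lower():
            meanings[category].append(doc_id)
    return dict(meanings)


def filter_by_meaning(term, category, annotations):
    """Return only the documents where `term` has the chosen category."""
    return suggest_meanings(term, annotations).get(category, [])


if __name__ == "__main__":
    print(suggest_meanings("Paris", ANNOTATIONS))
    print(filter_by_meaning("Paris", "Person", ANNOTATIONS))
```

The categories returned by `suggest_meanings` are what a search interface could offer as disambiguation suggestions, and `filter_by_meaning` corresponds to the user picking one of them.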

Through the following demo you’ll see how Zaizi’s Intelligent Search approach has been implemented in Alfresco.

My Presentation

Here are the slides from my presentation.

Author: Wernher

Wernher Behrendt is a senior researcher at Salzburg Research and the coordinator of the IKS project.
