This is a summary of two hectic days spent in Leipzig with the cleverest minds of the European Semantic Web community, and it's also my personal answer to an interesting post that appeared on semanticweb.com a few weeks ago (written by Jennifer Zaino and titled “On What Shores Will Semantic Tech Be Better Commercialized?”). Before getting to the answer, though: what did we learn?
DATA is NOISY
…and sometimes even loud. Cleansing it, manipulating it and presenting it to the end user systematically and efficiently requires a lot of hard work. As adoption of Big Data and Semantic Web technologies starts to take off in Europe, new challenges arise that will keep the academic community busy for many years to come. What are these challenges, and who's leading the research?
Relevance of Linked Data Facts is a big issue
As the number of entities (conceptual representations of real-world things) and the data around them increase, there is a need for consolidated models that extract the relevant information and can summarize it. Is it more relevant to display that Albert Einstein was an extraordinary physicist, or that he was a fervent vegetarian? This goes hand in hand with the problem of properly identifying the context: is my research about quantum physics, or am I compiling a list of well-known vegetarians? Dr. Harald Sack (aka @lysander07) gave a great presentation on Summarization and Relevance of #LinkedData facts, and here it is also worth mentioning the research activities of the STLab-ISTC of CNR in Italy on Aemoo and the science of knowledge patterns.
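The relevance problem can be illustrated with a deliberately naive sketch that ranks facts about one entity by word overlap with a context vocabulary. The facts, the context term sets and the `rank_facts` function are hypothetical; this is not the method presented by Dr. Sack, nor Aemoo's knowledge-pattern approach, just a toy showing how the same entity yields different summaries in different contexts.

```python
# Hypothetical facts about one entity, as (property, value) pairs.
FACTS = [
    ("occupation", "theoretical physicist"),
    ("knownFor", "theory of relativity"),
    ("knownFor", "photoelectric effect"),
    ("diet", "vegetarian"),
]

# Hypothetical context vocabularies; a real system would derive these
# from the user's query, session or document.
CONTEXTS = {
    "physics": {"physicist", "relativity", "photoelectric", "quantum", "theory"},
    "vegetarians": {"vegetarian", "diet", "vegan"},
}


def tokens(text: str) -> set[str]:
    """Lowercase and split on whitespace."""
    return set(text.lower().split())


def rank_facts(facts, context_terms, top_k=2):
    """Score each fact by how many of its words appear in the context and
    return up to top_k facts with a positive score (ties keep input order)."""
    scored = [
        (len(tokens(f"{prop} {value}") & context_terms), i, (prop, value))
        for i, (prop, value) in enumerate(facts)
    ]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [fact for score, _, fact in scored[:top_k] if score > 0]
```

With the physics context the relativity and physicist facts come first, while the vegetarian context surfaces only the diet fact: same entity, different summary.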
Multilanguage support is not an option – it has to be there!
If concepts are intrinsically multilingual (we call the Sun “Sole” in Italian, “Soleil” in French and “شمس” in Arabic, but we're referring to the same star), the data referring to these concepts is most likely expressed in different languages; this means that NLP techniques need to cover (at least in Europe) a great variety of languages. For this specific purpose Rupert Westenthaler (@westei) from Salzburg Research explained how the modular approach of Apache Stanbol provides a great advantage. Along the same line, we were happy to share our new FreeLing engine (FreeLing is an open-source language analysis tool suite) for Apache Stanbol and WordLift, which now adds support for Italian, Portuguese and Russian; last but not least, we announced support for Freebase entity recognition (alongside DBpedia) to “wordlift” your unstructured texts.
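To make the idea of one concept with many language-specific names concrete, here is a minimal sketch that picks a display label according to a user's language preferences, with a fallback. In RDF such labels are usually language-tagged literals (e.g. "Sole"@it); here they are modelled as a plain dictionary. `SUN_LABELS` and `pick_label` are hypothetical illustrations, not part of the Apache Stanbol or FreeLing APIs.

```python
from typing import Optional

# One concept, several labels keyed by language tag (hypothetical data model).
SUN_LABELS = {
    "it": "Sole",
    "fr": "Soleil",
    "ar": "شمس",
    "en": "Sun",
}


def pick_label(labels: dict, preferred: list, default: str = "en") -> Optional[str]:
    """Return the label in the first preferred language that is available,
    then fall back to `default`, and finally to None."""
    for lang in preferred:
        # Accept a regional tag such as "fr-CA" by also trying its primary subtag.
        for candidate in (lang, lang.split("-")[0]):
            if candidate in labels:
                return labels[candidate]
    return labels.get(default)
```

A French-Canadian user asking for `["fr-CA"]` gets "Soleil", while a German user (no German label here) falls back to "Sun".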
Automatic Named Entity Disambiguation makes a big difference
While it's quite straightforward for a human being to grasp whether “Copyright”, in the context of a phrase, refers to a music band or to IP legislation, this isn't always the case when it's the machine that has to execute the task. In this field we were glad to hear that DBpedia Spotlight now supports Italian (and pretty much any language you wish to configure: here is the link for those of you interested in trying it out). It is soon going to be integrated as the standard disambiguation module of Apache Stanbol, thanks to the great work of Pablo Mendes and his team at the Freie Universität Berlin.
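The disambiguation task can be illustrated with a simplified Lesk-style heuristic: choose the candidate entity whose descriptive words overlap most with the sentence. This is not how DBpedia Spotlight works internally (it relies on much richer context models learned from Wikipedia); the `CANDIDATES` signatures and the `disambiguate` function are hypothetical and only show why context matters.

```python
# Hypothetical candidate entities for the surface form "Copyright",
# each described by a small set of context words.
CANDIDATES = {
    "Copyright (band)": {"band", "album", "music", "song", "tour", "guitar", "concert"},
    "Copyright (law)": {"law", "legislation", "intellectual", "property",
                        "rights", "author", "infringement"},
}


def disambiguate(sentence: str, candidates: dict):
    """Return the candidate sharing the most words with the sentence,
    or None when no candidate overlaps at all."""
    words = {w.strip(".,;:!?\"'()").lower() for w in sentence.split()}
    best, best_score = None, 0
    for entity, signature in candidates.items():
        score = len(words & signature)
        if score > best_score:
            best, best_score = entity, score
    return best
```

Returning None when nothing overlaps is a deliberate choice: with no contextual evidence, it is safer not to link than to guess.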
DBpedia vs WikiData and the importance of property mapping
DBpedia is the most prominent interlinking hub in the Web of Data and enables access to many data sources in the Linked Open Data cloud. As most of you already know, DBpedia is built by scraping Wikipedia resources. A few months ago, thanks to generous contributions from Paul Allen's foundation, Google and other partners, Wikimedia started WikiData with the purpose of re-engineering Wikipedia's content structure (keeping data separate from textual content) and providing consistent datasets containing all the information currently available on Wikipedia. Needless to say, the task is massive, and at present we could see little more than a cross-linking strategy and a methodology to guide the whole project (more on this from @brightbyte and @anjeve, who were present at the event).
As far as WordLift is concerned, we're using the official mappings as well as a custom mappings file that can be tailored to specific practical needs; all this is complemented by refactoring rules that nest properties and entities according to the schema.org vocabulary. The most notable example is the connection between an entity and its location: in schema.org it is expressed through a dependent entity of type Place with its own GeoCoordinates, while in DBpedia the coordinates are attached directly to the entity. On the other hand, WikiData is preparing to become the interlinking hub of the future, although an extensive effort is still required to polish the data and to map it properly to shared vocabularies such as schema.org (independently of the format used, be it microdata, RDFa Lite and so forth). DBpedia is already undertaking this effort via its mappings initiative (http://mappings.dbpedia.org/), which will hopefully grow both in mapped terms and in community support.
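A refactoring rule of the kind described above might look roughly like this minimal sketch, which assumes the entity has already been fetched as a flat dictionary keyed by prefixed property names (DBpedia exposes coordinates as W3C WGS84 `geo:lat` / `geo:long`). `PROPERTY_MAP`, `to_schema_org` and the use of `location` (valid on schema.org types such as Event or Organization) are illustrative assumptions, not WordLift's actual mappings file or code.

```python
# Hypothetical custom mappings: DBpedia-style property -> schema.org property.
PROPERTY_MAP = {
    "rdfs:label": "name",
    "dbo:abstract": "description",
}


def to_schema_org(entity: dict, schema_type: str) -> dict:
    """Rename mapped properties and nest flat coordinates into a
    schema.org Place carrying GeoCoordinates."""
    out = {"@context": "http://schema.org", "@type": schema_type}
    for prop, value in entity.items():
        if prop in PROPERTY_MAP:
            out[PROPERTY_MAP[prop]] = value
    # DBpedia attaches lat/long to the entity itself; schema.org expects
    # them on a dependent Place entity.
    if "geo:lat" in entity and "geo:long" in entity:
        out["location"] = {
            "@type": "Place",
            "geo": {
                "@type": "GeoCoordinates",
                "latitude": entity["geo:lat"],
                "longitude": entity["geo:long"],
            },
        }
    return out
```

For example, a hypothetical event `{"rdfs:label": "Example Conference", "geo:lat": 51.34, "geo:long": 12.37}` becomes a JSON-LD-style Event whose `location.geo` holds the coordinates; unmapped properties are simply dropped, which is where a tailored mappings file earns its keep.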
Semantic Tech can be better commercialised in Europe than in the States but…
EU-funded projects like LOD2 and IKS represent a significant edge for companies that are ready to take advantage of Big Data; the problem is… how?
These large European-funded projects have contributed a great wealth of open-source technologies but, as we know from the experience acquired in the last two years, implementation costs are a blocking point for small and medium enterprises. Now, rather than getting lost in the hidden traps of Semantic Tech, let's get an overview of what can really be done to increase the competitiveness of European companies and why it is worth our attention:
- from our direct experience, Big Data can transform social media streams into relevant facts, helping companies make better decisions, improve their online reputation and find new leads;
- content marketing and social media optimisation, along with Semantic SEO, can also prove extremely beneficial for businesses that want to promote their products and services consistently across all channels;
- content discovery is another great feature semantic tech can provide to boost the online experience on the web or on mobile; increasing the time spent on a site in most cases gives advertisers a much better ROI;
- Big Data can also increase sales volume for e-commerce websites. If you're running an online music store, being able to “suggest” the right list of authors, songs or albums can create great value in terms of conversions (and this is possible if you enrich your data with data available in the LOD cloud); needless to say, using structured markup vocabularies like GoodRelations can also greatly enhance the visibility of your products;
- As companies become more “open”, the data coming out of their databases (or the databases of their partners) can benefit a lot when it is interlinked internally and with the LOD cloud;
- Last but not least, now that the public sector in Europe has gone massively “open” (Italy is an example), it's time to move all these datasets into the LOD cloud (I don't have a precise figure here, but I suspect that the great majority of institutions that embraced the “open data” movement still provide nothing other than .csv and .xml files, and thank God they do!).
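As a rough illustration of that last step, here is a minimal sketch that turns a CSV file into N-Triples, one of the W3C RDF serializations. The `http://example.org/` namespace, the `csv_to_ntriples` function and the sample data are hypothetical; a real publication would reuse established vocabularies, type its literals (dates, numbers) and link resources to existing LOD entities rather than minting plain-literal properties.

```python
import csv
import io
from urllib.parse import quote

BASE = "http://example.org/"  # hypothetical namespace for the published dataset


def escape_literal(value: str) -> str:
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
                 .replace("\n", "\\n").replace("\r", "\\r"))


def csv_to_ntriples(csv_text: str, id_column: str) -> list:
    """One subject IRI per row, one predicate IRI per column,
    each non-empty cell emitted as a plain string literal."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}resource/{quote(row[id_column], safe='')}>"
        for column, cell in row.items():
            if column == id_column or not cell:
                continue  # skip the identifier itself and empty cells
            predicate = f"<{BASE}property/{quote(column, safe='')}>"
            triples.append(f'{subject} {predicate} "{escape_literal(cell)}" .')
    return triples
```

With `"id,name,city\nlib-1,Central Library,Rome\nlib-2,City Museum,\n"` as input and `"id"` as the identifier column, this yields three triples: two for `lib-1` and one for `lib-2`, whose empty city cell is skipped.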
What is missing to make these Web Data technologies useful from the business point of view?
If you're setting up an e-commerce store to sell wine (or music), or need to set the room rates for your hotel, why aren't you using any of these Semantic Tech goodies?
The answer is simple and most likely lies in one of the following:
- Your IT team is quite small (or has been heavily reduced in the past 24 months) and cannot afford to dive into the endless amount of information required to take advantage of these technologies;
- Your budget constraints don’t leave room to invest in setting up … yet another framework;
- The data that can really make your business grow isn’t out there yet and/or it’s too expensive for a single player to build and maintain even a single dataset.
This is why I believe that any “Big-Data-as-a-Service” or “Platform-as-a-Service” venture that aims to provide easy-to-use services for SMEs in Europe in the next 6 months can make a big difference, and will speed up the adoption of Semantic Tech by taking real advantage of the EU-funded investments in this field.