Afterthoughts – IKS workshop in Paris and the changing commercial landscape of semantics in the Web

©♀Μøỳαл_Bгεлл♂'s/ Flickr

So, IKS did it again – and managed to bring together some 90 people in Paris to present to each other how they were already using IKS and other semantic technologies. Add to this a bunch of “from the lab” demos showing what “IKS and friends” are brewing up next, plus half a dozen wise men and women sharing their knowledge about the semantics and CMS market. Present the whole thing in a culturally open and diverse setting such as the FIAP Jean Monnet Centre, with young artists, technologists and language learners floating around, and you have two days of being refreshed by open minds and spirits. Paris reciprocated with summery temperatures and good wine.

©freddy/Flickr

You would now expect me to take you inside the workshop, with all the demos and the discussions about how to leverage semantics, and present to you all the great things IKS will be doing for semantic CMS. Before I do that, let me take you up a virtual hill outside IKS and briefly look down on the grand landscape of the commercial Web:

Way back in the distance, there is a small village. About eight or nine years ago, somebody in my professional surroundings (the village) started a PhD about “trails” in the Web and how one could use such trails for better personalisation. I had a feeling then that they should rather do research on how not to leave trails and traces in the Web.

Very close in front of us, building work has started. Two buildings are clearly visible, but there is some other stuff going on, too: four things in all that I can make out from where I am standing:

  1. In June 2011, the large search engine companies (Yahoo!, Google, Microsoft Bing) issued joint guidelines for structured semantic mark-up in Web content on their website, schema.org
  2. In June 2011, Google launched its own social network, Google+, which is rumoured to have 10 million users as of today (13/07/2011): https://plus.google.com
  3. In July 2011, the Eclipse Foundation relaunched its Web-ID project “Higgins” with a new, semantics-based crowd-sourcing approach: http://www.person-ontology.org/
  4. Also in July 2011, a tweet from a well-respected colleague in the European Commission caught my attention: “dark pools of data?” He points to http://blogs.forbes.com/kashmirhill/2011/07/11/how-banks-plan-to-compete-with-groupon/ for the issue at stake, and to http://en.wikipedia.org/wiki/Dark_liquidity for the theory behind it.

Zooming out again and looking a bit further back into the distance: in 2005, I wrote a paper for a semantics conference, and in its last section I asked: “My body belongs to me – how about my data?”

Cases (1) to (4) are ample evidence of the immense value of personal data in the networked commercial world. As scientists, we were intrigued by the difficulty of extracting valuable information from unstructured data sources. We should note that the commercial world has understood two things: (a) machines can process structured data far more easily than unstructured data, and (b) now is “Gold-Rush” time. Most users of the Internet have not yet fully understood that their data is the gold-dust of web-based companies, and that the balance of legal frameworks is tilting towards the gold-diggers and away from those on whose land (actually, body and life!) the data originates.

Why is the coordinator of “interactive knowledge” taking you up this hill? Because we all have some uncomfortable choices ahead of us, and IKS is at the forefront of these dilemmas:

© onkel_wart / Flickr

1) Are you, as a researcher, more interested in how to extract information from unsuspecting users of the Web, or in helping them to keep control of their privacy? My underlying ethical assumption here is that my data in the Web is equivalent to my body in the real world. And in the real world, people cannot even take photographs of me without my permission, let alone seize my body without a search warrant.

2) Are you, as a CMS technology provider, more interested in getting your share while the gold-rush is on, or are you planning to develop long-term business relationships in which you gain perhaps only moderate benefits in exchange for a good service to customers who would like to maintain control over their private data?

3) Are you, as a “netizen” of 2011, more interested in getting many services on the Web “for free” in exchange for a few bits of data here and there, or would you prefer clear choices, each one a clear business transaction with a clear value on each side? Do you think you are on the “winning” side of the Web as it develops?

Let’s go into the workshop now – what have we seen?

  • several commercial vendors who showed that they can semantically augment their web content by recognising named entities and categorising them as places, organisations or people.
  • several research prototypes that let you combine local data with publicly available data and configure your own virtual, distributed semantic content store.
  • several keynotes that addressed the opportunities for the CMS market, for semantic recognition and synthesis tools, and for making use of big data in the Web, and that related techie challenges to end-user benefit in the enterprise world.

We at IKS have also seen where our strong contribution lies: making it possible for small firms to catch a small share of a diverse market that is highly connected on the one hand (the Web) but often very localised on the other. Our components are not high-end in performance, but they are top-of-the-range at delivering 80% of the benefit for 20% of the effort. This is where Stanbol is winning at present.

We are also seeing where the next significant contribution is likely to be: making it possible for small firms to create web-based user interfaces that exploit or enrich the semantic annotations in a particular customer application. In other words, we need to support fast-to-develop customer applications for business uses of small-to-medium complexity. This is where VIE needs to create impact in the next 12 months. Schema.org can be a great multiplier if you are able to find your ethical balance between information monopoly and fair data sharing.

In June 2011, I decided to deactivate my Facebook account. I have elected not to join Google+. My two currently active social media applications are LinkedIn and Twitter. My main working tool remains email, in a flat, reverse-chronological order. I am over 50. I expect others to make other choices, and I wish mine to be respected by a pluralistic society in the Web.

Thanks to all who have contributed to IKS so far. I hope being part of this project is enriching and rewarding for you. We still have 18 months of journey ahead, and each time you climb that virtual hill, something has changed in the landscape below. You play a role in the changes that are taking place.

Cheers,
Wernher

Author: Wernher

Wernher Behrendt is senior researcher at Salzburg Research and the coordinator of the IKS project

2 Comments

  1. Dear Wernher,

    Good points you make concerning the journey and the trails people leave. I guess the trail idea is specifically one (possible) approach to keeping privacy AND benefiting from the fact that others (whom you do not personally need to know) have been here before. It’s a sort of anonymous recommendation.

    But still, you are quite right that there has been a tremendous change in being on the Net: in earlier days there were quotes like “On the Internet, nobody knows you’re a dog.” Today, people freely give away lots of private details.

    I am convinced that IKS can make a good contribution here as well!

    Cheers,

    SR

  2. Wernher,
    Thoughtful summary of an interesting couple of days in Paris. I feel that recent announcements, like Schema.org, will increase the understanding of the benefits of semantic mark-up in the wider world. Your comments on data publishing and who controls access to published data are interesting. The privacy of personal data is a hot topic in the UK and something I think more people are going to be concerned about, but I suspect the scope of some of the data in the hands of very large organisations may not be fully appreciated. As a CMS vendor, we are currently more interested in relating the less data-focused site content to the data that is being published, to try to reduce the scope for misinterpretation.

    Our summary of the conference can be read at http://bit.ly/qBwODF.

    Final comment: I don’t remember much ‘good wine’, just chicken wings and beer!

    Gary
    GOSS Interactive.