Three phases of the Semantic Web

The slides I presented at the IKS Workshop are now on SlideShare (the fonts got a bit messed up; I'll have a go at uploading a PDF version later) and at slides.odp. Probably more useful for a skim are the preparatory notes. I think my main quasi-novel point was that, historically, the (Semantic) Web could be said to have gone through three phases:

1. "It's all about the docs"

the traditional Document Web, with a bit of metadata

2. "No, it's all about the things"

the upper-case Semantic Web, reaching a zenith with Linked Data

3. "Ok, maybe the docs are important after all"

the current phase: not docs XOR data, but a synthesis of what's gone before. All the Linked Data goodness, what we've learnt about REST, Web APIs with a variety of media types (like JSON plus JSON-LD), the smarter CMS stuff with natural language processing bits, the search stuff, RDFa/microdata/microformats, all brought together with some gentle relaxation of constraints (think schema.org), and gaining truly mainstream adoption

Apologies to anyone in Salzburg who followed the link I gave in the slides; I'd totally forgotten that the service there was broken. I've just spent this morning setting up a live instance of Seki on hyperdata.org to fix that. Well, kinda live: all it's actually doing now is serving up a handful of static pages and giving the crawlers a 404. There are quite a few things I need to fix up: some thought is needed around config, and most of all I need to get some auth in place, like yesterday. But having it live is pretty good motivation to get things fixed up.


danja
2012-06-22T13:09:33+01:00
cms iks salzburg semweb rdf

IKS Salzburg, Day 1

At the Semantic Enterprise Technologies workshop in Salzburg. Very good so far. Too busy listening to comment :)


danja
2012-06-12T14:31:34+01:00
iks semantic salzburg rdf

A first taste of the schema.org carbonated soft drink

I recently realised that in my Seki project it made sense to have any exposed HTML include its own description, amongst other reasons to support IKS-flavoured decoupled content management. I'll use RDFa, because the mapping to RDF is more straightforward than with HTML5 microdata and there's more comprehensive vocab coverage than with microformats. But given that I'm exposing this stuff, it also makes sense to have it understandable by as many consumers as possible. Which pretty much means using schema.org vocabularies (straight RDF representations will also be available via conneg; for those I might stick to existing well-known vocabs, see note below).
My initial raft of use cases is around content that's (loosely) blog-post-shaped, but even though schema.org has a section for blogging, it isn't immediately obvious how to express this. (Now would probably be a good time to revisit AtomOwl: it got left in a very complicated state, and Atom-in-Schema.org would tick quite a lot of boxes.)
My typical item looks something like:
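# dc: assumed to be DCMI Terms (dc:issued, used below, is only defined there)
@prefix dc: <http://purl.org/dc/terms/> .
@prefix sioc: <http://rdfs.org/sioc/ns#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
<http://hyperdata.org/Hello> a sioc:Post ;
	dc:date "2012-04-02T07:24:53.676Z" ;
	dc:title "Hello World!" ;
	sioc:content "My first post." ;
	foaf:maker [ foaf:nick "danja" ] .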
Checking the excellent schema.rdfs.org, I found the following mappings pretty quickly:
schema:articleBody owl:equivalentProperty sioc:content .
schema:author owl:equivalentProperty foaf:maker .
sioc:content isn't quite right in my original, as that's meant to be plain text; Dave Beckett's planet:content is probably better. It's like the old RSS 1.0 content:encoded, except as a more sensible XMLLiteral. articleBody isn't perfect either, for my app or, for that matter, for a lot of RSS/Atom/blogging-like apps. A more generic content property would be better (which might be an articleBody, or might be a description of the link or whatever; more on description in a mo).
Though I found near-enough mappings, the following suffer similar problems:
schema:name rdfs:subPropertyOf dc:title .
schema:datePublished owl:equivalentProperty dc:issued .
schema:Article rdfs:subClassOf sioc:Item .
name is one of those ultra-generic terms alongside title and label, a mixed blessing: very easy to work with, but they don't offer very much information. For my purposes there isn't much to choose between them. datePublished seemed slightly more suitable than dateCreated or dateModified; here I would have preferred to be able to use a more generic date, further qualifying only when necessary. Again, Article is a bit on the specific side: I want to be able to use this for things like a del.icio.us-style bookmark, and for that coverage rss:item, sioc:Item and atom:Entry are all a bit closer. Which leaves:
foaf:nick rdfs:subPropertyOf schema:additionalName .
Near enough.
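To see how far those correspondences actually get my typical item, here's a minimal Python sketch (standard library only, an illustration rather than anything in Seki) that applies them with the usual RDFS/OWL reading: owl:equivalentProperty works in both directions, while rdfs:subPropertyOf and rdfs:subClassOf only let you move up from the sub-term to the super-term. Triples are plain tuples of prefixed names, and the _:b0 label for the anonymous foaf:maker node is an assumption of the sketch.

```python
"""Apply the correspondences listed above to the Hello post (a sketch)."""

HELLO = "<http://hyperdata.org/Hello>"

# The typical item as (subject, predicate, object) tuples.
# "_:b0" is an assumed label for the anonymous [ foaf:nick "danja" ] node.
POST = [
    (HELLO, "rdf:type", "sioc:Post"),
    (HELLO, "dc:date", '"2012-04-02T07:24:53.676Z"'),
    (HELLO, "dc:title", '"Hello World!"'),
    (HELLO, "sioc:content", '"My first post."'),
    (HELLO, "foaf:maker", "_:b0"),
    ("_:b0", "foaf:nick", '"danja"'),
]

# The correspondences found above, as stated.
MAPPINGS = [
    ("schema:articleBody", "owl:equivalentProperty", "sioc:content"),
    ("schema:author", "owl:equivalentProperty", "foaf:maker"),
    ("schema:name", "rdfs:subPropertyOf", "dc:title"),
    ("schema:datePublished", "owl:equivalentProperty", "dc:issued"),
    ("schema:Article", "rdfs:subClassOf", "sioc:Item"),
    ("foaf:nick", "rdfs:subPropertyOf", "schema:additionalName"),
]


def entailed_terms(mappings):
    """For each term, the terms one step away that it entails.

    owl:equivalentProperty is symmetric; rdfs:subPropertyOf and
    rdfs:subClassOf only point up the hierarchy (sub entails super).
    """
    entailed = {}
    for left, rel, right in mappings:
        entailed.setdefault(left, set()).add(right)
        if rel == "owl:equivalentProperty":
            entailed.setdefault(right, set()).add(left)
    return entailed


def infer(triples, mappings):
    """Forward-chain to a fixpoint: rewrite predicates via the property
    mappings and rdf:type objects via the class mappings."""
    entailed = entailed_terms(mappings)
    result = set(triples)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(result):
            if p == "rdf:type":
                candidates = {(s, p, c) for c in entailed.get(o, ())}
            else:
                candidates = {(s, q, o) for q in entailed.get(p, ())}
            new = candidates - result
            if new:
                result |= new
                changed = True
    return result


def schema_view(triples):
    """Just the schema.org statements, in a stable order."""
    return sorted(t for t in triples if t[1].startswith("schema:"))


if __name__ == "__main__":
    for triple in schema_view(infer(POST, MAPPINGS)):
        print(*triple)
```

Applied strictly, only articleBody, author and additionalName come through. schema:name is declared a subproperty of dc:title rather than the other way round, the post uses dc:date rather than dc:issued, and nothing in the list links sioc:Post to schema:Article: the near-enough problem in miniature.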

Top-level terms

I think it would be very helpful if schema.org were a bit clearer about "top-level" terms. Right now Thing has description, name, image and url. Ok, not bad as a first pass against what's needed on the Web. But url is (or should be) redundant, though that's just my semweb prejudices, and there's a slight conflict between description and content-oriented terms like articleBody, which has the intermediate node of Article. (This isn't a new phenomenon: RSS history is littered with the wreckage of content vs. description, and higher up the architectural tree it's one of the features of httpRange-14.) Ok, maybe description is useful enough to leave alone; similarly, name is probably reasonable to cover the top level of label, title and name. image I suppose is fair enough, a pragmatic approach to something that could easily get messy if more WebArch were brought into the picture. I guess my recommendations, then, would be to add a term Item (for a generic Information Resource, a superclass of Article etc.) and date (a superproperty of all dates).
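A small Python sketch (standard library only) of why generic parents would help: a consumer asks for "any date" or "any item" and lets the hierarchy do the work, without needing to know which specific term a publisher chose. schema:date and schema:Item here are the hypothetical terms suggested above, not existing schema.org vocabulary, and the ex: resources are made-up placeholders.

```python
"""Sketch: generic parents let a consumer ask for "any date" or "any item"."""

# HYPOTHETICAL: schema:date and schema:Item are the suggested additions,
# not existing schema.org terms. Each child maps to its single parent.
SUBPROPERTY_OF = {
    "schema:datePublished": "schema:date",
    "schema:dateCreated": "schema:date",
    "schema:dateModified": "schema:date",
}
SUBCLASS_OF = {
    # BlogPosting sits under Article in schema.org (intermediate types omitted)
    "schema:BlogPosting": "schema:Article",
    "schema:Article": "schema:Item",
}

# Made-up example data; the ex: names are placeholders.
DATA = [
    ("ex:post1", "rdf:type", "schema:BlogPosting"),
    ("ex:post1", "schema:datePublished", "2012-04-02"),
    ("ex:post1", "schema:dateModified", "2012-04-05"),
    ("ex:post2", "rdf:type", "schema:Article"),
    ("ex:bookmark1", "rdf:type", "schema:Item"),
]


def lineage(term, parent_of):
    """The term followed by all its ancestors (assumes no cycles)."""
    chain = [term]
    while chain[-1] in parent_of:
        chain.append(parent_of[chain[-1]])
    return chain


def values_for(triples, subject, prop):
    """Objects of `subject` under `prop` or any of its subproperties."""
    return sorted(o for s, p, o in triples
                  if s == subject and prop in lineage(p, SUBPROPERTY_OF))


def instances_of(triples, cls):
    """Subjects typed as `cls` or any of its subclasses."""
    return sorted(s for s, p, o in triples
                  if p == "rdf:type" and cls in lineage(o, SUBCLASS_OF))


if __name__ == "__main__":
    print(values_for(DATA, "ex:post1", "schema:date"))
    print(instances_of(DATA, "schema:Item"))
```

With these in place, a bookmark typed only as Item and a post typed as BlogPosting both answer a query for Item, and both dates on ex:post1 answer a query for date, while publishers stay free to qualify further when they need to.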

Automatic mapping

I haven't yet decided whether to use the established Web vocabs (Dublin Core, SIOC, FOAF) or the schema.org versions of the terms in my internal RDF; I suppose I could even use both. But my little experience above shows it's not yet obvious how to map across, even with these really common terms. If the starting point were something richer, the amount of work involved could easily explode. Some kind of automation is desirable: for someone like me in the current situation, for a publisher of semantically marked-up HTML who would like their material to connect with the Linked Data Cloud, or for someone writing an app that consumes data across different vocabularies. A service (or two) springs to mind: give it a term and it responds with correspondences from other vocabs, or give it a lump of data and let it offer a translation to the preferred vocab(s)/format. There are at least two approaches to implementation: SPARQL CONSTRUCT and/or RDFS/OWL inference (in both cases generic superclasses/superproperties could be useful). The front end could offer something like the Rich Snippets Testing Tool for authors, together with an open API for translation by app developers, to give a leg-up for integration/mashups. It would be nice if the good folks behind schema.org would consider throwing some resources in this direction.
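The SPARQL CONSTRUCT route could look something like this Python sketch (standard library only; the rule table, the function names and the _:b0 blank-node label are my own illustration, not an existing service or API). Each rule is a one-way rewrite equivalent to a single-pattern CONSTRUCT query, so unlike strict RDFS/OWL entailment it can carry the near-enough choices made earlier, such as dc:title to schema:name and dc:date to schema:datePublished.

```python
"""Sketch of a vocabulary translation service built from CONSTRUCT-style rules."""

HELLO = "<http://hyperdata.org/Hello>"

# Each entry stands for the SPARQL rule
#   CONSTRUCT { ?s <target> ?o } WHERE { ?s <source> ?o }
RULES = {
    "sioc:content": "schema:articleBody",
    "foaf:maker": "schema:author",
    "foaf:nick": "schema:additionalName",
    "dc:title": "schema:name",          # judgement call: little to choose between them
    "dc:date": "schema:datePublished",  # judgement call: closest of the schema.org dates
}

# The typical item; "_:b0" is an assumed label for the anonymous maker node.
POST = [
    (HELLO, "rdf:type", "sioc:Post"),
    (HELLO, "dc:date", '"2012-04-02T07:24:53.676Z"'),
    (HELLO, "dc:title", '"Hello World!"'),
    (HELLO, "sioc:content", '"My first post."'),
    (HELLO, "foaf:maker", "_:b0"),
    ("_:b0", "foaf:nick", '"danja"'),
]


def correspondences(term, rules=RULES):
    """'Give it a term': terms the rule set links to `term`, either direction."""
    forward = {t for s, t in rules.items() if s == term}
    backward = {s for s, t in rules.items() if t == term}
    return sorted(forward | backward)


def translate(triples, rules=RULES, keep_unmapped=False):
    """'Give it a lump of data': apply every rule, optionally keeping
    triples that no rule matches."""
    out = []
    for s, p, o in triples:
        if p in rules:
            out.append((s, rules[p], o))
        elif keep_unmapped:
            out.append((s, p, o))
    return out


if __name__ == "__main__":
    print(correspondences("schema:name"))
    for triple in translate(POST):
        print(*triple)
```

A real service would presumably run proper CONSTRUCT queries (or hand the mappings to an OWL reasoner) against an RDF store, and would need to cover classes too: this table has no rule for rdf:type, so the sioc:Post typing is dropped unless keep_unmapped is set.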

See also :

Comments to G+ please


danja
2012-04-05T15:13:53+01:00
iks seki rdfa html schema.org semantic semweb rdf