A first taste of the schema.org carbonated soft drink

I recently realised that in my Seki project it made sense to have any exposed HTML include its own description, among other reasons to support IKS-flavoured decoupled content management. I'll use RDFa because the mapping to RDF is more straightforward than with HTML5 microdata, and the vocabulary coverage is more comprehensive than with microformats. But given that I'm exposing this stuff, it also makes sense to have it understandable by as many consumers as possible, which pretty much means using schema.org vocabularies. (Straight RDF representations will also be available via conneg; there I might stick to existing well-known vocabs, see note below.)
My initial raft of use cases is around content that's (loosely) blog post-shaped, but even though schema.org has a section for blogging, it isn't immediately obvious how to express this. (Now would probably be a good time to revisit AtomOwl, which got left in a very complicated state; Atom-in-Schema.org would tick quite a lot of boxes.)
My typical item looks something like:
<http://hyperdata.org/Hello> a sioc:Post ;
	dc:date "2012-04-02T07:24:53.676Z" ;
	dc:title "Hello World!" ;
	sioc:content "My first post." ;
	foaf:maker [ foaf:nick "danja" ] .
Checking at the excellent schema.rdfs.org I found the following mappings pretty quickly:
schema:articleBody owl:equivalentProperty sioc:content .
schema:author owl:equivalentProperty foaf:maker .
sioc:content isn't quite right in my original, as that's meant to be plain text. Dave Beckett's planet:content is probably better: it's like the old RSS 1.0 content:encoded, except typed as a more sensible XMLLiteral. articleBody isn't perfect either, for my app or for that matter for a lot of RSS/Atom/blogging-like apps. A more generic content property would be better (its value might be an articleBody, or it might be a description of the link or whatever; more on description in a mo).
Though I found near-enough mappings, the following suffer similar problems:
schema:name rdfs:subPropertyOf dc:title .
schema:datePublished owl:equivalentProperty dc:issued .
schema:Article rdfs:subClassOf sioc:Item .
name is one of those ultra-generic terms, alongside title and label. They're a mixed blessing: very easy to work with, but they don't offer very much information. For my purposes there isn't much to choose between them. datePublished seemed slightly more suitable than dateCreated or dateModified, though here I would have preferred to be able to use a more generic date, further qualifying only when necessary. Again, Article is a bit on the specific side: I want to be able to use this for things like a del.icio.us-style bookmark, and for that kind of coverage rss:item, sioc:Item and atom:Entry are all a bit closer. Which leaves:
foaf:nick rdfs:subPropertyOf schema:additionalName .
Near enough.
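Pulling those mappings together, here's a rough sketch in Python of rewriting the sample item into schema.org terms. Prefixed names are kept as plain strings rather than going through a real RDF library, and the TERM_MAP table, the translate helper and the _:maker blank node label are all made up for illustration. Treating sioc:Post as schema:Article and dc:date as schema:datePublished are the near-enough choices discussed above, not mappings published at schema.rdfs.org.

```python
# Hypothetical lookup table: my vocab -> schema.org, based on the
# correspondences listed above. Treating each one as a straight
# substitution is a simplification; only articleBody and author were
# given as owl:equivalentProperty.
TERM_MAP = {
    "sioc:Post": "schema:Article",        # assumption: Article as near-enough type
    "dc:date": "schema:datePublished",    # assumption: closest of the schema.org dates
    "dc:title": "schema:name",
    "sioc:content": "schema:articleBody",
    "foaf:maker": "schema:author",
    "foaf:nick": "schema:additionalName",
}

POST = "<http://hyperdata.org/Hello>"
MAKER = "_:maker"  # illustrative label for the anonymous author node

# The sample item from earlier in the post, as (subject, predicate, object).
triples = [
    (POST, "rdf:type", "sioc:Post"),
    (POST, "dc:date", '"2012-04-02T07:24:53.676Z"'),
    (POST, "dc:title", '"Hello World!"'),
    (POST, "sioc:content", '"My first post."'),
    (POST, "foaf:maker", MAKER),
    (MAKER, "foaf:nick", '"danja"'),
]


def translate(triples, term_map):
    """Rewrite predicates, and the objects of rdf:type, using term_map.

    Terms without an entry in term_map are passed through unchanged.
    """
    out = []
    for s, p, o in triples:
        if p == "rdf:type":
            o = term_map.get(o, o)
        else:
            p = term_map.get(p, p)
        out.append((s, p, o))
    return out


schema_triples = translate(triples, TERM_MAP)
for s, p, o in schema_triples:
    print(s, p, o, ".")
```

The same table could just as well drive the RDFa attributes in the HTML templates; the point is only that for an item this small the whole translation fits in six entries.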

Top-level terms

I think it would be very helpful if schema.org was a bit clearer about "top-level" terms. Right now Thing has description, name, image, url. Ok, not bad as a first pass at what's needed on the Web. But url is (or should be) redundant, though that's just my semweb prejudices, and there's a slight conflict between description and content-oriented terms like articleBody, which sits behind the intermediate node of Article. (This isn't a new phenomenon: RSS history is littered with the wreckage of content vs. description, and higher up the architectural tree it's one of the features of httpRange-14.) Ok, maybe description is useful enough to leave alone; similarly, name is probably reasonable to cover the top level of label, title, name. image I suppose is fair enough, a pragmatic approach to something that could easily get messy if more WebArch was brought into the picture. I guess my recommendations then would be to add a class Item (for a generic Information Resource, a superclass of Article etc.) and a property date (a superproperty of all the dates).

Automatic mapping

I haven't yet decided whether to use the existing Web vocab or schema.org versions of the terms in my internal RDF; I suppose I could even use both. But my little experience above demonstrates that it's not yet obvious how to map across, even with these really common terms. If the starting point was something richer, the amount of work involved could easily explode. Some kind of automation is desirable, for the benefit of someone like me in the current situation, a publisher of semantically marked-up HTML who would like their material to connect with the Linked Data Cloud, or someone writing an app that consumes data across different vocabularies. A service (or two) springs to mind: give it a term and it responds with correspondences from other vocabs, or give it a lump of data and let it offer a translation to the preferred vocab(s)/format. There are at least two approaches to implementation: SPARQL CONSTRUCT and/or RDFS/OWL inference (in both cases generic superclasses/superproperties could be useful). The front end could offer something like the Rich Snippets Testing Tool for authors, together with an open API for translation by app developers, to give a leg-up for integration/mashups. It would be nice if the good folks behind schema.org would consider throwing some resources in this direction.
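To make the "give it a term" idea a bit more concrete, here's a minimal Python sketch of the lookup half of such a service. It's purely illustrative: MAPPINGS just restates the six statements quoted earlier in this post, and the correspondences function and its return shape are invented here, not any existing API. The one thing it tries to show is that direction matters, since a subPropertyOf or subClassOf statement only lets you infer from the specific term to the general one.

```python
# The mapping statements quoted earlier, as (subject, relation, object).
MAPPINGS = [
    ("schema:articleBody", "owl:equivalentProperty", "sioc:content"),
    ("schema:author", "owl:equivalentProperty", "foaf:maker"),
    ("schema:name", "rdfs:subPropertyOf", "dc:title"),
    ("schema:datePublished", "owl:equivalentProperty", "dc:issued"),
    ("schema:Article", "rdfs:subClassOf", "sioc:Item"),
    ("foaf:nick", "rdfs:subPropertyOf", "schema:additionalName"),
]

# Relations that can be read in either direction.
EQUIVALENCE = {"owl:equivalentProperty", "owl:equivalentClass"}


def correspondences(term):
    """Return (other_term, relation, entailed) for each mapping mentioning term.

    relation is the one in the mapping statement as written. entailed is
    True when data using `term` would also imply the same statement with
    other_term, i.e. the mapping is an equivalence or `term` is on the
    more specific (subject) side of a sub-property/sub-class statement.
    """
    found = []
    for subject, relation, obj in MAPPINGS:
        if term == subject:
            found.append((obj, relation, True))
        elif term == obj:
            # Reading the statement backwards only holds for equivalences.
            found.append((subject, relation, relation in EQUIVALENCE))
    return found


for term in ("sioc:content", "foaf:nick", "dc:title"):
    print(term, "->", correspondences(term))
```

On this reading, inference alone would carry foaf:nick over to schema:additionalName but would not turn dc:title into schema:name, because the published statement only runs from name to title. That's the sort of case where a CONSTRUCT-style rewrite would have to be an explicit (and slightly lossy) decision rather than something a reasoner hands you for free.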

Comments to G+ please


danja
2012-04-05T15:13:53+01:00
iks seki rdfa html schema.org semantic semweb rdf