What various smarty-pants individuals have noticed is that there's low-hanging fruit in these parts. Every single site on the Web is already exposing an API. You get the pages by doing a GET over the HTTP protocol. What you see in the browser is human-readable info. But a considerable proportion of these sites are actually delivering machine-readable data, only it's getting obfuscated in the publishing process. The stuff from the MySQL DB behind the scenes of the blog, railway timetable, photo collection is fluffed out into HTML to make it human-readable, but along the way loses any direct hooks into the raw data.
Enter Microformats (µFs) and Structured Blogging (SB). The basic idea is simple: keep publishing that human-readable content, just make it possible to read the data back out of the HTML. The two initiatives are addressing different aspects of the problem. µFs are mostly about getting the formats right. This involves minimal invention, just developing conventions based on existing specifications and generally recognised good practice. SB is mostly about adapting existing tools (WordPress, Drupal etc.) to allow the content author, with minimal effort, to publish material that works not only as human-readable content but also as machine-readable data. SB's approach is two-pronged: they're using µFs plus embedded XML.
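To make "read the data back out of the HTML" a bit more concrete, here's a minimal sketch in Python. It assumes a template that marks up a contact with hCard-style class names (vcard, fn, url, org); the sample markup, the HCardParser class and the extract_hcard helper are all made up for illustration, and a real microformats parser handles far more cases than this.

```python
from html.parser import HTMLParser

# Hypothetical template output: ordinary HTML whose class names follow
# the hCard convention. A browser renders it as normal text and a link.
SAMPLE_HTML = """
<div class="vcard">
  <a class="fn url" href="http://example.org/alice">Alice Example</a>
  works at <span class="org">Example Corp</span>.
</div>
"""

# Elements that never get a closing tag; skipped so the stack stays balanced.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class HCardParser(HTMLParser):
    """Collects text (and href, for url) from elements with known class names."""

    FIELDS = {"fn", "org", "url"}

    def __init__(self):
        super().__init__()
        self.card = {}
        self._open = []  # one list of field names per currently open element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        fields = [c for c in classes if c in self.FIELDS]
        # For a link, the url value is the href rather than the link text.
        if "url" in fields and attrs.get("href"):
            self.card["url"] = attrs["href"]
            fields.remove("url")
        self._open.append(fields)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self._open:
            self._open.pop()

    def handle_data(self, data):
        # Text belongs to every field-bearing element that is still open.
        for fields in self._open:
            for name in fields:
                self.card[name] = self.card.get(name, "") + data


def extract_hcard(html):
    """Return the marked-up fields as a plain dict of stripped strings."""
    parser = HCardParser()
    parser.feed(html)
    parser.close()
    return {key: value.strip() for key, value in parser.card.items()}


if __name__ == "__main__":
    print(extract_hcard(SAMPLE_HTML))
```

The point is that the publisher changed nothing a reader can see; the class attributes alone are enough for a consumer to pull the name, link and organisation back out.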
What these initiatives have in common is that they ask little of their target audiences: the developer of publishing tools will already be using HTML/CSS, microformat support only means minor (additive) changes to their templates. The blogger who e.g. reviews books will only be entering the same set of information in an SB tool as they would in a regular blogging tool. The initiatives ask little; but they give more back. Because they're conventions based on best practices, any user of µFs can be reasonably confident that they're following best practices. Ok, things like standards-compliance, accessibility and so on are low on the list of priorities for many web publishers. They want something more tangible. Remember Zen Garden? One set of XHTML conventions, numerous (many stunning) pluggable CSS skins. Microformats offer the potential for Zen Garden for blogs. What the SB tool user gets is - well, a tool designed for creating the kind of content they want to create and displaying it in a nice fashion.
There are plenty of skeptics of µFs; Norm's analysis is a good one. Similar points have been made by Uche: if you want data in a machine-processable XML format (like microformats), then rather than force it into XHTML, just use a straight XML format. If I remember correctly, in past discussions Shelley has expressed the opinion that you might as well just expose RDF/XML as embed data in the HTML (but she has also worked on the SB systems).
My opinion in short is that the µFs and SB initiatives are very positive. More data on the Web? Bring it on. This is a huge plus for us SemWeb types. The fact that these groups are working together rather than having format food-fights makes a good role model.
Here's an interesting thing: following a link from Alex Barnett's coverage, there's the Maimed Leech (!) on "Talking about microformats and WinFS". The Leech asked the following questions at the BoF, and got answers from Tantek:
1. How do you get people to agree on the schemas?
Derive them from current practice on the web rather than inventing them from scratch (paraphrased).
2. How do you handle the need for both a rich schema for "high end" applications and a simple schema for "low end" applications?
Both will exist (paraphrased).
Bravo Tantek. The second is absolutely consistent with the Semantic Web approach, and a slight refactoring of the first gives the corresponding SemWeb one: where possible derive your schemas/ontologies from current practice rather than inventing them from scratch (but it's not the end of the world if you make everything up, you can stitch vocabularies together afterwards). This includes modelling that is currently off-web (I doubt anyone's done the many-thousand term models from biochemistry as straight HTML).
Enron Wright picks up on this with reference to schema
modularisation in this context. Indeed, when expressing data in
XHTML, modular profiles work. But for the model behind the formats,
RDF and RDF Schema (and OWL) work. The similarity between WinFS and
the RDF model has come up before. These posts have me laughing out
loud - in an entirely positive, life-is-sweet way.
Analogy time.
The pilgrims know the right general direction to their goal.
But to get there, they have to find their way through a dense wood.
Some paths have been cleared. The pilgrim will always follow a path
in the same general direction as the goal. But from any given
place, it's not that obvious which are the good paths and which are
paths made by the wild animals of the wood, leading into
impassable brambles. These paths can be paved over, but it's not
quite a cowpath thing. Because everyone's heading in the same
general direction, it's often possible to rejoin a clearer route,
worst case it's possible to backtrack a little to get on a better
path.
The goal of the pilgrims is the future Web in all its
data-enhanced glory. RSS, even OPML are paths heading in the right
general direction. But the Big Dog formats and things like WinFS
are off the clearest paths - the former lack the solidity of decent
specifications, but there's a clear path nearby with Atom and
microformats. Near WinFS the clear path is RDF - the same basic
model but with hooks to the current Web (URIs). So it's nice to
have standards-bearing initiatives like microformats and Structured
Blogging to highlight the clear paths. Even people that have been
late to the party in the past are noticing. Even Bill Gates has
noticed.
I've been getting some really good comments on the blog
recently, I must find a way of making them more visible. Here's
part of one from Richard Cyganiak which is apposite:
The [SemWeb] tools are still far from a state where they are usable for "non-believers." I still can't download MyRDF anywhere. Sure, just let the tools mature for another few years ... but till then, something will have grown out of the OPML/RSS/JSON/RoR frenzy, something that crudely solves 80% of the SemWeb problems with 20% of the cost. We'll scream "But RDF did this ten years ago!" and will be ignored, and we'll retool once more. Personally I'd rather work with RDF Lite than with some OPML/Javascript mutant.
As I seem to be saying a lot these days, I don't really
disagree. At least about the observations. But the prognostication
seems to assume an isolated, passive RDF community. If the LazyWeb
doesn't deliver kit like MyRDF soon (which sounds like it's in a
similar space to Semantic Web in a Box - Ontogon is getting close
there) I'll build the bloody things myself. From the existing tools
this will only need a little GUI work and a bit of packaging.
Things are growing out of the OPML/RSS/JSON/RoR frenzy, many
of which conceptually (if not in implementation) could be viewed as
RDF Lite. I'll nitpick here a second - there's a qualitative
difference. JSON and RoR are potentially very useful with RDF Full
(see e.g. Kendall's Semantic Rails post; Elias calls SPARQL+JSON a
"killer combination"). There are things still to figure out around
using these technologies with SemWeb tech, but there's no
fundamental mismatch. There is something of a mismatch with RSS and
OPML in that they're littered with ambiguities.
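As a rough illustration of why SPARQL and JSON sit together comfortably, here's a small Python sketch that turns a result set in the SPARQL JSON results layout (a "head" listing the variables, then "results" holding a list of "bindings") into plain rows. The sample data and the rows_from_sparql_json helper are invented for the example; no particular endpoint or library is assumed.

```python
import json

# Made-up result set in the SPARQL JSON results layout. The second
# binding leaves ?name unbound, which the format allows.
SAMPLE_RESULTS = """
{
  "head": {"vars": ["person", "name"]},
  "results": {
    "bindings": [
      {"person": {"type": "uri", "value": "http://example.org/alice"},
       "name": {"type": "literal", "value": "Alice"}},
      {"person": {"type": "uri", "value": "http://example.org/bob"}}
    ]
  }
}
"""


def rows_from_sparql_json(text):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    doc = json.loads(text)
    variables = doc["head"]["vars"]
    rows = []
    for binding in doc["results"]["bindings"]:
        # Unbound variables are simply absent from a binding; map them to None.
        rows.append({
            var: binding[var]["value"] if var in binding else None
            for var in variables
        })
    return rows


if __name__ == "__main__":
    for row in rows_from_sparql_json(SAMPLE_RESULTS):
        print(row)
```

Each value also carries its type (uri, literal and so on), so nothing about the RDF model has to be thrown away to get something a script can loop over.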
Yep, it can be frustrating when folks like Adam Green favour tech
like OPML over e.g. XHTML and/or RDF. The
biggest discomfort is that this approach will show immediate gains.
From a standing start it's usually necessary to get a little speed
up before you can really enjoy the advantages of RDF. With the
"simple", naive quasi-XML formats it's quicker to get results. But
however we may argue amongst ourselves, the primary reason people
like myself, Richard and Shelley would generally favour tech like
RDF (and likely XHTML and Atom) over less capable technologies is
because the latter are (*dramatic pause*) less capable. They may
get some immediate results, but that doesn't make the problems go
away.
During the "another few years" that Richard suggests, the folks
building on naive quasi-XML will be devoting their time to hacking
through the undergrowth: working around spec shortcomings, finding
answers to questions, many of which have long since been figured
out around RDF (and better-practice XML). They won't be able to
solve 80% of the SemWeb problems by magic - it's taken a good few
years' work around RDF.
Yep, it's taken a long time. But here's a thing - RDF tools
are rapidly approaching the stage where immediate results
can be as little effort as with naive quasi-XML. Hare, meet
Turtle. He's like the tortoise, but Turtle has super-turbo-powers.
To date the RDF community has been relatively isolated -
mostly because the benefits, even the technical goals, haven't been
obvious. But the Web 2.0 thing has opened a lot of eyes to the
potential of data on the Web, and community-wise there are now
whole scientific disciplines turning to SemWeb tech to help solve
problems that have proven difficult by other means. With all due
respect to developers of content management tools, and certainly
without wishing to devalue it, there's more to life than the
blogosphere.
Software can help across virtually every aspect of civilization:
science, the arts, journalism, developer diaries, cat photos. Long
term I've no doubt an integrated system will emerge. The shortest
path right now is the Semantic Web.
Speaking of paths, sometime soon I must have another look at the
FOAF co-depiction stuff. Alex has a (not very flattering) photo of
Marc and Tantek. I've yet to meet Marc, and I don't think there's a
photo anywhere featuring Tantek and me. But I do have a rather cool
co-depiction path to them both through this photo of Rohit and me
(not very flattering for me, but I bet that's the case for most
people pictured alongside Rohit ;-) and onto this microformats
panel.