PS. New mailing list : Semantic Web And User Interaction Design Group, public-semweb-ui@w3.org
PPS. Ora Lassila's been getting crystalline with an RDF browser built with Lisp called "OINK" (excellent name!) :
"I have just been browsing about 500,000 triples worth of enzyme, protein and gene ontology data. Way cool, even if I don't understand anything about the subject matter. :-)"
Rats, this morning I was all fired up to get cracking with work.
But I made the mistake of checking my inputs. First I notice a
comment from Bill deHora re. Crystalline RDF and then a response
from Phil Jones in my expose-RDF-as-OPML trail.
Luckily I can feed both birds with some of the same beans.
First an aperitif for Bill. I think he's got a good point about a
tendency in the SemWeb community to repeat mistakes of old AI (I
can't find his profile, but if I remember correctly Bill is an AI
veteran). In particular the assumptions that time and resources
aren't constrained, and that things that work in the lab will work
out in the real world. I don't quite follow all Bill's wording
around the 404 and OWA points, but this I can respond to:
"In practical terms open world means information is not going to be rendered as usable now, because things might change. How does that help anyone make decisions?"
Virtually every decision we make is based on incomplete
information: our data may be inaccurate, new information comes
along later in time, circumstances change. Incomplete information is
more than usable now: we are forced to make decisions based on a
snapshot of what we know at a given point in time. The snapshot is
in effect a closed model of the world. But the acknowledgement that
there may be other information we don't know about gives the
decision-making system a robustness over time.
It's a forced example, but consider the decision to buy a
new laptop. I might put together my list of requirements (1400x1050
display) and constraints (less than $500), go to Wal-mart and see
what fits the bill. Wal-mart today is my closed world - they may or
may not have something suitable. But hardware gets cheaper and more
capable over time, and Wal-mart may have a sale coming up. So an
appropriate decision might be to wait a few weeks. Alternatively, if
Wal-mart don't have what I'm looking for, I might drive to another
store and see what they've got to offer. Either way, this is
changing the local model by adding more data. The cost of waiting
or driving to another store (obtaining more data) may be
prohibitive, in which case I'll either give up or make do with
800x600. But allowing for the possibility that more information may
be available offers a route to other solutions.
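
The laptop decision can be sketched in a few lines of Python. This is only an illustration of deciding on a closed snapshot while allowing for more data: the `Laptop` record, the `decide` function, the second store and all the prices are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Laptop:
    store: str
    width: int
    height: int
    price: float


def suitable(known, min_width=1400, min_height=1050, max_price=500):
    """Return the offers in the current snapshot that meet the requirements."""
    return [l for l in known
            if l.width >= min_width and l.height >= min_height
            and l.price < max_price]


def decide(known, cost_of_more_data, budget_for_searching):
    """Decide from what is known now, without assuming it is all there is."""
    matches = suitable(known)
    if matches:
        return ("buy", min(matches, key=lambda l: l.price))
    if cost_of_more_data <= budget_for_searching:
        # Open world: nothing suitable here does not mean nothing suitable exists.
        return ("look further", None)
    return ("make do", None)


# Today's snapshot: one store only (hypothetical stock).
walmart = {Laptop("Wal-mart", 800, 600, 450.0)}
first = decide(walmart, cost_of_more_data=10, budget_for_searching=20)

# Driving to another store adds data to the local model; nothing is retracted.
other_store = {Laptop("Other store", 1400, 1050, 480.0)}
second = decide(walmart | other_store, cost_of_more_data=10,
                budget_for_searching=20)
```

With the first snapshot the answer is to look further; once the second store's data is merged in, the same function returns a purchase. If the cost of obtaining more data exceeds the budget, it falls back to making do.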
~
Phil's post talks at length about Dave Winer, and Dave responds
at length. I think Phil is probably right on a lot of his points
relating to Dave's stratagems, but he's missing important pieces of
information in his analysis of the state of the Semantic Web.
Phil tends to describe things in terms of conflict (in this
subsequent post he explains how he sees it as a useful analytical
technique). There's conflict implied in Bill's comment too, let me
quote:
"Crap like RSS and microformats can come along and wipe the floor with RDF in terms of deployments. RDF is not getting adopted en masse and RDF is as old as XML as makes no difference. Something's fundamentally wrong."
Whoah! There's no denying there's a lot more RSS out there than RDF/XML. Take another look at the SemWeb stack (and leave aside the arguments over whether the structure makes sense ;-) [Damn, I was going to point to my Content Model Layer slides here, but I've not moved them from the old system]. Although in common parlance "Semantic Web technologies" means RDF, OWL etc, the Semantic Web is an extension of the current web: existing Web technologies are also Semantic Web technologies.
The bottom layers feature XML, Namespaces, URIs, Unicode. RSS is built on XML and URIs, microformats on all four. What's more, the data in both RSS and microformats can be expressed in the RDF model. Take a look at this book review, which is a StructuredBlogging post. There may still be minor bugs, as the SB stuff is relatively new, but on the surface this is a regular HTML web page. If you view the source, though, you will see a few things of relevance. For a start it's got RSS and Atom autodiscovery links. These formats are mappable to the RDF model (I won't bother hunting links now, but an rss2rdfxml XSLT can be applied to this using existing online services).
In the head element there are two metadata profiles: http://gmpg.org/xfn/1 and http://structuredblogging.org/profile/. These state that the body of the HTML conforms to various microformats; here there's at least XFN (the first profile) and hReview (there doesn't appear to be a profile doc in place at the second URI, that's a bug). These URIs declare that the HTML can be interpreted to yield data according to the profiles. That data can be read as RDF using GRDDL - there's a vocab for the review stuff (one of mine - that'll be 404'ing right now too) and XFN maps to FOAF. In this page there's additionally embedded StructuredBlogging XML, along with a link to the XSLT to convert it into RDF/XML.
That's a fairly contrived example, but my point is that both RSS and microformats can be viewed as RDF. There's no conflict.
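
To make "RSS can be viewed as RDF" a little more concrete, here is a minimal Python sketch that reads an RSS 2.0 fragment and emits subject-predicate-object triples. It is not the rss2rdfxml XSLT mentioned above, just a hand-rolled illustration. The feed content is invented, and using the item's link as its subject URI, with Dublin Core and RSS 1.0 terms as predicates, is an assumption made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 fragment, for illustration only.
RSS = """<rss version="2.0"><channel>
  <title>Example feed</title>
  <link>http://example.org/</link>
  <item>
    <title>A post about cats</title>
    <link>http://example.org/posts/1</link>
    <pubDate>Mon, 02 Jan 2006 10:00:00 GMT</pubDate>
  </item>
</channel></rss>"""

DC = "http://purl.org/dc/elements/1.1/"
RSS1 = "http://purl.org/rss/1.0/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def rss_to_triples(xml_text):
    """Map each RSS item to a set of (subject, predicate, object) triples.

    Assumption: the item's <link> serves as the subject URI.
    The date is copied as-is rather than converted to W3CDTF.
    """
    triples = set()
    channel = ET.fromstring(xml_text).find("channel")
    for item in channel.findall("item"):
        subject = item.findtext("link")
        if not subject:
            continue  # no usable identifier for this item, skip it
        triples.add((subject, RDF_TYPE, RSS1 + "item"))
        title = item.findtext("title")
        if title:
            triples.add((subject, DC + "title", title))
        date = item.findtext("pubDate")
        if date:
            triples.add((subject, DC + "date", date))
    return triples


triples = rss_to_triples(RSS)
```

Once the feed is in this shape it can sit in the same store as triples from any other source.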
There's an obvious comeback here - where are the tools to make use of the RDF? Ok, here I'll give the totally unconvincing answer: they're on their way. But I'll back that up by saying go and look! There are plenty of programming tools for using this stuff, they just need wiring up.
Both Phil and Bill make the point that this SemWeb stuff has been talked about for a long time, yet there's very little visible on the web. But a huge amount of work has been done. Things like RSS are oriented towards human-readable content, so there's an easy incremental path for getting this stuff visible - the view window in most aggregators is an HTML browser. But data browsing on the web is relatively unexplored territory. Even if there's a hugely complex RDBMS behind the website, the facilities in the browser tend to be limited. Phil suggests the problem with the SemWeb may be a "lack of a plausible strategic objective". The strategic objective is getting the world's data on the web, in a form that computers can make maximal use of.
I'd really better get some work done, but there's an issue hanging there I'd better deal with. The SemWeb strategy is Web of Data/Web as Platform. Phil says:
"...as the amount of machine-treatable metadata in "web 2.0" explodes over the next couple of years, the proportion that's part of the SemWeb (ie. in RDF, marked-up with URIs who's "meaning" is defined in OWL ontologies) is going to be infinitesimal."
The Semantic Web technologies are designed so that they can
handle almost any kind of machine-treatable data (not just
metadata). The microformats stuff is a good example. Every site
which publishes XFN data is publishing perfectly good Semantic Web
data about people (the usual vocabulary would be FOAF, which
includes a little OWL). WordPress includes XFN out of the box. As
more data becomes available on the web, Semantic Web technologies
will become more useful. Web 2.0 is good for the Web, what's good
for the Web is good for the Semantic Web.
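
As a rough illustration of XFN as Semantic Web data, the following Python sketch pulls XFN `rel` values out of HTML links and restates them as `foaf:knows` triples. It is a simplification under stated assumptions: the set of `rel` values treated as implying `foaf:knows` is my own selection, the page and link URIs are invented, and the page URI stands in for the person (a stricter mapping would describe the people behind the pages).

```python
from html.parser import HTMLParser

FOAF_KNOWS = "http://xmlns.com/foaf/0.1/knows"
# XFN rel values treated here as implying foaf:knows (an assumption).
XFN_KNOWS = {"friend", "acquaintance", "contact", "met",
             "co-worker", "colleague"}


class XFNParser(HTMLParser):
    """Collect (page, foaf:knows, target) triples from XFN-annotated links."""

    def __init__(self, page_uri):
        super().__init__()
        self.page_uri = page_uri
        self.triples = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        # rel is a space-separated list; it may be absent or empty.
        rels = set((attrs.get("rel") or "").split())
        if href and rels & XFN_KNOWS:
            self.triples.add((self.page_uri, FOAF_KNOWS, href))


# Hypothetical blogroll fragment.
HTML = """<ul>
<li><a href="http://phil.example.org/" rel="friend met">Phil</a></li>
<li><a href="http://example.org/about" rel="me">About</a></li>
<li><a href="http://example.com/">No rel</a></li>
</ul>"""

parser = XFNParser("http://danny.example.org/")
parser.feed(HTML)
```

Only the first link carries a relationship value from the chosen set, so `parser.triples` ends up holding a single `foaf:knows` statement.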
Ok, one last thing. I've taken it as read here that SemWeb
technologies offer something useful. Ok, so try this: you've got
your feedlists in OPML, maybe some other attention stuff in there
too, and maybe some other OPML files offering hierarchical
categorisation of various sites. You've got the RSS content+metadata
found in those feeds. You also have material like XFN. How do you
answer a question like : "show me the last day's posts about cats
from friends of Phil who aren't actually cat specialists". All the
data's there, how do you query it? I know how I'd do it. RDF offers
a way of intelligently merging/integrating diverse information into
a common data model. SPARQL offers a way of querying that model.
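
Here is a toy version of that cats question in Python, using plain sets of triples rather than a real RDF store or SPARQL engine. Everything in it is hypothetical: the short names standing in for URIs, the predicate names, and the data. The point is only that once the XFN, OPML and RSS material share one model, merging is a set union and the question becomes a pattern match across sources.

```python
from datetime import datetime, timedelta

# Three small graphs, as they might come from XFN, OPML and RSS.
xfn = {
    ("phil", "knows", "alice"),
    ("phil", "knows", "bob"),
}
opml = {
    ("bob", "specialisesIn", "cats"),  # e.g. from a categorised feed list
    ("alice", "publishes", "alice-feed"),
    ("bob", "publishes", "bob-feed"),
    ("carol", "publishes", "carol-feed"),
}
rss = {
    ("post1", "inFeed", "alice-feed"), ("post1", "subject", "cats"),
    ("post1", "date", datetime(2006, 1, 2, 9, 0)),
    ("post2", "inFeed", "bob-feed"), ("post2", "subject", "cats"),
    ("post2", "date", datetime(2006, 1, 2, 8, 0)),
    ("post3", "inFeed", "alice-feed"), ("post3", "subject", "cats"),
    ("post3", "date", datetime(2005, 12, 1, 8, 0)),
    ("post4", "inFeed", "carol-feed"), ("post4", "subject", "cats"),
    ("post4", "date", datetime(2006, 1, 2, 7, 0)),
}

graph = xfn | opml | rss  # merging the sources is just set union


def objects(s, p):
    """All o such that (s, p, o) is in the merged graph."""
    return {o for (s2, p2, o) in graph if s2 == s and p2 == p}


def subjects(p, o):
    """All s such that (s, p, o) is in the merged graph."""
    return {s for (s, p2, o2) in graph if p2 == p and o2 == o}


def recent_cat_posts(person, now):
    """Last day's cat posts from friends of person who aren't cat specialists."""
    results = set()
    for friend in objects(person, "knows"):
        if "cats" in objects(friend, "specialisesIn"):
            continue  # skip the specialists
        for feed in objects(friend, "publishes"):
            for post in subjects("inFeed", feed):
                recent = any(now - d <= timedelta(days=1)
                             for d in objects(post, "date"))
                if recent and "cats" in objects(post, "subject"):
                    results.add(post)
    return results


result = recent_cat_posts("phil", datetime(2006, 1, 2, 12, 0))
```

In this data only `post1` qualifies: `post2` comes from a cat specialist, `post3` is too old, and `post4` is from someone Phil doesn't list as a friend. With a real RDF toolkit the nested loops would collapse into a single SPARQL query over the merged graph.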
Phil suggests OPML could in future be used for FOAF-like
material, supplanting FOAF. It might be relatively straightforward
to create a convention for describing people. But if you're wanting
to do things more interesting than display this stuff in a browser
(HTML or OPML) there are certain problems you need to solve - many
of which have already been solved in FOAF. It's too early to say
whether RDF (etc) in itself can solve the underlying big problem,
of integrating heterogeneous data in a global space. But it does
greatly simplify a lot of otherwise difficult subproblems.
Phil reckons that in suggesting things like exposing OPML
from RDF apps I'm trying to make bridges between SemWeb technology
and the users who actually have the data. Yes and no. The users
have data in many forms - relational DBs probably being the
biggest. Yes, these need to be (and slowly are being) bridged to
the Semantic Web. But when it comes to content authoring and
especially "microcontent" browsing, things like feed readers are
the most developed UIs. Using them is pragmatic, in the same way
that it's pragmatic to use existing calendar tools (iCalendar etc)
for schedule data because of their UIs. But such tools are
generally domain-specific. I'd
love a fully integrated personal knowledge manager, and existing
RDF tools offer a convincing solution to the knowledge
representation and storage aspects. But there's still a long way to
go with user interfaces. RSS and OPML are good enough for content
intended for fairly direct human consumption, and they're both easy
enough to produce from RDF systems.
Assume for a moment there wasn't the history around Dave
Winer, and that OPML and RSS 2.0 had good specs. As applications of
XML which used URIs a lot they would firmly be part of the bottom
two layers of the SemWeb stack. But the specs are weak, and the
material you find in these formats has a high level of ambiguity.
Tools like aggregators take this into account, compensating enough
that they can be useful. They deal with bad data. If you forget the
history, is that actually any worse than dealing with poor-quality
HTML? Pragmatically speaking, I don't think so. Which leaves the
man and his history. Are they really so important that we should
discount the work that's being done around the formats by other
people? I don't think so.
Formats like RSS are popular and widely deployed (and they
do have pretty good UIs). But so far they only really work as
simple transports for human-readable content with a little bit of
metadata. To go beyond that you need to find solutions to a set of
questions that have been steadily worked on by a wide community of
developers since the web came online. The fruits of those labours
so far can be found around RDF.
(Lest we forget, RSS itself originated as an early fork from
RDF).