More long-winded stuff

PS. New mailing list : Semantic Web And User Interaction Design Group, public-semweb-ui@w3.org

PPS. Ora Lassila's been getting crystalline with an RDF browser built with Lisp called "OINK" (excellent name!) :

I have just been browsing about 500,000 triples worth of enzyme, protein and gene ontology data. Way cool, even if I don't understand anything about the subject matter. :-)
---

Rats, this morning I was all fired up to get cracking with work. But I made the mistake of checking my inputs. First I notice a comment from Bill deHora re. Crystalline RDF and then a response from Phil Jones in my expose-RDF-as-OPML trail.



Luckily I can feed both birds with some of the same beans. First an aperitif for Bill. I think he's got a good point about a tendency in the SemWeb community to repeat mistakes of old AI (I can't find his profile, but if I remember correctly Bill is an AI veteran). In particular the assumptions that time and resources aren't constrained, and that things that work in the lab will work out in the real world. I don't quite follow all Bill's wording around the 404 and OWA points, but this I can respond to:

In practical terms open world means information is not going to be rendered as usable now, because things might change. How does that help anyone make decisions?

Virtually every decision we make is based on incomplete information: our data may be inaccurate, new information comes along later in time, circumstances change. It's more than usable now, we are forced to make decisions based on a snapshot of what we know at a given point in time. The snapshot is in effect a closed model of the world. But the acknowledgement that there may be other information we don't know about gives the decision-making system a robustness over time.



It's a forced example, but consider the decision to buy a new laptop. I might put together my list of requirements (1400x1050 display) and constraints (less than $500), then go to Wal-mart and see what fits the bill. Wal-mart today is my closed world - they may or may not have something suitable. But hardware gets cheaper/more capable over time, and Wal-mart may have a sale coming up. So an appropriate decision might be to wait a few weeks. Alternatively, if Wal-mart don't have what I'm looking for, I might drive to another store and see what they've got to offer. Either way, this is changing the local model by adding more data. The cost of waiting or driving to another store (obtaining more data) may be prohibitive, in which case I'll either give up or make do with 800x600. But allowing for the possibility that more information may be available offers a route to other solutions.

~

Phil's post talks at length about Dave Winer, Dave responds at length. I think Phil is probably right on a lot of his points relating to Dave's stratagems, but he's missing important pieces of information in his analysis of the state of the Semantic Web.



Phil tends to describe things in terms of conflict (in this subsequent post he explains how he sees it as a useful analytical technique). There's conflict implied in Bill's comment too, let me quote :

Crap like RSS and microformats can come along and wipe the floor with RDF in terms of deployments. RDF is not getting adopted en masse and RDF is as old as XML as makes no difference. Something's fundamentally wrong.

Whoah! There's no denying there's a lot more RSS out there than RDF/XML. Take another look at the SemWeb stack (and leave aside the arguments over whether the structure makes sense ;-) [Damn, I was going to point to my Content Model Layer slides here, but I've not moved them from the old system]. Although in common parlance Semantic Web technologies means RDF, OWL etc, the Semantic Web is an extension of the current web: existing Web technologies are also Semantic Web technologies.



The bottom layers feature XML, Namespaces, URIs, Unicode. RSS is built on XML and URIs, microformats on all four. What's more, the data in both RSS and microformats can be expressed in the RDF model. Take a look at this book review, which is a StructuredBlogging post. There may still be minor bugs, the SB stuff is relatively new, but on the surface this is a regular HTML web page. But if you view the source, you will see a few things of relevance. For a start it's got RSS and Atom autodiscovery links. These formats are mappable to the RDF model (I won't bother hunting links now, but an rss2rdfxml XSLT can be applied to this using existing online services). In the head element there are two metadata profiles: http://gmpg.org/xfn/1 and http://structuredblogging.org/profile/. This states that the body of the HTML conforms to various microformats; here there's at least XFN (the first profile) and hReview (there doesn't appear to be a profile doc in place at the second URI, that's a bug). These URIs declare that the HTML can be interpreted to yield data according to the profiles. That data can be read as RDF using GRDDL - there's a vocab for the review stuff (one of mine - that'll be 404'ing right now too) and XFN maps to FOAF. In this page there's additionally embedded StructuredBlogging XML, along with a link to the XSLT to convert it into RDF/XML.



That's a fairly contrived example, but my point is that both RSS and microformats can be viewed as RDF. There's no conflict.
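
To make the "RSS can be viewed as RDF" point a little more concrete, here is a minimal sketch in Python (standard library only). It is not the rss2rdfxml XSLT mentioned above, just a stand-in for the same idea: each item's link becomes the subject, and its title and date become statements about it. The sample feed and the function name rss_to_triples are made up for illustration; the property URIs are the RSS 1.0 title and Dublin Core date terms, chosen here as an assumption about a sensible mapping.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

RSS_NS = "http://purl.org/rss/1.0/"          # RSS 1.0 vocabulary
DC_NS = "http://purl.org/dc/elements/1.1/"   # Dublin Core elements

# Hypothetical feed, invented for this example.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <link>http://example.org/</link>
    <item>
      <title>A post about cats</title>
      <link>http://example.org/2006/03/cats</link>
      <pubDate>Tue, 21 Mar 2006 14:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def rss_to_triples(rss_text):
    """Turn each RSS 2.0 item into (subject, predicate, object) triples."""
    root = ET.fromstring(rss_text)
    triples = []
    for item in root.iter("item"):
        link = item.findtext("link")
        if not link:
            continue  # no URI to use as the subject, so skip rather than guess
        title = item.findtext("title")
        if title:
            triples.append((link, RSS_NS + "title", title))
        pub = item.findtext("pubDate")
        if pub:
            # pubDate is an RFC 822 date; store it in ISO 8601 form instead.
            iso = parsedate_to_datetime(pub).isoformat()
            triples.append((link, DC_NS + "date", iso))
    return triples


for triple in rss_to_triples(SAMPLE_RSS):
    print(triple)
```

A real converter would have to cope with the ambiguity found in feeds in the wild (missing links, odd dates, escaped markup), which this sketch simply skips.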



There's an obvious comeback here - where are the tools to make use of the RDF? Ok, here I'll give the totally unconvincing answer: they're on their way. But I'll back that up by saying go and look! There are plenty of programming tools for using this stuff, they just need wiring up.



Both Phil and Bill make the point that this SemWeb stuff has been talked about for a long time, yet there's very little visible on the web. But a huge amount of work has been done. Things like RSS are oriented towards human-readable content, so there's an easy incremental path for getting this stuff visible - the view window in most aggregators is an HTML browser. But data browsing on the web is relatively unexplored territory. Even if there's a hugely complex RDBMS behind the website, the facilities in the browser tend to be limited. Phil suggests the problem with the SemWeb may be a "lack of a plausible strategic objective". The strategic objective is getting the world's data on the web, in a form that computers can make maximal use of.



I'd really better get some work done, but there's an issue hanging there I'd better deal with. The SemWeb strategy is Web of Data/Web as Platform. Phil says:
...as the amount of machine-treatable metadata in "web 2.0" explodes over the next couple of years, the proportion that's part of the SemWeb (ie. in RDF, marked-up with URIs who's "meaning" is defined in OWL ontologies) is going to be infinitesimal.

The Semantic Web technologies are designed so that they can handle almost any kind of machine-treatable data (not just metadata). The microformats stuff is a good example. Every site which publishes XFN data is publishing perfectly good Semantic Web data about people (the usual vocabulary would be FOAF, which includes a little OWL). WordPress includes XFN out of the box. As more data becomes available on the web, Semantic Web technologies will become more useful. Web 2.0 is good for the Web, what's good for the Web is good for the Semantic Web.
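
As a rough illustration of the XFN point, the sketch below pulls XFN rel values out of ordinary blogroll HTML and restates them as foaf:knows triples. It is a Python stand-in for what a GRDDL transformation would do, not an actual GRDDL implementation. The sample HTML, the page_owner URI and the helper names are hypothetical, and two simplifications are assumed: only a handful of XFN values are treated as implying foaf:knows, and page URLs are used directly to stand for the people they belong to.

```python
from html.parser import HTMLParser

FOAF_KNOWS = "http://xmlns.com/foaf/0.1/knows"

# Assumption: these XFN values are read as "knows"; the choice is illustrative.
XFN_KNOWS_VALUES = {"friend", "acquaintance", "contact", "met",
                    "co-worker", "colleague"}


class XFNExtractor(HTMLParser):
    """Collect foaf:knows triples from <a rel="..."> links."""

    def __init__(self, page_owner):
        super().__init__()
        self.page_owner = page_owner
        self.triples = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        rels = set((attrs.get("rel") or "").split())
        # Emit one triple per link that carries any recognised XFN value.
        if href and rels & XFN_KNOWS_VALUES:
            self.triples.append((self.page_owner, FOAF_KNOWS, href))


def xfn_to_foaf(html_text, page_owner):
    parser = XFNExtractor(page_owner)
    parser.feed(html_text)
    parser.close()
    return parser.triples


# Hypothetical blogroll markup of the kind a blogging tool might emit.
SAMPLE_HTML = """<ul class="blogroll">
<li><a href="http://example.org/alice" rel="friend met">Alice</a></li>
<li><a href="http://example.org/bob" rel="colleague">Bob</a></li>
<li><a href="http://example.org/help" rel="nofollow">Help</a></li>
</ul>"""

for triple in xfn_to_foaf(SAMPLE_HTML, "http://example.org/phil"):
    print(triple)
```

The nofollow link is ignored because it carries no XFN value; the other two links each yield one statement about who the page owner knows.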



Ok, one last thing. I've taken it as read here that SemWeb technologies offer something useful. Ok, so try this: you've got your feedlists in OPML, maybe some other attention stuff in there too, and maybe some other OPML files offering hierarchical categorisation of various sites. You've got the RSS content+metadata found in those feeds. You also have material like XFN. How do you answer a question like: "show me the last day's posts about cats from friends of Phil who aren't actually cat specialists"? All the data's there, how do you query it? I know how I'd do it. RDF offers a way of intelligently merging/integrating diverse information into a common data model. SPARQL offers a way of querying that model.
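
The shape of that answer can be sketched without any RDF toolkit at all. In the Python below, each source (XFN, OPML, RSS) is assumed to have already been reduced to subject-predicate-object triples; merging is then just set union, and the cat question becomes a walk over the merged set. Everything here is invented for illustration: the ex:, feed: and post: identifiers, the property names, and the function recent_cat_posts. A real system would use proper URIs and a SPARQL engine rather than hand-written loops.

```python
from datetime import datetime, timedelta

# Triples as they might come out of three different formats (all made up).
xfn_triples = {
    ("ex:phil", "knows", "ex:alice"),
    ("ex:phil", "knows", "ex:bob"),
}
opml_triples = {
    ("ex:alice", "publishes", "feed:alice"),
    ("ex:bob", "publishes", "feed:bob"),
    ("ex:carol", "publishes", "feed:carol"),
    ("ex:bob", "specialism", "cats"),  # from a categorised OPML outline
}
rss_triples = {
    ("post:1", "inFeed", "feed:alice"), ("post:1", "subject", "cats"),
    ("post:1", "date", "2006-03-21T09:00:00"),
    ("post:2", "inFeed", "feed:alice"), ("post:2", "subject", "cats"),
    ("post:2", "date", "2006-03-15T09:00:00"),   # too old
    ("post:3", "inFeed", "feed:bob"), ("post:3", "subject", "cats"),
    ("post:3", "date", "2006-03-21T10:00:00"),   # Bob is a cat specialist
    ("post:4", "inFeed", "feed:alice"), ("post:4", "subject", "dogs"),
    ("post:4", "date", "2006-03-21T11:00:00"),   # wrong topic
    ("post:5", "inFeed", "feed:carol"), ("post:5", "subject", "cats"),
    ("post:5", "date", "2006-03-21T12:00:00"),   # Carol isn't Phil's friend
}

# Merging diverse sources into one model is a plain set union.
graph = xfn_triples | opml_triples | rss_triples


def objects(g, s, p):
    return {o for (s2, p2, o) in g if s2 == s and p2 == p}


def subjects(g, p, o):
    return {s for (s, p2, o2) in g if p2 == p and o2 == o}


def recent_cat_posts(g, person, now):
    """Last day's cat posts from friends of `person` who aren't cat specialists."""
    cutoff = now - timedelta(days=1)
    found = []
    for friend in objects(g, person, "knows"):
        if "cats" in objects(g, friend, "specialism"):
            continue
        for feed in objects(g, friend, "publishes"):
            for post in subjects(g, "inFeed", feed):
                if "cats" not in objects(g, post, "subject"):
                    continue
                dates = objects(g, post, "date")
                if any(datetime.fromisoformat(d) >= cutoff for d in dates):
                    found.append(post)
    return sorted(found)


NOW = datetime(2006, 3, 21, 15, 0, 0)
print(recent_cat_posts(graph, "ex:phil", NOW))  # prints ['post:1']
```

The point of the common model is that the query never needs to know which format a given fact came from.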



Phil suggests OPML could in future be used for FOAF-like material, supplanting FOAF. It might be relatively straightforward to create a convention for describing people. But if you're wanting to do things more interesting than display this stuff in a browser (HTML or OPML) there are certain problems you need to solve - many of which have already been solved in FOAF. It's too early to say whether RDF (etc) in itself can solve the underlying big problem, of integrating heterogeneous data in a global space. But it does greatly simplify a lot of otherwise difficult subproblems.



Phil reckons that in suggesting things like exposing OPML from RDF apps I'm trying to make bridges between SemWeb technology and the users who actually have the data. Yes and no. The users have data in many forms - relational DBs probably being the biggest. Yes, these need to be (and slowly are being) bridged to the Semantic Web. But when it comes to content authoring and especially "microcontent" browsing, things like feed readers are the most developed UIs. It's pragmatic to use them, in the same way that it's pragmatic to use existing calendar tools (iCalendar etc) for schedule data because of their UIs. But such tools are generally domain-specific. I'd love a fully integrated personal knowledge manager, and existing RDF tools offer a convincing solution to the knowledge representation and storage aspects. But there's still a long way to go with user interfaces. RSS and OPML are good enough for content intended for fairly direct human consumption, and they're both easy enough to produce from RDF systems.



Assume for a moment there wasn't the history around Dave Winer, and that OPML and RSS 2.0 had good specs. As applications of XML which used URIs a lot they would firmly be part of the bottom two layers of the SemWeb stack. But the specs are weak, and the material you find in these formats has a high level of ambiguity. Tools like aggregators take this into account, compensating enough that they can be useful. They deal with bad data. If you forget the history, is that actually any worse than dealing with poor-quality HTML? Pragmatically speaking, I don't think so. Which leaves the man and his history. Are they really so important that we should discount the work that's being done around the formats by other people? I don't think so.



Formats like RSS are popular and widely deployed (and they do have pretty good UIs). But so far they only really work as simple transports for human-readable content with a little bit of metadata. To go beyond that you need to find solutions to a set of questions that have been steadily worked on by a wide community of developers since the web came online. The fruits of those labours so far can be found around RDF.

(Lest we forget, RSS itself originated as an early fork from RDF).


Danny Ayers
2006-03-21T15:14:56+01:00
