Live Clipboard and identifying things

Scott Anderson (rats, what's your preferred URI?) just pointed me towards this post from Jon Udell: Exploring Live Clipboard. Jon has done a great screencast (Flash) walking through the use of a microformat - hCalendar - within the Live Clipboard setup.

But Jon identifies a problem with identifying resources; Scott suspects there may be a solution around URIs and RDF.

Yup. Or maybe not a solution as such, but a bit of theory and quite a lot of practical experience in exactly this kind of issue. The naive answer is to give everything you want to identify a unique name. Another approach is to identify things through their characteristics.

Jon's name is "Jon Udell". There can't be many of them on the planet, but there's a pretty good chance there's more than one. He writes for InfoWorld, though, and that's almost certainly enough to disambiguate him. But there could well be loads of Thomas Andersons working for Matrix Inc.

You could build an identification-by-reference like this with lots of attributes and values. But you get a whole lot of leverage by reusing existing identification schemes which are narrowly defined already: don't say the company named "InfoWorld", say the company with the URI "http://infoworld.com". You can get closer: the person with the Social Security number "IR123456". It's not a very accurate model to say that number directly identifies the person, but you can be reasonably sure that there will only be one person with that number. The Semantic Web languages have a way of expressing this - an inverse functional property. You might have statements looking like this:

_:person x:name "Thomas Anderson" .

_:person x:ss "IR123456" .

If you combine this with:

x:ss rdf:type owl:InverseFunctionalProperty .

then you have your unique identifier associated with the person. I often get mixed up about which way round things go, but the OWL Guide puts it concisely:

If a property, P, is tagged as InverseFunctional then for all x, y and z:

P(y,x) and P(z,x) implies y = z
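To make that rule concrete, here's a minimal Python sketch of the "smushing" it licenses: any two nodes that share a value for an inverse functional property get merged into one. It assumes triples held as plain (subject, predicate, object) tuples rather than in a real RDF store, reuses the x:ss and x:name placeholders from above, and adds a hypothetical x:employer statement so there's something to merge.

```python
# Triples as (subject, predicate, object) tuples; "_:a" and "_:b" are blank
# nodes from two separate descriptions. x:employer is a hypothetical predicate.
triples = [
    ("_:a", "x:name", "Thomas Anderson"),
    ("_:a", "x:ss", "IR123456"),
    ("_:b", "x:ss", "IR123456"),
    ("_:b", "x:employer", "http://infoworld.com"),
]

# Predicates declared with rdf:type owl:InverseFunctionalProperty.
ifps = {"x:ss"}


def smush(triples, ifps):
    """Merge subjects sharing a value for any inverse functional property.

    From P(y,x) and P(z,x) implies y = z: every subject seen with the
    same (property, value) pair denotes the same thing.
    """
    parent = {}  # subject -> subject it has been merged into
    first = {}   # (ifp, value) -> first subject seen with that pair

    def find(node):
        # Follow merge links to the representative subject.
        while parent.get(node, node) != node:
            node = parent[node]
        return node

    for s, p, o in triples:
        if p not in ifps:
            continue
        key = (p, o)
        if key in first:
            root_a, root_b = find(first[key]), find(s)
            if root_a != root_b:
                parent[root_b] = root_a
        else:
            first[key] = s

    # Rewrite subjects to their representative and drop duplicate triples.
    return list(dict.fromkeys((find(s), p, o) for s, p, o in triples))


for triple in smush(triples, ifps):
    print(triple)
```

Both blank-node descriptions collapse into _:a, so the employer statement ends up hanging off the same person as the name.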


This is exactly the approach taken in classic FOAF, and the classic writeup is Dan Brickley's Identifying things in FOAF. A person is identified through their description, parts of which may uniquely identify them (email address, homepage).

But what about the naive answer: give everything a unique name? This approach is simple on the web - give the event or whatever a URI. Not long ago Tim Berners-Lee publicly proposed doing this for people: Give yourself a URI!

It may look clunky in modelling or semiotic terms, and there may be practical issues: an HTTP-dereferenceable URI is recommended, but what if you forget your domain name registration? Had I done this a few years ago, this would be me. But as a pragmatic, do-it-now approach it saves a lot of thought.

This is talking about identifying people; events are a fair bit less troublesome. If there's a fairly solid reference service, e.g. Wikipedia or IMDb, use their URIs. Going pragmatic, it's a close approximation to say that the guy described on Wikipedia is the same resource as the one identified by the URI he gives himself:

<http://www.w3.org/People/Berners-Lee/card.rdf> owl:sameAs <http://en.wikipedia.org/wiki/Tim_Berners-Lee> .

In RDF/OWL this is easy to work with, even if someone else uses a different reference point... or, for that matter, if the individual in question uses a different reference point:

<http://www.w3.org/People/Berners-Lee/card.n3> owl:sameAs <http://www.w3.org/People/Berners-Lee> .
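Because owl:sameAs is symmetric and transitive, working out which URIs name the same thing amounts to finding connected components. Here's a minimal union-find sketch, assuming the sameAs statements have already been pulled out as URI pairs. The third link, tying card.rdf to card.n3, is a hypothetical addition so that the two reference points above actually meet.

```python
from collections import defaultdict

# owl:sameAs statements as (subject, object) URI pairs: the two from the
# post, plus a hypothetical third link joining the two reference points.
same_as = [
    ("http://www.w3.org/People/Berners-Lee/card.rdf",
     "http://en.wikipedia.org/wiki/Tim_Berners-Lee"),
    ("http://www.w3.org/People/Berners-Lee/card.n3",
     "http://www.w3.org/People/Berners-Lee"),
    ("http://www.w3.org/People/Berners-Lee/card.rdf",   # hypothetical link
     "http://www.w3.org/People/Berners-Lee/card.n3"),
]


def same_as_groups(pairs):
    """Partition URIs into sets that owl:sameAs says name the same thing."""
    parent = {}

    def find(uri):
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path halving keeps trees shallow
            uri = parent[uri]
        return uri

    # sameAs is symmetric and transitive, so each link just joins two sets.
    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(set)
    for uri in list(parent):
        groups[find(uri)].add(uri)
    return list(groups.values())


for group in same_as_groups(same_as):
    print(sorted(group))
```

Drop the hypothetical third link and you get two separate groups: the merge only happens once somebody asserts the connection.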

Jon reckons we might need some kind of bottom-up, tagging-style agreement to map the event as identified at evdb.org to the same event at upcoming.com. Well, maybe. But it would be a lot easier to exploit the work that's already been done on describing resources on the web.

For Jon's event scenario, I think using direct URIs is probably the easiest approach to identification, partly because for things like this a sameAs mapping is probably going to be pretty easy to discover. If I put Jon's work homepage into Technorati's search, chances are it'll spit out his blog URI too.

Scott included a very good question in his email, assuming RDF did have a solution -

...how this might be implemented along with a strategy for getting such a solution accepted by the developer community.

Dunno. I'm not at all sure how this could be leveraged without buying into another fairly big slice of SemWeb. Maybe identification of things on the web has to go through a whole cycle of reinvention via Web 2.0 tags etc. Whatever happens, it'll end up in pretty much the same place in a few years' time: problems like how to identify things are fairly independent of any particular bunch of technology. Those who forget RDF, etc etc.

----

PS. Jon has responded to my comments, along with those of David Janes, in an addendum to his post. He refers again to using folksy tags for what he's dubbed "collaborative aliasing".

That's a pretty useful notion. Dear LazyWeb, can you please implement a collaborative aliasing service? You may find the distinction between owl:sameAs and owl:equivalentClass useful (see below).

PPS. Scott follows up with a mail pointing out that the problem Jon describes is not merely knowing that the two listed events are the same, but having the clipboard usefully recognise the data provided by each service, given that they use different internal representations.

So how does this sound: wire that collaborative aliasing service up using Live Clipboard. When someone wishes to say that event A on service X is the same as event B on service Y, they paste each representation into the aliasing service. Later, when someone visits event A on service X, service Y's representation of the same event is available too.
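Just to show the shape of it, here's a toy in-memory sketch of that aliasing service. Everything in it is hypothetical: the AliasService and Representation names, the service-x/service-y keys and the payload strings. Each paste records a sameAs-style pairing, and a lookup hands back the other services' representations of the same event.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Representation:
    service: str   # hypothetical service key, e.g. "service-x"
    event_id: str  # that service's own identifier for the event
    payload: str   # whatever the clipboard carried, e.g. hCalendar markup


class AliasService:
    """Hypothetical collaborative aliasing service (in memory only)."""

    def __init__(self):
        self._parent = {}  # (service, event_id) -> representative key
        self._reps = {}    # (service, event_id) -> Representation

    def _find(self, key):
        self._parent.setdefault(key, key)
        while self._parent[key] != key:
            self._parent[key] = self._parent[self._parent[key]]
            key = self._parent[key]
        return key

    def paste_pair(self, a, b):
        """Record that representations a and b describe the same event."""
        for rep in (a, b):
            self._reps[(rep.service, rep.event_id)] = rep
        root_a = self._find((a.service, a.event_id))
        root_b = self._find((b.service, b.event_id))
        if root_a != root_b:
            self._parent[root_b] = root_a

    def others(self, service, event_id):
        """Representations from other services of the same event."""
        key = (service, event_id)
        if key not in self._parent:
            return []
        root = self._find(key)
        return [rep for k, rep in self._reps.items()
                if k != key and self._find(k) == root]


svc = AliasService()
a = Representation("service-x", "A", "<div class='vevent'>...</div>")
b = Representation("service-y", "B", "BEGIN:VEVENT ... END:VEVENT")
svc.paste_pair(a, b)
print(svc.others("service-x", "A"))
```

Because the pairings are joined transitively, a third service pasted against either event would show up in both lookups.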

Anyhow, as I was responding by email to Jon's update, it turned into bloggishness, so here you go:

----

Hmm, tags are good for identifying collections of related things, but are they much good for individual things? (I'm rather pleased David suggested URIs too.) But anyhow, why should syntactic reasons be a blocker? URIs can be created from tags:

http://del.icio.us/danja/udell

is a tag I can use for you (only one piece tagged - nothing personal, I just don't use del.icio.us very much :-). So when I use the tag "udell" you can be pretty sure that I mean the same as that URI. It's part of my personal vocabulary.
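Turning a tag into a URI is mostly string assembly. A minimal sketch, assuming the http://del.icio.us/<user>/<tag> pattern seen in these examples holds generally; personal_tag_uri is a hypothetical helper, not a del.icio.us API.

```python
from urllib.parse import quote


def personal_tag_uri(user: str, tag: str) -> str:
    """Hypothetical helper: the URI for `tag` as used by `user`.

    Assumes the http://del.icio.us/<user>/<tag> pattern of the examples here.
    Each segment is percent-encoded so tags containing spaces, slashes or
    other reserved characters still yield a single, valid path segment.
    """
    return "http://del.icio.us/{}/{}".format(
        quote(user, safe=""), quote(tag, safe=""))


# My personal vocabulary: when I say "udell", I mean this URI.
print(personal_tag_uri("danja", "udell"))
```

Encoding each segment separately (safe="") matters for tags like "a/b", which would otherwise be read as an extra path level.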



What I think is going to be interesting in the near future is people starting to exploit the other information that's available when something's tagged. Each act of tagging is an event, done at a particular point in time by a particular person in relation to a particular resource. Although most systems currently assume everyone means the same thing when they say "chat", this doesn't have to be the case. It's certainly a really useful approximation, but sometimes it might be nice to be able to distinguish between chat=talk and chat=French cat. Potentially more useful still would be stating the equivalence between synonyms, such as:

http://del.icio.us/freda/jonudell

and

http://del.icio.us/danja/udell

It might make sense to model this a little differently from the event-identification case. There, owl:sameAs is appropriate because it's talking about an individual resource (the event). For the tagging stuff, the tag (and hence the personalised URI) is associated with a set of resources, so owl:equivalentClass may express this better. This does offer some other potentially useful things, e.g. it's pretty likely that:



<http://del.icio.us/danja/udell> rdfs:subClassOf <http://del.icio.us/udell> .
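Treating each tag URI as a class whose members are the tagged resources, the equivalentClass and subClassOf statements let a resource filed under one tag turn up under the others. Here's a minimal sketch of that inference in plain Python rather than an OWL reasoner; the class statements are the ones from this post, while the example.org resource and its tagging are hypothetical.

```python
from collections import defaultdict

# Class statements from this post: the synonym pair as owl:equivalentClass,
# and the personal tag as rdfs:subClassOf the broader one.
EQUIVALENT_CLASS = [
    ("http://del.icio.us/freda/jonudell", "http://del.icio.us/danja/udell"),
]
SUB_CLASS_OF = [
    ("http://del.icio.us/danja/udell", "http://del.icio.us/udell"),
]

# Hypothetical tagging data: resource -> tag URIs it was filed under.
TAGGED = {
    "http://example.org/some-post": {"http://del.icio.us/freda/jonudell"},
}

# Edge a -> b means "every member of class a is also a member of class b".
# equivalentClass gives edges both ways; subClassOf gives one.
EDGES = defaultdict(set)
for a, b in EQUIVALENT_CLASS:
    EDGES[a].add(b)
    EDGES[b].add(a)
for sub, sup in SUB_CLASS_OF:
    EDGES[sub].add(sup)


def superclasses(cls):
    """cls plus every class reachable from it (subClassOf is transitive)."""
    seen, stack = {cls}, [cls]
    while stack:
        for nxt in EDGES[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def inferred_classes(resource):
    """Every tag class a resource belongs to, directly or by inference."""
    result = set()
    for cls in TAGGED.get(resource, ()):
        result |= superclasses(cls)
    return result


print(sorted(inferred_classes("http://example.org/some-post")))
```

Note the asymmetry: something tagged only with the broad http://del.icio.us/udell doesn't get pulled into either personal tag, which is exactly the difference between subClassOf and equivalentClass.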



Anyhow, starting from a position where the extra tag info is captured gets into SemWeb territory, and here's something Rich prepared earlier: the Tag Ontology.


Danny Ayers
2006-04-04T01:40:05+02:00
