Everyone has a Graph Store

Try this thought experiment.

For practical purposes we often assume that everyone has a computer, a reasonable Internet connection and a modern Web browser. We know it's an inaccurate assumption, but it provides conceptual targets for technology in terms of people and environment.

Ok, now add to that list a Graph Store: a flexible database to which information can easily be added, and which can be easily queried. The data can also be easily shared over the Cloud. The data is available for any applications that might want to use it. The database is schemaless, agnostic about what you put in it: the data could be about contacts, descriptions of people & their relationships (i.e. a Social Graph), it could be about places or events, products, technical information, whatever. It can contain private information, it can contain information that you're happy to share. You control your own store and can let other people access as much or as little of its contents as you like (which they can do easily over the cloud). You can access other people's store in the same way, according to their preferences. It's both a Personal Knowledgebase and a Federated Public Knowledgebase.
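
To make "schemaless, easy to add to, easy to query" a bit more concrete, here's a minimal sketch of the idea as a toy in-memory store of subject/predicate/object statements. The class name TinyGraphStore and the ex:/foaf: style names are made up for illustration; a real store would use full URIs and speak SPARQL.

```python
from typing import Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]


class TinyGraphStore:
    """A toy in-memory store of (subject, predicate, object) statements.

    Hypothetical illustration only: no schema, no persistence, no SPARQL.
    """

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, s: str, p: str, o: str) -> None:
        # No schema check: any statement about anything is accepted.
        self._triples.add((s, p, o))

    def match(
        self,
        s: Optional[str] = None,
        p: Optional[str] = None,
        o: Optional[str] = None,
    ) -> Iterator[Triple]:
        # None acts as a wildcard, like a variable in a triple pattern.
        for t in sorted(self._triples):
            if (
                (s is None or t[0] == s)
                and (p is None or t[1] == p)
                and (o is None or t[2] == o)
            ):
                yield t


store = TinyGraphStore()
store.add("ex:alice", "foaf:knows", "ex:bob")   # social graph
store.add("ex:alice", "foaf:name", "Alice")     # contact detail
store.add("ex:gig", "ex:place", "ex:Berlin")    # an event, same store

known = [o for _, _, o in store.match(s="ex:alice", p="foaf:knows")]
```

Contacts, events and anything else sit side by side, and the same wildcard query works across all of it.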

So, make the assumption: everyone has a Graph Store. Now what do you want to do with yours? What can your friends and colleagues do with theirs? How can you use other people's information to improve your quality of life, and vice versa? What new tools can be developed to help them take advantage of their stores? How can you get rich quick on this? What other questions are there..?

Note that if everyone has a Graph Store, for free they automatically get the value-add of the linked data cloud.

Ok, I'm presenting this as a thought experiment, but we pretty much already have all the necessary tools and infrastructure for it to be reality. They aren't generally packaged up in a form that's user-friendly, but that part is becoming increasingly trivial (see below). If you want to run such a store on a local machine there are masses of alternatives - to pick the first three that come to mind there's 4Store, Fuseki and Stardog. If you have a server or other kind of cloudspace available then tools like these are an option there too. For an enterprise kind of environment you probably should look at OpenLink Virtuoso. If you want to leave everything to the cloud, there's Kasabi - note their free hosting option. (I can't remember offhand what other hosted cloud-based options are available, I'm pretty sure there are a few others but a quick search only yielded Dydra which is currently in private beta - please ping me if you know of others...or set up your own :)

The reason I'm prompted to post this now is because of a couple of projects I've had on the go for a while. One (Scute) is an attempt to make my hacking with RDF easier - it's essentially a glorified text editor with a bit of HTTP clientness built in. The other (Seki) was started as a demo more or less to show how a triplestore could be used as a general-purpose read/write Web server, supporting content as well as data. Neither of these is remotely mature enough for proper reuse (Scute has become bloaty/buggy and Seki doesn't do much yet, both are lacking tests and documentation, work in progress innit). But what I found interesting was that although they are approaching semweb tech from a very different direction, there's some definite convergence going on. That convergence is more or less around what I was calling the Semantic Web in a Box (SWIB) a few years ago (jeez, 2006 - tempus fuggits).

The thing is, although this Web stuff does evolve gradually over time, there are also developments that are in effect big steps forward. In the context of the Semantic Web there was the publication of the 2004 specs (solidifying the material that came before), the development of SPARQL (allowing loosely-coupled access to triplestores) and the perspective shift that the notion of linked data offers (bringing the Web back into the Semantic Web). That's not to mention the initiatives that have appeared outside semweb cognoscenti circles - things like schema.org.

I reckon SPARQL 1.1 is another big step. Yes, we already knew how to write to the Web with good old RESTful HTTP. But SPARQL Update, Graph Store Protocol etc. offer a standard, loosely-coupled way of writing to triplestores. Ok, a purist may point out that a lot of this stuff isn't RESTful, hence isn't truly Webby. But that doesn't matter - it completes the decoupling of the backend layer (arguably, paradoxically, disintermediating the layers) making it possible to commodify that layer and allow middleware to use generic interfaces, plugging in to any store at one end and potentially any client at the other.
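
As a rough sketch of what that standard write path looks like on the wire, the snippet below builds (but doesn't send) one SPARQL 1.1 Update request and one Graph Store Protocol PUT. The endpoint URLs are assumptions, shaped like a local Fuseki dataset called "ds"; the graph URI and the data are made up. Any store with read/write SPARQL should accept something along these lines at its own URLs.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoints (a local store with a dataset named "ds"); adjust to taste.
UPDATE_ENDPOINT = "http://localhost:3030/ds/update"
GRAPH_STORE_ENDPOINT = "http://localhost:3030/ds/data"


def sparql_update_request(update: str, endpoint: str = UPDATE_ENDPOINT) -> Request:
    """SPARQL 1.1 Update sent directly as the body of an HTTP POST."""
    return Request(
        endpoint,
        data=update.encode("utf-8"),
        headers={"Content-Type": "application/sparql-update"},
        method="POST",
    )


def graph_store_put_request(
    graph_uri: str, turtle: str, endpoint: str = GRAPH_STORE_ENDPOINT
) -> Request:
    """Graph Store Protocol: PUT replaces the named graph with the payload."""
    url = endpoint + "?" + urlencode({"graph": graph_uri})
    return Request(
        url,
        data=turtle.encode("utf-8"),
        headers={"Content-Type": "text/turtle"},
        method="PUT",
    )


# Hypothetical data: one statement added via Update, one graph replaced via PUT.
update_req = sparql_update_request(
    'INSERT DATA { <http://example.org/me> '
    '<http://xmlns.com/foaf/0.1/name> "danja" }'
)
put_req = graph_store_put_request(
    "http://example.org/graphs/contacts",
    '<http://example.org/me> <http://xmlns.com/foaf/0.1/nick> "danja" .',
)
```

To actually run either request, pass it to urllib.request.urlopen against a live store. The point is that nothing here is specific to one vendor's backend.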

This means the SWIB idea just got a whole lot easier. All it needs to be at heart is a triplestore which supports read/write SPARQL. As noted above, these are already available. I do think the packaging could be improved, to totally minimise the installation effort. One click to download, one click to install, another click to run. A bit of shiny GUI is also desirable, not only to make things easier than the default HTML form for endpoint access but also to reduce the surprise to the end user. It should look a lot more like familiar tools - ideally including something general-purpose (think Microsoft Access) and one or two domain-specific apps (a FOAFish contacts/social net client is an obvious one; taking advantage of recent developments, a Rich Snippets aware bookmarking app might be nice). A little configuration tool would be good to have too, not everyone is comfortable editing exotically-formatted text files.

Of course it would make me very happy if someone else put a SWIB together like this, dear lazyweb, as it'll probably take me another 6 years to get it together myself. But irrespective of what I say or do on the matter the personal/shared graph store is such a gaping niche that it's bound to happen in some form pretty soon anyway. Whatever, the current absence of "everyone has a graph store" is a conceptual block to imagining the possibilities. So try assuming this is already a done deal.

Comments to G+ please


danja
2012-02-26T15:02:58+01:00
swib federated semweb rdf

Social nets and shared objects

Just checked back on the geek pop video I put up on Tuesday: 111 hits, 4 likes, 1 dislike - heh, satisfactory ratio.

I don't have the energy for advocacy and am not really interested in marketing, but it did get me wondering how you would actually target an audience in this day and age - talking to the right people is efficient communication, right? Clearly folks like Google believe they can target arbitrary demographics with their advertising, identifying the appropriate audience through analysis of user behaviour. Done accurately, it's no longer advertising as such but more about making a connection between some kind of provider and a willing recipient.

In this specific case, the primary target would really be perhaps a person who uses a computer a lot, but only has a minor interest in dev, if any. They probably get most of their desktop software through regular commercial channels, supplemented by dodgy copies of things from their friends. It would be in the interests of this person to know about open source if only in the sense of better software for free. But most of the people reading this will be a hop or two removed from that demographic. Exaggerating for effect, the Open Source Circle has no intersection with the Regular User Circle. How do you find paths through? Ok, maybe there's one that goes [open source user] - [open source geek] - [.net geek] - [MS Windows user]. Yeah, (social) graph problems.
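
The "how do you find paths through?" bit is a plain shortest-path search. Here's a minimal sketch using breadth-first search over the chain above; the edge list is just that example, and the function name is made up.

```python
from collections import deque
from typing import Dict, List, Optional


def shortest_path(
    graph: Dict[str, List[str]], start: str, goal: str
) -> Optional[List[str]]:
    """Breadth-first search: returns a path with the fewest hops, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None


# The chain from the text, treated as undirected "knows" links.
links = [
    ("open source user", "open source geek"),
    ("open source geek", ".net geek"),
    (".net geek", "MS Windows user"),
]
graph: Dict[str, List[str]] = {}
for a, b in links:
    graph.setdefault(a, []).append(b)
    graph.setdefault(b, []).append(a)

path = shortest_path(graph, "open source user", "MS Windows user")
```

On a real social graph the hard part isn't the search, it's getting at the edges in the first place.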

There's potential around communities of interest. Again in this particular case a graphic designer that normally uses Photoshop may be in contact with a Gimp user.

There's an aspect of this I reckon is still really virgin territory, ripe for colonization: I'm sure I've heard better terms but call it "shared objects". My guitar is of generic type Stratocaster, so if someone else has a guitar of generic type Stratocaster there's a very good chance we've got other things in common. It's close to what Amazon already does around recommendations, but I reckon it could be done a whole lot smarter and in a way that's more broadly useful. It's a Semantic Web/Linked Data idea that's also entirely in scope for schema.org and RDFa/microdata work.
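
A crude sketch of the "shared objects" idea: given who owns what type of thing, list the pairs of people who have a type in common. The names and ownership data are invented, and this says nothing about how Amazon or anyone else actually does it.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple


def shared_objects(
    ownership: Iterable[Tuple[str, str]]
) -> Dict[Tuple[str, str], Set[str]]:
    """Map each pair of people to the object types they both have."""
    owners: Dict[str, Set[str]] = defaultdict(set)
    for person, kind in ownership:
        owners[kind].add(person)

    shared: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for kind, people in owners.items():
        # Every pair of owners of the same type gets a link via that type.
        for a, b in combinations(sorted(people), 2):
            shared[(a, b)].add(kind)
    return dict(shared)


# Hypothetical (person, object type) statements.
owns = [
    ("danny", "Stratocaster"),
    ("alice", "Stratocaster"),
    ("alice", "Gimp"),
    ("bob", "Gimp"),
    ("carol", "Photoshop"),
]
pairs = shared_objects(owns)
```

With the types named by URIs rather than strings, the same join would work across data from different sites.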

Uldis Bojars did some work around the "shared objects" thing a year or two back, I must pester him again for references.

Comments to G+ please


danja
2012-02-17T13:42:18+01:00
federated social semweb rdf graph

Small Data

I'd just like to plant a little flag in the sand. Big Data seems to be the flavour of the month (and is undeniably extremely useful and interesting), but I've a gut feeling that might be symptomatic of not seeing the wood for the trees (or maybe vice versa).

I've not thought this through much, but surely any trends/correlations/relationships that are important enough to be of interest should be detectable without having to build a terabyte+ store? Rather than trying to capture as much raw data as possible up front, I suspect a more productive approach long-term will be to work with (maybe federated) crawler farms, with lots and lots of algorithms running in parallel over what they see. If there are appropriate training feedback loops in place, the shape of algorithms themselves could be treated as the results of the analysis.
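
One tiny illustration of detecting a relationship without keeping the raw data: a running (Welford-style) correlation that sees each observation once and then forgets it. This is only a sketch of the general idea, not a claim about how a crawler farm would be built; the class name and the sample numbers are invented.

```python
from typing import Optional


class RunningCorrelation:
    """Pearson correlation over a stream, using constant memory."""

    def __init__(self) -> None:
        self.n = 0
        self.mean_x = 0.0
        self.mean_y = 0.0
        self.m2_x = 0.0   # running sum of squared deviations of x
        self.m2_y = 0.0   # running sum of squared deviations of y
        self.c_xy = 0.0   # running sum of co-deviations

    def update(self, x: float, y: float) -> None:
        self.n += 1
        dx = x - self.mean_x          # deviation from the old mean
        dy = y - self.mean_y
        self.mean_x += dx / self.n
        self.mean_y += dy / self.n
        self.m2_x += dx * (x - self.mean_x)
        self.m2_y += dy * (y - self.mean_y)
        self.c_xy += dx * (y - self.mean_y)

    def correlation(self) -> Optional[float]:
        if self.m2_x == 0 or self.m2_y == 0:
            return None  # not enough variation seen yet
        return self.c_xy / (self.m2_x * self.m2_y) ** 0.5


# Hypothetical observations, e.g. two measurements taken per crawled page.
rc = RunningCorrelation()
for x, y in [(1, 2), (2, 4), (3, 6)]:
    rc.update(x, y)
r = rc.correlation()
```

The state is six numbers however many pages go past, which is about as Small Data as it gets.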

It could be argued that once you have accumulated a corpus of raw data you can subsequently throw whatever you like at it without having to get the raw data again. But that corpus will never be complete or truly fresh - as new data appears on the Web all the time. More critically, under normal circumstances you can never be sure you've got a dataset that contains a good sample representation covering whatever unknowns you're exploring. But crawlers can be directed to favour slices of the Web that contain information relevant to your hypotheses.

So, in the context of the Web, the Web itself should be the only big data needed. Which gives a neat parallel in the other sciences: reality itself is the only database you'll ever need :)

Ok, in the same way that Big Sites (like Wikipedia/DBpedia) add big value to the Web alongside lots of small pieces, loosely joined, the same no doubt goes for Big Data. But let's not forget the vice versa, a complementary Small Data approach.

Somewhat orthogonal to this, one way in which the Web is a game changer for data is that here the relationship between pieces of data (/documents) is at least as significant as those pieces of data stacked on top of each other. Link Rank is a special case, an aggregated, flattened view of link value. If topics and entities (i.e. things in general, people, places, concepts etc) and their interrelationships are inferred and/or explicitly named, it should expose some interesting facets of how human knowledge works.
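
For a feel of what "aggregated, flattened view of link value" means, here's a small PageRank-style power iteration. I'm assuming that's roughly what a link rank amounts to; the damping factor, iteration count and three-page graph are all arbitrary choices for the sketch.

```python
from typing import Dict, List


def link_rank(
    links: Dict[str, List[str]], damping: float = 0.85, iterations: int = 50
) -> Dict[str, float]:
    """PageRank-style scores: the whole link graph flattened to one number per page."""
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        new = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * rank[node] / n
                for target in nodes:
                    new[target] += share
        rank = new
    return rank


# Hypothetical three-page web: a and b both link to c, c links back to a.
ranks = link_rank({"a": ["c"], "b": ["c"], "c": ["a"]})
```

All the structure of who links to whom collapses into a single score per page, which is exactly why it's only a special case of what the links could tell you.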

Comment to G+ please.


danja
2012-01-30T10:04:06+01:00
algorithms federated ai science rdf data

A Role Model of Consciousness

Past few weeks I've been on pause, my head not working properly. Finally got around to seeing doctor yesterday, now waiting for antidepressants to take effect. I haven't totally wasted my disconnected time, watched a lot of stuff. Including a Midsomer, a couple of Bargain Hunts and a geeky-great vid on poker bots (have I said I really like Berlin? This is a Chaos Communication Camp production, wonderful material). Simulating an actual poker player is really hard, but it got me thinking about the similarly hard problem of what consciousness is, appropriately mental for my state of mind.

Caveat, I'm not up to date on theories in psychology or even AI. Last big thing I read anywhere near this was a lay-reader book I think with "Intelligence" in the title, about what humans are really good at is predicting the future - pretty good hypothesis IMHO. Maybe someone can enlighten me about current thought (I'll cc Planet RDF). But the thing that has been on my mind is more old-school, the internal model bit I think was popular around the 17th century, gone downhill since. Although it may well be rubbish as human stuff, something makes me imagine it might be worth thinking about for machine stuff. I really like the agent metaphor.

Ok, generation 0, we have an agent (A) in a universe (U), and it just sits there. It's a rock. It's surrounded by other agents (which might also be rocks).

[Image: a blob in a universe]

Generation 1, we have an agent capable of interacting with the environment, but its interactions are pretty minimal, starting somewhere around a pebble on a beach that has a wander with each tide up to a living creature that has built-in stimulus-response maps along with learnt ones. Kinda Behaviourist. I'm starting with the pebble because interaction with the environment can take a lot of forms, and there's quite a history from at least the Neolithic of generally anthropomorphic agency views of facets of the environment (weather etc) through the Bronze Age deities up to the modern-day religious mythologies.

[Image: a blob interacting with environment]

Generation 2 we approach the Enlightenment and/or Smalltalk. The agent in question has an internal model of the universe containing the agents outside.

[Image: a blob with an internal model]

On generation 3 we come to the bit that I'll call novel until someone points to an 18th century philosopher who already suggested this. The agent in question has had all its sensors and actuators geared up to the outside world for a while, as well as sensors (and actuators) connected internally. By the mechanisms of Intelligent Design, Natural Selection and copy, paste and tweak a bit, it notices parallels between interactions with the external agents and interactions with itself. It develops a sense of self as another model very similar to the models it has for external agents. Here's the novelty - first the agent becomes aware of external agencies, only then by analogy it becomes aware of itself.

[Image: a blob including a model of itself]

Like all the great (as in most entertaining) theories this is of course unverifiable. But I like the notion that the local stuff only appears after some level of comprehension of the remote stuff, feels like it might be useful somehow.

Comments to the big G+


danja
2011-10-15T20:59:10+01:00
mind intelligence psychology federated ai mad model rdf

Sell Out

A couple of days ago I got another mail from someone wanting to put links here to their client's site. Unusually this seemed written by a human, so I didn't immediately bin it. Insert our links in your old posts and we'll give you some dollars (the figure I think was $50 a link), and the targets will be relevant and/or educational resources. Given that I'm in the red right now, and given my recent amount of enthusiasm for paid work, I said ok, bring it on.

It would have been better if I'd been able to do Sebastian Trüg's approach, having a real project to which to donate, but bugger it, I've added a donate button to this blog. Now go visit my sponsor.


danja
2011-10-06T20:48:17+01:00
federated money rdf


Check Sums

I was offline last week when the news broke that CERN folks announced they'd found a discrepancy between the assumed speed limit of the universe and the way their neutrinos appeared to behave, 20 parts per million. That's a pretty big anomaly when you consider dogs can detect salami in 9 parts per billion of the kitchen (that paper will be published once I've got 99 other co-signatories who don't mind their crotches being sniffed). I was offline because I was feeling pretty crap after a boozy weekend, lightweight compared to previous exploits but after the hangover had passed I was left in an ultra-violet funk.

Incidentally, for a few days, going to sleep I wound up picking a random, unloaded word that flashed by on my Cartesian plasma screen, "mink", repeating it as a voiceover in said theatre as a mantra to keep demons at bay. I have since rationalised the word - it's a potential HTML5 rel value to correspond to URIQA's MGET. But that's by-the-by.

The too-fast neutrinos went from CERN to Gran Sasso. After dopplering my funk, I was curious about the constant thing. I knew where CERN was (because I watched The Champions as a child) but though I'd heard of Gran Sasso, couldn't place it. As any good mental illnessity goes, my funk featured a good proportion of guilt (getting sweary on social networks leaves you a bit shamefaced).

Now looking on the map my funk shifted back up the spectrum, if you draw a line on the globe from CERN to Gran Sasso it goes straight through this house. Those faster-than-light neutrinos came through here (ok, a little underground, but I do leave my empties in the cantina). So how's that for something to feel guilty about - screwing up the model of the universe..?

Which is why I can be sure they got their sums wrong. My empties would have slowed them down. You're probably 40 parts per million out guys.


danja
2011-10-04T21:50:35+01:00
federated cern physics rdf c neutrinos light

stupid computers

Train of thought. The world imagined by machines won't ever be a direct reflection of the world experienced by animals. But that's not a bad starting point, go all Plato and have the computers as the shadowplay. Maybe the current generation of computers aren't capable of doing the 3D of a child's first discovery of a 4-leaf clover. They will though, probably in my lifetime. But there's the map/shadow, and there's stuff we can do well in this world, stuff that the machines are good at. A virtual reality with rules that are consistent with this side, but take advantage of that side.

Perhaps I'm getting a little too excited about being back online again.


danja
2011-09-21T23:38:47+01:00
federated rdf

Speed

A passing observation. It's bloody slow this Web thing. I have terrible wire bandwidth here, but it isn't that that is the bottleneck. Me, ask anyone, might be smart but he's slow witted. Not as slow as this thing.

Picture a couple of people that know each other fairly well, have a spoken language in common. Say they've been out and are trying to figure out the best way of getting home. Bang bang bang bang, the ideas will flow. The Interwebs know the best way to get a taxi, walk or bus. Augmented by the smartphone. But it don't quite work. Computers + data = knowledge. Not.

Even if this machine in front had better than human standard AI, it would still be slow and useless right now compared to a (stupid) talking human. We are missing bits we need to take advantage of the technology. The back end seems to function well, the front end seems like it's as good as it gets. So why do these things behave as if they are slow and stupid?

Passing observation, I honestly don't know. But I feel we should be able to find out. How? Dunno.


danja
2011-09-21T23:24:15+01:00
federated rdf

RDF, where art thou

In comments on a post on G+ I said something I might regret:

"There are plenty of RDF-based applications around, but none really have much broad public appeal."

Ade Oshineye responded with "why do you think that is?"

Ok, overnight I remembered there's at least one app (or set of apps if you prefer) that uses RDF and has a lot of adoption: Drupal. According to Wikipedia it's used on at least 1.5% of Web sites worldwide, and has RDF in its core. Then there's data.gov.uk, a public-facing national government site that's RDF through-and-through. I'm a little out of touch, there are no doubt quite a few other good examples of where I'm wrong.

But given that RDF has been around for 5 years*, it's the way of doing data on the Web and virtually every Web-oriented app uses data somewhere, why isn't it ubiquitous?

(* solid specs came out in 2004 although SPARQL wasn't until 2008 so I'm splitting the difference for a rough date for when it became usable)

RDF isn't something that's going to be in your face anyway, so "broad public appeal" is slightly off-target. Developer adoption may be a better key. Whadever.

In terms of it as a database tech, compared to relational DBs (MySQL etc), custom data handling (Twitter uses Ruby message queues), novel DBs (Facebook uses a key-value store Cassandra apparently) RDF stores don't get much of a look-in. Ok, arguably the big scale things need to be custom to hone performance, but why, alongside the Big Data handling, don't we see RDF augmentation?

For consuming apps and desktop apps, I can't actually think of any well-known ones off the top of my head (I think quite a few of the music apps on Linux use librdf under the covers). I don't have a mobile device - any iPhone apps?

What I find a little bizarre (and please give me counter-examples), is that in the areas where RDF really shines - Web-oriented data integration and reuse - there are hardly any well-known apps out there at all, using any technology. There are a handful of feed aggregators and things like techmeme, but the level of integration there is pretty trivial. (Before Kingsley jumps down my throat - OpenLink Virtuoso is seriously good at this kind of stuff out of the box - but what I'm after is where these things are being used by twitter-sized demographics).

There's certainly something to what Lee Feigenbaum said the other day, the wrong question is usually asked, it should be: What can I do with Semantic Web technologies that I wouldn't do otherwise?

In terms of app-building, right now most parts of most things can be built relatively easily using other technologies, so unless the RDF stack is part of the developer's on-hand toolkit (like e.g. LAMP) it won't be first choice. I do suspect that while the false perception that RDF is complex per se isn't so prevalent these days, there's still a notion around that RDF is complex for the benefits it offers. i.e. linked data isn't perceived as a significant value-add, so why bother? The primary objectives can be achieved by pushing around little JSON objects ("jobbies"?) in a fairly arbitrary fashion, so why look further? But data on the Web surely isn't a niche thing...

Feel free to shoot me down in flames from all angles over this one (I'm not interested in advocacy here so don't care if I expose the wrong message) - I also suspect there's still something in the idea that people simply don't get it. While developers seem to have no problem representing pretty much anything in local databases, the idea that anything can be represented on the Web in a similar way hasn't been grasped. I reckon there's good evidence in virtually every high-profile project. Things tend to be focused on HTML (with a little JavaScript) and the browser experience. For service-oriented systems the unwritten assumption is that the services will tie into the same view. I'm certainly not saying that this focus is wrong (those user-facing components are vital), just that it can lead to a blinkered view of what is possible. Only relatively recently have developers at large started looking at things like the identity of people on the Web. You still don't see the same attention given to everything else in the world - products, ideas, activities. Ok, you might point to activity streams and the like, but the subject of those activities still largely tends to be doc-oriented: messages or posts. You might point to schema.org and microdata as ways in which people in the Web development community can put data on the Web. But scratch the surface and the main goals underneath are things like SEO, most of the data being expressed is document metadata, not data about the real world. (Next time you go shopping, notice your interactions with the world from finding your car keys onwards, compare and contrast with the Amazon experience.)

The other day I posted a question on G+ that probably should have gone here: All the necessary components were in place for online social networks, in a distributed form, before Facebook & co. came along: blogs, aggregators, the various protocols. So why were Facebook & co. so successful? (got some good comments there, and was very pleased to find out Andreas Kuckartz is researching the question)

The question of data on the Web seems to lie in a similar socio-politico-technical morass. On federation, I'm afraid I'm inclined to agree with Eric Siegel : "I predict decentralization is inevitable, but its very very far away." I feel pretty much the same about the Web of data, though perhaps not so far away (unless I'm confusing small and far away :)

[ooh - a good point on that from Seb Paquet I'd missed before: The folks who grokked decentralization didn't master social experience design and UI design as well as Zuck, and decentralized infrastructure is harder to monetize so getting funding was difficult.]

One final question dedicated to folks on Planet RDF, from danbri in response to (the Facebook re-presentation of) my post yesterday:

If RDF is so great, we should all be rich by now? :)

Another quote, it must have some relevance - via the BBC, from Sir William Preece chief engineer of the British Post Office in 1876: "The Americans have need of the telephone, but we do not. We have plenty of messenger boys."

Still no system here yet, comments to G+ again.


danja
2011-09-17T13:52:14+01:00
federated semweb rdf

Plan B - RDF for fun and profit

Last night, after finding out that part of the G+ API had gone public I skimmed their docs and the docs of some of the specs they draw on: Portable Contacts, Activity Streams and OAuth 2.0. Of course it's great that G+ is exposing an API, and great that they're drawing on existing standards. But after looking at those standards I came away shaking my head, feeling rather discouraged. Again and again they contain data expressed using JSON mappings like "kind": "plus#person" (G+ API) and "objectType" : "person" (Activity Streams) and "" (Portable Contacts assumes that if you've got data you're looking at contacts). Aside from the variation in the naming across these, there's a common theme, the assumption that a simple token (like "person") is adequate for definition of something on the Web. How do you know that their definition of "person" is compatible with your system's definition of "person"? Sure, there are the spec docs to back them up, but how do you get from the data to the spec docs? Ok, there's openness in the publication and dev of these specs and standardization to the extent that they're high-profile enough that vendors like Google will see them and adopt them. But in their technical detail they have more in common with pre-Web, offline proprietary formats - "person" means person because we say so, and everybody knows what we mean.

Digging a bit deeper there's reference to the Discovery Protocol Stack which draws on XRD (the OASIS spec for describing resources) and Web Linking (RFC 5988 for defining typed links). Here there's more of an attempt to make the stuff Web-friendly, entities (resources) and relations (links) are identified with URLs so Web-based discovery of further information is in principle possible. But the "One True Ontology" registry-based approach of Web Linking is questionable in a distributed environment (and comparable to schema.org).

The description of things using schema like "kind": "plus#person" looks like what RDF does, except rather than using a Web-based approach to naming (so you could derive a URL from "plus#person", look it up and find out what it means) instead we see ad hoc token-based naming schemes. With Web Linking we have something that corresponds exactly with RDF properties (they are typed links), and if you can look things up in a registry then that's a step in the right direction. We already use registries to decode the meaning of terms in other major vocabularies - e.g. the HTTP media types through which HTML is delivered lead you to the definitions of terms like "strong" in the relevant specs. But is a registry appropriate for every term we're ever going to use? Does a word like "strong" only have one meaning?
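
To show the difference in miniature: the sketch below turns a token like "plus#person" into something you could look up, and turns a typed link into an RDF-style (subject, property, object) statement. Every base URL here is a made-up example.org placeholder; neither the G+ API nor Web Linking defines these mappings, which is rather the point.

```python
from typing import Dict, Tuple
from urllib.parse import urljoin

# Hypothetical prefix-to-URL table; no such mapping is published for "plus".
HYPOTHETICAL_BASES: Dict[str, str] = {
    "plus": "http://example.org/vocab/plus",
}

# Hypothetical base for short relation names such as "next".
HYPOTHETICAL_REL_BASE = "http://example.org/rels/"


def token_to_url(token: str) -> str:
    """Expand 'prefix#local' into a URL that could, in principle, be looked up."""
    prefix, _, local = token.partition("#")
    return HYPOTHETICAL_BASES[prefix] + "#" + local


def typed_link_to_triple(
    context_url: str, rel: str, target_url: str
) -> Tuple[str, str, str]:
    """A typed link is already a statement: context --rel--> target."""
    # Relation types that are already URLs pass through untouched;
    # short tokens are resolved against the assumed base.
    predicate = rel if "://" in rel else urljoin(HYPOTHETICAL_REL_BASE, rel)
    return (context_url, predicate, target_url)


kind_url = token_to_url("plus#person")
triple = typed_link_to_triple(
    "http://example.org/page/1", "next", "http://example.org/page/2"
)
```

Once the names are URLs, two systems can tell whether they mean the same "person" by comparing (or dereferencing) them, instead of trusting that everybody knows what a token means.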

Ok, so far there's a phrase which sums up all this: Cargo Cult RDF

But the theory is that grassroots, use case-driven development will tend to create cowpaths in the environment, and all standards orgs have to do is pave these. Except it doesn't seem to quite work that way. On the one hand we have the XKCD Standards effect (check the first paragraph on the Portable Contacts page), on the other hand the simple fact that, even with the best will in the world and with good information, people often get things wrong. Take for example:

OAuth [1.0] aims to unify the experience and implementation of delegated web service authentication into a single, community-driven protocol.

[time passes]

OAuth 2.0 is a completely new protocol and is not backwards compatible with previous versions....As more sites started using OAuth, especially Twitter, developers realized that the single flow offered by OAuth was very limited and often produced poor user experiences...OAuth 1.0 was largely based on two existing proprietary protocols: Flickr’s API Auth and Google’s AuthSub. The result represented the best solution based on actual implementation experience. (Introducing OAuth 2.0)

So...even when good, informed standardization is aimed for, flawed technologies built with flawed processes are unavoidable.

But these things are so popular! Vendors and developers can't get enough of this kind of stuff. It's a continuous stream: XML APIs become JSON APIs, microformats become microdata, but the same patterns are repeated again and again.

Years of these developments passing RDF by. Plan A : The Semantic Web still seems as far in the future as it did 5, 10 years ago. The RDF technologies demonstrably work, and adoption is growing, but it's hardly viral. However you look at it, the world of trendy new specs repeatedly steers around that fact. What's a jaded RDF enthusiast to do? Here's what I recommend:

Exploit the situation!

With a continuous flow of different specs that each covers some little part of data on the Web, focusing on any specific development can only work in the short term. A strategy based on technologies that support flexibility and agility, using known best practices of the truly distributed Web is the best option in the long term, so that systems can be rapidly adapted to meet any new requirements. It doesn't matter that e.g. schema.org misses the point, the data is still useful. "Think globally, act locally" is a great expression - in this context it could mean accept whatever the world of Web 2.0+ has to offer, but handle it on your own terms.

In practice, let's say you're developing a system for a particular vertical market: dog leads (I'm getting serious hints as I type). Don't build the system from scratch based on what people in the dog lead market are doing, don't tie yourself to domain-specific schema or protocols. Wherever possible use commodity, off-the-shelf tools. Then if dog leads take a nose dive on the international market you can regroup with a different target - cowbells for cats - using the same tools, and same skill set. The only parts that need change are at the edges. Basically RDF technologies offer a long-term commercial advantage.

Comments to G+ please.


danja
2011-09-16T14:31:52+01:00
google streams contacts rant federated web semantic semweb activity rdf portable

Affordances, described with less clutter

Posts on this blog get picked up by Facebook. Alison, who's an experienced Web developer, spotted my last post over there and couldn't make much sense of it. Hardly surprising: I referred to rather a lot of obscure stuff and used a lot of jargon without much explanation. But given that this affordances thing relates directly to the way everyone uses the Web, a developer should be able to make sense of it. So here I go again, this time trying to stick to the main points, glossing over the detail. [Blimey, but I've ended up rambling on a long while]

So on the Web you've got lots of documents in HTML on servers and lots of people with clients (browsers) that understand HTML. Those documents and various other messages are passed between server and client using the HTTP protocol. Most of HTML is about document structure, which with the aid of CSS can make text look good on the screen. But it has several things built in that allow a client to communicate over HTTP and hence allow the end user to interact with the Web. Most used is almost certainly the <a href="http://example.org/here">something</a> link. When interpreted by a browser, that bit of markup highlights the word something and enables the link http://example.org/here to be followed by clicking on the something.
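To make the link affordance a bit more concrete, here's a minimal sketch of what a client has to pull out of that markup before it can offer "click to follow": the target URL and the text to highlight. It uses only Python's standard html.parser module; the LinkCollector class and extract_links function are names made up for this illustration, not part of any browser or library.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects (href, link text) pairs from <a> elements (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open link
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text)))
            self._href = None


def extract_links(markup):
    """Return the list of (href, text) pairs found in a chunk of HTML."""
    collector = LinkCollector()
    collector.feed(markup)
    collector.close()
    return collector.links


links = extract_links('<p>Go <a href="http://example.org/here">something</a> now</p>')
print(links)
```

Everything else (the highlighting, the click handling, the HTTP GET) is what the browser layers on top of that pair.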

One fairly archaic definition of the word afford is to provide or supply (an opportunity or facility). Presumably this is where a 1970s psychologist got the word affordance (Wikipedia), which he defined as an "action possibility" (and some other stuff). This got picked up by human-computer interaction folks and mutated a bit, but "action possibility" is good enough here. So what the browser does with the bit of markup above - enables the link http://example.org/here to be followed by clicking on the something - can be described as an affordance.

The Web can be looked at as an information store with which we interact, and borrowing from database speak we have four basic operations: Create, Read, Update and Delete (CRUD). Through the highlighted, clickable link the browser provides the Read operation. When we want to Create e.g. a new blog entry, Update or Delete it we typically interact through an HTML <form>. So the kind of things a form enables can also be described as affordances. It's not unreasonable to expand the definition to include certain things the browser does that go beyond displaying a document with structure, things like displaying an image file that's linked to by an <img> element. Nowadays we're surrounded by loads of other potential interactions thanks to Javascript and Ajax; these are also affordances. With the rise of blogging, online photo/video sharing and social platforms like Facebook, Twitter and now Google Plus, a new breed of affordances has emerged, including things like share, like, +1 etc. These are typically powered by Ajax and very often operate across sites and involve some data transfer, e.g. if you post a link on Facebook to a photo on Flickr it'll add it to your wall and display a thumbnail of the image and the title. This new breed of affordances has been called Web Intents or Web Actions depending on where you look. (The Web Intents thread is I believe partly derived from a similar thing called Intents on Android phones, but having never used one I can't comment).

Ok, now there's an increasing amount of data on the Web expressed as Linked Data. This is published using the Resource Description Framework, RDF (depending on who you ask, linky non-RDF formats can also be considered linked data, but that's not really relevant here). The question is, how best to interact with this material, in other words what affordances do we need? There's a natural expression of documents on the Web - just show them as documents - but even for a passive display it's not altogether clear how to represent data. Ok, with traditional databases we usually have a table of some kind. But in that context we have a good idea in advance what can go in the rows and columns. On the Web, where the data can potentially be any shape it's a much trickier creature to pin down. With documents there is the familiar constraint of the individual document or page, whereas data doesn't chunk so neatly - the data we're interested in might be spread wide across the Web, between files containing only a handful of statements and stores containing millions. Links are part of the expression of the data, and links are the fabric of the Web, twisty eh? And this is just considering the Read aspect, there's also (at bare minimum) Create, Update and Delete to throw into the mix. We also need to not only interface with simple file-like linked data representations, there are also triplestores with SPARQL interfaces to consider (although the linked data API should help there, it can make a triplestore+SPARQL setup look more like normal Web representations).

However, to put these kinds of problems into context - we don't need every possible operation for all data in all environments, far from it. One thing the work around Web Intents shows is that a handful of little facilities (share, like etc) are making a big difference in the benefit people get out of the Web. One thing that should really be avoided is building things as special cases - if you can share from A to B then you should be able to use the same mechanism to share from C to D and so on (special-casing isn't that different from the centralized system setup, whereas things on the Web should be distributed and ideally federated).

Ok, seems that affordances are going to be pretty important for working with the Web of Data. Some fairly good analysis has been done of HTML-in-browser affordances, and taking a leaf from the HTML book the simple hypermedia click-following of links seems a reasonable place to start in assembling suitable tools (in fact there are quite a few tools out there that support this in one form or another). It's fairly certain that some of the affordances will be vastly different from those we're familiar with - data supports things like merging (trivial in RDF), query and inference, completely different kinds of transformation and analysis than text and so on. At the moment it's not even really clear that a general-purpose tool for Web data, like the HTML browser is for documents, makes sense (my guess is that most likely a variety of different tools will be built inside the Javascript-capable browser, with different tasks being spread between clients and services).
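As a rough illustration of why merging counts as trivial in RDF: if a graph is treated as a set of (subject, predicate, object) triples, merging two graphs is just set union, and a simple query is a pattern match over the result. The sketch below uses plain Python sets and strings rather than a real RDF library, the example.org URIs are made up, and it ignores blank nodes (which need renaming before a real merge).

```python
# Triples are (subject, predicate, object) tuples of plain strings.
# The example.org URIs are invented for this illustration.
FOAF = "http://xmlns.com/foaf/0.1/"

graph_a = {
    ("http://example.org/alice#me", FOAF + "name", "Alice"),
    ("http://example.org/alice#me", FOAF + "knows", "http://example.org/bob#me"),
}

graph_b = {
    ("http://example.org/bob#me", FOAF + "name", "Bob"),
    ("http://example.org/alice#me", FOAF + "name", "Alice"),  # duplicate of a triple in graph_a
}


def merge(*graphs):
    """Merge graphs by set union; duplicate triples collapse automatically."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def match(graph, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return {
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    }


merged = merge(graph_a, graph_b)
names = {o for (_, _, o) in match(merged, p=FOAF + "name")}
print(len(merged), sorted(names))
```

Because both graphs use the same URI for Alice, the merged graph knows her name and who she knows without any mapping step; that's the bit that has no real equivalent for text documents.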

But again to put these problems into context, there's no reason why any individual applications should be much different than they are today. Passing an image and its title between Flickr and Facebook requires the same basic machinery whatever kind of markup is used to describe the material. One of the aims for the Web as a whole, augmented by the Web of Data, has to be a reduction in complexity for common tasks. The fact that a whole new world of potential applications becomes feasible is just, well, interesting.


danja
2011-08-29T02:56:10+01:00
federated actions intent affordances rdf

Magnificent Seven for APIs

Some interesting survey results have just been published about APIs: the good, the bad and the pains. I commented about this on G+ and the discussion there got on to Atom. Some interesting points made, including the likelihood that we're stuck with snowflake APIs (every one is different) for the foreseeable future. I think it was Bill de hÓra who had a post years ago (can't find it now) about the N x N problem of diverse APIs (/models/formats). Essentially if you've got N different APIs then to connect them all you need N x N different translators. But it's also worth noting that this can be reduced to 2 x N if you have mappings to a common format/model. I reckon recent history has shown that formats are secondary, assuming certain boxes are ticked (see below). Regarding the model - there is a well-known, Web-friendly one. So here I'll simply point to ConverterToRDF and ConverterFromRDF.
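The arithmetic behind that is easy to check. Here's a quick sketch; the function names are invented for this post, and it counts one-way translators, so the pairwise figure comes out as N x (N - 1) rather than a full N x N because nothing needs translating to itself.

```python
def pairwise_translators(n):
    """One-way translators needed to connect every API directly to every other."""
    return n * (n - 1)


def hub_translators(n):
    """One-way translators needed when every API maps to and from one common model."""
    return 2 * n


for n in (2, 3, 5, 10, 50):
    print(n, pairwise_translators(n), hub_translators(n))
```

For a handful of APIs there's not much in it, but the pairwise count grows quadratically while the common-model count grows linearly, so the gap opens up quickly.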

In the G+ discussion Bill referred to an old blog post of his, Magnificent Seven - the value of Atom. In it he highlights the 7 'primitives' that Atom (format and protocol) uses and that he suggests should be used in any carrier format. I'm inclined to agree, if you are creating an API, tick these boxes, repeated here without Atom-specificity:

  1. ID - a globally unique identifier for the chunk of data, ideally this should be an HTTP URL
  2. Link - as above, it's rare that a separate ID and URL are needed
  3. Updated - the most recent change, invaluable for keeping things in sync
  4. Extension rules (mustIgnore, foreign markup) - anything the parser doesn't understand, it simply ignores. This allows other people to reuse and extend the format in a compatible fashion.
  5. Date construct rules - using a standard date format is basic politeness
  6. Content encoding rules - generally follow the rules for the media type you're using, and if there's textual content use an existing standard format (XHTML is good). Rule of thumb: UTF-8.
  7. Unordered elements - insisting on order in the structure is (or at least should be) unnecessary, accessing things by name is more reliable
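To show how little machinery those boxes need, here's a hedged sketch of a consumer for a made-up JSON-ish entry format. It's not Atom and not any real API: the field names, the KNOWN_FIELDS set and the parse_entry function are all assumptions for illustration. It covers the ID/Link, Updated, mustIgnore, standard date and unordered-access points.

```python
from datetime import datetime

# Fields this (hypothetical) consumer understands; anything else is foreign markup.
KNOWN_FIELDS = {"id", "link", "updated", "content"}


def parse_entry(raw):
    """Parse one entry dict, following the primitives listed above."""
    # mustIgnore: keep what we understand, silently drop the rest
    entry = {key: value for key, value in raw.items() if key in KNOWN_FIELDS}

    if "id" not in entry or "updated" not in entry:
        raise ValueError("an entry needs at least an id and an updated timestamp")

    # ID and Link: a separate link is rarely needed, so default it to the ID
    entry.setdefault("link", entry["id"])

    # Date construct: one standard format (ISO 8601 with offset), parsed once
    entry["updated"] = datetime.fromisoformat(entry["updated"])
    return entry


# Fields arrive in no particular order, plus one extension the parser has never seen
raw = {
    "updated": "2011-08-13T09:42:21+01:00",
    "x-mood": "grumpy",
    "id": "http://example.org/entries/42",
    "content": "Hello",
}

entry = parse_entry(raw)
print(entry["link"], entry["updated"].isoformat())
```

Access is by name throughout, so the order of the fields never matters, and someone else can add their own extensions without breaking this consumer.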

The most significant bit is the ID/Link, this is essential for any API on the Web. It allows the use of the "follow your nose" protocol: if you want any more information about a thing, follow the link. It works for regular Web documents and increasingly for linked data.

Incidentally (1), if you are an API developer/user you may like to have a look at the Linked Data API, looking at what's needed to make access to data in a SPARQL-capable store more developer-friendly. Comments welcome there.

Incidentally (2), Google Plus is emerging as a pretty good discussion space, if you're in need of an invite mail me.


danja
2011-08-13T09:42:21+01:00
apis atom federated json rdf

Protocol

Sasha and Primo demonstrate a combination of "follow-your-nose" and authentication:

[Image: Protocol]


danja
2011-07-21T19:52:53+01:00
primo federated dog sasha rdf protocol cat

FSW SFW?

See http://dannyayers.com/2011/07/20/FSW-SFW

[I've not got any handling in for ? on the end of titles in my blog engine...sorry if this post appears twice]


danja
2011-07-20T11:57:21+01:00
federated social web rdf fsw2011

FSW SFW

[oops, I've not got any handling in for ? on the end of titles in my blog engine...sorry if this post appears twice]

As usual, after the Federated Social Web meet in Berlin I'd planned to write comprehensive blog posts about it. As usual I didn't get far before getting distracted. So far I've done a bit of overview of the conf, a brief note on privacy issues and a fairly random think-piece on decentralized vs. distributed networks. But I haven't actually covered what were probably the two main take-aways from the conf - Federated Social Web stuff itself and the role of WebID. In lieu of something better I'll drop a few key links in now. In both cases things have moved along very quickly in the past few weeks with Google+ and BrowserID, more on those in a mo.

FSW

One big meme was that of the Facebook-killer - basically we need something that has all the user-friendliness of Facebook but not as a walled garden (and with a better story on privacy etc). Step forward Diaspora - you can use it as a service a la Facebook (with which it shares many features), but also set up your own install. There were also a handful of other apps with a similar style. It took me about half an hour to set up my own install of Status.Net, essentially an open version of Twitter, though I have yet to start using it and, probably more significantly, yet to connect it up to the other services I use.

Another pointer I must include is to the W3C Federated Social Web Incubator Group. As the charter describes, its scope is pretty wide, including the various emergent protocols and technologies in this space. One of the initial targets is to move forward the Social Web Acid Test - Level 0 (SWAT0) - an integration use case for the federated social web. On the Wiki there are potential use-cases or user-stories that could become part of SWAT1. They're both fairly short so I'll paste SWAT0 and the list of non-W3C technologies from the charter below. The incubator group is encouraging people to join, so if you're interested in this material please sign up.

WebID

To quote from the WebID site, "With WebID, logging into a website is as simple as selecting a WebID and clicking 'log in'". It's a very nifty bit of tech, secure, relatively straightforward to implement, much simpler than most of the alternatives. In essence it's about passing a URI in with a PKI certificate. When Henry presented this at the conf, the audience response was interesting. Although it isn't rocket science, the certificate stuff used isn't very intuitive (personally I have a blind spot on all things auth), so not everybody got it. Of those that did get it, very few could believe what it provided. A question from the audience was telling : "What can be easier than using username + password to log in?". Henry : "One click.".

Although not critical to the functioning of WebID, one of the coolest aspects is that it cleanly supports FOAF (and other) profile discovery, the service can learn more about the user to improve their experience. In other words it's entirely compatible with the Semantic/Linked/Data Web.

WebID was initially known as FOAF+SSL. On the Wiki (oh, and also here) there are lists of implementations etc. Watch the video and read the notes from Berlin for more.

There's also a W3C WebID Incubator Group.

...

Videos of presentations of the FSW meet in Berlin are online, along with most of the papers.

Google+

Before going any further, I should remind you that we already have a Federated Social Web: the blogosphere. However this is weak on many aspects - the social graph is fairly inaccessible, the UIs are often poor (feed aggregators in particular are clunky things), immediacy is seriously lacking, identity management and such personal profiles as there are are messy, and privacy, auth and access control systems are virtually non-existent. Of course all that has left a convenient niche for Twitter, Facebook, and now Google+.

I largely agree with Edd in his (must-read blog post) Google+ is the social backbone. As a competitor to Facebook it does open up the social aspects as a commodity, and it's considerably more open and linkable, i.e. Webby (here's my stuff). I do worry about Google becoming all-powerful in this space, but as they say this too shall pass. I personally believe the nature of the Web is such that any attempts to monopolise or centralize systems will inevitably fail - because decentralized/distributed systems have inherent evolutionary advantages, though they may take time to take effect. So I reckon Google+ should be viewed by Web technologists not as an end in itself, rather as a bootstrap to a more social Web.

Although Google+ doesn't have any Semantic Web features per se, it does a reasonable job of giving people URIs and linking them together. But rather than a niche, there's a gaping void for describing things in general in a machine-friendly form. Whether RDF-oriented linked data activity will expand to fill this void, or some Googlesque reinvention (cf. microdata overlords) of RDF will, remains to be seen, but either way this also seems inevitable (see also Smarter (Hash)Tags and Google+). I'm not sure we're seeing it yet, but with a bit of luck, once the commercial world sees the SEO etc advantages, GoodRelations should cause a large expansion of semwebbiness.

BrowserID

BrowserID is a recent development from Mozilla. It's close to WebID in that it's in the identity space and about secure signing in, but arguably the primary goal is somewhat different. Broadly speaking, it boils down to the payload of WebID being a URL and the payload of BrowserID being an email address. Discussion is ongoing about the (/any) relationship between the two protocols. All other considerations aside, I'd suggest that WebID is more versatile, in that there's a lot more you can do with a URL than an email address, while BrowserID is easier to integrate with existing email-based auth, so there's better impedance matching with existing systems. I've tried to argue that BrowserID should allow the user to associate a (non-secret) URL with their email address to allow profile discovery etc. But consensus seems to be that keep-it-simple now trumps easier stuff later (WebFinger has been suggested as the route to discovery; I'm not altogether convinced as it's quasi-centralized, requiring a service to assert the email/URL mapping). Whatever happens on this particular point, BrowserID is certainly an interesting and useful development.

- - - -

SWAT0 Use Case

  1. With his phone, Dave takes a photo of Tantek and uploads it using a service
  2. Dave tags the photo with Tantek
  3. Tantek gets a notification on another service that he's been tagged in a photo
  4. Evan, who is subscribed to Dave, sees the photo on yet another service
  5. Evan comments on the photo
  6. David and Tantek receive notifications that Evan has commented on the photo

FSW-related Technologies

ActivityStreams
ActivityStreams is an evolving format for syndicating social activities around the web.
OpenID Foundation
The OpenID Foundation is the group responsible for OpenID-related standardization. Although work like OpenID Connect is a moving target, the test-cases and specification should be compatible with OpenID.
OStatus
OStatus is an architecture combining Pubsubhubbub, WebFinger, ActivityStreams, and PortableContacts.
Portable Contacts
The goal of Portable Contacts is to make it easier for developers to give their users a secure way to access the address books and friends lists they have built up all over the web.
Pubsubhubbub
Pubsubhubbub (PUSH) is a server-to-server publish/subscribe protocol as an extension to Atom and RSS. Servers compliant with PubSubHubbub can get near-instant notifications when a feed they're interested in is updated.
Salmon Protocol
As updates and content flow in real time around the Web, conversations around the content are becoming increasingly fragmented into individual silos. Salmon aims to define a standard protocol for comments and annotations to swim upstream to original update sources -- and spawn more commentary in a virtuous cycle.
SMOB
SMOB (Semantic MicroBlogging) is a framework that enables an open, distributed and semantic microblogging experience based on Semantic Web and Linked Data technologies.
Webfinger
WebFinger is about making email addresses more valuable, by letting people attach public metadata to them.

danja
2011-07-20T11:55:44+01:00
federated social web rdf fsw2011

The Symbiotic Web

During the Federated Social Web meetup in Berlin a few weeks ago, most folks used the phrases "distributed network" and "decentralized network" interchangeably, which doesn't seem unreasonable at this point in time when both appear in major contrast to the prevailing "centralized network" architecture of Web sites. On my last night in Berlin, on the steps of a crashed space station at around 4am (early flights to catch) I was chatting with Harry Halpin and he had the following diagram on his netbook:

[Image: networks]

It's from Paul Baran's landmark memo from 1964, "On Distributed Communications: 1. Introduction to Distributed Communications Networks" (see also some related network diagrams), some of the work which eventually led to the development of the Internet.

Harry was quite insistent on the significance of the "decentralized" net, saying that it was the one you found in nature (e.g. plant structure). I suggested that "distributed" looked a lot like (a 2D representation of) biological cell structure. That wasn't a very satisfactory analog, and since then I've had my eyes open for a good natural world example of "distributed". Now I think I have one, and while it's in a different dimension than e.g. plant structure I reckon it maps quite nicely onto Web systems.

Lichen!

[Image: lichen]

(in the woods up the hill)

To quote Wikipedia:

Lichens are composite organisms consisting of a symbiotic association of a fungus (the mycobiont) with a photosynthetic partner (the photobiont or phycobiont), usually either a green alga (commonly Trebouxia) or cyanobacterium (commonly Nostoc). The morphology, physiology and biochemistry of lichens are very different from those of the isolated fungus and alga in culture.

Now imagine how these things might have evolved. Initially there must have been an inheritance tree for the fungi and an independent tree for the algae (following the "decentralized" form), but then at some point the organisms started to get benefit from each other (I am not a microbiologist, but I'd guess that it probably started as a parasitic relationship, then the host side evolved some advantage). So there's a structure something like this:

[Image: lichen net]

The tree has become a graph. [PS. ok, strictly speaking a tree is already a graph, but you know what I mean]

Analogies get useful when you can use known aspects of one perspective to predict unknown aspects of the other (like the weird old alchemists' "As Above, So Below"). I don't know, vague hand-waving, maybe the nutrient molecules the fungi handle in lichen could be said to correspond to data, the photosynthesis of the algae corresponding to processing.

While clear-cut symbiosis like this isn't exactly the most common relationship in nature, there's obvious interdependence between every kind of organism on this planet. I don't think it's much of a stretch to suggest there are good parallels with Web systems, especially if you view the interfaces between organisms and their environment as corresponding to APIs between online systems. Certainly client tools and services (agents, in other words) correspond nicely to organisms.

The Web of Data (alongside the Web of documents) is already pretty distributed, the Linked Open Data cloud diagram being a nifty representation. This aspect of the Web isn't in itself particularly dynamic in its operation (data usually just sits there, periodically updating). But given the number of processors connected to the Web as servers and clients, the digital environment certainly has the potential for extremely interesting interactions.


danja
2011-07-15T15:06:41+01:00
federated networks decentralized lichen rdf distributed

Privacy bullet points

Federated Social Web stuff.

It seems privacy can't really be pinned down, the definition is evolving. But you can effectively use a working definition (pick one).

Things are different depending where in the world you live.

Your average internet user hasn't a clue.

What's being leeched from your online activity - virtually nobody takes on the implications. But people are learning, they're as far as 1999.

Even when the browser vendors get together and make a button to limit things - still no-one gets the implications (see Aleecia at the link above).

Ok, so far it's mostly "duh!".

But there was a lovely little revelation (from Soren I believe) that statistically the people more aware of privacy tend to be those with more disposable income [bum, that's twitterable]. If you want this demographic's dollars in your consumer base, you better get your privacy sussed.


danja
2011-06-09T22:31:14+01:00
federated social web rdf fsw2011

Federated Social Web

I've been out of the tech loop somewhat the past couple of years, and had decided not to go to conferences for a while. Ennui mostly. But when a Federated Social Web meet in Berlin showed up on the radar, it struck me I might get the shot in the arm I needed. Wasn't far off the mark. Berlin itself I found awesome, but right now I want to get down some notes on the conf. Falk (my new pen-pal) has a couple of overview posts. Good start Thursday night meeting up with Henry and a good crew. Friday morning I was tempted to sit in on the WebID WG but decided to leave them to it, relax in the hostel instead. That was until I got a ping from danbri, flying visit, unexpected f2f. Then the conf. proper started.

It opened with a pep talk from timbl via video link (captured by Dan Romescu, who has also written up the event). Nothing remarkable (aside from how hyper the man can be at 5am local or whatever :), just reinforcement that the notion of "Federated Social Web" is pretty much the same as Tim's notion of how the Web should be.

After that, all the stage stuff was captured on video by the organisers (bravo!).

For most of the presentations and discussion, Facebook was the mammoth in the room. All the stones they've turned over regarding identity, privacy, Web-wiring is astonishing. But there are people generally very well aware of these issues, which was nice.

beh, I'm really struggling writing this up, I get to 135 chars and start counting. Have to do it PowerPoint. The first bullet:

Lessons learned from Social Networking in Egypt (Amr Gharbeia) is really a must-see. A lot of the media bollocks about Facebook and Twitter playing a role in recent Middle Eastern events was true.

A related must-see presentation happened after the FSW event, over at starship c-base. How some European hackers were able to get communications going again after a govt. had pulled the plug - go to about 1700 on the vid here at telecomix (so I'm told, not got bandwidth here to check :)


danja
2011-06-09T21:00:19+01:00
federated social web rdf fsw2011