[in reply to John Sowa on the cg@conceptualgraphs.org list, unfortunately the mail didn't get through - something up with the server]
I reckon the activities around Linked Data are somewhat different to the typical "Next Big Thing". I'd suggest the NBT here, if anything, is the Semantic Web, which has suffered from industry hype and has not yet lived up to its promises. However, Linked Data is essentially the same idea as the Semantic Web, but with more emphasis on the "Web" side and less on the "Semantic".
The central idea of treating the Web conceptually as one big (graph-shaped) database works fine (and the LOD cloud [1] is a notable concrete manifestation), but as you note, most applications do require fast access to relevant data. Some of the more recent RDF stores/SPARQL engines do have performance comparable to traditional RDBs, but I don't think this is entirely relevant to the core paradigm. The tendency in the past has been towards the creation of data silos, where each company or organization has its own discrete database. Where data has been exposed to the Web, it has been in the form of human-readable documents. This makes for a huge impedance mismatch for anyone wishing to have computers work across multiple data sources.
Where data is exposed to the Web as linked data, the material is available for direct recombination and reuse by other parties. When the appropriate standards are used (primarily URIs for identification, RDF for structure and HTTP for transfer) the notion of a database takes on a different form: a triplestore is a (fast) cache of a little chunk of the global Web of data.
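To make the "triplestore as cache" idea a bit more concrete, here's a minimal sketch in plain Python. It isn't a real RDF store: triples are just tuples of strings, and the `fetch` callable is a stand-in for "do an HTTP GET on the URI and parse the RDF that comes back". The `TripleCache` class, the `FAKE_WEB` dictionary and the example.org URI are all made up for illustration; only the WGS84 `lat` property URI is a real vocabulary term.

```python
from typing import Callable, Iterable, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class TripleCache:
    """A local cache holding a small chunk of the Web of data."""

    def __init__(self, fetch: Callable[[str], Iterable[Triple]]) -> None:
        # `fetch` is an assumption: it stands in for dereferencing a URI
        # over HTTP and parsing the RDF that is returned.
        self._fetch = fetch
        self._triples: Set[Triple] = set()
        self._loaded: Set[str] = set()

    def load(self, uri: str) -> None:
        """Dereference a URI once and keep its triples locally."""
        if uri not in self._loaded:
            self._triples.update(self._fetch(uri))
            self._loaded.add(uri)

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> Set[Triple]:
        """Return cached triples matching the pattern; None is a wildcard."""
        return {
            t for t in self._triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        }


# Hypothetical remote data, keyed by the URI it would be served from.
FAKE_WEB = {
    "http://example.org/water/pipe/7": [
        ("http://example.org/water/pipe/7",
         "http://www.w3.org/2003/01/geo/wgs84_pos#lat", "51.5"),
    ],
}


def fake_fetch(uri: str) -> Iterable[Triple]:
    """Pretend to fetch a URI; unknown URIs yield no triples."""
    return FAKE_WEB.get(uri, [])


cache = TripleCache(fake_fetch)
cache.load("http://example.org/water/pipe/7")
lat_triples = cache.match(p="http://www.w3.org/2003/01/geo/wgs84_pos#lat")
```

The point of the sketch is only that the local store doesn't "own" the data: the URI says where the data lives, and the cache holds whatever part of it has been pulled down so far.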
Let's say electricity providers and water providers have their own databases. A company wishing to know where to lay fibre-optic cables would probably want to know where the existing (and planned) wiring/piping lies. Right now that would typically mean they'd need fairly in-depth knowledge of the database schemas and local conventions used by the utility companies. But if the data is available in a consistent form (i.e. RDF) then the work of aligning the source data and extracting the information becomes that much easier. The utilities may still have their own idiosyncratic ways of describing their systems, but then again if they happen to use some common vocabularies (e.g. for geo-location) considerably less expert knowledge of the individual systems is needed to get started. The fibre-optics company could run selective queries (or run a crawler) over the utilities' Web-exposed data, and trivially merge the results in their own, local, performant store.
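As a rough illustration of why the merge is trivial, here's a small sketch in plain Python, with triples represented as tuples of strings. All the data is invented: the example.org URIs, the `voltage` and `diameterMm` properties and the coordinates are hypothetical, and the only assumption about the utilities is that both use the W3C WGS84 `lat`/`long` properties for location.

```python
# Shared vocabulary both (hypothetical) utilities happen to use for location.
GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"

# Made-up data as the electricity provider might expose it.
electricity = {
    ("http://example.org/power/cable/12", GEO + "lat", "51.501"),
    ("http://example.org/power/cable/12", GEO + "long", "-0.142"),
    ("http://example.org/power/cable/12",
     "http://example.org/power/voltage", "11000"),
}

# Made-up data as the water provider might expose it.
water = {
    ("http://example.org/water/pipe/7", GEO + "lat", "51.502"),
    ("http://example.org/water/pipe/7", GEO + "long", "-0.141"),
    ("http://example.org/water/pipe/7",
     "http://example.org/water/diameterMm", "300"),
}

# Merging two sets of triples is just a set union.
merged = electricity | water


def located_assets(triples):
    """Map each subject that has both a lat and a long to its coordinates."""
    lat = {s: float(o) for s, p, o in triples if p == GEO + "lat"}
    lon = {s: float(o) for s, p, o in triples if p == GEO + "long"}
    return {s: (lat[s], lon[s]) for s in lat if s in lon}


positions = located_assets(merged)
for subject, (latitude, longitude) in sorted(positions.items()):
    print(subject, latitude, longitude)
```

The location query needs no knowledge of the utility-specific properties (voltage, pipe diameter); it only relies on the vocabulary the two sources share.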
The adoption of linked data has to some extent slipped under the radar of industry hype, a good example being http://data.gov.uk, which aims to take (non-personal) UK government data and expose it to the Web in a reusable form. The change in paradigm and increased potential for reuse is pretty apparent when you consider that a lot of the source data is held in Excel spreadsheets or buried in documents. This government-backed project has yielded a couple of surprises. On the one hand, there's the willingness of government departments to hand over their data and help out (the material is technically publicly available already, but for practical reasons that can be far from the case). On the other hand, developers have been fairly clamouring to get their hands on the data to build end-user applications.
(Incidentally, some of the data.gov.uk folks are working on the Linked Data API [2], which provides interfaces to triplestores that don't require any knowledge of RDF or SPARQL. The need for that knowledge has traditionally been something of a blocker.)
Linked Data and Hype
2010-08-29T07:00:29+01:00
linkeddata semweb hype