A useful metric for the rate of technological change is the average period during which speed or capacity doubles or, more or less equivalently, halves in price. For storage, networks, and computing power, these periods are around 12, 9, and 18 months, respectively. The different time constants associated with these three exponentials have significant implications.
I suspect these figures could be applied to support almost any thesis, including Foster's Grid work and Bill's thoughts re. the future of data. Still, interesting to note...
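Just to put numbers on those "different time constants": assuming clean exponentials with the doubling periods quoted above (an idealisation, obviously), a quick back-of-envelope sketch in Python shows how far the three curves diverge over five years.

```python
# Back-of-envelope: growth factor over a horizon, given a doubling period.
# The doubling periods (in months) are the figures quoted above; treating
# them as clean exponentials is of course a simplification.
doubling_months = {"networks": 9, "storage": 12, "compute": 18}

horizon_months = 60  # five years, an arbitrary illustrative horizon

for name, period in sorted(doubling_months.items(), key=lambda kv: kv[1]):
    factor = 2 ** (horizon_months / period)
    print(f"{name:8} doubles every {period:2d} months -> ~{factor:.0f}x over 5 years")
    # networks ~102x, storage 32x, compute ~10x
```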
I certainly don't disagree with Bill's points re. distribution & parallelism, including his final point -
...if data needs to be distributed because there's so much of it, and managing a lot of data is consequently difficult, but not core to most business or personal operations, a data grid is a potentially huge utility market to be part of.
- but I'd approach that from a slightly different angle.
I think we need to think bigger picture, a tiny bit further down the road. Data is already distributed, on every laptop, desktop and LAN around the world. Management of it is usually local, but that means the value that can be accrued from data integration is limited. While shifting to natively parallel languages and things like Hadoop and MapReduce may improve matters significantly in a (relatively) closed environment, the biggest gains are to be made from global integration. The most promising system we have for that so far is the Web.
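To make the MapReduce point concrete, here's a toy, single-process sketch of the pattern in Python (not Hadoop's actual API, just the shape of it): map over records independently, shuffle by key, reduce the groups. The documents and counts here are made up for illustration.

```python
from collections import defaultdict
from itertools import chain

# Toy, single-process sketch of the MapReduce pattern: the point is that
# map() and reduce() say nothing about where the data lives, so the same
# logic can be pushed out to wherever the data already is.

def map_phase(doc):
    # emit (key, value) pairs independently for each input record
    return [(word.lower(), 1) for word in doc.split()]

def shuffle(pairs):
    # group values by key (the framework does this between map and reduce)
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    return key, sum(values)

docs = ["the web of data", "data on the web"]
pairs = chain.from_iterable(map_phase(d) for d in docs)
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)  # {'the': 2, 'web': 2, 'of': 1, 'data': 2, 'on': 1}
```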
While some of the current Grid systems do go some way towards attempting global integration, they still IMHO suffer from an underlying attitude that it's a local system being scaled up (a la RPC). Of course the Grid folks have noticed the advances in distributed architecture made by the Web, but unfortunately tend to be locked in the WS-* groove (in fact their own branch, WSRF).
Thing is, if systems are scaled massively only using WS-* grid techniques or their low-cost-of-entry coding counterparts, the end result at best is still going to look like Google. Their index is a silo, only accessible through a narrow tube. Google is on the web, rather than being of the web.
I reckon the best way forward is to aim for the best of both worlds - think global (have a web-resource-oriented model), act local (expose RESTful interfaces, and if necessary use grid-like techniques to scale internally).
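As a sketch of what "act local" might look like at the edge, here's a minimal read-only resource exposed over plain HTTP with Python's standard library. The path and payload are hypothetical; the point is that whatever grid-ish machinery assembles the data stays hidden behind the URI.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical "think global, act local" facade: resources get stable URIs
# and a plain RESTful GET; how they're assembled behind the scenes (grid
# jobs, MapReduce, whatever) is invisible to the client.
RESOURCES = {
    "/datasets/example": {"title": "Example dataset", "triples": 12345},
}

class ResourceHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        resource = RESOURCES.get(self.path)
        if resource is None:
            self.send_error(404, "No such resource")
            return
        body = json.dumps(resource).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("localhost", 8000), ResourceHandler).serve_forever()
```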
Going back to the point about data management and integration, it's worth noting that these are achieved first and foremost through manipulating metadata. Integration of laptop/LAN data is possible without exposing (and potentially shifting around) all the data on the wire all the time. This is why I think just-in-time use of HTTP (i.e. linked data) is viable, when supported by distributed caching and querying (i.e. triplestores). I think timbl's notion of the (Semantic) Web as being a fractal entity makes sense here: data can appear at lots of scales, and interactions between systems can happen independently.
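A small sketch of the just-in-time idea, assuming rdflib: two snippets of Turtle stand in for metadata held on two different machines, and once they're merged into a local graph a single SPARQL query spans both. In the real linked-data case the graph would be populated by dereferencing URIs over HTTP as queries need them (the commented-out g.parse call, with a made-up URI).

```python
from rdflib import Graph

# In a linked-data setting the graph is populated by dereferencing URIs
# over HTTP only when a query needs them, e.g.:
#     g.parse("http://example.org/people/alice")   # hypothetical URI
# Here two inline Turtle snippets stand in for two remote sources
# (a laptop and a LAN server, say).
laptop_data = """
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
<http://example.org/people/alice> foaf:name "Alice" ;
    foaf:knows <http://example.org/people/bob> .
"""
lan_data = """
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
<http://example.org/people/bob> foaf:name "Bob" .
"""

g = Graph()
g.parse(data=laptop_data, format="turtle")
g.parse(data=lan_data, format="turtle")

# Once merged, one SPARQL query spans both sources.
query = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?friendName WHERE {
    ?person foaf:name ?name ;
            foaf:knows ?friend .
    ?friend foaf:name ?friendName .
}
"""
for name, friend_name in g.query(query):
    print(f"{name} knows {friend_name}")   # Alice knows Bob
```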
Hmm, didn't mention agility...but the principles and benefits in that area are fairly well known. In short, I reckon the late commitment that loosely-coupled systems can offer is conceptually closely related.
See also: how to write parallel programs
[this didn't come out all that coherent, but I was only going to paste the Foster quote over in Bill's comments, then they didn't work for me...ah well, I do have a proper write-up on this general area in the pipeline]