I was talking with Stefano Mazzocchi this morning, slightly at cross purposes. Action item: I have to say publicly that GRDDL is an edge case thing.
I don't see a problem with that characterisation. It does seem likely that there will be a significant amount of extra data on the web from people for whom GRDDL works as specified (e.g. the chemists), but as far as the messy old wild web is concerned GRDDL only satisfies edge cases.
I could spout Long Tail cliches here, but I don't need to. It doesn't matter that the GRDDL spec only covers subset_of(well-formed, well_intentioned): it's a facility for publishers that need it (straight declarative data at low cost), and for consumers that want to exploit it, it offers a route to slightly cleaner scraping (so you use Tidy, no-one dies). It's also shifting an interesting window a little in an interesting direction.