Sam Ruby's just been pulling feedlist info out of OPML and inserting it into PlanetPlanet's config file. While reading this I remembered a little item on my todo list and, being in (potentially structured) procrastination mode, decided there was no time like the present.
When I've played with aggregation in RDF with Python in the
past, I've pulled out the feed subscription list programmatically
(I've also been working around
Redland, as does the
Chumpalogica
Planet code). But now that SPARQL's available pretty much everywhere
(supported by Redland/Rasqal, naturally), there's no good reason to
hardcode such stuff. The
expression of feedlists in RDF/XML is very verbose (mostly because
it carries loads more information than URI+title) but is easy to
work with. Here's a (slightly trimmed) example from
PlanetRDF's
blogroll:
<foaf:Agent>
  <foaf:name>John Barstow</foaf:name>
  <foaf:weblog>
    <foaf:Document rdf:about="http://www.nzlinux.org.nz/blogs/">
      <dc:title>Visions of Aestia</dc:title>
      <rdfs:seeAlso>
        <rss:channel rdf:about="http://www.nzlinux.org.nz/blogs/wp-rdf.php?cat=9" />
      </rdfs:seeAlso>
    </foaf:Document>
  </foaf:weblog>
</foaf:Agent>
As a first pass on getting the necessary info out, ten
minutes playing with Leigh's
Twinkle
(*snigger*) gave me this:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rss: <http://purl.org/rss/1.0/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?name ?title ?feed ?blog
WHERE {
  ?agent foaf:name ?name ;
         foaf:weblog ?blog .
  ?blog dc:title ?title ;
        rdfs:seeAlso ?feed .
  ?feed rdf:type rss:channel .
}
That's ok as far as it goes, but there's a good chance that
automatically harvested data might be missing either the blog title
or blogger name. So here's version two:
SELECT ?title ?feed ?blog
WHERE {
  ?agent foaf:weblog ?blog .
  ?blog rdfs:seeAlso ?feed .
  ?feed rdf:type rss:channel .
  OPTIONAL {
    ?blog dc:title ?title .
  }
  OPTIONAL {
    ?agent foaf:name ?title .
  }
}
i.e. if the blog title is available, use that for the value
of title, otherwise use the name of the agent (blogger).
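To convince myself why the first OPTIONAL wins, here's a toy sketch of the left-join idea in Python. It is not a SPARQL engine, just a model of how I understand the matching to work: a later OPTIONAL can only extend a solution if it agrees with what is already bound. The triples, the short prefixed names used as plain strings and all the function names are made up for illustration.

```python
# Toy triples; every value here is invented for the example.
TRIPLES = [
    ("agent1", "foaf:weblog", "blog1"),
    ("blog1", "rdfs:seeAlso", "feed1"),
    ("feed1", "rdf:type", "rss:channel"),
    ("blog1", "dc:title", "Example Blog"),
    ("agent1", "foaf:name", "Alice Example"),
    ("agent2", "foaf:weblog", "blog2"),
    ("blog2", "rdfs:seeAlso", "feed2"),
    ("feed2", "rdf:type", "rss:channel"),
    ("agent2", "foaf:name", "Bob Example"),  # blog2 has no dc:title
]


def match(pattern, binding):
    """Yield each extension of `binding` under which `pattern` matches a triple."""
    for triple in TRIPLES:
        new = dict(binding)
        ok = True
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against its existing value.
                if new.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield new


def optional(pattern, solutions):
    """Left join: extend a solution when the pattern matches, else keep it as is."""
    for binding in solutions:
        extended = list(match(pattern, binding))
        yield from (extended or [binding])


def run_query():
    """Mimic version two of the query; returns {feed: title}."""
    solutions = [{}]
    required = [
        ("?agent", "foaf:weblog", "?blog"),
        ("?blog", "rdfs:seeAlso", "?feed"),
        ("?feed", "rdf:type", "rss:channel"),
    ]
    for pattern in required:
        solutions = [new for b in solutions for new in match(pattern, b)]
    solutions = optional(("?blog", "dc:title", "?title"), solutions)
    solutions = optional(("?agent", "foaf:name", "?title"), solutions)
    return {b["?feed"]: b.get("?title") for b in solutions}


if __name__ == "__main__":
    for feed, title in run_query().items():
        print(feed, "->", title)
```

In this model feed1 keeps its blog title because the agent's name doesn't agree with the already-bound ?title, while feed2 falls back to the blogger's name.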
The XML results of that query look like this:
<sparql
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
    xmlns="http://www.w3.org/2001/sw/DataAccess/rf1/result">
  <head>
    <variable name="title"/>
    <variable name="feed"/>
    <variable name="blog"/>
  </head>
  <results>
    <result>
      <title>John Barstow</title>
      <feed uri="http://www.nzlinux.org.nz/blogs/wp-rdf.php?cat=9"/>
      <blog uri="http://www.nzlinux.org.nz/blogs/"/>
    </result>
    <result>
      <title>Plan B by Libby Miller</title>
      <feed uri="http://planb.nicecupoftea.org/index.rdf"/>
      <blog uri="http://planb.nicecupoftea.org/"/>
    </result>
...
  </results>
</sparql>
The PlanetPlanet configs are simple text files (config.ini).
After switching toolkit (to emacs + xsltproc at the command line),
10 minutes later I had this:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:res="http://www.w3.org/2001/sw/DataAccess/rf1/result">
<xsl:output method="text" />
<xsl:template match="res:sparql/res:results">
  <xsl:for-each select="res:result">
[<xsl:value-of select="res:feed/@uri"/>]
name = <xsl:value-of select="res:title"/>
 <xsl:text>
  </xsl:text>
 </xsl:for-each>
</xsl:template>
</xsl:stylesheet>
which produces:
[http://www.nzlinux.org.nz/blogs/wp-rdf.php?cat=9]
name = John Barstow
[http://planb.nicecupoftea.org/index.rdf]
name = Plan B by Libby Miller
...
- the way the feedlist looks in the config files.
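For anyone who'd rather stay in Python than shell out to xsltproc, the same transformation can be sketched with the standard library alone. This is only a sketch under a few assumptions: the results use the namespace shown in the results document above, each result carries its feed URI in a uri attribute, and the function name results_to_config and the SAMPLE string are mine, not part of PlanetPlanet.

```python
import configparser
import xml.etree.ElementTree as ET

# Namespace copied from the results document; assumed to be stable.
RES = "{http://www.w3.org/2001/sw/DataAccess/rf1/result}"

SAMPLE = """<sparql xmlns="http://www.w3.org/2001/sw/DataAccess/rf1/result">
  <head>
    <variable name="title"/>
    <variable name="feed"/>
    <variable name="blog"/>
  </head>
  <results>
    <result>
      <title>John Barstow</title>
      <feed uri="http://www.nzlinux.org.nz/blogs/wp-rdf.php?cat=9"/>
      <blog uri="http://www.nzlinux.org.nz/blogs/"/>
    </result>
    <result>
      <title>Plan B by Libby Miller</title>
      <feed uri="http://planb.nicecupoftea.org/index.rdf"/>
      <blog uri="http://planb.nicecupoftea.org/"/>
    </result>
  </results>
</sparql>"""


def results_to_config(results_xml):
    """Turn SPARQL XML results into '[feed-uri]' sections with a name key."""
    root = ET.fromstring(results_xml)
    sections = []
    for result in root.iter(RES + "result"):
        feed = result.find(RES + "feed")
        if feed is None or not feed.get("uri"):
            continue  # no feed URI, so nothing to subscribe to
        title = result.find(RES + "title")
        # Collapse stray whitespace and line breaks inside the title.
        name = " ".join((title.text or "").split()) if title is not None else ""
        sections.append("[%s]\nname = %s\n" % (feed.get("uri"), name))
    return "\n".join(sections)


if __name__ == "__main__":
    text = results_to_config(SAMPLE)
    print(text)
    # Cheap sanity check: the output reads back as INI-style config.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    print(parser.sections())
```

Reading the text back with configparser is just a quick way to check that the sections are well-formed; interpolation is switched off so a stray % in a name can't trip it up.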
Ok, procrastination over, I haven't time to look into how you
might integrate this with PlanetPlanet, but that shouldn't be
difficult - the Python RDFLib (as used in Sam and co's
FeedValidator, no less) has
had some SPARQL support for a while, not sure of the current
status. But the interesting stuff only really starts after being
able to read foafrolls - there's all kinds of other info available
in RDF that could be useful to a Planet-style aggregator
(especially if you did FOAF autodiscovery/XFN/Geo tag
snagging/hCalendar GRDDL on the blogs).
But then the use of config.ini for the data would probably start
looking clunky, so the logical thing to do would be to use an RDF
store (maybe an RDF/XML file fronted by RDFLib). This would be a
move away from the simple elegance of the current planet.py. But then
pretty much for free you could also use the store for persistence
of entries, and facetted views of the person/entry data through
SPARQL, plus (assuming RDFLib supports it) text search through
SPARQL's regex support. A little RDF goes a long way...
Hmm, there's a little data point - it took me well over twice
as long to write this post as it did to do the query and XSLT. The
code was a lot more fun too ;-)