Soup Wrestling

I just got around to reading Uche Ogbuji's interesting article Wrestling HTML at xml.com, on parsing HTML with Python. Uche mentions converting HTML to XML, which the rough SAX2-style parser for HTML soup and ill-formed XML that I wrote a while ago can do pretty easily (not well, but easily). So I've added a quick handler to demo that (there's already an RSS/Atom feed-reading demo in there), and here it is: psoup_2004-09-13.zip.
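To give a flavour of the soup-to-XML idea, here is a minimal sketch. It is not the psoup code: it assumes the standard library's html.parser as the soup tokenizer and xml.sax.saxutils.XMLGenerator as the SAX2 handler, and the names SoupToXML, html_to_xml and VOID are made up for illustration. It only balances tags, so the output isn't guaranteed to be well-formed XML (there may be several root elements, for instance).

```python
import io
from html.parser import HTMLParser
from xml.sax.saxutils import XMLGenerator
from xml.sax.xmlreader import AttributesImpl

# Hypothetical shortlist of HTML elements that never have a close tag.
VOID = {"br", "hr", "img", "input", "link", "meta"}


class SoupToXML(HTMLParser):
    """Forward tag-soup events to a SAX2 ContentHandler, balancing tags."""

    def __init__(self, handler):
        super().__init__(convert_charrefs=True)
        self.handler = handler
        self.stack = []  # names of the elements currently open

    def handle_starttag(self, tag, attrs):
        # Valueless attributes (e.g. <input disabled>) arrive as None.
        values = {k: (v if v is not None else k) for k, v in attrs}
        self.handler.startElement(tag, AttributesImpl(values))
        if tag in VOID:
            self.handler.endElement(tag)  # close void elements at once
        else:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray close tag: ignore it
        # Close everything opened since the matching start tag.
        while self.stack:
            open_tag = self.stack.pop()
            self.handler.endElement(open_tag)
            if open_tag == tag:
                break

    def handle_data(self, data):
        self.handler.characters(data)

    def close(self):
        super().close()
        # Close whatever the document left open.
        while self.stack:
            self.handler.endElement(self.stack.pop())


def html_to_xml(html):
    out = io.StringIO()
    parser = SoupToXML(XMLGenerator(out))
    parser.feed(html)
    parser.close()
    return out.getvalue()


print(html_to_xml("<p>Hello <b>world<br>bye"))
```

Any other SAX2 ContentHandler could stand in for XMLGenerator here, which is the appeal of the SAX-style approach.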

It hasn't been tidied up the way JSoup (the Java version, still hacky!) has, but it might still make a useful starting point for someone.

PS. While packaging it up I introduced a bug: characters() events aren't passed through. There's a duplicate method in the handler, which I haven't had a chance to delete yet…
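For anyone wondering how a duplicate method can swallow text, here is one plausible mechanism, sketched with a made-up TextHandler class rather than the actual psoup handler. In Python a second def with the same name in a class body silently replaces the first, with no warning.

```python
class TextHandler:
    """Hypothetical handler, only here to show the override behaviour."""

    def __init__(self):
        self.chunks = []

    def characters(self, content):
        # Intended behaviour: keep the text.
        self.chunks.append(content)

    def startElement(self, name, attrs):
        pass

    def characters(self, content):
        # Accidental duplicate: this later definition replaces the one
        # above, so the text is dropped.
        pass


handler = TextHandler()
handler.characters("hello")
print(handler.chunks)  # the text never arrives
```

Deleting the second definition restores the first one's behaviour.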

[Danny]

Danny Ayers
2004-09-13T14:13:31Z
