HTML Tidy from Python on Mac OS X

Still playing with microformats, and now GRDDLing. I started out with Dom's online demonstration of an RDF-in-XHTML processor, feeding it results from Technorati's microformats search, but hit an immediate barrier: the processor (naturally) expects well-formed XML input (for XSLT), not the scruffy HTML most microformat data is marked up in. What I need is some demo code anyway (I'm going with Python), so I can do the cleanup with local tools.

For production code I'd maybe have opted for Beautiful Soup (Python) or TagSoup (Java) for cleaning, but because it's for demo purposes I needed something that wouldn't need much explanation. Dave Raggett's HTML Tidy was the obvious choice. I've used it with Python in the past on both Win32 and Debian Linux, using the uTidyLib wrapper. On those platforms it's straightforward - install ctypes, the relevant tidylib and uTidyLib and you're laughing. This is my usual snippet for tidying HTML:

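import tidy


def toXHTML(html):
    # Indented XHTML with an XML declaration, minus Tidy's generator meta tag
    options = dict(output_xhtml=1, add_xml_decl=1, indent=1, tidy_mark=0)
    # parseString returns a Tidy document object; str() on it gives the markup
    return tidy.parseString(html, **options)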
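
The same four options can also be handed to the tidy command-line tool, which is a handy cross-check when the wrapper isn't cooperating. What follows is only a sketch, not part of uTidyLib: the names tidy_cli_args and to_xhtml_cli are made up for illustration, it assumes a tidy executable on the PATH, and it is written for a current Python 3 rather than the 2.4 mentioned in this post.

```python
import shutil
import subprocess

# Same settings as the uTidyLib snippet above
OPTIONS = dict(output_xhtml=1, add_xml_decl=1, indent=1, tidy_mark=0)


def tidy_cli_args(options):
    """Translate uTidyLib-style keyword options into tidy command-line flags."""
    # -q suppresses the banner, -utf8 sets input and output encoding to UTF-8
    args = ["tidy", "-q", "-utf8"]
    for name, value in sorted(options.items()):
        # Tidy config names use hyphens; booleans are spelled yes/no
        args += ["--" + name.replace("_", "-"), "yes" if value else "no"]
    return args


def to_xhtml_cli(html, options=OPTIONS):
    """Run tidy on an HTML string; return XHTML text, or None if tidy is absent."""
    if shutil.which("tidy") is None:
        return None
    result = subprocess.run(
        tidy_cli_args(options),
        input=html.encode("utf-8"),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    # Exit status: 0 = clean, 1 = warnings only, 2 = errors
    if result.returncode > 1:
        raise RuntimeError(result.stderr.decode("utf-8", "replace"))
    return result.stdout.decode("utf-8", "replace")
```

Calling to_xhtml_cli("<p>scruffy") should give back cleaned-up markup where tidy is installed, and None where it isn't.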


But on OS X it just didn't work. After a bit of nosing around I discovered Tidy was already here on the machine (out of the box?), but uTidyLib couldn't see it. A bit of Googling turned up this, which didn't actually work itself but pointed me in the right direction. On this machine the key file was /Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/site-packages/tidy/lib.py; here's a version that worked for me, and here's a patch for the one that came with uTidyLib.
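
To see whether the problem is simply that the shared library can't be found, something like the following is a quick test. It is not the patch itself, just a minimal sketch using the standard ctypes.util.find_library lookup; the function name load_tidy and the list of candidate names are my own assumptions, and the names that actually resolve will vary by platform.

```python
import ctypes
import ctypes.util


def load_tidy(names=("tidy", "libtidy")):
    """Return a ctypes handle to the first Tidy library found, or None."""
    for name in names:
        # find_library searches the platform's usual library locations
        path = ctypes.util.find_library(name)
        if path is None:
            continue
        try:
            return ctypes.CDLL(path)
        except OSError:
            # Found a file but could not load it; try the next candidate
            continue
    return None


if __name__ == "__main__":
    handle = load_tidy()
    print("Tidy library loaded" if handle else "No Tidy library found")
```

If this reports that nothing was found even though the tidy executable is present, the wrapper is probably looking under a library name or path that doesn't exist on the machine.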

Funnily enough I met Dave and Dom for the first time in Cannes earlier in the year - Dave helped me with issue-minibar-too-full and Dom had me, er, eaten.


Danny Ayers
2006-07-06T19:35:51+02:00
