David's blog

Err and err and err but less and less and less

Schema validation with LXML on Ubuntu Hardy

LXML is an amazing Python module that picks up where the standard xml.dom(.minidom) left off.

It’s basically a set of Python bindings for the libxml2 and libxslt libraries, and it provides functionality missing from Python’s standard library, including XML validation and XPath.

On a project I’m currently working on I needed a good XML library for Python and ended up trying out lxml. But I simply could not get the schema validation to work, and after several wasted hours I understood that the python-lxml package that ships with Ubuntu Hardy (the distro I’m using) is the relatively old version 1.3.6.
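If you are not sure which lxml you are running, a quick check from Python can save some guessing. This is only a small sketch: the helper name version_string is my own, while LXML_VERSION and LIBXML_VERSION are the version tuples that lxml.etree exposes.

```python
from lxml import etree


def version_string(version_tuple):
    """Turn a version tuple such as (2, 1, 1, 0) into '2.1.1.0'."""
    return ".".join(str(part) for part in version_tuple)


# Version of the lxml package itself.
print("lxml %s" % version_string(etree.LXML_VERSION))
# Version of the libxml2 library that lxml is running against.
print("libxml2 %s" % version_string(etree.LIBXML_VERSION))
```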

I’m usually very reluctant to install anything as root that does not come from the “official” repository, but for lxml I made an exception and installed the python-lxml package from the upcoming Intrepid distribution.

Add the following line to your /etc/apt/sources.list file:

deb http://ch.archive.ubuntu.com/ubuntu intrepid main

Then run Synaptic as usual and install python-lxml version 2.1.1. To verify that it works fine, you can test schema validation thus:

>>> from lxml import etree
>>> schema_tree = etree.parse('path_to_schema.xsd')
>>> schema = etree.XMLSchema(schema_tree)
>>> doc = etree.parse('path_to_some_document')
>>> schema.validate(doc)

That last call returns the result of the validation as a boolean.
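If you have no schema at hand, the same test can be run without any files. The sketch below is an illustration under my own assumptions: the tiny "note" schema and the two documents are made up for the example, and everything is parsed from in-memory bytes. It also prints the schema's error_log, which tells you why a document was rejected.

```python
from io import BytesIO

from lxml import etree

# Hypothetical schema, invented for this example: a <note> must contain
# a <to> element followed by a <body> element.
XSD = b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="note">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="to" type="xs:string"/>
        <xs:element name="body" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

schema = etree.XMLSchema(etree.parse(BytesIO(XSD)))

# One document that follows the schema and one that lacks the <to> element.
good = etree.parse(BytesIO(b"<note><to>David</to><body>Hello</body></note>"))
bad = etree.parse(BytesIO(b"<note><body>Hello</body></note>"))

print(schema.validate(good))  # True
print(schema.validate(bad))   # False

# After a failed validation, error_log lists what went wrong and where.
for error in schema.error_log:
    print("line %d: %s" % (error.line, error.message))
```

If you would rather get an exception than a boolean, schema.assertValid(doc) raises etree.DocumentInvalid for an invalid document.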
