homeR: an R package for building physics

For the past few weeks we’ve been very busy here at Neurobat with the analysis of field test results. In the process, we had to implement several functions in R that relate to building physics.

We thought it might be useful for the community to have access to those functions, so we decided to begin submitting them to CRAN. That’s why we’re pleased to announce the first release of the homeR package, which we will fill over time with such functions.

This first release, version 0.1, contains just `pmv`, a function that calculates the so-called Predicted Mean Vote, i.e. a measure of thermal comfort on a scale from -3 (cold) to +3 (hot), expressed as a function of several variables including air temperature, clothing and metabolism.
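
For example, to compute the PMV for a typical office situation (say 1.0 clo of clothing, 1.2 met of metabolic activity, 21°C and 50% relative humidity; the arguments are clothing, metabolism, temperature and relative humidity, in that order), you would call something like this:

> library(homeR)
> pmv(1.0, 1.2, 21, 50)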

Here I show how, with this function, one can derive a contour plot showing the optimal indoor temperature for given clothing and metabolism levels (assuming 50% relative humidity). We’re basically going to solve the equation `pmv(clo, met, temp, sat) = 0` for `temp` across a grid of `clo` and `met` values with the `uniroot` function.

> library(homeR)
> library(lattice)
> clo <- seq(0, 2, length=21)
> met <- seq(0.6, 3.2, length=21)
> zero.pmv <- function(clo, met) uniroot(function(temp) pmv(clo, met, temp, 50), c(-10, 40))$root
> contourplot(outer(clo, met, Vectorize(zero.pmv)),
+             cuts=20,
+             aspect="fill",
+             panel=function(...) {
+               panel.grid(h=-1, v=-1, ...)
+               panel.contourplot(...)
+             },
+             row.values=clo, column.values=met,
+             xlim=extendrange(clo), ylim=extendrange(met),
+             xlab="[Clo]", ylab="[Met]")

And here is the resulting plot:

Predicted Mean Vote contour plot

As you can see, this is pretty similar to the sort of plot one finds in standard textbooks on the subject, such as Claude-Alain Roulet’s Santé et qualité de l’environnement intérieur dans les bâtiments:

PMV contour plot from textbook

Please give the `homeR` package a try, and give us your feedback. There’s only the `pmv` function in there at the time of writing but we plan to extend the package in the weeks to come.

Git and Scientific Reproducibility

I firmly believe that scientists and engineers—particularly scientists, by the way—should learn about, and use, version control systems (VCS) for their work. Here is why.

I’ve been a user of free VCSs for a while now, beginning with my first exposure to CVS at CERN in 2002, through my discovery of Subversion during my doctoral years at EPFL, culminating in my current infatuation with Git as a front-end to Subversion. I’m now a complete convert to that system and could not imagine working without it. Every week I discover new use cases for this tool that I had not thought of before (and that I suspect the Git developers didn’t, either).

This week I found such a use case for Git: enforcing scientific reproducibility. Let me explain. I’m currently working on prototype software written in MATLAB that implements some advanced algorithms for the smart, predictive control of heating in buildings. As part of that work we need to evaluate several competing algorithm designs, and try out different parameters for the algorithms.

The traditional way of doing this is, of course, to set all the parameters in your code for the first simulation, run it, then change the parameters for the second one, run it again, and so on. There are several problems with this approach.

First, you need a really good naming convention for the data you are going to generate to make sure that you know exactly which parameters you set for each run. And coming up with a good naming scheme for data files is not trivial.

Second, even if your data file naming convention is good enough that you can easily reproduce the experiment, how can you be sure that the settings are exactly right? That you didn’t, perhaps, tweak just that little extra configuration file just to work around that little bug in the software?

Third, how will you reproduce those results? Even assuming that you ran all your simulations based on a given, well-known revision number in your VCS (you do use a VCS, don’t you?), you will still need to dive into the code and set those configuration parameters yourself. A tedious, error-prone process, even if you manage to keep them all in one source file.

I think a system like Git solves all these problems. Here is how I did it.

I needed to run 7 simulations with different parameters, based on a stable version of our software, say r1409 in our Subversion repository.

I’m using Git as a front-end to Subversion. I began by creating a local branch (something Git, not Subversion, will let you do):

$ git checkout -b simulations_based_on_r1409

This will create a new branch from the current HEAD. Now the idea is to make a local commit on that local branch for each different set of parameters. Here is how:

  1. Edit your source code so that all parameters are set right.
  2. Commit the changes on your local branch:
    $ git commit -am "With parameter X set to Y"
    [simulations_based_on_r1409 66cea68] With parameter X set to Y
  3. Note the 7 characters (66cea68 above) next to the branch name. These are the first 7 characters of the SHA-1 hash Git computes for that commit, which uniquely identifies the entire state of your project.
  4. Run your simulation. Log the results, along with the short hash.
  5. Repeat the steps above for each different configuration you want to run the experiment with.

By the end of this process, you should have in your logbook a list of experimental results along with the short hash of the entire project as it looked during that experiment. It might, for instance, look something like this:

Hash      Parameter X   Parameter Y   Result
66cea68   23            42            1024
a4f683f   etc           etc           etc

As you can see there are at least two reasons why it’s important to record the short hash:

  1. It will let you go back in time and reproduce an experiment exactly as it was when you ran it first.
  2. It will force you to commit all changes before running the experiment, which is a good thing.
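
Going back in time is then just a matter of checking out the recorded hash. A minimal sketch, reusing the 66cea68 commit from the example above:

$ git checkout 66cea68
$ # re-run the simulation exactly as it was
$ git checkout simulations_based_on_r1409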

I’ve been running a series of simulations using a variation on this process, whereby I actually run several simulations in parallel on my 8-core machine. For this to work you need to clone your entire project, once per simulation. Then for each simulation you check out the right version of the project and run the experiment.
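
In practice that looks something like the following (the sim1 directory and the repository path are of course just placeholders):

$ git clone /path/to/project sim1
$ cd sim1
$ git checkout 66cea68
$ # launch the first simulation here; repeat with sim2, sim3, ... for the other configurations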

Quite seriously, I would never have been able to do anything remotely like this with a centralized version control system. The ability to create local branches and commit to them is a truly awesome feature of distributed version control systems such as Git. I don’t suppose the Git developers had scientists and engineers in mind when they developed this system, but hey, here we are.

Are you a scientist or an engineer wishing to dramatically improve your way of working? Then run, do not walk, to read the best book on Git there is.

Installing ESP-r on Ubuntu 9.10

ESP-r is, in the words of its official website, “an integrated modelling tool for the simulation of the thermal, visual and acoustic performance of buildings and the assessment of the energy use and gaseous emissions associated with the environmental control systems and constructional materials”. In other words, it’s a computer program for modelling a building’s thermal and energy performance. It’s especially popular in Europe, particularly in academia.

Recently I wanted to install it on my laptop running Ubuntu 9.10 (Karmic Koala). The standalone installers provided on the main downloads website didn’t quite work, complaining about the missing libg2c library. Well, of course it’s not available: that library has long been obsolete and is no longer shipped by modern distributions.

Your best choice when installing ESP-r on Ubuntu is, quite frankly, to rebuild it from source. And it’s not complicated either. Here’s how I did it.

Check the project out from SVN:

$ svn co https://espr.svn.cvsdude.com/esp-r/trunk espr
$ cd espr/src

Ensure you have the required development libraries installed. In particular you will need libxml2-dev if you build with XML support and libx11-dev if you build the X version (which I recommend). Note that you really need the -dev packages; these contain the header files required when compiling an application against those libraries.
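
On Ubuntu that boils down to something like:

$ sudo apt-get install libxml2-dev libx11-dev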

Now choose an installation directory. Being the only user of my machine, I installed under ~/programs/esru, but you might want to install under /opt/esru. You should then be able to build:

$ ./Install -d ~/programs/esru

ESP-r installation script.

Please consult the README file before commencing
installation. This script will rebuild the ESP-r
modules on your system. You can abort this process
at any time by pressing c.

Please answer the following questions. Default answers
are in []. To accept the default, press return.

Your computer identifies itself as Linux.
Is this information correct (y/n)? [y]

ESP-r can be built with the Sun Fortran 90, GNU or
intel compilers.
Compiler:
(1) Sun fortran 90 (cc and f90)
(2) GNU fortran (gcc 3.X and g77)
(3) Intel fortran (icc, icpc and ifort)
2

Install with experimental XML output support? This may
significantly increase simulation run-time. (y/n) [n]
y
XML output enabled for bps
Graphics library: [2]
(1) GTK graphics library
(2) X11 graphics library
(3) no graphics library (text-only application)
2

ESP-r can optionally retain debugging symbols and
object files for use with a debugging program such
as GDB.

Retain debugging symbols? (y/n) [n]

Install ESP-r database files? (y/n) [y]

Install training files? (y/n) [y]

Proceed with installation of esp-r modules (y/n) [y]?

Installing ESP-r system. This may take some time.
...

Once the build is complete (it took about an hour on an Asus 1005HA netbook), you should be able to run ESP-r:

$ path/to/install/espr/bin/prj

Enjoy!

Why I’m disabling MathML for now

In a previous post I described how I tweaked my WordPress installation to support the display of MathML markup, for displaying mathematical equations.

One of the steps involved changing the content type from text/html to application/xhtml+xml. That step was necessary, or else Firefox would simply not render the MathML markup properly.

Unfortunately, application/xhtml+xml is simply not supported by a host of other browsers, including Internet Explorer, which means that this blog became unreadable overnight to anyone visiting it with anything other than Firefox.

This is why I’m disabling direct MathML support on this blog. If you’re interested you can view the original blog post on my blog’s old server.

There are, however, alternative (and arguably simpler) ways to display mathematics on the web, such as MathJax or jsMath (a JavaScript library used on the maths Q&A site MathOverflow).

WordPress shortcode for syntax highlighting

There’s a nice feature in WordPress for including source code in your blog posts, but the Codex is not crystal-clear on how to activate it.

According to this article, for example, all you have to do is to insert a shortcode tag and anything that goes inside that tag will be automatically formatted.

But when I tried that on some Java code I recently posted, it did not work. Only after quite some digging did I understand that to enable this nice feature you must install either the SyntaxHighlighter or the SyntaxHighlighter Plus plugin. They both provide this shortcode, but SyntaxHighlighter Plus seems more advanced. That’s the one I installed, and now it works perfectly:

<some>
  <xml>
    that is now nicely formatted
    and highlighted!
  </xml>
</some>
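
For the record, here is roughly what the markup looks like in the post editor. I’m assuming the [sourcecode] shortcode and its language attribute here, so check your plugin’s documentation for the exact tag and attribute names:

[sourcecode language="xml"]
<some>
  <xml>
    that is now nicely formatted
    and highlighted!
  </xml>
</some>
[/sourcecode]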

Canonical data formats, middleware and GCC

These days I’m working on a middleware application that bridges a company’s ERP and its warehouses. The ERP posts messages in a given XML schema; our application reads these messages, transforms them into the schema understood by the warehouse management system, and uploads them to the warehouse’s FTP server.

We use XSLT to transform messages in one schema to messages in the other. In the example above, one XSL file can handle the whole transformation.

But what happens when you deal with more than one schema on either end? Suppose you have on the ERP side one schema for orders, one schema for defining the product catalogue, and so on. And on the warehouse side you might have more than one schema for different kinds of messages.

Say you end up with N schemata on the input side and M on the output side, and suppose (for the sake of argument) that your application must handle every possible combination. If you use one XSL file per transformation, that’s N×M files. If the customer changes one schema on the input side, or adds one (and we have no control over that), then we must revise M files.

The classical solution to this combinatorial explosion is the Canonical Data Model messaging pattern. We have defined a common data format for our middleware application, and we transform all incoming messages to this common format before transforming them into the proper outgoing format.
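
To make the arithmetic concrete: with, say, N = 5 input schemata and M = 4 output schemata, the direct approach needs 5 × 4 = 20 XSL files, whereas the canonical approach needs only 5 + 4 = 9, and an extra input schema then costs one new file instead of four.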

With this solution, whenever a schema changes or is added we need only revise ONE XSL file. Pretty neat and innovative solution, right? I thought so too, until I listened to this interview about the GCC internals.

The GCC can compile C, C++, Fortran, Ada, Java (and probably lots more languages) to an amazing number of platforms. How can it do this and avoid the combinatorial explosion when a language changes, or the definition of one platform changes?

Simple. It uses a canonical data format. More specifically, GCC’s frontend compiles the source code into an intermediate language-neutral and platform-neutral representation called GIMPLE. This representation is then translated by GCC’s backend into platform-specific code. If a language is modified, only the frontend must be revised. If a platform changes, only the backend must be revised.

The GCC folks (and probably many others) had been doing Canonical Data Format for decades before this pattern became recognized as such. And I thought we were being so clever…


Remotely editing files as root with Emacs

I often need to edit files on remote machines or on embedded devices, that is, machines without a monitor and on which a proper editor might not necessarily be installed.

In the past that has always left me with the rather painful choice between vi and nano. Now I have never invested enough time in learning vi beyond the most basic editing commands. And nano is okayish for small edits but hopeless for larger ones.

So I was delighted to learn that you can edit files remotely through ssh with Emacs. If you want to remotely edit aFile on host aHost, open the following file:

/aHost:/path/to/aFile

The built-in Tramp package will take care of the rest. You can even use Dired remotely with this mechanism, an extremely powerful feature.

But what was missing for me was a painless way of editing remote files as root. The Tramp version included in Emacs 22.2.1 is 2.0.57, with which I was unable to remotely edit files as root. The latest version of Tramp, 2.1.14, is in my humble opinion far easier to work with.

To install it, just follow the instructions. I created a directory ~/emacs into which I unzipped the Tramp distribution. I compiled it in place and did not bother installing it system-wide, being the only user of my system.

Then I added the following to my .emacs file:

;; Load most recent version of Tramp for proxy support
(add-to-list 'load-path "~/emacs/tramp/lisp/")
(require 'tramp)
(add-to-list 'Info-default-directory-list "~/emacs/tramp/info/")

With this in place, suppose you want to edit as root the files on aHost. The best is to add the following to your .emacs:

;; Setting for working with remote host
(add-to-list 'tramp-default-proxies-alist
             '("aHost.*" "root" "/ssh:yourusername@%h:"))

Now editing remote files on aHost is easy, just open the following:

/sudo:aHost:/path/to/aFile

And that’s about it.

Schema validation with LXML on Ubuntu Hardy

LXML is an amazing Python module that picks up where the standard xml.dom(.minidom) left off.

It’s basically a set of wrapper code around the libxml2 and libxslt libraries, and provides functionality missing from Python’s standard library, including XML validation and XPath support.

On a project I’m currently working on I needed a good XML library for Python and ended up trying out lxml. But I simply could not get the schema validation to work, and after several wasted hours I understood that the lxml that ships with Ubuntu Hardy (the distro I’m using) is the relatively old python-lxml 1.3.6 package.

I’m usually very reluctant to install anything as root that does not come from the “official” repository, but for lxml I made an exception and installed the python-lxml package from the upcoming Intrepid distribution.

Add the following line to your /etc/apt/sources.list file:

deb http://ch.archive.ubuntu.com/ubuntu intrepid main

Then run Synaptic as usual and install python-lxml version 2.1.1. To verify that it works fine, you can test schema validation thus:

>>> from lxml import etree
>>> schema_tree = etree.parse('path_to_schema.xsd')
>>> schema = etree.XMLSchema(schema_tree)
>>> doc = etree.parse('path_to_some_document')
>>> schema.validate(doc)

That last call returns the result of the validation as a boolean.
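
If a document does not validate, you can ask lxml why by inspecting the validator’s error log; a minimal sketch, still using the placeholder file names above:

>>> if not schema.validate(doc):
...     print schema.error_log   # one entry per validation error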