Git and Scientific Reproducibility

I firmly believe that scientists and engineers—particularly scientists, by the way—should learn about, and use, version control systems (VCS) for their work. Here is why.

I’ve been a user of free VCSs for a while now, beginning with my first exposure to CVS at CERN in 2002, through my discovery of Subversion during my doctoral years at EPFL, culminating in my current infatuation with Git as a front-end to Subversion. I’m now a complete convert to that system and could not imagine working without it. Every week I discover new use cases for this tool that I had not thought about before (and that I suspect the Git developers didn’t, either).

This week I found such a use case for Git: enforcing scientific reproducibility. Let me explain. I’m currently working on prototype software written in MATLAB that implements some advanced algorithms for the smart, predictive control of heating in buildings. As part of that work we need to evaluate several competing algorithm designs and try out different parameter settings for each.

The traditional way of doing this is, of course, to set all your parameters right in your code for the first simulation, to run it, then to set the parameters right for the second one, to run it again, and so on. There are several problems with this approach.

First, you need a really good naming convention for the data you are going to generate to make sure that you know exactly which parameters you set for each run. And coming up with a good naming scheme for data files is not trivial.

Second, even if your data file naming convention is good enough that you can easily reproduce the experiment, how can you be sure that the settings are exactly right? That you didn’t, perhaps, tweak that one extra configuration file just to work around a little bug in the software?

Third, how will you reproduce those results? Even assuming that you ran all your simulations based on a given, well-known revision number in your VCS (you do use a VCS, don’t you?), you will still need to dive into the code and set those configuration parameters yourself. A tedious, error-prone process, even if you manage to keep them all in one source file.

I think a system like Git solves all these problems. Here is how I did it.

I needed to run 7 simulations with different parameters, based on a stable version of our software, say r1409 in our Subversion repository.

I’m using Git as a front-end to Subversion. I began by creating a local branch (something Git, not Subversion, will let you do):

$ git checkout -b simulations_based_on_r1409

This will create a new branch from the current HEAD.
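
If your working copy is not already at r1409, git-svn can map that Subversion revision to the corresponding Git commit, so you can branch from exactly the right snapshot. A minimal sketch, assuming the project was cloned with git svn:

$ git svn find-rev r1409        # prints the Git commit corresponding to r1409
$ git checkout -b simulations_based_on_r1409 $(git svn find-rev r1409)

Now the idea is to make a local commit on that local branch for each different set of parameters. Here is how: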

  1. Edit your source code so that all parameters are set right.
  2. Commit the changes on your local branch:
    $ git commit -am "With parameter X set to Y"
    [simulations_based_on_r1409 66cea68] With parameter X set to Y
  3. Note the 7 characters (66cea68 above) next to the branch name. These are the first 7 characters of the SHA-1 hash of the commit, which uniquely identifies the entire state of your project at that point.
  4. Run your simulation. Log the results, along with the short hash (a shell sketch of one full iteration follows this list).
  5. Repeat the steps above for each different configuration you want to run the experiment with.
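
Concretely, one iteration of these steps might look something like the sketch below. The MATLAB script name run_simulation and the log file layout are placeholders for whatever your project actually uses:

$ git commit -am "With parameter X set to Y"
$ hash=$(git rev-parse --short HEAD)        # the same 7 characters, e.g. 66cea68
$ matlab -nodisplay -r "run_simulation; exit" > "results_$hash.log"
$ echo "$hash X=Y" >> logbook.txt           # keep the hash next to the result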

By the end of this process, you should have in your logbook a list of experimental results along with the short hash of the entire project as it looked during that experiment. It might, for instance, look something like this:

Hash      Parameter X   Parameter Y   Result
66cea68   23            42            1024
a4f683f   etc           etc           etc

As you can see, there are at least two reasons why it’s important to record the short hash:

  1. It will let you go back in time and reproduce an experiment exactly as it was when you first ran it (see the one-liner after this list).
  2. It will force you to commit all changes before running the experiment, which is a good thing.
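
Reproducing a given run later is then a single checkout of the recorded hash. Note that this leaves you on a detached HEAD, so check your branch out again when you are done:

$ git checkout 66cea68
$ # ... re-run the simulation ...
$ git checkout simulations_based_on_r1409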

I’ve been running a series of simulations using a variation on this process, whereby I actually run several simulations in parallel on my 8-core machine. For this to work you need to clone your entire project, once per simulation. Then, in each clone, you check out the right version of the project and run the experiment.
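
A minimal sketch of that setup, assuming a hypothetical wrapper script run_simulation.sh around the MATLAB invocation, and reusing the hashes from the logbook above:

$ for hash in 66cea68 a4f683f; do
>   git clone . "../sim_$hash"        # one independent clone per simulation
>   (cd "../sim_$hash" && git checkout "$hash" && ./run_simulation.sh) &
> done

Because each clone is a completely independent working copy, the parallel runs cannot step on each other’s files.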

Quite seriously, I would never have been able to do anything remotely like this with a centralized version control system. The ability to create local branches and to commit to them is a truly awesome feature of distributed version control systems such as Git. I don’t suppose the Git developers had scientists and engineers in mind when they developed this system, but hey, here we are.

Are you a scientist or an engineer wishing to dramatically improve your way of working? Then run, do not walk, and read the best book on Git there is.

Resources for building simulation

About two weeks ago I posed the following question on the Bldg-sim mailing list:

Where can I find a list of publications relevant to the field of
building simulation? I’m particularly interested in refereed journals
and books.

The ensuing thread has been extremely helpful, thanks in particular to Shanta Tucker, who pointed me to the IBPSA website. There, under the References link, you will find a fairly complete listing of articles, resources and books, including the full contents of every IBPSA conference paper. What’s not to like?

Trends in Smart Buildings Meeting, March 2009

Several home automation enthusiasts met again at LESO-PB to discuss recent developments in the field. There were four of us this time: Adil, Friedrich, David, and yours truly.


Friedrich opened the discussion by telling us about his ongoing work on the influence of light, especially its color, on human health. Early results suggest that proper daylighting control will not only help us save oodles of energy but will actually make us healthier. For more details, however, we will have to wait for his thesis to be completed.

Adil showed us his recent work on the large-scale physical modeling of cities, and how, from publicly available data (including data from Google Earth), one can derive a fairly realistic model of a city’s impact on its environment.

He also told us that he was considering analyzing shadows in pictures from Google Earth to derive 3D models of entire cities. This idea has great potential, provided he finds a way around Google’s tendency to stitch together satellite images taken at different times of the day.

Google Escher

We also discussed the recently announced Google PowerMeter project, whereby Google aggregates measurements taken remotely from your utility meters and presents the information to you. We were all amazed that Google managed to pull this one off (and frankly we have no idea how they do it), but some of us also expressed concern about privacy issues. How long will it be until we start getting emails telling us that “People taking their baths while watching TV usually buy this-or-that book. Click here to buy it now”?


And finally the evening concluded with me asking the assembly for advice on some building simulation software issues I had been having lately. Friedrich, in particular, suggested I try solving the problem in Fourier space instead of in the time domain, something I would never have thought of myself. I’ll definitely have a look and see if this could help with my open-source Heartbreak building simulation project.


Article watch: Solar Energy vol 82 issue 11, 2008

From Solar Energy vol 82 issue 11, 2008:

Long-term performance calculations based on steady-state efficiency test results: Analysis of optical effects affecting beam, diffuse and reflected radiation, by Pedro Horta, Maria João Carvalho, Manuel Collares Pereira, Wildor Carbajal

There is a growing number of commercially available solar thermal collector types: flat plates; evacuated tubes with and without back reflectors and with different tubular spacings; and low-concentration collectors using different types of concentrating optics.

These different concepts and designs all compete to be more efficient or simply cheaper, easier to operate, etc. at ever higher temperatures, and to extend the use of solar thermal energy in other applications beyond the most common water heating purposes.

In view of the proper dimensioning of solar thermal systems and proper comparison of different collector technologies, for a given application, there is a growing need for existing and future simulation tools to be as accurate as possible in the treatment of these different collector types.

Collector heat losses are usually considered to be well determined, under variable operating conditions, through the use of the heat loss coefficients provided by efficiency curve parameters. Yet, the traditional approach to the optical efficiency fails to describe accurately the optical effects affecting the amount of radiation which actually reaches the absorber.

This paper develops a systematic approach to the proper handling of incident solar radiation, combining it with the information available from steady-state collector efficiency tests and with the way the optics of different collector types use incident solar radiation and transform it into useful heat.

Conversion function between the Linke turbidity and the atmospheric water vapor and aerosol content, by Pierre Ineichen

This technical note presents a conversion function between the widely used Linke turbidity coefficient TL and the atmospheric water vapor and urban aerosol content. It takes into account the altitude of the application site.

The function is based on radiative transfer calculations and validated with the help of an independent clear sky model. Its precision is around 0.12 units of TL.

Trends in Smart Buildings Meeting, November 2008

We met again on November 3 to discuss recent events in the field of building simulation and automation. This time we were joined by Adil Rasheed, a PhD student at LESO-PB working on the so-called “meso scale modelling of urban heat island effect.” David Daum and Antoine Guillemin were the other participants, besides yours truly.

Antoine had brought a prototype aHeart central unit, the “brains” behind Adhoco’s home automation solution. He described it to the other participants, most of whom had never seen it before.

He also told us about some of the newer features, such as the possibility for installers to specify their own custom rules through the web interface. Although difficult to implement, this was something the market specifically demanded and that Adhoco therefore had to offer.


I had previously mentioned Marc Fleury’s OpenRemote project on this blog and asked the attendees if they had heard about it, but nobody had. Antoine knew about a certain Marc Fleury working for a Swiss company called Ergo3, makers of a home gateway device, but I’m pretty sure it’s not the same person.

David gave us an update on his project and explained in more detail how the IDA simulation engine works. What I found particularly compelling was the way IDA adapts its simulation timesteps to the needs of the moment. We discussed the possibility of using IDA to simulate the performance of Java-based controllers such as Adhoco’s, but we are not sure whether IDA can call Java code.


Speaking of processes talking to each other, we briefly reviewed the four canonical ways that two processes can communicate:

  1. Through the exchange of files, whether on the same filesystem or through FTP;
  2. Through a shared database;
  3. Through some form of remote procedure call (RPC), such as RMI in Javaland;
  4. Through a messaging solution, such as JMS.

But this review still didn’t help us understand how C code could call foreign code, such as Java or Python. The reverse is relatively straightforward; see, e.g., this answer on StackOverflow.


Thanks to everyone who participated, and see you around next time.

Article watch: Journal of Building Physics vol 32 nr 2

The following articles from the last issue of Journal of Building Physics will probably be of interest to building simulationists and automationists.

Accuracy of Energy Analysis of Buildings: A Comparison of a Monthly Energy Balance Method and Simulation Methods in Calculating the Energy Consumption and the Effect of Thermal Mass, by Timo Kalema, Gudni Jóhannesson, Petri Pylsy, and Per Hagengran.

The purpose of this article is to analyze the effects of thermal mass on heating and cooling energy in a Nordic climate and for modern, well-insulated Nordic buildings. The effect of thermal mass is analyzed through calculations made by seven researchers using seven different calculation programs. Six of these are simulation programs (Consolis Energy, IDA-ICE, SciaQPro, TASE, VIP, VTT House model) and one is a monthly energy balance method (maxit energy) based on the standard EN 832, the predecessor of ISO DIS 13790. Its purpose is to evaluate the reliability of the monthly energy calculation method, and especially its gain utilization factor, compared with the simulation programs. In addition, some sensitivity analyses are made concerning, e.g., the effects of the size and orientation of windows and of the weather data on the energy consumption.

The results show that the simplified standard methods of EN 832 and ISO DIS 13790 generally give accurate results in calculating the annual heating energy, e.g., in the context of energy design and energy certification. However, the gain utilization factor of these standards is too low for very light buildings having no massive surfaces, resulting in too high a calculated energy consumption. The study shows that differences in input data often cause greater differences in calculation results than the differences between the various calculation and simulation methods.

Article watch: Energy and Buildings vol 40 nr 12

The following articles from the last issue of Energy and Buildings are probably of interest to building simulationists and automationists.

Comparison between detailed model simulation and artificial neural network for forecasting building energy consumption, by Alberto Hernandez Neto and Flávio Augusto Sanzovo Fiorelli.

There are several ways to model a building and its heat gains, from external as well as internal sources, in order to evaluate proper operation, audit retrofit actions, and forecast energy consumption. Different techniques, varying from simple regression to models based on physical principles, can be used for simulation. A frequent hypothesis for all these models is that the input variables should be based on realistic data when they are available; otherwise, the evaluation of energy consumption might be highly under- or overestimated.

In this paper, a comparison is made between a simple model based on an artificial neural network (ANN) and a model based on physical principles (EnergyPlus), used as auditing and predicting tools to forecast building energy consumption. The Administration Building of the University of São Paulo is used as a case study. The building energy consumption profiles are collected, as well as the campus meteorological data.

Results show that both models are suitable for energy consumption forecasting. Additionally, a parametric analysis is carried out for the considered building in EnergyPlus in order to evaluate the influence of several parameters, such as the building occupation profile and weather data, on the forecast.

Comparison of thermal comfort algorithms in naturally ventilated office buildings, by Bassam Moujalled, Richard Cantin, and Gérard Guarracino.

With the current environmental emphasis on energy savings in buildings, there are growing efforts to prevent any increase in energy use associated with installing air-conditioning systems. The current standard of thermal comfort in buildings, ISO 7730, is based on a static model that is acceptable in air-conditioned buildings but unreliable for naturally ventilated buildings. Various field studies have shown that occupants of naturally ventilated buildings accept and prefer a significantly wider range of temperatures than occupants of air-conditioned buildings. The results of these field studies have contributed to the development of the adaptive approach. Adaptive comfort algorithms have been integrated into the EN 15251 and ASHRAE standards to take the adaptive approach into account in naturally ventilated buildings. These adaptive algorithms seem to be better suited to naturally ventilated buildings, but need to be assessed in field studies. This paper evaluates different algorithms from both the static and the adaptive approaches, using a field survey conducted in France in five naturally ventilated office buildings. The paper presents the methodology guidelines and the thermal comfort algorithms considered. The results of applying the different algorithms are provided, with a comparative analysis to assess them.

Dynamical building simulation: A low order model for thermal bridges losses, by Y. Gao, J.J. Roux, L.H. Zhao and Y. Jiang.

Thermal bridge losses represent an increasing share of heat losses, owing to the significant three-dimensional heat transfer in modern buildings, yet most simulation software uses one-dimensional models for thermal analyses in order to simplify the calculations.

State model reduction techniques were used to develop a low-order three-dimensional heat transfer model for the additional losses due to thermal bridges, a model that is both efficient and accurate. Coupling this technique with the traditional one-dimensional model for wall losses makes it possible to greatly reduce simulation times.

The low-order model was validated against frequency-response and time-domain outputs, and its usefulness was demonstrated through an implementation in the TRNSYS software.

Trends in Smart Buildings Meeting, October 2008

After a two-month hiatus, we resumed our monthly meetings at LESO-PB to discuss recent developments in building automation and simulation. Frédéric Haldi, David Daum and yours truly attended. We had a smaller group this time, but that turned out to be a good opportunity for going into more detail about some of the research that’s currently being done at LESO-PB.


I had not been on Adhoco’s website for a while and was recently surprised to see that their range of products had greatly expanded in recent months. My opinion is of course completely biased, having contributed some source code to their main product, but I still wanted to mention it.

There was a paper recently in Building and Environment describing how the simulation program IDA had been coupled to a Genetic Algorithm optimization program in order to derive optimal parameters for a family house. Building parameter optimization is, of course, a key area of research for LESO-PB, but during my time there I’ve always felt they were a bit weak on the simulation end. So it will be very interesting to see if some of the current research tries to remedy this situation.


David Daum pointed out (quite rightly, in my humble opinion) that there are few if any universally accepted guidelines on how the assessment of a building control algorithm should be carried out. All too often, researchers bury their readers under tons of equations and models and conclude by quoting a single number, such as “Our super-duper algorithm yielded 20% energy savings compared with the ultra-realistic user model that keeps all the heating turned on throughout the year in southern Greece.” We seldom see the assumptions being thoroughly documented, or how the energy demand evolves over time. Does the algorithm help equally well in summer and in winter? If not, why not?

Speaking of new developments at LESO, I don’t know any details, but I’ve heard that they now have a cluster of N PCs (for a largish N) dedicated to running building simulations. Oh, how I wish I had another PhD to do…


Much of the meeting was spent discussing Fred’s current research on window opening/closing by the building occupants. Perhaps some background is in order here. For the past five years or so LESO-PB has carried out research on modeling the behaviour of building occupants, in order to have more realistic models than the current ones. Jessen Page did his PhD thesis mostly on modeling the occupancy patterns, and Fred is working on modeling the way people interact with their environment, by opening/closing windows, using appliances, etc.

Fred explained to us that, after analyzing the data recorded on the LESO building over the past 7 years, he concluded that the majority of window events happen immediately after the user enters the room or immediately before they leave it. The probability of these two kinds of events correlates with the indoor temperature.

For intermediate events, that is, window openings/closings that happen while the user remains in the room, he found that the probability per unit of time correlates well with outdoor temperatures. The problem he’s now trying to solve is the exact relationship between outdoor temperature and window event probability.

We spent some time discussing this relationship, but until more browsers support MathML I won’t go into much detail.

That’s about all I remember from this evening. I’ve set up a mailing list for our meetings, smartbuildings-trends, and anyone interested is welcome to join us.

Article watch: Lighting Research and Technology vol 40 nr 3

Lighting Research and Technology vol 40 nr 3 has a couple of articles that should interest anyone working on visual comfort.

Proportions of direct and indirect indoor lighting — The effect on health, well-being and cognitive performance of office workers, by KI Fostervold and J. Nersveen.

Indirect lighting has been recommended as a way to accommodate lighting needs in offices. To investigate this recommendation, the effects of four ceiling-mounted lighting schemes providing inverse proportions of direct and indirect lighting were studied in ordinary office environments. The study used a 4×3 mixed randomised-repeated design. Dependent variables assessed subjective symptoms, subjective well-being and cognitive performance. Glare, a major contributor to visual strain, was physically removed. Photometric measurements showed that the proportions of direct and indirect lighting affect the luminous environment. Except for an association between reduced job stress severity and direct lighting, the results indicate that the proportions of indirect and direct lighting do not affect the dependent variables. A main effect of the new lighting installation was revealed for subjective symptoms and cognitive performance.

I’m mentioning this one first because visual comfort was, after all, the main topic of my thesis, but also because I heard visual comfort was an active field of research at LESO-PB. One of the research projects that I’m aware of seeks the placement of overhead luminaires that jointly optimizes energy consumption and the occupants’ visual comfort.

The other paper that might be worth a trip to the library is

Predicting discomfort glare from outdoor lighting installations, by JD Bullough, PhD, JA Brons, MSc, R. Qi, BEng and MS Rea, PhD.

In addition to sky glow and light trespass, discomfort glare from outdoor lighting installations is a growing concern to the public. A series of experimental investigations was performed to assess the relative impacts of light source photometric characteristics on subjective ratings of discomfort glare. The results converge, demonstrating the influence of light source illuminance, surround illuminance and ambient illuminance on subjective judgements of discomfort glare. A simple model relating these photometric quantities is proposed for making predictions of discomfort glare from outdoor lighting installations. This model can be readily incorporated into existing frameworks for evaluating light pollution as well as into lighting calculation software.

This article apparently addresses an issue that’s often overlooked in studies of visual discomfort, namely the influence of outdoor construction elements.

I only wish I had time to read all this stuff…

Article watch: Energy and Buildings vol 40 nr 11

One article in the latest issue of Energy and Buildings is of particular relevance to the field of home automation, especially where daylight control is involved:

Simplified correlations of global, direct and diffuse luminous efficacy on horizontal and vertical surfaces, by A. De Rosa, V. Ferraro, D. Kaliakatsos and V. Marinelli.

A simple method to calculate the mean hourly diffuse illuminance on vertical surfaces under all-sky, clear-sky, intermediate and overcast-sky conditions, developed in Arcavacata di Rende (Italy), was compared with experimental data obtained at Osaka (Japan), Vaulx-en-Velin (France) and Geneva (Switzerland). In spite of its simplicity, the method furnishes reasonably good predictions in comparison with a more complex reference calculation method, and can be proposed as a simplified tool for design purposes.