Floating-point woes, part 1

You may have several decades of programming experience, yet certain classes of problems seem to cause endless confusion over and over again. Floating-point computation is one such class.

I've been doing a fair amount of numerical analysis these past few months, implementing floating-point calculations on embedded platforms. In the process I've stumbled across a few programming gotchas which I'd like to document in a (hopefully short) series of posts. Some of them are highly non-trivial, and no amount of prior lecturing and/or reading (such as Goldberg's excellent _What Every Computer Scientist Should Know About Floating-Point Arithmetic_) will protect you from every surprise. I begin with a relatively simple and well-known example drawn from the programming class I teach.

(All the code examples in this series are available from git@github.com:dlindelof/fpwoes.git)

Termination criteria for a square-root calculating routine
----------------------------------------------------------

Consider this program for calculating square roots by Newton’s method:

`sqrt1.c`:
#include <stdio.h>
#include <stdlib.h>

#define EPS 0.000001

static float fabsf(float x) {
  return x < 0 ? -x : x;
}

static int sqrt_good_enough(float x, float guess) {
  return fabsf(guess*guess - x) < EPS;
}

static float sqrt_improve(float guess, float x) {
  return (guess + x/guess)/2;
}

static float sqrt_iter(float x, float guess) {
  if (sqrt_good_enough(x, guess))
    return guess;
  else
    return sqrt_iter(x, sqrt_improve(guess, x));
}

float sqrt(float x) {
  return sqrt_iter(x, 1.0f);
}

int main(int argc, char** argv) {
  float x = atof(argv[1]);
  printf("The square root of %.2f is %.2f\n", x, sqrt(x));
  return 0;
}

This program works well enough with arguments smaller than about 700. Above that value, one gets segmentation faults:

$ ./sqrt1 600
The square root of 600.00 is 24.49
$ ./sqrt1 1000
Segmentation fault

Debugging the program under `gdb` shows that a stack overflow happened:

$ gdb ./sqrt1
Reading symbols from /home/dlindelof/Dropbox/Projects/fpwoes/sqrt1...done.
(gdb) run 1000
Starting program: /home/dlindelof/Dropbox/Projects/fpwoes/sqrt1 1000

Program received signal SIGSEGV, Segmentation fault.
fabsf (x=Cannot access memory at address 0x7fffff7feff4) at sqrt1.c:6
6 static float fabsf(float x) {
(gdb) bt
#0 fabsf (x=Cannot access memory at address 0x7fffff7feff4) at sqrt1.c:6
#1 0x00000000004005aa in sqrt_good_enough (x=1000, guess=31.622776) at sqrt1.c:11
#2 0x0000000000400610 in sqrt_iter (x=1000, guess=31.622776) at sqrt1.c:19
#3 0x000000000040063a in sqrt_iter (x=1000, guess=31.622776) at sqrt1.c:22
#4 0x000000000040063a in sqrt_iter (x=1000, guess=31.622776) at sqrt1.c:22
#5 0x000000000040063a in sqrt_iter (x=1000, guess=31.622776) at sqrt1.c:22
#6 0x000000000040063a in sqrt_iter (x=1000, guess=31.622776) at sqrt1.c:22
#7 0x000000000040063a in sqrt_iter (x=1000, guess=31.622776) at sqrt1.c:22
#8 0x000000000040063a in sqrt_iter (x=1000, guess=31.622776) at sqrt1.c:22
#9 0x000000000040063a in sqrt_iter (x=1000, guess=31.622776) at sqrt1.c:22
#10 0x000000000040063a in sqrt_iter (x=1000, guess=31.622776) at sqrt1.c:22

What's happening should be quite clear. The best estimate of the square root of 1000 in single precision is 31.622776, whose square is 999.999939. That differs from 1000 by about 6.1e-5, i.e. way more than the specified tolerance of 1e-6, but it is the best approximation that can be represented in single precision. In hex, it is `0x41fcfb72`. One least-significant bit (LSB) less, or `0x41fcfb71`, represents 31.6227741, whose square is 999.999817, off by 18.3e-5. One LSB more is `0x41fcfb73`, or 31.6227779, whose square is 1000.00006, off by 6.1e-5, i.e. the same difference as for 31.622776. 31.622776 is therefore the best single-precision approximation to the square root of 1000.
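If you want to check the LSB arithmetic above for yourself, C99's `nextafterf` will hand you a float's immediate neighbours. Here is a quick standalone check of my own (not part of the fpwoes repository; link with `-lm`):

#include <stdio.h>
#include <math.h>

int main(void) {
  float best = 31.622776f;
  float below = nextafterf(best, 0.0f);     /* one LSB less */
  float above = nextafterf(best, INFINITY); /* one LSB more */
  printf("%.9g squared is %.9g\n", below, below * below);
  printf("%.9g squared is %.9g\n", best, best * best);
  printf("%.9g squared is %.9g\n", above, above * above);
  return 0;
}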
To have the program terminate instead of recursing forever, we need to rethink the termination criterion. Obviously we cannot use an absolute difference between the square of our guess and the argument, for as we have just seen, for large arguments it might not be possible to get an answer whose square is close enough. We could imagine using a _relative_ difference instead:

static int sqrt_good_enough(float x, float guess) {
  return fabsf(guess*guess - x) / x < EPS;
}

This gives us a second program, `sqrt2.c`, which works much better:

$ ./sqrt2 1000
The square root of 1000.00 is 31.62
$ ./sqrt2 5
The square root of 5.00 is 2.24
$ ./sqrt2 2
The square root of 2.00 is 1.41
$ ./sqrt2 1
The square root of 1.00 is 1.00
$ ./sqrt2 100000
The square root of 100000.00 is 316.23
$ ./sqrt2 10000000
The square root of 10000000.00 is 3162.28
$ ./sqrt2 1000000000
The square root of 1000000000.00 is 31622.78
$ ./sqrt2 0.5
The square root of 0.50 is 0.71
$ ./sqrt2 0.1
The square root of 0.10 is 0.32
$ ./sqrt2 0.01
The square root of 0.01 is 0.10
$ ./sqrt2 0.05
The square root of 0.05 is 0.22

Interestingly, Kernighan and Plauger (in their classic _The Elements of Programming Style_) deal with a very similar problem in the first chapter, but introduce a slightly different termination criterion. Translated into C from the original Fortran, it reads:

static int sqrt_good_enough(float x, float guess) {
  return fabsf(x/guess - guess) < EPS * guess;
}

I asked the good folks on www.stackoverflow.com why one termination criterion might be better than another. I've accepted the answer that explained that the termination criterion should match the update rule. In other words, you want the termination criterion to be equivalent to saying `|guess[n+1] - guess[n]| < guess[n] * EPS`. Given that `guess[n+1] = (guess[n] + x/guess[n])/2`, the left-hand side equals `|x/guess[n] - guess[n]| / 2`, and one obtains (up to a factor of two) a termination criterion that reads `|x/guess[n] - guess[n]| < guess[n] * EPS` --- which is exactly the criterion preferred by K&P.

Note furthermore that, unlike the relative criterion above, this one never divides by `x`, so it stays well-defined when `x == 0.0`. The program will nevertheless eventually produce nonsense on an input of `0.0`. Quoth the debugger:

#1 0x0000000000400636 in sqrt_iter (x=0, guess=0) at sqrt3.c:22
#2 0x0000000000400646 in sqrt_iter (x=0, guess=1.40129846e-45) at sqrt3.c:22
#3 0x0000000000400646 in sqrt_iter (x=0, guess=2.80259693e-45) at sqrt3.c:22
#4 0x0000000000400646 in sqrt_iter (x=0, guess=5.60519386e-45) at sqrt3.c:22
#5 0x0000000000400646 in sqrt_iter (x=0, guess=1.12103877e-44) at sqrt3.c:22
#6 0x0000000000400646 in sqrt_iter (x=0, guess=2.24207754e-44) at sqrt3.c:22
#7 0x0000000000400646 in sqrt_iter (x=0, guess=4.48415509e-44) at sqrt3.c:22
#8 0x0000000000400646 in sqrt_iter (x=0, guess=8.96831017e-44) at sqrt3.c:22
#9 0x0000000000400646 in sqrt_iter (x=0, guess=1.79366203e-43) at sqrt3.c:22
#10 0x0000000000400646 in sqrt_iter (x=0, guess=3.58732407e-43) at sqrt3.c:22
#11 0x0000000000400646 in sqrt_iter (x=0, guess=7.17464814e-43) at sqrt3.c:22
#12 0x0000000000400646 in sqrt_iter (x=0, guess=1.43492963e-42) at sqrt3.c:22
#13 0x0000000000400646 in sqrt_iter (x=0, guess=2.86985925e-42) at sqrt3.c:22
#14 0x0000000000400646 in sqrt_iter (x=0, guess=5.73971851e-42) at sqrt3.c:22
#15 0x0000000000400646 in sqrt_iter (x=0, guess=1.1479437e-41) at sqrt3.c:22
#16 0x0000000000400646 in sqrt_iter (x=0, guess=2.2958874e-41) at sqrt3.c:22
#17 0x0000000000400646 in sqrt_iter (x=0, guess=4.59177481e-41) at sqrt3.c:22

The termination criterion will eventually involve a `0.0/0.0` operation. There is no way out of this other than to handle that special case explicitly. The final program thus reads:

`sqrt3.c`:
#include <stdio.h>
#include <stdlib.h>

#define EPS 0.000001

static float fabsf(float x) {
  return x < 0 ? -x : x;
}

static int sqrt_good_enough(float x, float guess) {
  return fabsf(x/guess - guess) < EPS * guess;
}

static float sqrt_improve(float guess, float x) {
  return (guess + x/guess)/2;
}

static float sqrt_iter(float x, float guess) {
  if (sqrt_good_enough(x, guess))
    return guess;
  else
    return sqrt_iter(x, sqrt_improve(guess, x));
}

float sqrt(float x) {
  if (x == 0.0)
    return 0.0;
  else
    return sqrt_iter(x, 1.0f);
}

int main(int argc, char** argv) {
  float x = atof(argv[1]);
  printf("The square root of %.2f is %.2f\n", x, sqrt(x));
  return 0;
}

What's the most obscure bug involving floating-point operations you've ever been involved in? I'd love to hear about it in the comments below.

homeR: an R package for building physics

For the past few weeks we've been very busy here at Neurobat with the analysis of field-test results. In the process we had to implement several functions in R that relate to building physics.

We thought it might be useful for the community to have access to those functions, so we decided to begin submitting them to CRAN. That’s why we’re pleased to announce the first release of the homeR package, which we will fill over time with such functions.

This first release, version 0.1, contains just `pmv`, a function to calculate the so-called Predicted Mean Vote, i.e. a measure from -3 (cold) to +3 (hot) of thermal comfort as a function of several variables, including air temperature, clothing and metabolism.

Here I show how, with this function, one can derive a contour plot showing, for given clothing and metabolism levels, the optimal indoor temperature (assuming 50% relative humidity). We're basically going to solve the equation `pmv(clo, met, temp, sat) = 0` for `temp` across a grid of `clo` and `met` values, with the `uniroot` function.

> clo <- seq(0,2,length=21)
> met <- seq(0.6,3.2,length=21)
> zero.pmv <- function(clo, met) uniroot(function(temp) pmv(clo,met,temp,50), c(-10,40))$root
> contourplot(outer(clo,met,Vectorize(zero.pmv)),
             cuts=20,
             aspect="fill",
             panel=function(...) {
               panel.grid(h=-1,v=-1,...)
               panel.contourplot(...)
             },
             row.values=clo, column.values=met,
             xlim=extendrange(clo), ylim=extendrange(met),
             xlab="[Clo]", ylab="[Met]")
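You can also query a single point with the `zero.pmv` helper defined above; for instance, the neutral temperature for light clothing and near-sedentary activity (an illustrative call, output not reproduced here):

> # neutral temperature for 0.5 clo and 1.2 met, at 50% relative humidity
> zero.pmv(0.5, 1.2)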

And here is the resulting plot:

Predicted Mean Vote contour plot

As you can see, this is pretty similar to the sort of plot one finds in standard textbooks on the subject, such as Claude-Alain Roulet's Santé et qualité de l'environnement intérieur dans les bâtiments:

PMV contour plot from textbook

Please give the `homeR` package a try, and give us your feedback. There’s only the `pmv` function in there at the time of writing but we plan to extend the package in the weeks to come.

Saikoro game engine in Lisp

Here at Neurobat we set aside one day per sprint as a *Lab Day*, i.e. a day when, not unlike what Google does, we are free to work on whatever we want.

Today was Lab Day, and I took the opportunity to brush up my Lisp skills by writing a game, drawing heavily on Conrad Barski's wonderful book Land of Lisp, which teaches you the Lisp skills necessary for the most important programming gig ever, namely writing games.

I wrote a game engine for the Saikoro board game, a game simple enough to be amenable to the kind of AI described in the book. Without any major optimization, the computer will play a perfect game on a 4 x 4 board by exhaustively computing the tree of all possible games from a starting position.
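To give an idea of what drives the computer's play, here is a sketch of that kind of exhaustive game-tree rating, in the spirit of Land of Lisp. The functions game-over-p, winner, current-player, legal-moves and play-move are placeholders standing in for the real engine's functions, not its actual API:

;; game-over-p is assumed true when the player to move has no legal move
(defun rate-position (state player)
  "Rate STATE for PLAYER: 1 if PLAYER wins with perfect play, -1 otherwise."
  (if (game-over-p state)
      (if (eql (winner state) player) 1 -1)
      (let ((scores (mapcar (lambda (move)
                              (rate-position (play-move state move) player))
                            (legal-moves state))))
        (if (eql (current-player state) player)
            (apply #'max scores)      ; our turn: pick the best continuation
            (apply #'min scores)))))  ; their turn: assume their best play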

Here is what a typical game session looks like:

$ sbcl --load saikoro.lisp
; snip some noise
Current player: A
| A[ 0] 3[ 1] 3[ 2] 2[ 3] |
| 1[ 4] 4[ 5] 2[ 6] 2[ 7] |
| 3[ 8] 4[ 9] 4[10] 2[11] |
| 1[12] 3[13] 4[14] B[15] |
Choose your move: 
1. 0 -> 4 
2. 0 -> 10 
3. 0 -> 1 
4. 0 -> 5 
2
Current player: B
| 0[ 0] 3[ 1] 3[ 2] 2[ 3] |
| 1[ 4] 4[ 5] 2[ 6] 2[ 7] |
| 3[ 8] 4[ 9] A[10] 2[11] |
| 1[12] 3[13] 4[14] B[15] |
Current player: A
| 0[ 0] 3[ 1] 3[ 2] 2[ 3] |
| 1[ 4] B[ 5] 2[ 6] 2[ 7] |
| 3[ 8] 4[ 9] A[10] 2[11] |
| 1[12] 3[13] 4[14] 0[15] |
Choose your move: 
1. 10 -> 1 
2. 10 -> 7 
2
Current player: B
| 0[ 0] 3[ 1] 3[ 2] 2[ 3] |
| 1[ 4] B[ 5] 2[ 6] A[ 7] |
| 3[ 8] 4[ 9] 0[10] 2[11] |
| 1[12] 3[13] 4[14] 0[15] |
Current player: A
| 0[ 0] 3[ 1] 3[ 2] 2[ 3] |
| B[ 4] 0[ 5] 2[ 6] A[ 7] |
| 3[ 8] 4[ 9] 0[10] 2[11] |
| 1[12] 3[13] 4[14] 0[15] |
Choose your move: 
1. 7 -> 1 
1
Current player: B
| 0[ 0] A[ 1] 3[ 2] 2[ 3] |
| B[ 4] 0[ 5] 2[ 6] 0[ 7] |
| 3[ 8] 4[ 9] 0[10] 2[11] |
| 1[12] 3[13] 4[14] 0[15] |
Current player: A
| 0[ 0] A[ 1] 3[ 2] 2[ 3] |
| 0[ 4] 0[ 5] 2[ 6] 0[ 7] |
| 3[ 8] 4[ 9] 0[10] 2[11] |
| 1[12] B[13] 4[14] 0[15] |
Choose your move: 
1. 1 -> 6 
2. 1 -> 3 
3. 1 -> 2 
3
Current player: B
| 0[ 0] 0[ 1] A[ 2] 2[ 3] |
| 0[ 4] 0[ 5] 2[ 6] 0[ 7] |
| 3[ 8] 4[ 9] 0[10] 2[11] |
| 1[12] B[13] 4[14] 0[15] |
Current player: A
| 0[ 0] 0[ 1] A[ 2] 2[ 3] |
| 0[ 4] 0[ 5] 2[ 6] 0[ 7] |
| 3[ 8] 4[ 9] 0[10] 2[11] |
| B[12] 0[13] 4[14] 0[15] |
B wins!
* (quit)

As you can see, the UI is very rudimentary for the moment, but I wanted to concentrate on getting the AI right first. Next I plan to optimize the AI and make it work with a non-exhaustive game tree, so that it can handle larger boards.

If you’re interested in trying it out, feel free to clone my repository.

Weird certificate verification error

I spent most of the day today debugging a very mysterious error we
encountered when trying to programmatically call a web service over SSL
from Java.

Here is the source code with which we managed to reliably reproduce the error:

import javax.net.SocketFactory;
import javax.net.ssl.SSLSocketFactory;
import java.io.*;
import java.net.Socket;

public class SimpleSSLTest {
    public static void main(String[] args) throws IOException {
        int port = 443;
        String hostname = "somehost.com";
        SocketFactory socketFactory = SSLSocketFactory.getDefault();
        Socket socket = socketFactory.createSocket(hostname, port);
        InputStream in = socket.getInputStream();
        OutputStream out = socket.getOutputStream();
        PrintWriter pout = new PrintWriter(new BufferedWriter(new OutputStreamWriter(out)));
        pout.println("GET " + "/" + " HTTP/1.0");
        pout.println();
        pout.flush();
        BufferedReader bin = new BufferedReader(new InputStreamReader(in));
        String inputLine;
        while ((inputLine = bin.readLine()) != null) {
            System.out.println(inputLine);
        }
        in.close();
        out.close();
    }
}

The website, `somehost.com`, used an SSL certificate signed by our own
internal certificate authority. That authority’s certificate was
stored in a `cacerts` Java keystore. We run this code from the command
line thus:

$ java -Djavax.net.ssl.trustStore=cacerts -cp target/classes/ SimpleSSLTest

When we run this, the application bombs with an exception, the root
cause of which reads as follows:

Caused by: java.security.cert.CertPathValidatorException: CA key usage check failed: keyCertSign bit is not set
at sun.security.provider.certpath.PKIXMasterCertPathValidator.validate(PKIXMasterCertPathValidator.java:153)
at sun.security.provider.certpath.PKIXCertPathValidator.doValidate(PKIXCertPathValidator.java:325)
at sun.security.provider.certpath.PKIXCertPathValidator.engineValidate(PKIXCertPathValidator.java:187)
at java.security.cert.CertPathValidator.validate(CertPathValidator.java:267)
at sun.security.validator.PKIXValidator.doValidate(PKIXValidator.java:261)
… 22 more

We tried to wrap our heads around this problem the whole day and could make neither head nor tail of it, especially as we didn't get this error at all when targeting another host, using a different certificate signed by the same certificate authority.
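In hindsight, the error message itself points at the CA certificate's X509v3 Key Usage extension, which you can inspect with openssl (`cacert.pem` being a placeholder for whatever file holds the authority's certificate):

$ openssl x509 -in cacert.pem -noout -text

A certificate that is meant to sign other certificates should list `Certificate Sign` under `X509v3 Key Usage`.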

As a last resort, I thought of checking exactly which version of Java
we were using. Turned out we were using OpenJDK, the version that
replaced Sun’s version in Ubuntu 10.4. Running the same code with
Sun’s Java SDK solved the problem, but we can’t confidently state that
we understand what was wrong. Perhaps a bug in OpenJDK’s
implementation of JSSE. Who knows.

If you’ve run into the same problem, feel free to leave a comment. I’d
be interested to hear if (and how) you’ve solved it.

MATLAB’s inane idea of time

MATLAB seems to have a very peculiar notion of how to represent dates and times. Yesterday I spent a wonderful couple of hours debugging some code that's supposed to compute the sun's position; most of that time could have been spared if the MATLAB designers had followed a simple convention used by, I believe, most computing platforms.

In MATLAB, dates and times are represented internally by a so-called
*serial date number*, defined as the number of time units counted
since a given reference date. If you are like me you will, I suppose,
assume that this reference date is the standard UNIX *epoch*,
i.e. midnight, January 1st, 1970. Well, you'd be about two millennia
off—the reference date in MATLAB is the hypothetical (and
non-existent) date of midnight, January 1st, 0000. Never mind that
there never was a year 0000—the calendar goes straight from 1 BC to
1 AD.

And if you *really* are like me you will of course assume that the
unit of time in which this serial date number is counted is seconds,
or at least milliseconds. Wrong again—MATLAB chose *days* as its
fundamental unit of time. And of course, Octave was forced to follow
MATLAB’s choice:

octave:4> format long
octave:5> now
ans = 734313.962094548

Besides making it much more difficult to make MATLAB interoperate
with, say, Java libraries, there are several problems with this
approach (documented in Octave's help file; I haven't checked MATLAB's):

1. The Julian calendar is ignored, so anything before 1582 will be wrong;
2. Leap seconds are ignored. In other words, MATLAB ignores days that happened to be 86401 seconds long (yes, such days exist).

When working with timeseries data, in particular climate data, I
always try to count time from the UNIX epoch—ideally as the number
of seconds from the epoch, the way `date(1)` works when called with
the `+%s` format argument:

18:08:49@netbook$ date +%s
1277309333
18:08:53@netbook$ date +%s
1277309336

In Java, `System.currentTimeMillis()` will return the number of
milliseconds since the epoch:

scala> System.currentTimeMillis
res0: Long = 1277395240485

In R, converting a date-time (`POSIXct`) object to numeric yields the number of seconds:

> as.numeric(Sys.time())
[1] 1277310008

In short, every computing platform I've touched in recent weeks represents time starting from the standard UNIX/POSIX epoch, and always does so in a unit related to seconds. In other words, there is no justification for MATLAB's decision to represent time since year 0000, and even less for doing so in days. I don't mean to bash
MATLAB (well… a bit, maybe). I just regret that anytime I need
MATLAB to interoperate with some other code, I need to include a factor
of 86400 and shift everything by 1970 years.
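For reference, here is the conversion I end up writing, as a sketch in Octave/MATLAB; it relies on the fact that `datenum(1970,1,1)` evaluates to 719529, the serial date number of the UNIX epoch, and it ignores leap seconds and time zones just as MATLAB does:

% serial date number -> seconds since the UNIX epoch
unix_seconds = (now - datenum(1970, 1, 1)) * 86400;
% ... and back again
serial_date = unix_seconds / 86400 + datenum(1970, 1, 1);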

Alternatives to Java for home automation devices

I think there’s no escaping this simple fact: the Java Virtual Machine (JVM) is definitely here to stay, and all the evidence shows that it has tremendous potential for being used in home automation devices.

I still remember, in a previous life, hunting for a JVM implementation that would run with a minimal footprint in, if I remember correctly, 16 MB of flash storage and about the same amount of RAM. It was compliant with Java 1.3 at the time (around 2003), and I remember we had issues with its total lack of logging libraries.

Things have changed a lot since then, but the JVM is still a superb platform for future home automation devices. It's just such a ubiquitous and comfortable environment to work with; the code you compile and unit-test on your laptop is the same code that will later run on whatever water-heating controller you end up using. And any platform that's more or less compliant with the official Java specs will come preloaded with plenty of libraries for your everyday needs.

Still, I'm not writing this to encourage you to learn Java, or even to perfect your Java skills. Java is actually a rather weak and inexpressive language. Instead, remember that JVM-the-platform is not quite the same thing as Java-the-programming-language. You compile Java code into bytecode that then gets executed by a JVM, but who said the bytecode had to come from Java? There are, indeed, about 200 different languages that can be compiled to Java bytecode, many of which are arguably far better, more powerful, and more expressive languages than Java.

One such language is Scala, which I've been studying for a while now. I'm impressed by the ease with which it lets you create Domain-Specific Languages (DSLs), i.e. highly expressive pseudo-languages tailored to a specialized field or domain. I would not be surprised to hear, five years from now, that about 10% of all bytecode being run in the world is in fact compiled from Scala.
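To give a flavour of what I mean, here is a toy sketch of a rule-like DSL in Scala; the `when` construct is invented for this example and is not part of any real library:

// By-name condition and action let the call site read like a specification.
object ControllerDsl {
  class Rule(condition: => Boolean) {
    def apply(action: => Unit): Unit = if (condition) action
  }
  def when(condition: => Boolean) = new Rule(condition)

  def main(args: Array[String]) {
    var setpoint = 19.0
    val outsideTemperature = 3.5 // pretend sensor reading

    when (outsideTemperature < 5.0) {
      setpoint = 21.0
    }
    println("Setpoint: " + setpoint)
  }
}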

If you're into writing embedded code for home automation devices, I strongly encourage you to learn at least one JVM language in addition to Java. I think you would be surprised how much easier and faster it can be to write controller code in a programming language that lets you think at the right level of abstraction—which Java never let me do.

Event rate of arrival analysis with R

Here is a very common problem: suppose you're given a series of event timestamps. The events can be anything—website logins, persons entering a building, anything that recurs regularly in time but whose rate of arrival is not known in advance. Here is, for example, such a file which I had to analyze:

05.02.2010 09:00:18
05.02.2010 09:00:18
05.02.2010 09:00:21
05.02.2010 09:00:23
05.02.2010 09:00:24
05.02.2010 09:00:29
05.02.2010 09:00:29
05.02.2010 09:00:30
05.02.2010 09:00:35

and so on for several thousand lines. Your task is to analyze this data and to derive “interesting” statistics from it. How do you do that?

My initial reaction was: hey, let's try to derive the rate of arrival per second over time. Histograms are one way of doing this, except that histograms are notorious for yielding dramatically different results depending on the choice of bin width and position. So instead of histograms, I tried doing this with so-called density plots.

That, it turns out, was a terrible idea. I think I must have spent a day and a half figuring out how to use R's `density` function and its arguments, especially the bandwidth parameter. There are two problems with `density`: 1) it yields a density, which means that you have to scale it if you want to obtain a rate of events; 2) it's an interpolation of a sum of kernels, which has the unfortunate side-effect of yielding a curve whose integral is not necessarily unity.

On the morning of the second day, I realized I had been solving the wrong problem. I'm not really interested in knowing the rate of arrival. What I really need to know is how many items my system is handling simultaneously. Think about that for a second. If your data represent the visitors to your website, you really don't want to know how many visitors come per second unless you also know how much time the server needs to serve each one. In other words, if you want to make sure the server never melts down, you need to know how many users are served concurrently by the server, and to know that you also need to know how much time is needed for each request.

Or again, if you’re designing a building or a space to which people are supposed to come, be served somehow, and then leave, you really don’t need to know how many people will come per hour; you need to know how many people will be in the building at the same time.

There is something called Little's Law which states this a bit more formally. Assuming the system can serve the requests (or people, or jobs) without any pileup, then N = t × Λ, where N is the number of requests being served concurrently, t is the time spent on each request, and Λ is the request rate. For example, with requests arriving at Λ = 100 per second and t = 0.05 seconds spent on each, the server is handling N = 100 × 0.05 = 5 requests concurrently at any given time. Now it should be obvious that if you know t, the data will give you N, from which you can derive Λ (if you want).

Here’s how I did it in R. Suppose the data comes in a zipped data.zip file, with timestamps formatted as above. Then:

library(lattice) # always, always, always use this library
# Get raw data as a vector of DateTime objects
data <- as.POSIXct(scan(unzip("data.zip"), what=character(0), sep="\n"),
                   format="%d.%m.%Y %T")
# Turn it into a dataframe which will be easier to use
data.lt <- as.POSIXlt(data)
data.df <- data.frame(time=data,
                      sec=jitter(data.lt$sec, amount=.5),
                      min=data.lt$min,
                      hour=data.lt$hour)
data.df$timeofday <- with(data.df, sec+60*min+3600*hour)
rm(data, data.lt)

Note that for this example we'll assume all events happened on the same day. Now here's the idea. We're going to build a counter that counts +1 for each event and -1 when that event has been served. In R, we can do that with the cumsum function. For example, suppose we have a series of ten events spaced apart according to a Poisson distribution with mean 4:

> x <- cumsum(rpois(10,4))
> x
[1] 6 9 14 20 26 33 37 38 38 44

Suppose each event takes `kLatency` seconds to serve, and build a structure holding `x`, the coordinates of the original events and of their completion times, and `y`, the running counter of the number of events being served:

> temp <- list(x=c(x, x+kLatency), y=c(rep(1,length(x)), rep(-1, length(x))))
> reorder <- order(temp$x)
> temp$x <- temp$x[reorder]
> temp$y <- cumsum(temp$y[reorder])

If you now plot the `temp` structure, you get a staircase showing how many events are in flight at any moment.

With all this in place, we now have everything we need to go ahead. The little script above can go into its own function, or it can be defined as `xyplot`'s `panel` argument:

kLatency = 4 # seconds
xyplot(1~timeofday,
       data.df,
       main = paste("Concurrent requests assuming", kLatency, "seconds latency"),
       xlab = "Time of day",
       ylab = "# concurrent requests",
       panel = function(x, darg, ...) {
         temp <- list(x = c(x, x + kLatency),
                      y = c(rep(1, length(x)), rep(-1, length(x))))
         reorder <- order(temp$x)
         temp$x <- temp$x[reorder]
         temp$y <- cumsum(temp$y[reorder])
         panel.lines(temp, type="s")
       },
       scales = list(x = list(at = seq(0, 86400, 7200),
                              labels = c(0,"","", 6,"","", 12,"","", 18,"","", 24)),
                     y = list(limits = c(-1, 20))))

Unfortunately I cannot show you the results here, but this analysis showed me immediately when and how often the webserver would be under its heaviest load, and directly informed our infrastructure needs.

Looking up an EJB from a Web Service under JBoss 4.x

EJB injection in Web Services does not work with JBoss (yet), so when you want to use an EJB from your @WebService annotated POJO you have no choice but to look it up yourself.

This can get a little tricky, because each J2EE container can use its own JNDI naming convention when registering the EJB in the global naming context. Here I document how I configured JBoss to register my EJB with a name that I choose, and how I mapped a proper EJB name to that JNDI name.

Naming the EJB

Give your EJB a reasonably unique name in its annotation:

@Stateless(name="MyServiceBean")
@Local(SomeInterface.class)
public class ServiceBean implements SomeInterface {

}

Tell JBoss what JNDI name to register it under

Include the following in your jboss.xml file:




<jboss>
  <enterprise-beans>
    <session>
      <ejb-name>MyServiceBean</ejb-name>
      <!-- local-jndi-name because the bean exposes a local interface -->
      <local-jndi-name>myapp/MyJNDIName</local-jndi-name>
    </session>
  </enterprise-beans>
</jboss>


Tell JBoss to map an EJB name to the JNDI name

In your WAR, configure your jboss-web.xml file to include this:



<jboss-web>
  <ejb-local-ref>
    <ejb-ref-name>ejb/MyEJBName</ejb-ref-name>
    <local-jndi-name>myapp/MyJNDIName</local-jndi-name>
  </ejb-local-ref>
</jboss-web>

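Note that for the `java:comp/env/ejb/MyEJBName` lookup below to work, the standard `web.xml` usually needs a matching `<ejb-local-ref>` declaration as well; my setup may differ from yours, so treat this snippet as an assumption rather than gospel:

<ejb-local-ref>
  <ejb-ref-name>ejb/MyEJBName</ejb-ref-name>
  <ejb-ref-type>Session</ejb-ref-type>
  <local>SomeInterface</local>
</ejb-local-ref>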
Look up the EJB from your Web Service

You can now look up the EJB directly from your Web Service:

import javax.ejb.EJBException;
import javax.jws.WebService;
import javax.naming.InitialContext;
import javax.naming.NamingException;

@WebService
public class MyWebService {
    private SomeInterface getMyBean() {
        try {
            return (SomeInterface) new InitialContext().lookup("java:comp/env/ejb/MyEJBName");
        } catch (NamingException e) {
            throw new EJBException(e);
        }
    }
}

5 Java logging frameworks for building automation software

Back in 2005, when I was writing Java software for an embedded home automation controller, we ran into a little unforeseen problem. The embedded virtual machine we were using implemented only the Java 1.3 API, and so offered no logging facilities (which Java only gained with version 1.4).

We ended up using the logging implementation from the GNU Classpath project, but choosing a good logging framework in future projects will make a crucial difference for embedded applications that will run for months, if not years, unattended.

Here I recap the most popular Java logging frameworks of the day, and their relevance to the field of building automation.

Java Logging API
The default logging framework; a default implementation comes with the JRE. Expect this to be available on any platform that implements at least Java 1.4. A good default choice for many applications, but very limited in functionality. Probably the best choice when memory and/or disk space is a critical issue.
Log4j
Arguably the most popular logging framework for Java, with equivalent implementations for many other languages. Extremely flexible and easy to configure. If your application runs on a “real” machine then you would be wise to choose this framework, or the more recent Logback (see below). If you run on an embedded platform, the choice will be more difficult and require careful thought. You can also use Chainsaw, a graphical tool for parsing Log4j logfiles formatted in XML (XML formatting being something you should probably never enable on an embedded system). The development of Log4j seems, however, to be stuck on version 1.2.
Logback
Intended as the successor of Log4j, and written by the same author. I don’t have much experience with it but it’s probably a smart move to get to know it.
Jakarta Commons Logging
Not a logging framework per se, but a logging framework wrapper. Certain kinds of applications, such as libraries, should avoid any tight coupling with any particular logging framework and instead use a framework wrapper such as JCL. There will, however, be a (small) memory penalty, which should be evaluated if the application runs on an embedded platform.
SLF4J
The author of Logback and Log4j also wrote the Simple Logging Facade for Java, another logging framework wrapper, which solves several issues and problems with JCL. I cannot see how a building automation application could be concerned by those kinds of classloading problems, but I like the idea of statically binding the wrapper to the logging framework, something JCL did not do. SLF4J also lazily evaluates the log strings, so you avoid the little performance hit when debugging is turned off, an important factor for embedded systems (see the sketch below). You should probably prefer this wrapper over JCL, especially if Logback takes off and eventually replaces Log4j.
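To illustrate the lazy-evaluation point, here is a minimal SLF4J sketch; the class and logger name are invented for this example, but the parameterized-message pattern is standard SLF4J usage:

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class BoilerController {
    private static final Logger log = LoggerFactory.getLogger(BoilerController.class);

    public void setFlowTemperature(double celsius) {
        // The {} placeholder means the message is only assembled if DEBUG
        // is enabled, unlike "..." + celsius, which always pays for the
        // string concatenation.
        log.debug("Setting flow temperature to {} degrees", celsius);
        // ... actual control code would go here
    }
}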

When I hear the word “Entity” I reach for my thesaurus

According to Eric Evans, the author of Domain-Driven Design: Tackling Complexity in the Heart of Software, one of your first goals as a domain analyst is to define what he calls a Ubiquitous Language, i.e. a common vocabulary that your technical team and your business sponsor agree on and use to communicate. Having such a common vocabulary prevents misunderstandings and will probably help you name things in your program, such as classes and methods.

But what do you do when the business people seem unsure themselves of how to define certain concepts in their own domain? How do you even detect such situations?

I’ve found that certain words, when they begin to show up too often in conversations between the technical team and the business people, are good indicators that something needs to be clarified. Words such as object, element, perform, or do. But one word in particular seems to be particularly difficult to clarify once it has become accepted by business people: entity.

I was working on a business application that involved, well, entities that sold widgets in the world on behalf of a certain company. I asked the business people to help me understand what an entity was, or what exactly their responsibilities were.

So is, for instance, an entity associated with a country? Well no, because entity 33 is responsible for more than one country. Ok, so is an entity responsible for a geographic zone, including one or more countries? Well no, because see here, entity 12 handles all the sales in country Foo but only B2B sales in country Bar. The best definition they seemed to be able to come up with was that an entity was something that sold widgets to some people. Not exactly helpful.

Then I asked them a simple question. “Could you give me a dump from your database of the names of all your entities?” Sure enough, they did. And when I read the list of entities, I saw that they all were names of companies incorporated in their respective countries. “Say”, I asked, “aren’t these, like, subsidiaries?”

Subsidiaries they were indeed. It turned out that the company had segmented the market in each country into B2B and B2C products, and the company's history explained why a given subsidiary got a given market segment in a given country, even in a country other than its own.

This realization led to an immediate simplification of our class diagrams, and a lot of class renaming. But the resulting code is now much, much clearer, something even the business people admit.

It almost always pays off to work on redefining vague or incomplete terms. It will help you as a developer, and it might well help your customer too.